Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and implement doubly-linked implementation of prop (see #1040) #1059

Merged
merged 12 commits into from
Feb 13, 2024

Conversation

blefloch
Copy link
Member

There remains \prop_show:N and log, which means that we cannot really
test the code seriously yet. However, it is already exercised as
part of the implementation of \prop_concat:NNN and the from_keyval
functions.

@blefloch blefloch added expl3 enhancement New feature or request labels Feb 28, 2022
@josephwright
Copy link
Member

Any news on this: I think 'large' props would be a great addition (cf. work on tagging for example).

@car222222
Copy link
Contributor

What about "props as hash-table" (since this seems to be in use nearly in the kernel by now)??

@blefloch
Copy link
Member Author

I decided to rebase locally on the main branch then force-pushed rather than having a pretty painful merge. I also squashed the commits into a single one because there was no natural way to split up the code changes. For now the "added" date is today 2024-01-15 to make l3doc happy. At long last, this pull request is ready to be reviewed and merged, I think.

@blefloch blefloch marked this pull request as ready for review January 15, 2024 12:57
@blefloch blefloch added the optimization Optimizations and small tweaks label Jan 15, 2024
% \cs[no-index]{prop_map_\ldots{}}.
%
% \item
% The \enquote{linked} storage method is meant for property lists with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
% The \enquote{linked} storage method is meant for property lists with a
% The \enquote{linked} storage method is meant for property lists with

Copy link
Member

@josephwright josephwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me at the code level - lets see what everyone thinks interface-wise.

Copy link
Member

@FrankMittelbach FrankMittelbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I'm not through reviewing. Any, here are some first comments on the interface section

% using \cs{prop_new:N} (\enquote{flat} storage) or \cs{prop_new_large:N}
% (\enquote{linked} storage). Once a property list is declared with
% \cs{prop_new:N} or \cs{prop_new_large:N}, the type of internal data storage
% can no longer be changed. All other \pkg{l3prop} functions transparently
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the current implementation does not over a conversion from flat to linked it should be technically simple to provide, and we should consider to offer it. Use case: property lists that are define as flat in the kernel or in a package but that in certain circumstances, e.g., when a special package is loaded, grow a lot compared to the default case. In that case it would be nice if there is a mechanism to convert from one form to another even if the original declaration happened elsewhere and the list has already entries.

More or less syntactic sugar given that the prop_set_eq:NN will basically do the work, but I think it deserves a defined name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy to provide but (1) what name? (2) for local props it's probably annoying if people try to change the storage type within a group, I'll have to see how much bookkeeping is needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree that changing this “type” only locally does not sound like a good idea.

Name: \prop_make_large:N

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that \prop_set_eq:NN will do the conversion if needed (both ways).
So it just needs a small extension to cope with the name staying the same??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\prop_make_...:N is a good name. I also thought of \prop_renew_...:N but it might suggest that the process would give an empty prop like \prop_new_...:N. Regardless of name we need both directions I suppose.

There is a bit more work than for \prop_set_eq:NN because some auxiliary structures have to be added / deleted.

%
% \begin{function}[added = 2024-01-15]{\prop_new_large:N, \prop_new_large:c}
% \begin{syntax}
% \cs{prop_new_large:N} \meta{property list}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can say that "linked" is an implementation detail and "large" is the intended use, but initially linked lists might always stay small and flat ones might grew very large. And given that "linked" also shows up in the documentation and in error messages (I think) I wonder if it would be better to call this \prop_new_linked:N and perhaps even offer \prop_new_flat:N

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You correctly pinpointed my reason to prefer "large" in function names (and my inconsistency in using a different word in error messages), which is that linked is an implementation detail. In fact, the new props are both doubly-linked and implemented using the hash table, and the latter point is an important part of why they end up faster for accessing entries. Is "linked" really a good word for that? "hashed"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that names that refer to the implementation are not good (since there may be further changes to this in another decade or so).

I also understand @FrankMittelbach ‘s problem with labelling them as “large”.

Needs further thought!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further: in some of the documentation, it is probably best to use both names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a name that does not imply any “ordering” on the type of implementation, and says nothing about the expected contents, but is also independent of any details of the implementation.

% \prop_gclear_new_large:N, \prop_gclear_new_large:c
% }
% \begin{syntax}
% \cs{prop_clear_new_large:N} \meta{property list}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here \prop_clear_new_linked:N reads a tiny bit better to me. Pity we save an "or" in the name

% \pkg{l3keys}), each key here \emph{must} be followed with an \texttt{=}
% sign.
% \begin{function}[added = 2024-01-15]
% {\prop_const_large_from_keyval:Nn, \prop_const_large_from_keyval:cn}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again "linked" feels more natural to me

@car222222
Copy link
Contributor

The diff here is "too big" for my github app! It refuses to load it.

% \end{function}
%
% \begin{function}[added = 2014-08-12, updated = 2021-04-29]{\prop_log:N, \prop_log:c}
% \begin{syntax}
% \cs{prop_log:N} \meta{property list}
% \end{syntax}
% Writes the entries in the \meta{property list} in the log file.
% Writes the entries in the \meta{property list} in the log file,
% and specifies its storage type.
% \end{function}
%
% \section{Scratch property lists}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably sensible to add a note here that the are no scratch “large” pls.

@car222222
Copy link
Contributor

But I can get the diff using Chrome.

But response is achingly slow: not sure if this is due to the size of the diff or just my connection?

So I have now added a few responses and comments on the documentation. Not yet tackled the code.

@FrankMittelbach
Copy link
Member

A suggested stress test for the implementation:

force all property lists to be"large" in the test suite by making \prop_new:N equal to \prop_new_large:N in the kernel and then run the 2e test suite. It would be very interesting to see what that would do to the time it takes to run the tests in all directories (compared to the time it currently needs).

@blefloch
Copy link
Member Author

Indeed Frank that would be a good test, but I'll need some time to set it up (pretty busy work week). I'm worried because the order of prop items is probably different with the new prop implementation than the old one, which may show up in various test logs.

@car222222
Copy link
Contributor

car222222 commented Jan 17, 2024

In the explanation of the "calculation" of the prefix, add something like this:

The aim here is to make this string as short as we can, given the range of distinct characters available.

(Maybe also explain why it should be as short as possible, possibly by reference to earlier where the reason is fully explained?)

%
% \item
% The \enquote{linked} storage method is meant for property lists with a
% large numbers of entries. It has more memory overhead, but is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho "memory overhead" should be clarified. Half of the user will probably think that that means computer memory.

@car222222
Copy link
Contributor

At a later stage, the code for the production of short, unique strings should probably be moved to somewhere that makes it more widely available.

@blefloch
Copy link
Member Author

I think I addressed all of the comments. I also found a few bugs in my code, and minor ones elsewhere (see recent issues).

Most importantly, I renamed the functions as Frank suggested, \prop_new_linked:N etc., and I had to spend quite some time fighting with the scope checking for l3debug. From my point of view this could be merged.

The small merge conflict with main is trivial to fix: keep my additional \__prop_chk:w and remove the line break. The failing tests seem related to Joseph's l3build changes. Rebasing on main should fix them I suppose.

@josephwright
Copy link
Member

Looks good to me: lets see what everyone else thinks!

@blefloch
Copy link
Member Author

Ah, one check that Frank suggested was to run the LaTeX2e suite with an additional line making all props linked. I didn't figure out how to do that, but in case someone wants to try, one has to add \cs_gset_eq:NN \prop_new:N \prop_new_linked:N after the definition of \prop_new_linked:N in l3prop.dtx, and to comment out \str_if_eq:nnF {#3} { flat } { #3~ } in l3msg.dtx. This makes a failing m3prop007 file because of some missing linked words, but it will minimize the number of test log changes on the LaTeX2e side.

@blefloch
Copy link
Member Author

I managed to use the new linked prop in the latex2e test suite, which uncovers a problem with treatment of hashes. I'm investigating.

I also forgot to add the suggested \prop_make_linked:N and \prop_make_flat:N functions (and global versions \prop_gmake_linked:N and \prop_gmake_flat:N I suppose).

@josephwright
Copy link
Member

Presumably we can apply the 'make flat' function in 2e rollback if available: either the prop is flat and 'make flat' is unvailable, or the 'make flat' function is available and will work.

@blefloch
Copy link
Member Author

Yes. Anyways, props are flat by default, so this branch doesn't change anything in LaTeX2e. I just wanted to test whether the LaTeX2e test suite runs correctly with props being linked props. Overall it slows down the test suite slightly there, presumably because there are not that many very long props.

Despite the failing tests, I think the branch can be merged/rebased into main. Last I checked (before updating l3build with the latest normalizations), the LaTeX3 tests succeeded.

@car222222
Copy link
Contributor

I would think that there are no "very long props" there (depending on where "very" starts!), or some such have been explicitly and purposefully added.

blefloch and others added 12 commits February 13, 2024 13:49
The hard part was surprisingly to implement \prop_show:N and log, which
have to tediously check every single detail of the internal structure of
the linked property list, in a way that should be fully robust to a
broken data structure.  The order in which functions are defined has
been modified, making the diff somewhat ugly, sorry.  The new test is
basically a concatenation of all the other tests, but with linked
property lists instead of flat ones.
This helps get the same order of items in both implementations.
For linked props this would leave some material in the input stream.
Removing \begin{document} ensures that left-over material leads
to a Missing \begin{document} error.
Since some l3prop functions such as \prop_put_from_keyval:Nn use
others repeatedly, one has to make it so that none of the underlying
code runs any debug check (by using lower-level \cs_set_nopar:Npe
etc), and add a lot of prop functions to the list of patched
commands in l3debug.
@josephwright josephwright merged commit 3604e82 into main Feb 13, 2024
6 checks passed
@josephwright josephwright deleted the gh1040-prop-large branch February 13, 2024 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request optimization Optimizations and small tweaks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants