Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit Forward references in Mathtext parser #26198

Merged
merged 1 commit into from Jun 29, 2023

Conversation

ksunden
Copy link
Member

@ksunden ksunden commented Jun 27, 2023

PR summary

Comment indicates that limiting forward references is important for
performance, but many more were used than needed.

This is a 75% reduction in forward reference instances.

Many of these are included in the definition of remaining forward
references but do not themselves need to be a forward reference (indeed
only 1 thing in a given cycle actually needs to be such that it can be
referred to prior to its definition)

auto_delim is recursively self-referential, and thus must be all on
its own

placeable and accent form a tight loop, so one of those two needed
to be a forward reference, and placeable was the easiest to keep (and
most likely to be used in another definition that may introduce similar
constraints)

token is one of the more generic definitions, that is used in multiple
other definitions.

required_group and optional_group both caused failed tests when I
tried to make them no longer forward references, has something to do
with how __call__ works for Forward vs And objects in pyparsing
(though did not dig too much deeper into it, beyond noticing that the
names in future required_group("name") calls still had error messages
that say "group", not their newer names, and that reverting to
Forward instances fixed it).

operatorname needed to move to avoid using forward references for
simple, but does not actually have a cycle.

Otherwise rewrapped some of the definitions to be one line and fixed a
typing error that I noticed when looking into the parsing things.

PR checklist

@devRD I would appreciate your review on this. If this reorg would cause conflicts for other PRs already in flight, it can absolutely wait.
I just saw the comment saying to minimize usage of Forward and then was immediately like "why is this a Forward" and kept going.

The changes themselves are relatively minimal and would be easy to redo if needed:

  • delete line that instantiates the Forward
  • delete the << from <<= in the redefinition
  • a few reorderings to maintain directionality without Forword references
  • keep anything that is referred to before it is defined (and cannot be reordered) as a Forward (plus the optional/required_group)

Comment indicates that limiting forward references is important for
performance, but many more were used than needed.

This is a 75% reduction in forward reference instances.

Many of these are included in the definition of remaining forward
references but do not themselves need to be a forward reference (indeed
only 1 thing in a given cycle actually needs to be such that it can be
referred to prior to its definition)

`auto_delim` is recursively self-referential, and thus _must_ be all on
its own

`placeable` and `accent` form a tight loop, so one of those two needed
to be a forward reference, and placeable was the easiest to keep (and
most likely to be used in another definition that may introduce similar
constraints)

`token` is one of the more generic definitions, that is used in multiple
other definitions.

`required_group` and `optional_group` both caused failed tests when I
tried to make them no longer forward references, has something to do
with how `__call__` works for `Forward` vs `And` objects in pyparsing
(though did not dig too much deeper into it, beyond noticing that the
names in future `required_group("name")` calls still had error messages
that say `"group"`, not their newer names, and that reverting to
`Forward` instances fixed it).

`operatorname` needed to move to avoid using forward references for
`simple`, but does not actually have a cycle.

Otherwise rewrapped some of the definitions to be one line and fixed a
typing error that I noticed when looking into the parsing things.
Comment on lines +1820 to +1824
# Set names on (almost) everything -- very useful for debugging
# token, placeable, and auto_delim are forward references which
# are left without names to ensure useful error messages
if key not in ("token", "placeable", "auto_delim"):
val.setName(key)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change I didn't fully intend on keeping in this PR... this was part of the discussion from #26152 that is likely to be part of the fix for making the error messages work for future versions of pyparsing...

I'm not totally against leaving this in as I think it is likely to be wanted soon, but will allow reviewers to revert it if it is too soon.

Suggested change
# Set names on (almost) everything -- very useful for debugging
# token, placeable, and auto_delim are forward references which
# are left without names to ensure useful error messages
if key not in ("token", "placeable", "auto_delim"):
val.setName(key)
# Set names on everything -- very useful for debugging
val.setName(key)

@ksunden ksunden added this to the v3.8.0 milestone Jun 27, 2023
@ksunden
Copy link
Member Author

ksunden commented Jun 27, 2023

As for performance claims, it (in not that high of N) seems to be about a 5% speed up, judging from time the time it takes to run pytest test_mathtext.py (26.5 seconds compared to 28.1 seconds on main)(which has some significant overhead, but may be a bit more realistic on what even heavy users of mathtext may see)

Running on [parser.parse(a) for a in math_tests] (where math_tests is the list from the mathtext tests, stripped of any Nones. It has length of 77, so memoising should be minimized which is good for testing speed of parsing:

main:

387 ms ± 2.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

on pr branch:

301 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So that is a 22% improvement focusing on parsing, which is honestly more than I thought it would be (granted, parsing mathtext is rarely the bottleneck, but still).

@ksunden
Copy link
Member Author

ksunden commented Jun 27, 2023

Closing and reopening to pick up #26199 since ghactions pytest run didn't go.

@ksunden ksunden closed this Jun 27, 2023
@ksunden ksunden reopened this Jun 27, 2023
@devRD
Copy link
Contributor

devRD commented Jun 27, 2023

I think the changes are really helpful!
Not sure how many PRs it affects currently, but with these new changes I think one will have to update the formatting for any new commands they have added, which is a minimal change. I believe we are good to go ahead with this.

# Set names on (almost) everything -- very useful for debugging
# token, placeable, and auto_delim are forward references which
# are left without names to ensure useful error messages
if key not in ("token", "placeable", "auto_delim"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is kept (seems OK to me although I cannot really see the potential impact), I'd suggest defining the tuple as a set outside of the loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impact should be pretty minimal for earlier versions of pyparsing. Essentially if one of these raises an error itself it gets more verbose... but I'm having trouble even making these error (themselves and not a deeper constituent piece) when I try to, so not totally convinced those error message ever happen.

On pyparsing 3.1.0, these error messages override the constituent parts, and thus are more visible, so it goes from

Expected token, got ...

To:

Expected Forward {{ simple | ... } | unknown_symbol}, got ...

Which could be better or worse depending on context, but the idea is to facilitate better errors in as yet unreleased pyparsing:

And on future versions of pyparsing, assuming we complete the line drawing we have been discussing, this will signal to return to pre 3.1.0 behavior (at least with many of the proposals).

Personally, I think moving the tuple negatively impacts readablity for rather minimal performance gains (this function is only called at parser init time, though admittedly multiple times, and the for loop has relatively few entries (~13 in the first call, ~40 in the second call, prior to this PR it is called 3 times, but I found no need for it to be called both before and after adding the Forward definitions, which don't rely on it having been called prior)) So moving would be saving ~50 tuple definitions, total per python process (even the parser object gets cached as a class variable and reused via the public API, so even creating a new MathTextParser does not rerun this). And even then, not convinced the variable lookup would be any faster

I think we are talking ~1 microsecond, total.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always have a hard time judging minor impact on readability and minor impact on performance and will always go for performance...

@tacaswell tacaswell merged commit 3c38700 into matplotlib:main Jun 29, 2023
51 of 54 checks passed
@ksunden ksunden deleted the mathtext_forwards branch June 29, 2023 02:49
ksunden added a commit to ksunden/matplotlib that referenced this pull request Aug 1, 2023
May require backporting a portion of matplotlib#26198 to the 3.7 branch if we wish to have a 3.7.3. (specifically the part where token, placeable, and auto_delim are excluded from )
ksunden added a commit to ksunden/matplotlib that referenced this pull request Aug 1, 2023
May require backporting a portion of matplotlib#26198 to the 3.7 branch if we wish to have a 3.7.3.
(specifically the part where token, placeable, and auto_delim are excluded from setName)
ksunden added a commit to meeseeksmachine/matplotlib that referenced this pull request Aug 2, 2023
ksunden added a commit to meeseeksmachine/matplotlib that referenced this pull request Aug 2, 2023
Cherry picked and fixed up, ignored changes to the pyi file (which doesn't exist on this branch).
ksunden added a commit to meeseeksmachine/matplotlib that referenced this pull request Aug 2, 2023
Cherry picked and fixed up, ignored changes to the pyi file (which doesn't exist on this branch).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants