Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Pyparsing 3.1 breaks tests #26152

Closed
oscargus opened this issue Jun 19, 2023 · 24 comments · Fixed by #26431
Closed

[Bug]: Pyparsing 3.1 breaks tests #26152

oscargus opened this issue Jun 19, 2023 · 24 comments · Fixed by #26431
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. status: upstream fix required topic: text/mathtext
Milestone

Comments

@oscargus
Copy link
Contributor

Bug summary

Pyparsing 3.1.0 was released today and something has changed which causes quite a bit of test failures.

List of changes https://github.com/pyparsing/pyparsing/releases/tag/3.1.0 (not obvious to me what caused it).

Code for reproduction

Run the tests

Actual outcome

FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[hspace without value]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[hspace with invalid value]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[function without space]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[accent without space]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[frac without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[frac with empty parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[binom without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[binom with empty parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[genfrac without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[genfrac with empty parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[sqrt without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[sqrt with invalid value]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[overline without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[overline with empty parameter]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[left with invalid delimiter]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[right with invalid delimiter]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[unclosed parentheses with sizing]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[unclosed parentheses without sizing]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[dfrac without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[dfrac with empty parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[overset without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[underset without parameters]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[unknown symbol]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[double superscript]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[double subscript]
FAILED lib/matplotlib/tests/test_mathtext.py::test_mathtext_exceptions[super on sub without braces]
FAILED lib/matplotlib/tests/test_text.py::test_parse_math - AssertionError: R...
FAILED lib/matplotlib/tests/test_text.py::test_parse_math_rcparams - Assertio...

Expected outcome

No failures

Additional information

I guess a two-stage fix:

  1. Pin the version to <3.1
  2. Fix the actual problem and unpin

Operating system

No response

Matplotlib Version

main-branch

Matplotlib Backend

No response

Python version

No response

Jupyter version

No response

Installation

None

@oscargus oscargus added this to the v3.7.2 milestone Jun 19, 2023
@oscargus oscargus added topic: text/mathtext Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. labels Jun 19, 2023
@ksunden
Copy link
Member

ksunden commented Jun 19, 2023

Possibly related to pyparsing/pyparsing#474

Have not proven it, but that has to do with handling escape sequences and the errors appear to all be related to \

I'll also note that pyparsing handling of \ is (or at least has been) problematic for python 3.12 support, so could be a different change related to that as well...

@ksunden
Copy link
Member

ksunden commented Jun 24, 2023

pyparsing/pyparsing@4cd691f#diff-651b49e05968bd7f42614f05b7e9d94b30f52db27eb692ce20ef87e9799c624fR4449-R4453

Okay, I have bisected/tested that this change is what is causing our test failures.

I will note that it is only the error messages that change (and so anything that does not raise will still work)

Essentially we were relying on the "pass through" of error messages to provide actually useful error messages (and testing that we got them)

@ptmcg, perhaps you can advise us on how to respond? It is unclear to me (at least quickly) if there is a good way to get the errors from further down when we want it.

@ptmcg
Copy link

ptmcg commented Jun 24, 2023

I am not able to readily set up my own development environment for running these tests. Can someone please post (or link to a pastebin) a full test output, including expected and observed values?

Essentially we were relying on the "pass through" of error messages to provide actually useful error messages (and testing that we got them)

I think I was striving for improving the error messages as part of this effort.

@oscargus
Copy link
Contributor Author

These are the failing tests:

@pytest.mark.parametrize(
'math, msg',
[
(r'$\hspace{}$', r'Expected \hspace{space}'),
(r'$\hspace{foo}$', r'Expected \hspace{space}'),
(r'$\sinx$', r'Unknown symbol: \sinx'),
(r'$\dotx$', r'Unknown symbol: \dotx'),
(r'$\frac$', r'Expected \frac{num}{den}'),
(r'$\frac{}{}$', r'Expected \frac{num}{den}'),
(r'$\binom$', r'Expected \binom{num}{den}'),
(r'$\binom{}{}$', r'Expected \binom{num}{den}'),
(r'$\genfrac$',
r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
(r'$\genfrac{}{}{}{}{}{}$',
r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
(r'$\sqrt$', r'Expected \sqrt{value}'),
(r'$\sqrt f$', r'Expected \sqrt{value}'),
(r'$\overline$', r'Expected \overline{body}'),
(r'$\overline{}$', r'Expected \overline{body}'),
(r'$\leftF$', r'Expected a delimiter'),
(r'$\rightF$', r'Unknown symbol: \rightF'),
(r'$\left(\right$', r'Expected a delimiter'),
# PyParsing 2 uses double quotes, PyParsing 3 uses single quotes and an
# extra backslash.
(r'$\left($', re.compile(r'Expected ("|\'\\)\\right["\']')),
(r'$\dfrac$', r'Expected \dfrac{num}{den}'),
(r'$\dfrac{}{}$', r'Expected \dfrac{num}{den}'),
(r'$\overset$', r'Expected \overset{annotation}{body}'),
(r'$\underset$', r'Expected \underset{annotation}{body}'),
(r'$\foo$', r'Unknown symbol: \foo'),
(r'$a^2^2$', r'Double superscript'),
(r'$a_2_2$', r'Double subscript'),
(r'$a^2_a^2$', r'Double superscript'),
],
ids=[
'hspace without value',
'hspace with invalid value',
'function without space',
'accent without space',
'frac without parameters',
'frac with empty parameters',
'binom without parameters',
'binom with empty parameters',
'genfrac without parameters',
'genfrac with empty parameters',
'sqrt without parameters',
'sqrt with invalid value',
'overline without parameters',
'overline with empty parameter',
'left with invalid delimiter',
'right with invalid delimiter',
'unclosed parentheses with sizing',
'unclosed parentheses without sizing',
'dfrac without parameters',
'dfrac with empty parameters',
'overset without parameters',
'underset without parameters',
'unknown symbol',
'double superscript',
'double subscript',
'super on sub without braces'
]
)
def test_mathtext_exceptions(math, msg):
parser = mathtext.MathTextParser('agg')
match = re.escape(msg) if isinstance(msg, str) else msg
with pytest.raises(ValueError, match=match):
parser.parse(math)

Here is the full traceback of one of them:

――――――――――――――――――――――――――――――――――――― test_mathtext_exceptions[double subscript] ――――――――――――――――――――――――――――――――――――――

math = '$a_2_2$', msg = 'Double subscript'

    @pytest.mark.parametrize(
        'math, msg',
        [
            (r'$\hspace{}$', r'Expected \hspace{space}'),
            (r'$\hspace{foo}$', r'Expected \hspace{space}'),
            (r'$\sinx$', r'Unknown symbol: \sinx'),
            (r'$\dotx$', r'Unknown symbol: \dotx'),
            (r'$\frac$', r'Expected \frac{num}{den}'),
            (r'$\frac{}{}$', r'Expected \frac{num}{den}'),
            (r'$\binom$', r'Expected \binom{num}{den}'),
            (r'$\binom{}{}$', r'Expected \binom{num}{den}'),
            (r'$\genfrac$',
             r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
            (r'$\genfrac{}{}{}{}{}{}$',
             r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
            (r'$\sqrt$', r'Expected \sqrt{value}'),
            (r'$\sqrt f$', r'Expected \sqrt{value}'),
            (r'$\overline$', r'Expected \overline{body}'),
            (r'$\overline{}$', r'Expected \overline{body}'),
            (r'$\leftF$', r'Expected a delimiter'),
            (r'$\rightF$', r'Unknown symbol: \rightF'),
            (r'$\left(\right$', r'Expected a delimiter'),
            # PyParsing 2 uses double quotes, PyParsing 3 uses single quotes and an
            # extra backslash.
            (r'$\left($', re.compile(r'Expected ("|\'\\)\\right["\']')),
            (r'$\dfrac$', r'Expected \dfrac{num}{den}'),
            (r'$\dfrac{}{}$', r'Expected \dfrac{num}{den}'),
            (r'$\overset$', r'Expected \overset{annotation}{body}'),
            (r'$\underset$', r'Expected \underset{annotation}{body}'),
            (r'$\foo$', r'Unknown symbol: \foo'),
            (r'$a^2^2$', r'Double superscript'),
            (r'$a_2_2$', r'Double subscript'),
            (r'$a^2_a^2$', r'Double superscript'),
        ],
        ids=[
            'hspace without value',
            'hspace with invalid value',
            'function without space',
            'accent without space',
            'frac without parameters',
            'frac with empty parameters',
            'binom without parameters',
            'binom with empty parameters',
            'genfrac without parameters',
            'genfrac with empty parameters',
            'sqrt without parameters',
            'sqrt with invalid value',
            'overline without parameters',
            'overline with empty parameter',
            'left with invalid delimiter',
            'right with invalid delimiter',
            'unclosed parentheses with sizing',
            'unclosed parentheses without sizing',
            'dfrac without parameters',
            'dfrac with empty parameters',
            'overset without parameters',
            'underset without parameters',
            'unknown symbol',
            'double superscript',
            'double subscript',
            'super on sub without braces'
        ]
    )
    def test_mathtext_exceptions(math, msg):
        parser = mathtext.MathTextParser('agg')
        match = re.escape(msg) if isinstance(msg, str) else msg
        with pytest.raises(ValueError, match=match):
>           parser.parse(math)

test_mathtext.py:346:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <matplotlib.mathtext.MathTextParser object at 0x0000018B5CC266D0>, s = '$a_2_2$', dpi = 72, prop = None

    def parse(self, s, dpi=72, prop=None):
        """
        Parse the given math expression *s* at the given *dpi*.  If *prop* is
        provided, it is a `.FontProperties` object specifying the "default"
        font to use in the math expression, used for all non-math text.

        The results are cached, so multiple calls to `parse`
        with the same expression should be fast.

        Depending on the *output* type, this returns either a `VectorParse` or
        a `RasterParse`.
        """
        # lru_cache can't decorate parse() directly because prop
        # is mutable; key the cache using an internal copy (see
        # text._get_text_metrics_with_cache for a similar case).
        prop = prop.copy() if prop is not None else None
>       return self._parse_cached(s, dpi, prop)

..\mathtext.py:77:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <matplotlib.mathtext.MathTextParser object at 0x0000018B5CC266D0>, s = '$a_2_2$', dpi = 72
prop = <matplotlib.font_manager.FontProperties object at 0x0000018B5CA92C70>

    @functools.lru_cache(50)
    def _parse_cached(self, s, dpi, prop):
        from matplotlib.backends import backend_agg

        if prop is None:
            prop = FontProperties()
        fontset_class = _api.check_getitem(
            self._font_type_mapping, fontset=prop.get_math_fontfamily())
        load_glyph_flags = {
            "vector": LOAD_NO_HINTING,
            "raster": backend_agg.get_hinting_flag(),
        }[self._output_type]
        fontset = fontset_class(prop, load_glyph_flags)

        fontsize = prop.get_size_in_points()

        if self._parser is None:  # Cache the parser globally.
            self.__class__._parser = _mathtext.Parser()

>       box = self._parser.parse(s, fontset, fontsize, dpi)

..\mathtext.py:98:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <matplotlib._mathtext.Parser object at 0x0000018B57DB44C0>, s = '$a_2_2$'
fonts_object = <matplotlib._mathtext.BakomaFonts object at 0x0000018B5CA92910>, fontsize = 12.0, dpi = 72

    def parse(self, s, fonts_object, fontsize, dpi):
        """
        Parse expression *s* using the given *fonts_object* for
        output, at the given *fontsize* and *dpi*.

        Returns the parse tree of `Node` instances.
        """
        self._state_stack = [
            ParserState(fonts_object, 'default', 'rm', fontsize, dpi)]
        self._em_width_cache = {}
        try:
            result = self._expression.parseString(s)
        except ParseBaseException as err:
            # explain becomes a plain method on pyparsing 3 (err.explain(0)).
>           raise ValueError("\n" + ParseException.explain(err, 0)) from None
E           ValueError:
E
E           ^
E           ParseFatalException: Expected token  (at char 0), (line:1, col:1)

..\_mathtext.py:2056: ValueError

During handling of the above exception, another exception occurred:

math = '$a_2_2$', msg = 'Double subscript'

    @pytest.mark.parametrize(
        'math, msg',
        [
            (r'$\hspace{}$', r'Expected \hspace{space}'),
            (r'$\hspace{foo}$', r'Expected \hspace{space}'),
            (r'$\sinx$', r'Unknown symbol: \sinx'),
            (r'$\dotx$', r'Unknown symbol: \dotx'),
            (r'$\frac$', r'Expected \frac{num}{den}'),
            (r'$\frac{}{}$', r'Expected \frac{num}{den}'),
            (r'$\binom$', r'Expected \binom{num}{den}'),
            (r'$\binom{}{}$', r'Expected \binom{num}{den}'),
            (r'$\genfrac$',
             r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
            (r'$\genfrac{}{}{}{}{}{}$',
             r'Expected \genfrac{ldelim}{rdelim}{rulesize}{style}{num}{den}'),
            (r'$\sqrt$', r'Expected \sqrt{value}'),
            (r'$\sqrt f$', r'Expected \sqrt{value}'),
            (r'$\overline$', r'Expected \overline{body}'),
            (r'$\overline{}$', r'Expected \overline{body}'),
            (r'$\leftF$', r'Expected a delimiter'),
            (r'$\rightF$', r'Unknown symbol: \rightF'),
            (r'$\left(\right$', r'Expected a delimiter'),
            # PyParsing 2 uses double quotes, PyParsing 3 uses single quotes and an
            # extra backslash.
            (r'$\left($', re.compile(r'Expected ("|\'\\)\\right["\']')),
            (r'$\dfrac$', r'Expected \dfrac{num}{den}'),
            (r'$\dfrac{}{}$', r'Expected \dfrac{num}{den}'),
            (r'$\overset$', r'Expected \overset{annotation}{body}'),
            (r'$\underset$', r'Expected \underset{annotation}{body}'),
            (r'$\foo$', r'Unknown symbol: \foo'),
            (r'$a^2^2$', r'Double superscript'),
            (r'$a_2_2$', r'Double subscript'),
            (r'$a^2_a^2$', r'Double superscript'),
        ],
        ids=[
            'hspace without value',
            'hspace with invalid value',
            'function without space',
            'accent without space',
            'frac without parameters',
            'frac with empty parameters',
            'binom without parameters',
            'binom with empty parameters',
            'genfrac without parameters',
            'genfrac with empty parameters',
            'sqrt without parameters',
            'sqrt with invalid value',
            'overline without parameters',
            'overline with empty parameter',
            'left with invalid delimiter',
            'right with invalid delimiter',
            'unclosed parentheses with sizing',
            'unclosed parentheses without sizing',
            'dfrac without parameters',
            'dfrac with empty parameters',
            'overset without parameters',
            'underset without parameters',
            'unknown symbol',
            'double superscript',
            'double subscript',
            'super on sub without braces'
        ]
    )
    def test_mathtext_exceptions(math, msg):
        parser = mathtext.MathTextParser('agg')
        match = re.escape(msg) if isinstance(msg, str) else msg
        with pytest.raises(ValueError, match=match):
>           parser.parse(math)
E           AssertionError: Regex pattern did not match.
E            Regex: 'Double\\ subscript'
E            Input: '\n\n^\nParseFatalException: Expected token  (at char 0), (line:1, col:1)'

test_mathtext.py:346: AssertionError

All other looks very similar.

@oscargus
Copy link
Contributor Author

That particular exception we raise here:

raise ParseFatalException("Double subscript")

@ksunden
Copy link
Member

ksunden commented Jun 24, 2023

Essentially we had been getting informative error messages and now just get "Expected token (at char 0)" for any failure to parse, which is not informative. Reverting the try/except linked above fixes our tests

@ptmcg
Copy link

ptmcg commented Jun 25, 2023

I'm still trying to setup a development environment of my own following the matplotlib dev guide, but now getting TypeErrors with FT2Font something something.

What happens if you stop calling set_name("token") on p.token? Are there other p attributes that are also emitting unhelpful names? Something like this?

        def set_names_and_parse_actions():
            for key, val in vars(p).items():
                if not key.startswith('_'):
                    # Set names on everything -- very useful for debugging
                    if key != "token":  # <-- don't obscure token contents with unhelpful "token" name
                        val.setName(key)
                    # Set actions
                    if hasattr(self, key):
                        val.setParseAction(getattr(self, key))

Or possibly change the if statement to any attribute whose value is a pyparsing Literal or Keyword, which would probably give clearer-looking exceptions than using your internal variable name:

                    if not isinstance(val.expr, (Literal, Keyword)):

@ptmcg
Copy link

ptmcg commented Jun 26, 2023

tl;dr - This change has been in staging and pre-releases for over a year before being released, and I announced (in Twitter and comp.lang.python.announce) the coming 3.1.0 release of pyparsing in alpha and beta releases in March, April, and May of this year. In general, I feel that this particular change improves logging for expressions that contain other expressions when the container expression has been assigned a custom name, so I do not plan on reverting this change. It saddens me that there is this post-release surprise issue with a pyparsing user as significant as matplotlib.

As a workaround, you can see what the effects are by not calling setName on selected expressions in your parser. Or as you have done for now, pin to a 3.0.x version of pyparsing.

I also have tried to spin up my own development environment of matplotlib, so that I could run your tests against my in-development pyparsing, but with no success (both Windows and Ubuntu). I welcome any assistance in doing this, so that I can run matplotlib regression tests for myself before releasing new versions of pyparsing.

Some possible changes I could make in a future pyparsing release:

  • add a method to ParserElement to adjust the verbosity of error messages, so that you could get an error message that included the contained element's custom name or default name as well, something like "Expected token ('\hspace{space}')" - I am most optimistic that this would be a good addition, and would serve to address the issue you have with your tests.
  • allow parsers to undo a call to setName by calling setName with a None value (I have looked briefly at this, I am not sure this will have a very clean implementation, but I'll be the first to admit it would not be the first unclean thing in pyparsing)

I have pressing deadlines at work for the next month, so this future release will not be for several weeks yet. Also, I would like to give 3.1.0 some more time in the field, to get any other fallout from other users.

@ksunden - I tested this version of pyparsing with Python 3.12, so I believe I've addressed any issues with \. Note that \ in QuotedString is parsed a little differently (and to my mind more sanely) in this new version. If there are still problems with \ in 3.12, it points to gaps in my test suite, which I would be happy to learn about.

@ksunden
Copy link
Member

ksunden commented Jun 26, 2023

The guess at QuotedString / \ behavior was an initial guess upon reading the changelog without looking at the code, I suspect that is not an issue (though will check in the relatively near term on python 3.12 to be sure...) certainly I'm convinced it is not actually the cause for what we are seeing on python <= 3.11 now

I will focus discussion in the interim on one particular path, as I suspect that once we find a path for one, the rest will fall into line with relatively trivial modifications.

So let's focus on unknown_symbol (the test case for which is r"$\sinx$")

Unknown symbol is Regex(r"\\[A-Za-z]*)("name") (i.e. a alpha word preceded by a \, such that it looks like a LaTeX command, but is not one we support in Mathtext)

It's parsing function always raises:

def unknown_symbol(self, s, loc, toks):
    raise ParseFatalException(s, loc, f"Unknown symbol: {toks['name']}")

As such, our expected behavior is that when such a symbol is parsed, we should get an error message that points directly to the unknown symbol and says Unknown symbol in the error message explanation.

A simple test case, extracting from the pytest markup:

import matplotlib.mathtext

parser = matplotlib.mathtext.MathTextParser("agg")
parser.parse(r"$\sinx$")

This should raise a ValueError which prints out the explanation including the phrase Unknown symbol. This is what it does with pyparsing 3.0.9:

ValueError: 
\sinx
^
ParseFatalException: Unknown symbol: \sinx, found '\'  (at char 0), (line:1, col:1)

With pyparsing 3.1.0 and no modifications to matplotlib this gives:

ValueError: 
\sinx
^
ParseFatalException: Expected token, found '\'  (at char 0), (line:1, col:1)

Which does not include the key phrase we are looking for in out tests, and is generally not a helpful error message in our usage.

Excluding token from having setName called it gets better, but does not go all the way:

ValueError: 
\sinx
^
ParseFatalException: Forward: {{simple | auto_delim} | unknown_symbol}, found '\'  (at char 0), (line:1, col:1)

That is the internal definition of token, which at least is progress, but does not get us to where we are seeing the error message we were looking for (pretty much any of the other suggestions such as keying off of Literal/Keyword etc had similar results)

If I modify pyparsing a little more minimally such that instead of:

            try:
                return self.expr._parse(instring, loc, doActions, callPreParse=False)
            except ParseBaseException as pbe:
                pbe.msg = self.errmsg
                raise

I do (i.e. do not reset the error message for Forward instances):

            try:
                return self.expr._parse(instring, loc, doActions, callPreParse=False)
            except ParseBaseException as pbe:
                if not isinstance(self, Forward):
                    pbe.msg = self.errmsg
                raise

Then the matplotlib tests pass (even without the modification so that token does not get setName called). If this would be an acceptable upstream change, I will happily PR it, make sure our pins are just !=3.1.0 and call this resolved. Though I do understand if that is still not ideal on your end. It does not seem to cause any test failures that I can tell (I am having some trouble regarding the diagrams portions of pyparsing which are erroring the full test suite, so ran the non-diagram tests as individual files, had some diagram-related failures still, so may have missed something)

The "depth" argument for "explain" does not seem to affect the output of these messages (well, other than adding a trailing line of matplotlib._mathtext.Parser to the output) I could see that potentially giving the best of both worlds... though have not looked into it closely enough to know if that is easy or not (certainly would still require us to change the explain call).

RE your dev setup for matplotlib, happy to help if I can, but not sure what could be going wrong.
We have some docs for getting started that may be of help: https://matplotlib.org/stable/devel/development_setup.html

RE preventing surprises post-release in the future, we do have a test setup which pulls nightly builds of numpy/pandas (from here). We could probably modify that to get either at least --pre from pypi.org or get you in touch with the people who run those nightlies and get your CI set up to upload there, so that we (and other downstreams, potentially) can install from there (which would allow potentially even further advanced warning... I'm happy to PR it if you want to go that route and we get the green light from them, though you will probably need to handle generating an upload token/adding it as a github secret)

@ksunden
Copy link
Member

ksunden commented Jun 26, 2023

Also, we newly have github codespaces set up to get a dev setup with VSCode through the browser, while we are still working on some of the docs for using it, the config files are there, you may be able to get something good enough to run some quick tests on through that.

@ksunden
Copy link
Member

ksunden commented Jun 26, 2023

Perhaps the other heuristic which may be more in line with what you were thinking would happen by not calling setName would be to do:

            try:
                return self.expr._parse(instring, loc, doActions, callPreParse=False)
            except ParseBaseException as pbe:
                if self.customName is not None:
                    pbe.msg = self.errmsg
                raise

Which would prevent autogen names from showing (unless they are the leaf).

@ptmcg
Copy link

ptmcg commented Jun 26, 2023

Your code fragment also leads me to another thought, since the issue seems not to be so much about set_name (as I originally thought), but about the custom message in your raised exception. What if you raised your ParseFatalExceptions as a subclass of ParseFatalException, and I only do the substitution if the exception is of a type known to pyparsing (i.e., ParseBaseException, ParseException, ParseFatalExecption, or ParseSyntaxException). I've seen this exception style (don't raise library exceptions, raise your own subclass of them) recommended in other sources.

@ptmcg
Copy link

ptmcg commented Jun 26, 2023

Something like this:

            try:
                return self.expr._parse(instring, loc, doActions, callPreParse=False)
            except ParseBaseException as pbe:
                # only update the exception message if it is one of the built-in types (not any user-defined subclass)
                if type(pbe) in (ParseBaseException, ParseException, ParseFatalException, ParseSyntaxException):
                    pbe.msg = self.errmsg
                raise

@ptmcg
Copy link

ptmcg commented Jun 26, 2023

This would then be a documented mechanism for client apps to emit their own special error messages, by subclassing from one of the pyparsing exception types.

(Sorry, that took me a few tries to get right. Explicit testing for types is just not natural for me!)

@ksunden
Copy link
Member

ksunden commented Jun 26, 2023

Unfortunately that only solves 7/26 of our test cases. (The particular example of Unknown Symbol is one of the ones fixed)

All of the ones that fail in parsing out parameters to LaTeX functions (e.g. a/b in \frac{a}{b}) either because parameters are missing/errantly empty/otherwise invalid are not custom exceptions that are raised (and I think get raised before our handling functions, so not really a good place to intercept/reraise)

What about setting errmsg to None as a sentinel?

Then it becomes on our side:

diff --git a/lib/matplotlib/_mathtext.py b/lib/matplotlib/_mathtext.py
index 811702f1cd..730a82e45d 100644
--- a/lib/matplotlib/_mathtext.py
+++ b/lib/matplotlib/_mathtext.py
@@ -1819,6 +1819,8 @@ class Parser:
                 if not key.startswith('_'):
                     # Set names on everything -- very useful for debugging
                     val.setName(key)
+                    if isinstance(val, Forward):
+                        val.errmsg = None
                     # Set actions
                     if hasattr(self, key):
                         val.setParseAction(getattr(self, key))

And on pyparsing's side:

diff --git a/pyparsing/core.py b/pyparsing/core.py
index 8233f72..e83dcda 100644
--- a/pyparsing/core.py
+++ b/pyparsing/core.py
@@ -4547,7 +4547,8 @@ class ParseElementEnhance(ParserElement):
             try:
                 return self.expr._parse(instring, loc, doActions, callPreParse=False)
             except ParseBaseException as pbe:
-                pbe.msg = self.errmsg
+                if self.errmsg is not None:
+                    pbe.msg = self.errmsg
                 raise
         else:
             raise ParseException(instring, loc, "No expression defined", self)

(plus docs/type hints/tests/possibly other handling around self.errmsg possibly being None)

This gives us:

a) opt in to "pass through"
b) can still have useful names and/or raise their own exceptions when wanted
c) parse-time exceptions "pass through" as well

Also open to other sentinels (including a new variable, perhaps?), that was just one that occurred to me.

@ksunden
Copy link
Member

ksunden commented Jun 26, 2023

I guess that diff begs the question of "why set names if you just set errmsg to be empty?"

And I think I'm personally comeng back to arguing that perhaps Forward should be special cased somehow... such that:

  • A Forward instance with a name does just get Expected <name>, I think that is fine, actually, and probably what most people would expect...
  • A Forward instance with no name just reraises, and does not replace the error message
    • Because that seems to be what MatchFirst does, and Forward as an implementation detail for the parser... so I don't think a Forward really should enter into the error messages shown
    • Once it has a name, then it is its own concrete thing, that I can see why you would not want to go deeper for error messages (in most use cases...)

I think we would then just need to find the subset of our Forwards to avoid setting the name on (starting with token, and possibly even ending there...)

@ptmcg
Copy link

ptmcg commented Jun 26, 2023

Thanks for the extended thought process on this. When I get a chance I'll revisit that dev guide to see which steps I missed.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jul 20, 2023
Matplotlib currently forbids our version of pyparsing (3.1.0).
The issue only affects error messages.
This update hacks out the pyparsing < 3.1.0 requirement so that
matplotlib dependents can at least function, though some error
messages won't pass through properly.

matplotlib/matplotlib#26152
@jklymak
Copy link
Member

jklymak commented Jul 27, 2023

This is marked 3.7.2 RC. Is there a path forward, and if so, from whose side does this need to be fixed?

@ksunden
Copy link
Member

ksunden commented Jul 27, 2023

I have pyparsing/pyparsing#493 open. Once that is merged I'll adjust the pinning to either !=3.1.0 or allow it and xfail the error message tests with that particular version of pyparsing (it works just gives less informative error messages than the ones we test for)

@ptmcg
Copy link

ptmcg commented Jul 29, 2023

I was working with 3.1.0 today on a new parser for work, and the exception messages are really pretty bad! I'll definitely get on this over the weekend.

@ptmcg
Copy link

ptmcg commented Jul 29, 2023

Merged your PR (and a few others, and addressing some other reported issues). Thanks for your work on this.

Harmon758 added a commit to Harmon758/Harmonbot that referenced this issue Jul 30, 2023
@ptmcg
Copy link

ptmcg commented Jul 30, 2023

PR merged and released in pyparsing 3.1.1 - let me know if there are further issues

@sin-ack
Copy link

sin-ack commented Aug 24, 2023

It appears that this is still broken, unfortunately :( I ran tests with pyparsing 3.1.1 on Gentoo through Portage, and got the following failure log:
pytest-failures.log

@ksunden
Copy link
Member

ksunden commented Aug 24, 2023

@sin-ack, this did require some changes on our end, but is fixed on main/3.8.0rc1/3.7.x backport branch, just we haven't got 3.7.3 or 3.8.0 (final) out yet.

Ref #26432 (for the 3.7 backport)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. status: upstream fix required topic: text/mathtext
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants