Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFamilyLexer: refuse quotes between parentheses for function definiti… #2208

Merged
merged 1 commit into from
Aug 15, 2022

Conversation

jeanas
Copy link
Contributor

@jeanas jeanas commented Aug 14, 2022

…ons and declarations

Something like

id id2("){ ... }");

is no longer wrongly recognized as a "function"

id id2(") {
...
}
");

As the difference in the tests shows, this has the unfortunate side
effect that we no longer highlight something like

int f(param="default");

as a function declaration, but it is hard to imagine another way to
fix this (cf. “most vexing parse” problem).

Fixes #2207

…ons and declarations

Something like

id id2("){ ... }");

is no longer wrongly recognized as a "function"

id id2(") {
  ...
}
");

As the difference in the tests shows, this has the unfortunate side
effect that we no longer highlight something like

int f(param="default");

as a function declaration, but it is hard to imagine another way to
fix this (cf. “most vexing parse” problem).

Fixes pygments#2207
@birkenfeld
Copy link
Member

Yeah we can't pretend to be a C++ parser.

@amitkummer
Copy link
Contributor

Looks great!

@jeanas jeanas merged commit d6968f8 into pygments:master Aug 15, 2022
@jeanas jeanas deleted the c-decl branch August 15, 2022 09:45
@martinburchell
Copy link

martinburchell commented Aug 15, 2022

@jean-abou-samra thanks for your prompt efforts here.

This is an improvement for our documentation. There is one regression with this change:


class QCP_LIB_DECL QCPLayer : public QObject
{
  Q_OBJECT
  /// \cond INCLUDE_QPROPERTIES
  Q_PROPERTY(LayerMode mode READ mode WRITE setMode)
  /// \endcond
public:

  /*!
    Defines the different rendering modes of a layer. Depending on the mode, certain layers can be
    replotted individually, without the need to replot (possibly complex) layerables on other
    layers.

    \see setMode
  */
  enum LayerMode { lmLogical   ///< Layer is used only for rendering order, and shares paint buffer with all other adjacent logical layers.
                   ,lmBuffered ///< Layer has its own paint buffer and may be replotted individually (see \ref replot).
                 };
};

now fails. The full file is at https://github.com/ucam-department-of-psychiatry/camcops/blob/master/tablet_qt/qcustomplot/qcustomplot.h

I realise this could end up as a game of C++ lexer whac-a-mole and one failiure is better than seven. I appreciate it's never going to be perfect and I'm going to add the ability to skip problematic files in our docs build.

@Anteru Anteru added this to the 2.13.0 milestone Aug 15, 2022
@Anteru Anteru added the A-lexing area: changes to individual lexers label Aug 15, 2022
@amitkummer
Copy link
Contributor

TLDR: I don't have a fix, but I am sharing my findings on what happens wrong with the lexer when lexing @martinburchell's code. It is a little hard to spot.

In @martinburchell's snippet this entire part is somehow lexed as a function declaration:

Q_OBJECT
  /// \cond INCLUDE_QPROPERTIES
  Q_PROPERTY(LayerMode mode READ mode WRITE setMode)
  /// \endcond
public:

  /*!
    Defines the different rendering modes of a layer. Depending on the mode, certain layers can be
    replotted individually, without the need to replot (possibly complex) layerables on other
    layers.

    \see setMode
  */
  enum LayerMode { lmLogical   ///< Layer is used only for rendering order, and shares paint buffer with all other adjacent logical layers.
                   ,lmBuffered ///< Layer has its own paint buffer and may be replotted individually (see \ref replot).
                 };

More specifically:

  • Q_OBJECT is matched as the return type (which is fine).
  • Q_PROPERTY is matched as the function's name (also fine).
  • And this entire part is matched as the function's parameters (very much not fine):
(LayerMode mode READ mode WRITE setMode)
 /// \endcond
public:

 /*!
   Defines the different rendering modes of a layer. Depending on the mode, certain layers can be
   replotted individually, without the need to replot (possibly complex) layerables on other
   layers.

   \see setMode
 */
 enum LayerMode { lmLogical   ///< Layer is used only for rendering order, and shares paint buffer with all other adjacent logical layers.
                  ,lmBuffered ///< Layer has its own paint buffer and may be replotted individually (see \ref replot)

It appears that this line in the 'functions' regex is what captures the above code block.
Not sure why this happens, as r'(\([^;"\']*?\))' has a *? and not just a * so it's supposed to be non-greedy, and only capture (LayerMode mode READ mode WRITE setMode) at the start -- not the entire above code block.

@jeanas
Copy link
Contributor Author

jeanas commented Aug 15, 2022

Aaah, it also needs to exclude parentheses. Because *? means it's the first occurrence compatible with the rest of the pattern.

>>> import re
>>> re.match(r"a*?\w", "aaaaaab")
<re.Match object; span=(0, 1), match='a'>
>>> re.match(r"a*?b", "aaaaaab")
<re.Match object; span=(0, 7), match='aaaaaab'>

Like with default="string", there are likely cases where this will not recognize something that is a valid function definition/declaration, but I don't think they are too frequent, are they?

BTW, this doesn't look like a regression introduced by this PR to me, I already see an error token before.

@amitkummer
Copy link
Contributor

Oh! Thanks for explaining that *?. Makes total sense now. I will make a small pull request with a fix.

And yes, this is not a regression.

amitkummer added a commit to amitkummer/Pygments that referenced this pull request Aug 16, 2022
This fixes an issue where in code like this:

```
int foo(float bar) // hello() {}
```

The lexer would match `(float bar) // hello()`
as the parameters of the function `foo`, instead
of just `(float bar)`.

In addition, a similar test case to what was originally
reported in pygments#2208 is added.
jeanas pushed a commit that referenced this pull request Aug 18, 2022
This fixes an issue where in code like this:

```
int foo(float bar) // hello() {}
```

The lexer would match `(float bar) // hello()`
as the parameters of the function `foo`, instead
of just `(float bar)`.

In addition, a similar test case to what was originally
reported in #2208 is added.
@jeanas
Copy link
Contributor Author

jeanas commented Oct 11, 2022 via email

@martinburchell
Copy link

By “failure”, do you mean you’re getting error tokens? (I’m not at a computer right now.)

Yes, but I believe this is fixed in the next release isn't it?

File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/__init__.py", line 82, in highlight
return format(lex(code, lexer), formatter, outfile)
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/__init__.py", line 61, in format
formatter.format(tokens, realoutfile)
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/formatter.py", line 94, in format
return self.format_unencoded(tokensource, outfile)
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/formatters/html.py", line 988, in format_unencoded
for t, piece in source:
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/formatters/html.py", line 802, in _wrap_div
yield from inner
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/formatters/html.py", line 819, in _wrap_pre
yield from inner
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/formatters/html.py", line 843, in _format_lines
for ttype, value in tokensource:
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/filter.py", line 19, in _apply
yield from filter_.filter(lexer, stream)
File "/home/martinb/.virtualenvs/camcops/lib/python3.8/site-packages/pygments/filters/__init__.py", line 785, in filter
raise self.exception(value)
pygments.filters.ErrorToken: \

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lexing area: changes to individual lexers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

C++ Lexer ErrorToken: \ # '
5 participants