
Add a PEG (Parsing Expression Grammar) lexer #1336

Merged
merged 2 commits into from Jan 16, 2020


goodmami
Contributor

Parsing Expression Grammars (PEGs) have been around for about 15 years and are fairly popular these days for describing unambiguous grammars, such as those for programming languages and data formats.

This lexer (pygments.lexers.grammar_notation.PegLexer) builds on the original PEG syntax described by Bryan Ford (see here) and adds some common extensions seen in various PEG implementations: (optionally) using | instead of / for choices, = or : instead of <- for rule definitions, cut operators (^ or ~), and string modifiers such as r"a regex".
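
As a rough illustration of the syntax families listed above, here is a minimal standalone sketch using only the standard library. It is not the code in this PR: the token class names (DEFINE, CHOICE, CUT, and so on) and the patterns are assumptions made for the example, not Pygments token types or PegLexer rules.

```python
import re

# Illustrative token classes only; these names are NOT Pygments token types,
# and the patterns are not the ones used by PegLexer.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    # Optional one-letter modifier before a quoted string, e.g. r"a regex".
    ("STRING", r'[A-Za-z]?"(?:\\.|[^"\\])*"' + r"|[A-Za-z]?'(?:\\.|[^'\\])*'"),
    ("CLASS", r"\[(?:\\.|[^\]\\])*\]"),
    ("DEFINE", r"<-|=|:"),       # original "<-" plus the "=" and ":" variants
    ("CHOICE", r"[/|]"),         # original "/" plus the "|" variant
    ("CUT", r"[\^~]"),           # cut operators
    ("OPERATOR", r"[&!?*+.()]"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),             # catch-all so no character is silently dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs for a PEG snippet, skipping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in MASTER.finditer(text)
        if m.lastgroup != "SPACE"
    ]


classic = 'Sum <- Num "+" Num / Num'
extended = 'Sum = Num "+" ~ Num | r"[0-9]+"'
for source in (classic, extended):
    print([kind for kind, _ in tokenize(source)])
```

Both spellings map onto the same token classes, which is the sense in which the extensions are accepted "optionally" alongside the original notation.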

I've added tests/test_grammar_notation.py, but it only includes tests relevant to this PR; I did not go back and add tests for BnfLexer, etc.

Collaborator

@Anteru Anteru left a comment

Looks good to me -- can you please add the grammar to the language list as well (doc/languages.rst), then it should be all good from my end.

pygments/lexers/grammar_notation.py (resolved)
@goodmami
Contributor Author

goodmami commented Dec 12, 2019

Added the versionadded tag and added the language to doc/languages.rst, as requested.

There are two remaining things about the lexer that I thought about and I'll write them here for posterity:

  • At least one syntax variant (used in Parsimonious) uses ~"...", ~r"...", etc. for regular expressions, as an extension. This is not compatible with ~ as a cut operator (which is used in the new pegen library and in the older TatSu library, which is not exactly PEG), but the impact is minimal: such expressions are just highlighted differently, and the lexer does not break.
  • I did not account for extensions that put semantic actions between { and } at the end of the rule (as in the previously mentioned pegen, as well as Rats! and some other parsers). I chose not to do this because I didn't see such support in the other grammar notation lexers, but I can add it if requested.
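
To make the first point concrete, the sketch below lexes the same Parsimonious-style rule under two hypothetical policies: one where ~ is always a cut operator, and one where ~ directly before a string starts a regex literal. The policy names, token class names, and patterns are assumptions for illustration and are not taken from PegLexer.

```python
import re

# A quoted string with an optional one-letter modifier, e.g. r"[0-9]+".
STRING = r'[A-Za-z]?"(?:\\.|[^"\\])*"'
REST = r"(?P<NAME>\w+)|(?P<SPACE>\s+)|(?P<OTHER>.)"

# Policy A: "~" is always a cut operator.
CUT_POLICY = re.compile(rf"(?P<CUT>~)|(?P<STRING>{STRING})|{REST}")
# Policy B: "~" directly before a string starts a Parsimonious-style regex.
REGEX_POLICY = re.compile(
    rf"(?P<REGEX>~{STRING})|(?P<CUT>~)|(?P<STRING>{STRING})|{REST}"
)


def lex(policy, text):
    """Return (kind, lexeme) pairs covering every character of text."""
    return [(m.lastgroup, m.group()) for m in policy.finditer(text)]


rule = 'number = ~r"[0-9]+"'
for policy in (CUT_POLICY, REGEX_POLICY):
    print(lex(policy, rule))
```

Under either policy every character of the rule ends up in some token; only the class assigned to ~ and the following string differs, which is why the mismatch affects highlighting but not whether the input can be lexed.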

@goodmami
Contributor Author

@Anteru do you wish me to resolve the conflicts in pygments/lexers/_mapping.py or does the person merging handle this? And is there anything else you're waiting on me for?

@Anteru
Collaborator

Anteru commented Jan 16, 2020

Hi, that would be appreciated, but I already regenerate the _mapping.py file quite often myself. I just haven't gotten around to merging this yet; I will try to do it this week.

@Anteru Anteru self-assigned this Jan 16, 2020
@goodmami
Contributor Author

OK, I've rebased onto the current master branch. Let me know if you need anything else.

@Anteru Anteru merged commit 299cfe0 into pygments:master Jan 16, 2020
@Anteru
Collaborator

Anteru commented Jan 16, 2020

Merged, thanks for the contribution!
