Fix #1416: add WebAssembly lexer #1564
For the empty case, use |
Thanks for the PR! A few points still need addressing...
pygments/lexers/webassembly.py
Outdated
(words(builtins), Name.Builtin, 'arguments'),
(r'i32.const', Name.Builtin),
(words(['i32', 'i64', 'f32', 'f64']), Keyword.Type),
(r'\$[A-Za-z0-9!#$%&\'*+-./:<=>?@\\^_`|~]+', Name.Variable), # yes, all of the are valid in identifiers
`+-.` is suspicious; better to put the dash at the end of the class.
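To illustrate the concern: inside a character class, an unescaped dash between two characters forms a range, so `+-.` covers every code point from `+` to `.`, which also lets a comma through. A minimal sketch using only the standard `re` module; the shortened classes below are illustrative, not the lexer's full identifier class:

```python
import re

# '+-.' is parsed as the range U+002B..U+002E, i.e. the four characters + , - .
range_class = re.compile(r'[*+-./]')
# With the dash last, it is a literal dash and no range is formed.
literal_class = re.compile(r'[*+./-]')

for ch in '+,-.':
    print(repr(ch),
          bool(range_class.fullmatch(ch)),
          bool(literal_class.fullmatch(ch)))
```

Only the comma differs between the two classes, which is why the mistake is easy to overlook.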
pygments/lexers/webassembly.py
Outdated
'root': [
(words(keywords, suffix=r'(?=[^a-z_\.])'), Keyword),
(words(builtins), Name.Builtin, 'arguments'),
(r'i32.const', Name.Builtin),
why is this one special?
I can't remember why I did this, can be removed.
pygments/lexers/webassembly.py
Outdated
i64.reinterpret_f64
f32.reinterpret_i32
f64.reinterpret_i64
""".split()
please change this to a tuple display.
I'm not sure what you mean by that. Should I do them like the keywords?
Yes, please, but as a tuple (with `(...)`). This compiles the whole list into a constant without runtime overhead.
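A rough way to check this claim, assuming CPython (constant folding is an implementation detail, not a language guarantee): compile both spellings and inspect the resulting code objects. The two-element instruction list is a shortened stand-in for the real one.

```python
# Tuple display of string literals: CPython can store it as a single constant.
tuple_code = compile("builtins = ('i32.add', 'i32.sub')", '<sketch>', 'exec')

# String plus .split(): only the string is a constant, and the list is
# built by a method call each time the module body runs.
split_code = compile("builtins = 'i32.add i32.sub'.split()", '<sketch>', 'exec')

print(tuple_code.co_consts)   # constants embedded in the code object
print(split_code.co_names)    # names looked up at run time
```

The first code object carries the finished tuple among its constants, while the second has to look up and call `split` at import time.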
""".split() | ||
class WatLexer(RegexLexer):
Needs a docstring with version information (see other lexers for examples).
pygments/lexers/webassembly.py
Outdated
(r'[+-]?\d.\d(_?\d)*[eE][+-]?\d(_?\d)*', Number.Float),
(r'[+-]?\d.\d(_?\d)*', Number.Float),
(r'[+-]?\d.[eE][+-]?\d(_?\d)*', Number.Float),
(r'[+-]?(inf|nan|nan:0x[\dA-Fa-f](_?[\dA-Fa-f])*)', Number.Float),
The `nan:xxx` case should come before the `nan` case, otherwise it will never get matched.
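The ordering issue can be reproduced with the standard `re` module alone, since alternation takes the first branch that succeeds rather than the longest. In this sketch the first pattern is copied from the diff; the reordered variant is one possible fix, not necessarily the one that was merged:

```python
import re

# Pattern from the diff: the plain 'nan' branch wins before 'nan:0x...' is tried.
nan_last = re.compile(r'[+-]?(inf|nan|nan:0x[\dA-Fa-f](_?[\dA-Fa-f])*)')
# Reordered: the longer, more specific branch comes first.
nan_first = re.compile(r'[+-]?(nan:0x[\dA-Fa-f](_?[\dA-Fa-f])*|inf|nan)')

text = 'nan:0x7f_ff'
print(nan_last.match(text).group())   # stops after 'nan'
print(nan_first.match(text).group())  # consumes the whole literal
```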
pygments/lexers/webassembly.py
Outdated
'nesting_comment': [
(r'\(;', Comment.Multiline, '#push'),
(r';\)', Comment.Multiline, '#pop'),
(r'(.|\n)', Comment.Multiline)
this can be sped up a lot using:
(r'[^;]+', Comment.Multiline),
(r';', Comment.Multiline),
I couldn't quite get this to work with nesting comments, but this does the trick:
'nesting_comment': [
(r'\(;', Comment.Multiline, '#push'),
(r';\)', Comment.Multiline, '#pop'),
(r'(\([^;]|;[^\)]|[^;\(])+', Comment.Multiline),
],
Ah right, I missed that comments nest. In that case it's
(r'[^;(]+', Comment.Multiline),
(r'[;(]', Comment.Multiline),
Thanks, I changed it to your suggestion.
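For readers unfamiliar with `#push` and `#pop`, the four rules agreed on above can be simulated with a depth counter and the standard `re` module. This is a sketch of the matching order only, not Pygments code; `comment_end` is a hypothetical helper name.

```python
import re

# The four rules of the state, tried in order as a RegexLexer would.
RULES = [
    (re.compile(r'\(;'), +1),    # '#push': one nesting level deeper
    (re.compile(r';\)'), -1),    # '#pop': one nesting level up
    (re.compile(r'[^;(]+'), 0),  # fast path: a run of ordinary characters
    (re.compile(r'[;(]'), 0),    # lone ';' or '(' that opens or closes nothing
]

def comment_end(text, start):
    """Return the index just past the block comment that opens at `start`."""
    assert text.startswith('(;', start)
    depth, pos = 1, start + 2
    while depth and pos < len(text):
        for pattern, delta in RULES:
            m = pattern.match(text, pos)
            if m:
                depth += delta
                pos = m.end()
                break
    return pos

src = '(; a (; b ;) c ;) (module)'
print(src[:comment_end(src, 0)])
```

Because the third rule consumes whole runs of characters, the loop takes far fewer steps than the original one-character `(.|\n)` rule would.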
pygments/lexers/webassembly.py
Outdated
],
'string': [
(r'\\[\dA-Fa-f][\dA-Fa-f]*', String.Escape),
(r'\t', String.Escape),
These are a literal tab, newline and CR. Did you mean `r'\\t'` etc.?
Thanks, I tend to get confused with all the escaping.
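The difference between the two spellings is easy to check with the standard `re` module. In this sketch `wat_source` is a made-up sample of WAT text in which the escape is typed as a backslash followed by `t`:

```python
import re

wat_source = r'"a\tb"'            # as typed in a .wat file: backslash, then 't'

literal_tab = re.compile(r'\t')   # regex escape: matches an actual tab character
escape_seq = re.compile(r'\\t')   # matches a backslash followed by the letter 't'

print(literal_tab.search(wat_source))         # no tab character in the source
print(escape_seq.search(wat_source).group())  # the two-character escape
```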
pygments/lexers/webassembly.py
Outdated
(r"\\'", String.Escape),
(r'\\u\{[\dA-Fa-f](_?[\dA-Fa-f])*\}', String.Escape),
(r'\\\\', String.Escape),
(chr(92) + chr(92), Error), # backslash (double for regex)
Just `r'\\'` would be fine, but it's better not to produce `Error` tokens explicitly. Instead, replace the final `.` pattern with:
(r'[^"\\]+', String.Double),
(this also speeds up the matching).
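The speed-up comes from consuming a whole run of ordinary characters per match instead of one character at a time. A simplified sketch with the standard `re` module: it ignores the escape rules that precede this pattern in the real state, and `count_tokens` is a hypothetical helper.

```python
import re

body = 'hello, wasm\\n"'   # string contents, an escape, then the closing quote

def count_tokens(pattern, text):
    """Apply `pattern` repeatedly from position 0; return (matches, end position)."""
    regex, pos, count = re.compile(pattern), 0, 0
    while (m := regex.match(text, pos)):
        pos, count = m.end(), count + 1
    return count, pos

print(count_tokens(r'.', body))        # one match per character
print(count_tokens(r'[^"\\]+', body))  # one match for the whole plain run
```

The negated class also stops at the backslash and at the quote, so those characters are left for the escape rules and the closing-quote rule.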
pygments/lexers/webassembly.py
Outdated
],
'arguments': [
(r'\s+', Text),
(r'(offset)(=)(\d(_?\d)*)', bygroups(Keyword, Operator, Number.Integer)),
This will also match the beginning of `offset=0x...`, so it should be moved below the next rule. The same applies to `align` below.
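The effect of rule order can be shown with the standard `re` module. The decimal pattern is taken from the diff; `hex_rule` is an assumed form of the "next rule", which is not quoted in this thread, and `first_match` is a hypothetical helper that mimics trying rules in order.

```python
import re

decimal_rule = re.compile(r'(offset)(=)(\d(_?\d)*)')
# Assumption: the following rule handles hexadecimal offsets roughly like this.
hex_rule = re.compile(r'(offset)(=)(0x[\dA-Fa-f](_?[\dA-Fa-f])*)')

def first_match(rules, text):
    """Return the text matched by the first rule that applies, like a lexer state."""
    for rule in rules:
        m = rule.match(text)
        if m:
            return m.group()
    return None

text = 'offset=0x10'
print(first_match([decimal_rule, hex_rule], text))  # decimal rule stops after the '0'
print(first_match([hex_rule, decimal_rule], text))  # hex rule takes the whole argument
```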
I think I addressed all comments.
@birkenfeld Could you please take another look at this one?
The tests need to be fixed up (output file for the example, and snippets instead of the .py file)... Otherwise looks good, just the "versionadded" is now 2.8
Let's make this 2.9 as the target. Given this needs the tests updated, and 2.8 coming out today, no need to rush this in.
pygments/lexers/webassembly.py
Outdated
and https://webassembly.github.io/spec/core/text/.

:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
- :copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
+ :copyright: Copyright 2006-2021 by the Pygments team, see AUTHORS.
tests/test_webassembly.py
Outdated
Basic WatLexer Test
~~~~~~~~~~~~~~~~~~~~

:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
- :copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
+ :copyright: Copyright 2006-2021 by the Pygments team, see AUTHORS.
Can you please merge the latest master into this and update for the new test system? That seems to be causing the failures, and in theory all you need to do is to move your test file into a folder and add a golden test output file.
Hi @jendrikw, I would love to have
I think this is what you had in mind. Could you please check if this can be merged?
Pretty much yes -- nearly there :) Your unit test file should also use the new mechanism, given you only test an input for a given set of tokens. Look at tests/snippets -- you can extract the input per test into a single file, and let pytest take care of the tokens. That's easier for us because we can regenerate those without having to touch Python code.
That should be it.
Merged, thanks!
This is the WebAssembly lexer I've been using and it worked quite well.
There is one test failure that I'm not sure how to resolve:
Fixes #1416.