Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #1416: add WebAssembly lexer #1564

Merged
merged 10 commits into from
Apr 4, 2021

Conversation

jendrikw
Copy link
Contributor

@jendrikw jendrikw commented Oct 4, 2020

This is the WebAssembly lexer I've been using and it worked quite well.

There is one test failure that I'm not sure how to resolve:

/path/pygments/.tox/lint/lib/python3.8/site-packages/pygments/lexers/webassembly.py:252: (WatLexer:arguments:pat#6) E999: Matches empty string
              (r'', Text, '#pop'),
                           ^ here

Fixes #1416.

@jendrikw jendrikw changed the title add WebAssembly lexer Fix #1416: add WebAssembly lexer Oct 4, 2020
@birkenfeld
Copy link
Member

For the empty case, use default('#pop').

Copy link
Member

@birkenfeld birkenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A few points still need addressing...

(words(builtins), Name.Builtin, 'arguments'),
(r'i32.const', Name.Builtin),
(words(['i32', 'i64', 'f32', 'f64']), Keyword.Type),
(r'\$[A-Za-z0-9!#$%&\'*+-./:<=>?@\\^_`|~]+', Name.Variable), # yes, all of the are valid in identifiers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+-. is suspicious, better put the dash at the end of the class.

'root': [
(words(keywords, suffix=r'(?=[^a-z_\.])'), Keyword),
(words(builtins), Name.Builtin, 'arguments'),
(r'i32.const', Name.Builtin),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this one special?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why I did this, can be removed.

i64.reinterpret_f64
f32.reinterpret_i32
f64.reinterpret_i64
""".split()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change this to a tuple display.

Copy link
Contributor Author

@jendrikw jendrikw Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by that. Should I do them like the keywords?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please, but as a tuple (with (...)). This compiles the whole list into a constant without runtime overhead.

""".split()


class WatLexer(RegexLexer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a docstring with version information (see other lexers for examples).

(r'[+-]?\d.\d(_?\d)*[eE][+-]?\d(_?\d)*', Number.Float),
(r'[+-]?\d.\d(_?\d)*', Number.Float),
(r'[+-]?\d.[eE][+-]?\d(_?\d)*', Number.Float),
(r'[+-]?(inf|nan|nan:0x[\dA-Fa-f](_?[\dA-Fa-f])*)', Number.Float),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nan:xxx case should come before the nan case, otherwise it will not get matched.

'nesting_comment': [
(r'\(;', Comment.Multiline, '#push'),
(r';\)', Comment.Multiline, '#pop'),
(r'(.|\n)', Comment.Multiline)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be sped up a lot using:

(r'[^;]+', Comment.Multiline),
(r';', Comment.Multiline),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't quite get this to work with nesting comments, but this does the trick:

        'nesting_comment': [
            (r'\(;', Comment.Multiline, '#push'),
            (r';\)', Comment.Multiline, '#pop'),
            (r'(\([^;]|;[^\)]|[^;\(])+', Comment.Multiline),
        ],

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I missed that comments nest. In that case it's

(r'[^;(]+', Comment.Multiline),
(r'[;(]', Comment.Multiline),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I changed it to your suggestion.

],
'string': [
(r'\\[\dA-Fa-f][\dA-Fa-f]*', String.Escape),
(r'\t', String.Escape),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are literal tab, newline and CR. did you mean r'\\t' etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tend to get confused with all the escaping.

(r"\\'", String.Escape),
(r'\\u\{[\dA-Fa-f](_?[\dA-Fa-f])*\}', String.Escape),
(r'\\\\', String.Escape),
(chr(92) + chr(92), Error), # backslash (double for regex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just r'\\' would be fine, but it's better not to produce Error tokens explicitly. Just replace the final . pattern by:

(r'[^"\\]+', String.Double),

(also speeds up the matching).

],
'arguments': [
(r'\s+', Text),
(r'(offset)(=)(\d(_?\d)*)', bygroups(Keyword, Operator, Number.Integer)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will match offset=0x..., so should get moved below the next rule. Similar with align below.

@jendrikw
Copy link
Contributor Author

jendrikw commented Oct 4, 2020

I think i addressed all comments.

@Anteru
Copy link
Collaborator

Anteru commented Feb 7, 2021

@birkenfeld Could you please take another look at this one?

@Anteru Anteru added this to the 2.8 milestone Feb 7, 2021
@birkenfeld
Copy link
Member

The tests need to be fixed up (output file for the example, and snippets instead of the .py file)... Otherwise looks good, just the "versionadded" is now 2.8

@Anteru Anteru removed this from the 2.8 milestone Feb 14, 2021
@Anteru
Copy link
Collaborator

Anteru commented Feb 14, 2021

Let's make this 2.9 as the target. Given this needs the tests updated, and 2.8 coming out today, no need to rush this in.

@Anteru Anteru added this to the 2.9 milestone Feb 14, 2021
@Anteru Anteru added the update needed Waiting for an update from the PR/issue creator label Feb 14, 2021
and https://webassembly.github.io/spec/core/text/.


:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
:copyright: Copyright 2006-2021 by the Pygments team, see AUTHORS.

Basic WatLexer Test
~~~~~~~~~~~~~~~~~~~~

:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:copyright: Copyright 2006-2020 by the Pygments team, see AUTHORS.
:copyright: Copyright 2006-2021 by the Pygments team, see AUTHORS.

@Anteru
Copy link
Collaborator

Anteru commented Feb 27, 2021

Can you please merge the latest master into this and update for the new test system? That seems to be causing the failures, and in theory all you need to do is to move your test file into a folder and add a golden test output file.

@PhilippGackstatter
Copy link

Hi @jendrikw, I would love to have wat highlighting and was wondering if or when you might find the time to finish this PR? Thanks a lot!

@jendrikw
Copy link
Contributor Author

jendrikw commented Apr 3, 2021

I thinks this is what you hand in mind. Could you please check if this can be merged?

@Anteru
Copy link
Collaborator

Anteru commented Apr 3, 2021

Pretty much yes -- nearly there :) Your unit test file should also use the new mechanism, given you only test an input for a given set of tokens. Look at tests/snippets -- you can extract the input per test into a single file, and let pytest take care of the tokens. That's easier for us because we can regenerate those without having to touch Python code.

@jendrikw
Copy link
Contributor Author

jendrikw commented Apr 3, 2021

That should be it.

@Anteru Anteru merged commit 06f2eba into pygments:master Apr 4, 2021
@Anteru
Copy link
Collaborator

Anteru commented Apr 4, 2021

Merged, thanks!

@Anteru Anteru added changelog-update Items which need to get mentioned in the changelog and removed update needed Waiting for an update from the PR/issue creator changelog-update Items which need to get mentioned in the changelog labels Apr 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for WebAssembly
5 participants