
Add lexer for LLVM's MIR format #1361

Merged: 1 commit, Jan 24, 2020


dsandersllvm
Contributor

MIR is a human-readable serialization format used to represent LLVM's
machine-specific intermediate representation. It lets LLVM's developers see
the state of the compilation process at various points, as well as test
individual pieces of the compiler. Our documentation for the format can be
found at https://llvm.org/docs/MIRLangRef.html.

Adding a lexer for this format will allow the LLVM documentation to contain
syntax-highlighted examples of LLVM-MIR. This change includes two lexers:
'llvm-mir' lexes the overall document format and delegates to 'llvm' and
'llvm-mir-body' as appropriate; 'llvm-mir-body' lexes the contents of the
'body:' attribute and can be used directly to highlight code examples without
including the document boilerplate.
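To make the split between the two lexers concrete, here is a toy Python sketch of the delegation, assuming a simplified document in which the only delegated region is the indented block that follows `body: |`. The function names, regexes, and token-type strings are hypothetical stand-ins, not the lexers in this PR; the real 'llvm-mir' lexer also hands embedded LLVM IR to the 'llvm' lexer, which the sketch leaves out.

```python
import re
from typing import Callable, Iterator, Tuple

Token = Tuple[str, str]  # (token type, text); types are plain strings here
Lexer = Callable[[str], Iterator[Token]]

# Hypothetical stand-in for 'llvm-mir-body': registers, opcodes, and the rest.
BODY_TOKEN = re.compile(
    r"(?P<reg>[%$]\w+)|(?P<opcode>[A-Z][A-Z0-9_]*)|(?P<ws>\s+)|(?P<other>.)"
)
BODY_TYPES = {"reg": "Name.Variable", "opcode": "Keyword", "ws": "Text", "other": "Text"}


def toy_body_lexer(text: str) -> Iterator[Token]:
    # Every character matches one alternative, so no input is dropped.
    for m in BODY_TOKEN.finditer(text):
        yield (BODY_TYPES[m.lastgroup], m.group())


# The header line "body: |" followed by its indented (or blank) lines.
BODY_BLOCK = re.compile(r"^(body:[ \t]*\|\n)((?:[ \t]+.*\n?|\n)*)", re.M)


def toy_mir_lexer(text: str, body_lexer: Lexer = toy_body_lexer) -> Iterator[Token]:
    """Emit document boilerplate directly; delegate each body block to body_lexer."""
    pos = 0
    for m in BODY_BLOCK.finditer(text):
        if m.start() > pos:
            yield ("Text", text[pos:m.start()])
        yield ("Keyword", m.group(1))
        yield from body_lexer(m.group(2))
        pos = m.end()
    if pos < len(text):
        yield ("Text", text[pos:])


doc = (
    "name: foo\n"
    "body: |\n"
    "  bb.0:\n"
    "    %0:gpr = COPY $r0\n"
    "...\n"
)
tokens = list(toy_mir_lexer(doc))
```

Because the outer lexer only slices the text and forwards it, the body lexer stays usable on its own; with Pygments installed, the real one can be selected by its alias via `pygments.lexers.get_lexer_by_name('llvm-mir-body')`.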

Since the 'llvm-mir' lexer delegates to the 'llvm' lexer in places, this
change also adds the 'immarg' and 'willreturn' keywords, which were missing,
to the 'llvm' lexer.

Collaborator

@Anteru Anteru left a comment


This also needs a small example file so it gets tested automatically, a single function in MIR will do it. Please add that to pygments/tests/examplefiles/. Otherwise, this looks good!

pygments/lexers/asm.py: 2 review comments (resolved)
@dsandersllvm
Contributor Author

> This also needs a small example file so it gets tested automatically, a single function in MIR will do it. Please add that to pygments/tests/examplefiles/. Otherwise, this looks good!

I've created an artificial example, since taking a real file from LLVM would raise licensing issues that we'd need to figure out. This example conforms to the format but isn't strictly a valid file: it breaks some constraints that LLVM would normally enforce, and it isn't written for any particular LLVM target. I think it will suffice as a test, though.

@dsandersllvm
Contributor Author

Hmm, the test environment seems to be a bit fussier than pygmentize. I've reproduced those test failures locally and will investigate.
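pygmentize generally still renders output that contains Error tokens, which is one way the test environment can be stricter. Below is a rough sketch of the two properties this thread refers to (no error tokens, and a lossless round trip), written over plain (type, text) pairs so it runs without Pygments. The helper names are hypothetical and this is not Pygments' own test code.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (token type, text); types are plain strings here


def error_tokens(tokens: Iterable[Token]) -> List[Token]:
    """Tokens whose type is 'Error' or a subtype such as 'Error.Something'."""
    return [t for t in tokens if t[0] == "Error" or t[0].startswith("Error.")]


def round_trips(text: str, tokens: Iterable[Token]) -> bool:
    """True if concatenating the token values reproduces the input exactly."""
    return "".join(value for _, value in tokens) == text


def check_example(text: str, tokens: Iterable[Token]) -> None:
    """Raise AssertionError if the token stream fails either check."""
    tokens = list(tokens)
    errors = error_tokens(tokens)
    if errors:
        raise AssertionError(f"lexer produced error tokens: {errors!r}")
    if not round_trips(text, tokens):
        raise AssertionError("token values do not reproduce the input")


# Hand-written token streams standing in for real lexer output on "-1\n".
text = "-1\n"
clean = [("Operator", "-"), ("Number", "1"), ("Text", "\n")]
with_error = [("Error", "-"), ("Number", "1"), ("Text", "\n")]
doubled_newline = clean + [("Text", "\n")]

check_example(text, clean)  # passes silently
```

An Error token still carries its character, so `with_error` round-trips and only the first check catches it; that is why the two checks are separate. With Pygments itself, the pairs would come from `lexer.get_tokens(text)` and the error check would use `ttype in pygments.token.Error`; options such as `stripnl` and `ensurenl` normalize leading and trailing newlines, so the comparison text may need the same normalization.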

@dsandersllvm
Contributor Author

> Hmm, the test environment seems to be a bit fussier than pygmentize. I've reproduced those test failures locally and I'll investigate

The error token was easy enough to fix by adding `(r'[-+]', Operator),` to the 'mir' state. However, the lexer then failed the round-trip test. It turned out I'd used `bygroups(using(...))` in a way that made the same newline character generate multiple tokens, so the round-trip test failed on the doubled newlines. I've modified both patterns that use `bygroups(using(...))` to fix this.
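Both failures can be reproduced with a toy, single-state regex lexer loosely modelled on Pygments' RegexLexer (which, roughly, emits an Error token for a character that no rule in the current state matches) plus a simplified analogue of `bygroups`. The rules, the sample instruction, and the helper names are hypothetical, and the nested capture group is only one plausible way for a newline to be emitted twice; the thread doesn't show the original patterns.

```python
import re
from typing import Callable, Iterator, List, Sequence, Tuple, Union

Token = Tuple[str, str]  # (token type, text)
# A rule action is a token type, or a callback turning a match into tokens.
Action = Union[str, Callable[..., Iterator[Token]]]


def lex(text: str, rules: Sequence[Tuple[str, Action]]) -> List[Token]:
    """Try each rule in order at the current position; if none matches,
    emit one character as an 'Error' token and carry on."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]
    pos, out = 0, []
    while pos < len(text):
        for regex, action in compiled:
            m = regex.match(text, pos)
            if m and m.end() > pos:
                out.extend(action(m) if callable(action) else [(action, m.group())])
                pos = m.end()
                break
        else:
            out.append(("Error", text[pos]))
            pos += 1
    return out


def by_groups(*actions):
    """Simplified analogue of bygroups: emit each non-empty capture group in
    order. A str action yields one token; a callable (standing in for
    using(...)) re-lexes the group's text. Groups are handled independently,
    so text inside nested groups is emitted more than once."""
    def callback(m):
        for i, action in enumerate(actions, start=1):
            data = m.group(i)
            if not data:
                continue
            if callable(action):
                yield from action(data)
            else:
                yield (action, data)
    return callback


RULES = [
    (r"%\w+", "Name.Variable"),
    (r"[A-Z][A-Z0-9_]*", "Keyword"),
    (r"\d+", "Number"),
    (r"[ \t]+|\n", "Text"),
    (r"[=,]", "Punctuation"),
]
FIXED_RULES = RULES + [(r"[-+]", "Operator")]  # the extra rule from the fix

line = "%1 = ADDI %0, -4\n"
before = lex(line, RULES)        # '-' matches no rule -> ("Error", "-")
after = lex(line, FIXED_RULES)   # '-' is now an Operator token


def body(s: str) -> List[Token]:
    return lex(s, FIXED_RULES)


src = "    " + line
# Broken: group 3 (the newline) sits inside group 2, so it is emitted twice.
broken = lex(src, [(r"([ \t]+)(\S.*(\n))", by_groups("Text", body, "Text"))])
# Fixed: sibling groups that partition the match emit each character once.
fixed = lex(src, [(r"([ \t]+)(\S.*)(\n)", by_groups("Text", body, "Text"))])
```

The Error token still carries the '-' character, so `before` round-trips fine; it's the doubled newline in `broken` that a round-trip check catches, while `fixed` passes both checks.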

@Anteru
Collaborator

Anteru commented Jan 22, 2020

Thanks for adding the tests -- and the fixes. I'll give it a spin and then merge it.

@dsandersllvm
Contributor Author

Thanks

@Anteru Anteru added this to the 2.6 milestone Jan 23, 2020
@Anteru Anteru merged commit 59396bb into pygments:master Jan 24, 2020
@Anteru
Collaborator

Anteru commented Jan 25, 2020

Merged, thanks for the contribution!
