Add a lexer for Lean 4 #2618

eric-wieser · 2023-12-30T18:38:21Z

This is taken from https://github.com/leanprover/lean4/blob/d92948bc20b12f53542814c79469711ceff19fbf/doc/latex/lean4.py, with subsequent commits showing the changes made on top of that file.

For that reason, I'd prefer that the history be preserved, though I'd be happy to rewrite it if you want fewer commits or different message conventions.

The test file I've added is the Lean 4 translation of the test file that was being used for Lean 3.

Since Lean 3 and Lean 4 files share a file extension, this adds `analyse_text` methods that discriminate based on `import Foo` (capitalized module name, Lean 4) vs `import foo` (lowercase module name, Lean 3); this is the same heuristic used by GitHub Linguist.
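A minimal sketch of that heuristic, written as standalone functions rather than as the lexer methods themselves. The function names, the exact regexes and the 0.1 score are assumptions for illustration; the only thing taken from Pygments is that `analyse_text` returns a float between 0 and 1, which is used when guessing a lexer.

```python
import re

# Hypothetical stand-ins for the two lexers' analyse_text methods.
# Assumption: a line starting with `import` followed by a capital letter
# suggests Lean 4, and one followed by a lowercase letter suggests Lean 3.
LEAN4_IMPORT = re.compile(r'^import [A-Z]', re.MULTILINE)
LEAN3_IMPORT = re.compile(r'^import [a-z]', re.MULTILINE)


def lean4_score(text: str) -> float:
    """Return a small positive score if the text looks like Lean 4."""
    return 0.1 if LEAN4_IMPORT.search(text) else 0.0


def lean3_score(text: str) -> float:
    """Return a small positive score if the text looks like Lean 3."""
    return 0.1 if LEAN3_IMPORT.search(text) else 0.0


def guess_lean_version(text: str) -> str:
    """Pick whichever hypothetical score is higher; ties mean no signal."""
    s4, s3 = lean4_score(text), lean3_score(text)
    if s4 > s3:
        return "lean4"
    if s3 > s4:
        return "lean3"
    return "unknown"
```

The score is kept deliberately low in this sketch because a single `import` line is weak evidence; a file with no imports at all gives no signal either way.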

cc @Kha

Julian

Nice! This looks pretty good. I'm sure I've missed a thing or two, but I double-checked the keyword list and gave the token list a cursory look, and both look good!

cmarqu · 2024-01-09T07:37:28Z

Maybe also add a line to the CHANGES file.

Julian

Nice. This LGTM at this point I think (with or without the operator tweaking)!

jeanas · 2024-01-13T20:51:12Z

pygments/lexers/lean.py

+            (r'\d+', Number.Integer),
+            (r'"', String.Double, 'string'),
+            (r'[~?][a-z][\w\']*:', Name.Variable),
+            (r'\S', Name.Builtin.Pseudo),


What is this rule for?

It handles arbitrary notation defined by the user; without it, all such notation would be marked as invalid.
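To make the effect of such a catch-all concrete, here is a small standalone sketch of an ordered-rule tokenizer. It is not the Pygments `RegexLexer` machinery, and the rule list and token names (plain strings standing in for token types) are assumptions for illustration; the point is only that a final `\S` rule absorbs any character the earlier rules do not recognize.

```python
import re

# Ordered (pattern, token name) rules; the first rule that matches wins.
# The token names are plain strings standing in for Pygments token types.
RULES = [
    (re.compile(r'\s+'), 'Whitespace'),
    (re.compile(r'\d+'), 'Number.Integer'),
    (re.compile(r'[A-Za-z_]\w*'), 'Name'),
    # Catch-all: any single non-whitespace character not matched above,
    # e.g. a symbol introduced by user-defined notation.
    (re.compile(r'\S'), 'Name.Builtin.Pseudo'),
]


def tokenize(text):
    """Split text into (token name, value) pairs using RULES in order."""
    pos = 0
    tokens = []
    while pos < len(text):
        for pattern, name in RULES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # Not reached while the catch-all rule is present; without it,
            # unknown characters would end up here as errors.
            tokens.append(('Error', text[pos]))
            pos += 1
    return tokens
```

With the last rule removed from `RULES`, a symbol such as `⊗` would fall through to the `Error` branch instead.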

jeanas · 2024-01-13T20:57:17Z

This looks generally good; only a couple of minor questions above.

jeanas

Good. Thank you!

eric-wieser · 2024-01-14T14:22:18Z

Thanks! I've made a small follow-up in #2626; it would be great if they could both land in the coming release.

jeanas · 2024-01-14T14:34:38Z

Sure, this will all be in 2.18.

An improved `lean4` lexer is now part of pygments. This depends on pygments/pygments#2618 (now merged) and [a subsequent release](https://github.com/pygments/pygments/milestone/23).

eric-wieser mentioned this pull request Dec 30, 2023

doc: upstream the Lean4 pygments lexer leanprover/lean4#3125

Merged

eric-wieser force-pushed the new-lean4-syntax branch 2 times, most recently from e0e4be3 to a2266d1 on December 30, 2023 19:44

Kha and others added 4 commits December 30, 2023 22:53

Add lexer from https://github.com/leanprover/lean4/blob/d92948bc20b12…

e210489

…f53542814c79469711ceff19fbf/doc/latex/lean4.py

This is based off an old version of the Lean 3 lexer.

Remove python2-isms

a1a2688

There is no need for `re.UNICODE` any more, or `u` prefixes.

Update to reflect Lexer changes

cca04f9

The version and url information no longer lives in the docstring.

rebuild the list of lexers

7ddd61c

eric-wieser force-pushed the new-lean4-syntax branch from 58b75ee to 843b426 on December 30, 2023 22:54

eric-wieser added 5 commits December 30, 2023 22:55

Add the analyze_text method

972fbf6

Do not consider unrecognized tokens as errors

a8976e2

This allows a lean4 version of the lean3 test to be committed

Various fixes copied from the lean3 lexer

1f9d198

* Use `Whitespace` not `Text` for Whitespace
* Use a single token for an entire multiline comment, not one per character
* Fix brace-matching for `@[attr]` syntax
* Add docstring highlighting

update the name regex too

9fb3719

Add support for escaping EOL in string literals

83209aa

eric-wieser force-pushed the new-lean4-syntax branch from 843b426 to 83209aa on December 30, 2023 22:55

Update the lean4 name regex to include ! and ?

a7ee6a0

Julian reviewed Jan 8, 2024

eric-wieser added 4 commits January 9, 2024 14:30

add to CHANGES

ffebe66

remove syntax that doesn't exist

7832a96

Add missing keywords

43aa796

Add float parser

57f88c3

This also corrects the integer parser to not include field projections
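To illustrate the distinction, here is a rough sketch using plain `re`. The patterns and the `classify_number` helper are assumptions chosen for the example and are not copied from the lexer. The idea is that in `p.1` the `1` is a field projection, so the integer rule should not claim it, while `1.5` should lex as a single float.

```python
import re

# Hypothetical patterns, for illustration only.
# A float: digits, a dot, digits, optional exponent; must not follow a
# dot or a word character.
FLOAT = re.compile(r'(?<![\w.])\d+\.\d+(?:[eE][+-]?\d+)?')
# A field projection index: digits directly after a dot, as in `p.1`.
PROJECTION = re.compile(r'(?<=\.)\d+')
# A plain integer: digits not preceded by a dot or a word character.
INTEGER = re.compile(r'(?<![\w.])\d+')


def classify_number(text, pos):
    """Classify the numeric token starting at text[pos], if any."""
    candidates = (
        ('float', FLOAT),
        ('projection', PROJECTION),
        ('integer', INTEGER),
    )
    for name, pattern in candidates:
        # Passing pos (rather than slicing) lets the lookbehinds see the
        # character before the match.
        match = pattern.match(text, pos)
        if match:
            return name, match.group()
    return None
```

Trying the float pattern before the integer one matters here: otherwise `1.5` would be split into an integer followed by a projection-like `.5`.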

eric-wieser force-pushed the new-lean4-syntax branch from 7b89197 to 57f88c3 on January 9, 2024 14:47

Julian approved these changes Jan 9, 2024

Update list of operators to exclude mathlib notation and include \mapsto

0e4e71c