Initial commit for MCFunction Lexer + tests #2107
Here's a first round of review; I'll take another look later. The quality looks quite good overall.
There are a bunch of linting errors in the CI (https://github.com/pygments/pygments/runs/5868270378?check_suite_focus=true), be sure to check and fix them (run the tool locally with |
Thanks so much for these comments. Since you mentioned you wanted to take a closer look, I wasn't sure if I should implement your first-look suggestions or leave it static. Also, I had some issues with literals in the MCFunctionLexer (particularly floats and numbers). In the command syntax, we have a coordinate concept which allows you to specify different aspects of relative through preceding operators (the
tp @s 100.5 80 -100.5
tp @s 10 ~ -10
tp @s ~10 ~ ~-10
tp @s 10 ^0.5 -10
tp @s 10 ^-0.5 -10
For some reason, my literals and operators don't play nice with each other, but I'm not entirely sure why! (Edit: Oh, regexlint is nice, I ran
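For readers following along, here is a minimal sketch of how the coordinate forms above can be split into an operator and a numeric literal. It uses only the standard `re` module rather than Pygments, and the names `COORD`, `split_coordinate` and `coordinates` are hypothetical, not taken from the PR. One thing it illustrates is alternation order: trying the float form before the integer form keeps `100.5` from being read as the integer `100` followed by a stray `.5`. Whether that ordering was the cause of the problem described above is an assumption.

```python
import re

# Hypothetical pattern, not the PR's actual rule: an optional relative (~) or
# local (^) operator, followed by an optional signed float or integer.
# The float alternatives come first so "100.5" is not split at the dot.
COORD = re.compile(r"(?P<op>[~^])?(?P<num>[+-]?(?:\d+\.\d*|\.\d+|\d+))?")


def split_coordinate(text):
    """Return (operator, number) for one coordinate; either may be None."""
    match = COORD.fullmatch(text)
    if match is None or not (match.group("op") or match.group("num")):
        raise ValueError(f"not a coordinate: {text!r}")
    return match.group("op"), match.group("num")


def coordinates(command):
    """Split the arguments after 'tp @s' into (operator, number) pairs."""
    return [split_coordinate(arg) for arg in command.split()[2:]]
```

Calling `coordinates("tp @s ~10 ~ ~-10")` yields one pair per axis, with the bare `~` carrying no number at all.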
It'd be helpful for me (and likely for other maintainers if they want to review this as well) if you started doing the updates now.
Will take a look when I review this again.
Co-authored-by: Jean Abou-Samra <jean@abou-samra.fr>
I think I accidentally started a review, but I have to submit review to get those comments posted?
Edit: Working on some updates!
pygments/lexers/mcfunction.py
Outdated
    ],

    "list": [
        (r"(?<=\[)[BIL](?:\;)", Keyword.Type),
NBT can have special typed Int, Byte and Long arrays which start with the type:
fake_command {ints: [I; 1, 2, 3], bytes: [B; 0b, 1b, 2b], longs: [L; 1234L, 2134L, 3245L]}
I was ensuring that this type definition only appears at the start.
(accidentally clicked Start A Review, oops)
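A quick way to check that claim outside of Pygments is to run the same pattern over the example with the standard `re` module. This is only a sketch; `ARRAY_TYPE` and `array_type_prefixes` are names made up for illustration.

```python
import re

# Same pattern as the "list" rule quoted above: a B, I or L that sits
# directly after "[" and is directly followed by ";".
ARRAY_TYPE = re.compile(r"(?<=\[)[BIL](?:\;)")


def array_type_prefixes(snbt):
    """Return every typed-array prefix the pattern finds in the text."""
    return [match.group() for match in ARRAY_TYPE.finditer(snbt)]
```

On the example command this finds `I;`, `B;` and `L;`, while the `L` suffixes on the long values are skipped because they are not preceded by `[`. Note that the lookbehind requires the type letter to touch the bracket, so `[ I; 1]` with a space would not match.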
pygments/lexers/mcfunction.py
Outdated
    "compound": [
        # this handles the unquoted snbt keys
        # note: stringified keys still work
        (r"[A-z_]+", Name.Attribute),
Mhm, this is tricky. Theoretically, "yes", but it must be quoted (I think). You can even have stuff like . and : as a key here, but you must quote the key, which makes it almost JSON-esque. However, if something is quoted, it'll match as a literal anyway, so it should be fine here.
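Separately from the quoting question, the character class in the quoted rule is worth a second look. In ASCII, the range `A-z` also covers the six characters that sit between `Z` and `a`, so `[A-z_]+` accepts more than letters and underscores. The sketch below shows this with the standard `re` module; `LOOSE`, `STRICT`, `EXTRA` and `accepted` are illustrative names, and `[A-Za-z_]+` is offered as a likely intent, not as what the PR ended up using.

```python
import re

LOOSE = re.compile(r"[A-z_]+")      # the range A-z spans ASCII 65 to 122
STRICT = re.compile(r"[A-Za-z_]+")  # letters and underscore only

# The characters between "Z" (90) and "a" (97): [ \ ] ^ _ `
EXTRA = [chr(code) for code in range(ord("Z") + 1, ord("a"))]


def accepted(pattern, chars):
    """Return the characters that the pattern matches on their own."""
    return [char for char in chars if pattern.fullmatch(char)]
```

With the loose range, `accepted(LOOSE, EXTRA)` returns all six characters, including `[` and `]`, which matter in a format where brackets open and close lists. The strict class only lets `_` through.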
My latest push has many things broken still (working on the refactor). I've actually fixed my issues with the float and int literals (still need to get 2.2e10 working). I'm working on refactoring the NBT aspect of the lexer. When I use the In my
"nbt": [
    (r"\{", Punctuation, "nbt.compound"),
    (r"\[", Punctuation, "nbt.list"),
],
"nbt.compound": [
    (r".+", using(SNBTLexer, state=("compound",))),
    default("#pop"),
],
"nbt.list": [
    (r".+", using(SNBTLexer, state=("list",))),
    default("#pop"),
],
And then my
"root": [
    (r"\{", Punctuation.SNBT.Start, "compound"),  # The token has a special name for debugging
    (r"[^\{]+", Text.TEST),
],
With I think I'm not understanding how I think I need to add some extra rules in the state with
Alternatively, I can maybe combine selector and snbt parsing (also json since snbt works for that anyways). I didn't want to do this at first, but it might be better in the long-run since the structure for the selectors and similar structures (
Edit: Currently trying this idea..
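One possible source of trouble with the `.+` rules above, offered as a guess since the description of the actual symptom is cut off: `using()` hands the other lexer exactly the text that the rule matched, and `.+` is greedy and stops only at a newline. The sketch below uses the standard `re` module to compare what `.+` would capture after the opening brace with the span that is actually balanced. Both helper names are hypothetical.

```python
import re

REST_OF_LINE = re.compile(r".+")


def greedy_span(text):
    """Text a '.+' rule would match once the first '{' has been consumed."""
    start = text.index("{") + 1
    return REST_OF_LINE.match(text, start).group()


def balanced_span(text):
    """Text after the first '{' up to and including its matching '}'.

    Simplification: braces inside quoted strings are not treated specially.
    """
    start = text.index("{") + 1
    depth = 1
    for index in range(start, len(text)):
        if text[index] == "{":
            depth += 1
        elif text[index] == "}":
            depth -= 1
            if depth == 0:
                return text[start:index + 1]
    raise ValueError("unbalanced braces")
```

For `data merge entity @s {Tags: [a], Pos: {x: 1}} extra`, the greedy span runs past the closing brace and swallows ` extra`, and for a compound that spans several lines it stops at the first newline. A regular expression cannot count nested braces by itself, which is one argument for handling the nesting with lexer states, as the combined approach mentioned above does.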
Just pushed my latest. This has a generic property state which treats keys and values the same across every mapping interface (and lists). I think this is the best way to handle the various property maps across this strange language. I think I might use the subclassing method to handle the literals the same across the
Just a few more comments - also the mapfile seems to be out of date, so please run |
Co-authored-by: Georg Brandl <georg@python.org>
I'm getting closer to a completed lexer here (though it looks like some tests still failed?). At the moment, I'm mostly worried about my handling of some literals. I think the route I went with a generic property map was successful; hopefully I did fine with my usage of state (I tried to avoid state inflation by ensuring I popped as I went). I actually used a small debug script in the
Edit: Tests failed because I stupidly changed
This time, I made sure to pass all of the regex tests and ran |
Hi, I'm a newer open-source contributor, so hopefully I've dotted my i's and crossed my t's.
I'm committing 2 lexers for the mcfunction language (and the snbt data format) used in the popular game Minecraft. This lexer was loosely based on work done by @Arcensoth here, which is currently being used to syntax highlight mcfunction on GitHub (and most text editors). This lexer mostly strays away from the line-based lexing approach that's currently on GitHub, which allows commands and data to span multiple lines cleanly.
This lexer is designed for the Java Edition of the game (you can read more about the language on the wiki); however, it should also function for the Bedrock Edition.
I've included 1 full examplefiles test alongside some key snippets to help cover the entire lexer. Potentially, we can add more to help cover the lexer if needed.
I've tried my best to keep a approach to all of the regex, but I'm a little unsure about some of the detailed cases. In particular, I spun out the snbt data format (read more here), since it's quite useful to highlight this data on its own, but I ran into some weird handling when trying to reuse the same lexer inside the language. This is mostly due to the fact that JSON and SNBT are very similar yet distinct data formats that the game uses (sometimes inside of each other).
I also had some struggles with the "selector" state management; I feel like it could be simplified somewhat. Hopefully, someone else can take a look and provide suggestions.
I appreciate anyone taking a look at this PR! Since GitHub has mcfunction support, it's common to see it used in code blocks in documentation. I'm hoping this lexer will be useful for those using Sphinx alongside their Minecraft project.
Thanks!