Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix errors in grammar syntax #416

Merged
merged 12 commits into from
Oct 25, 2022

Conversation

DaelonSuzuka
Copy link
Collaborator

@DaelonSuzuka DaelonSuzuka commented Sep 5, 2022

Per #415, this PR fixes some regexes that appeared to work in VSCode/onigumura but do not work in linguist using PCRE.

When this PR is done, I need to remember to post in github-linguist/linguist#3924 and let the linguist project know that we've fixed our grammars.

Additionally, @lildude has documented a process here that we can use to verify that these changes satisfy linguist's requirements. It's worth investigating if this process or a derivative can be integrated into our CI, which will at least provide syntax validation of our grammar files.

All three of the errors are invalid uses of positive lookbehind using a non-fixed length.

Problem areas:

I've already fixed the first problem, in the GDScript grammar, but unfortunately I've discovered that there was a regression sometime between 1.3.1 and now. The GDShader syntax doesn't even run, and there aren't any error messages about failing to load the grammar.

I'll have to find the cause of this regression before I can fix the two regex errors in GDShader.

@DaelonSuzuka
Copy link
Collaborator Author

DaelonSuzuka commented Sep 5, 2022

I have absolutely no idea how this happened, but there were a ton of regex special characters that were missing the second backslash(they need to be double escaped because reasons). I recopied the grammar from @AlfishSoftware's extension, fixed our handful of differences, and it's working again.

I replaced the invalid lookbehinds and it seems to work. Next, I'll run linguist's grammar compiler and see if we pass.

@Calinou Calinou added the bug label Sep 5, 2022
@DaelonSuzuka DaelonSuzuka marked this pull request as ready for review September 5, 2022 20:16
@DaelonSuzuka
Copy link
Collaborator Author

I followed the steps to set up linguist, ran script/bootstrap, manually replaced the grammars it download with the fixed versions from this PR, ran script/grammar-compiler add vendor/grammars/godot-vscode-plugin/, and I believe this is a positive result:

daelon@isotope:~/linguist$ sudo script/grammar-compiler add vendor/grammars/godot-vscode-plugin/
latest: Pulling from linguist/grammar-compiler
Digest: sha256:203514bbf04b83d3594245338fc08097301e54139ccec20ab05861302c23ab64
Status: Image is up to date for linguist/grammar-compiler:latest
docker.io/linguist/grammar-compiler:latest
OK! added grammar source 'vendor/grammars/godot-vscode-plugin/'
        new scope: source.gdshader
        new scope: source.gdresource
        new scope: source.gdscript
daelon@isotope:~/linguist$ 

@lildude does this mean that your compiler is happy now?

@Calinou I think this is done, but it might be safer to wait for the thumbs-up from the linguist people before merging.

@AlfishSoftware
Copy link

By the way, I didn't test it in VSCode (no time at the moment), so I don't know if the dot changes affect .length(). It might be the case where it picks .length as identifierField instead of length() as a identifierFunction as before, so you might want to double-check that length is colored correctly as a function, just in case.

@DaelonSuzuka
Copy link
Collaborator Author

I'll definitely double check that, thanks for the tip.

@DaelonSuzuka
Copy link
Collaborator Author

It might be the case where it picks .length as identifierField instead of length() as a identifierFunction

It was doing exactly this. Fantastic catch, as usual.

syntaxes/GDShader.tmLanguage.json Outdated Show resolved Hide resolved
syntaxes/GDShader.tmLanguage.json Outdated Show resolved Hide resolved
@AlfishSoftware
Copy link

AlfishSoftware commented Sep 6, 2022

Oh. I just realized review comments aren't published right away, it seems? Sorry about that, I guess you didn't see the first one (marked outdated) before.

@DaelonSuzuka
Copy link
Collaborator Author

Oh. I just realized review comments aren't published right away, it seems? Sorry about that, I guess you didn't see the first one (marked outdated) before.

I replied to every comment I can see, and I marked the first one as 'resolved' after I (tried to) fix the relevant bits.

If there are other comments I didn't interact with, then I definitely blame github's rather obtuse UI.

@DaelonSuzuka
Copy link
Collaborator Author

@Calinou This PR is also done. I rebased it to master myself just to confirm that #419 doesn't conflict(both PRs touched the GDScript grammar).

Comment on lines 289 to 290
"match": "\\b(?<=for\\s[\\w]*\\s)(in)\\b",
"match": "\\b(?:for\\s[\\w]*\\s)(in)\\b",
"name": "keyword.control.gdscript"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't test this, but I suspect that it will now consider the whole thing as a keyword.
E.g. for i in array then for i in is a keyword. It shouldn't mark the loop variable (i in this case) as a keyword.

The correct thing to do here is to join this with the for keyword so that one rule recognizes both keywords. You don't need to remove for from control_flow (for should still be a keyword even before in is typed) as long as this is before so it has higher priority; but change this rule to something along the lines of:

{
  "match": "\\b(for)\\s+([a-zA-Z_]\\w*)\\s+(in)\\b",
  "captures": {
    "1": { "name": "keyword.control.gdscript" },
    "2": { "name": "variable.name.gdscript" },
    "3": { "name": "keyword.control.gdscript" }
  }
},

I changed \w (which matches [a-zA-Z0-9_] including numbers) since I believe identifiers can't start with a number (correct me if I'm wrong). It's probably worth checking this on other places in the code too. I'm assuming identifiers allow ASCII only.
I also allowed more than one space in the syntax.
Btw, using a tool like https://regex101.com/ while you write it really helps with understanding nuances in regexes and checking that it will really do what is expected, and prevent against corner cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot believe that I missed this. You were correct, i was purple.

image

Your rule worked great. I removed the middle capture group and it's name block, because we don't actually use the variable.name tag for variables. Python doesn't tag variable names, they're just white, so that's the behavior I want to imitate.

I changed \w (which matches [a-zA-Z0-9_] including numbers) since I believe identifiers can't start with a number (correct me if I'm wrong). It's probably worth checking this on other places in the code too. I'm assuming identifiers allow ASCII only.

Many other places used the full form [a-zA-Z_][a-zA-Z_0-9]*, but I just replaced them with [a-zA-Z_]\\w*. I wish we could give a block like this a custom name so we don't have to copy it everywhere.

Btw, using a tool like regex101.com while you write it really helps with understanding nuances in regexes and checking that it will really do what is expected, and prevent against corner cases.

I started off doing this, but having to add all the extra escapes to move a rule over to vscode was too annoying so I stopped. I've thought about switching to yaml files, since the regex strings won't need escapes, but there was some problem with that too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth checking this on other places in the code too.

Many other places used the full form [a-zA-Z_][a-zA-Z_0-9]*, but I just replaced them with [a-zA-Z_]\\w*.

I was actually referring to places capturing identifiers with just \w+, which would allow names starting with numbers, e.g. 1whatever, 0e2f, 0xFACADE. I took a look and saw it in 2 places.
So, unless this is on purpose for some reason, this seems like it makes more sense:

Line 360:

"match": "(:)\\s*([a-zA-Z_]\\w*)?",

Line 375:

"match": "(setget)\\s+([a-zA-Z_]\\w*)?(?:[,]\\s*([a-zA-Z_]\\w*)?)?",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch, thanks. I found some other places where identifier names were being matched with [[:alpha:]_]\\w*, so I standardized all of them to [a-zA-Z_]\\w*.

@DaelonSuzuka
Copy link
Collaborator Author

I checked this latest commit with linguist again, and there's no warnings/errors.

I've decided that I'm definitely going to set up that grammar unit testing tool, but I don't have time right now.

@DaelonSuzuka
Copy link
Collaborator Author

I checked the latest changes with linguist and it looks like they're good to go.

@DaelonSuzuka DaelonSuzuka added this to the 1.4.0 milestone Oct 25, 2022
Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Calinou Calinou merged commit 5a67e4c into godotengine:master Oct 25, 2022
@AlfishSoftware
Copy link

AlfishSoftware commented Oct 26, 2022

Cool! Later I'll replicate those changes to gdshader (removing look-behinds) back in my own extension too, just in case.

@aaronfranke
Copy link
Member

Does this PR fix #415?

@DaelonSuzuka
Copy link
Collaborator Author

Does this PR fix #415?

It does, but there's been some delay in communicating with the Github LInguist project to make sure the original source of the reported issue is fixed. I intended to close all the related items together but then time got away from me.

Thanks for noticing my miss!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants