New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ActionScript regex heuristic #5101
Update ActionScript regex heuristic #5101
Conversation
lib/linguist/heuristics.yml
Outdated
@@ -64,7 +64,7 @@ disambiguations: | |||
- extensions: ['.as'] | |||
rules: | |||
- language: ActionScript | |||
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\)(?:\s*:\s*[\w<>.]+)?))' | |||
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*\)(?:\s*:\s*[\w<>.]+)?))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*\)(?:\s*:\s*[\w<>.]+)?))' | |
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*\)?\)(?:\s*:\s*[\w<>.]+)?))' |
I think a head(, tail)*
form for the function arguments would be more resilient and less prone to backtracking:
(?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*
has a lot of starred sub-patterns and because the comma separator itself is optional,arg:type
could be matched as[arg:type]
, or[a, rg:type]
or[a, r, g:type]
, etc.(?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*\)?
changes this to match an argument, which now requires a:type
, and optionally matches additional comma-arg pairs.
I'm curious about the [\w<>.]+
pattern allowed to the right of the :
. Looking at the AS3 docs I'm not sure if <>
ever shows up there. The .
could be valid if we want to match ...args
, and if we did, we could add that as a trailing pattern: (?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)(,\s*\.{3}\s*\w+)?*\)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zerowidth Looks like the suggested regex is missing a closing paren and doesn't match functions with an arity of 0.
I'm not sure if
<>
ever shows up there
Heh, no idea 😅 . My main goal with this PR is to fix the acute perf problem and I didn't consider auditing the regex fidelity to AS3 syntax.
Co-authored-by: Nathan Witmer <nathan@zerowidth.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @BenEddy 🙇
Fixes a performance and correctness issue with the ActionScript heuristic regex. Specifically, the regex didn't account for trailing space between the final function argument type and the closing paren.
Checklist: