Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ActionScript regex heuristic #5101

Merged
merged 2 commits into from Dec 11, 2020

Conversation

BenEddy
Copy link
Contributor

@BenEddy BenEddy commented Dec 2, 2020

Fixes a performance and correctness issue with the ActionScript heuristic regex. Specifically, the regex didn't account for trailing space between the final function argument type and the closing paren.

test_input = "function testFunc ( argOne : Boolean, argTwo : Boolean, argThree : Boolean )"
# Previous regex hangs 
test_input.match(/^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\)(?:\s*:\s*[\w<>.]+)?))
# New regex terminates
test_input.match(/^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*\)(?:\s*:\s*[\w<>.]+)?))/)
=> #<MatchData "function testFunc (  argOne : Boolean, argTwo : Boolean, argThree : Boolean )" 1:nil>

Checklist:

  • I am adding new or changing current functionality
    • I have added or updated the tests for the new or changed functionality.

@@ -64,7 +64,7 @@ disambiguations:
- extensions: ['.as']
rules:
- language: ActionScript
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\)(?:\s*:\s*[\w<>.]+)?))'
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*\)(?:\s*:\s*[\w<>.]+)?))'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s*\)(?:\s*:\s*[\w<>.]+)?))'
pattern: '^\s*(?:package(?:\s+[\w.]+)?\s+(?:{|$)|import\s+[\w.*]+\s*;|(?=.*?(?:intrinsic|extends))(intrinsic\s+)?class\s+[\w<>.]+(?:\s+extends\s+[\w<>.]+)?|(?:(?:public|protected|private|static)\s+)*(?:(?:var|const|local)\s+\w+\s*:\s*[\w<>.]+(?:\s*=.*)?\s*;|function\s+\w+\s*\((?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*\)?\)(?:\s*:\s*[\w<>.]+)?))'

I think a head(, tail)* form for the function arguments would be more resilient and less prone to backtracking:

  • (?:\s*\w+(?:\s*:\s*[\w<>.]+)?,?)*\s* has a lot of starred sub-patterns and because the comma separator itself is optional, arg:type could be matched as [arg:type], or [a, rg:type] or [a, r, g:type], etc.
  • (?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)*\)? changes this to match an argument, which now requires a :type, and optionally matches additional comma-arg pairs.

I'm curious about the [\w<>.]+ pattern allowed to the right of the :. Looking at the AS3 docs I'm not sure if <> ever shows up there. The . could be valid if we want to match ...args, and if we did, we could add that as a trailing pattern: (?:\s*\w+\s*:\s*[\w<>.]+\s*(,\s*\w+\s*:\s*[\w<>.]+\s*)(,\s*\.{3}\s*\w+)?*\)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zerowidth Looks like the suggested regex is missing a closing paren and doesn't match functions with an arity of 0.

I'm not sure if <> ever shows up there

Heh, no idea 😅 . My main goal with this PR is to fix the acute perf problem and I didn't consider auditing the regex fidelity to AS3 syntax.

Co-authored-by: Nathan Witmer <nathan@zerowidth.com>
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @BenEddy 🙇

@lildude lildude merged commit 726fe04 into github-linguist:master Dec 11, 2020
@lildude lildude mentioned this pull request Dec 11, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants