The parser does not like long boolean expressions #3634
What is the target language? C#? Note that technically ALL( |
Yes, we are a C# project. We are parsing VBA code. |
@parrt correct. Are we doomed? Our grammar is here if you want to have a look - it's based on the VBA language specifications: https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4 |
Also, feel free to visit our warroom here: https://chat.stackexchange.com/rooms/14929/vba-rubberducking |
Not doomed but might have to convert https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4#L588 to avoid left-recursion. Are you using latest antlr 4.7.1? |
@parrt still on v4.3... seems there have been a number of breaking changes we're struggling with. Conversion is in progress though. Not sure avoiding recursion is possible with that one (not to mention all the implications in our resolver code).. hopefully 4.7 works better with it? |
There have been important optimizations in later versions, so definitely upgrade.
|
Some testing...
I'm getting odd behavior with refresh times. It's almost like ANTLR is caching a parsing strategy: the first parse takes some time, but after small edits the subsequent parse is very fast.
I'm not sure that it's the multiple boolean conditions per se, but rather the complexity of the expressions.
This parses/resolves in 4 seconds:
Sub test()
Debug.Print False And _
False And _
False And _
False And _
False And _
False And _
False And _
False And _
False And _
False And _
False And _
False And _
False
End Sub
Whereas this resolves in 12 seconds:
Sub test()
Debug.Print Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False And _
Not False
End Sub
But if I edit that code to substitute False for True and reparse, and then revert True to False and reparse, it resolves in 4 seconds. It's like ANTLR has cached the parsing strategy for this statement. If I add an extra condition and reparse, the parse time shoots up again, and if I then remove the extra condition and reparse, the parse time is low again, which suggests that the cached optimizations are independent of individual reparses?
And this is still resolving after 10 minutes...
Sub test()
Debug.Print Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42 And _
Not 42 = 42
End Sub |
ANTLR definitely caches partial parsing results so that similar parses get the benefit of reusing the analysis. It warms up just like a JIT. |
Are you using two-stage parsing? SLL then full ALL only if SLL fails? |
@parrt Yup. We attempt a quick parse first, and fallback to the slower prediction mode if the initial pass throws a parse exception. |
okidoki. we had a big optimization for left-recursive expressions in a recent release, so definitely upgrade. |
yes. We first attempt to parse with SLL to get the performance boost. In something like <= 5% of cases (from logfiles posted in issues) SLL will fail, and we retry parsing the whole module with LL prediction. See VBAModuleParser for code |
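The two-stage strategy described above can be sketched generically. This is only an illustration in Python, not Rubberduck's VBAModuleParser code and not the ANTLR API: `ParseFailure`, `parse_two_stage`, `toy_fast` and `toy_full` are hypothetical names, and the toy "fast" parser rejects inputs arbitrarily just to exercise the fallback path.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


class ParseFailure(Exception):
    """Hypothetical stand-in for the exception the fast pass throws on failure."""


def parse_two_stage(
    source: str,
    fast_parse: Callable[[str], T],
    full_parse: Callable[[str], T],
) -> Tuple[T, str]:
    """Try the fast (SLL-style) pass first; only if it fails, reparse the
    whole input with the slower (LL-style) pass. Returns the result and
    the name of the mode that produced it."""
    try:
        return fast_parse(source), "SLL"
    except ParseFailure:
        # The fast pass gave up, so the entire input is parsed again.
        return full_parse(source), "LL"


# Toy stand-ins, for illustration only. A real SLL pass does not fail on
# "Not"; this rule exists purely to trigger the fallback.
def toy_fast(source: str) -> List[str]:
    tokens = source.split()
    if "Not" in tokens:
        raise ParseFailure("fast pass could not handle the input")
    return tokens


def toy_full(source: str) -> List[str]:
    return source.split()


if __name__ == "__main__":
    print(parse_two_stage("False And False", toy_fast, toy_full))
    print(parse_two_stage("Not False And False", toy_fast, toy_full))
```

Note that the fallback costs a second full parse, so the scheme pays off only while failures of the fast pass stay rare, as with the "<= 5% of cases" figure quoted above.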
note that we're running multiple parse passes that way, because Attributes are only available when exporting a module. We need the line numbers to match with the displayed code though, so there's two runs for each module. Additionally we handle compilation directives before each of these runs. |
As @Vogel612 already showed above, the complexity of the boolean expression seems to be an important factor. I think the basic problem here is that the parser has a hard time sorting out where the operands of
Parses in about 1s:
Public Sub Test()
Dim Ignore As Boolean
Ignore = TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
(TypeOf Me Is Class1) And (TypeOf Me Is Class1)
End Sub
Parses in about 380s:
Public Sub Test()
Dim Ignore As Boolean
Ignore = TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1
End Sub |
I can confirm that this has been fixed by the move to Antlr 4.6.4 (PR #3715). Now the parse for the examples above is basically instant. |
The problem is that the parser slows down to a crawl in the presence of long boolean expressions. The runtime seems to be roughly exponential in the number of concatenated expressions.
Examples:
From #3556
From chat:
My guess is that the parser is constantly guessing the wrong way to proceed. Reordering the subrules of expression might help. Upgrading Antlr might help as well, since the left-recursion resolution changed. (This would make changes to the ordering necessary anyway in order to get the SLL parser to work again.)
ref #2498
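To check the "roughly exponential" claim, one could generate test inputs of increasing length and time each parse. The sketch below is in Python and only builds the VBA text; `make_chain` and `time_parse` are hypothetical helpers, and the parser under test has to be supplied by the caller as a callable.

```python
import time
from typing import Callable


def make_chain(operand: str, count: int, operator: str = "And") -> str:
    """Build a VBA Sub whose Debug.Print statement chains `count` copies of
    `operand` with `operator`, one operand per continued line."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Every line except the last ends with the operator and a line continuation.
    lines = [f"{operand} {operator} _"] * (count - 1) + [operand]
    lines[0] = "Debug.Print " + lines[0]
    body = "\n".join("    " + line for line in lines)
    return f"Sub test()\n{body}\nEnd Sub\n"


def time_parse(parse: Callable[[str], object], source: str) -> float:
    """Return the wall-clock seconds taken by one call to `parse(source)`.
    `parse` is an assumption: whatever entry point the parser under test has."""
    start = time.perf_counter()
    parse(source)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Print a short chain to show the generated shape.
    print(make_chain("Not 42 = 42", 3))
```

Calling `make_chain("Not 42 = 42", 13)` gives a statement with the same shape as the slowest example in the thread. Because the parser caches analysis between runs, timings for each length are only comparable when taken from a fresh process.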