Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The parser does not like long boolean expressions #3634

Closed
MDoerner opened this issue Dec 21, 2017 · 16 comments
Closed

The parser does not like long boolean expressions #3634

MDoerner opened this issue Dec 21, 2017 · 16 comments
Labels
critical Marks a bug as a must-fix, showstopper issue enhancement Feature requests, or enhancements to existing features. Ideas. Anything within the project's scope. performance

Comments

@MDoerner
Copy link
Contributor

The problem is that the parser slows down to a crawl in the presence of long boolean expressions. The runtime seems to be roughly exponential in the number of concatenated expressions.

Examples:

From #3556

If preserveUserColor = True _
   And rCell.row > 6 _
   And Not cColor = matColor("blue", 50) _
   And Not cColor = matColor("blue", 100) _
   And Not cColor = matColor("blue", 200) _
   And Not cColor = matColor("blue", 300) _
   And Not cColor = matColor("blue", 400) _
   And Not cColor = matColor("blue", 500) _
   And Not cColor = matColor("blue", 600) _
   And Not cColor = matColor("blue", 700) _
   And Not cColor = matColor("blue", 800) _
   And Not cColor = matColor("blue", 900) _
   And Not cColor = matColor("amber", 100) _
   And Not cColor = matColor("amber", 200) _
   And Not cColor = matColor("lime", 100) _
   And Not cColor = matColor("lime", 200) _
   And Not cColor = matColor("bluegrey", 100) _
   And Not cColor = matColor("bluegrey", 200) _
   And Not cColor = matColor("deeppurple", 100) _
   And Not cColor = matColor("deeppurple", 200) _
   And Not cColor = matColor("yellow", 200) _
   And Not cColor = matColor("teal", 200) _
Then GoTo skiprCellFor

From chat:

Dim Ignore As Boolean
Ignore = TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
TypeOf Me Is Class1 

My guess is that the parser is constantly guessing the wrong way to proceed. Reordering the subrules of expression might help. Upgrading Antlr might help as well, since the left-recursion resollution changed. (This would make changes to the ordering necesary anyway in order to get the SLL parser to work again.)

ref #2498

@Vogel612 Vogel612 added critical Marks a bug as a must-fix, showstopper issue enhancement Feature requests, or enhancements to existing features. Ideas. Anything within the project's scope. performance labels Dec 21, 2017
@parrt
Copy link

parrt commented Dec 22, 2017

What is the target language? C#? Note that technically ALL(*) is not linear and large expressions can hit a landmine, often with big diffs between target runtime speed.

@Hosch250
Copy link
Member

Yes, we are a C# project. We are parsing VBA code.

@retailcoder
Copy link
Member

@parrt correct. Are we doomed? Our grammar is here if you want to have a look - it's based on the VBA language specifications: https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4

@Hosch250
Copy link
Member

Also, feel free to visit our warroom here: https://chat.stackexchange.com/rooms/14929/vba-rubberducking

@parrt
Copy link

parrt commented Dec 22, 2017

Not doomed but might have to convert https://github.com/rubberduck-vba/Rubberduck/blob/next/Rubberduck.Parsing/Grammar/VBAParser.g4#L588 to avoid left-recursion. Are you using latest antlr 4.7.1?

@retailcoder
Copy link
Member

@parrt still on v4.3... seems there have been a number of breaking changes we're struggling with. Conversion is in progress though. Not sure avoiding recursion is possible with that one (not to mention all the implications in our resolver code).. hopefully 4.7 works better with it?

@parrt
Copy link

parrt commented Dec 22, 2017 via email

@ThunderFrame
Copy link
Member

Some testing...

  • I'm getting odd behavior with refresh times - it's almost like ANTLR is caching a parsing strategy... The first parse takes some time, but with small edits, the subsequent parse is very fast.

  • I'm not sure that it's the multiple boolean conditions per se, but rather the complexity of the expressions

This parses/resolves in 4 seconds:

Sub test()
  Debug.Print False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False And _
              False
End Sub

Whereas, this resolves in 12 seconds:

Sub test()
  Debug.Print Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False And _
              Not False
End Sub

But, if I edit that code to substitute False for True, and reparse, and then revert True to False and reparse, it resolves in 4 seconds. It's like ANTLR has cached the parsing strategy for this statement. If I add an extra condition and reparse, the parse time shoots up again, and then if I remove the extra condition and reparse, the parse time is low again, which suggests that the cached optimizations are independent of individual reparses?

And this is still resolving after 10 minutes...

Sub test()
  Debug.Print Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42 And _
              Not 42 = 42
End Sub

@parrt
Copy link

parrt commented Dec 22, 2017 via email

@parrt
Copy link

parrt commented Dec 22, 2017

Are you using two-stage parsing? SLL then full ALL only if SLL fails?

@retailcoder
Copy link
Member

@parrt Yup. We attempt a quick parse first, and fallback to the slower prediction mode if the initial pass throws a parse exception.

@parrt
Copy link

parrt commented Dec 22, 2017

okidoki. we had a big optimization for left-recursive expressions in a recent release, so definitely upgrade.

@Vogel612
Copy link
Member

yes. We first attempt to parse with SLL to get the performance boost. In something like <= 5% of cases (from logfiles posted in issues) SLL will fail, and we retry parsing the whole module with LL prediction. See VBAModuleParser for code

@Vogel612
Copy link
Member

note that we're running multiple parse passes that way, because Attributes are only available when exporting a module. We need the line numbers to match with the displayed code though, so there's two runs for each module.

Additionally we handle compilation directives before each of these runs.

@MDoerner
Copy link
Contributor Author

As @Vogel612 already showed above, the complexity of the boolean expression seems to be an important factor. I think the basic problem here is that the parser has a hard time sorting out where the operands of And and Or begin and end. In the presence of parentheses, the parse is much faster.

Parses in about 1s:

Public Sub Test()
    Dim Ignore As Boolean
    Ignore = TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1) And _
    (TypeOf Me Is Class1) And (TypeOf Me Is Class1)
End Sub

Parses in about 380s:

Public Sub Test()
    Dim Ignore As Boolean
    Ignore = TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    TypeOf Me Is Class1 And TypeOf Me Is Class1 And _
    TypeOf Me Is Class1 And TypeOf Me Is Class1
End Sub

@MDoerner
Copy link
Contributor Author

I can confirm that this has been fixed by the move to Antlr 4.6.4 (PR #3715).

Now the parse for the examples above is basically instant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
critical Marks a bug as a must-fix, showstopper issue enhancement Feature requests, or enhancements to existing features. Ideas. Anything within the project's scope. performance
Projects
None yet
Development

No branches or pull requests

6 participants