Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lib): correctly handle miss for loop in loop #393

Merged
merged 3 commits into from
Jun 2, 2024

Conversation

lukas-code
Copy link
Contributor

@lukas-code lukas-code commented May 24, 2024

fixes #392

This PR changes the construction of the graph for a regex Mir::Loop when there is a miss path present. Given an input that essentially represents this control flow:

flowchart LR
    start-->1
    1((1))--a*-->2((2))
    2((2))--then-->3((3))
    1((1))--miss-->4((4))
Loading

we previously lowered it to a single loop and unconditionally merged the then and old miss arms into the new miss of the loop:

flowchart LR
    start-->1((1))
    1((1))--a-->1((1))
    1((1))--_-->2((2))
    2((2))--then-->3((3))
    2((2))--miss-->4((4))
Loading

This lowering can lead to situations, where we incorrectly match a followed by miss, even though a must be followed by then.

This PR solves the issue by generating two separate paths for the first and other iterations of the loop, and merging the then and old miss only for the new miss of the first iteration path:

flowchart LR
    start-->1
    1((1))--a-->2((2))
    2((2))--a-->2((2))
    2((2))--_-->3((3))
    3((3))--then-->4((4))
    1((1))--_-->5((5))
    5((5))--then-->4((4))
    5((5))--miss-->6((6))
Loading

Since the path for a may be potentially long, this can result in a lot more generated code, especially for deeply nested loops. But I hope that such patterns are rare enough in the wild that we can accept a performance hit for the sake of a correctness fix. For what it's worth, the cargo benches seem to perform similar before and after this patch on my machine.

@jeertmans jeertmans changed the title correctly handle miss for loop in loop fix(lib): correctly handle miss for loop in loop Jun 2, 2024
@jeertmans jeertmans added the bug Something isn't working label Jun 2, 2024
@jeertmans
Copy link
Collaborator

Hello @lukas-code, thanks for your very comprehensive analysis on this issue, and I am actually impressed by the mermaid diagram, and how they render interactively in PR!

How did you generate the code? That might be super nice to have Logos generate those diagram in debug mode!

Anyway, all the tests are in your favor, so I don't see why I should not accept this patch :D

Thanks for your work!!

@jeertmans jeertmans merged commit cbd6218 into maciejhirsz:master Jun 2, 2024
18 checks passed
@lukas-code lukas-code deleted the loop-in-loop branch June 2, 2024 16:15
@lukas-code
Copy link
Contributor Author

How did you generate the code? That might be super nice to have Logos generate those diagram in debug mode!

The code for the diagrams is not generated. I looked at the Debug representation of the Graph and then manually converted that to mermaid syntax. Having a separate formatter for the Graph that directly outputs mermaid seems like something that would be nice to have and reasonably straight forward to implement, but I currently do not have the time to work on that.

@jeertmans
Copy link
Collaborator

No issue, I have created #395 so we can keep track of this feature request in the future :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regex with loop in loop incorrecly matches when it shouldn't
2 participants