
Limit amount of time/effort interegular will use when creating an example #1266

Merged
merged 2 commits into lark-parser:master on Mar 11, 2023

Conversation

@MegaIng (Member) commented Mar 10, 2023

Use limit_depth when creating an example for a collision to prevent unbounded memory and time usage.

An example of two regexes that do collide, but for which finding an example can take a long time:

A: /(?:#( |\t)*(?!if|ifdef|else|elif|endif|define|set|unset|error|exit)[^\n]+|(;|\\/\\/)[^\n]*)/
B: /\\#(?:(?:(?:\\ |\t))+)?(?i:error)/i

Coming from this grammar: https://github.com/adbancroft/TunerStudioIniParser/blob/master/ts_ini_parser/grammars/pre_processor.lark

However, even with this particular change, loading that grammar still takes a few seconds, since almost all of its regexes collide with each other.

The reason for many of these collisions is that the grammar is unclear about whether or not the keywords are case-insensitive, so all of these warnings and collisions are actually accurate: using #ErRor instead of #error will cause problems for the grammar.
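
For a concrete check, both patterns can be tried with Python's re module (a small sketch, with the grammar-level backslash escaping removed from the patterns above): #ErRor matches both terminals, while #error matches only B.

import re

# Patterns A and B from above, with the grammar-level escaping removed.
A = re.compile(r'(?:#( |\t)*(?!if|ifdef|else|elif|endif|define|set|unset|error|exit)[^\n]+|(;|//)[^\n]*)')
B = re.compile(r'#(?:(?:(?:\ |\t))+)?(?i:error)', re.IGNORECASE)  # the /i flag becomes re.IGNORECASE

for text in ("#error", "#ErRor"):
    print(text, bool(A.fullmatch(text)), bool(B.fullmatch(text)))
# #error -> A: False (the negative lookahead rules out the lowercase keyword), B: True
# #ErRor -> A: True, B: True -- the collision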

@erezsh (Member) commented Mar 11, 2023

So perhaps we can limit ourselves to checking a fixed number of collisions, say 8, after which we'll just write something like "8 regex collisions reached; disabling detection. More collisions may exist."

Also, I think we can reduce the example search time to 200ms or so; that's already quite a bit of time. But maybe for strict mode set it even higher, at 2000ms, because then we only have to call it once.
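
A minimal sketch of that idea (find_collisions here is a hypothetical stand-in for however the collisions are actually enumerated, not the real lark/interegular API):

MAX_REPORTED_COLLISIONS = 8  # the fixed cap suggested above

def report_collisions(find_collisions):
    # find_collisions is assumed to lazily yield (terminal_a, terminal_b) pairs.
    for count, (a, b) in enumerate(find_collisions(), start=1):
        print(f"Collision between {a} and {b}")
        if count >= MAX_REPORTED_COLLISIONS:
            print(f"{MAX_REPORTED_COLLISIONS} regex collisions reached; "
                  "disabling collision detection. More collisions may exist.")
            break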

What do you think?

@MegaIng (Member, Author) commented Mar 11, 2023

Yeah, putting a limit on the total number of collisions is something I also considered.

For search time, I sadly don't have a good measurement of how the parameter I can control maps to the time it takes. But I could try to see if I can figure something out.

(commit) …will be outputted, and limit max time that is being spent searching for examples.
@MegaIng (Member, Author) commented Mar 11, 2023

Ok, I made a few improvements. It should now be relatively reliable, and I think there are a few changes I can make in interegular alone so that it manages to provide examples for all collisions in the above-mentioned grammar very quickly, where previously it only managed to report one or two. Via testing, I calculated an approximation for max_time -> max_iterations. This should be decently reliable unless the hardware or the Python implementation is very slow.
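
One simple shape such an approximation can take (a sketch under assumed numbers, not the actual interegular code): bake in an iterations-per-second constant measured offline and scale the time budget by it.

# Assumed constant: search iterations per second on typical hardware,
# measured offline; the real value used by interegular may differ.
ITERATIONS_PER_SECOND = 100_000

def max_iterations_for(max_time: float) -> int:
    # Translate a wall-clock budget in seconds into an iteration cap.
    return max(1, int(max_time * ITERATIONS_PER_SECOND))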

erezsh merged commit f35df9b into lark-parser:master on Mar 11, 2023