Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very slow DFA construction (resulting in a very large DFA) #128

Closed
skvadrik opened this issue Dec 9, 2015 · 3 comments
Closed

very slow DFA construction (resulting in a very large DFA) #128

skvadrik opened this issue Dec 9, 2015 · 3 comments
Assignees

Comments

@skvadrik
Copy link
Owner

skvadrik commented Dec 9, 2015

Consider the following code (slow.re):

/*!re2c
    [ac]{0,12} [a] [ac]{0,12} {}
*/

It takes re2c a long time to compile this code and the generated lexer is very large:

$ time re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m2.764s
user    0m2.743s
sys     0m0.022s
1384716

Increasing repetition counter (any of the two, or both):

/*!re2c
    [ac]{0,14} [a] [ac]{0,14} {}
*/

Causes insane increase of time and size:

$ time re2c slow.re > slow.c && stat -c '%s' slow.c

real    1m54.837s
user    1m54.733s
sys     0m0.120s
5627102

If we minimize the generated DFA, it would get reasonably small. (I experimented with minimization a bit; the number of states is reduced by several orders of magnitude and the minimization itself takes reasonable time, which suggests that the slowdown might be caused by inefficiency in FSA determinisation).

@skvadrik skvadrik added this to the 0.16 milestone Dec 9, 2015
@skvadrik skvadrik self-assigned this Dec 9, 2015
skvadrik added a commit that referenced this issue Dec 19, 2015
This partially fixes bug #128: "very slow DFA construction (resulting
in a very large DFA)". DFA construction is no longer slow, but the
resulting DFA is still too large and needs to be minimized.
@skvadrik
Copy link
Owner Author

Assuming the same example:

/*!re2c
    [ac]{0,14} [a] [ac]{0,14} {}
*/

Commit b98997b reduces the time of DFA construction from ~115s to ~1.2s:

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m1.139s
user    0m1.073s
sys     0m0.065s
5646703

The generated DFA is still too big and should be minimized.

skvadrik added a commit that referenced this issue Dec 31, 2015
…ery large DFA)".

After minimization the resulting DFA is much smaller:
    /*!re2c
        [ac]{0,14} [a] [ac]{0,14} {}
    */

Was:
    $ time re2c slow.re > slow.c && stat -c '%s' slow.c

    real    1m54.837s
    user    1m54.733s
    sys     0m0.120s
    5627102

Now:
    $ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

    real    0m0.732s
    user    0m0.684s
    sys     0m0.048s
    15078
@skvadrik
Copy link
Owner Author

DFA size is reduced by commit 1136f2f that adds DFA minimization.

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m0.732s
user    0m0.684s
sys     0m0.048s
15078

@skvadrik
Copy link
Owner Author

The intermediate DFA (the one that is constructed by NFA determinization and further reduced by DFA minimization) is still large. We'll need incremental minimization to fully solve pathological cases like this. Closing the bug until someone encounters a real-world example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant