New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

very slow DFA construction (resulting in a very large DFA) #128

Closed
skvadrik opened this Issue Dec 9, 2015 · 3 comments

Comments

Projects
None yet
1 participant
@skvadrik
Owner

skvadrik commented Dec 9, 2015

Consider the following code (slow.re):

/*!re2c
    [ac]{0,12} [a] [ac]{0,12} {}
*/

It takes re2c a long time to compile this code and the generated lexer is very large:

$ time re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m2.764s
user    0m2.743s
sys     0m0.022s
1384716

Increasing repetition counter (any of the two, or both):

/*!re2c
    [ac]{0,14} [a] [ac]{0,14} {}
*/

Causes insane increase of time and size:

$ time re2c slow.re > slow.c && stat -c '%s' slow.c

real    1m54.837s
user    1m54.733s
sys     0m0.120s
5627102

If we minimize the generated DFA, it would get reasonably small. (I experimented with minimization a bit; the number of states is reduced by several orders of magnitude and the minimization itself takes reasonable time, which suggests that the slowdown might be caused by inefficiency in FSA determinisation).

@skvadrik skvadrik added this to the 0.16 milestone Dec 9, 2015

@skvadrik skvadrik self-assigned this Dec 9, 2015

skvadrik added a commit that referenced this issue Dec 19, 2015

Keep DFA states in a hash map (to speedup lookup fo an identical state).
This partially fixes bug #128: "very slow DFA construction (resulting
in a very large DFA)". DFA construction is no longer slow, but the
resulting DFA is still too large and needs to be minimized.
@skvadrik

This comment has been minimized.

Show comment
Hide comment
@skvadrik

skvadrik Dec 19, 2015

Owner

Assuming the same example:

/*!re2c
    [ac]{0,14} [a] [ac]{0,14} {}
*/

Commit b98997b reduces the time of DFA construction from ~115s to ~1.2s:

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m1.139s
user    0m1.073s
sys     0m0.065s
5646703

The generated DFA is still too big and should be minimized.

Owner

skvadrik commented Dec 19, 2015

Assuming the same example:

/*!re2c
    [ac]{0,14} [a] [ac]{0,14} {}
*/

Commit b98997b reduces the time of DFA construction from ~115s to ~1.2s:

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m1.139s
user    0m1.073s
sys     0m0.065s
5646703

The generated DFA is still too big and should be minimized.

skvadrik added a commit that referenced this issue Dec 31, 2015

Added test for bug #128 "very slow DFA construction (resulting in a v…
…ery large DFA)".

After minimization the resulting DFA is much smaller:
    /*!re2c
        [ac]{0,14} [a] [ac]{0,14} {}
    */

Was:
    $ time re2c slow.re > slow.c && stat -c '%s' slow.c

    real    1m54.837s
    user    1m54.733s
    sys     0m0.120s
    5627102

Now:
    $ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

    real    0m0.732s
    user    0m0.684s
    sys     0m0.048s
    15078
@skvadrik

This comment has been minimized.

Show comment
Hide comment
@skvadrik

skvadrik Dec 31, 2015

Owner

DFA size is reduced by commit 1136f2f that adds DFA minimization.

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m0.732s
user    0m0.684s
sys     0m0.048s
15078
Owner

skvadrik commented Dec 31, 2015

DFA size is reduced by commit 1136f2f that adds DFA minimization.

$ time ./re2c slow.re > slow.c && stat -c '%s' slow.c

real    0m0.732s
user    0m0.684s
sys     0m0.048s
15078
@skvadrik

This comment has been minimized.

Show comment
Hide comment
@skvadrik

skvadrik Dec 31, 2015

Owner

The intermediate DFA (the one that is constructed by NFA determinization and further reduced by DFA minimization) is still large. We'll need incremental minimization to fully solve pathological cases like this. Closing the bug until someone encounters a real-world example.

Owner

skvadrik commented Dec 31, 2015

The intermediate DFA (the one that is constructed by NFA determinization and further reduced by DFA minimization) is still large. We'll need incremental minimization to fully solve pathological cases like this. Closing the bug until someone encounters a real-world example.

@skvadrik skvadrik closed this Dec 31, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment