Serious performance regression when using GHC 8.2.1 + Text.RE.TDFA + Data.Text.Lazy #156

Open
ntc2 opened this Issue Nov 22, 2017 · 2 comments

ntc2 commented Nov 22, 2017

This appears to be a GHC bug, not a bug in the regex packages themselves, but I'm reporting it here as well in case anyone else runs into it. I also reported it against GHC here.

Details

In GHC 8.2.1, I observe run time that grows much faster than linearly
in the length of the file (roughly quadratically, going by the timings
below) when matching a simple regex using Text.Regex.TDFA and
Data.Text.Lazy. This is also a Heisenbug: the performance problem goes
away if I build with profiling support! The problem is not present in
GHC 8.0.2, or when using String or strict Data.Text.

For the problematic combination, the run times are 3s, 10s, 22s, and 40s
for counting regex matches in files with 10000,
20000, 30000, and 40000 lines, respectively. For all of the
unproblematic combinations, the run time is always about 1s.
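
As a rough check on what kind of growth these numbers suggest, the four timings above can be fit on a log-log scale. This is only a minimal sketch using `base`; the names `timings` and `logLogSlope` are made up for illustration and are not part of the test repo.

```haskell
module Main where

-- (lines, seconds) pairs reported above for the slow combination.
timings :: [(Double, Double)]
timings = [(10000, 3), (20000, 10), (30000, 22), (40000, 40)]

-- Least-squares slope of log(seconds) against log(lines).
-- A slope near 1 would indicate linear time, near 2 quadratic time.
logLogSlope :: [(Double, Double)] -> Double
logLogSlope pts = sxy / sxx
  where
    xs  = map (log . fst) pts
    ys  = map (log . snd) pts
    n   = fromIntegral (length pts)
    mx  = sum xs / n
    my  = sum ys / n
    sxy = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx = sum (map (\x -> (x - mx) ^ (2 :: Int)) xs)

main :: IO ()
main = do
  putStrLn ("log-log slope: " ++ show (logLogSlope timings))
  -- Ratios of successive run times for equal 10000-line steps.
  -- Exponential growth would keep these roughly constant.
  let secs = map snd timings
  print (zipWith (/) (tail secs) secs)
```

For these four points the slope comes out a little under 2, and the step-to-step ratios shrink instead of staying constant, which points to roughly quadratic growth. With only four coarse measurements this is an estimate, not a precise result.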

Repo with test code illustrating the problem.

saurabhnanda commented Jul 19, 2018

@ntc2 are you observing this problem with GHC 8.4.x branch as well?

ntc2 commented Jul 26, 2018

@saurabhnanda Yes. I just tried with GHC 8.4.3 and it's just as slow. I had to change the deps a bit to build my example code. I ended up with

    base-compat-0.10.4
    hashable-1.2.7.0
    regex-base-0.93.2
    regex-pcre-builtin-0.94.4.8.8.35
    regex-tdfa-1.2.3.1
    regex-tdfa-text-1.0.0.3
    time-locale-compat-0.1.1.4
    unordered-containers-0.2.9.0
    utf8-string-1.0.1.1
    regex-1.0.1.3

Note that @tdammers investigated this issue and determined it was caused by a bug in the text library, not by GHC itself: haskell/text#216.