New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Unicode character classes are slow [sf#8] #143

Closed
lsf37 opened this Issue Feb 15, 2015 · 4 comments

Comments

Projects
None yet
1 participant
@lsf37
Member

lsf37 commented Feb 15, 2015

Reported by willink on 2002-09-04 15:46 UTC

I just moved up from JLex.

Much impressed by the better speed, error detection.

Had a problwm with exponential time and memopry on
1.3.5, which seems to be
much alleviated in the 1.4_pre1 (it now runs).

I suspest there is still something that could be done to
speed large Unicode grammars.

It seems wrong that just expanding the number of
lenumerated elements in an unchanging
number of input character classes should change the
number of DFA states, and consequently
the DFA to NFA conversion time.

To demonstrate, use a typical XML grammar (attached).
It requires 3187 NFA states, whereas
after commenting out all 16 bit character lasses it
needs only 1000 odd. The latter compiles
quite rapidly, the former si slow but tractiavble with the
pre-release.

@lsf37 lsf37 changed the title from Unicode character classes are slow to [Feature] Unicode character classes are slow [sf#8] Feb 15, 2015

@lsf37 lsf37 closed this Feb 15, 2015

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by willink on 2002-09-04 15:46 UTC
Slow to compile grammar

Member

lsf37 commented Feb 15, 2015

Commented by willink on 2002-09-04 15:46 UTC
Slow to compile grammar

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Commented by lsf37 on 2002-09-25 12:23 UTC
Logged In: YES
user_id=93534

Sorry for the late answer, the bug tracker's "monitor" function
seems to have let me down.

The problem with the character classes in the attached
grammar is that they are formulated with the "|" operator, not
as a pure char class expression (which would use only one
[..]). JFlex translates this as the general "or" (which increases
the initial DFA size) and not as character class (which
wouldn't). The final DFA size (after minimization) should be
the same, though (which doesn't help you, of course, if JFlex
never reaches that state).

I realize that it is impractical to have char classes of this size
in one single expression in the specification. I will see if I can
optimize the first stage (RegExp->DFA) so that JFlex
recognizes when "|" is used for char class enumeration only.

Member

lsf37 commented Feb 15, 2015

Commented by lsf37 on 2002-09-25 12:23 UTC
Logged In: YES
user_id=93534

Sorry for the late answer, the bug tracker's "monitor" function
seems to have let me down.

The problem with the character classes in the attached
grammar is that they are formulated with the "|" operator, not
as a pure char class expression (which would use only one
[..]). JFlex translates this as the general "or" (which increases
the initial DFA size) and not as character class (which
wouldn't). The final DFA size (after minimization) should be
the same, though (which doesn't help you, of course, if JFlex
never reaches that state).

I realize that it is impractical to have char classes of this size
in one single expression in the specification. I will see if I can
optimize the first stage (RegExp->DFA) so that JFlex
recognizes when "|" is used for char class enumeration only.

@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2002-09-25 12:25 UTC

  • labels: --> generator
  • assigned_to: nobody --> lsf37
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2002-09-25 12:25 UTC

  • labels: --> generator
  • assigned_to: nobody --> lsf37
@lsf37

This comment has been minimized.

Show comment
Hide comment
@lsf37

lsf37 Feb 15, 2015

Member

Updated by lsf37 on 2004-04-12 12:31 UTC

  • status: open --> closed
Member

lsf37 commented Feb 15, 2015

Updated by lsf37 on 2004-04-12 12:31 UTC

  • status: open --> closed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment