
[Feature] Unicode character classes are slow [sf#8] #143

@lsf37

Description


Reported by willink on 2002-09-04 15:46 UTC

I just moved up from JLex.

Much impressed by the better speed and error detection.

Had a problem with exponential time and memory on
1.3.5, which seems to be
much alleviated in 1.4_pre1 (it now runs).

I suspect there is still something that could be done to
speed up large Unicode grammars.

It seems wrong that merely increasing the number of
enumerated elements in an unchanging
number of input character classes should change the
number of NFA states, and consequently
the NFA-to-DFA conversion time.
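To illustrate why the automaton size need not depend on how many ranges a class enumerates, here is a minimal sketch of alphabet partitioning: the input alphabet is split into equivalence classes of characters that no character class distinguishes, and automaton transitions can then be labelled with class IDs instead of individual ranges. This is a generic technique, not a description of JFlex's internals; the function names `equivalence_classes` and `class_ids`, and the sample ranges (a rough, illustrative stand-in for "letters", not real Unicode categories), are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Range = Tuple[int, int]  # inclusive code-point range (lo, hi), assumed within [0, max_char]


def equivalence_classes(classes: List[List[Range]], max_char: int = 0xFFFF) -> List[List[Range]]:
    """Partition [0, max_char] into groups of characters no class distinguishes."""
    # Membership can only change at a range start or just past a range end.
    cuts = {0, max_char + 1}
    for cls in classes:
        for lo, hi in cls:
            cuts.add(lo)
            cuts.add(hi + 1)
    points = sorted(cuts)
    groups: Dict[Tuple[bool, ...], List[Range]] = {}
    for start, nxt in zip(points, points[1:]):
        # Each elementary interval lies wholly inside or outside every range,
        # so testing its first character gives its membership signature.
        sig = tuple(any(lo <= start <= hi for lo, hi in cls) for cls in classes)
        groups.setdefault(sig, []).append((start, nxt - 1))
    return list(groups.values())


def class_ids(cls: List[Range], partition: List[List[Range]]) -> Set[int]:
    """IDs of the equivalence classes covered by `cls` (one transition label)."""
    return {i for i, group in enumerate(partition)
            if any(lo <= group[0][0] <= hi for lo, hi in cls)}


digit = [(0x30, 0x39)]
ascii_letter = [(0x41, 0x5A), (0x61, 0x7A)]
# Same single "letter" class, but with many more enumerated ranges.
wide_letter = ascii_letter + [(0x391, 0x3A9), (0x410, 0x44F), (0x4E00, 0x9FA5)]

small = equivalence_classes([ascii_letter, digit])
big = equivalence_classes([wide_letter, digit])
print(len(small), len(big))  # 3 3
print(class_ids(wide_letter, big))  # {2}
```

Under this scheme, enlarging a class only adds ranges inside an existing equivalence class, so the letter class still compiles to a single transition labelled `{2}` and the NFA fragment for it keeps the same number of states.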

To demonstrate, use a typical XML grammar (attached).
It requires 3187 NFA states, whereas
after commenting out all 16-bit character classes it
needs only about 1000. The latter compiles
quite rapidly; the former is slow but tractable with the
pre-release.
