Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent duplicate states from being added to tables #22

Open
aredridel opened this issue Apr 17, 2014 · 6 comments
Open

Prevent duplicate states from being added to tables #22

aredridel opened this issue Apr 17, 2014 · 6 comments

Comments

@aredridel
Copy link
Contributor

With the simple grammar from Aycock and Horspool:

S -> A A A A
A -> "a"
A -> E
E -> null

When run against the trivial input:

aa

This yields the following parse tables -- note the duplicate states.

table 0
     { _start → ● S },0
     { S → ● A A A A },0
     { A → ● "a" },0
     { A → ● E },0
     { A → E ● },0
     { S → A ● A A A },0
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { _start → S ● },0
table 1
     { A → "a" ● },0
     { S → A ● A A A },0
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { A → ● "a" },1
     { A → ● E },1
     { _start → S ● },0
     { A → E ● },1
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0
table 2
     { A → "a" ● },1
     { S → A A ● A A },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { A → ● "a" },2
     { A → ● E },2
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0
     { A → E ● },2
     { S → A A A ● A },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { S → A A A A ● },0
     { _start → S ● },0
     { _start → S ● },0
     { _start → S ● },0

It does parse correctly, however, yielding

[ [ [ [] ], [ [] ], [ 'a' ], [ 'a' ] ],
  [ [ [] ], [ 'a' ], [ [] ], [ 'a' ] ],
  [ [ 'a' ], [ [] ], [ [] ], [ 'a' ] ],
  [ [ [] ], [ 'a' ], [ 'a' ], [ [] ] ],
  [ [ 'a' ], [ [] ], [ 'a' ], [ [] ] ],
  [ [ 'a' ], [ 'a' ], [ [] ], [ [] ] ] ]

So this is just an efficiency concern.

Right now this is due to the lack of duplication checking (or Set-like datastructure) in State.prototype.process, specifically table[location].push(x);

I'm actively working on this but haven't figured out a tidy way to solve it yet.

@kach
Copy link
Owner

kach commented Apr 17, 2014

Ah, thanks for taking a look. Speed is definitely an issue we need to work on. Perhaps we need a Java-esque State.prototype.equals(another_state)?

Should I put up my Lua parser as a significantly nontrivial test case for you to play with? It may help catch obscure bugs/generate large tests; or it may just cause more headaches.

@aredridel
Copy link
Contributor Author

Oh heck yes. I know @silentbicycle is also working on a Lua Earley parser, so at the least, he'd love that!

@aredridel
Copy link
Contributor Author

I don't have a good gut feeling on whether an equals(other) method is the best way to go, or to try to exploit object identity. How this ties in to building DFA states in #11 is perhaps relevant, too, or perhaps makes it unneeded, since the operation there is state = nextStateFor(character), and the state has the table pre-built in it.

Also I just noticed that my grammar is now very post-midnight and I need to sleep, apparently!

@kach
Copy link
Owner

kach commented Apr 18, 2014

@aredridel
Copy link
Contributor Author

I just realized that the duplicate states are the different (ambiguous) derivations in my grammar.

@rwindelz
Copy link
Contributor

when COMPLETE adds the additional duplicate states to the current chart, it allows the closure to complete the full set of states in the chart. in order to remove the duplicate states, the closure calculation will have to change. i've seen other Earley parsers use a naive closure calc - 'process the full chart until it is not possible to add any more states' - that gets expensive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants