Grammar sends Earley implementation to an infinite loop #3

erezsh · 2017-02-17T22:18:37Z

The Earley implementation never terminates when parsing the input "a" with the following grammar:

start: a
a: a | "a"

Of course the grammar is badly formed, but since we aspire to parse "any grammar", raising an error is more appropriate behavior than looping forever.
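
For illustration, the rule `a: a` can derive itself without consuming any input, which is the kind of cycle that could be reported as an error. The following is only a rough sketch of such a check, not Lark's actual code: the dict-based grammar representation and the name `find_unit_cycles` are assumptions made for this example, and it only considers alternatives that consist of a single rule name (it ignores cycles that arise through nullable symbols).

```python
def find_unit_cycles(rules):
    """Return the set of rule names that can derive themselves
    through a chain of single-symbol alternatives (e.g. `a: a`).

    `rules` is a hypothetical representation: it maps each rule name
    to a list of alternatives, each alternative being a list of symbols.
    """
    # Keep only edges of the form X -> Y where the whole alternative is one rule name.
    unit_edges = {
        name: {alt[0] for alt in alts if len(alt) == 1 and alt[0] in rules}
        for name, alts in rules.items()
    }

    cyclic = set()
    for start in rules:
        stack = list(unit_edges[start])
        seen = set()
        while stack:
            current = stack.pop()
            if current == start:
                cyclic.add(start)  # start is reachable from itself
                break
            if current not in seen:
                seen.add(current)
                stack.extend(unit_edges[current])
    return cyclic


# The grammar from this report: start: a / a: a | "a"
grammar = {
    "start": [["a"]],
    "a": [["a"], ['"a"']],
}
print(find_unit_cycles(grammar))
```

With a check along these lines, grammar loading could raise an error naming the offending rule instead of looping.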

lucaswiman · 2017-02-24T00:23:35Z

@erezsh If I understand correctly, this grammar is not malformed, but it also sends the parser into an infinite loop:

from lark import Lark

l = Lark('''
  start: bar+
  bar: /a|b|c*/ "foo"
''')

l.parse('afoobfooccfoo')

lucaswiman · 2017-02-24T00:30:27Z

It seems the issue is that the regex can match zero characters. This one works fine:

Lark('''
  start: bar+
  bar: /a|b|c+/ "foo"
''')

and this one fails in the same way as the example above:

Lark('''
  start: bar+
  bar: /a|b|/ "foo"
''')
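
A quick way to confirm which of the three patterns above can match zero characters, using only Python's standard `re` module. This is a small sketch independent of Lark, and `can_match_empty` is a name chosen for this example.

```python
import re


def can_match_empty(pattern: str) -> bool:
    """Return True if the regex accepts the empty string."""
    return re.fullmatch(pattern, "") is not None


for pattern in ["a|b|c*", "a|b|c+", "a|b|"]:
    print(repr(pattern), can_match_empty(pattern))
```

The two patterns that loop (`a|b|c*` and `a|b|`) both accept the empty string, while `a|b|c+` does not.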

erezsh · 2017-02-24T08:03:39Z

Thanks, good catch!

I think the correct response is to forbid regexps that can match the empty string.
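
One possible shape for such a check, as a sketch only: it is not the code from the eventual fix, and the function name and the terminal-name-to-pattern dict are hypothetical. The idea is to reject any pattern that accepts the empty string when the grammar is loaded, so the user gets an error instead of a hang at parse time.

```python
import re


def reject_zero_length_terminals(terminals):
    """Raise ValueError for any terminal whose regex can match the empty string.

    `terminals` is a hypothetical mapping of terminal name -> regex source.
    """
    for name, pattern in terminals.items():
        if re.fullmatch(pattern, "") is not None:
            raise ValueError(
                f"Terminal {name!r} ({pattern!r}) can match zero characters"
            )


reject_zero_length_terminals({"GOOD": "a|b|c+"})  # passes silently

try:
    reject_zero_length_terminals({"BAD": "a|b|c*"})
except ValueError as err:
    print(err)
```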

erezsh · 2017-03-02T16:49:02Z

Both issues resolved.

erezsh · 2017-03-09T07:46:31Z

Fixed!

erezsh added the bug label Feb 24, 2017

erezsh self-assigned this Feb 24, 2017

erezsh added a commit that referenced this issue Mar 2, 2017

Solved issue #3: infinite loop due to zero-length tokens

96ebe94

erezsh closed this as completed Mar 2, 2017

erezsh added a commit that referenced this issue Mar 9, 2017

Fixed issue #3 (infinite recursion in grammar)

24f8656
