Crashes if grammar is infinitely ambiguous on given input #54

safinaskar · 2021-05-19T12:35:45Z

Consider this code:

{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE RecursiveDo #-}

import Text.Earley
import Control.Applicative

data L = A | B | C deriving (Eq, Show)

lang :: Grammar r (Prod r () L ())
lang = mdo {
  r1 <- rule $ (pure () <* token A) <|> r1;
  return r1;
}

main :: IO ()
main = do {
  putStrLn $ show $ fullParses (parser lang) [A];
}

This program crashes with a stack overflow when I run it in ghci, and freezes when I compile and run it. I cannot even test whether the list of parses is empty: null $ fst $ fullParses (parser lang) [A] also crashes in ghci.

I want a function that tests whether a given token list has multiple parses. I don't need the list of parses or even the number of parses. I simply want one of 3 results: 0 parses, 1 parse, or more than 1 parse. And this function should terminate even if the grammar is infinitely ambiguous.
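To make the requested result type concrete, here is a minimal sketch of the three-way classification over an ordinary lazy list. The names ParseCount and classify are hypothetical and are not part of Earley. The function inspects at most the first two list cells, so it terminates on any list whose first two cells can be produced. It does not fix this issue by itself: as noted above, with this grammar even null on the result of fullParses overflows.

```haskell
-- Hypothetical result type: not part of the Earley library.
data ParseCount = NoParse | OneParse | ManyParses
  deriving (Eq, Show)

-- Looks at no more than the first two cons cells of the list,
-- so it also terminates on a productive infinite list.
classify :: [a] -> ParseCount
classify []  = NoParse
classify [_] = OneParse
classify _   = ManyParses
```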

I use Earley 0.13.0.1


ollef · 2021-05-21T09:18:39Z

I think the reason for this is that when Earley builds its result lists, it will add later results to the start of the list.

Intuitively, we start with

r1_results = []

and then we parse token A, which sets

r1_results = [()]

Next, we process the other alternative, which yields

r1_results = r1_results ++ [()]

which is our final result. The stack overflow you see is from trying to force this value. I would expect the report function to work for this example because it doesn't force the result. (But it won't be useful to you since it doesn't say anything about ambiguity.)

We could write results to the end instead (but this has performance implications, discussed in #15), which would yield

r1_results = [()] ++ r1_results

This would work for your example since it's productive. But note that it would stop being productive again if you flipped the arguments to <|>.

So I can't see an easy fix that would always work. But if what I've written here is correct, your example would work if you flipped the arguments to <|>.
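The productivity argument can be illustrated with plain Haskell lists. This is only a sketch of the intuition, assuming the results behave like the self-referential lists written above; it is not how Earley stores results internally, and the names resultsAtStart and resultsAtEnd are made up for the illustration.

```haskell
-- Later results added at the start: the first cons cell depends on
-- itself, so forcing the list (even with null) never yields a result.
resultsAtStart :: [()]
resultsAtStart = resultsAtStart ++ [()]

-- Later results added at the end: the first element is available
-- immediately, so null and take terminate on this infinite list.
resultsAtEnd :: [()]
resultsAtEnd = [()] ++ resultsAtEnd
```

Here null resultsAtEnd returns False right away, while null resultsAtStart does not terminate normally.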

safinaskar · 2021-05-21T14:53:41Z

> But if what I've written here is correct, your example would work if you flipped the arguments to <|>

I just checked. No, null $ fst $ fullParses (parser lang) [A] still causes a stack overflow in ghci.

> I would expect the report function to work for this example because it doesn't force the result

Yes, report (parser lang) [A] works, with both (pure () <* token A) <|> r1 and r1 <|> (pure () <* token A).

ollef · 2021-05-21T18:34:04Z

Hmm, okay, it's been a while so I may be getting the details wrong.

safinaskar · 2021-05-23T20:06:30Z

upTo 1 (generator lang [A]) gives a stack overflow, too.

safinaskar · 2021-05-23T23:50:26Z

This motivated me to publish an alternative library for checking ambiguity: https://hackage.haskell.org/package/check-cfg-ambiguity . My library does not freeze on this example. I hope you are not offended.

ollef · 2021-05-27T06:13:33Z

Cool! I wonder if you could implement the same on a grammar from this library. :)

safinaskar · 2021-05-27T13:12:50Z

> Cool! I wonder if you could implement the same on a grammar from this library. :)

What do you mean?

safinaskar · 2021-05-27T13:13:44Z

You mean implementing my algorithm on your Grammar type?

ollef · 2021-05-27T18:27:55Z

Yes.

safinaskar · 2021-05-27T20:46:29Z

I don't know. My grammar type is very simple (Data.Map.Map n [[TerminalOrNonterminal t n]]). t and n are any kind of IDs for terminals and nonterminals (for example, strings with names). TerminalOrNonterminal is essentially Either. I will repeat the explanation of my algorithm from haskell-cafe:

> I generate all strings of symbols that can be reached from the start symbol by replacing nonterminals with their productions no more than "count" times. If I get a duplicate, then the grammar is ambiguous. These strings are strings of any symbols, not necessarily terminals. And "count" means the number of replacements, i.e. the number of production applications.
> This is a simple brute-force algorithm. It does not really check the grammar for ambiguity (that is impossible on a Turing machine).

My algorithm is trivial, and you can easily get the idea by looking at the code of lowLevelTestAmbiguity: https://hackage.haskell.org/package/check-cfg-ambiguity-0.0.0.1/docs/src/CheckCFGAmbiguity.html#lowLevelTestAmbiguity .

My algorithm requires that nonterminals can be compared for equality, because I generate all strings that I can get using "count" expansions and then try to find duplicates, and these strings contain both terminals and nonterminals.

Also, I am currently writing a library that takes a grammar in some big bloated form and produces both a Data.Map.Map n [[TerminalOrNonterminal t n]] and an Earley Grammar. The grammar is then checked for ambiguity using check-cfg-ambiguity and also used to actually parse something using Earley. I also produce a pretty-printer.
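For readers who want a feel for the brute-force idea, here is a rough sketch. It is not the code of lowLevelTestAmbiguity. The names Symbol, CFG, stepLeftmost, hasDuplicateWithin and issueGrammar are invented for the illustration, and plain Either stands in for TerminalOrNonterminal. One assumption of this sketch: only the leftmost nonterminal is rewritten at each step, so that two expansion orders of the same parse tree are not counted as a duplicate. A duplicate form then indicates ambiguity only if every nonterminal can derive some terminal string.

```haskell
import Data.Either (isRight)
import Data.List (group, sort)
import qualified Data.Map as Map

-- Left = terminal, Right = nonterminal (stand-in for TerminalOrNonterminal).
type Symbol t n = Either t n
type CFG t n = Map.Map n [[Symbol t n]]

-- Rewrite the leftmost nonterminal with each of its productions.
-- A form with no nonterminal left has no successors.
stepLeftmost :: Ord n => CFG t n -> [Symbol t n] -> [[Symbol t n]]
stepLeftmost g form =
  case break isRight form of
    (pre, Right n : post) ->
      [pre ++ rhs ++ post | rhs <- Map.findWithDefault [] n g]
    _ -> []

-- True if some form is reached by two different leftmost derivations
-- of at most `count` steps each. Exponential; brute force only.
hasDuplicateWithin :: (Ord t, Ord n) => Int -> CFG t n -> n -> Bool
hasDuplicateWithin count g start =
  any ((> 1) . length) (group (sort allForms))
  where
    levels = take (count + 1) (iterate (concatMap (stepLeftmost g)) [[Right start]])
    allForms = concat levels

-- The grammar from this issue: S -> a | S.
issueGrammar :: CFG Char Char
issueGrammar = Map.fromList [('S', [[Left 'a'], [Right 'S']])]
```

With these definitions, hasDuplicateWithin 1 issueGrammar 'S' evaluates to True, because the form S is reached both by zero steps and by one application of S -> S. The search is bounded by the step count, so it terminates on this grammar.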

safinaskar · 2021-07-11T19:11:51Z

I wrote the parsing library I talked about in #54 (comment): https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134217.html . It is based on Earley. I also recently published another parsing library, which is based on Earley, too: https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134205.html

ollef mentioned this issue May 21, 2021

Please add tags "test ambiguity" etc #53

Open
