Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes if grammar is infinitely ambiguous on given input #54

Open
safinaskar opened this issue May 19, 2021 · 11 comments
Open

Crashes if grammar is infinitely ambiguous on given input #54

safinaskar opened this issue May 19, 2021 · 11 comments

Comments

@safinaskar
Copy link

Consider this code:

{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE RecursiveDo #-}

import Text.Earley
import Control.Applicative

data L = A | B | C deriving (Eq, Show)

lang :: Grammar r (Prod r () L ())
lang = mdo {
  r1 <- rule $ (pure () <* token A) <|> r1;
  return r1;
}

main :: IO ()
main = do {
  putStrLn $ show $ fullParses (parser lang) [A];
}

This program crashes with stack overflow if I run it in ghci. And freezes if I compile and run it. I cannot even test whether list of parses is null, i. e. null $ fst $ fullParses (parser lang) [A] crashes in ghci.

I want some function, which can test whether given token list has multiple parses. I. e. I don't want list of parses or even count of parses. I simply want one of 3 results: 0 parses, 1 parse, more than 1 parse. And this function should terminate in case of infinitely ambiguous grammar.

I use Earley 0.13.0.1

@ollef
Copy link
Owner

ollef commented May 21, 2021

I think the reason for this is that when Earley builds its result lists, it will add later results to the start of the list.

Intuitively, we start with

r1_results = []

and then we parse token A, which sets

r1_results = [()]

Next, we process the other alternative, which yields

r1_results = r1_results ++ [()]

which is our final result. The stack overflow you see is from trying to force this value. I would expect the report function to work for this example because it doesn't force the result. (But it won't be useful to you since it doesn't say anything about ambiguity.)

We could write results to the end instead (but this has performance implications, discussed in #15), which would yield

r1_results = [()] ++ r1_results

This would work for your example since it's productive. But note that it would stop being productive again if you flipped the arguments to <|>.

So I can't see an easy fix that would always work. But if what I've written here is correct, your example would work if you flipped the arguments to <|>.

@safinaskar
Copy link
Author

But if what I've written here is correct, your example would work if you flipped the arguments to <|>

I just checked. No, null $ fst $ fullParses (parser lang) [A] still causes stack overflow in ghci

I would expect the report function to work for this example because it doesn't force the result

Yes, report (parser lang) [A] works. With both (pure () <* token A) <|> r1 and r1 <|> (pure () <* token A)

@ollef
Copy link
Owner

ollef commented May 21, 2021

Hmm, okay, it's been a while so I may be getting the details wrong.

@safinaskar
Copy link
Author

upTo 1 (generator lang [A]) gives stack overflow, too

@safinaskar
Copy link
Author

This motivated me to publish alternative library for checking ambiguity: https://hackage.haskell.org/package/check-cfg-ambiguity . My library does not freeze on this example. I hope you are not offended

@ollef
Copy link
Owner

ollef commented May 27, 2021

Cool! I wonder if you could implement the same on a grammar from this library. :)

@safinaskar
Copy link
Author

Cool! I wonder if you could implement the same on a grammar from this library. :)

What you mean?

@safinaskar
Copy link
Author

You mean implementing my algorithm on your Grammar type?

@ollef
Copy link
Owner

ollef commented May 27, 2021

Yes.

@safinaskar
Copy link
Author

I don't know. My type of grammar is very simple (Data.Map.Map n [[TerminalOrNonterminal t n]]). t and n are any kind of IDs for terminals and nonterminals (for example, strings with names). TerminalOrNonterminal is essentially Either. I will repeat explanation of my algorithm from haskell-cafe:

I generate all strings of symbols, which can be reached from start symbol by replacing nonterminals with their productions no more than "count" times. If I get duplicate, then the grammar is ambiguous. This strings are strings of any symbols, not necessary terminals. And "count" means count of replacements, i. e. count of productions applications.
This is simple brute force algorithm. It does not really checks grammar for ambiguity (this is impossible on Turing machine)

My algorithm is trivial, and you can easily get idea by looking at code of lowLevelTestAmbiguity: https://hackage.haskell.org/package/check-cfg-ambiguity-0.0.0.1/docs/src/CheckCFGAmbiguity.html#lowLevelTestAmbiguity .

My algorithm requires that nonterminals can be checked for equality. Because I generate all strings which I could get using count expansions, and then try to find duplicates. And strings contain both terminals and nonterminals.

Also, I am currently writing library, which gets grammar in some big bloated form, and then produces both Data.Map.Map n [[TerminalOrNonterminal t n]] and Earley's Grammar. Then this grammar checked for ambiguity using check-cfg-ambiguity and also used to actually parse something using Earley. Also I produce pretty-printer

@safinaskar
Copy link
Author

I wrote parsing library I talked about in #54 (comment) : https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134217.html , it is based on Earley. Also, I recently published another parsing library, which is based on Earley, too: https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134205.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants