Generate all members of a grammar? #20

copumpkin · 2016-02-22T00:31:06Z

Intuitively, it feels like it should be possible for your library to generate all strings (as a potentially infinite list) that a grammar would parse, as well as all parsed values that those strings would generate. Is that something you've considered or would consider adding?

sboosali · 2016-02-22T06:42:44Z

i have some code that "refines" a grammar, ie Prod, (or rather, see below)
into a "Maybe FiniteGrammar" then enumerates all strings in that finite
grammar.

getting the order right for infinite granmars might take some work, like
a breadth first search traversal, but doesnt seem too hard.
more importantly, terminals are currently stored as predicates (String
-> Bool), not Strings. thus, we cant know what stringa Terminal t can
parse.

(we could add a "OneTerminal :: String -> (String -> a) -> Prod a" case,
and change the symbol smart constructor to use that?)

(or, maybe some dsl for defining predicates that can be "show"n? this seems
hard. ive looked at Data.List.Split in the aplit package, but its core type
isnt showable, being a function).

anyway, thats why in my project i have a type (call it My.Prod) with a
simple
conversion function that takes it into an Earley.Prod (as well as the
FiniteGrammar mentioned). there are some tricks i have (like an Alt of
Terminals is "optimized" to a Terminal with the predicates or'd together).

unfortunately, but it (1) may lose some efficiency, where distinct calls
are made to a predicates and (2) of course, it does lose expressivity, as i
cant match against isUpper (without an explicit IsUpper case in My.Prod).

On Monday, February 22, 2016, Daniel Peebles notifications@github.com
wrote:

Intuitively, it feels like it should be possible for your library to
generate all strings (as a potentially infinite list) that a grammar would
parse, as well as all parsed values that those strings would generate. Is
that something you've considered or would consider adding?

—
Reply to this email directly or view it on GitHub
#20.

(this message was composed with dictation: charitably interpret typos)Sam
Boosalis

ollef · 2016-02-22T07:35:34Z

I've been thinking about doing something like this for testing, to basically have tests that first generate arbitrary grammars and then test that against random (or all) strings in the language. I used to do something like that in my old Grempa library.

If the token type is finite I think we should be able to do it without too many tricks. I'll look into it.

phadej · 2016-02-22T09:25:04Z

I guess token don't need to be finite (e.g. Enum token, Bounded token), better to take an explicit [token]

phadej · 2016-02-22T09:29:37Z

or optionally have Monad m => m token -> Grammar .... -> m [token] generator!

danr · 2017-01-31T12:15:10Z

Use case: test the grammar for ambiguities.

phadej · 2017-01-31T12:25:13Z

Generally whether CFG is ambigious is undecidable, but you indeed could QuickCheck it. Yet, Earley's power is that it can deal with ambiguous grammars.

ollef · 2017-01-31T12:50:44Z

I think we could also do an exhaustive test up to some fixed input length and get some more confidence than random tests would give us.

ollef · 2017-02-01T20:14:27Z

There's some code for this here: https://github.com/ollef/Earley/tree/Generator

Does anyone want to volunteer some testing and/or tests?

ollef · 2017-02-01T20:16:31Z

language romanNumeralsGrammar "IVX"
= [(0,""),(1,"I"),(5,"V"),(10,"X"),(20,"XX"),(11,"XI"),(15,"XV"),(6,"VI"),(9,"IX"),(4,"IV"),(2,"II"),(3,"III"),(19,"XIX"),(16,"XVI"),(14,"XIV"),(12,"XII"),(7,"VII"),(21,"XXI"),(25,"XXV"),(30,"XXX"),(31,"XXXI"),(35,"XXXV"),(8,"VIII"),(13,"XIII"),(17,"XVII"),(26,"XXVI"),(29,"XXIX"),(24,"XXIV"),(22,"XXII"),(18,"XVIII"),(36,"XXXVI"),(39,"XXXIX"),(34,"XXXIV"),(32,"XXXII"),(23,"XXIII"),(27,"XXVII"),(33,"XXXIII"),(28,"XXVIII"),(37,"XXXVII"),(38,"XXXVIII")]

Pretty fun stuff!

copumpkin · 2017-02-01T20:18:38Z

@ollef is the "IVX" above the set of terminals to explore the grammar over, due to the token -> Bool predicate thing mentioned earlier?

ollef · 2017-02-01T20:19:35Z

@copumpkin: Yeah, that's it!

copumpkin · 2017-02-01T20:20:49Z

Oh that's super cool, so it's actually finite in this case

ollef · 2017-02-01T20:28:17Z

It's finite unless you give it some more tokens to work with (apparently).

TODO:

Add tests and make sure this works. Maybe try getting some tests that use both the language generator and the parser going.
Bikeshed the name of the module (currently Text.Earley.Language), and the generation function (currently language).
Expose the internals of the language generator as an .Internal module.
Add info about this to the readme
Write changelog, bump version, upload to Hackage.

phadej · 2017-02-02T06:12:17Z

@ollef have you tried it with nasty grammars? left-recursive, right-recursive, ridiculously ambiguous?

sboosali · 2017-02-02T06:55:41Z

@phadej yeah, for termination? what i did before was just to first validate the grammar as finite (by checking recursively for "non-finite" cases like many), and then return a Maybe [t]. also, known-infinite (or unknown-to-be-finite) grammars could be represented by an opaque symbol (possible the (<?>) if present), if it's needed. like a list of "Either (Maybe e) t" or something.

…

On Wed, Feb 1, 2017 at 10:12 PM Oleg Grenrus ***@***.***> wrote: @ollef <https://github.com/ollef> have you tried it with nasty grammars? left-recursive, right-recursive, ridiculously ambiguous? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#20 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACNoMRQ5h1R-WPrX_lMO7M1Dg5OJFELMks5rYXPBgaJpZM4HfJGl> .

ollef · 2017-02-02T07:14:15Z

@phadej: I haven't done any principled testing, just tried a few examples. Those are good starting points, thanks! 👍

@sboosali: Hmm, I don't quite understand what you mean. Don't we want this to work even if a grammar generates an infinite language? What we're after, I think, is that the language generation is productive, such that you can take the number of results you want in infinite cases.

It'll likely have roughly the same restrictions as the parser by the way, so degenerate grammars like https://github.com/ollef/Earley/blob/master/examples/VeryAmbiguous.hs will loop because they produce circular results.

phadej · 2017-02-02T07:14:52Z

@sboosali for productivity. For finite languages you could use regex-applicative (and even regular languages aren't always finite).

sboosali · 2017-02-02T08:05:31Z

@olle (right, productivity makes sense. I was just thinking about finite subgrammars.) are the results "breath first" (or broadly, not too biased in some way)? otherwise, they might not be representative of the language. also, what use cases did you have in mind? maybe we could give a basic sanity check for ambiguity by running the parser on the "language"? or even check the stream for duplicates, depending on how its generated? (not sure about any of this, havent studied cfg's that much).

…

On Wed, Feb 1, 2017 at 11:14 PM Oleg Grenrus ***@***.***> wrote: @sboosali <https://github.com/sboosali> for productivity. For finite languages you could use regex-applicative (and even regular languages aren't always finite). — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#20 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACNoMUP5Xk11aJ1e8tEgbXU_W_f32jZKks5rYYJsgaJpZM4HfJGl> .

ollef · 2017-02-02T08:46:10Z

The results are roughly in the same order as from the parser, so ordered by input length, and within the same input length shuffled around a bit (governed by the internals of the implementation).

I'm sure people can find more uses, but here are some that I can think of:

Manual inspection.
Ambiguity checking, e.g. up to some input length (as mentioned above).
Test case generation. For example, if you have some code that uses parse results that code can be tested, exhaustively or randomly, against the elements of the language that the grammar generates.

ollef · 2017-02-08T06:43:12Z

This is included in the 0.12.0.0 release. Thanks for reminding me!

ollef added the enhancement label Mar 1, 2016

ollef closed this as completed Feb 8, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate all members of a grammar? #20

Generate all members of a grammar? #20

copumpkin commented Feb 22, 2016

sboosali commented Feb 22, 2016

ollef commented Feb 22, 2016

phadej commented Feb 22, 2016

phadej commented Feb 22, 2016

danr commented Jan 31, 2017

phadej commented Jan 31, 2017

ollef commented Jan 31, 2017

ollef commented Feb 1, 2017

ollef commented Feb 1, 2017

copumpkin commented Feb 1, 2017

ollef commented Feb 1, 2017

copumpkin commented Feb 1, 2017

ollef commented Feb 1, 2017 •

edited

Loading

phadej commented Feb 2, 2017

sboosali commented Feb 2, 2017 via email

ollef commented Feb 2, 2017

phadej commented Feb 2, 2017

sboosali commented Feb 2, 2017 via email

ollef commented Feb 2, 2017

ollef commented Feb 8, 2017

Generate all members of a grammar? #20

Generate all members of a grammar? #20

Comments

copumpkin commented Feb 22, 2016

sboosali commented Feb 22, 2016

ollef commented Feb 22, 2016

phadej commented Feb 22, 2016

phadej commented Feb 22, 2016

danr commented Jan 31, 2017

phadej commented Jan 31, 2017

ollef commented Jan 31, 2017

ollef commented Feb 1, 2017

ollef commented Feb 1, 2017

copumpkin commented Feb 1, 2017

ollef commented Feb 1, 2017

copumpkin commented Feb 1, 2017

ollef commented Feb 1, 2017 • edited Loading

phadej commented Feb 2, 2017

sboosali commented Feb 2, 2017 via email

ollef commented Feb 2, 2017

phadej commented Feb 2, 2017

sboosali commented Feb 2, 2017 via email

ollef commented Feb 2, 2017

ollef commented Feb 8, 2017

ollef commented Feb 1, 2017 •

edited

Loading