Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate all members of a grammar? #20

Closed
copumpkin opened this issue Feb 22, 2016 · 20 comments
Closed

Generate all members of a grammar? #20

copumpkin opened this issue Feb 22, 2016 · 20 comments

Comments

@copumpkin
Copy link

Intuitively, it feels like it should be possible for your library to generate all strings (as a potentially infinite list) that a grammar would parse, as well as all parsed values that those strings would generate. Is that something you've considered or would consider adding?

@sboosali
Copy link
Collaborator

i have some code that "refines" a grammar, ie Prod, (or rather, see below)
into a "Maybe FiniteGrammar" then enumerates all strings in that finite
grammar.

  1. getting the order right for infinite granmars might take some work, like
    a breadth first search traversal, but doesnt seem too hard.
  2. more importantly, terminals are currently stored as predicates (String
    -> Bool), not Strings. thus, we cant know what stringa Terminal t can
    parse.

(we could add a "OneTerminal :: String -> (String -> a) -> Prod a" case,
and change the symbol smart constructor to use that?)

(or, maybe some dsl for defining predicates that can be "show"n? this seems
hard. ive looked at Data.List.Split in the aplit package, but its core type
isnt showable, being a function).

anyway, thats why in my project i have a type (call it My.Prod) with a
simple
conversion function that takes it into an Earley.Prod (as well as the
FiniteGrammar mentioned). there are some tricks i have (like an Alt of
Terminals is "optimized" to a Terminal with the predicates or'd together).

unfortunately, but it (1) may lose some efficiency, where distinct calls
are made to a predicates and (2) of course, it does lose expressivity, as i
cant match against isUpper (without an explicit IsUpper case in My.Prod).

On Monday, February 22, 2016, Daniel Peebles notifications@github.com
wrote:

Intuitively, it feels like it should be possible for your library to
generate all strings (as a potentially infinite list) that a grammar would
parse, as well as all parsed values that those strings would generate. Is
that something you've considered or would consider adding?


Reply to this email directly or view it on GitHub
#20.

(this message was composed with dictation: charitably interpret typos)Sam
Boosalis

@ollef
Copy link
Owner

ollef commented Feb 22, 2016

I've been thinking about doing something like this for testing, to basically have tests that first generate arbitrary grammars and then test that against random (or all) strings in the language. I used to do something like that in my old Grempa library.

If the token type is finite I think we should be able to do it without too many tricks. I'll look into it.

@phadej
Copy link
Collaborator

phadej commented Feb 22, 2016

I guess token don't need to be finite (e.g. Enum token, Bounded token), better to take an explicit [token]

@phadej
Copy link
Collaborator

phadej commented Feb 22, 2016

or optionally have Monad m => m token -> Grammar .... -> m [token] generator!

@danr
Copy link

danr commented Jan 31, 2017

Use case: test the grammar for ambiguities.

@phadej
Copy link
Collaborator

phadej commented Jan 31, 2017

Generally whether CFG is ambigious is undecidable, but you indeed could QuickCheck it. Yet, Earley's power is that it can deal with ambiguous grammars.

@ollef
Copy link
Owner

ollef commented Jan 31, 2017

I think we could also do an exhaustive test up to some fixed input length and get some more confidence than random tests would give us.

@ollef
Copy link
Owner

ollef commented Feb 1, 2017

There's some code for this here: https://github.com/ollef/Earley/tree/Generator

Does anyone want to volunteer some testing and/or tests?

@ollef
Copy link
Owner

ollef commented Feb 1, 2017

language romanNumeralsGrammar "IVX"
= [(0,""),(1,"I"),(5,"V"),(10,"X"),(20,"XX"),(11,"XI"),(15,"XV"),(6,"VI"),(9,"IX"),(4,"IV"),(2,"II"),(3,"III"),(19,"XIX"),(16,"XVI"),(14,"XIV"),(12,"XII"),(7,"VII"),(21,"XXI"),(25,"XXV"),(30,"XXX"),(31,"XXXI"),(35,"XXXV"),(8,"VIII"),(13,"XIII"),(17,"XVII"),(26,"XXVI"),(29,"XXIX"),(24,"XXIV"),(22,"XXII"),(18,"XVIII"),(36,"XXXVI"),(39,"XXXIX"),(34,"XXXIV"),(32,"XXXII"),(23,"XXIII"),(27,"XXVII"),(33,"XXXIII"),(28,"XXVIII"),(37,"XXXVII"),(38,"XXXVIII")]

Pretty fun stuff!

@copumpkin
Copy link
Author

@ollef is the "IVX" above the set of terminals to explore the grammar over, due to the token -> Bool predicate thing mentioned earlier?

@ollef
Copy link
Owner

ollef commented Feb 1, 2017

@copumpkin: Yeah, that's it!

@copumpkin
Copy link
Author

Oh that's super cool, so it's actually finite in this case

@ollef
Copy link
Owner

ollef commented Feb 1, 2017

It's finite unless you give it some more tokens to work with (apparently).

TODO:

  • Add tests and make sure this works. Maybe try getting some tests that use both the language generator and the parser going.
  • Bikeshed the name of the module (currently Text.Earley.Language), and the generation function (currently language).
  • Expose the internals of the language generator as an .Internal module.
  • Add info about this to the readme
  • Write changelog, bump version, upload to Hackage.

@phadej
Copy link
Collaborator

phadej commented Feb 2, 2017

@ollef have you tried it with nasty grammars? left-recursive, right-recursive, ridiculously ambiguous?

@sboosali
Copy link
Collaborator

sboosali commented Feb 2, 2017 via email

@ollef
Copy link
Owner

ollef commented Feb 2, 2017

@phadej: I haven't done any principled testing, just tried a few examples. Those are good starting points, thanks! 👍

@sboosali: Hmm, I don't quite understand what you mean. Don't we want this to work even if a grammar generates an infinite language? What we're after, I think, is that the language generation is productive, such that you can take the number of results you want in infinite cases.

It'll likely have roughly the same restrictions as the parser by the way, so degenerate grammars like https://github.com/ollef/Earley/blob/master/examples/VeryAmbiguous.hs will loop because they produce circular results.

@phadej
Copy link
Collaborator

phadej commented Feb 2, 2017

@sboosali for productivity. For finite languages you could use regex-applicative (and even regular languages aren't always finite).

@sboosali
Copy link
Collaborator

sboosali commented Feb 2, 2017 via email

@ollef
Copy link
Owner

ollef commented Feb 2, 2017

The results are roughly in the same order as from the parser, so ordered by input length, and within the same input length shuffled around a bit (governed by the internals of the implementation).

I'm sure people can find more uses, but here are some that I can think of:

  • Manual inspection.
  • Ambiguity checking, e.g. up to some input length (as mentioned above).
  • Test case generation. For example, if you have some code that uses parse results that code can be tested, exhaustively or randomly, against the elements of the language that the grammar generates.

@ollef
Copy link
Owner

ollef commented Feb 8, 2017

This is included in the 0.12.0.0 release. Thanks for reminding me!

@ollef ollef closed this as completed Feb 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants