Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammar tests #2234

Closed
brson opened this issue Apr 17, 2012 · 26 comments
Closed

Grammar tests #2234

brson opened this issue Apr 17, 2012 · 26 comments
Labels
A-grammar Area: The grammar of Rust A-testsuite Area: The testsuite used to check the correctness of rustc P-medium Medium priority

Comments

@brson
Copy link
Contributor

brson commented Apr 17, 2012

It would be nice to have confidence about what sort of grammar Rust has. One possible way we could do that:

  • Extract EBNF from the manual
  • Use a script to convert it to whatever format antlr wants
  • Run the entire test suite through the both the rustc parser and antlr parser. Where one fails so should the other.
@nikomatsakis
Copy link
Contributor

How many negative parsing tests do we have? I'm guessing not many... but we
could make some.

@brson
Copy link
Contributor Author

brson commented Apr 18, 2012

I also would assume that coverage of failure cases in the parser is not great, so we could kill two birds with one stone.

@brson
Copy link
Contributor Author

brson commented Apr 18, 2012

Speaking of coverage, today I realized that #690 is unblocked. It might be possible for us to measure test coverage.

@graydon
Copy link
Contributor

graydon commented Apr 18, 2012

Antlr is a possibility but I think it is willing to handle a lot more grammar ambiguity, and it tends to blur lexing and parsing rules. I want us to remain in the classical regular-lexing + LL(1)-parsing space.

I picked the EBNF in the manual for compatibility with llnextgen, http://os.ghalkes.nl/LLnextgen/ ; I got part way into wiring up the rules for extracting and testing the grammar but didn't finish in time for 0.1, haven't come back to it yet.

Lexical rules I figured we could feed to quex http://quex.sourceforge.net/ but other possibilities exist. It just seems like the current leader in the space we're interested in.

@brson
Copy link
Contributor Author

brson commented Apr 19, 2012

We can use the fuzzer to find arbitrary numbers of random samples to feed to both parsers.

@nikomatsakis
Copy link
Contributor

good idea...

@graydon
Copy link
Contributor

graydon commented Apr 25, 2013

nominating for well-defined

@emberian
Copy link
Member

emberian commented Jul 7, 2013

Still relevant

@fhahn
Copy link
Contributor

fhahn commented Sep 15, 2013

I had a look at this issue and started working on a parser using Flex and LLnextgen (https://github.com/fhahn/rust-grammer). Right now, the parser supports only a tiny tiny bit of the Rust grammer, but I wanted to make sure my approach is valid, before continuing.

One main difference to the grammer specification in the documentation is that flex uses regular expressions for token definitions, not ebnf, so I started converting the ebnf from section 3 "Lexical structure" to regular expressions for flex.

@huonw
Copy link
Member

huonw commented Sep 15, 2013

Note that grammar in the manual is highly likely to be incorrect (which is presumably exactly what this issue is aiming to address).

@Kimundi
Copy link
Member

Kimundi commented Sep 15, 2013

Note that someone already completed an grammar months ago, it just never got used for anything yet. No idea where to find it though.

@emberian
Copy link
Member

https://github.com/jbclements/rust-antlr

To the best of his knowledge, it was correct at the time.

On Sun, Sep 15, 2013 at 9:20 AM, Marvin Löbel notifications@github.comwrote:

Note that someone already completed an grammar months ago, it just never
got used for anything yet. No idea where to find it though.


Reply to this email directly or view it on GitHubhttps://github.com//issues/2234#issuecomment-24471342
.

@fhahn
Copy link
Contributor

fhahn commented Sep 15, 2013

I found a extract_grammer.py script in src/etc, which extracts the grammer from rust.md, but the resulting grammer does not work with LLnextgen at the moment. There is no separate lexing step. My approach is to use flex for lexical analysis (for everything in Section 3 of the manual) and use LLnextgen for the parsing.

So far, I stumbled over one part of the grammer where I think the productions in the rust.md are not LL(1).
I think there is a FIRST/FOLLOW conflict in block_comment . block_comment_body in block_comment is can be epsilon and is followed by '' and an alternative in block_comment_body starts with '', so it isn't possible to decide which production to take with look ahead of 1

block_comment : "/*" block_comment_body * '*' + '/' ;
block_comment_body : non_star * | '*' + non_slash_or_star ;

But I think ignoring comments should be done in the lexer, so this wouldn't be a problem for the Rust grammer being LL(1).

@Kimundi
Copy link
Member

Kimundi commented Sep 15, 2013

@fhahn Again, the grammar fragments in the manual are useless, what should actually be done is

  • Take @jbclements existing grammar, which has already been proven correct and LL(2) at the time, and verify it against the source.
  • Optionally transform it in something simpler, like EBNF.
  • Update the manual with the correct grammar.

But thanks for showing interest for this work. :)

@fhahn
Copy link
Contributor

fhahn commented Sep 15, 2013

What's the preferred parser generator?
Should I use antlr and build on @jbclements project or LLnextgen (combined with flex)? LLnextgen would probably be easier to integrate into the main rust repository, because it does not rely on Java (and I'm more familiar with the flex + (yacc/LLnextgen) workflow)

@emberian
Copy link
Member

@fhahn use whatever you're comfortable with, is my suggestion.

@brson
Copy link
Contributor Author

brson commented Apr 3, 2014

I'm still very keen to make this happen for 1.0. There is existing infrastructure in the tree for testing the manual's grammer with llnextgen.

@jbclements
Copy link
Contributor

I agree, this would be very valuable. Especially when we get automated testing working.

@emberian
Copy link
Member

emberian commented Apr 4, 2014

I'm working on updating rust-antlr.

@bleibig
Copy link
Contributor

bleibig commented Apr 15, 2014

I've made some good progress on an LLnextgen-capable grammar at https://github.com/bleibig/rust-grammar. There's still a ways to go, but once it's done it shouldn't be hard to integrate the grammar into the manual and have the grammar tests work with that.

@brson
Copy link
Contributor Author

brson commented Apr 15, 2014

Nominating. I want to have confidence in our grammar.

@pnkfelix
Copy link
Member

leaving as P-high. We really would like to have a formal definition of our grammar and have it tested, but we do not think it should be a blocker for 1.0 at this time.

Cc'ing @cmr since he is working on grammar stuff.

@Arcnor
Copy link

Arcnor commented Jun 4, 2014

@cmr did you end up updating rust-antlr? I wanted to have some sort of IDE support for Rust, and having an existing ANTLR grammar will make that happen a lot faster.

@emberian
Copy link
Member

emberian commented Jun 4, 2014

@Arcnor actively working on it.

@steveklabnik
Copy link
Member

Was going to move to the RFC repo, but let's see how #21452 shakes out.

bors added a commit that referenced this issue Jan 24, 2015
This adds a new lexer/parser combo for the entire Rust language can be generated with with flex and bison, taken from my project at https://github.com/bleibig/rust-grammar. There is also a testing script that runs the generated parser with all *.rs files in the repository (except for tests in compile-fail or ones that marked as "ignore-test" or "ignore-lexer-test"). If you have flex and bison installed, you can run these tests using the new "check-grammar" make target.

This does not depend on or interact with the existing testing code in the grammar, which only provides and tests a lexer specification.

OS X users should take note that the version of bison that comes with the Xcode toolchain (2.3) is too old to work with this grammar, they need to download and install version 3.0 or later.

The parser builds up an S-expression-based AST, which can be displayed by giving the "-v" argument to parser-lalr (normally it only gives output on error). It is only a rough approximation of what is parsed and doesn't capture every detail and nuance of the program.

Hopefully this should be sufficient for issue #2234, or at least a good starting point.
@steveklabnik
Copy link
Member

#21452 added make check-grammar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-grammar Area: The grammar of Rust A-testsuite Area: The testsuite used to check the correctness of rustc P-medium Medium priority
Projects
None yet
Development

No branches or pull requests