Grammar tests #2234

brson · 2012-04-17T23:33:41Z

It would be nice to have confidence about what sort of grammar Rust has. One possible way we could do that:

Extract EBNF from the manual
Use a script to convert it to whatever format antlr wants
Run the entire test suite through both the rustc parser and the antlr parser. Where one fails, so should the other.
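
A minimal sketch of the "where one fails so should the other" check, in Python. The two parsers here are toy stand-ins (brace matchers), not rustc or an antlr-generated parser, and the names `find_disagreements`, `reference_parser` and `buggy_candidate_parser` are hypothetical; in practice each callable would wrap an invocation of the real tool and report whether it accepted the input.

```python
from typing import Callable, Iterable, List, Tuple

Parser = Callable[[str], bool]  # returns True if the source is accepted


def find_disagreements(
    samples: Iterable[Tuple[str, str]],
    reference: Parser,
    candidate: Parser,
) -> List[Tuple[str, bool, bool]]:
    """Return (name, reference_ok, candidate_ok) for every sample on which
    the two parsers disagree about acceptance."""
    mismatches = []
    for name, source in samples:
        ref_ok = reference(source)
        cand_ok = candidate(source)
        if ref_ok != cand_ok:
            mismatches.append((name, ref_ok, cand_ok))
    return mismatches


# Toy stand-in for the reference parser: accepts properly nested braces.
def reference_parser(source: str) -> bool:
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Toy stand-in for the grammar under test. Deliberately wrong: it only
# counts braces and ignores their order.
def buggy_candidate_parser(source: str) -> bool:
    return source.count("{") == source.count("}")


samples = [
    ("ok.rs", "fn main() { }"),
    ("unclosed.rs", "fn main() {"),
    ("reversed.rs", "} fn main() {"),
]
mismatches = find_disagreements(samples, reference_parser, buggy_candidate_parser)
print(mismatches)
```

Any sample listed in `mismatches` points at either a bug in one parser or an error in the grammar, which is the signal this issue is after.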

nikomatsakis · 2012-04-18T01:39:14Z

How many negative parsing tests do we have? I'm guessing not many... but we could make some.

brson · 2012-04-18T02:46:32Z

I also would assume that coverage of failure cases in the parser is not great, so we could kill two birds with one stone.

brson · 2012-04-18T02:47:48Z

Speaking of coverage, today I realized that #690 is unblocked. It might be possible for us to measure test coverage.

graydon · 2012-04-18T17:54:35Z

Antlr is a possibility but I think it is willing to handle a lot more grammar ambiguity, and it tends to blur lexing and parsing rules. I want us to remain in the classical regular-lexing + LL(1)-parsing space.

I picked the EBNF in the manual for compatibility with llnextgen, http://os.ghalkes.nl/LLnextgen/ ; I got part way into wiring up the rules for extracting and testing the grammar but didn't finish in time for 0.1, haven't come back to it yet.

Lexical rules I figured we could feed to quex http://quex.sourceforge.net/ but other possibilities exist. It just seems like the current leader in the space we're interested in.

brson · 2012-04-19T03:45:44Z

We can use the fuzzer to find arbitrary numbers of random samples to feed to both parsers.

nikomatsakis · 2012-04-19T03:55:20Z

good idea...

graydon · 2013-04-25T14:10:18Z

nominating for well-defined

emberian · 2013-07-07T10:29:53Z

Still relevant

fhahn · 2013-09-15T10:16:56Z

I had a look at this issue and started working on a parser using Flex and LLnextgen (https://github.com/fhahn/rust-grammer). Right now, the parser supports only a tiny, tiny bit of the Rust grammar, but I wanted to make sure my approach is valid before continuing.

One main difference from the grammar specification in the documentation is that flex uses regular expressions for token definitions, not EBNF, so I started converting the EBNF from section 3 "Lexical structure" to regular expressions for flex.
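
As an illustration of that kind of conversion, here is a rough sketch using Python's `re` module rather than flex. It hand-translates the `block_comment` rule quoted later in this thread into a single regular expression. The pattern and the helper name `strip_block_comments` are assumptions made for this example, not taken from the manual or from any of the linked projects, and the pattern does not attempt to handle nested comments.

```python
import re

# Hand translation of:
#   block_comment      : "/*" block_comment_body * '*' + '/' ;
#   block_comment_body : non_star * | '*' + non_slash_or_star ;
# The body becomes "a non-star character, or a run of stars followed by a
# character that is neither '/' nor '*'", repeated any number of times.
BLOCK_COMMENT = re.compile(r"/\*(?:[^*]|\*+[^/*])*\*+/")


def strip_block_comments(source: str) -> str:
    """Replace each (non-nested) block comment with a single space."""
    return BLOCK_COMMENT.sub(" ", source)


stripped = strip_block_comments("let x = 1; /* one ** two */ let y = 2; /**/")
print(stripped)
```

Because the regular expression engine resolves the choice between "more body" and "closing stars" by itself, a rule like this is straightforward at the lexer level even when the equivalent parser productions are awkward.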

huonw · 2013-09-15T11:16:12Z

Note that the grammar in the manual is highly likely to be incorrect (which is presumably exactly what this issue is aiming to address).

Kimundi · 2013-09-15T13:20:35Z

Note that someone already completed a grammar months ago; it just never got used for anything yet. No idea where to find it though.

emberian · 2013-09-15T15:35:02Z

https://github.com/jbclements/rust-antlr

To the best of his knowledge, it was correct at the time.


fhahn · 2013-09-15T16:06:13Z

I found an extract_grammer.py script in src/etc, which extracts the grammar from rust.md, but the resulting grammar does not work with LLnextgen at the moment. There is no separate lexing step. My approach is to use flex for lexical analysis (for everything in Section 3 of the manual) and use LLnextgen for the parsing.

So far, I stumbled over one part of the grammar where I think the productions in rust.md are not LL(1).
I think there is a FIRST/FOLLOW conflict in block_comment. block_comment_body in block_comment can be epsilon and is followed by '*', and an alternative in block_comment_body starts with '*', so it isn't possible to decide which production to take with a lookahead of 1.

block_comment : "/*" block_comment_body * '*' + '/' ;
block_comment_body : non_star * | '*' + non_slash_or_star ;

But I think ignoring comments should be done in the lexer, so this wouldn't be a problem for the Rust grammar being LL(1).
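
The conflict described above can be checked mechanically. The following Python sketch computes FIRST and FOLLOW sets for a hand-expanded BNF version of the two productions and reports every nullable nonterminal whose FIRST and FOLLOW sets overlap. The expansion is an assumption made for this example: the helper nonterminals `bodies`, `stars` and `more_stars` are invented to spell out the `*` and `+` repetitions, `non_star *` is simplified to a single `non_star` token, and the character classes are treated as distinct terminals.

```python
EPS = "ε"

# Keys are nonterminals; every other symbol is a terminal.
GRAMMAR = {
    "block_comment": [["/*", "bodies", "stars", "/"]],
    "bodies": [["body", "bodies"], []],            # block_comment_body *
    "body": [["non_star"], ["stars", "non_slash_or_star"]],
    "stars": [["*", "more_stars"]],                # '*' +
    "more_stars": [["*", "more_stars"], []],
}
START = "block_comment"


def first_of_sequence(symbols, first):
    """FIRST set of a symbol sequence; includes EPS if it can derive empty."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")  # end-of-input marker
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first()
follow = compute_follow(first)

# LL(1) requires FIRST(A) and FOLLOW(A) to be disjoint for nullable A.
conflicts = {}
for nt in GRAMMAR:
    if EPS in first[nt]:
        overlap = (first[nt] - {EPS}) & follow[nt]
        if overlap:
            conflicts[nt] = overlap
print(conflicts)
```

Under these assumptions the only conflict reported is on the repeated body, with `*` as the overlapping token: on seeing a star, a one-token lookahead cannot tell whether another body follows or the closing `'*' + '/'` has started.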

Kimundi · 2013-09-15T16:24:43Z

@fhahn Again, the grammar fragments in the manual are useless; what should actually be done is:

Take @jbclements's existing grammar, which has already been proven correct and LL(2) at the time, and verify it against the source.
Optionally transform it into something simpler, like EBNF.
Update the manual with the correct grammar.

But thanks for showing interest for this work. :)

fhahn · 2013-09-15T16:34:20Z

What's the preferred parser generator?
Should I use antlr and build on @jbclements project or LLnextgen (combined with flex)? LLnextgen would probably be easier to integrate into the main rust repository, because it does not rely on Java (and I'm more familiar with the flex + (yacc/LLnextgen) workflow)

emberian · 2013-09-15T16:36:55Z

@fhahn use whatever you're comfortable with, is my suggestion.

brson · 2014-04-03T23:38:12Z

I'm still very keen to make this happen for 1.0. There is existing infrastructure in the tree for testing the manual's grammar with llnextgen.

jbclements · 2014-04-04T01:23:56Z

I agree, this would be very valuable. Especially when we get automated testing working.

emberian · 2014-04-04T01:34:18Z

I'm working on updating rust-antlr.

bleibig · 2014-04-15T04:31:56Z

I've made some good progress on an LLnextgen-capable grammar at https://github.com/bleibig/rust-grammar. There's still a ways to go, but once it's done it shouldn't be hard to integrate the grammar into the manual and have the grammar tests work with that.

brson · 2014-04-15T19:22:23Z

Nominating. I want to have confidence in our grammar.

pnkfelix · 2014-04-17T21:10:19Z

leaving as P-high. We really would like to have a formal definition of our grammar and have it tested, but we do not think it should be a blocker for 1.0 at this time.

Cc'ing @cmr since he is working on grammar stuff.

Arcnor · 2014-06-04T12:08:57Z

@cmr did you end up updating rust-antlr? I wanted to have some sort of IDE support for Rust, and having an existing ANTLR grammar will make that happen a lot faster.

emberian · 2014-06-04T17:52:26Z

@Arcnor actively working on it.

steveklabnik · 2015-01-21T19:28:50Z

Was going to move to the RFC repo, but let's see how #21452 shakes out.

This adds a new lexer/parser combo for the entire Rust language that can be generated with flex and bison, taken from my project at https://github.com/bleibig/rust-grammar. There is also a testing script that runs the generated parser with all *.rs files in the repository (except for tests in compile-fail or ones that are marked as "ignore-test" or "ignore-lexer-test"). If you have flex and bison installed, you can run these tests using the new "check-grammar" make target. This does not depend on or interact with the existing testing code in the grammar, which only provides and tests a lexer specification. OS X users should take note that the version of bison that comes with the Xcode toolchain (2.3) is too old to work with this grammar; they need to download and install version 3.0 or later. The parser builds up an S-expression-based AST, which can be displayed by giving the "-v" argument to parser-lalr (normally it only gives output on error). It is only a rough approximation of what is parsed and doesn't capture every detail and nuance of the program. Hopefully this should be sufficient for issue #2234, or at least a good starting point.
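
The file selection described in that summary might look roughly like the Python sketch below. This is not the testing script from #21452: the function names, the directory layout in the demo, and the choice to look for the markers anywhere in the file text are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Marker strings named in the description above.
IGNORE_MARKERS = ("ignore-test", "ignore-lexer-test")


def should_parse(root: Path, path: Path) -> bool:
    """Decide whether a .rs file should be fed to the generated parser."""
    # Skip anything under a compile-fail directory (expected to be rejected).
    if "compile-fail" in path.relative_to(root).parts:
        return False
    text = path.read_text(encoding="utf-8", errors="replace")
    # Assumption: a marker anywhere in the file excludes it.
    return not any(marker in text for marker in IGNORE_MARKERS)


def collect_sources(root: Path):
    """All *.rs files under root that pass the filters, in sorted order."""
    return sorted(p for p in root.rglob("*.rs") if should_parse(root, p))


# Demo on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run-pass").mkdir()
    (root / "compile-fail").mkdir()
    (root / "run-pass" / "hello.rs").write_text("fn main() {}\n")
    (root / "run-pass" / "skipped.rs").write_text("// ignore-test\nfn main() {}\n")
    (root / "compile-fail" / "bad.rs").write_text("fn main( {}\n")
    selected = [p.relative_to(root).as_posix() for p in collect_sources(root)]

print(selected)
```

Each selected path would then be handed to the generated parser, with a non-zero exit status counted as a test failure.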

steveklabnik · 2015-04-28T20:51:42Z

#21452 added make check-grammar.

eholk mentioned this issue Jul 25, 2012

Generate gcov coverage data #690

Closed

brson mentioned this issue Sep 24, 2012

Miscellaneous Rust projects Mozilla-Student-Projects/Projects-Tracker#38

Closed

graydon mentioned this issue Apr 25, 2013

Number grammar #1589

Closed

pnkfelix mentioned this issue May 6, 2013

| with the pat macro fragment specifier #4581

Closed

brson added the I-nominated label Apr 15, 2014

pnkfelix removed the I-nominated label Apr 17, 2014

bleibig mentioned this issue Jan 21, 2015

Add a LALR grammar for Rust with testing support #21452

Merged

steveklabnik closed this as completed Apr 28, 2015

edunham mentioned this issue Sep 22, 2015

Automatically run grammar tests (grammar bot) #28592

Closed
