Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field access of block: discrepancy between parser-lalr and rustc #28379

Open
rprichard opened this Issue Sep 12, 2015 · 11 comments

Comments

Projects
None yet
6 participants
@rprichard
Copy link
Contributor

rprichard commented Sep 12, 2015

struct S { f: u32 }
fn main() {
    { S { f: 42 } }.f; // rustc accepts, parser-lalr rejects
    { 42 } as i32; // rustc and parser-lalr reject
    { 42 } == 42; // rustc and parser-lalr reject
}

The grammar in src/grammar/parser-lalr.y should probably be relaxed to accept the first example.

AFAIK, Rust could allow the other two cases (and others), because there are no statements or expressions that begin with as or ==.

@steveklabnik steveklabnik added the A-docs label Sep 28, 2015

@steveklabnik

This comment has been minimized.

Copy link
Member

steveklabnik commented Sep 30, 2015

/cc @rust-lang/lang , is this grammar correct or just incidental?

(tagging with lang until I know what they want, and I'll document that.)

@steveklabnik steveklabnik added A-lang and removed A-docs labels Sep 30, 2015

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Sep 30, 2015

Hmm, interesting question! I think the reference grammar is correct here, and rustc is not working properly. The rule is that "statement-like" expressions cannot take part in binary expressions; I think that should include the . operator. I think that the code incorrectly permits block expressions to be used with the "suffix" operators (., [], ()). But there are some weird edge cases too:

For example, this fails to compile:

fn foo(x: u32) { }
fn main() {
  let f = foo;
  { f } (22)
}

but the error it gives me is surprising:

<anon>:4:5: 4:6 error: mismatched types:
 expected `()`,
    found `fn(u32) {foo}`
(expected (),
    found fn item) [E0308]
<anon>:4   { f } (22);
             ^
<anon>:4:5: 4:6 help: see the detailed explanation for E0308
error: aborting due to previous error
playpen: application terminated with error code 101
@rprichard

This comment has been minimized.

Copy link
Contributor Author

rprichard commented Sep 30, 2015

If {1} is a statement-like expression, then it can appear in a binary expression. This is accepted by rustc and parser-lalr:

fn main() {
  1 == {1};
  1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression statement is a nonblock_expr non-terminal. A nonblock_expr can be a binary operation, and the LHS must also be a nonblock_expr, whereas the RHS can be an arbitrary expr. I can't think of a nicer way to describe the rule at the moment.

Both parsers reject this:

fn main() {
  {1} == 1;
}

I discovered an interesting case, though:

fn main() {
  let x = { 1 + {2} * {3} };
  println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

|   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   (ExprBinary
|   |   |   |   |   |   |   |   |   BiAdd
|   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   1
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   (ExprBinary
|   |   |   |   |   |   |   |   |   |   BiMul
|   |   |   |   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   |   |   2
|   |   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   |   |   3
|   |   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   )

If I remove the unnecessary block around the expression, rustc prints 7 as expected:

fn main() {
  let x = 1 + {2} * {3};  // removed the unnecessary block here
  println!("{}", x);
}
@rprichard

This comment has been minimized.

Copy link
Contributor Author

rprichard commented Sep 30, 2015

I split off the operator precedence issue to #28777.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Sep 30, 2015

Let me be more precise. It cannot START a binary operator, when in
statement position. There is a big difference between "{1} + 2" and "1 +
{2}", in other words.

On Wed, Sep 30, 2015 at 4:45 PM, Ryan Prichard notifications@github.com
wrote:

If {1} is a statement-like expression, then it can appear in a binary
expression. This is accepted by rustc and parser-lalr:

fn main() {
1 == {1};
1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression
statement is a nonblock_expr non-terminal. A nonblock_expr can be a
binary operation, and the LHS must also be a nonblock_expr, whereas the
RHS can be an arbitrary expr. I can't think of a nicer way to describe
the rule at the moment.

Both parsers reject this:

fn main() {
{1} == 1;
}

I discovered an interesting case, though:

fn main() {
let x = { 1 + {2} * {3} };
println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it
should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

| | | | | | | (ExprBlock
| | | | | | | | (ExprBinary
| | | | | | | | | BiAdd
| | | | | | | | | (ExprLit
| | | | | | | | | | (LitInteger
| | | | | | | | | | | 1
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | | (ExprBinary
| | | | | | | | | | BiMul
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 2
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 3
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | )
| | | | | | | )

If I remove the unnecessary block around the expression, rustc prints 7
as expected:

fn main() {
let x = 1 + {2} * {3}; // removed the unnecessary block here
println!("{}", x);
}


Reply to this email directly or view it on GitHub
#28379 (comment).

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Sep 30, 2015

In any case, this all feels like good motivation for me to try and get some
version of parser-lalr running in LALRPOP :)

On Wed, Sep 30, 2015 at 5:05 PM, Nicholas Matsakis nmatsakis@mozilla.com
wrote:

Let me be more precise. It cannot START a binary operator, when in
statement position. There is a big difference between "{1} + 2" and "1 +
{2}", in other words.

On Wed, Sep 30, 2015 at 4:45 PM, Ryan Prichard notifications@github.com
wrote:

If {1} is a statement-like expression, then it can appear in a binary
expression. This is accepted by rustc and parser-lalr:

fn main() {
1 == {1};
1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression
statement is a nonblock_expr non-terminal. A nonblock_expr can be a
binary operation, and the LHS must also be a nonblock_expr, whereas the
RHS can be an arbitrary expr. I can't think of a nicer way to describe
the rule at the moment.

Both parsers reject this:

fn main() {
{1} == 1;
}

I discovered an interesting case, though:

fn main() {
let x = { 1 + {2} * {3} };
println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it
should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

| | | | | | | (ExprBlock
| | | | | | | | (ExprBinary
| | | | | | | | | BiAdd
| | | | | | | | | (ExprLit
| | | | | | | | | | (LitInteger
| | | | | | | | | | | 1
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | | (ExprBinary
| | | | | | | | | | BiMul
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 2
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 3
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | )
| | | | | | | )

If I remove the unnecessary block around the expression, rustc prints 7
as expected:

fn main() {
let x = 1 + {2} * {3}; // removed the unnecessary block here
println!("{}", x);
}


Reply to this email directly or view it on GitHub
#28379 (comment).

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 1, 2015

Nominating for discussion. Related to #28785

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 1, 2015

I just realized why that example I gave of {f} (22) is giving me the type error. It's the rule which says that a brace-expression, when used as a statement, must have type unit. Precisely to prevent this sort of confusion!

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

nikomatsakis commented Oct 1, 2015

triage: P-medium

This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!

@rprichard

This comment has been minimized.

Copy link
Contributor Author

rprichard commented Nov 13, 2015

I discovered another quirk involving the LALR grammar -- it accepts this:

fn main() {
    struct S { f: i32 };
    let z = S { f: 0 };
    match 0 { _ => &z }.f   // <-- note the lack of semicolon
    match 0 { _ => &z }.f   // <-- note the lack of semicolon
    ()
}

A stmt can be a full_block_expr, which is a block_expr with optional named/integral field access operators suffixed (and with optional index or call operators suffixed to the field access operators). However, if there is a field access suffixed, then the full expression must have a semicolon to be a statement, but that restriction isn't implemented. The second match example is parsed as a function call, naturally.

rustc also accepts the code, but only if the semicolons are added after the two match lines. I think @nikomatsakis is saying that rustc should not accept them, due to the rule about blocks starting binops.

This quirk only applies to non-basic block expressions. parser-lalr does not parse the code if the lines are replaced with { &z }.f.

@rprichard

This comment has been minimized.

Copy link
Contributor Author

rprichard commented Nov 13, 2015

A stmt can be a full_block_expr, which is a block_expr with optional named/integral field access operators suffixed (and with optional index or call operators suffixed to the field access operators).

This statement is not completely true, which reveals another LALR bug. It should accept this:

fn main() {
    { ([0],) }.0[0];
    { ((|| {}),) }.0();
}

parser-lalr doesn't because full_block_expr only handles the index/call suffixes after a named field, not after a tuple index field.

Edit: I'm slightly off still. parser-lalr rejects this because it doesn't allow a field of a plain block. If I instead use match expressions, then parser-lalr accepts the code, but it parses it differently than rustc. i.e. It parses [0] and () statements instead of index and call expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.