Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Field access of block: discrepancy between parser-lalr and rustc #28379

Closed
rprichard opened this issue Sep 12, 2015 · 12 comments
Closed

Field access of block: discrepancy between parser-lalr and rustc #28379

rprichard opened this issue Sep 12, 2015 · 12 comments
Labels
A-grammar Area: The grammar of Rust C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@rprichard
Copy link
Contributor

struct S { f: u32 }
fn main() {
    { S { f: 42 } }.f; // rustc accepts, parser-lalr rejects
    { 42 } as i32; // rustc and parser-lalr reject
    { 42 } == 42; // rustc and parser-lalr reject
}

The grammar in src/grammar/parser-lalr.y should probably be relaxed to accept the first example.

AFAIK, Rust could allow the other two cases (and others), because there are no statements or expressions that begin with as or ==.

@steveklabnik
Copy link
Member

/cc @rust-lang/lang , is this grammar correct or just incidental?

(tagging with lang until I know what they want, and I'll document that.)

@nikomatsakis
Copy link
Contributor

Hmm, interesting question! I think the reference grammar is correct here, and rustc is not working properly. The rule is that "statement-like" expressions cannot take part in binary expressions; I think that should include the . operator. I think that the code incorrectly permits block expressions to be used with the "suffix" operators (., [], ()). But there are some weird edge cases too:

For example, this fails to compile:

fn foo(x: u32) { }
fn main() {
  let f = foo;
  { f } (22)
}

but the error it gives me is surprising:

<anon>:4:5: 4:6 error: mismatched types:
 expected `()`,
    found `fn(u32) {foo}`
(expected (),
    found fn item) [E0308]
<anon>:4   { f } (22);
             ^
<anon>:4:5: 4:6 help: see the detailed explanation for E0308
error: aborting due to previous error
playpen: application terminated with error code 101

@rprichard
Copy link
Contributor Author

If {1} is a statement-like expression, then it can appear in a binary expression. This is accepted by rustc and parser-lalr:

fn main() {
  1 == {1};
  1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression statement is a nonblock_expr non-terminal. A nonblock_expr can be a binary operation, and the LHS must also be a nonblock_expr, whereas the RHS can be an arbitrary expr. I can't think of a nicer way to describe the rule at the moment.

Both parsers reject this:

fn main() {
  {1} == 1;
}

I discovered an interesting case, though:

fn main() {
  let x = { 1 + {2} * {3} };
  println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

|   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   (ExprBinary
|   |   |   |   |   |   |   |   |   BiAdd
|   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   1
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   (ExprBinary
|   |   |   |   |   |   |   |   |   |   BiMul
|   |   |   |   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   |   |   2
|   |   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   (ExprBlock
|   |   |   |   |   |   |   |   |   |   |   (ExprLit
|   |   |   |   |   |   |   |   |   |   |   |   (LitInteger
|   |   |   |   |   |   |   |   |   |   |   |   |   3
|   |   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   |   )
|   |   |   |   |   |   |   )

If I remove the unnecessary block around the expression, rustc prints 7 as expected:

fn main() {
  let x = 1 + {2} * {3};  // removed the unnecessary block here
  println!("{}", x);
}

@rprichard
Copy link
Contributor Author

I split off the operator precedence issue to #28777.

@nikomatsakis
Copy link
Contributor

Let me be more precise. It cannot START a binary operator, when in
statement position. There is a big difference between "{1} + 2" and "1 +
{2}", in other words.

On Wed, Sep 30, 2015 at 4:45 PM, Ryan Prichard notifications@github.com
wrote:

If {1} is a statement-like expression, then it can appear in a binary
expression. This is accepted by rustc and parser-lalr:

fn main() {
1 == {1};
1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression
statement is a nonblock_expr non-terminal. A nonblock_expr can be a
binary operation, and the LHS must also be a nonblock_expr, whereas the
RHS can be an arbitrary expr. I can't think of a nicer way to describe
the rule at the moment.

Both parsers reject this:

fn main() {
{1} == 1;
}

I discovered an interesting case, though:

fn main() {
let x = { 1 + {2} * {3} };
println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it
should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

| | | | | | | (ExprBlock
| | | | | | | | (ExprBinary
| | | | | | | | | BiAdd
| | | | | | | | | (ExprLit
| | | | | | | | | | (LitInteger
| | | | | | | | | | | 1
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | | (ExprBinary
| | | | | | | | | | BiMul
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 2
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 3
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | )
| | | | | | | )

If I remove the unnecessary block around the expression, rustc prints 7
as expected:

fn main() {
let x = 1 + {2} * {3}; // removed the unnecessary block here
println!("{}", x);
}


Reply to this email directly or view it on GitHub
#28379 (comment).

@nikomatsakis
Copy link
Contributor

In any case, this all feels like good motivation for me to try and get some
version of parser-lalr running in LALRPOP :)

On Wed, Sep 30, 2015 at 5:05 PM, Nicholas Matsakis nmatsakis@mozilla.com
wrote:

Let me be more precise. It cannot START a binary operator, when in
statement position. There is a big difference between "{1} + 2" and "1 +
{2}", in other words.

On Wed, Sep 30, 2015 at 4:45 PM, Ryan Prichard notifications@github.com
wrote:

If {1} is a statement-like expression, then it can appear in a binary
expression. This is accepted by rustc and parser-lalr:

fn main() {
1 == {1};
1 == ({1} + {1});
}

As the LALR grammar describes it, a semicolon-terminated expression
statement is a nonblock_expr non-terminal. A nonblock_expr can be a
binary operation, and the LHS must also be a nonblock_expr, whereas the
RHS can be an arbitrary expr. I can't think of a nicer way to describe
the rule at the moment.

Both parsers reject this:

fn main() {
{1} == 1;
}

I discovered an interesting case, though:

fn main() {
let x = { 1 + {2} * {3} };
println!("{}", x);
}

This code parses and compiles, but the two parsers disagree on what it
should output.

rustc says 9, indicating a parse tree of (1 + 2) * 3.
parser-lalr -v instead parses it as expected, 1 + (2 * 3):

| | | | | | | (ExprBlock
| | | | | | | | (ExprBinary
| | | | | | | | | BiAdd
| | | | | | | | | (ExprLit
| | | | | | | | | | (LitInteger
| | | | | | | | | | | 1
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | | (ExprBinary
| | | | | | | | | | BiMul
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 2
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | | (ExprBlock
| | | | | | | | | | | (ExprLit
| | | | | | | | | | | | (LitInteger
| | | | | | | | | | | | | 3
| | | | | | | | | | | | )
| | | | | | | | | | | )
| | | | | | | | | | )
| | | | | | | | | )
| | | | | | | | )
| | | | | | | )

If I remove the unnecessary block around the expression, rustc prints 7
as expected:

fn main() {
let x = 1 + {2} * {3}; // removed the unnecessary block here
println!("{}", x);
}


Reply to this email directly or view it on GitHub
#28379 (comment).

@nikomatsakis
Copy link
Contributor

Nominating for discussion. Related to #28785

@nikomatsakis
Copy link
Contributor

I just realized why that example I gave of {f} (22) is giving me the type error. It's the rule which says that a brace-expression, when used as a statement, must have type unit. Precisely to prevent this sort of confusion!

@nikomatsakis
Copy link
Contributor

triage: P-medium

This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!

@rprichard
Copy link
Contributor Author

I discovered another quirk involving the LALR grammar -- it accepts this:

fn main() {
    struct S { f: i32 };
    let z = S { f: 0 };
    match 0 { _ => &z }.f   // <-- note the lack of semicolon
    match 0 { _ => &z }.f   // <-- note the lack of semicolon
    ()
}

A stmt can be a full_block_expr, which is a block_expr with optional named/integral field access operators suffixed (and with optional index or call operators suffixed to the field access operators). However, if there is a field access suffixed, then the full expression must have a semicolon to be a statement, but that restriction isn't implemented. The second match example is parsed as a function call, naturally.

rustc also accepts the code, but only if the semicolons are added after the two match lines. I think @nikomatsakis is saying that rustc should not accept them, due to the rule about blocks starting binops.

This quirk only applies to non-basic block expressions. parser-lalr does not parse the code if the lines are replaced with { &z }.f.

@rprichard
Copy link
Contributor Author

A stmt can be a full_block_expr, which is a block_expr with optional named/integral field access operators suffixed (and with optional index or call operators suffixed to the field access operators).

This statement is not completely true, which reveals another LALR bug. It should accept this:

fn main() {
    { ([0],) }.0[0];
    { ((|| {}),) }.0();
}

parser-lalr doesn't because full_block_expr only handles the index/call suffixes after a named field, not after a tuple index field.

Edit: I'm slightly off still. parser-lalr rejects this because it doesn't allow a field of a plain block. If I instead use match expressions, then parser-lalr accepts the code, but it parses it differently than rustc. i.e. It parses [0] and () statements instead of index and call expressions.

@brson brson added A-grammar Area: The grammar of Rust I-wrong I-needs-decision Issue: In need of a decision. labels Aug 4, 2016
@steveklabnik steveklabnik added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed A-lang labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added C-bug Category: This is a bug. and removed I-wrong labels Jul 24, 2017
@Enselic
Copy link
Member

Enselic commented Sep 24, 2023

Triage: src/grammar/parser-lalr.y was removed in #64896 so let's close this issue as obsolete.

@Enselic Enselic closed this as not planned Won't fix, can't repro, duplicate, stale Sep 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-grammar Area: The grammar of Rust C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants