Range expressions: discrepancies between rustc and parser-lalr #28785

rprichard · 2015-10-01T05:53:05Z

I'd guess that the last test case here is a rustc bug, but I don't know about the others.

fn match_rangefull() {
    match .. { _ => () } // rustc compiles it, parser-lalr rejects
}
fn field_of_range() {
    123.. .start; // rustc rejects, parser-lalr accepts
}
fn eq_rangefull() {
    .. == ..; // rustc rejects, parser-lalr accepts
}
fn return_rangeto() -> std::ops::RangeTo<i32> {
    return ..5; // rustc compiles it, parser-lalr parses it as (return)..5
}
fn assign_range() {
    let x;
    x = 4..5; // rustc compiles it, parser-lalr parses it as (x = 4)..5
}
fn nonblock_expr_range() {
    // rustc compiles this, and I don't think it should.  I think parser-lalr
    // gets it right.
    //
    //    rustc parses it like:       let _one = (1..).start;
    //    parser-lalr parses it like: let _one = {1; ..}.start;
    //
    let _one = { {1} .. }.start;
}
fn main() {}

Also see #25119.

The text was updated successfully, but these errors were encountered:

rprichard · 2015-10-01T06:15:39Z

Looking through #20811, I see two more cases where rustc and parser-lalr disagree:

fn rangefrom_compare() {
    1.. == (1..); // rustc syntax error, parser-lalr parses as (1..) == (1..)
}
fn compare_rangeto() {
    (..1) == ..1; // rustc syntax error, parser-lalr parses as (..1) == (..1)
}

nikomatsakis · 2015-10-01T21:38:09Z

triage: P-medium

This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!

dgrunwald · 2015-11-04T14:10:01Z

Can you explain why you think that { {1} .. }.start should parse as {1; ..}.start? I'm not familiar with the details of Rust's semicolon insertion.

I would expect {} within an expression to work pretty much wherever () does (except in if conditions / match expressions or similar, where {} starts the control flow block), and ((1)..).start is definitely valid syntax.

On all other examples, I'd say the rustc behavior is correct. The decision in #21374 was: .. is non-associative has the same precedence as the assignment operator. The unary forms of .. are only valid at the same precedence level as the binary form.

Comparing ranges needs parentheses: the attempt to give .. a higher precedence than the comparison operators in #20811 was not adopted because it caused new ambiguities between unary and binary &&, e.g. in r == 1.. && condition.

Admittedly, having .. at the same precedence level as assignment is somewhat weird, as the operators on that level have different associativity (assignment is right-associative; .. is non-associative). It would be better to put .. on its own level directly above the assignment operator and placement-new-arrow.

rprichard · 2015-11-05T03:59:08Z

Regarding { {1} .. }.start, my reasoning was based on the idea that a "statement-like" expression in statement position cannot start a binary expression. e.g. let x = { {} - 1 }; parses and type-checks because it's the same thing as:

let x = {
  {}
  -1
}

(Rust ignores newlines outside of string-literals, AFAIK.)

let x = { () - 1 }; doesn't type-check.

@nikomatsakis made some comments to this effect here:

Field access of block: discrepancy between parser-lalr and rustc #28379 (comment)
Field access of block: discrepancy between parser-lalr and rustc #28379 (comment)

Of course, 1.. is a unary expression, not a binary expression.

FWIW, I noticed that the current nightly Rust compiler is now parsing { {1} .. }.start differently -- the .. is parsed as a nullary operator (i.e. as-if there were a semicolon after the {1}).

dgrunwald · 2015-11-05T16:21:11Z

That makes sense; I didn't realize we had that rule about {} not starting binary expressions.

It also fits what I wrote in #25119: .. is only accepted as expression in locations where a..b would also be accepted.
{ {1} a..b } parses as if there's a semicolon after the {1}, so the same should apply to { { 1 } .. }.

In a formal grammar, the restrictions around .. are easily expressed by using a production for every precedence level:

asgn_expr
: range_expr
| range_expr "=" asgn_expr
;

range_expr
: logic_or_expr
| logic_or_expr ".." logic_or_expr
| logic_or_expr ".."
| ".." logic_or_expr
| ".."

logic_or_expr
: logic_and_expr
| logic_or_expr "||" logic_and_expr
;

Is there any way to do this with the %precedence mechanism? A production for each precedence level would bloat the grammar quite a bit, given that they get multiplied with the restrictions (normal, nonblock, nonparen, nostruct).

rprichard · 2015-11-05T17:58:40Z

I think we can reduce the problem to:

expr
: DOTDOT
| expr DOT ID
| ID
{}
;

We need to reject DOTDOT DOT ID. My full test case is here. Menhir and yacc both think my precedence declarations are useless. I suspect the precedence mechanism doesn't do what we want.

(Aside: I'm finding menhir --interpret --interpret-show-cst useful for playing with grammars. You can type DOTDOT DOT ID and see what CST is parsed, or whether it's rejected.)

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

(Also: I get the impression that the nonparen restriction class might be going away. It's needed only for the placement new box (xxx) yyy syntax, which has changed to box in xxx { yyy }.)

dgrunwald · 2015-11-08T13:36:46Z

Actually, I'm starting to think that the rustc behavior might not be ideal: it interacts badly with lambda syntax.

Rust currently has 7 non-associative operators: .. and the 6 comparison operators. But within lambdas, these behave differently!

Rustc currently rejects |a, b| a == b == c as a syntax error (cannot chain comparison operators).
But it parses |a, b| a .. b .. c successfully, as (|a, b| (a .. b)) .. c.

Lambdas have the same "precedence inversion" problem that I tried to avoid for (unary) ..: they can appear within operators with high precedence, but contain an arbitrary expression with lower precedence.

For example, a * || b + c is a syntactically valid expression, parsing as a * (|| (b + c)). A programmer who knows about the precedence of * and + but not about the effect of lambdas would probably expect (a * (|| b)) + c instead.

This gets especially problematic when the programmer assumes the lambda body is an expression, but the expression actually ends earlier. Is |a, b| a .. b .. c the only case of a valid expression that "breaks out" of the lambda?

One solution would be to parse .. as parser-larl does, and then later reject syntax trees where .. appears as operand of another operator with higher precedence. (similar to how chained comparisons are being rejected)

The other option would be to disallow lambdas from appearing as left-hand-operand of a binary operator (just like we already disallow blocks in such a position).

dgrunwald · 2015-11-08T13:46:17Z

|a, b| a .. b .. c isn't the only case, I found another one. The following is a valid rust program, but probably shouldn't be:

fn main() {
    || println!("Hello World") as ()()
}

I think we should disallow lambdas from appearing as left-hand-operand of a binary operator or as operand of a suffix expression.

nikomatsakis · 2015-11-09T19:26:45Z

On Thu, Nov 05, 2015 at 09:59:14AM -0800, Ryan Prichard wrote:

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

Just wanted to point out that LALRPOP supports this sort of
parameterization (and generates Rust). At some point I want to go
ahead and port one of these LALR Rust grammars to LALRPOP.

brson · 2016-08-04T16:25:41Z

@nikomatsakis can mentor. Needs investigation into the correct grammar.

Mark-Simulacrum · 2017-05-10T19:35:30Z

@nikomatsakis Could you either provide some instructions here, or otherwise unmark as E-mentor?

steveklabnik · 2018-10-31T16:19:45Z

Triage; is parser-lalr still a thing? is this issue relevant?

also, removing e-mentor

dgrunwald · 2021-12-30T14:22:52Z

Apparently || println!("Hello World") as ()() was fixed at some point by the introduction of a "casts cannot be followed by a function call" error, but the issues with the interaction of .. and lambdas still exist.

dtolnay · 2021-12-30T20:29:17Z

Yep: Parse & reject postfix operators after casts #68985

huonw added A-grammar Area: The grammar of Rust I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Oct 1, 2015

nikomatsakis mentioned this issue Oct 1, 2015

Field access of block: discrepancy between parser-lalr and rustc #28379

Closed

rust-highfive added P-medium Medium priority and removed I-nominated labels Oct 1, 2015

brson added the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Aug 4, 2016

brson added the I-wrong label Aug 4, 2016

nikomatsakis added I-needs-decision Issue: In need of a decision. and removed I-wrong labels Aug 4, 2016

brson added the I-wrong label Aug 4, 2016

dgrunwald mentioned this issue Oct 21, 2016

Tracking issue for stmt_expr_attributes: Add attributes to expressions, etc. #15701

Open

Mark-Simulacrum added C-bug Category: This is a bug. and removed I-wrong labels Jul 24, 2017

steveklabnik removed the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Oct 31, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range expressions: discrepancies between rustc and parser-lalr #28785

Range expressions: discrepancies between rustc and parser-lalr #28785

rprichard commented Oct 1, 2015

rprichard commented Oct 1, 2015

nikomatsakis commented Oct 1, 2015

dgrunwald commented Nov 4, 2015

rprichard commented Nov 5, 2015

dgrunwald commented Nov 5, 2015

rprichard commented Nov 5, 2015

dgrunwald commented Nov 8, 2015

dgrunwald commented Nov 8, 2015

nikomatsakis commented Nov 9, 2015

brson commented Aug 4, 2016

Mark-Simulacrum commented May 10, 2017

steveklabnik commented Oct 31, 2018

dgrunwald commented Dec 30, 2021

dtolnay commented Dec 30, 2021

Range expressions: discrepancies between rustc and parser-lalr #28785

Range expressions: discrepancies between rustc and parser-lalr #28785

Comments

rprichard commented Oct 1, 2015

rprichard commented Oct 1, 2015

nikomatsakis commented Oct 1, 2015

dgrunwald commented Nov 4, 2015

rprichard commented Nov 5, 2015

dgrunwald commented Nov 5, 2015

rprichard commented Nov 5, 2015

dgrunwald commented Nov 8, 2015

dgrunwald commented Nov 8, 2015

nikomatsakis commented Nov 9, 2015

brson commented Aug 4, 2016

Mark-Simulacrum commented May 10, 2017

steveklabnik commented Oct 31, 2018

dgrunwald commented Dec 30, 2021

dtolnay commented Dec 30, 2021