Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range expressions: discrepancies between rustc and parser-lalr #28785

Open
rprichard opened this issue Oct 1, 2015 · 14 comments
Open

Range expressions: discrepancies between rustc and parser-lalr #28785

rprichard opened this issue Oct 1, 2015 · 14 comments
Labels
A-grammar Area: The grammar of Rust C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@rprichard
Copy link
Contributor

I'd guess that the last test case here is a rustc bug, but I don't know about the others.

fn match_rangefull() {
    match .. { _ => () } // rustc compiles it, parser-lalr rejects
}
fn field_of_range() {
    123.. .start; // rustc rejects, parser-lalr accepts
}
fn eq_rangefull() {
    .. == ..; // rustc rejects, parser-lalr accepts
}
fn return_rangeto() -> std::ops::RangeTo<i32> {
    return ..5; // rustc compiles it, parser-lalr parses it as (return)..5
}
fn assign_range() {
    let x;
    x = 4..5; // rustc compiles it, parser-lalr parses it as (x = 4)..5
}
fn nonblock_expr_range() {
    // rustc compiles this, and I don't think it should.  I think parser-lalr
    // gets it right.
    //
    //    rustc parses it like:       let _one = (1..).start;
    //    parser-lalr parses it like: let _one = {1; ..}.start;
    //
    let _one = { {1} .. }.start;
}
fn main() {}

Also see #25119.

@rprichard
Copy link
Contributor Author

Looking through #20811, I see two more cases where rustc and parser-lalr disagree:

fn rangefrom_compare() {
    1.. == (1..); // rustc syntax error, parser-lalr parses as (1..) == (1..)
}
fn compare_rangeto() {
    (..1) == ..1; // rustc syntax error, parser-lalr parses as (..1) == (..1)
}

@huonw huonw added A-grammar Area: The grammar of Rust I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Oct 1, 2015
@nikomatsakis
Copy link
Contributor

triage: P-medium

This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated labels Oct 1, 2015
@dgrunwald
Copy link
Contributor

Can you explain why you think that { {1} .. }.start should parse as {1; ..}.start? I'm not familiar with the details of Rust's semicolon insertion.

I would expect {} within an expression to work pretty much wherever () does (except in if conditions / match expressions or similar, where {} starts the control flow block), and ((1)..).start is definitely valid syntax.

On all other examples, I'd say the rustc behavior is correct. The decision in #21374 was: .. is non-associative has the same precedence as the assignment operator. The unary forms of .. are only valid at the same precedence level as the binary form.

Comparing ranges needs parentheses: the attempt to give .. a higher precedence than the comparison operators in #20811 was not adopted because it caused new ambiguities between unary and binary &&, e.g. in r == 1.. && condition.

Admittedly, having .. at the same precedence level as assignment is somewhat weird, as the operators on that level have different associativity (assignment is right-associative; .. is non-associative). It would be better to put .. on its own level directly above the assignment operator and placement-new-arrow.

@rprichard
Copy link
Contributor Author

Regarding { {1} .. }.start, my reasoning was based on the idea that a "statement-like" expression in statement position cannot start a binary expression. e.g. let x = { {} - 1 }; parses and type-checks because it's the same thing as:

let x = {
  {}
  -1
}

(Rust ignores newlines outside of string-literals, AFAIK.)

let x = { () - 1 }; doesn't type-check.

@nikomatsakis made some comments to this effect here:

Of course, 1.. is a unary expression, not a binary expression.

FWIW, I noticed that the current nightly Rust compiler is now parsing { {1} .. }.start differently -- the .. is parsed as a nullary operator (i.e. as-if there were a semicolon after the {1}).

@dgrunwald
Copy link
Contributor

That makes sense; I didn't realize we had that rule about {} not starting binary expressions.

It also fits what I wrote in #25119: .. is only accepted as expression in locations where a..b would also be accepted.
{ {1} a..b } parses as if there's a semicolon after the {1}, so the same should apply to { { 1 } .. }.

In a formal grammar, the restrictions around .. are easily expressed by using a production for every precedence level:

asgn_expr
: range_expr
| range_expr "=" asgn_expr
;

range_expr
: logic_or_expr
| logic_or_expr ".." logic_or_expr
| logic_or_expr ".."
| ".." logic_or_expr
| ".."

logic_or_expr
: logic_and_expr
| logic_or_expr "||" logic_and_expr
;

Is there any way to do this with the %precedence mechanism? A production for each precedence level would bloat the grammar quite a bit, given that they get multiplied with the restrictions (normal, nonblock, nonparen, nostruct).

@rprichard
Copy link
Contributor Author

I think we can reduce the problem to:

expr
: DOTDOT
| expr DOT ID
| ID
{}
;

We need to reject DOTDOT DOT ID. My full test case is here. Menhir and yacc both think my precedence declarations are useless. I suspect the precedence mechanism doesn't do what we want.

(Aside: I'm finding menhir --interpret --interpret-show-cst useful for playing with grammars. You can type DOTDOT DOT ID and see what CST is parsed, or whether it's rejected.)

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

(Also: I get the impression that the nonparen restriction class might be going away. It's needed only for the placement new box (xxx) yyy syntax, which has changed to box in xxx { yyy }.)

@dgrunwald
Copy link
Contributor

Actually, I'm starting to think that the rustc behavior might not be ideal: it interacts badly with lambda syntax.

Rust currently has 7 non-associative operators: .. and the 6 comparison operators. But within lambdas, these behave differently!

Rustc currently rejects |a, b| a == b == c as a syntax error (cannot chain comparison operators).
But it parses |a, b| a .. b .. c successfully, as (|a, b| (a .. b)) .. c.

Lambdas have the same "precedence inversion" problem that I tried to avoid for (unary) ..: they can appear within operators with high precedence, but contain an arbitrary expression with lower precedence.

For example, a * || b + c is a syntactically valid expression, parsing as a * (|| (b + c)). A programmer who knows about the precedence of * and + but not about the effect of lambdas would probably expect (a * (|| b)) + c instead.

This gets especially problematic when the programmer assumes the lambda body is an expression, but the expression actually ends earlier. Is |a, b| a .. b .. c the only case of a valid expression that "breaks out" of the lambda?

One solution would be to parse .. as parser-larl does, and then later reject syntax trees where .. appears as operand of another operator with higher precedence. (similar to how chained comparisons are being rejected)

The other option would be to disallow lambdas from appearing as left-hand-operand of a binary operator (just like we already disallow blocks in such a position).

@dgrunwald
Copy link
Contributor

|a, b| a .. b .. c isn't the only case, I found another one. The following is a valid rust program, but probably shouldn't be:

fn main() {
    || println!("Hello World") as ()()
}

I think we should disallow lambdas from appearing as left-hand-operand of a binary operator or as operand of a suffix expression.

@nikomatsakis
Copy link
Contributor

On Thu, Nov 05, 2015 at 09:59:14AM -0800, Ryan Prichard wrote:

Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single range_expr rule that is parameterized over each of the restrictions.

Just wanted to point out that LALRPOP supports this sort of
parameterization (and generates Rust). At some point I want to go
ahead and port one of these LALR Rust grammars to LALRPOP.

@brson brson added the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Aug 4, 2016
@brson
Copy link
Contributor

brson commented Aug 4, 2016

@nikomatsakis can mentor. Needs investigation into the correct grammar.

@Mark-Simulacrum
Copy link
Member

@nikomatsakis Could you either provide some instructions here, or otherwise unmark as E-mentor?

@Mark-Simulacrum Mark-Simulacrum added C-bug Category: This is a bug. and removed I-wrong labels Jul 24, 2017
@steveklabnik steveklabnik removed the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Oct 31, 2018
@steveklabnik
Copy link
Member

Triage; is parser-lalr still a thing? is this issue relevant?

also, removing e-mentor

@dgrunwald
Copy link
Contributor

Apparently || println!("Hello World") as ()() was fixed at some point by the introduction of a "casts cannot be followed by a function call" error, but the issues with the interaction of .. and lambdas still exist.

@dtolnay
Copy link
Member

dtolnay commented Dec 30, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-grammar Area: The grammar of Rust C-bug Category: This is a bug. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants