Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

range notation (a..b) has weird operator precedence #20811

Closed
dgrunwald opened this issue Jan 9, 2015 · 18 comments · Fixed by #21374
Closed

range notation (a..b) has weird operator precedence #20811

dgrunwald opened this issue Jan 9, 2015 · 18 comments · Fixed by #21374
Milestone

Comments

@dgrunwald
Copy link
Contributor

First off, the operator precedence of a..b is extremely low.
struct Range recently got #[derive(Eq)], so people might want to compare a range to an expression in range notation.

r == 1..10 parses as (r == 1)..10. Similarly, 1..10 == r parses as 1..(10 == r). Both result in type errors.
I think there's no good reason for .. to have a precedence lower than comparisons. I'd suggest to put it in-between the comparison operators and the bitwise operators.

But the current implementation gets weirder when we consider the other forms of range notation:
r == 1.. parses as (r == 1).., and is a type error (or maybe a RangeFrom<bool>).
But r == ..1 parses as r == (..1), and is a valid range comparison.

In the other direction, ..1 == r parses as ..(1 == r) and is potentially a RangeTo<bool>.
But 1.. == r is a syntax error (expected one of ) or ,, found ==).
This inconsistent with the prefix form.

If we keep the current precedence of .. (lower than all binops), I think we should disallow r == ..1. I think this can be implemented easily by using RESTRICTION_NO_DOTS when parsing the RHS of any binop.

Also, we should use RESTRICTION_NO_DOTS when parsing the RHS of prefix-version ..expr -- currently ..1..2 is a RangeTo<Range<_>>, while 1..2..3 and 1..2.. both are syntax errors.

If we increase the precedence, we will need to be careful not to allow code like 1 + ..2. I think any binop with precedence smaller than that of .. will have to use RESTRICTION_NO_DOTS on the RHS.

I also think the implementation can be simplified quite a bit if the prefix .. wasn't handled in parse_prefix_expr (as that's only supposed to be used for operators with high precedence), but at the same level as the binary and postfix version (parse_assign_expr, or parse_binops if we change the precedence). This would allow us to get rid of RESTRICTION_NO_DOTS and use the normal operator precedence handling instead.

@dgrunwald
Copy link
Contributor Author

I am planning to fix this myself, but I need input on whether to increase the operator precedence so that r == 1..2 is valid.

@bstrie
Copy link
Contributor

bstrie commented Jan 9, 2015

cc @aturon

I think this is a candidate for a nomination as well.

@aturon
Copy link
Member

aturon commented Jan 9, 2015

cc @nick29581

@aturon
Copy link
Member

aturon commented Jan 9, 2015

Nominating.

@aturon
Copy link
Member

aturon commented Jan 9, 2015

cc @nikomatsakis @alexcrichton

@alexcrichton
Copy link
Member

May be related to #20830

@dgrunwald
Copy link
Contributor Author

@alexcrichton: That looks like a different bug in the same line of code.

@nikomatsakis
Copy link
Contributor

I agree about the current precedence being wrong, and think that the suggested position is probably good, but I am very wary of the "restriction no dots" rules. Every restriction is a parser bug until proven otherwise, from my POV, and I am not convinced that we need RESTRICTION_NO_DOTS at all anymore.

@dgrunwald
Copy link
Contributor Author

@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow v[0..a+b], avoid grammar inconsistencies between ..a and a.., and not have a combinatorial explosion in the complexity of the formal grammar at the same time.

I know this problem from when I tried writing a C# parser using a parser generator. It turns out that the Microsoft C# compiler is a hand-written recursive-descent parser using the shunting yard algorithm to parse binops. But not all C# binops are true binary operators taking two expressions: the as operator takes a type on the RHS, yet has lower precedence than some other binary operators. This results in some strange effects -- a + b as c + d is valid in C# and parses as ((a + b) as c) + d, even though the precedence of as is lower than that of +. As far as I know, there is no way to write a formal grammar that accepts the same expressions as the MS C# compiler without duplicating all expression-related productions.

I'd prefer if rust does not make the same mistake as C#, so any prefix- and suffix- operators must be forbidden from being used in a context which conflicts with their operator precedence, even if operator precedence is unnecessary to disambiguate the expression. As far as I know, .. is currently the only prefix-/suffix- operator where this problem occurs, since as and all unary operators have a higher precedence than any binary operator.

Here's how I imagine the formal grammar:

BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr

This means that "^" and "&&" are left-associative; comparison operators don't have any associativity (since RFC 558); and the range expression doesn't have any associativity either.

I hope that this can be implemented without using a restrictions bitflag in the parser. But if not, I'd prefer special cases in the parser over special cases in the grammar.

@dgrunwald
Copy link
Contributor Author

Changing the precedence level as I suggested actually creates new syntactic ambiguities:
a == b.. && c
currently is... actually a syntax error because can_begin_expr doesn't handle && (#20901), but if that is fixed, it would be:
(a == b)..(&&c)

With my suggestion, it would be ambiguous between
a == (b..(&&c))
and
a == (b..) && c

I'm starting to think we should keep the precedence as-is, so that .. must always be enclosed in parenthesis when used in a more complex expression, and only fix the inconsistencies between the prefix form and the other two forms.

@nikomatsakis
Copy link
Contributor

On Sat, Jan 10, 2015 at 09:56:42AM -0800, Daniel Grunwald wrote:

@nikomatsakis: I think some kind of parser special case will always be necessary if we want to allow v[0..a+b], avoid grammar inconsistencies between ..a and a.., and not have a combinatorial explosion in the complexity of the formal grammar at the same time.

Can you spell this out a bit more?

@nikomatsakis
Copy link
Contributor

On Sat, Jan 10, 2015 at 01:43:35PM -0800, Daniel Grunwald wrote:

With my suggestion, it would be ambiguous between
a == (b..(&&c))
and
a == (b..) && c

Why is this not just an argument for a more-refined precedence (like,
we pick one interpretation and run with it). Only one is remotely
sensible (b..(&&c)) so that seems to be the one to use. That said,
I've not sat down and looked at the table of precedences to see how
that would fit.

@dgrunwald
Copy link
Contributor Author

Why is this not just an argument for a more-refined precedence (like, we pick one interpretation and run with it). Only one is remotely sensible (b..(&&c)) so that seems to be the one to use.

True, I guess we can just pick one interpretation and use it.
Though I would have picked the other interpretation: if r1 == 1.. && r2 == ..2 {} should probably be if r1 == (1..) && r2 == (..2) {}
The other choice would disambiguate it as if r1 == (1.. (&& r2)) == ..2 {}.

I'm starting to think we should keep the precedence as-is, so that .. must always be enclosed in parenthesis

When I wrote that, I was just concerned about the complexity of having an optional expression in a chain of binops, but I think probably isn't as bad as I initially thought.

The only binops that potentially conflict with the initial token in an expression are: &&, ||, *, & and .. itself. (UFCS will add < and <<)
Only those with a precedence lower than .. are problematic, so that's && and || only.
I don't see much use for ranges of double references or ranges of lambdas, so I'd add the following rule to disambiguate the formal grammar I've given:
If .. is followed by a token that can be interpreted both as start of expression or binary operator, it will be parsed as:

  • start of expression if the interpretation as binop has a higher precedence than ..
  • binary operator if it has a lower precedence than ..

This corresponds to the disambiguation of if r1 == 1.. && r2 == ..2 {} as if r1 == (1..) && r2 == (..2) {}, but 1..*i will be 1..(*i), not (1..)*(i).

@Gankra
Copy link
Contributor

Gankra commented Jan 11, 2015

#20921 points out that for i in 0 .. n {} doesn't even parse correctly right now, so at very least that should be fixed.

@dgrunwald
Copy link
Contributor Author

I have an implementation of my suggested precedence + disambigation. I will create a PR once I have done some testing.

I was able to eliminate RESTRICTION_NO_DOTS at the cost of adding a minprec parameter to parse_binops and parse_prefix_expr.

@dgrunwald
Copy link
Contributor Author

I think we may need a special case to handle for i in 1.. { }. I'd expect that to be an infinite loop, but currently it gets parsed as for i in (1..{}).

dgrunwald added a commit to dgrunwald/rust that referenced this issue Jan 11, 2015
Fixes rust-lang#20811 and rust-lang#20241.

Note: this changes the semantics of parse_more_binops() to parse binops with
precedence >= min_prec.
Previously, it would parse binops with precedence > min_prec.
@nikomatsakis
Copy link
Contributor

On Sun, Jan 11, 2015 at 08:21:03AM -0800, Daniel Grunwald wrote:

I think we may need a special case to handle for i in 1.. { }. I'd
expect that to be an infinite loop, but currently it gets parsed as
for i in (1..{}).

We do generally have a subset of expression types that appear in
positions terminated by { (e.g., if, for). This seems relevant
here too.

@dgrunwald
Copy link
Contributor Author

Yes, I already handled that case in my implementation: https://github.com/rust-lang/rust/pull/20958/files#diff-da9d34ca1d0f6beee2838cf02e07345cR2942
Makes RESTRICTION_NO_STRUCT_LITERAL a slight misnomer, it should probably be RESTRICTION_NO_OPEN_BRACE_WHERE_EXPRESSION_MIGHT_END except that's too long.

Overview over what my pull request does:

  • .. precedence changed to between comparison operators and bit or; using the formal grammar:
BitOrExpr = BitXorExpr | BitOrExpr "^" BitXorExpr
RangeExpr = BitOrExpr | ".." BitOrExpr | BitOrExpr ".." | BitOrExpr ".." BitOrExpr
CompExpr = RangeExpr | RangeExpr ("==" | "!=" | "<" | ">" | "<=" | ">=") RangeExpr
AndExpr = CompExpr | AndExpr "&&" CompExpr
  • Where the above grammar is ambiguous due to collisions between binary operators and prefix operators (e.g. r == 1.. && condition), && and || are parsed as binary operators, other tokens are parsed as prefix operators.
  • for i in 1.. {} is parsed as for i in (1..) {}

@brson brson added this to the 1.0 beta milestone Jan 15, 2015
dgrunwald added a commit to dgrunwald/rust that referenced this issue Jan 22, 2015
Grammar changes:
* allow 'for _ in 1..i {}' (fixes rust-lang#20241)
* allow 'for _ in 1.. {}' as infinite loop
* prevent use of range notation in contexts where only operators of high
  precedence are expected (fixes rust-lang#20811)

Parser code cleanup:
* remove RESTRICTION_NO_DOTS
* make AS_PREC const and follow naming convention
* make min_prec inclusive
bors added a commit that referenced this issue Jan 23, 2015
This PR is intended as alternative to #20958. It fixes the same grammar inconsistencies, but does not increase the operator precedence of `..`, leaving it at the same level as the assignment operator.
For previous discussion, see #20811 and #20958.

Grammar changes:
* allow `for _ in 1..i {}` (fixes #20241)
* allow `for _ in 1.. {}` as infinite loop
* prevent use of range notation in contexts where only operators of high precedence are expected (fixes #20811)

Parser code cleanup:
* remove `RESTRICTION_NO_DOTS`
* make `AS_PREC` const and follow naming convention
* make `min_prec` inclusive

r? nikomatsakis
dlrobertson pushed a commit to dlrobertson/rust that referenced this issue Nov 29, 2018
Grammar changes:
* allow 'for _ in 1..i {}' (fixes rust-lang#20241)
* allow 'for _ in 1.. {}' as infinite loop
* prevent use of range notation in contexts where only operators of high
  precedence are expected (fixes rust-lang#20811)

Parser code cleanup:
* remove RESTRICTION_NO_DOTS
* make AS_PREC const and follow naming convention
* make min_prec inclusive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants