is operator for pattern-matching and binding #3573

Status: Open (wants to merge 10 commits into `master`)
Adds `text/3573-is.md` (284 additions, 0 deletions).
- Feature Name: `is`
- Start Date: 2024-02-16
- RFC PR: [rust-lang/rfcs#3573](https://github.com/rust-lang/rfcs/pull/3573)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary

Introduce an `is` operator in Rust 2024 to test whether an expression matches a
pattern and, if so, bind the variables in the pattern.

# Motivation

This RFC introduces an `is` operator that tests if an expression matches a
pattern, and if so, binds the variables bound by the pattern and evaluates to
true. This operator can be used as part of any boolean expression, and combined
with boolean operators.

Previous discussions around `let`-chains have treated the `is` operator as an
alternative on the basis that they serve similar functions, rather than
proposing that they can and should coexist. This RFC proposes that we allow
`let`-chaining *and* add the `is` operator.
Review comment from @flip1995 (Member, Feb 16, 2024):

I'm a bit concerned about this. This would make it possible to do the same thing in two different ways at the language level. IMHO this is a bad idea, as it opens the door to mixed-style code bases that just get harder to read.

For tooling, this is also a problem: Clippy will most likely get (restriction) lint requests for disallowing `is` OR disallowing let-chains.

Another problem I see here is: what should Clippy do when producing suggestions? If we have the policy to always suggest `is` over let-chains, that might pollute code bases where let-chains are preferred (and vice versa). We also can't really check things like "is this a let-chain code base?" or "are we in an `is`-chain expression?" when producing suggestions. One lint suggesting `is` and another suggesting `let` would make this problem even worse, and that is almost impossible to avoid with changing contributors and team members.

We recently had the situation described above with suggesting the new-ish `_ =` binding over `let _ =`. We decided to suggest `let _ =`, as that way we don't have to check the MSRV before producing the suggestion.

Reply from the PR author (Member):

Rust already has many different ways to do the same thing. You can write a `for` loop or you can write iterator code. You can use combinators or write a `match` or write an `if let`. You can write `let`-`else` or use a `match`. You can write `x > 3` or `3 < x`. You can write `x + 3` or `3 + x`.

> as it opens the door for mixed-style code bases, that just get harder to read

In this RFC, I'm proposing that both of them have value, and that it's entirely valid for a codebase to use both, for different purposes.

`if let PAT = EXPR && ...` emphasizes the pattern and its binding. It seems appropriate for a clear division into cases based primarily on the pattern, by writing `if let ... else`.

`if EXPR is PAT && ...` leads with the expression, then the pattern, then the next condition. It feels more appropriate for cases where you expect the reader to find it easiest to process in the order of operations from left to right: "run this EXPR, see if it matches PAT, check the next condition ..."

I personally expect to find myself writing both, in different cases.

Reply (Member):

For me those are not really comparable:

  • The language gives you the `for` loop; the standard library gives you the option/power to do this with iterator method chains.
  • `match` is for when you want to match one expression against multiple variants, while `if let` is for checking whether the expression is one specific variant (there's a Clippy style lint for this).
  • `let`-`else` was introduced for a specific use case, to save some lines of code over using a `match`.
  • I hardly see how commutativity is related to this.

The proposed `is` language construct is different:

  • Both let-chains and `is` are provided as language constructs.
  • There's no clear rule of thumb for `is` vs `let`. It's a pure style choice IMO.
  • It doesn't simplify (in terms of amount of code) a specific, frequently occurring pattern.

The second point is the biggest problem for tooling: it is impossible to determine what to suggest. With the other examples it's usually clear, because the alternative is more concise/readable/idiomatic/...

I can see the point about focusing on the expression vs. the pattern, and I think it is valid. But on that, I want to point out the `equatable_if_let` Clippy lint, which tried to address something similar but never got out of nursery, as we (mainly I) couldn't agree on when `expr == pat` is preferable over `pat == expr`/`let pat = expr`. rust-lang/rust-clippy#7777

So I see the addition of `is` as giving the user a choice between two styles and not much more. IMO this is not worth the downsides that come with it. But that is my opinion, and mileage may vary, obviously.

Reply (Member):

I also want to link and quote one of my comments below: #3573 (comment)

> As let-chains are not stabilized yet, and iff there is consensus that the `is` approach is better, I think we should go with the `is` approach and remove let-chains again. I just think having both can cause problems and confusion, as I argued above.

Reply (Member):

If let-chains are to be scrapped, this RFC should really have a section refuting the counterarguments made in RFC 2497.


`if`-`let` provides a natural extension of `let` for fallible bindings, and
highlights the binding by putting it on the left, like a `let` statement.
Allowing developers to chain multiple `let` operations and other expressions in
the `if` condition provides a natural extension that simplifies what would
otherwise require complex nested conditionals. As the `let`-chains RFC notes,
this is a feature people already expect to work.

The `is` operator similarly allows developers to chain multiple match-and-bind
operations and simplify what would otherwise require complex nested
conditionals. However, the `is` operator allows writing and reading a pattern
match from left-to-right, which reads more naturally in many circumstances. For
instance, consider an expression like `x is Some(y) && y > 5`; that boolean
expression reads more naturally from left-to-right than
`let Some(y) = x && y > 5`.

This is even more true at the end of a longer expression chain, such as
`x.method()?.another_method().await? is Some(y)`. Rust method chaining and `?`
and `.await` all encourage writing code that reads in operation order from left
to right, and `is` fits naturally at the end of such a sequence.

Having an `is` operator would also help to reduce the proliferation of methods
on types such as `Option` and `Result`, by allowing prospective users of those
methods to write a condition using `is` instead. While any such condition could
equivalently be expressed using `let`-chains, the binding would then move
further away from the condition expression referencing the binding, which would
result in a less natural reading order for the expression.

Consider the following examples:

```rust
if expr_producing_option().is_some_and(|v| condition(v))

if let Some(v) = expr_producing_option() && condition(v)

if expr_producing_option() is Some(v) && condition(v)
```

The condition using `is` is a natural translation from the `is_some_and`
method, whereas the if-let construction requires reversing the binding of `v`
and the expression producing the option. This seems sufficiently cumbersome in
some cases that the absence of `is` would motivate continued use and
development of helper methods.
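For comparison, the first two forms above can already be written in today's Rust; the `is` form cannot, since the operator does not exist. The sketch below assumes hypothetical stand-ins `expr_producing_option` and `condition` (not defined by the RFC), and uses a nested `if let` in place of a `let`-chain so that it compiles regardless of edition:

```rust
/// Hypothetical stand-in for `expr_producing_option()`:
/// `Some(n)` for non-negative `n`, `None` otherwise.
fn expr_producing_option(n: i32) -> Option<i32> {
    if n >= 0 { Some(n) } else { None }
}

/// Hypothetical stand-in for `condition(v)`.
fn condition(v: i32) -> bool {
    v > 3
}

/// Form 1: the helper method (`Option::is_some_and`, stable since Rust 1.70).
fn with_helper(n: i32) -> bool {
    expr_producing_option(n).is_some_and(|v| condition(v))
}

/// Form 2: a pattern match. Nested `if let` stands in for a `let`-chain;
/// note that the binding `v` is written before the expression producing it.
fn with_if_let(n: i32) -> bool {
    if let Some(v) = expr_producing_option(n) {
        if condition(v) {
            return true;
        }
    }
    false
}

// Form 3, `expr_producing_option(n) is Some(v) && condition(v)`, is the
// proposed syntax and is deliberately not written here: it does not compile today.
```

Both functions compute the same boolean; the difference the RFC points to is purely the reading order of expression, pattern, and follow-up condition.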

# Guide-level explanation

Rust provides an `is` operator, which can be used in any expression:
`EXPR is PATTERN`

This operator tests if the value of `EXPR` matches the specified `PATTERN`; see
<https://doc.rust-lang.org/reference/patterns.html> for details on patterns.

If the `EXPR` matches the `PATTERN`, the `is` expression evaluates to `true`,
and additionally binds any bindings specified in `PATTERN` in the current scope
for code subsequently executed along the path where the `is` expression
is known to be `true`.
Review comment (Member):

> known

In all examples it's obvious, and I'm sure the compiler can figure it out, but I can trivially write code examples where it's non-obvious whether the value is `true`:

let is_true = x is Some(y);
let is_true = identity(is_true);
if is_true { y; }

What happens here? What about just having a local variable? The compiler needs to draw a line somewhere (as Rust is turing complete), and this doesn't say anything about what that is. I think no matter where it lies, it can be pretty confusing to users.

Reply (Contributor):

I get the impression from the RFC text that it's probably not intended for the binding `y` to live beyond the end of the first statement, as the second statement is potentially reachable when `x` is `None`.

Reply (Contributor):

Yeah, I gathered that too and didn't realise that it wasn't stated explicitly. To put it more clearly, `x is y` only binds `y` for the current expression in the tree, at the same level, meaning that `(x is y) == ...` will not allow `y` to occur inside `...`.

Reply (Member):

Agreed; like the `super let` conversation, this needs to be detailed about what the scopes of everything are.

That said, so long as it can be a MIR desugaring, we can then depend on the existing MIR checking to ensure that things are only used once initialized.

(Which, yes, will not allow things like nils's example, same as how `let x; if b { x = 3; } if b { dbg!(x); }` doesn't work.)

Reply:

While I do not have a solution, it does feel like the binding should last longer than just the expression it is within. Otherwise this Rust equivalent of C# would not be possible:

if(foo is {Values: [var value1, var value2]}) {
    return (value1, value2);
}


For example:

```rust
if an_option is Some(x) && x > 3 {
    println!("{x}");
}
```
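The RFC does not spell out a desugaring, but as a rough illustration (an assumption, not RFC text), the example above behaves like a nested `if let` in today's Rust. The helper `describe` is hypothetical and returns the text the example would print, so the behaviour can be checked:

```rust
/// Sketch of `if an_option is Some(x) && x > 3 { println!("{x}"); }`
/// in today's Rust. Returns the text the RFC example would print.
fn describe(an_option: Option<i32>) -> Option<String> {
    if let Some(x) = an_option {
        // `x` is bound only here, and `x > 3` runs only after the match succeeded.
        if x > 3 {
            return Some(format!("{x}"));
        }
    }
    // `x` is out of scope here, as it would be after the proposed `if`.
    None
}
```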

The bindings in the pattern are not bound along any code path potentially
reachable where the expression did not match:

```rust
if (an_option is Some(x) && x > 3) || (more_conditions /* x is not bound here */) {
    // x is not bound here
} else {
    // x is not bound here
}
// x is not bound here
```
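One way to see why `x` cannot be visible on the right of `||` is to write the condition out in today's Rust. This is a sketch under the assumption that `more_conditions` is a hypothetical closure; the left operand's binding is scoped to its own `if let`, while `||` still short-circuits:

```rust
/// Sketch of `(an_option is Some(x) && x > 3) || more_conditions()`.
/// `more_conditions` is a hypothetical closure standing in for the RFC's
/// `more_conditions`.
fn either(an_option: Option<i32>, more_conditions: impl FnOnce() -> bool) -> bool {
    // `x` is scoped to the `if let` arm, so it cannot leak out of the left operand.
    let left = if let Some(x) = an_option { x > 3 } else { false };
    // The closure runs only when `left` is false, which includes the path on
    // which `x` was never bound; hence `x` cannot be in scope there.
    left || more_conditions()
}
```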
Review comment (Member) on the example above:

/*1*/ let x = 9999;
/*2*/ let y = Some(4);
/*3*/ if y is Some(x) && x > 0 {
/*4*/     println!("x1 = {x}"); // x1 = 4
/*5*/ }
/*6*/ if y is Some(x) && x > 0 || cheat_code_enabled() {
/*7*/     println!("x2 = {x}"); // x2 = 9999  ⁉️
/*8*/ }

That just adding `||` on line 6 causes the `x` on line 7 to refer to the outer variable on line 1 instead is surprising.

For `if let` chains this kind of mistake is impossible (and the `let` keyword makes the scope very clear), but an `is` expression can appear anywhere an expression can!

I think that, even if `x` is not bound, the name `x` must still "pollute" the entire expression statement, so that any use of `x` generates at minimum a warn-by-default lint.

Reply from the PR author (Member):

@kennytm Extremely valid; I'll mention that case and add open questions about appropriate lints.


The pattern may use alternation (within parentheses), but must have the same
bindings in every alternative:

```rust
if color is (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 10 {
    println!("condition met")
}

// ERROR: `a` is not bound in all alternatives of the pattern
if color is (RGB(r, g, b) | RGBA(r, g, b, a)) && r == b && g < 10 {
    println!("condition met")
}
```
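The same-bindings rule is the one Rust already enforces for or-patterns in `if let` and `match`. A minimal sketch of the valid example in today's Rust, assuming a hypothetical `Color` enum with the variant names used above:

```rust
/// Hypothetical enum matching the variant names in the RFC example.
#[derive(Clone, Copy, Debug)]
enum Color {
    RGB(u8, u8, u8),
    RGBA(u8, u8, u8, u8),
}

/// Today's equivalent of
/// `color is (RGB(r, g, b) | RGBA(r, g, b, _)) && r == b && g < 10`.
/// Both alternatives bind exactly `r`, `g`, and `b`; binding `a` in only
/// one alternative would be rejected by the compiler, as in the RFC's ERROR case.
fn condition_met(color: Color) -> bool {
    if let Color::RGB(r, g, b) | Color::RGBA(r, g, b, _) = color {
        r == b && g < 10
    } else {
        false
    }
}
```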

`is` may appear anywhere a boolean expression is accepted:

```rust
func(x is Some(y) && y > 3);
```
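When the binding is only needed inside the condition itself, today's closest equivalent is `matches!` with a guard. A sketch, assuming a hypothetical `func` that takes a `bool`:

```rust
/// Hypothetical `func` taking a boolean, as in the RFC example.
fn func(flag: bool) -> &'static str {
    if flag { "yes" } else { "no" }
}

/// Today's nearest equivalent of `func(x is Some(y) && y > 3)`.
/// The binding `y` is visible only inside the guard, which suffices here
/// because `y` is used only in `y > 3`.
fn call_with(x: Option<i32>) -> &'static str {
    func(matches!(x, Some(y) if y > 3))
}
```

The difference from `is` is that `matches!` can never make `y` available to code outside the macro invocation.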

The `is` operator may not appear as a statement; use `let` to bind a pattern in
that context:

```rust
// ERROR: use `let` for this
an_expression() is x;

let x = an_expression();
```

# Reference-level explanation

Add a new [operator
expression](https://doc.rust-lang.org/reference/expressions/operator-expr.html),
`IsExpression`:

> **<sup>Syntax</sup>**\
> _IsExpression_ :\
> &nbsp;&nbsp; _Expression_ `is` _PatternNoTopAlt_

Add `is` to the [operator
precedence](https://doc.rust-lang.org/reference/expressions.html#expression-precedence)
table, at the same precedence level as `==`, and likewise non-associative
(requiring parentheses).
Review comment (Contributor):

I think that it should recommend parentheses, but have a higher precedence than `==`, similar to how `&&` has higher precedence than `||` but we still recommend parentheses there.

To me, `x is Some(z) == y is Some(w)` is unambiguous, even if I would recommend adding parentheses for clarity.

Reply (Contributor):

There's no valid expression with `==` on both sides of an `is`. But what about `if a == b is false`?

If `a` and `b` are `bool`, then the expression is ambiguous, but it returns the same result with either operator precedence. (But there might be a change in the evaluation order.)

One way to deal with expressions like this is a lint removing `is true`, and turning `is false` into `!`.


Detect `is` appearing as a top-level statement and produce an error, with a
rustfix suggestion to use `let` instead.

Review comment from @kennytm (Member, Feb 18, 2024):

This Reference-Level Explanation is too short to explain how `is` actually works, especially compared with the Guide-Level Explanation.

I would like to see how "and additionally binds any bindings specified in `PATTERN` in the current scope for code subsequently executed along the path where the `is` expression is known to be `true`" works, not through just a small number of examples, but as the actual algorithm over all possible combinations of `$:expr`s. For instance, the Guide-Level Explanation is not clear on whether this is valid or not (I assume `z` is bound and `w` is not bound):

match x {
    Some(y) if y is Some(z) => (z is Some(w)).then(|| w + 1),
    _ => ..,
}

Reply from the PR author (Member):

@kennytm Valid. So, you're suggesting a systematic look at every existing statement type and which parts of the statement the binding is valid in? That seems entirely reasonable.

Reply (Member):

> So, you're suggesting a systematic look at every existing statement type and which parts of the statement the binding is valid in?

Yes. Expressions too (well, basically all statements except `let` and items are expressions), e.g. stuff like `((x is Some(z)) || (y is Some(z))) && z > 0`.


# Drawbacks

Introducing both the `is` operator and `let`-chains would provide two different
ways to do a pattern match as part of a condition. Having more than one way to
do something could lead people to wonder if there's a difference; we would need
to clearly communicate that they serve similar purposes.

An `is` operator will produce a name conflict with [the `is` method on
`dyn Any`](https://doc.rust-lang.org/std/any/trait.Any.html#method.is) in the
standard library, and with the (relatively few) methods named `is` in the
ecosystem. This will not break any existing Rust code, as the operator will
only exist in the Rust 2024 edition and newer. The Rust standard library and
any other library that wants to avoid requiring the use of `r#is` in Rust 2024
and newer could provide aliases of these methods under a new name; for
instance, the standard library could additionally provide `Any::is` under a new
name `is_type`.
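The `r#is` escape hatch mentioned above already works today, because the `r#` prefix may be applied to any ordinary identifier, not only to keywords. A minimal sketch (the `is_type` alias suggested above does not currently exist, so it is not used here; `is_i32` is a hypothetical helper):

```rust
use std::any::Any;

/// Calls `Any::is` through a raw identifier; this spelling would keep
/// working even if `is` became a keyword in a later edition.
fn is_i32(value: &dyn Any) -> bool {
    value.r#is::<i32>()
}
```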

# Rationale and alternatives
Review comment (Contributor):

Something that I think is worth exploring here, even though I agree it's worse than the proposal, is the idea of just promoting `let` patterns to expressions. For example, allowing `f(let Some(y) = x && y > 5)`. This is consistent with let-chaining, but noticeably uncomfortable, and worth exploring as an example of further motivation for why `is` is the better option.

Reply:

I don't think it's even good for the language to promote patterns such as

f(x is Some(y) && y > 5)

That promotes very obscure code, which is really hard for people who are new to the language, or even intermediate users, to understand.

I'd much rather see patterns such as

if x is Some(y) && y > 5 {
  f(true);
} else {
  f(false);
}

which, while more verbose, is less arcane. I agree that the first one looks prettier, but there is a lot of information to unpack in one line, especially if you are new.

Reply (Contributor):

Honestly, what you're describing is quite a stylistic choice, and I don't think it's something that the language itself should have a say in; maybe it should be left to Clippy lints.

What you've described is extremely similar to the common case of `if x { true } else { false }`, and it generally represents some failure to fully conceptualise booleans as data, rather than just conditions on branches. This is actually extremely common in programming in general and (IMO) represents a combination of failures in teaching, misconceptions accumulated from how other languages work, etc.

Like, to be clear, this isn't me saying you're wrong here; it's a real problem and ignoring it is not a real solution. But just as failing to dig into why people prefer the more expanded version ignores the problem, so does simply declaring the expanded version better without questioning it.

This is kind of why I think that the solution probably lies somewhere in Clippy: things such as `if x { true } else { false }` are warned about by Clippy lints, and similarly, your code would probably be changed back into mine after two passes, where the first notices that the `f(...)` could be factored outside the expression to `f(if ... { true } else { false })` and the second notices that you're doing `if x { true } else { false }`.

Reply:

If `let` were promoted to return a boolean on a successful bind, it would solve both let-chaining and the main problem with let-chaining as it is proposed today. I would hate for that to be accepted instead of `is`, as the `let` keyword feels overused enough as it is. But it is my personal preference over let-chaining.


As noted in the [motivation](#motivation) section, adding the `is` operator
allows writing pattern matches from left-to-right, which reads more naturally
in some conditionals and fits well with method chains and similar. As noted
under [prior art](#prior-art), other languages such as C# already have this
exact operator for this exact purpose.

We could choose not to add this operator, and have *only* `let`-chains. This
would provide equivalent functionality, semantically; however, it would force
pattern-matches to be written with the pattern on the left, which won't read as
naturally in some expressions. Notably, this seems unlikely to do as effective
a job of reducing the desire for `is_variant()` methods and helpers like
`is_some_and(...)`.

We could choose to add the `is` operator and *not* add `let`-chains. However,
many people *already* expect `let`-chains to work as an obvious extrapolation
from seeing `if let`/`while let` syntax.

We could add this operator using punctuation instead (e.g. `~`). However, there
is no "natural" operator that conveys "pattern match" to people (the way that
`+` is well-known as addition). Using punctuation also seems likely to make the
language more punctuation-heavy, less obvious, and less readable.

Review comment (Contributor):

I disagree that there's no natural operator, even though I agree that it would be less clear.

For example, we could use tildes as an additional equality operator (`x ~ Some(y)`) and it would fit in with Rust fine. I mostly say this because I think that the reasoning should lean more heavily on its stronger arguments rather than on a lack of creativity:

  • `is` is short and immediately clear
  • Any use of it as a variable is something that won't be missed (e.g. as a plural of `i`, an already nameless variable)
  • People are already used to it being a keyword in other languages, so it being one in Rust too isn't strange

Reply from @dev-ardi (Feb 16, 2024):

Something else to point out: `is` is easy to write too, which is worth considering.
For example, it annoys me to have to write `#![(...)]`.
In my opinion we should try to avoid symbols where words suffice.

Reply:

For some more prior art, Raku uses `~~` for “smart matching”. Using two tildes would leave the single tilde free for something else, if we wanted to use it later. Of course, Raku is known to be a symbol-heavy language, so I don’t think this is the best choice.

Reply:

I'd like to note that Rust used to use tildes (`~T`), and during the move to `Box<T>` it was noted that `~` is simply absent (not just hard to type, plain absent) from a number of keyboard layouts.

As an example, consider a Polish keyboard layout.

I would recommend avoiding `~` altogether.

Reply (Contributor):

@matthieu-m The keyboard layout you've linked to is an obsolete typewriter layout. Polish computers use a QWERTY-based layout called "Polish programmer's" layout, which, despite the name, is the default used by everyone.

Reply:

Another argument that I have yet to see is that the `is` keyword fits very well with the `as` keyword already present in Rust.

We could add this operator using a different name. Most other names, however,
seem likely to be longer and to fit people's expectations less well, given the
widespread precedent of `is_xyz()` methods. The most obvious choice would be
something like `matches`.

We could permit top-level alternation in the pattern. However, this seems
likely to produce visual and semantic ambiguity. This is technically a one-way
door, in that `x is true|false` would parse differently depending on our
decision here; however, the use of a pattern-match for a boolean here seems
unlikely, redundant, and in poor style. In any case, the compiler could easily
detect most attempts at top-level alternation and suggest adding parentheses.

# Prior art

`let`-chains provide prior art for having this functionality in the language.

The `matches!` macro similarly provides precedent for having pattern matches in
boolean expressions. `is` would likely be a natural replacement for most uses
of `matches!`.

Many Rust enums provide `is_variant()` functions:
- `is_some()` and `is_none()` for `Option`
- `is_ok()` and `is_err()` for `Result`
- `is_eq()` and `is_lt()` and `is_gt()` for `Ordering`
- `is_ipv4()` and `is_ipv6()` for `SocketAddr`
- `is_break()` and `is_continue()` for `ControlFlow`
- `is_borrowed()` and `is_owned()` for `Cow`
- `is_pending()` and `is_ready()` for `Poll`
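Each of these helpers is a named shorthand for a single pattern test, which `matches!` (and, under this RFC, `is`) can express directly. A small check using only stable standard-library methods; `Cow::is_borrowed`/`is_owned` are left out because they are unstable at the time of writing, and `helper_vs_pattern` is a hypothetical name:

```rust
use std::cmp::Ordering;
use std::ops::ControlFlow;
use std::task::Poll;

/// Returns `(helper_result, pattern_result)` pairs for several of the
/// helpers listed above; in each pair the two values should agree.
fn helper_vs_pattern() -> [(bool, bool); 5] {
    let opt: Option<u8> = Some(1);
    let res: Result<u8, ()> = Err(());
    let ord: Ordering = 1u8.cmp(&2);
    let flow: ControlFlow<u8> = ControlFlow::Break(3);
    let poll: Poll<u8> = Poll::Pending;

    [
        (opt.is_some(), matches!(opt, Some(_))),
        (res.is_err(), matches!(res, Err(_))),
        (ord.is_lt(), matches!(ord, Ordering::Less)),
        (flow.is_break(), matches!(flow, ControlFlow::Break(_))),
        (poll.is_pending(), matches!(poll, Poll::Pending)),
    ]
}
```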

These functions serve as precedent for using the word `is` for this purpose.

There's extensive prior art in Rust for having more than one way to accomplish
the same thing. You can write a `for` loop or you can write iterator code. You
can use combinators or write a `match` or write an `if let`. You can write
`let`-`else` or use a `match`. You can write `x > 3` or `3 < x`. You can write
`x + 3` or `3 + x`. Rust does not normatively require one alternative. We do,
in general, avoid adding constructs that are entirely redundant with each
other. However, this RFC proposes that the constructs are *not* redundant: some
code will be more readable with `let`-chains, and some code will be more
readable with `is`.

[Kotlin has a similar `is`
operator](https://kotlinlang.org/docs/typecasts.html#smart-casts) for casts to
a type, which are similarly flow-sensitive: in the code path where the `is`
test has succeeded, subsequent code can use the tested value as that type.

[C# has an `is`
operator](https://learn.microsoft.com/en-US/dotnet/csharp/language-reference/operators/is)
for type-matching and pattern matching, which supports the same style of
chaining as the proposed `is` operator for Rust. For instance, the following
are valid C# code:

```csharp
if (expr is int x && other_expr is int y)
{
    func(x - y);
}

if (bounding_box is { P1.X: 0 } or { P2.Y: 0 })
{
    check(bounding_box);
}

if (GetData() is var data
    && data.Field == value
    && data.OtherField is [2, 4, 6])
{
    show(data);
}
```

# Unresolved questions

Can we make `x is 10..=20` work without requiring the user to parenthesize the
pattern, or would that not be possible with our precedence? We could
potentially make this work over an edition boundary, but would it be worth the
churn?

Pattern types propose using `is` for a different purpose, in types rather than
in expressions: `u32 is 1..` would be a `u32` that can never be `0`, and
`Result<T, E> is Err(_)` would be a `Result<T, E>` that can never be the `Ok`
variant. Can we introduce the `is` operator in expressions without conflicting
with its potential use in pattern types? We could require the use of
parentheses in `v as u32 is 1..`, to force it to be parsed as either `(v as
u32) is 1..` or `v as (u32 is 1..)` (assuming that pattern types can be used in
`as` in the first place).

# Future possibilities

As with `let`-chains, we *could* potentially allow cases involving `||` to bind
the same patterns, such as `expr1 is V1(value) || expr2 is V2(value)`. This RFC
does *not* propose allowing that syntax to bind variables, to avoid confusing
code.

*If* in a future edition we decide to allow this for `let` chains, we should
similarly allow it for `is`. This RFC does not make or recommend such a future
proposal.
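For reference, the effect that `expr1 is V1(value) || expr2 is V2(value)` might have if `||` were allowed to bind can be spelled out in today's Rust. This is only a sketch of one plausible meaning, assuming a hypothetical `Expr` enum and treating `expr2` as a closure so that it is evaluated only when the left side fails, as `||` would:

```rust
/// Hypothetical enum with the variant names used in the text.
enum Expr {
    V1(u32),
    V2(u32),
    Other,
}

/// Tries the left pattern first and only evaluates `expr2` if it did not
/// match, yielding the bound `value` from whichever side matched.
fn bound_value(expr1: &Expr, expr2: impl FnOnce() -> Expr) -> Option<u32> {
    if let Expr::V1(value) = expr1 {
        return Some(*value);
    }
    if let Expr::V2(value) = expr2() {
        return Some(value);
    }
    None
}
```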