Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Finalize syntax for slice patterns with subslices #2359

Merged
merged 8 commits into from Jun 30, 2019

Conversation

@petrochenkov
Copy link
Contributor

commented Mar 9, 2018

Use an obvious syntax for subslices in slice patterns - .. and ..PAT.
If syntactic ambiguities arise in the future, always disambiguate in favor of subslice patterns.

Use syntax .. for "wildcard" sub-slice patterns and IDENT @ .. for binding subslice pattern to a name.

Rendered
Tracking issue

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Mar 9, 2018

@SoniEx2

This comment has been minimized.

Copy link

commented Mar 9, 2018

Why not use the existing @ syntax?

name @ or just @.

_ gives an item, an empty spot gives any items.

@Centril Centril added the T-lang label Mar 9, 2018

@jethrogb

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2018

I also like PAT @ ..

@clarfon

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2018

IMO if someone has a slice of ranges, forcing them to use the structs themselves (e.g. [RangeFull]) is a reasonable compromise. There can even be a clippy lint for it. Personally, I'd rather have [a..b, ..] mean "a slice with at least one element where the first is a number between a and b", rather than "slices can't have different types in them."

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2018

And RangeFull only has one value, so a _ pattern can also be used.

@kennytm

This comment has been minimized.

Copy link
Member

commented Mar 9, 2018

To clarify, by RangeFull I mean RangeFull.contains(), not a literal RangeFull instance 😄

match x {
    5..=8 => a,
    9..12 => b,
    12.. => c,
    ..3 => d,
    .. => e,
}

is equivalent to

match x {
    y if (5..=8).contains(y) => a,
    y if (9..12).contains(y) => b,
    y if (12..).contains(y) => c,
    y if (..3).contains(y) => d,
    y if (..).contains(y) => e,
}

Now for sure only a and b are supported at the moment, and c and d are not supported, and making .. mean RangeFull would have ambiguity with

match (x, y, z) {
    (a, ..) => {}
}

(Also note that the RangeFull::contains method does not really exist.)

@SimonSapin

This comment has been minimized.

Copy link
Contributor

commented Mar 9, 2018

@kennytm Oh ok, so range patterns, not the RangeFull type. That’s also equivalent to a _ pattern, isn’t it?

@kennytm

This comment has been minimized.

Copy link
Member

commented Mar 9, 2018

@SimonSapin yes, but may require PartialOrd. Anyway I don't think we'll ever support .. as a range pattern. I just wanna say that ..PAT or PAT..'s ambiguity with a range pattern isn't unique.

@arielb1

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2018

I don't see why we would ever want to implement full range patterns - they are the same as a _-pattern.

OTOH, partial range patterns look like a useful enough feature, so I would prefer not to force us to make an ugly syntax change in a future epoch if we ever want to support them

I don't like the BINDING @ PAT syntax much, but it is not infinitely ugly, consistent with what we have today, and it prevents a few annoying specification questions about patterns such as [w, ..[x, y], z] (these work under the current implementation, but thinking of it, I probably should have made them an error instead of making them work).

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2018

@jethrogb

I also like PAT @ ..

Why?

  • It adds an extra sigil unnecessary in 99% cases (in 100% with the current language). It also can lead to double @ if the pattern itself contains it (ident @ PAT @ ..?).
  • It's not consistent with any existing syntax. We have ident @ PAT, but it has the pattern on the right side, so it's rather "confusingly similar" so people writing code by analogy may write a wrong thing.
  • It's not symmetric with "rest of the list" in expressions.
@SoniEx2

This comment has been minimized.

Copy link

commented Apr 7, 2018

I think ppl misunderstood when I said binding-at, they thought I meant binding-at-dot-dot. I meant binding-at, no dot-dot.

[a, b, c@] not [a, b, c@..].

This is about as verbose as ..x but it's actually less verbose because one less character. (And it's more obvious because it resembles binding@pat, not ranges.)

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2018

@SoniEx2
I see. PAT @ /* nothing */ looks unambiguous indeed.
I don't see where this syntax comes from though, how it relates to subslices or "rest of the list".
It looks kinda random to me like PAT >, or PAT ~, or PAT any_other_unrelated_symbol.

@SoniEx2

This comment has been minimized.

Copy link

commented Apr 7, 2018

PAT @ /*nothing*/? I thought it was supposed to be binding @ /*nothing*/?

I thought we had binding @ thing as it currently stands, e.g. I can do

match x {
    binding @ 1..10 => {},
    _ => {}
}

But are you telling me it's actually pat @ thing and not binding @ thing? Anyway the idea was to omit thing, that's where it comes from.

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2018

@SoniEx2

I thought we had binding @ thing as it currently stands

Exactly, we have binding @ PATTERN_THING on stable.
And in slice patterns we also have [a, PATTERN_THING.., b] on nightly for which this RFC tries to find a replacement for stabilization.

@SoniEx2

This comment has been minimized.

Copy link

commented Apr 7, 2018

remove PATTERN_THING.

idk why the hell PATTERN_THING.. is a thing, even the person who made that says it was a mistake and should have been binding.. instead.

so just take binding @ PATTENR_THING, and drop the PATTERN_THING, and treat dropping the PATTERN_THING as a "slice patterns" thing.

[a, binding@, b].

and the clippy lint should recommend no space between binding and @ for slice patterns.

and this is a trivial variation of already existing things.

if you want PATTERN_THING@ in slice patterns, give us PATTERN_THING @ PATTERN_THING everywhere.

@jethrogb

This comment has been minimized.

Copy link
Contributor

commented Apr 9, 2018

@petrochenkov

I also like PAT @ ..

Why?

  • It adds an extra sigil unnecessary in 99% cases (in 100% with the current language). It also can lead to double @ if the pattern itself contains it (ident @ PAT @ ..?).
  • It's not consistent with any existing syntax. We have ident @ PAT, but it has the pattern on the right side, so it's rather "confusingly similar" so people writing code by analogy may write a wrong thing.
  • It's not symmetric with "rest of the list" in expressions.

My bad, I meant ident @ .., not PAT @ ... The current RFC text calls it PAT but shows only identifier usage. I thought .. is the pattern in this case. Can you explain why you need a pattern in this position?

@nikomatsakis

This comment has been minimized.

Copy link
Contributor

commented Apr 26, 2018

I'm trying to get my head caught up. To summarize current debate:

There are two cases to consider:

  • [a, .., c] -- match any number of elements (and ignore)
  • [a, ..b, c] -- extract inner elements as a slice (or subarray, etc)

This syntax is indeed fairly obvious. However, it conflicts with a possible future extension of open-ended ranges like ..5, which might mean the same as 0..5. (And indeed one can write &foo[..x] today.)

There have been various alternatives floated. I think the primary one is the @ operator, like so:

  • [a, .., c]
  • [a, b @ .., c]

As @arielb1 pointed out, making .. a valid pattern is unlikely, since it would be equivalent to _. @petrochenkov points out that it could be considered unnecessary verbose for the common case. There is (to my eyes) a certain consistency, though, since .. refers to "the stuff in the middle" and @ is typically used to give a name to stuff that would have been matched (even though .. here is not, strictly speaking, a pattern).

Also proposed was to just say "nothing", I think. Not sure what was meant entirely, could have been this:

  • [a, , c]
  • [a, b @, c]

Does this sound like a fairly complete summary? It seems like we're at the point where we effectively have to make a judgement call.

@SoniEx2

This comment has been minimized.

Copy link

commented Apr 26, 2018

I suggested "nothing" in the sense of:

  • [a, @, c]
  • [a, b@, c]

The @ would still exist in the no-binding case only for the sake of catching typos ([a,, c] when you meant [a, c]). Accidentally forgetting to add the @ (in the case of b@) is also something to consider, but arguably less likely to happen in practice.

Alternatively, do some A/B testing:

  1. Make nightly use @ and b @. Leave it in for a few nightlies, gather feedback on it.
  2. Switch to `` (empty) and b @. Leave it in for a few nightlies, gather feedback on it.
  3. Switch to .. and b ... Leave it in for a few nightlies, gather feedback on it.
  4. Switch to .. and b @ ... Leave it in for a few nightlies, gather feedback on it.

(We should do more of this btw. I know it intentionally breaks code, but is probably the easiest/fastest way to gather feedback on it. Also, it's nightly, don't expect nightly to be stable.)

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 26, 2018

@jethrogb

Can you explain why you need a pattern in this position?

This is a legitimate question.
My first reaction is that supporting arbitrary patterns in this position is status quo, it's already implemented and we'd have to change the implementation to specifically prohibit them.

Regarding the actual motivation, we need to support at least two kinds of patterns:

  • bindings (..x, ..ref x, ..mut x, ..ref mut x). Note that currently if ref x is supported in some position, then arbitrary patterns are supported in it as well, so prohibiting them in sub-slices would create an exception,
  • wildcard _, because .. means .._

, so it's kinda reasonable to support remaining patterns as well.

From all the remaining pattern kinds only inner slice patterns (..[inner, slice, patterns]), array constant patterns (..ARRAY_CONST) and parentheses (..([A, B, B])) can be used in this position without type errors, including forms with bindings (..whole_subslice @ [A, B, C]).
I do agree that these are probably rarely needed, but maybe people can come up with some examples when they are useful?
I suspect something like whole_slice_binding @ [first_elem_binding, ..] would be the most useful, except that this case (and patterns with bindings on both sides in general) is not supported by borrow checker yet. (But they certainly may be supported in the future!)

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 26, 2018

@jethrogb

The current RFC text calls it PAT but shows only identifier usage.

Yeah, examples with non-binding patterns are certainly something that needs to be included in the RFC, at least from what I see in the discussion thread the lack of such examples caused enough confusion.

@SoniEx2

This comment has been minimized.

Copy link

commented Apr 26, 2018

any thoughts on A/B-testing language changes?

@jethrogb

This comment has been minimized.

Copy link
Contributor

commented Apr 26, 2018

@petrochenkov

Note that currently if ref x is supported in some position, then arbitrary patterns are supported in it as well, so prohibiting them in sub-slices would create an exception,

Either I don't understand what you're saying, or that's not actually the case. Example:

fn main() {
    let a = (1, 2);
    let ref s @ _ = a; // ok
    let (ref s, _) @ _ = a; // error: expected one of `:`, `;`, or `=`, found `@`
}

How is this behavior different from what I'm proposing for subslice bindings?

From all the remaining pattern kinds only inner slice patterns (..[inner, slice, patterns]), array constant patterns (..ARRAY_CONST) and parentheses (..([A, B, B])) can be used in this position without type errors

These could all be expanded in-place and then you don't need a subslice pattern, correct?

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 29, 2018

@jethrogb

let ref s @ _ = a; // ok
let (ref s, _) @ _ = a; // error: expected one of `:`, `;`, or `=`, found `@`

Well, yes, except for this case.
What I really meant is that "if ref x @ _ (i.e. ref x desugared) is supported in some position, then arbitrary patterns are supported in it as well".

@petrochenkov

This comment has been minimized.

Copy link
Contributor Author

commented Apr 29, 2018

@jethrogb

These could all be expanded in-place and then you don't need a subslice pattern, correct?

Yes, constants and parentheses are pretty useless and can be "inlined".

As I said, something like ..ref ident @ [_] would be the most realistically useful application for patterns in this position, when you wan't to match a specific subslice (e.g. with one element), but at the same time give a name to this subslice. This case cannot be "inlined".
(With PAT @ .. syntax it would look like ref ident @ [_] @ ...)

@bstrie

This comment has been minimized.

Copy link
Contributor

commented May 13, 2018

Three things:

  1. If we're being forward-thinking enough to consider the syntax for open-ended range patterns in the pattern grammar, then we should also consider that patterns are intended to reflect construction, which means that any syntax that we design here is eventually going to have people asking for its use in non-pattern contexts as well. In other words, once this feature is stable I guarantee that people are going to want to be able to do e.g. let x = [2, 3, 4]; let y = [1, ..x, 5]; assert!([1,2,3,4,5]==y);. This isn't a semantic stretch, either; in Python this is called "unpacking", and x = [2, 3, 4]; y = [1, *x, 5] is completely legal. And of course, let x = [..y]; would be illegal in Rust because ranges already use that syntax. It's unfortunate that Foo {x, ..y} is today the closest thing we have to "unpacking" syntax, because it means we're already boxed into a corner; either we decide never to support unpacking, or we accept that the syntax is going to have to be inconsistent somewhere.

  2. Changing tack, given that [1, foo @ _, 3] (intentional underscore) is a completely legal pattern today, I submit that [1, foo @ .., 3] should be made to Just Work; in other words, we lift the arbitrary restriction on the .. pattern that today prevents tuple patterns from being able to say (1, foo @ .., 2) (the restriction can remain for struct patterns, because nominal types). This should happen regardless of if we also want to have a dedicated syntax for subslice patterns, in the same way that [1, foo @ _, 3] works despite the fact that we have dedicated syntax for that of the form [1, foo, 3]. By that point of view, not only does this make the foo @ .. proposal look like the most conservative path forward, it also allows us to completely punt on what unpacking syntax might look like, sidestepping any collateral bikeshedding. Consider my vote in favor of foo @ ..; though I admit at first I wasn't a fan, it really does seem like the most sensible option -- and more importantly, it's forward-compatible with later coming up with alternative syntax without seeming like redundancy (it's a natural extension of @, after all). And who knows, maybe people will actually stop being afraid of the @ pattern if we make it more common. :P

  3. Although I hope that the above is convincing enough, if we really must have dedicated syntax for this then I submit ..ident.., as in [foo, ..bar.., qux], since it at least preserves some similarity to the existing "skip multiple things" syntax, and when people start asking for an unpacking RFC it wouldn't be so dissimilar from the functional struct update syntax that it would give me conniptions. :) That said, this is also a discussion that we could fully leave for the future if we implement my suggestion in the prior paragraph.

@SoniEx2

This comment has been minimized.

Copy link

commented May 14, 2018

And who knows, maybe people will actually stop being afraid of the @ pattern if we make it more common. :P

I'd like to plug #2272 and rust-lang/rust-clippy#2378. We should make ppl afraid of non-@ patterns instead.

@Centril Centril removed the I-nominated label Jun 6, 2019

@Centril

This comment has been minimized.

Copy link
Member

commented Jun 7, 2019

@rfcbot merge

We briefly discussed this RFC on the language team meeting yesterday. Those present agreed with the general design of this RFC in terms of syntax. Personally, I agree that using .. to denote "the rest" and (ref mut?)? x @ PAT to bind the rest to a name seems natural and follows what one would expect given the current language. Note that we not discuss the merits of sub-slice pattern matching as a feature itself as this is a preexisting feature before 1.0. See #495 as historical background.

Those present also agreed with having .. be a full pattern syntactically to have a simpler grammar and for the benefit of macros. To that end, I also propose that we merge #2707 and that we then implement and track these features together.

@rfcbot

This comment has been minimized.

Copy link

commented Jun 7, 2019

Team member @Centril has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@Centril

This comment has been minimized.

Copy link
Member

commented Jun 7, 2019

N.B. I have checked Aaron's box as they are on leave.

Show resolved Hide resolved text/0000-subslice-pattern-syntax.md Outdated
Show resolved Hide resolved text/0000-subslice-pattern-syntax.md Outdated
Show resolved Hide resolved text/0000-subslice-pattern-syntax.md Outdated

@Centril Centril added the I-nominated label Jun 7, 2019

@petrochenkov petrochenkov force-pushed the petrochenkov:subsl branch from 13e71bb to b9e0820 Jun 8, 2019

@rfcbot

This comment has been minimized.

Copy link

commented Jun 20, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Copy link

commented Jun 30, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@Centril Centril merged commit 60b973a into rust-lang:master Jun 30, 2019

@Centril

This comment has been minimized.

Copy link
Member

commented Jun 30, 2019

🎉 Huzzah! This RFC has been merged! 🎉

Tracking issue: rust-lang/rust#62254

@petrochenkov petrochenkov deleted the petrochenkov:subsl branch Jul 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.