Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict constants in patterns #1445

Merged

Conversation

Projects
None yet
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 6, 2016

Feature-gate the use of constants in patterns unless those constants have simple types, like integers, booleans, and characters. The semantics of constants in general were never widely discussed and the compiler's current implementation is not broadly agreed upon (though it has many proponents). The intention of adding a feature-gate is to give us time to discuss and settle on the desired semantics in an "affirmative" way.

Because the compiler currently accepts a larger set of constants, this is a backwards incompatible change. This is justified as part of the "underspecified language semantics" clause of RFC 1122. A crater run found 14 regressions on crates.io, which suggests that the impact of this change on real code would be minimal.

Note: this was also discussed on an internals thread. Major points from that thread are summarized either inline or in alternatives.

Rendered view.

@nikomatsakis nikomatsakis added the T-lang label Jan 6, 2016

| u8 | u16 | u32 | u64 | usize // unsigned integers
| char // characters
| bool // booleans
| (B, ..., B) // tuples of builtin types

This comment has been minimized.

@nagisa

nagisa Jan 6, 2016

Contributor

Might be useful to consider including &str and arrays as well, since they have a well defined “primitive” memory-representation equality.


Here, although it may well be that `T::X == T::Y`, we can't know for
sure. So, for consistency, we may wish to treat all constants opaquely
regardless of whether we are in a generic context or not.

This comment has been minimized.

@nagisa

nagisa Jan 6, 2016

Contributor

This is a pretty good point. If we opt to do the checking for regular constants, the contents of constant becomes a public API – changing these becomes a breaking change, which is pretty much counters almost whole consts’ purpose.

This comment has been minimized.

@oli-obk

oli-obk Jan 11, 2016

Contributor

Alternatively, in the future, we could forbid this kind of matching if it cannot be guaranteed that T::X != T::Y (which could be generically guaranteed by allowing such a statement in where clauses)

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jan 6, 2016

Pretty strongly in favour.

@retep998

This comment has been minimized.

Copy link
Member

retep998 commented Jan 6, 2016

This RFC is a good idea. Since it only affects matching on constants of non-primitive types which is quite the sticky situation, I am in favor of this RFC.

-- the same issues arise if you consider exhaustiveness checking.

On the other hand, it feels very silly for the compiler not to
understand that `match some_bool { true => ..., false => ... }` is

This comment has been minimized.

@petrochenkov

petrochenkov Jan 6, 2016

Contributor

I'd say true and false feel more like unit variants of "enum bool { false, true }" than opaque constants.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

This breaks lots of code of this kind:

pub struct Errno(pub c_int);

pub const Interrupted: Errno = Errno(EINTR);

My code contains at least 26 instances of this pattern (excluding flags) and an uncountable number of variants:

ack "pub struct .*\(pub .*\)" | grep -iv flags | wc -l

For instance, the Errno struct comes with 132 variants. How am I supposed to repair this?

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Jan 6, 2016

pub struct Errno(pub c_int);

Newtypes like this are exactly the pattern which is responsible for the most of breakage on crates.io and in rustc itself.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

I've also no idea what this has to do with constants.

match x {
    Errno(1) => 1,
    _ => 0,
}

if x == Errno(1) { 1 } else { 0 }

are not guaranteed to produce the same result either. Whether Errno(1) is behind a constant or not is irrelevant here.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

Of course the same also applies to enums. E.g.

enum A {
    X(u8),
    Y,
}

impl PartialEq for A {
    fn partial_cmp(&self, other: &A) -> bool { false }
}

Matching will return results that are incompatible with the PartialEq results.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

Given that the behavior for enums is already fixed (at least the RFC doesn't suggest otherwise), and given that newtypes with all fields public are very similar to enums, it seems that the current behavior (which agrees with the enum behavior) is already the expected behavior.

@nagisa

This comment has been minimized.

Copy link
Contributor

nagisa commented Jan 6, 2016

@mahkoh the problem here is using enums has a well specified structural equivalency, but using a constant doesn’t make the equivalency used obvious. There’s a bunch of additional reasons outlined by the RFC why structural equivalency with constants is not satisfactory.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@mahkoh

I've also no idea what this has to do with constants.

I see a difference between matching on a constant C (which is defined as Errno(1)) and matching on Errno(1) directly in the pattern. Basically, to me, the pattern translates into a set of "test-and-extract" operations that will be performed. So, the pattern Errno(pattern>) compiles to "test that the variant is Errno and extract the value". We then recursively apply <pattern> to that value. Sometimes there are no values to extract: if you match a nullary enum variant like None, it just means "test that the variant is None". Along these lines, matching a constant pattern like C (resp. 1) corresponds to "test that the value is equal (for some definition of equal) to C (resp. 1) and extract no values".

Looking at it like this, there is no reason to expect that

match foo { Errno(1) => ... }

and

const C: Errno = Errno(1);
match foo { C => ... }

would do the same thing, just as you would not necessarily expect that:

if errno.0 == 1 { ... }

and

if errno == C { ... }

would do the same thing.

PS, I am summarizing something I wrote on the internals thread, just for reference.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@mahkoh

This breaks lots of code of this kind

Yes, the proposal is backwards incompatible, and this is exactly the kind of cases that would no longer work. You would have to either use the feature-gate or translate such code from match foo { BAR => ... } to match foo { c if c == BAR => ... }.

UPDATE: To be clear, I don't want to break code either, and I also really want constants of user-defined types to work in patterns! I don't mean to sound glib. But, based on the crater results, it does seem that we have room here to rollback to semantics everyone can agree on. This would allow us to have a productive discussion later on focusing on what the best overall semantics ought to be for constants in a pattern.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis

I'm fine with a feature gate if it can be enabled at the point where the type is defined. E.g.

#[structural_match]
pub struct Errno(pub c_int);
@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@mahkoh

I'm fine with a feature gate if it can be enabled at the point where the type is defined.

By this, do you mean that the clients who are matching on Errno values would not need to use the feature-gate?

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis Exactly.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@mahkoh Hmm. It is normally something we would not allow, since if we decided to use semantic equality (for example), then there might not be an equivalent behavior to #[structural_match] in the future (and I wouldn't want to preserve the attribute). OTOH, since I suspect you do define Eq for errno to act exactly as structural match acts, that might be ok (modulo exhaustiveness). I have to think about it.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis As long as the attribute is behind a feature gate, breaking it again doesn't seem to be a problem.

@seanmonstar

This comment has been minimized.

Copy link
Contributor

seanmonstar commented Jan 6, 2016

This sort of breaking change seems like it should be a semver major bump. It's not a security, bug, or soundness fix. It's instead trying to improve reasoning about code. Not that the goal is bad, but that breakage for that goal seems unacceptable for the 1.x versions.

@mahkoh

This comment has been minimized.

Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis Whether the behavior is the current one or one that uses PartialEq, the code I'm worried about behaves the same. This attribute is simply supposed to bridge the gap. Maybe call it #[deprecated_const_match] if that makes it more obvious.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@seanmonstar

This sort of breaking change seems like it should be a semver major bump. It's not a security, bug, or soundness fix. It's instead trying to improve reasoning about code. Not that the goal is bad, but that breakage for that goal seems unacceptable for the 1.x versions.

Personally, I consider this a bug fix. That is, I did not expect that constants of arbitrary types should be matchable. In fact, I opened a bug about it before 1.0, but that bug was accidentally closed when I wrote a comment like "does not try to fix #20489", and hence it dropped off of my radar when triaging for "things that ought to be feature-gated before 1.0". However, clearly there is room for disagreement here.

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Jan 6, 2016

Curiously, #[structural_match] already exists and is called #[derive(PartialEq)].
#[derive(PartialEq)] is essentially an assertion that structural and semantic matching do the same thing, so there's no choice left what to do when performing a match and we can unambiguously generate the same code as today.
Maybe #[derive(PartialEq)] should not only generate some code during expansion, but also emit a flag this_adt_is_usable_in_const_pattern_matching,which is considered in later stages of compilation.
this_adt_is_usable_in_const_pattern_matching can become a separate attribute eventually if necessary, but generating it as a part of #[derive(PartialEq)] allows to avoid breakage and generally looks like a reasonable way forward, covering most of practical cases.

Exhaustiveness checking is a separate orthogonal concern, it still may be better to turn it off for constants regardless of this_adt_is_usable_in_const_pattern_matching

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 6, 2016

@petrochenkov

Curiously, #[structural_match] already exists and is called #[derive(PartialEq)].

That is mostly true, but only if all types embedded within the struct also #[derive(PartialEq)]. (Still, that's a very interesting thought.)

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Jan 7, 2016

@nikomatsakis
#[structural_match] should not affect nested things, otherwise you could make an "opaque" structure transparent/matchable by using a newtype:

`#[structural_match]`
struct Transparent(Opaque);

So, yes, it was implied, that the both #[structural_match] or #[derive(PartialEq)] have to be checked recursively.
In theory marking a structure containing non-#[structural_match] fields with #[structural_match] can be made an error, but derive is not smart enough to emit #[structural_match] conditionally to avoid this error, so I don't consider this variant.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Jan 8, 2016

@petrochenkov @mahkoh

So, on reflection, I quite like this idea. My assumption is that this would be something like #[fundamental] -- that is, an attribute that we intend to never stabilize but which lets us adopt the subset of semantics we know we want while we dicker about the remainder. In this case, using the semantics you proposed, we could ultimately resolve this in at least two ways:

  1. Removing the attribute and adopt the current "structural match" implementation.
  2. Removing the attribute and adopt the "semantic equality" interpretation. (*)

Shall I adjust the RFC? I think so.

(*) Actually, the attribute could even stay as a kind of performance optimization. That is, @pnkfelix and I have talked about the compiler recognizing when the PartialEq impl resulted from derive and generating better match code in that case, since its semantics are well understood. One could imagine adopting an unsafe attribute or something along those lines, which PartialEq implicitly provides.

@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Jan 8, 2016

#[structural_match] can probably be a (possibly unsafe) structural trait (OIBIT) and not an attribute.
Edit: It can't be an OIBIT, but it can be a normal trait, maybe even a sub-trait for PartialEq, kind of like Copy for Clone.

trait TrivialPartialEq: PartialEq {}
@petrochenkov

This comment has been minimized.

Copy link
Contributor

petrochenkov commented Jan 8, 2016

On interaction of #[structural_match] with floats.
(I assume that match can be used only with #[structural_match] types)
Edit: by "permitted in match" below I mean permitted in constants in match.

Variant 1.
The attribute gives hard guarantees about PartialEq properties.
In this case it can be used for optimizations, for example operators == for #[structural_match] types without padding can be translated into memcmps.
In this case floats simply can't be used in match, their equality comparison is not trivial.

Vartiant 2.
The attribute is a rubber stamp, it says "yeah, this can be used in match" and that's all. match in its turn performs a structural comparison OR runs partial_eq even if they do different things. In this case floats are allowed in match and compared structurally OR semantically, respectively.

Variant 3.
Floats are an exception, they are not #[structural_match], but permitted in match and compared structurally or semantically. Structures containing floats are not permitted in match,


Some bikeshedding, names for structural_match:
trivially_comparable, default_comparable - "comparable" can be confused with PartialOrd
trivial_equality
trivial_partial_eq (or TrivialPartialEq for a trait) - says exactly what it does, seems like a good if it provides hard guarantees.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Feb 5, 2016

Clean crater run finally completed. 6 regressions, as expected: https://gist.github.com/nikomatsakis/e714e4a824527e0ce5c9

@nikomatsakis nikomatsakis referenced this pull request Feb 5, 2016

Open

Restrict use of constants in patterns (RFC 1445) #31434

3 of 4 tasks complete

@nikomatsakis nikomatsakis merged commit 120c28b into rust-lang:master Feb 5, 2016

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Feb 6, 2016

Update: in a recent PR (rust-lang/rust#31277) span was updated to derive equality, so there should be fewer regressions than previously thought.

@mrmonday

This comment has been minimized.

Copy link

mrmonday commented Mar 31, 2016

Is there a nice way to deal with this, that doesn't require a loss of type safety, or far more verbose code? libpnet has a lot of code like:

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EtherType(pub u16);

pub const Ipv4: EtherType = EtherType(0x0800);
// ... list of constants ...
match some_ethertype {
    Ipv4 => {},
    Other => {}
}

The easy way to solve it is to remove the type safety, which is what @wictory's pull request did initially - these constants have been specifically designed like this though to minimise errors (there are lots of magical numbers for lots of different things), whilst still allowing the ability to use random values if you explicitly qualify it.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Mar 31, 2016

@mrmonday But that example compiles just fine (I added more variants to make sure there wasn't an exhaustiveness-related special casing).
@bluss pointed out on IRC that if you implement Eq manually you do get the errors, maybe that's what pnet is doing?

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Mar 31, 2016

Turns out that syntex partially expands #[derive] so the code in libsyntax adding #[structural_match] on #[derive(..., PartialEq, ..., Eq, ...)] doesn't trigger. I'll try to fix this in the compiler.

EDIT: hang on, those attributes aren't stable, how was any of that compiling at all?!

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Apr 4, 2016

@eddyb sigh. I can readily fix the code to have two distinct attributes, I was just being lazy.

@nikomatsakis

This comment has been minimized.

Copy link
Contributor Author

nikomatsakis commented Apr 4, 2016

Although, triggering on #[derive_Eq] is a bit harder. Hopefully not that hard. I'm trying to remember why I chose not to do so -- I feel like there was a reason for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.