Exhaustive integer matching #50912
rust-highfive assigned michaelwoerister May 20, 2018
(rust_highfive has picked a reviewer for you, use r? to override)
rust-highfive added the S-waiting-on-review label May 20, 2018
This was referenced May 20, 2018
kennytm added the T-lang label May 20, 2018
scottmcm reviewed May 21, 2018
I'm definitely fine with merging this without an RFC (feature-gated of course, as you did).
This is amazing even if all it did was give that error instead of just "pattern
// Let's test other types too!
match '\u{0}' {
    '\u{0}' ..= char::MAX => {} // ok
scottmcm (Member), May 21, 2018:
Since char is a USV, it doesn't actually need to cover that entire range. If it's feasible, could this handle the example in rust-lang/rfcs#1550 (comment) too? (Feel free to punt if hard.)
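The point about Unicode scalar values can be illustrated with a small standalone sketch. It assumes a hypothetical helper, `char_ranges`, that is not part of this PR: every valid `char` falls into one of two inclusive ranges, because the surrogate code points `0xD800..=0xDFFF` are excluded.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper (not from the PR): the two inclusive ranges of
/// Unicode scalar values, i.e. every valid `char`, skipping the surrogates.
fn char_ranges() -> [RangeInclusive<u32>; 2] {
    [0x0000..=0xD7FF, 0xE000..=0x10FFFF]
}

/// Returns true if `n` lies in one of the two scalar-value ranges.
fn is_scalar_value(n: u32) -> bool {
    char_ranges().iter().any(|r| r.contains(&n))
}
```

Under this view, a `match` on `char` would only need to cover these two ranges rather than everything from `'\u{0}'` to `char::MAX` as one block.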
  --> $DIR/exhaustive_integer_patterns.rs:50:11
   |
LL |     match x { //~ ERROR non-exhaustive patterns
   |           ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered
scottmcm (Member), May 21, 2018:
Could adjacent ranges be merged, or at least not overlap? 121 is in two of the uncovered ranges, and it would be nice if the message was "patterns -128i8...-5i8 and 120i8...127i8 not covered" instead.
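One way to get the merged message is to coalesce the uncovered ranges before reporting them. The following is a minimal sketch, assuming a hypothetical `merge_ranges` helper over `(lo, hi)` pairs; it is not the code from this PR.

```rust
/// Merges overlapping or adjacent inclusive ranges given as `(lo, hi)` pairs.
/// Hypothetical helper for illustration; the PR's own code may differ.
fn merge_ranges(mut ranges: Vec<(i128, i128)>) -> Vec<(i128, i128)> {
    // Sorting by lower bound lets us merge in a single pass.
    ranges.sort();
    let mut merged: Vec<(i128, i128)> = Vec::new();
    for (lo, hi) in ranges {
        if let Some(prev) = merged.last_mut() {
            // Overlapping (`lo <= prev.1`) or adjacent (`lo == prev.1 + 1`).
            if lo <= prev.1.saturating_add(1) {
                prev.1 = prev.1.max(hi);
                continue;
            }
        }
        merged.push((lo, hi));
    }
    merged
}
```

Applied to the three ranges in the error above, `(-128, -5)`, `(120, 121)` and `(121, 127)`, this yields `(-128, -5)` and `(120, 127)`.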
LL |     match x { //~ ERROR non-exhaustive patterns
   |           ^ patterns `-128i8...-5i8`, `120i8...121i8` and `121i8...127i8` not covered

error: aborting due to 5 previous errors
scottmcm (Member), May 21, 2018:
All the holes in the examples are non-singleton. Consider a UI test for something like -127i8..=127 or match 0 { i16::MIN..=-1 => {} 1..=i16::MAX => {} }.
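To see what a singleton hole looks like, here is a brute-force sketch, assuming a hypothetical `uncovered_i8` helper that is for illustration only and is not how the compiler checks exhaustiveness. It lists every `i8` value that no arm covers.

```rust
use std::ops::RangeInclusive;

/// Brute-force check (illustrative only): every `i8` value that is not
/// covered by any of the given inclusive ranges.
fn uncovered_i8(arms: &[RangeInclusive<i8>]) -> Vec<i8> {
    (i8::MIN..=i8::MAX)
        .filter(|v| !arms.iter().any(|r| r.contains(v)))
        .collect()
}
```

For the single arm `-127i8..=127` the only uncovered value is `-128`, and for the `i8` analogue of the second suggestion (`i8::MIN..=-1` plus `1..=i8::MAX`) the only uncovered value is `0`.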
value_constructors = true;
vec![ConstantRange(ty::Const::from_bits(cx.tcx, min, cx.tcx.types.char),
                   ty::Const::from_bits(cx.tcx, max, cx.tcx.types.char),
                   RangeEnd::Included)]
scottmcm (Member), May 21, 2018:
Hopefully the char request is just to initialize this as a two-element vec instead...
r? @eddyb maybe?
rust-highfive assigned eddyb and unassigned michaelwoerister May 21, 2018
eddyb reviewed May 21, 2018
I16 => min_max_ty!(i16, u16, cx.tcx.types.i16),
I32 => min_max_ty!(i32, u32, cx.tcx.types.i32),
I64 => min_max_ty!(i64, u64, cx.tcx.types.i64),
I128 => min_max_ty!(i128, u128, cx.tcx.types.i128),
eddyb (Member), May 21, 2018:
cc @oli-obk This sort of thing can be done from the bit width (which you can get from the layout of these types).
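Deriving the bounds from a bit width, rather than from one macro arm per type, could look roughly like the sketch below. The function names `signed_min_max` and `unsigned_max` are hypothetical; in the compiler the width would come from the type's layout for the target, which this standalone example simply takes as a parameter.

```rust
/// Returns `(min, max)` of a two's-complement signed integer `bits` wide.
/// Hypothetical helper; `bits` must be in `1..=128`.
fn signed_min_max(bits: u32) -> (i128, i128) {
    assert!((1..=128).contains(&bits));
    // Arithmetic right shift of the 128-bit extremes drops the unused
    // high bits, and is still well-defined when `bits == 128` (shift by 0).
    let unused = 128 - bits;
    (i128::MIN >> unused, i128::MAX >> unused)
}

/// Returns the maximum of an unsigned integer `bits` wide (`1..=128`).
fn unsigned_max(bits: u32) -> u128 {
    assert!((1..=128).contains(&bits));
    u128::MAX >> (128 - bits)
}
```

Shifting down from the 128-bit extremes sidesteps the overflow that `1 << bits` would hit at `bits == 128`.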
eddyb reviewed May 21, 2018
struct Interval<'tcx> {
    pub lo: u128,
    pub hi: u128,
    pub ty: Ty<'tcx>,
eddyb reviewed May 21, 2018
/// An inclusive interval, used for precise integer exhaustiveness checking.
struct Interval<'tcx> {
    pub lo: u128,
    pub hi: u128,
eddyb reviewed May 21, 2018
remaining_ranges.into_iter().map(|(lo, hi)| {
    let (lo, hi) = Interval::offset_sign(ty, (lo, hi), false);
    ConstantRange(ty::Const::from_bits(cx.tcx, lo, ty),
                  ty::Const::from_bits(cx.tcx, hi, ty),
eddyb (Member), May 21, 2018:
Should ConstantRange use ty::Const? Seems expensive, and useless unless the ty::Const is Bits (cc @oli-obk).
oli-obk (Contributor), May 21, 2018:
We should definitely be using ConstValue instead of ty::Const. I put it on my TODO list
eddyb (Member), May 21, 2018:
ConstantRange can only hold integers, so ConstValue shouldn't be needed, right?
oli-obk (Contributor), May 21, 2018:
Probably. My TODO already says "ConstValue or smaller" ;). That should not block this PR though, since it's a preexisting issue.
eddyb reviewed May 21, 2018
// The pattern intersects the middle of the subrange,
// so we create two ranges either side of the intersection.)
remaining_ranges.push((subrange_lo, pat_interval.lo));
remaining_ranges.push((pat_interval.hi, subrange_hi));
eddyb reviewed May 21, 2018
for (subrange_lo, subrange_hi) in ranges {
    if pat_interval.lo > subrange_hi || pat_interval.hi < subrange_lo {
        // The pattern doesn't intersect with the subrange at all,
        // so the subrange remains untouched.
eddyb (Member), May 21, 2018:
You could construct the intersection by i = a.start.max(b.start)..=a.end.min(b.end), and the two halves of the subtraction a - b by sl = a.start..=i.start-1 and sh = i.end+1..=a.end (be careful with overflow).
AFAIK, the empty condition is range.start > range.end (for inclusive ranges).
Since I don't think you need the intersection by itself, the two halves can be:
sl = a.start..=a.start.max(b.start)-1 and
sh = a.end.min(b.end)+1..=a.end
(when a.start.max(b.start) <= a.end.min(b.end), otherwise sl = sh = a)
Therefore:
if a.start > b.end || b.start > a.end {
    // No intersection.
    remaining_ranges.push(a);
} else {
    // Overflow is now not a problem, because of the conditions.
    if b.start > a.start {
        remaining_ranges.push(a.start..=b.start-1);
    }
    if b.end < a.end {
        remaining_ranges.push(b.end+1..=a.end);
    }
}
I've just checked and your code corresponds to this, but you have the 4 possibilities of my two small ifs expanded out - my main suggestion from this comment would be to orthogonalize those.
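The subtraction described above can be written as a standalone function. This is a minimal sketch over `RangeInclusive<u128>`, assuming a hypothetical `subtract` helper and non-empty input ranges; it mirrors the two independent `if`s rather than the PR's actual code.

```rust
use std::ops::RangeInclusive;

/// Computes `a - b` for non-empty inclusive ranges, returning the zero,
/// one or two pieces of `a` that `b` does not cover.
fn subtract(
    a: RangeInclusive<u128>,
    b: &RangeInclusive<u128>,
) -> Vec<RangeInclusive<u128>> {
    let mut remaining = Vec::new();
    if a.start() > b.end() || b.start() > a.end() {
        // No intersection: `a` survives untouched.
        remaining.push(a);
    } else {
        if b.start() > a.start() {
            // `b.start() > a.start() >= 0`, so `- 1` cannot underflow.
            remaining.push(*a.start()..=*b.start() - 1);
        }
        if b.end() < a.end() {
            // `b.end() < a.end() <= u128::MAX`, so `+ 1` cannot overflow.
            remaining.push(*b.end() + 1..=*a.end());
        }
    }
    remaining
}
```

Because each bound is only adjusted inside the branch whose condition rules out the extreme value, no checked arithmetic is needed, even when `a` spans all of `u128`.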
eddyb reviewed May 21, 2018
// `missing_ctors` are those that should have appeared
// as patterns in the `match` expression, but did not.
let mut missing_ctors = vec![];
'req: for req_ctor in all_ctors.clone() {
I left some nit-picky comments about implementation details, but r? @nikomatsakis
rust-highfive assigned nikomatsakis and unassigned eddyb May 21, 2018
eddyb reviewed May 21, 2018
    }
}
let (min, max, ty) = match int_ty {
    Isize => min_max_ty!(isize, usize, cx.tcx.types.isize),
eddyb (Member), May 21, 2018:
Oh, this is broken, because it's using the host's usize / isize bitwidth instead of the target's.
I've addressed most of the comments, but there's still the issue with
varkor reviewed May 22, 2018
  --> $DIR/exhaustive_integer_patterns.rs:50:11
   |
LL |     match x { //~ ERROR non-exhaustive patterns
   |           ^ patterns `-128i8...-6i8` and `122i8...127i8` not covered
eddyb reviewed May 22, 2018
    U128 => ( u128::MAX as u128, cx.tcx.types.u128),
});
let min_max_ty = |sty| {
    let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(sty))
eddyb (Member), May 22, 2018:
You should use cx.param_env or something. Also, you don't need to match on uint_ty at all! Pass pcx.ty instead.
eddyb reviewed May 22, 2018
if pat_interval_hi < subrange_hi {
    // The pattern intersects a lower section of the
    // subrange, so an upper section will remain.
    remaining_ranges.push((pat_interval_hi + 1, subrange_hi));
eddyb (Member), May 22, 2018:
remaining_ranges could be a Vec<RangeInclusive<u128>>, i.e. use ..= syntax.
eddyb reviewed May 23, 2018
@@ -34,6 +34,9 @@ use arena::TypedArena;
use std::cmp::{self, Ordering};
use std::fmt;
use std::iter::{FromIterator, IntoIterator, repeat};
use std::{char, usize, u8, u16, u32, u64, u128, isize, i8, i16, i32, i64, i128};
eddyb reviewed May 23, 2018
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
    .unwrap().size.bits() as u32;
let shift = 1u128.overflowing_shl(size);
let max = shift.0.wrapping_sub(1 + (shift.1 as u128));
eddyb reviewed May 23, 2018
ty::TyInt(_) if exhaustive_integer_patterns => {
    let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
        .unwrap().size.bits() as u128;
    let min = (1u128 << (size - 1)).wrapping_neg();
eddyb reviewed May 23, 2018
let size = cx.tcx.layout_of(ty::ParamEnv::reveal_all().and(pcx.ty))
    .unwrap().size.bits() as u128;
let min = (1u128 << (size - 1)).wrapping_neg();
let max = (1u128 << (size - 1)).wrapping_sub(1);
Everything seems to be working correctly now, as far as I can tell. @eddyb had some suggestions for refactoring
arielb1 reviewed Aug 20, 2018
        Ok(false) => None,
        Err(ErrorReported) => None,
    }
    // If the constructor is a single value, we add a row to the specialised matrix
This comment was marked as resolved.
arielb1 reviewed Aug 20, 2018
        Ok(false) => None,
        Err(ErrorReported) => None,
    }
    PatternKind::Range { .. } => {
This comment was marked as resolved.
Nice PR overall. Just need to take a look at the range-splitting thing and write a mega-comment there (you could try to write one yourself).
varkor added some commits Aug 20, 2018
varkor force-pushed the varkor:exhaustive-integer-matching branch from b3f6184 to 61b6363 Aug 20, 2018
I've tried to improve the comments in
arielb1 reviewed Aug 21, 2018
}
let c = match a.1 {
    Endpoint::Start => a.0,
    Endpoint::End | Endpoint::Both => a.0 + 1,
arielb1 (Contributor), Aug 21, 2018:
How does this handle integer overflow (i.e. a.0 = u128::MAX)? Worth a test case, at least.
varkor (Author, Member), Aug 21, 2018 (edited):
a.0 < b.0 strictly, so overflow shouldn't be possible. There are already test cases for covering the entire range of u128 and i128, so I think that covers the bases already. I'll add a comment to this effect.
arielb1 (Contributor), Aug 21, 2018:
A: integer overflow can't occur, because only the last point can be u128::MAX, and only the first point can be 0.
arielb1 reviewed Aug 21, 2018
// endpoint a point corresponds to. Whenever a point corresponds to both a start
// and an end, then we create a unit range for it.
#[derive(PartialEq, Clone, Copy, Debug)]
enum Endpoint {
arielb1 (Contributor), Aug 21, 2018:
A potential simplification:
I just noticed that range boundaries are always "in between" numbers - an Endpoint::Start is a boundary between p-1 and p, and an Endpoint::End is between p and p+1 (that means that an Endpoint::End is equivalent to an Endpoint::Start that immediately follows it). You could do something based on that.
arielb1 (Contributor), Aug 21, 2018:
e.g., I think this is a correct implementation:
/// Represents a border between 2 integers. Because of the "fencepost error",
/// there are 2^128+1 such borders.
#[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq)]
enum Border {
    JustBefore(u128),
    AfterU128Max
}
impl Border {
    fn before(n: u128) -> Self {
        Border::JustBefore(n)
    }
    fn after(n: u128) -> Self {
        // `n + 1` overflows only for `u128::MAX`, which gets its own variant.
        match n.checked_add(1) {
            Some(m) => Border::JustBefore(m),
            None => Border::AfterU128Max
        }
    }
    fn range_borders(r: IntRange<'_>) -> Vec<Self> {
        let (r_lo, r_hi) = r.range.into_inner();
        vec![Border::before(r_lo), Border::after(r_hi)]
    }
    // Return the integers between `self` and `to` if `self < to`, `None` otherwise.
    fn range_to<'tcx>(self, to: Self, ty: Ty<'tcx>) -> Option<IntRange<'tcx>> {
        match (self, to) {
            (Border::JustBefore(n), Border::JustBefore(m)) => {
                IntRange::from_interval(ty, n, m, RangeEnd::Excluded)
            }
            (Border::JustBefore(n), Border::AfterU128Max) => {
                IntRange::from_interval(ty, n, u128::MAX, RangeEnd::Included)
            }
            (Border::AfterU128Max, _) => None
        }
    }
}
// `borders` is the set of borders between equivalence classes - each equivalence
// class is between 2 borders.
let row_borders = m.iter()
    .flat_map(|row| IntRange::from_pat(tcx, row[0]))
    .flat_map(|range| ctor_range.intersection(&range))
    .flat_map(|range| Border::range_borders(range));
let range_borders = Border::range_borders(ctor_range);
let mut borders: Vec<Border> = row_borders.chain(range_borders).collect();
borders.sort();
// Each pair of neighbouring borders delimits one (possibly empty) range.
let ranges = borders.windows(2).filter_map(|window| {
    window[0].range_to(window[1], ty)
});
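The border idea can be tried outside the compiler with plain `RangeInclusive<u128>` values. The sketch below is an assumption-laden stand-in: `split` is a hypothetical function, and `RangeInclusive<u128>` replaces the compiler's `IntRange`, so no `Ty` is threaded through.

```rust
use std::ops::RangeInclusive;

/// A border between two adjacent integers. For `u128` there are
/// 2^128 + 1 of them, hence the extra variant.
#[derive(Copy, Clone, Debug, PartialOrd, Ord, PartialEq, Eq)]
enum Border {
    JustBefore(u128),
    AfterU128Max,
}

impl Border {
    fn after(n: u128) -> Self {
        match n.checked_add(1) {
            Some(m) => Border::JustBefore(m),
            None => Border::AfterU128Max,
        }
    }

    /// The integers between two borders, or `None` if `self >= to`.
    fn range_to(self, to: Self) -> Option<RangeInclusive<u128>> {
        match (self, to) {
            (Border::JustBefore(n), Border::JustBefore(m)) if n < m => Some(n..=m - 1),
            (Border::JustBefore(n), Border::AfterU128Max) => Some(n..=u128::MAX),
            _ => None,
        }
    }
}

/// Splits `ctor` at every border of the `rows` ranges (clipped to `ctor`).
fn split(
    ctor: RangeInclusive<u128>,
    rows: &[RangeInclusive<u128>],
) -> Vec<RangeInclusive<u128>> {
    let mut borders = vec![Border::JustBefore(*ctor.start()), Border::after(*ctor.end())];
    for r in rows {
        // Intersect the row's range with `ctor`; skip it if they are disjoint.
        let lo = (*r.start()).max(*ctor.start());
        let hi = (*r.end()).min(*ctor.end());
        if lo <= hi {
            borders.push(Border::JustBefore(lo));
            borders.push(Border::after(hi));
        }
    }
    // The derived `Ord` sorts `JustBefore(n)` by `n`, with `AfterU128Max` last.
    borders.sort();
    // Duplicate borders produce empty windows, which `range_to` drops.
    borders.windows(2).filter_map(|w| w[0].range_to(w[1])).collect()
}
```

For example, splitting `0..=255` against rows `10..=20` and `15..=30` gives the five pieces `0..=9`, `10..=14`, `15..=20`, `21..=30` and `31..=255`, each of which is either fully inside or fully outside every row.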
varkor (Author, Member), Aug 21, 2018:
I do think this is cleaner :) The logic looks good to me too — I'm going to give it a go.
LGTM. If you think borders are cleaner enough than endpoints, you can use that, otherwise r=me. (Also, add a test that
varkor added some commits Aug 21, 2018
@bors r+
bors added S-waiting-on-bors and removed S-waiting-on-review labels Aug 21, 2018
@arielb1: thank you for all your detailed comments and pointing me in the right direction!
bors added a commit that referenced this pull request Aug 22, 2018
varkor commented May 20, 2018 (edited)
This adds a new feature flag exhaustive_integer_patterns that enables exhaustive matching of integer types by their values. For example, the following is now accepted:
This matching is permitted on all integer (signed/unsigned and char) types. Sensible error messages are also provided. For example:
results in:
This implements rust-lang/rfcs#1550 for #50907. While there hasn't been a full RFC for this feature, it was suggested that this might be a feature that obviously complements the existing exhaustiveness checks (e.g. for bool) and so a feature gate would be sufficient for now.
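The examples from the description did not survive in this copy of the page. As a stand-in, here is a hypothetical illustration (the functions `classify` and `sign` are not from the PR) of matches that cover an integer type completely without a wildcard arm. It assumes a compiler where exhaustive integer matching is available; at the time of this PR that meant nightly with `#![feature(exhaustive_integer_patterns)]`.

```rust
/// Classifies a byte without a `_` arm: the arms together cover all of `u8`.
fn classify(x: u8) -> &'static str {
    match x {
        0 => "zero",
        1..=127 => "low",
        128..=255 => "high",
    }
}

/// Returns the sign of an `i8`; again no wildcard arm is needed.
fn sign(x: i8) -> i8 {
    match x {
        i8::MIN..=-1 => -1,
        0 => 0,
        1..=i8::MAX => 1,
    }
}
```

Removing any one arm from either function should produce a "non-exhaustive patterns" error naming the uncovered values, in the style of the UI test output quoted in the review comments.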