Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upRFC: Generic integers #2581
Conversation
clarcharr
changed the title
Generic integers RFC
RFC: Generic integers
Oct 28, 2018
clarcharr
force-pushed the
clarcharr:generic_int
branch
from
fb23779
to
38e6de2
Oct 28, 2018
This comment has been minimized.
This comment has been minimized.
|
I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is to much stuff built in to the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently. Just a half-baked idea: We only build a #[repr(packed)]
struct int<const Width: usize> {
bits: [Bit; Width],
}
#[repr(packed)]
struct uint<const Width: usize> {
bits: [Bit; Width],
}with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance. |
This comment has been minimized.
This comment has been minimized.
Ekleog
commented
Oct 28, 2018
|
(independently from the remark above)
This RFC spends quite some time explaining the memory layout of `int<N>`
and `uint<N>` types. Is user code allowed to rely on this memory layout,
or is it not?
Intuitively it'd be better if the layout was not actually defined, to
allow for further optimizations, but the current text makes me believe
it is defined. The one case where layout would be mandatory would be for
a not-yet-written `#[repr(bitpacked)]` RFC, in my opinion.
Oh, and
At the time of writing, no known programming language offers this
level of integer generalisation, and if this RFC were to be accepted,
Rust would be the first.
Programming languages with dependent types (F*, etc.) do offer this, and
actually much more. :)
|
This comment has been minimized.
This comment has been minimized.
Well, you can represent a |
|
|
||
| ## Primitive behaviour | ||
|
|
||
| The compiler will have two new built-in integer types: `uint<N>` and `int<N>`, |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 28, 2018
Contributor
Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
I agree with you but I also wonder if this is the language most people would prefer to use. For example, would you consider Vec<T> to also be a family of types, or just a generic type?
This comment has been minimized.
This comment has been minimized.
Centril
Oct 30, 2018
Contributor
Well, the type constructor is really Vec and not Vec<T>; but "two new built-in generic integer types" is I think clear enough.
| The compiler will have two new built-in integer types: `uint<N>` and `int<N>`, | ||
| where `const N: usize`. These will alias to existing `uN` and `iN` types if `N` | ||
| is a power of two and no greater than 128. `usize` and `isize` remain separate | ||
| types due to coherence issues, and `bool` remains separate from `uint<1>` as it |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 28, 2018
Contributor
Elaborate on what those coherence issues are for unfamiliar readers?
This comment has been minimized.
This comment has been minimized.
rkruppe
Oct 28, 2018
Contributor
Traits can be implemented separately & differently for u32, u64, usize, etc. so unifying usize with uN for appropriate N would cause overlapping impls.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.
| For example, this means that `uint<48>` will take up 8 bytes and have an | ||
| alignment of 8, even though it only has 6 bytes of data. | ||
|
|
||
| `int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via | ||
| `as`, whereas most integer types can only be cast from `bool`. | ||
|
|
||
| For the moment, a monomorphisation error will occur if `N > 128`, to minimise |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 28, 2018
Contributor
This is my biggest concern; I don't think monomorphization of this magnitude errors belong in the language and I would like to see this changed before stabilization.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
Would you be willing to elaborate more on this? Why are these errors a problem?
This comment has been minimized.
This comment has been minimized.
Centril
Oct 30, 2018
Contributor
The nature of Rust's bounded polymorphism is that type checking should be modular so that you can check a generic function separately from the instantiation and then the instantiation of any parameters satisfying bounds should not result in errors.
This makes for more local error messages (you won't have post monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of N > 128 it is rather quite likely.
The general principle is that you declare up front the requirements (with bounds, etc.) to use / call an object and then you don't impose new and hidden requirements for certain values.
If you want to impose N > 128, then that should be explicitly required in the "signatures", e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...). Otherwise, it should work for all N: usize evaluable at compile time.
This comment has been minimized.
This comment has been minimized.
Nemo157
Oct 31, 2018
Contributor
e.g. you should state
struct uint<const N: usize> where N <= 128 { .. }(and on the impls...)
Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like
trait LessThan128 {}
struct IsLessThan128<const N: usize>;
impl LessThan128 for IsLessThan128<0> {}
⋮
impl LessThan128 for IsLessThan128<127> {}
struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }but
This comment has been minimized.
This comment has been minimized.
Centril
Oct 31, 2018
Contributor
@Nemo157 not possible with RFC 2000 but might be with future extensions.
This comment has been minimized.
This comment has been minimized.
rodrimati1992
Nov 1, 2018
•
This is a different formulation which might work(if you can use a const expression in associated types):
struct Bool<const B:bool>;
trait LessThan128<const N: usize> {
type IsIt;
}
impl<const N:usize> LessThan128<N> for () {
type IsIt=Bool<{N<128}>;
}
struct uint<const N: usize>
where (): LessThan128<N,IsIt=Bool<true>>
{ .. }
Edit:
If you want to include an error message in the type error,you can use this to print an error message in the type itself.
struct Str<const S:&'static str>;
struct Bool<const B:bool>;
struct Usize<const B:bool>;
trait Assert<const COND:bool,Msg>{}
impl<const COND:bool,Msg> Assert<COND> for ()
where
():AssertHelper<COND,Output=Bool<true>>,
{}
trait AssertHelper<const COND:bool,Msg>{
type Output;
}
impl<Msg> AssertHelper<true,Msg> for (){
type Output=Bool<true>;
}
impl<Msg> AssertHelper<false,Msg> for (){
type Output=Msg;
}
trait AssertLessThan128<const N:usize>{}
impl<const N:usize,const IS_IT:bool> AssertLessThan128<N> for ()
where
():
LessThan<N,128,IsIt=Bool<IS_IT>>+
Assert<IS_IT, (
Str<"uint cannot be constructed with a size larger than 128,the passed size is:",
Usize<N>>
) >
{}
trait LessThan<const L:usize,const R:usize> {
type IsIt;
}
impl<const L:usize,const R:usize> LessThan<L,R> for () {
type IsIt=Bool<{L<R}>;
}
struct uint<const N: usize>
where (): AssertLessThan128<N>
{ .. }
This comment has been minimized.
This comment has been minimized.
gnzlbg
Nov 4, 2018
Contributor
AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/
This comment has been minimized.
This comment has been minimized.
Centril
Nov 4, 2018
Contributor
fn poly<T>() {
let _n: [T; 10000000000000];
}
fn monomorphization_error() {
poly::<u8>(); // OK!
poly::<String>(); // BOOM!
}
fn main() {
monomorphization_error();
}| ## Standard library | ||
|
|
||
| Existing implementations for integer types should be annotated with | ||
| `default impl` as necessary, and most operations should defer to the |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
Essentially, your default impls are your base cases and the other cases will recursively rely upon them. For example, <uint<48>>::count_zeroes would ultimately expand to u64::count_zeroes minus 24. I'll try to elaborate more in the RFC text itself later.
This comment has been minimized.
This comment has been minimized.
Nemo157
Oct 31, 2018
Contributor
You mention that it should be default fn in the text but don't put it in the count_zeros example, including it to the example + showing one of the specialized implementations for a power of two could clarify this somewhat.
|
|
||
| Once const generics and specialisation are implemented and stable, almost all of | ||
| this could be offered as a crate which offers `uint<N>` and `int<N>` types. I | ||
| won't elaborate much on this because I feel that there are many other |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 28, 2018
Contributor
You should :) Do we know that it is actually implementable as a library?
This comment has been minimized.
This comment has been minimized.
Centril
added
T-lang
T-libs
labels
Oct 28, 2018
This comment has been minimized.
This comment has been minimized.
Ekleog
commented
Oct 28, 2018
|
> Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)
Well, you can represent a [`Fin : Nat -> Type`](https://github.com/idris-lang/Idris-dev/blob/master/libs/base/Data/Fin.idr) type constructor in dependent typing pretty easily, but those are quite inefficient...
Low* (a part of F* that deals with machine types) can represent those
and still compile to C, with for instance the type
```fstar
x:int_32{x >= -128l /\ x < 128l}
```
This type will be compiled (to C) as an `int32_t` type, after F* will
have proven that the value can indeed not be out of the `[-128, 128[`
bounds.
More details can be found about machine integers in F* on
[the F* wiki](https://github.com/FStarLang/FStar/wiki/Machine-integers)
|
| from `uint<N>` to `int<M>` or `uint<M + 1>`, where `M >= N`. | ||
|
|
||
| In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via | ||
| `as`, whereas most integer types can only be cast from `bool`. |
This comment has been minimized.
This comment has been minimized.
rkruppe
Oct 28, 2018
Contributor
So, this treats 1 and -1 as true depending on the signedness? I rather like the route of identifying true with -1 instead of 1, but it's not the route Rust has chosen so it might be a bit controversial. From another angle, while it's consistent with true being represented as a single 1 bit, it's also inconsistent with the fact that bool as iN (for currently existing iN) turns true into 1.
Is there motivation for providing these cases instead of just making people write x != 0, other than "we can"?
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
I completely forgot about sign extension when doing this and now that you mention it, it makes sense that if this were offered, then only uint<1> should cast to bool. Unless anyone has any objections when I next get around to revising the text I'll remove both casts to bool.
| `int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and | ||
| `uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of | ||
| this is that `i1` represents zero or *negative* one, even though LLVM and other | ||
| places use `i1` to refer to `u1`. This case is left as-is because generic code |
This comment has been minimized.
This comment has been minimized.
rkruppe
Oct 28, 2018
Contributor
Nit: integer types in LLVM don't have inherent signedness, they are bags of bits that are interpreted as signed or unsigned by individual operations, and i1 true is treated as -1 by signed operations (e.g., icmp slt i1 true, i1 false is true -- slt being signed less-than).
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
I didn't actually know this-- I'll be sure to update the text to be accurate there.
| Because sign extension will always be applied, it's safe for the compiler to | ||
| internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all | ||
| computations. As a concrete example, this means that adding two `uint<48>` | ||
| values will work exactly like adding two `u64` values, generating exactly the |
This comment has been minimized.
This comment has been minimized.
rkruppe
Oct 28, 2018
Contributor
This needs more thorough discussion. To take this example, an u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as a plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
•
Contributor
I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.
This is certainly a case for adding add_unchecked and co. as suggested by… some other RFC issue I don't have the time to look up right now.
Either way, I'll definitely take some time this weekend to revise this section.
| Primitive operations on `int<N>` and `uint<N>` should work exactly like they do | ||
| on `int<N>` and `uint<N>`: overflows should panic when debug assertions are | ||
| enabled, but ignored when they are not. In general, `uint<N>` will be | ||
| zero-extended to the next power of two, and `int<N>` will be sign-extended to |
This comment has been minimized.
This comment has been minimized.
rkruppe
Oct 28, 2018
•
Contributor
Is this intended to be a user-facing guarantee? e.g. given x: &uint<48> are the following two guaranteed to give the same result:
unsafe { *(x as *const uint<48> as *const u64) }*x as u64
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 29, 2018
Contributor
I would not guarantee this, considering how x as *const uN as *const uM only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.
I'll try and remember to clarify this in the text.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I didn't see any discussion within the RFC about not using It seems like it should at least be mentioned in the alternatives section... |
This comment has been minimized.
This comment has been minimized.
jswrenn
commented
Oct 29, 2018
@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC: #[repr(bitfields)]
struct MipsInstruction {
opcode: u6,
rs: u5,
rt: u5,
rd: u5,
shift: u5,
function: u6,
}How often does it really make sense to add or subtract from an With @mark-i-m's suggestion, we can have a more well-typed version of this struct: #[repr(packed)]
struct MipsInstruction {
opcode: [Bit; 6],
rs: u5,
rt: u5,
rd: u5,
shift: u5,
function: [Bit; 6],
}It never makes sense to represent (Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section. |
This comment has been minimized.
This comment has been minimized.
|
Arrays aren't (and can't be changed to be) |
This comment has been minimized.
This comment has been minimized.
|
Finally getting around to a few comments:
I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a
I'll take a look later and add these to the prior art section.
I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless |
This comment has been minimized.
This comment has been minimized.
|
In terms of offering |
This comment has been minimized.
This comment has been minimized.
|
Regarding |
This comment has been minimized.
This comment has been minimized.
|
While |
This comment has been minimized.
This comment has been minimized.
|
My thinking was that However, now that you mention it, IIUC, the size and alignment of types is tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but i suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work. Also one other minor thing that I didn't see mentioned in the RFC: does |
This comment has been minimized.
This comment has been minimized.
|
@rkruppe Sorry, I just saw your comment
I was curious why. Does this break some other stability guarantee we have? |
This comment has been minimized.
This comment has been minimized.
|
@mark-i-m the sizes of |
This comment has been minimized.
This comment has been minimized.
|
Hmm... so using |
This comment has been minimized.
This comment has been minimized.
|
@mark-i-m In this case, no. |
This comment has been minimized.
This comment has been minimized.
Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that makes the Index/IndexMut impls and |
This comment has been minimized.
This comment has been minimized.
jswrenn
commented
Oct 30, 2018
Couldn't we signal that individual auto trait Addressable {}
// Individual bits are unaddressable.
impl !Addressable for Bit {} |
This comment has been minimized.
This comment has been minimized.
|
@jswrenn There was a big discussion of doing this for |
This comment has been minimized.
This comment has been minimized.
|
@jswrenn Auto traits are not assumed to be implemented by default (e.g. |
| # Summary | ||
| [summary]: #summary | ||
|
|
||
| Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 30, 2018
Contributor
Perhaps the types should be called UInt<N> and Int<N> since that is more conventional these days; however, all the primitive types are lower cased so perhaps not... I'm torn.
This comment has been minimized.
This comment has been minimized.
mark-i-m
Nov 1, 2018
Contributor
The weird thing is that they are generic primitives, which I've never seen in a language before...
Perhaps we should do something suitably weird and have new notation for the type?
This comment has been minimized.
This comment has been minimized.
rkruppe
Nov 1, 2018
Contributor
There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.
| to be solved during the development of this feature, rather than in this RFC. | ||
| However, here are just a few: | ||
|
|
||
| * Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 30, 2018
Contributor
An alternative to this would be to simply have u<7> and i<42> which would be almost as short...
Perhaps that's too short to be understandable? Chances are tho that given the fundamental nature of the types that people would remember it...
This comment has been minimized.
This comment has been minimized.
mark-i-m
Nov 1, 2018
Contributor
Actually, I thought about this too. uint and int seem inconsistent with the other integer types somehow, so maybe u and i are the right choice?
This comment has been minimized.
This comment has been minimized.
mark-i-m
Nov 1, 2018
Contributor
Also, either way, would we need to make a breaking change to make these identifiers reserved?
This comment has been minimized.
This comment has been minimized.
Centril
Nov 1, 2018
Contributor
I don't think we need to make breaking changes; you can always shadow the type with something else afaik.
This comment has been minimized.
This comment has been minimized.
clarcharr
Nov 8, 2018
Contributor
I'm adding this to the alternatives section but stating against it because i is such a common variable name.
This comment has been minimized.
This comment has been minimized.
scottmcm
Nov 9, 2018
•
Member
Well, it's a different namespace so there isn't a conflict. The following "works":
struct i<i> { i: i };
let i: i<i32> = i { i: 4 };(Said without actually taking a position on whether I think i would be a good name for the type constructor in question.)
| signed simply depends on whether its lower bound is negative. | ||
|
|
||
| The primary reason for leaving this out is… well, it's a lot harder to | ||
| implement, and could be added in the future as an extension. Longer-term, we |
This comment has been minimized.
This comment has been minimized.
Centril
Oct 30, 2018
Contributor
Perhaps... tho you should elaborate on the implementation difficulty as you see it.
...but the ranges feel also much more useful generally for code that isn't interested in space optimizations and such things but rather want to enforce domain logic in a strict and more type-safe fashion. For example, you might want to represent a card in a deck as type Rank = uint<1..=13>;. Then you know by construction that once you have your the_rank : Rank then it is correct and you won't have to recheck things. Of course, the other, more elaborate type safe way is to use an enum, but it might also be less convenient to setup than a simple range.
I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range based types, e.g. type uint<const N: usize> = urange<{0..=pow(2, N)}>;.
This comment has been minimized.
This comment has been minimized.
clarcharr
Oct 31, 2018
Contributor
You mean, urange<{0..=pow(2, N) - 1}>. :p
But actually, you're right. I should seriously clarify that and write it down in the alternatives.
| ```rust | ||
| impl<const N: usize> uint<N> { | ||
| fn count_zeros(self) -> u32 { | ||
| let M = N.next_power_of_two(); |
This comment has been minimized.
This comment has been minimized.
Nemo157
Oct 31, 2018
Contributor
I would assume this has to be const M = N.next_power_of_two(); since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC2000 though.
It seems that this could be written
impl<const N: usize> uint<N> {
fn count_zeros(self) -> u32 {
let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
zeros + (N.next_power_of_two() - N)
}
}but I'm not certain if the const expression there would be accepted by the current const generics implementation.
This comment has been minimized.
This comment has been minimized.
clarcharr
Nov 8, 2018
Contributor
You're right and I replaced let M with const M: usize for now. I think that's valid.
| `(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit | ||
| size, allowing for a future `repr(bitpacked)` extension which packs all values | ||
| in a struct or enum variant into the smallest number of bytes possible, given | ||
| their bit sizes. Doing so would prevent referencing the fields of the struct, |
This comment has been minimized.
This comment has been minimized.
Nokel81
Oct 31, 2018
Contributor
Since you are currently limiting the size to 128 bits (but maybe more in the future) wouldn't it be possible to define a reference to a bitpacked generic integer as a (ref, (u16, u16)) where the second pair is a (start, length) pair within?
This comment has been minimized.
This comment has been minimized.
rodrimati1992
referenced this pull request
Nov 2, 2018
Open
Tracking issue for const generics (RFC 2000) #44580
| where `const N: usize`. These will alias to existing `uN` and `iN` types if `N` | ||
| is a power of two and no greater than 128. `usize` and `isize` remain separate | ||
| types due to coherence issues, and `bool` remains separate from `uint<1>` as it | ||
| does not have most of the functionality that integers have. |
This comment has been minimized.
This comment has been minimized.
gnzlbg
Nov 4, 2018
•
Contributor
Rust's bool is guaranteed to have the same bit-width as C's bool so even if it were to have the same functionality as the integer types, it wouldn't have the same layout being proposed here for int<1>/uint<1> (or the layout of these types would need to be explicitly specified to be the same as C's bool here).
This comment has been minimized.
This comment has been minimized.
clarcharr
Nov 4, 2018
Contributor
I'm a bit confused what you're trying to say. These types would explicitly underly u8 and have 1 bit of data, so uint<1> and bool have exactly the same in-memory layout. Would you mind elaborating what you mean?
This comment has been minimized.
This comment has been minimized.
gnzlbg
Nov 4, 2018
Contributor
The current docs, PRs, and RFCs about this are confusing, so I've commented in the relevant issue asking for definitive clarification: rust-rfcs/unsafe-code-guidelines#9 (comment)
|
|
||
| As a hard rule, `uint<N>` will always have its upper `N.next_power_of_two() - N` | ||
| bits set to zero, and similarly, `int<N>`'s upper bits will be sign-extended. To | ||
| compute niche optimizations for enums, we'll have to take this into account. |
This comment has been minimized.
This comment has been minimized.
gnzlbg
Nov 4, 2018
Contributor
So IIUC to be able to use this bits for niche optimizations, one would need to put in there something that's "not zero" ? This is a bit different from &T/NonNull/NonZero where the niche optimizations are enabled by the guarantee that the storage cannot ever be zero.
This comment has been minimized.
This comment has been minimized.
clarcharr
Nov 4, 2018
Contributor
I'm going to try and elaborate this with examples, because examples explicitly helped me understand this in the internals thread. Will respond back once I've done that.
This comment has been minimized.
This comment has been minimized.
rkruppe
Nov 4, 2018
Contributor
We already have such niches -- bool is a byte where 0 and 1 are the only values, char has a niche that I can never remember but it doesn't include U+0000 for obvious reasons -- and rustc exploits them when laying out e.g. Option<Option<...Option<bool>...>>
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
I think I would prefer to get const generics first and see if we can build these in a library, and if we can't, see which language extensions we would need to do so before considering adding them to the language (in particular given that there is no RFC for One of the motivating examples for const generics are bounded integers (e.g. |
This comment has been minimized.
This comment has been minimized.
rodrimati1992
commented
Nov 4, 2018
•
As far as I remember Quoting from the RFC on const-generics:
If you wanted to implement Bounded before const generics can mention generic parameters,you would have to have at least 2 different versions, As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:
|
This comment has been minimized.
This comment has been minimized.
|
Now that I finally have the time to work on this, I'm going to try and integrate all of the suggestions into the RFC text before the end of the day. No idea how much I'll get done, but we'll see. I personally wasn't expecting this RFC to get accepted any time soon, and postponing it now makes a lot of sense even to me as the RFC author. Although I did want to post this for the visibility and design so that someone else can take this work and discussion and make another RFC once const generics are stable and the desires of the community are clear. Additionally, a lot of the things here I think should be planned to integrate into the language shortly after const generics are available, even though not all of them are the best decision. |
This comment has been minimized.
This comment has been minimized.
|
To clarify, I don't intend to close this until the FCP runs through. So, I'll try to constantly revise the RFC until it reaches that point. :) |
This comment has been minimized.
This comment has been minimized.
rodrimati1992
commented
Nov 4, 2018
•
|
Because it seems that nobody had seen the above code for how to constrain on const generics without a syntax for it in where clauses (tell me if this would not work):
Even if generic integers are not stabilized with this approach to constraints on constants,it should at least not be a blocker for accepting the RFC. |
This comment has been minimized.
This comment has been minimized.
|
Thinking about the notch optimizations in this, note that the notches are ranges not bits. So |
This comment has been minimized.
This comment has been minimized.
|
@scottmcm The problem is that if we want to use these for bitfields, we would then need to guarantee that the niche optimizations will always be applied to use the smallest number of bits. |
This comment has been minimized.
This comment has been minimized.
|
@mark-i-m Well, using these for bitfields will necessarily require a custom |
This comment has been minimized.
This comment has been minimized.
leonardo-m
commented
Nov 5, 2018
|
This feature, if desired, should be designed to be as orthogonal as possible to a similar but distinct feature: built-in ranged integers (as in Ada language). (And if I must choose what of the two features to implement in Rust, I think ranged integers are more useful). |
This comment has been minimized.
This comment has been minimized.
Really? Why do you say that?
This is true, but IMHO it seems less obvious from a UX perspective how to pack a Also, I'm really skeptical of stabilizing/guaranteeing optimizations, like niche-filling because then those effectively become part of the language and have stability guarantees, which seems very sketchy. |
This comment has been minimized.
This comment has been minimized.
Because I can soundly |
This comment has been minimized.
This comment has been minimized.
|
You can use a pair of unsigned integers to specify a wraparound range, you don't need a signed version separately.
IMO that causes all sorts of issues. Whether the trait is implemented must now be evaluated by several different things, including unsafety checking and type layout. |
This comment has been minimized.
This comment has been minimized.
rodrimati1992
commented
Nov 5, 2018
•
I am not expecting to get To clarify what I would expect if a trait like this were accepted,here is an newtype for u32 that disallows the maximum value in its constructor. Edit:This may not be the best example,since I will get told that ranged integers are a superset of this example.
|
This comment has been minimized.
This comment has been minimized.
|
Yes, I think that is harder to implement and more prone to subtle soundness issues. Note that your main function needs an |
This comment has been minimized.
This comment has been minimized.
|
Finally managed to get around to updating this. Let me know if there's anything I missed, or if you have any additional things to clarify. :) |
| For example, here's a modified version of our previous example: | ||
|
|
||
| ```rust | ||
| } |
This comment has been minimized.
This comment has been minimized.
| is required for `count_zeros`, and thus it's specialized. However, `Default` | ||
| does not require any specialisation, and can just be replaced with a single | ||
| generic impl. | ||
| #[repr(C, bitpacked)] |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
clarcharr
Nov 9, 2018
Contributor
Whoops, I think I accidentally copy-pasted the wrong thing into my editor and broke up the text. I'll take a look tomorrow morning and fix it.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
While I agree with postponing this until const generics land, I do think it makes sense to have enough compiler support to aid in using these in (Also, as a procedural note, if we postpone this, we need to add a note to the lang team agenda to revisit it at a defined later time.) |
This comment has been minimized.
This comment has been minimized.
|
As noted in the RFC text, there is the option of using these types just to get better interfaces while still leaving bit fields out of the language and letting macros generate the code to insert or extract those bits into/from |
This comment has been minimized.
This comment has been minimized.
|
Yep! The goal of this is just the generic integer types, and how exactly bitfields are done is entirely separate. I personally feel like the benefits of having the generic integer types in the compiler make it worth it, but the point of the RFC is to determine that. |
clarcharr commentedOct 28, 2018
Adds the builtin types
uint<N>andint<N>, allowing integers with an arbitrary size in bits. For now, restricts N ≤ 128.To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.