New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโ€™ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Generic integers #2581

Open
wants to merge 3 commits into
base: master
from

Conversation

@clarcharr
Contributor

clarcharr commented Oct 28, 2018

๐Ÿ–ผ Rendered

๐Ÿ“ Summary

Adds the builtin types uint<N> and int<N>, allowing integers with an arbitrary size in bits. For now, restricts N โ‰ค 128.

๐Ÿ’– Thanks

To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.

@clarcharr clarcharr changed the title from Generic integers RFC to RFC: Generic integers Oct 28, 2018

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 28, 2018

I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is to much stuff built in to the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently.

Just a half-baked idea: We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)). All of the other integer types are defined as follows:

#[repr(packed)]
struct int<const Width: usize> {
  bits: [Bit; Width],
}

#[repr(packed)]
struct uint<const Width: usize> {
  bits: [Bit; Width],
}

with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance.

@Ekleog

This comment has been minimized.

Ekleog commented Oct 28, 2018

@Centril

This comment has been minimized.

Contributor

Centril commented Oct 28, 2018

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

Well, you can represent a Fin : Nat -> Type type constructor in dependent typing pretty easily, but those are quite inefficient...

## Primitive behaviour
The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I agree with you but I also wonder if this is the language most people would prefer to use. For example, would you consider Vec<T> to also be a family of types, or just a generic type?

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Well, the type constructor is really Vec and not Vec<T>; but "two new built-in generic integer types" is I think clear enough.

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
is a power of two and no greater than 128. `usize` and `isize` remain separate
types due to coherence issues, and `bool` remains separate from `uint<1>` as it

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Elaborate on what those coherence issues are for unfamiliar readers?

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Contributor

Traits can be implemented separately & differently for u32, u64, usize, etc. so unifying usize with uN for appropriate N would cause overlapping impls.

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.

For example, this means that `uint<48>` will take up 8 bytes and have an
alignment of 8, even though it only has 6 bytes of data.
`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

store/stores, pick one :)

This comment has been minimized.

@clarcharr

clarcharr Nov 8, 2018

Contributor

Store was a typo. :p

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.
For the moment, a monomorphisation error will occur if `N > 128`, to minimise

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

This is my biggest concern; I don't think monomorphization of this magnitude errors belong in the language and I would like to see this changed before stabilization.

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

Would you be willing to elaborate more on this? Why are these errors a problem?

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

The nature of Rust's bounded polymorphism is that type checking should be modular so that you can check a generic function separately from the instantiation and then the instantiation of any parameters satisfying bounds should not result in errors.

This makes for more local error messages (you won't have post monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of N > 128 it is rather quite likely.

The general principle is that you declare up front the requirements (with bounds, etc.) to use / call an object and then you don't impose new and hidden requirements for certain values.

If you want to impose N > 128, then that should be explicitly required in the "signatures", e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...). Otherwise, it should work for all N: usize evaluable at compile time.

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...)

Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like

trait LessThan128 {}
struct IsLessThan128<const N: usize>;

impl LessThan128 for IsLessThan128<0> {}
โ‹ฎ
impl LessThan128 for IsLessThan128<127> {}

struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }

but ๐Ÿคข

This comment has been minimized.

@Centril

Centril Oct 31, 2018

Contributor

@Nemo157 not possible with RFC 2000 but might be with future extensions.

This comment has been minimized.

@rodrimati1992

rodrimati1992 Nov 1, 2018

This is a different formulation which might work(if you can use a const expression in associated types):

struct Bool<const B:bool>;

trait LessThan128<const N: usize> {
    type IsIt;
}

impl<const N:usize> LessThan128<N> for () {
    type IsIt=Bool<{N<128}>;
}

struct uint<const N: usize> 
where (): LessThan128<N,IsIt=Bool<true>> 
{ .. }

Edit:

If you want to include an error message in the type error,you can use this to print an error message in the type itself.


struct Str<const S:&'static str>;

struct Bool<const B:bool>;

struct Usize<const B:bool>;




trait Assert<const COND:bool,Msg>{}

impl<const COND:bool,Msg> Assert<COND> for ()
where
    ():AssertHelper<COND,Output=Bool<true>>,
{}


trait AssertHelper<const COND:bool,Msg>{
    type Output;
}

impl<Msg> AssertHelper<true,Msg> for (){
    type Output=Bool<true>;
}

impl<Msg> AssertHelper<false,Msg> for (){
    type Output=Msg;
}



trait AssertLessThan128<const N:usize>{}


impl<const N:usize,const IS_IT:bool> AssertLessThan128<N> for ()
where
    ():
        LessThan<N,128,IsIt=Bool<IS_IT>>+
        Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128,the passed size is:",
            Usize<N>>
        ) >
{}


trait LessThan<const L:usize,const R:usize> {
    type IsIt;
}

impl<const L:usize,const R:usize> LessThan<L,R> for () {
    type IsIt=Bool<{L<R}>;
}

struct uint<const N: usize> 
where (): AssertLessThan128<N>
{ .. }

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/

This comment has been minimized.

@Centril

Centril Nov 4, 2018

Contributor

@gnzlbg

fn poly<T>() {
    let _n: [T; 10000000000000];
}

fn monomorphization_error() {
    poly::<u8>(); // OK!
    poly::<String>(); // BOOM!
}

fn main() {
    monomorphization_error();
}
## Standard library
Existing implementations for integer types should be annotated with
`default impl` as necessary, and most operations should defer to the

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

I'm not sure why default impl would be used here... elaborate?

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

Essentially, your default impls are your base cases and the other cases will recursively rely upon them. For example, <uint<48>>::count_zeroes would ultimately expand to u64::count_zeroes minus 24. I'll try to elaborate more in the RFC text itself later.

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

You mention that it should be default fn in the text but don't put it in the count_zeros example, including it to the example + showing one of the specialized implementations for a power of two could clarify this somewhat.

Once const generics and specialisation are implemented and stable, almost all of
this could be offered as a crate which offers `uint<N>` and `int<N>` types. I
won't elaborate much on this because I feel that there are many other

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

You should :) Do we know that it is actually implementable as a library?

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I will try to get to this this weekend.

@Ekleog

This comment has been minimized.

Ekleog commented Oct 28, 2018

from `uint<N>` to `int<M>` or `uint<M + 1>`, where `M >= N`.
In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Contributor

So, this treats 1 and -1 as true depending on the signedness? I rather like the route of identifying true with -1 instead of 1, but it's not the route Rust has chosen so it might be a bit controversial. From another angle, while it's consistent with true being represented as a single 1 bit, it's also inconsistent with the fact that bool as iN (for currently existing iN) turns true into 1.

Is there motivation for providing these cases instead of just making people write x != 0, other than "we can"?

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I completely forgot about sign extension when doing this and now that you mention it, it makes sense that if this were offered, then only uint<1> should cast to bool. Unless anyone has any objections when I next get around to revising the text I'll remove both casts to bool.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
`uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of
this is that `i1` represents zero or *negative* one, even though LLVM and other
places use `i1` to refer to `u1`. This case is left as-is because generic code

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Contributor

Nit: integer types in LLVM don't have inherent signedness, they are bags of bits that are interpreted as signed or unsigned by individual operations, and i1 true is treated as -1 by signed operations (e.g., icmp slt i1 true, i1 false is true -- slt being signed less-than).

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I didn't actually know this-- I'll be sure to update the text to be accurate there.

Because sign extension will always be applied, it's safe for the compiler to
internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all
computations. As a concrete example, this means that adding two `uint<48>`
values will work exactly like adding two `u64` values, generating exactly the

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Contributor

This needs more thorough discussion. To take this example, an u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as a plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.

This is certainly a case for adding add_unchecked and co. as suggested byโ€ฆ some other RFC issue I don't have the time to look up right now.

Either way, I'll definitely take some time this weekend to revise this section.

Primitive operations on `int<N>` and `uint<N>` should work exactly like they do
on `int<N>` and `uint<N>`: overflows should panic when debug assertions are
enabled, but ignored when they are not. In general, `uint<N>` will be
zero-extended to the next power of two, and `int<N>` will be sign-extended to

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Contributor

Is this intended to be a user-facing guarantee? e.g. given x: &uint<48> are the following two guaranteed to give the same result:

  • unsafe { *(x as *const uint<48> as *const u64) }
  • *x as u64

This comment has been minimized.

@clarcharr

clarcharr Oct 29, 2018

Contributor

I would not guarantee this, considering how x as *const uN as *const uM only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.

I'll try and remember to clarify this in the text.

This comment has been minimized.

@rkruppe

rkruppe Oct 29, 2018

Contributor

Would you guarantee it on little-endian systems?

This comment has been minimized.

@clarcharr

clarcharr Nov 9, 2018

Contributor

I'm adding that as an unresolved question.

@Mark-Simulacrum

This comment has been minimized.

Member

Mark-Simulacrum commented Oct 29, 2018

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

@jswrenn

This comment has been minimized.

jswrenn commented Oct 29, 2018

We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)).

@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC:

#[repr(bitfields)]
struct MipsInstruction {
    opcode: u6,
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: u6,
}

How often does it really make sense to add or subtract from an opcode? By representing it as an unsigned six-bit integer, we signal that arithmetic on an opcode is a well-defined operation.

With @mark-i-m's suggestion, we can have a more well-typed version of this struct:

#[repr(packed)]
struct MipsInstruction {
    opcode: [Bit; 6],
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: [Bit; 6],
}

It never makes sense to represent opcode or function as numbers, so we don't. Conversely, shift is very clearly numeric.

(Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether r{s,t,d} ought to be [Bit; 5] or u5.)


As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section.

@rkruppe

This comment has been minimized.

Contributor

rkruppe commented Oct 29, 2018

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes. While one could instead provide a dedicated Bitvector<N> type, that would have a lot of overlap with uint<N> (which have many applications not covered by Bitvector<N>). I also don't really see the typing benefit @jswrenn suggests: if you care about that, you'd want to use more fine-grained newtypes to distinguish (in this MIPS instruction encoding example) the opcode from the function field or the shift amount from the register numbers. Once you have these newtypes, uint<5> vs Bitvector<5> becomes largely irrelevant as it's an implementation detail. (In fact, one could implement Bitvector<N> as a newtype around uint<N>.)

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 29, 2018

Finally getting around to a few comments:

Is user code allowed to rely on this memory layout, or is it not? Intuitively it'd be better if the layout was not actually defined, to allow for further optimizations, but the current text makes me believe it is defined. The one case where layout would be mandatory would be for a not-yet-written #[repr(bitpacked)] RFC, in my opinion.

I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a #[repr(C)] struct or in an array; would [uint<48>; 2] take up six bytes or eight? Establishing this is crucial for stabilisation imho.

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

I'll take a look later and add these to the prior art section.

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless uN were just an alias for uint<N>, which seems unnecessary to me.

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 29, 2018

In terms of offering Bit instead of uint-- I'll definitely add this to the alternatives section. Essentially, offering some kind of bits<N> type would be similar to uint<N>, although completely orthogonal to uint<N> and presumably only allowing bit operations, not arithmetic. I still believe that uint<N> is better overall, but thoroughness is important.

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 30, 2018

Regarding Bit, I had intended it as a way to not add uint<N> and int<N> as language features. Rather, they could be implemented in a crate as wrappers around [Bit; N]. My motivation is just that adding uint<N> seems like a lot of magic, and I would like to reduce magic.

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 30, 2018

While Bit by itself is less magic than bits<N>, there's a lot of magic for making arrays compact for just one particular type. How does Bit apply in most type contexts? Is (Bit, Bit) compact? Etc.

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 30, 2018

My thinking was that Bit had a size of one bit and alignment of 1 bytes. So you need to use repr(packed) to get rid of padding, as with other types.

However, now that you mention it, IIUC, the size and alignment of types is tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but i suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work.

Also one other minor thing that I didn't see mentioned in the RFC: does size_of::<uint<N>>() just round up to the nearest byte? or the nearest byte when rounded up to a power of two?

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 30, 2018

@rkruppe Sorry, I just saw your comment

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes.

I was curious why. Does this break some other stability guarantee we have?

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 30, 2018

@mark-i-m the sizes of uint<N> are the same as the larger power of two size. So, uint<48> has the same size as u64.

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Oct 30, 2018

Hmm... so using repr(packed) actually changes the size of the type?

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 30, 2018

@mark-i-m In this case, no. repr(packed) allows alignment to break, but in this case, uint<48> would have an alignment and size of 8. In this case, we'd need a different form of repr(packed) which allows both size and alignment to break, shoving things down to individual bits. That's mostly what repr(bitfields) is in the RFC; originally, I recommended bitpacked as a name but I changed it and I don't remember why.

@rkruppe

This comment has been minimized.

Contributor

rkruppe commented Oct 30, 2018

@mark-i-m

I was curious why. Does this break some other stability guarantee we have?

Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that makes the Index/IndexMut impls and iter()/iter_mut() tick, for starters. Even though for all currently existing types that would remain so when arrays start become bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

@jswrenn

This comment has been minimized.

jswrenn commented Oct 30, 2018

Even though for all currently existing types that would remain so when arrays start become bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

Couldn't we signal that individual Bits are unaddressable with an Addressable auto trait that isn't implemented for Bit? E.g.:

auto trait Addressable {}

// Individual bits are unaddressable.
impl !Addressable for Bit {}
@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Oct 30, 2018

@jswrenn There was a big discussion of doing this for DynSized with extern types, and the verdict was to not there. I feel like an Addressable bound would have similar problems.

@rkruppe

This comment has been minimized.

Contributor

rkruppe commented Oct 30, 2018

@jswrenn Auto traits are not assumed to be implemented by default (e.g. fn foo<T>() does not imply T: Send). So beyond just an auto trait, you'd need a new opt-out default bound like Sized, and as @clarcharr mentioned those have been rejected for other purposes in the past.

# Summary
[summary]: #summary
Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Perhaps the types should be called UInt<N> and Int<N> since that is more conventional these days; however, all the primitive types are lower cased so perhaps not... I'm torn.

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

The weird thing is that they are generic primitives, which I've never seen in a language before...

Perhaps we should do something suitably weird and have new notation for the type?

This comment has been minimized.

@rkruppe

rkruppe Nov 1, 2018

Contributor

There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.

to be solved during the development of this feature, rather than in this RFC.
However, here are just a few:
* Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

An alternative to this would be to simply have u<7> and i<42> which would be almost as short...
Perhaps that's too short to be understandable? Chances are tho that given the fundamental nature of the types that people would remember it...

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

Actually, I thought about this too. uint and int seem inconsistent with the other integer types somehow, so maybe u and i are the right choice?

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

Also, either way, would we need to make a breaking change to make these identifiers reserved?

This comment has been minimized.

@Centril

Centril Nov 1, 2018

Contributor

I don't think we need to make breaking changes; you can always shadow the type with something else afaik.

This comment has been minimized.

@clarcharr

clarcharr Nov 8, 2018

Contributor

I'm adding this to the alternatives section but stating against it because i is such a common variable name.

This comment has been minimized.

@scottmcm

scottmcm Nov 9, 2018

Member

Well, it's a different namespace so there isn't a conflict. The following "works":

struct i<i> { i: i };
let i: i<i32> = i { i: 4 };

(Said without actually taking a position on whether I think i would be a good name for the type constructor in question.)

signed simply depends on whether its lower bound is negative.
The primary reason for leaving this out isโ€ฆ well, it's a lot harder to
implement, and could be added in the future as an extension. Longer-term, we

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Perhaps... tho you should elaborate on the implementation difficulty as you see it.

...but the ranges feel also much more useful generally for code that isn't interested in space optimizations and such things but rather want to enforce domain logic in a strict and more type-safe fashion. For example, you might want to represent a card in a deck as type Rank = uint<1..=13>;. Then you know by construction that once you have your the_rank : Rank then it is correct and you won't have to recheck things. Of course, the other, more elaborate type safe way is to use an enum, but it might also be less convenient to setup than a simple range.

I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range based types, e.g. type uint<const N: usize> = urange<{0..=pow(2, N)}>;.

This comment has been minimized.

@clarcharr

clarcharr Oct 31, 2018

Contributor

You mean, urange<{0..=pow(2, N) - 1}>. :p

But actually, you're right. I should seriously clarify that and write it down in the alternatives.

```rust
impl<const N: usize> uint<N> {
fn count_zeros(self) -> u32 {
let M = N.next_power_of_two();

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

I would assume this has to be const M = N.next_power_of_two(); since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC2000 though.

It seems that this could be written

impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
        zeros + (N.next_power_of_two() - N)
    }
}

but I'm not certain if the const expression there would be accepted by the current const generics implementation.

This comment has been minimized.

@clarcharr

clarcharr Nov 8, 2018

Contributor

You're right and I replaced let M with const M: usize for now. I think that's valid.

`(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit
size, allowing for a future `repr(bitpacked)` extension which packs all values
in a struct or enum variant into the smallest number of bytes possible, given
their bit sizes. Doing so would prevent referencing the fields of the struct,

This comment has been minimized.

@Nokel81

Nokel81 Oct 31, 2018

Contributor

Since you are currently limiting the size to 128 bits (but maybe more in the future) wouldn't it be possible to define a reference to a bitpacked generic integer as a (ref, (u16, u16)) where the second pair is a (start, length) pair within?

This comment has been minimized.

@clarcharr

clarcharr Nov 8, 2018

Contributor

Would you mind clarifying here? Not quite sure what you mean.

@ayosec

This comment has been minimized.

ayosec commented Nov 1, 2018

Adds the builtin types uint<N> and int<N>

Those names are a bit unexpected. They look like primitives (since they are in lowercase), but IMO they are closer to a generic type.

Maybe, something like std::num::integer::Signed<N> and std::num::integer::Unsigned<N> could work.

@nikomatsakis

This comment has been minimized.

Contributor

nikomatsakis commented Nov 1, 2018

In general I sort of like the "big ideas" behind this RFC, but I am inclined to postpone this right now. This seems like a big source of compiler complexity -- I predict a bunch of bugs and weird LLVM edge cases -- and before we embark on that I'd like to have a very strong motivation. I also think we probably need to work out the const generics story (i.e., so you can write impls over these things, and we don't wind up with the same "huge dump of impls" that we get with [T;N]).

The other thing is that, when we do visit this, I would like to have a very clear picture of our motivations. The main two motivations I have seen are:

  • C interop (for things with bitfields)
  • Unpacking packet headers or whatever (the Erlang use case)

I've not read in super detail, but I don't think that this RFC fulfills either of those?

@rodrimati1992

This comment has been minimized.

rodrimati1992 commented Nov 1, 2018

So,is the approach I take in the comment above not viable for constraining N to less than 128?

#2581 (comment)

@therealprof

This comment has been minimized.

therealprof commented Nov 1, 2018

@nikomatsakis I understand this as being the foundation for C compatible bitfields. But even without this there'd be a huge benefit for embedded: As you might be aware microcontrollers expose a lot of functionality via memory-mapped registers and those are nicely grouped together into a number of data bus width sized bitfields. Luckily we have tools (e.g. svd2rust) which compile vendor provided register definitions into a huge blob of a module which slice and dice them into nice thousands of proxies (one struct and impl per accessible field) making sure that the right bits are accessed.

I will spare you the large block of code required just to read or write a single bit inside such a register but needless to say that the sheer amounts of code required to make the whole system work is:

  • nearly unreadable
  • impossible to maintain by human
  • extremely slow to compile
  • generates huge docs (don't you dare click the src button!)
  • doesn't allow to address fields spanning multiple bits safely

The possibility of properly defining a register as a bitfield would be a game changer in my eyes, even if (at the beginning) we cannot rely on proper packing and simply coerce the whole thing into a e.g. u32 and vice versa.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Nov 2, 2018

@rfcbot fcp postpone

While I think I'm actually in favour in general of something in this space, I don't think now is the right time to consider it. I propose that we postpone considering this until we have enough const generics in nightly to have a concrete comparison between this and a pure library solution. That ought to also give a more confident answer to a variety of questions here, like how things generic over N need to be constrained to avoid monomorphization-time errors.

@rfcbot

This comment has been minimized.

rfcbot commented Nov 2, 2018

Team member @scottmcm has proposed to postpone this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
is a power of two and no greater than 128. `usize` and `isize` remain separate
types due to coherence issues, and `bool` remains separate from `uint<1>` as it
does not have most of the functionality that integers have.

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

Rust's bool is guaranteed to have the same bit-width as C's bool so even if it were to have the same functionality as the integer types, it wouldn't have the same layout being proposed here for int<1>/uint<1> (or the layout of these types would need to be explicitly specified to be the same as C's bool here).

This comment has been minimized.

@clarcharr

clarcharr Nov 4, 2018

Contributor

I'm a bit confused what you're trying to say. These types would explicitly underly u8 and have 1 bit of data, so uint<1> and bool have exactly the same in-memory layout. Would you mind elaborating what you mean?

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

The current docs, PRs, and RFCs about this are confusing, so I've commented in the relevant issue asking for definitive clarification: rust-rfcs/unsafe-code-guidelines#9 (comment)

As a hard rule, `uint<N>` will always have its upper `N.next_power_of_two() - N`
bits set to zero, and similarly, `int<N>`'s upper bits will be sign-extended. To
compute niche optimizations for enums, we'll have to take this into account.

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

So IIUC to be able to use this bits for niche optimizations, one would need to put in there something that's "not zero" ? This is a bit different from &T/NonNull/NonZero where the niche optimizations are enabled by the guarantee that the storage cannot ever be zero.

This comment has been minimized.

@clarcharr

clarcharr Nov 4, 2018

Contributor

I'm going to try and elaborate this with examples, because examples explicitly helped me understand this in the internals thread. Will respond back once I've done that.

This comment has been minimized.

@rkruppe

rkruppe Nov 4, 2018

Contributor

We already have such niches -- bool is a byte where 0 and 1 are the only values, char has a niche that I can never remember but it doesn't include U+0000 for obvious reasons -- and rustc exploits them when laying out e.g. Option<Option<...Option<bool>...>>

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

@rkruppe that makes sense, thanks!

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Nov 4, 2018

I think I would prefer to get const generics first and see if we can build these in a library, and if we can't, see which language extensions we would need to do so before considering adding them to the language (in particular given that there is no RFC for where clauses yet).

One of the motivating examples for const generics are bounded integers (e.g. Bounded<i8, 1, 10>, a 8-bit wide integer that only can take values between 1 and 10). I'd like for these types to be able to specify their niches "somehow", and if that were possible then we might be able to put these int<N>/uint<N> in a module in the std library instead of making them part of the language (as primitive types).

@rodrimati1992

This comment has been minimized.

rodrimati1992 commented Nov 4, 2018

I think I would prefer to get const generics first and see if we can build these in a library, and if we can't, see which language extensions we would need to do so before considering adding them to the language (in particular given that there is no RFC for where clauses yet).

One of the motivating examples for const generics are bounded integers (e.g. Bounded<i8, 1, 10>, a 8-bit wide integer that only can take values between 1 and 10). I'd like for these types to be able to specify their niches "somehow", and if that were possible then we might be able to put these int<N>/uint<N> in a module in the std library instead of making them part of the language (as primitive types).

As far as I remember struct Bounded<I,const START:I,const END:I> won't be valid until const generics can mention type parameters,which this RFC gets around by using const BITS:usize instead.

Quoting from the RFC on const-generics:

Because consts must have the structural match property, 
and this property cannot be enforced for a type variable, 
it is not possible to introduce a const parameter which is ascribed to a type variable 
(Foo<T, const N: T> is not valid).

If you wanted to implement Bounded before const generics can mention generic parameters,you would have to have at least 2 different versions,
IBounded<I,const START:i128,const END:i128> and
UBounded<I,const START:u128,const END:u128>.

As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:


/// Repr is the underlying representation of the type.
enum ValidBitPatterns<Repr>{
    AllAreValid,
    Range( ::core::ops::RangeInclusive<Repr> ),
    Value( Repr ),
    Not(&'static InvalidBitPattern<Repr>),
}

trait Niche{
    type Repr:BitPattern;
    const VALID:ValidBitPatterns<Self::Repr>;
}


@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Nov 4, 2018

Now that I finally have the time to work on this, I'm going to try and integrate all of the suggestions into the RFC text before the end of the day. No idea how much I'll get done, but we'll see.

I personally wasn't expecting this RFC to get accepted any time soon, and postponing it now makes a lot of sense even to me as the RFC author. Although I did want to post this for the visibility and design so that someone else can take this work and discussion and make another RFC once const generics are stable and the desires of the community are clear. Additionally, a lot of the things here I think should be planned to integrate into the language shortly after const generics are available, even though not all of them are the best decision.

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Nov 4, 2018

To clarify, I don't intend to close this until the FCP runs through. So, I'll try to constantly revise the RFC until it reaches that point. :)

@rodrimati1992

This comment has been minimized.

rodrimati1992 commented Nov 4, 2018

Because it seems that nobody had seen the above code for how to constrain on const generics without a syntax for it in where clauses (tell me if this would not work):


struct Str<const S:&'static str>;

struct Bool<const B:bool>;

struct Usize<const N:usize>;

struct uint<const N: usize> 
where (): AssertLessThan128<N>
{ .. }

trait AssertLessThan128<const N:usize>{}

impl<const N:usize,const IS_IT:bool> AssertLessThan128<N> for ()
where
    ():
        LessThan<N,128,IsIt=Bool<IS_IT>>+
        Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128,the passed size is:",
            Usize<N>>
        ) >
{}


trait LessThan<const L:usize,const R:usize> {
    type IsIt;
}
impl<const L:usize,const R:usize> LessThan<L,R> for () {
    type IsIt=Bool<{L<R}>;
}


trait Assert<const COND:bool,Msg>{}

impl<const COND:bool,Msg> Assert<COND> for ()
where
    ():AssertHelper<COND,Output=Bool<true>>,
{}

trait AssertHelper<const COND:bool,Msg>{
    type Output;
}
impl<Msg> AssertHelper<true,Msg> for (){
    type Output=Bool<true>;
}
impl<Msg> AssertHelper<false,Msg> for (){
    type Output=Msg;
}



Even if generic integers are not stabilized with this approach to constraints on constants,it should at least not be a blocker for accepting the RFC.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Nov 4, 2018

Thinking about the notch optimizations in this, note that the notches are ranges not bits. So Integer<3, 200> can totally have the notch optimizations. As such, I think I'd rather have that version as the primitive, with type uint<N> = Integer<0, (1<<N)-1>; just a simple alias, not the main feature.

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Nov 4, 2018

@scottmcm The problem is that if we want to use these for bitfields, we would then need to guarantee that the niche optimizations will always be applied to use the smallest number of bits.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Nov 5, 2018

@mark-i-m Well, using these for bitfields will necessarily require a custom repr to actually pack them that tightly anyway -- repr(packed) doesn't pack sub-bytes today, and I don't think it ever should. So whatever the new thing is isn't forced to know about int<N>; it could look for Integer<L,H> just as well.

@leonardo-m

This comment has been minimized.

leonardo-m commented Nov 5, 2018

This feature, if desired, should be designed to be as orthogonal as possible to a similar but distinct feature: built-in ranged integers (as in Ada language). (And if I must choose what of the two features to implement in Rust, I think ranged integers are more useful).

@mark-i-m

This comment has been minimized.

Contributor

mark-i-m commented Nov 5, 2018

@scottmcm

I don't think it ever should

Really? Why do you say that?

it could look for Integer<L,H> just as well.

This is true, but IMHO it seems less obvious from a UX perspective how to pack a Integer<L, H> than a int<15>, especially if L != 0.

Also, I'm really skeptical of stabilizing/guaranteeing optimizations, like niche-filling because then those effectively become part of the language and have stability guarantees, which seems very sketchy.

@scottmcm

This comment has been minimized.

Member

scottmcm commented Nov 5, 2018

Really? Why do you say that?

Because I can soundly ptr::write_unaligned over fields of repr(packed) structs today. That means that it can't pack things smaller than their size, or such code would overwrite other fields. Thus if int<N> were to be sub-byte packed by repr(packed), it would have to be !Sized or something so it can't be passed to existing code, and that doesn't sound like a nice world state. I'd much rather make a new "you can't get pointers to fields" repr than make that part of the uint<N> types.

@eddyb

This comment has been minimized.

Member

eddyb commented Nov 5, 2018

You can use a pair of unsigned integers to specify a wraparound range, you don't need a signed version separately.

As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:

IMO that causes all sorts of issues. Whether the trait is implemented must now be evaluated by several different things, including unsafety checking and type layout.
And you don't get a constructor like NonNull:: new for free.

@rodrimati1992

This comment has been minimized.

rodrimati1992 commented Nov 5, 2018

You can use a pair of unsigned integers to specify a wraparound range, you don't need a signed version separately.

As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:

IMO that causes all sorts of issues. Whether the trait is implemented must now be evaluated by several different things, including unsafety checking and type layout.
And you don't get a constructor like NonNull:: new for free.

I am not expecting to get NonNull::new for free though (as in without having to write it).

To clarify what I would expect if a trait like this were accepted,here is an newtype for u32 that disallows the maximum value in its constructor.

Edit:This may not be the best example,since I will get told that ranged integers are a superset of this example.


fn main(){
    
    assert_eq!(
        ::std::mem::size_of::<Option<NonMaxU32>>(),
        4
    );

    assert_eq!(
        ::std::mem::size_of::<NonMaxU32>(),
        4
    );


    assert_eq!(NoMaxU32::new(0).unwrap().value() , 0);
    assert_eq!(NoMaxU32::new(1).unwrap().value() , 1);
    assert_eq!(NoMaxU32::new(2).unwrap().value() , 2);

    let max=!0u32;

    assert_eq!(NoMaxU32::new(max-2).unwrap().value() , max-2);
    assert_eq!(NoMaxU32::new(max-1).unwrap().value() , max-1);
    assert_eq!(NoMaxU32::new(max) , None);
}

#[repr(transparent)]
#[derive(Copy,Clone,Debug)]
struct NoMaxU32{
    repr:u32,
}


impl NoMaxU32{
    pub fn new(repr:u32)->Option<Self>{
        if repr == !0u32 {
            None
        }else{
            Some(Self{ repr })
        }
    }
    pub fn value(self)->u32{ self.repr }
}


unsafe impl Niche for NoMaxU32{
    type Repr=u32;
    const VALID:ValidBitPatterns<u32>=
        ValidBitPatterns::Range(0..=((!0u32)-1) ) ;
}


@eddyb

This comment has been minimized.

Member

eddyb commented Nov 5, 2018

Yes, I think that is harder to implement and more prone to subtle soundness issues. Note that your main function needs an unsafe block to create Self.

@clarcharr

This comment has been minimized.

Contributor

clarcharr commented Nov 9, 2018

Finally managed to get around to updating this. Let me know if there's anything I missed, or if you have any additional things to clarify. :)

For example, here's a modified version of our previous example:
```rust
}

This comment has been minimized.

@mark-i-m

mark-i-m Nov 9, 2018

Contributor

Lol, I think something is missing here...

is required for `count_zeros`, and thus it's specialized. However, `Default`
does not require any specialisation, and can just be replaced with a single
generic impl.
#[repr(C, bitpacked)]

This comment has been minimized.

@mark-i-m

mark-i-m Nov 9, 2018

Contributor

It looks like you are missing the opening ```rust

This comment has been minimized.

@clarcharr

clarcharr Nov 9, 2018

Contributor

Whoops, I think I accidentally copy-pasted the wrong thing into my editor and broke up the text. I'll take a look tomorrow morning and fix it.

@leonardo-m

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment