Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Generic integers #2581

Closed
wants to merge 3 commits into from

Conversation

@clarfon
Copy link
Contributor

clarfon commented Oct 28, 2018

🖼 Rendered

📝 Summary

Adds the builtin types uint<N> and int<N>, allowing integers with an arbitrary size in bits. For now, restricts N ≤ 128.

💖 Thanks

To everyone who helped on the internals thread to review this RFC, particularly @Centril, @rkruppe, @scottmcm, @comex, and @ibkevg.

@clarfon clarfon changed the title Generic integers RFC RFC: Generic integers Oct 28, 2018

@clarfon clarfon force-pushed the clarfon:generic_int branch from fb23779 to 38e6de2 Oct 28, 2018

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Oct 28, 2018

I like the idea, and I would really love to have efficient and ergonomic strongly-typed bitfields. However, this proposal feels too magical for my taste; there is to much stuff built in to the compiler. I would rather expose a single simple primitive that allows implementing arbitrarily sized ints efficiently.

Just a half-baked idea: We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)). All of the other integer types are defined as follows:

#[repr(packed)]
struct int<const Width: usize> {
  bits: [Bit; Width],
}

#[repr(packed)]
struct uint<const Width: usize> {
  bits: [Bit; Width],
}

with the appropriate operations implemented via efficient bit-twiddling methods or compiler intrinsics for performance.

@Ekleog

This comment has been minimized.

Copy link

Ekleog commented Oct 28, 2018

@Centril

This comment has been minimized.

Copy link
Contributor

Centril commented Oct 28, 2018

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

Well, you can represent a Fin : Nat -> Type type constructor in dependent typing pretty easily, but those are quite inefficient...


## Primitive behaviour

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Nit: these are two new built-in integer type families or type constructors. You really get 256 new types, not two.

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I agree with you but I also wonder if this is the language most people would prefer to use. For example, would you consider Vec<T> to also be a family of types, or just a generic type?

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Well, the type constructor is really Vec and not Vec<T>; but "two new built-in generic integer types" is I think clear enough.

The compiler will have two new built-in integer types: `uint<N>` and `int<N>`,
where `const N: usize`. These will alias to existing `uN` and `iN` types if `N`
is a power of two and no greater than 128. `usize` and `isize` remain separate
types due to coherence issues, and `bool` remains separate from `uint<1>` as it

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Elaborate on what those coherence issues are for unfamiliar readers?

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Member

Traits can be implemented separately & differently for u32, u64, usize, etc. so unifying usize with uN for appropriate N would cause overlapping impls.

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I may not be able to get to this by the weekend but I will try to remember to elaborate more on this.

For example, this means that `uint<48>` will take up 8 bytes and have an
alignment of 8, even though it only has 6 bytes of data.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

store/stores, pick one :)

This comment has been minimized.

@clarfon

clarfon Nov 8, 2018

Author Contributor

Store was a typo. :p

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

For the moment, a monomorphisation error will occur if `N > 128`, to minimise

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

This is my biggest concern; I don't think monomorphization of this magnitude errors belong in the language and I would like to see this changed before stabilization.

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

Would you be willing to elaborate more on this? Why are these errors a problem?

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

The nature of Rust's bounded polymorphism is that type checking should be modular so that you can check a generic function separately from the instantiation and then the instantiation of any parameters satisfying bounds should not result in errors.

This makes for more local error messages (you won't have post monomorphization errors in some location far removed from where instantiation happened...), fewer surprises (because this is how polymorphism works everywhere else in the type system) and possibly also better performance.
Another benefit of avoiding post monomorphization errors is that the need to monomorphize as an implementation strategy is lessened. That said, there are instances where the compiler will cause post monomorphization errors, but those are extremely unlikely to occur in actual code. In the case of N > 128 it is rather quite likely.

The general principle is that you declare up front the requirements (with bounds, etc.) to use / call an object and then you don't impose new and hidden requirements for certain values.

If you want to impose N > 128, then that should be explicitly required in the "signatures", e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...). Otherwise, it should work for all N: usize evaluable at compile time.

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

e.g. you should state struct uint<const N: usize> where N <= 128 { .. } (and on the impls...)

Is this possible with any currently planned or near future const generics? I don't recall seeing any RFCs that would support const operations in where clauses (although I would love to have them available). This specific case might be possible with some horrible hack like

trait LessThan128 {}
struct IsLessThan128<const N: usize>;

impl LessThan128 for IsLessThan128<0> {}
⋮
impl LessThan128 for IsLessThan128<127> {}

struct uint<const N: usize> where IsLessThan128<N>: LessThan128 { .. }

but 🤢

This comment has been minimized.

@Centril

Centril Oct 31, 2018

Contributor

@Nemo157 not possible with RFC 2000 but might be with future extensions.

This comment has been minimized.

@rodrimati1992

rodrimati1992 Nov 1, 2018

This is a different formulation which might work(if you can use a const expression in associated types):

struct Bool<const B:bool>;

trait LessThan128<const N: usize> {
    type IsIt;
}

impl<const N:usize> LessThan128<N> for () {
    type IsIt=Bool<{N<128}>;
}

struct uint<const N: usize> 
where (): LessThan128<N,IsIt=Bool<true>> 
{ .. }

Edit:

If you want to include an error message in the type error,you can use this to print an error message in the type itself.


struct Str<const S:&'static str>;

struct Bool<const B:bool>;

struct Usize<const B:bool>;




trait Assert<const COND:bool,Msg>{}

impl<const COND:bool,Msg> Assert<COND> for ()
where
    ():AssertHelper<COND,Output=Bool<true>>,
{}


trait AssertHelper<const COND:bool,Msg>{
    type Output;
}

impl<Msg> AssertHelper<true,Msg> for (){
    type Output=Bool<true>;
}

impl<Msg> AssertHelper<false,Msg> for (){
    type Output=Msg;
}



trait AssertLessThan128<const N:usize>{}


impl<const N:usize,const IS_IT:bool> AssertLessThan128<N> for ()
where
    ():
        LessThan<N,128,IsIt=Bool<IS_IT>>+
        Assert<IS_IT, (
            Str<"uint cannot be constructed with a size larger than 128,the passed size is:",
            Usize<N>>
        ) >
{}


trait LessThan<const L:usize,const R:usize> {
    type IsIt;
}

impl<const L:usize,const R:usize> LessThan<L,R> for () {
    type IsIt=Bool<{L<R}>;
}

struct uint<const N: usize> 
where (): AssertLessThan128<N>
{ .. }

This comment has been minimized.

@gnzlbg

gnzlbg Nov 4, 2018

Contributor

AFAIK stable Rust does not currently have any monomorphization time errors - if you find any, it's a bug - so this RFC "as is" would be introducing them into the language =/

This comment has been minimized.

@Centril

Centril Nov 4, 2018

Contributor

@gnzlbg

fn poly<T>() {
    let _n: [T; 10000000000000];
}

fn monomorphization_error() {
    poly::<u8>(); // OK!
    poly::<String>(); // BOOM!
}

fn main() {
    monomorphization_error();
}
## Standard library

Existing implementations for integer types should be annotated with
`default impl` as necessary, and most operations should defer to the

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

I'm not sure why default impl would be used here... elaborate?

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

Essentially, your default impls are your base cases and the other cases will recursively rely upon them. For example, <uint<48>>::count_zeroes would ultimately expand to u64::count_zeroes minus 24. I'll try to elaborate more in the RFC text itself later.

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

You mention that it should be default fn in the text but don't put it in the count_zeros example, including it to the example + showing one of the specialized implementations for a power of two could clarify this somewhat.


Once const generics and specialisation are implemented and stable, almost all of
this could be offered as a crate which offers `uint<N>` and `int<N>` types. I
won't elaborate much on this because I feel that there are many other

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

You should :) Do we know that it is actually implementable as a library?

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I will try to get to this this weekend.

@Ekleog

This comment has been minimized.

Copy link

Ekleog commented Oct 28, 2018

from `uint<N>` to `int<M>` or `uint<M + 1>`, where `M >= N`.

In addition to the usual casts, `u1` and `i1` can also be cast *to* `bool` via
`as`, whereas most integer types can only be cast from `bool`.

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Member

So, this treats 1 and -1 as true depending on the signedness? I rather like the route of identifying true with -1 instead of 1, but it's not the route Rust has chosen so it might be a bit controversial. From another angle, while it's consistent with true being represented as a single 1 bit, it's also inconsistent with the fact that bool as iN (for currently existing iN) turns true into 1.

Is there motivation for providing these cases instead of just making people write x != 0, other than "we can"?

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I completely forgot about sign extension when doing this and now that you mention it, it makes sense that if this were offered, then only uint<1> should cast to bool. Unless anyone has any objections when I next get around to revising the text I'll remove both casts to bool.

`int<N>` store values between -2<sup>N-1</sup> and 2<sup>N-1</sup>-1, and
`uint<N>` stores values between 0 and 2<sup>N</sup>-1. One unexpected case of
this is that `i1` represents zero or *negative* one, even though LLVM and other
places use `i1` to refer to `u1`. This case is left as-is because generic code

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Member

Nit: integer types in LLVM don't have inherent signedness, they are bags of bits that are interpreted as signed or unsigned by individual operations, and i1 true is treated as -1 by signed operations (e.g., icmp slt i1 true, i1 false is true -- slt being signed less-than).

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I didn't actually know this-- I'll be sure to update the text to be accurate there.

Because sign extension will always be applied, it's safe for the compiler to
internally treat `uint<N>` as `uint<N.next_power_of_two()>` when doing all
computations. As a concrete example, this means that adding two `uint<48>`
values will work exactly like adding two `u64` values, generating exactly the

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Member

This needs more thorough discussion. To take this example, an u48 add can overflow the 48 bits, setting some high bits in the 64 bit register. If we take that as given, u48 comparisons (to give just one example) need to first zero-extend the operands to guarantee the comparison works correctly. Conversely, we could zero-extend after arithmetic operations to get the invariant that the high 16 bits are always zero, and then use that knowledge to implement 48 bit comparisons as a plain 64 bit comparisons. Likewise for i48: you'll need sign extensions. In some code sequences compiler optimizations can prove the zero/sign extension redundant, but generally there is no free lunch here -- you need some extending even in code that never changes bit widths. Eliminating sign extensions is in fact the main reason why C compilers care about signed integer addition being UB rather than wrapping.

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I hadn't actually added this section in the original RFC when I requested feedback, and rushed it in because I felt it was necessary. You are right that in most cases, this wouldn't be a no-op, although I'm curious if optimisations could make them so.

This is certainly a case for adding add_unchecked and co. as suggested by… some other RFC issue I don't have the time to look up right now.

Either way, I'll definitely take some time this weekend to revise this section.

Primitive operations on `int<N>` and `uint<N>` should work exactly like they do
on `int<N>` and `uint<N>`: overflows should panic when debug assertions are
enabled, but ignored when they are not. In general, `uint<N>` will be
zero-extended to the next power of two, and `int<N>` will be sign-extended to

This comment has been minimized.

@rkruppe

rkruppe Oct 28, 2018

Member

Is this intended to be a user-facing guarantee? e.g. given x: &uint<48> are the following two guaranteed to give the same result:

  • unsafe { *(x as *const uint<48> as *const u64) }
  • *x as u64

This comment has been minimized.

@clarfon

clarfon Oct 29, 2018

Author Contributor

I would not guarantee this, considering how x as *const uN as *const uM only holds on little-endian systems. Casting after dereferencing should work like casting values as usual, though.

I'll try and remember to clarify this in the text.

This comment has been minimized.

@rkruppe

rkruppe Oct 29, 2018

Member

Would you guarantee it on little-endian systems?

This comment has been minimized.

@clarfon

clarfon Nov 9, 2018

Author Contributor

I'm adding that as an unresolved question.

@Mark-Simulacrum

This comment has been minimized.

Copy link
Member

Mark-Simulacrum commented Oct 29, 2018

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

@jswrenn

This comment has been minimized.

Copy link

jswrenn commented Oct 29, 2018

We only build a Bit type into the language, which is guaranteed to be 1 bit large (though it must be padded in a struct unless you use repr(packed)).

@mark-i-m's suggestion strikes me as deeply complementary to @clarcharr's proposal. The leading motivation of this RFC is supporting bitfields, but using a number (albeit one guaranteed to be the right number of bits) to represent a bitfield is often a logical type-mismatch. To use the example from the RFC:

#[repr(bitfields)]
struct MipsInstruction {
    opcode: u6,
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: u6,
}

How often does it really make sense to add or subtract from an opcode? By representing it as an unsigned six-bit integer, we signal that arithmetic on an opcode is a well-defined operation.

With @mark-i-m's suggestion, we can have a more well-typed version of this struct:

#[repr(packed)]
struct MipsInstruction {
    opcode: [Bit; 6],
    rs: u5,
    rt: u5,
    rd: u5,
    shift: u5,
    function: [Bit; 6],
}

It never makes sense to represent opcode or function as numbers, so we don't. Conversely, shift is very clearly numeric.

(Disclaimer: I'm not a MIPS expert; I'm just going off the wikibook. I can't tell whether r{s,t,d} ought to be [Bit; 5] or u5.)


As for the second half of @mark-i-m's suggestion: I can't tell whether it's merely an implementation detail or if it has semantic differences from @clarcharr's proposal. Regardless, it would be a good candidate for discussion in the Alternatives section.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Oct 29, 2018

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes. While one could instead provide a dedicated Bitvector<N> type, that would have a lot of overlap with uint<N> (which have many applications not covered by Bitvector<N>). I also don't really see the typing benefit @jswrenn suggests: if you care about that, you'd want to use more fine-grained newtypes to distinguish (in this MIPS instruction encoding example) the opcode from the function field or the shift amount from the register numbers. Once you have these newtypes, uint<5> vs Bitvector<5> becomes largely irrelevant as it's an implementation detail. (In fact, one could implement Bitvector<N> as a newtype around uint<N>.)

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 29, 2018

Finally getting around to a few comments:

Is user code allowed to rely on this memory layout, or is it not? Intuitively it'd be better if the layout was not actually defined, to allow for further optimizations, but the current text makes me believe it is defined. The one case where layout would be mandatory would be for a not-yet-written #[repr(bitpacked)] RFC, in my opinion.

I feel that defining memory layout is important because there doesn't seem to be a compelling argument otherwise. Rust very much avoids "undefined"ness whenever possible. People will want to know how these types operate in a #[repr(C)] struct or in an array; would [uint<48>; 2] take up six bytes or eight? Establishing this is crucial for stabilisation imho.

Programming languages with dependent types (F*, etc.) do offer this, and actually much more. :)

I'll take a look later and add these to the prior art section.

I didn't see any discussion within the RFC about not using uN directly. It seems like that could make quite a bit of sense -- at least for an initial implementation.

It seems like it should at least be mentioned in the alternatives section...

I'll add it to alternatives, although the main reason against this would be that it doesn't allow generic impls even though it is generic. Unless uN were just an alias for uint<N>, which seems unnecessary to me.

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 29, 2018

In terms of offering Bit instead of uint-- I'll definitely add this to the alternatives section. Essentially, offering some kind of bits<N> type would be similar to uint<N>, although completely orthogonal to uint<N> and presumably only allowing bit operations, not arithmetic. I still believe that uint<N> is better overall, but thoroughness is important.

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Oct 30, 2018

Regarding Bit, I had intended it as a way to not add uint<N> and int<N> as language features. Rather, they could be implemented in a crate as wrappers around [Bit; N]. My motivation is just that adding uint<N> seems like a lot of magic, and I would like to reduce magic.

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 30, 2018

While Bit by itself is less magic than bits<N>, there's a lot of magic for making arrays compact for just one particular type. How does Bit apply in most type contexts? Is (Bit, Bit) compact? Etc.

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Oct 30, 2018

My thinking was that Bit had a size of one bit and alignment of 1 bytes. So you need to use repr(packed) to get rid of padding, as with other types.

However, now that you mention it, IIUC, the size and alignment of types is tracked in bytes in the compiler. So some work would need to be put into making the compiler track bits, but i suspect similar work would need to be put into the current RFC proposal anyway to make bitfields work.

Also one other minor thing that I didn't see mentioned in the RFC: does size_of::<uint<N>>() just round up to the nearest byte? or the nearest byte when rounded up to a power of two?

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Oct 30, 2018

@rkruppe Sorry, I just saw your comment

Arrays aren't (and can't be changed to be) repr(bitpacked), so [Bit; N] would actually occupy N bytes.

I was curious why. Does this break some other stability guarantee we have?

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 30, 2018

@mark-i-m the sizes of uint<N> are the same as the larger power of two size. So, uint<48> has the same size as u64.

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Oct 30, 2018

Hmm... so using repr(packed) actually changes the size of the type?

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 30, 2018

@mark-i-m In this case, no. repr(packed) allows alignment to break, but in this case, uint<48> would have an alignment and size of 8. In this case, we'd need a different form of repr(packed) which allows both size and alignment to break, shoving things down to individual bits. That's mostly what repr(bitfields) is in the RFC; originally, I recommended bitpacked as a name but I changed it and I don't remember why.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Oct 30, 2018

@mark-i-m

I was curious why. Does this break some other stability guarantee we have?

Arrays (and slices, which have the same layout as arrays of the same length) guarantee that each element is separately addressable -- that makes the Index/IndexMut impls and iter()/iter_mut() tick, for starters. Even though for all currently existing types that would remain so when arrays start become bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

@jswrenn

This comment has been minimized.

Copy link

jswrenn commented Oct 30, 2018

Even though for all currently existing types that would remain so when arrays start become bitpacked, it would mean we'd have to add new bounds to those things, which would break generic code that doesn't have those bounds.

Couldn't we signal that individual Bits are unaddressable with an Addressable auto trait that isn't implemented for Bit? E.g.:

auto trait Addressable {}

// Individual bits are unaddressable.
impl !Addressable for Bit {}
@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Oct 30, 2018

@jswrenn There was a big discussion of doing this for DynSized with extern types, and the verdict was to not there. I feel like an Addressable bound would have similar problems.

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Oct 30, 2018

@jswrenn Auto traits are not assumed to be implemented by default (e.g. fn foo<T>() does not imply T: Send). So beyond just an auto trait, you'd need a new opt-out default bound like Sized, and as @clarcharr mentioned those have been rejected for other purposes in the past.

# Summary
[summary]: #summary

Adds the builtin types `uint<N>` and `int<N>`, allowing integers with an

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Perhaps the types should be called UInt<N> and Int<N> since that is more conventional these days; however, all the primitive types are lower cased so perhaps not... I'm torn.

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

The weird thing is that they are generic primitives, which I've never seen in a language before...

Perhaps we should do something suitably weird and have new notation for the type?

This comment has been minimized.

@rkruppe

rkruppe Nov 1, 2018

Member

There are plenty of primitive type constructors -- references, raw pointers, tuples, arrays, slices -- they just all have special syntax as well instead of using angle brackets. I don't think special syntax for these new primitives is worth it.

to be solved during the development of this feature, rather than in this RFC.
However, here are just a few:

* Should `uN` and `iN` suffixes for integer literals, for arbitrary `N`, be

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

An alternative to this would be to simply have u<7> and i<42> which would be almost as short...
Perhaps that's too short to be understandable? Chances are tho that given the fundamental nature of the types that people would remember it...

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

Actually, I thought about this too. uint and int seem inconsistent with the other integer types somehow, so maybe u and i are the right choice?

This comment has been minimized.

@mark-i-m

mark-i-m Nov 1, 2018

Contributor

Also, either way, would we need to make a breaking change to make these identifiers reserved?

This comment has been minimized.

@Centril

Centril Nov 1, 2018

Contributor

I don't think we need to make breaking changes; you can always shadow the type with something else afaik.

This comment has been minimized.

@clarfon

clarfon Nov 8, 2018

Author Contributor

I'm adding this to the alternatives section but stating against it because i is such a common variable name.

This comment has been minimized.

@scottmcm

scottmcm Nov 9, 2018

Member

Well, it's a different namespace so there isn't a conflict. The following "works":

struct i<i> { i: i };
let i: i<i32> = i { i: 4 };

(Said without actually taking a position on whether I think i would be a good name for the type constructor in question.)

signed simply depends on whether its lower bound is negative.

The primary reason for leaving this out is… well, it's a lot harder to
implement, and could be added in the future as an extension. Longer-term, we

This comment has been minimized.

@Centril

Centril Oct 30, 2018

Contributor

Perhaps... tho you should elaborate on the implementation difficulty as you see it.

...but the ranges feel also much more useful generally for code that isn't interested in space optimizations and such things but rather want to enforce domain logic in a strict and more type-safe fashion. For example, you might want to represent a card in a deck as type Rank = uint<1..=13>;. Then you know by construction that once you have your the_rank : Rank then it is correct and you won't have to recheck things. Of course, the other, more elaborate type safe way is to use an enum, but it might also be less convenient to setup than a simple range.

I think this alternative should be seriously entertained as the way to go; then you can use type aliases / newtypes to map to the range based types, e.g. type uint<const N: usize> = urange<{0..=pow(2, N)}>;.

This comment has been minimized.

@clarfon

clarfon Oct 31, 2018

Author Contributor

You mean, urange<{0..=pow(2, N) - 1}>. :p

But actually, you're right. I should seriously clarify that and write it down in the alternatives.

```rust
impl<const N: usize> uint<N> {
fn count_zeros(self) -> u32 {
let M = N.next_power_of_two();

This comment has been minimized.

@Nemo157

Nemo157 Oct 31, 2018

Contributor

I would assume this has to be const M = N.next_power_of_two(); since you can't use a non-const value in the type parameter on the next line. This appears to not be allowed by RFC2000 though.

It seems that this could be written

impl<const N: usize> uint<N> {
    fn count_zeros(self) -> u32 {
        let zeros = (self as uint<{ N.next_power_of_two() }>).count_zeros();
        zeros + (N.next_power_of_two() - N)
    }
}

but I'm not certain if the const expression there would be accepted by the current const generics implementation.

This comment has been minimized.

@clarfon

clarfon Nov 8, 2018

Author Contributor

You're right and I replaced let M with const M: usize for now. I think that's valid.

`(bit_size_of::<T>() + 7) / 8 == size_of::<T>()`. All types would have a bit
size, allowing for a future `repr(bitpacked)` extension which packs all values
in a struct or enum variant into the smallest number of bytes possible, given
their bit sizes. Doing so would prevent referencing the fields of the struct,

This comment has been minimized.

@Nokel81

Nokel81 Oct 31, 2018

Contributor

Since you are currently limiting the size to 128 bits (but maybe more in the future) wouldn't it be possible to define a reference to a bitpacked generic integer as a (ref, (u16, u16)) where the second pair is a (start, length) pair within?

This comment has been minimized.

@clarfon

clarfon Nov 8, 2018

Author Contributor

Would you mind clarifying here? Not quite sure what you mean.

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Nov 4, 2018

@scottmcm The problem is that if we want to use these for bitfields, we would then need to guarantee that the niche optimizations will always be applied to use the smallest number of bits.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Nov 5, 2018

@mark-i-m Well, using these for bitfields will necessarily require a custom repr to actually pack them that tightly anyway -- repr(packed) doesn't pack sub-bytes today, and I don't think it ever should. So whatever the new thing is isn't forced to know about int<N>; it could look for Integer<L,H> just as well.

@leonardo-m

This comment has been minimized.

Copy link

leonardo-m commented Nov 5, 2018

This feature, if desired, should be designed to be as orthogonal as possible to a similar but distinct feature: built-in ranged integers (as in Ada language). (And if I must choose what of the two features to implement in Rust, I think ranged integers are more useful).

@mark-i-m

This comment has been minimized.

Copy link
Contributor

mark-i-m commented Nov 5, 2018

@scottmcm

I don't think it ever should

Really? Why do you say that?

it could look for Integer<L,H> just as well.

This is true, but IMHO it seems less obvious from a UX perspective how to pack a Integer<L, H> than a int<15>, especially if L != 0.

Also, I'm really skeptical of stabilizing/guaranteeing optimizations, like niche-filling because then those effectively become part of the language and have stability guarantees, which seems very sketchy.

@scottmcm

This comment has been minimized.

Copy link
Member

scottmcm commented Nov 5, 2018

Really? Why do you say that?

Because I can soundly ptr::write_unaligned over fields of repr(packed) structs today. That means that it can't pack things smaller than their size, or such code would overwrite other fields. Thus if int<N> were to be sub-byte packed by repr(packed), it would have to be !Sized or something so it can't be passed to existing code, and that doesn't sound like a nice world state. I'd much rather make a new "you can't get pointers to fields" repr than make that part of the uint<N> types.

@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 5, 2018

You can use a pair of unsigned integers to specify a wraparound range, you don't need a signed version separately.

As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:

IMO that causes all sorts of issues. Whether the trait is implemented must now be evaluated by several different things, including unsafety checking and type layout.
And you don't get a constructor like NonNull:: new for free.

@rodrimati1992

This comment has been minimized.

Copy link

rodrimati1992 commented Nov 5, 2018

You can use a pair of unsigned integers to specify a wraparound range, you don't need a signed version separately.

As far as specifying niches,could we not just have a trait with an associated constant specifying which bit-patterns are invalid for the type:

IMO that causes all sorts of issues. Whether the trait is implemented must now be evaluated by several different things, including unsafety checking and type layout.
And you don't get a constructor like NonNull:: new for free.

I am not expecting to get NonNull::new for free though (as in without having to write it).

To clarify what I would expect if a trait like this were accepted,here is an newtype for u32 that disallows the maximum value in its constructor.

Edit:This may not be the best example,since I will get told that ranged integers are a superset of this example.


fn main(){
    
    assert_eq!(
        ::std::mem::size_of::<Option<NonMaxU32>>(),
        4
    );

    assert_eq!(
        ::std::mem::size_of::<NonMaxU32>(),
        4
    );


    assert_eq!(NoMaxU32::new(0).unwrap().value() , 0);
    assert_eq!(NoMaxU32::new(1).unwrap().value() , 1);
    assert_eq!(NoMaxU32::new(2).unwrap().value() , 2);

    let max=!0u32;

    assert_eq!(NoMaxU32::new(max-2).unwrap().value() , max-2);
    assert_eq!(NoMaxU32::new(max-1).unwrap().value() , max-1);
    assert_eq!(NoMaxU32::new(max) , None);
}

#[repr(transparent)]
#[derive(Copy,Clone,Debug)]
struct NoMaxU32{
    repr:u32,
}


impl NoMaxU32{
    pub fn new(repr:u32)->Option<Self>{
        if repr == !0u32 {
            None
        }else{
            Some(Self{ repr })
        }
    }
    pub fn value(self)->u32{ self.repr }
}


unsafe impl Niche for NoMaxU32{
    type Repr=u32;
    const VALID:ValidBitPatterns<u32>=
        ValidBitPatterns::Range(0..=((!0u32)-1) ) ;
}


@eddyb

This comment has been minimized.

Copy link
Member

eddyb commented Nov 5, 2018

Yes, I think that is harder to implement and more prone to subtle soundness issues. Note that your main function needs an unsafe block to create Self.

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Nov 9, 2018

Finally managed to get around to updating this. Let me know if there's anything I missed, or if you have any additional things to clarify. :)

For example, here's a modified version of our previous example:

```rust
}

This comment has been minimized.

@mark-i-m

mark-i-m Nov 9, 2018

Contributor

Lol, I think something is missing here...

is required for `count_zeros`, and thus it's specialized. However, `Default`
does not require any specialisation, and can just be replaced with a single
generic impl.
#[repr(C, bitpacked)]

This comment has been minimized.

@mark-i-m

mark-i-m Nov 9, 2018

Contributor

It looks like you are missing the opening ```rust

This comment has been minimized.

@clarfon

clarfon Nov 9, 2018

Author Contributor

Whoops, I think I accidentally copy-pasted the wrong thing into my editor and broke up the text. I'll take a look tomorrow morning and fix it.

@leonardo-m

This comment has been minimized.

@joshtriplett

This comment has been minimized.

Copy link
Member

joshtriplett commented Nov 19, 2018

While I agree with postponing this until const generics land, I do think it makes sense to have enough compiler support to aid in using these in repr(C) structs, so I don't think these should just be a library without language support.

(Also, as a procedural note, if we postpone this, we need to add a note to the lang team agenda to revisit it at a defined later time.)

@rkruppe

This comment has been minimized.

Copy link
Member

rkruppe commented Nov 19, 2018

As noted in the RFC text, there is the option of using these types just to get better interfaces while still leaving bit fields out of the language and letting macros generate the code to insert or extract those bits into/from uN fields. Library-defined types that enforce the bit size limit without compiler support would work just as well for that purpose, and would avoid many of the language and compiler complications of native bit fields.

@clarfon

This comment has been minimized.

Copy link
Contributor Author

clarfon commented Nov 19, 2018

Yep! The goal of this is just the generic integer types, and how exactly bitfields are done is entirely separate. I personally feel like the benefits of having the generic integer types in the compiler make it worth it, but the point of the RFC is to determine that.

@jD91mZM2

This comment has been minimized.

Copy link

jD91mZM2 commented Jan 20, 2019

Yes please. I hate writing wrappers for integers, and using macros for a From implementation. Sure I could use num-traits, but if what I'm doing is meant for learning, adding dependencies that does magic behind the scenes is probably not a good idea.

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Feb 5, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot

This comment has been minimized.

Copy link

rfcbot commented Feb 15, 2019

The final comment period, with a disposition to postpone, as per the review above, is now complete.

By the power vested in me by Rust, I hereby postpone this RFC.

@rfcbot rfcbot closed this Feb 15, 2019

@Aloso

This comment has been minimized.

Copy link

Aloso commented Feb 17, 2019

Prior art

ZIG has arbitrary-width integers built-in:

In addition to the integer types above, arbitrary bit-width integers can be referenced by using an identifier of i or u followed by digits. For example, the identifier i7 refers to a signed 7-bit integer. The maximum allowed bit-width of an integer type is 65535.

https://ziglang.org/documentation/master/#Primitive-Types

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.