
Discriminant bits #2684

Open · skade wants to merge 8 commits into base: master

Conversation

@skade (Contributor) commented Apr 15, 2019

Summary

This RFC proposes to expose the minimum size necessary to encode the discriminant of an enum, without exposing the exact encoding itself. This can be useful for writing bit-level collections.

Thanks to @joshtriplett, @Manishearth, @eddyb and VLQC for early review <3.

Some details, especially naming, are very much bikesheddable.
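For orientation, a minimal self-contained sketch of the proposed surface, using a stand-in type (the real `std::mem::Discriminant` cannot be extended from user code) with placeholder bodies for a two-variant enum:

```rust
use std::marker::PhantomData;

// Stand-in for `std::mem::Discriminant<T>`; the RFC would add these items
// to the real std type. Bodies are placeholders, hard-coded for a
// two-variant enum.
struct Discriminant<T>(u128, PhantomData<T>);

#[allow(dead_code)]
enum Cell { Dead, Alive }

impl<T> Discriminant<T> {
    /// Minimal number of bits needed to encode the discriminant of `T`.
    const fn bit_size() -> usize { 1 }

    /// Opaque bit representation of this discriminant.
    fn into_bits(&self) -> u128 { self.0 }

    /// Rebuild a discriminant from its bit representation.
    fn from_bits(bits: u128) -> Self { Discriminant(bits, PhantomData) }
}

fn main() {
    assert_eq!(Discriminant::<Cell>::bit_size(), 1);
    let d = Discriminant::<Cell>::from_bits(1);
    assert_eq!(d.into_bits(), 1);
}
```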

skade added some commits Apr 14, 2019


This information can be used to pack multiple discriminants easily for efficient storage and easy indexing.

`Discriminat<T>` gains the methods `into_bits` and `from_bits`:

@17cupsofcoffee · Apr 15, 2019

Suggested change
`Discriminat<T>` gains the methods `into_bits` and `from_bits`:
`Discriminant<T>` gains the methods `into_bits` and `from_bits`:
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
The feature may interact with non-exaustive enums.

@17cupsofcoffee · Apr 15, 2019

Suggested change
The feature may interact with non-exaustive enums.
The feature may interact with non-exhaustive enums.
@sgrif (Contributor) left a comment

I think it's worth calling out how this interacts with explicit values that are larger than the smallest possible values, as well as explicit reprs. e.g. if I write

enum Foo {
    Bar = 7,
    Baz = 8,
}

technically the compiler still only needs 1 bit for the discriminant. We only need to convert to the larger values on cast. What's the return value of these methods in this case? Similarly if I write #[repr(u8)], is bit_size always 8?
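To make the ambiguity concrete, here are the two candidate answers computed by hand with plain integer math, since the proposed `bit_size` does not exist yet:

```rust
#[allow(dead_code)]
enum Foo {
    Bar = 7,
    Baz = 8,
}

fn main() {
    // Distinguishing 2 variants needs ceil(log2(2)) = 1 bit...
    let variant_count: u32 = 2;
    let bits_for_count = 32 - (variant_count - 1).leading_zeros();
    assert_eq!(bits_for_count, 1);

    // ...but round-tripping the explicit value 8 through `as` needs 4 bits.
    let max_value = Foo::Baz as u32;
    let bits_for_value = 32 - max_value.leading_zeros();
    assert_eq!(bits_for_value, 4);
}
```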

Adding the proposed functions probably entails adding a new compiler intrinsic `discriminant_size`.
Empty enums are of size 0.

@sgrif (Contributor) · Apr 15, 2019

How about enums with a single variant?

@ExpHP · Apr 17, 2019

Obviously size zero.

As I see it, empty enums are specified in the proposal simply as an acknowledgement that they were taken into account. From a mathematical standpoint, they would need log2(0) = NEG_INFINITY bits. The proposal has chosen not to go there, and instead treats them as zero-sized (which indeed aligns with some other parts of the language).
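A sketch of the counting argument, assuming variants are numbered densely from 0 (`min_discriminant_bits` is a hypothetical helper, not part of the proposal):

```rust
/// Minimal number of bits needed to distinguish `n` densely-numbered
/// variants; both 0-variant and 1-variant enums come out at 0 bits.
fn min_discriminant_bits(n: u32) -> u32 {
    if n <= 1 { 0 } else { 32 - (n - 1).leading_zeros() }
}

fn main() {
    assert_eq!(min_discriminant_bits(0), 0); // empty enum
    assert_eq!(min_discriminant_bits(1), 0); // single variant
    assert_eq!(min_discriminant_bits(2), 1);
    assert_eq!(min_discriminant_bits(5), 3);
}
```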

[reference-level-explanation]: #reference-level-explanation
The feature may interact with non-exaustive enums.
In this case, still, the currently used discriminant size should be used.

@sgrif (Contributor) · Apr 15, 2019

Doesn't this allow people to couple themselves to the number of variants in an enum which was explicitly requested to be non-exhaustive (to avoid that coupling)?

@SimonSapin (Contributor) commented Apr 15, 2019

I think when talking about all this it is useful to separate:

  • The discriminant: the return value of std::mem::discriminant, a value that uniquely identifies a variant within an enum type. This has a dedicated Discriminant type that, so far, intentionally reveals very little, not even whether it is an integer.

  • The tag: this is part of the memory representation of #[repr(C)] and #[repr($Int)] enums. It may or may not be part of the memory representation of other enums. (E.g. it is not when the niche optimization is used, like in Option<&str>.)

  • The conversion to integer with as, for enums without any variant fields (a.k.a. “C-like” enums).

As an example of these things being separate, we have precedent in the bool type of a documented guarantee that conversion with as produces 1 for true and 0 for false, while making no guarantee about the memory representation.

I think it's worth calling out how this interacts with explicit values that are larger than the smallest possible values

So far, the Variant = 7 in an enum declaration is used for conversion to integer with as, and for the tag (with #2363).

This RFC should expand on the motivation for Discriminant::bit_size and Discriminant::into_bits and decide: should they be consistent with the (memory representation’s) tag? With the integer conversion with as?
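A small stable-Rust demonstration of the three notions; note that the `Option<&str>` size equality is an observation about the current layout, not a documented guarantee:

```rust
use std::mem::{discriminant, size_of};

enum Tagged { A = 7, B = 8 }

fn main() {
    // Conversion with `as` uses the declared values.
    assert_eq!(Tagged::A as i32, 7);

    // Niche optimization: Option<&str> stores no separate tag in memory.
    assert_eq!(size_of::<Option<&str>>(), size_of::<&str>());

    // The discriminant is an opaque value that identifies the variant.
    assert_ne!(discriminant(&Tagged::A), discriminant(&Tagged::B));
}
```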


Using these enums in collections is wasteful, as each instance reserves at least 1 byte of space.
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

@Centril (Member) · Apr 15, 2019
Suggested change
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.
For that reason, the Wasm book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

(to avoid confusion with TRPL, which I had at first)

Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.

@Centril (Member) · Apr 15, 2019
Suggested change
If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.
If it were possible to read the exact necessary size and the bit representation of the discriminant, we could define a type `PackedBits<T>` that uses exactly as much space as necessary.
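As a concrete (hypothetical) shape for such a collection, hard-coded to 1-bit elements rather than driven by the proposed `bit_size`; all names here are invented:

```rust
// Hypothetical sketch: a packed collection for a two-variant enum such as
// the Wasm book's `Cell`, storing each value in a single bit. A real
// `PackedBits<T>` would instead be driven by `Discriminant::<T>::bit_size()`.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Cell { Dead = 0, Alive = 1 }

struct PackedCells { bits: Vec<u8>, len: usize }

impl PackedCells {
    fn new(len: usize) -> Self {
        PackedCells { bits: vec![0; (len + 7) / 8], len }
    }
    fn set(&mut self, i: usize, cell: Cell) {
        assert!(i < self.len);
        let (byte, bit) = (i / 8, i % 8);
        // Clear the bit, then write the cell's 0/1 value into it.
        self.bits[byte] = (self.bits[byte] & !(1 << bit)) | ((cell as u8) << bit);
    }
    fn get(&self, i: usize) -> Cell {
        assert!(i < self.len);
        if self.bits[i / 8] >> (i % 8) & 1 == 1 { Cell::Alive } else { Cell::Dead }
    }
}

fn main() {
    let mut cells = PackedCells::new(10);
    cells.set(3, Cell::Alive);
    assert_eq!(cells.get(3), Cell::Alive);
    assert_eq!(cells.get(4), Cell::Dead);
}
```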
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
```

@Centril (Member) · Apr 15, 2019

If it is a method then self is needed somewhere.

@oli-obk (Contributor) · Apr 16, 2019

It's a static method as far as I can tell. The runtime value of Discriminant is not needed.

@Centril (Member) · Apr 16, 2019

OK; it's not clear from the context that it is using the type variable T from Discriminant<T>; this should be made clear. Moreover, it is not a method, because there's no such thing as a "static method" anymore. These are associated non-method functions.

@eddyb (Member) · Apr 17, 2019

I've suggested making this an associated constant, btw.
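The two shapes under discussion, shown on a stand-in type with placeholder bodies, since the real `std::mem::Discriminant` cannot be extended here:

```rust
use std::marker::PhantomData;

// Stand-in type; not the real `std::mem::Discriminant`.
#[allow(dead_code)]
struct Discriminant<T>(PhantomData<T>);

#[allow(dead_code)]
enum Cell { Dead, Alive }

impl<T> Discriminant<T> {
    // Associated non-method function, as the RFC text currently reads.
    const fn bit_size() -> usize { 1 } // placeholder body

    // Associated constant, per the suggestion above.
    const BIT_SIZE: usize = 1; // placeholder value
}

fn main() {
    assert_eq!(Discriminant::<Cell>::bit_size(), 1);
    assert_eq!(Discriminant::<Cell>::BIT_SIZE, 1);
}
```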

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
```

@Centril (Member) · Apr 15, 2019
Suggested change
const fn bit_size() -> usize { }
const fn bit_size() -> usize;

{ } misleads the reader into thinking for a moment that this is the definition, but it is just the signature.


## Disciminant data

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

@Centril (Member) · Apr 15, 2019
Suggested change
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.
`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent an enum's discriminant.
- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations

@Centril (Member) · Apr 15, 2019

And why isn't it? As mentioned above, there may be specific guarantees and non-guarantees we want to make, which makes From and Into less apt for communicating those guarantees to users.

- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?
- `from_data` and `into_data` could instead be straight `From/Into` implementations
- Alternatively, `from/into_bits` could return a `Bits<T>` type with a richer interface

@Centril (Member) · Apr 15, 2019

What would that richer interface be like?

# Future possibilities
[future-possibilities]: #future-possibilities
The feature is self-contained and I don't see direct extensions.

@Centril (Member) · Apr 15, 2019
Suggested change
The feature is self-contained and I don't see direct extensions.
The feature is self-contained and there are no direct extensions.

Using these enums in collections is wasteful, as each instance reserves at least 1 byte of space.
Similarly, `std::mem::size_of<Discriminant<Cell>>()` is at least 1 byte.
For that reason, the book later goes on and replaces `Vec<Cell>` by [`fixedbitset`][game-of-life-exercise], ending up with a much less intuitive implementation.

@Centril (Member) · Apr 15, 2019

So there is an implementation that works, but it is less intuitive; is that important? And is a more intuitive implementation sufficient justification to guarantee things I've mentioned below for all time?

Is PackedBits<T> more efficient than fixedbitset -- substantially so?


If it were possible to read the exact necessary size and the bit representation the descriminant, we could have a `PackedBits<T>` that uses exactly as much space as necessary.

This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.

@Centril (Member) · Apr 15, 2019
Suggested change
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for crating an index of all discriminant values present in collection.
This allows for an efficient representation of discriminant sets, which is both useful for simple enums, but also for creating an index of all discriminant values present in collection.
@joshtriplett (Member) commented Apr 15, 2019

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
```

This number is not subject to optimisation, so e.g. `Option<&str>` reports a bitsize of `1`.

@oli-obk (Contributor) · Apr 16, 2019

While being called "enum layout optimization", it's not really an optimization, but a clear set of rules for how the discriminant is represented.

What is the use-case for knowing the bitsize for enums whose variants have fields? I'm wondering if it would make more sense to have bit_size return Option<usize> in order to only return a bit_size where an actual tag field exists.

`Discriminant::bit_size` is a method to retrieve the minimal number in bits necessary to represent this discriminant.

```rust
const fn bit_size() -> usize { }
```

@eddyb (Member) · Apr 17, 2019

Not sure usize is the best. Other places use u32 when dealing with bit counts.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Disciminant data

@sftim · Apr 17, 2019

Suggested change
## Disciminant data
## Discriminant data
`Discriminat<T>` gains the methods `into_bits` and `from_bits`:

```rust
fn into_bits(&self) -> u128
```

@ExpHP · Apr 17, 2019

When I first saw this, I had to wonder what crazy kind of code would have an enum with more than 2^64 variants. Maybe #[repr(u128)] should be mentioned as justification?

@Centril (Member) · Apr 17, 2019

In my mind it's not just about having > 2^64 variants. You can have far fewer and use enum Foo { ..., VariantN = DiscrimExprN, ... } to use up a whole lot of bits.

@ExpHP · Apr 17, 2019

Yes, that was my point. I had to think for a while before I remembered that explicit discriminants can be assigned.

@skade (Author, Contributor) · Apr 17, 2019

@eddyb had actually recommended using that size, as that would be the final width of the internal Discriminant value.

@ExpHP commented Apr 17, 2019

This should mention the semantics for negative discriminants. I see three options for into_bits():

  1. Sign extension: -1i8 becomes -1i8 as i128 as u128, or 0xFFFF_FFFF_..._FFFF_FFFF.
  2. Zero extension of the repr: -1i8 becomes (-1i8 as u8) as u128, or 0x0000_0000_..._0000_00FF.
  3. Zero extension to bit_size bits: like above, but we only keep bit_size bits. -1i8 becomes 0b00000000_..._00000111 if bit_size() == 3.

Similarly, there is a question for from_bits:

  • It could be strict and require the higher order bits to match into_bits().
  • It could be permissive and ignore the bits beyond bit_size.

My thoughts are:

  • Sign extension (option 1) is consistent with the behavior of Enum::Variant as u128.
  • Consider the use case of packing several enums into a bitset. Options 1 and 2 share a footgun when encoding a bunch of discriminants into a packed form (one must truncate the higher-order bits).
    • This makes option 2 worthless compared to option 3, so I won't consider it further.
  • If from_bits is strict, then option 1 also has an even bigger footgun when decoding a bunch of discriminants from a packed form (one must manually sign-extend. Yikes!).
  • If from_bits is permissive, this may come with a performance cost.
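A worked check of the three options for `-1i8` with `bit_size() == 3`, using plain casts to stand in for the hypothetical `into_bits`:

```rust
fn main() {
    let d: i8 = -1;
    let bit_size = 3u32;

    // Option 1: sign extension.
    let opt1 = d as i128 as u128;
    assert_eq!(opt1, u128::MAX); // 0xFFFF_..._FFFF

    // Option 2: zero extension of the repr (here i8).
    let opt2 = (d as u8) as u128;
    assert_eq!(opt2, 0xFF);

    // Option 3: keep only the low `bit_size` bits.
    let opt3 = opt2 & ((1u128 << bit_size) - 1);
    assert_eq!(opt3, 0b111);
}
```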
@ExpHP commented Apr 17, 2019

I thought of an option 4, which seems to be most in the spirit of the RFC:

  • into_bits() is opaque even for C-like enums. There is no guarantee that the value returned corresponds in any way to the integral constants assigned to the enum variants. However, it is guaranteed that only the bit_size least significant bits can be nonzero.

With this option, from_bits should be strict.

This would basically use option 3 as the implementation, but without explicitly specifying it.

@oli-obk (Contributor) commented Apr 17, 2019

There is no guarantee that the value returned corresponds in any way to the integral constants assigned to the enum variants

We do have a VariantIdx in the compiler (which is a u32 starting at 0). This would be much more condensed, but the implementation of into_bits would not be trivial anymore, making into_bits and from_bits require loads of logic and probably lookup tables.
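A hand-written illustration of the dense-index idea and the lookup logic it implies; `into_bits`/`from_bits` are free functions here for illustration, and a compiler would generate the tables:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Foo { Bar = 7, Baz = 8 }

// Dense index of a variant (what the compiler calls VariantIdx: 0, 1, ...).
fn into_bits(foo: Foo) -> u128 {
    match foo { Foo::Bar => 0, Foo::Baz => 1 }
}

// The inverse needs a lookup rather than a plain integer read.
fn from_bits(bits: u128) -> Option<Foo> {
    match bits { 0 => Some(Foo::Bar), 1 => Some(Foo::Baz), _ => None }
}

fn main() {
    assert_eq!(from_bits(into_bits(Foo::Baz)), Some(Foo::Baz));
}
```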

@bill-myers commented Apr 20, 2019

Seems like this might be better handled by a "bit-packed serialization trait", implemented via a custom or built-in derive, that would give you the number of bits required to serialize a type and allow (de)serializing a value into a bit stream.

In particular, such an approach would work for non-C-like enums and would also allow packing data recursively.
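One possible shape for such a trait, with all names invented for illustration and a hand-written impl standing in for the derive:

```rust
// Hypothetical trait; a derive would generate the impls.
trait BitPack: Sized {
    /// Number of bits one value occupies in the packed stream.
    const BITS: u32;
    /// Place the value in the low `BITS` bits of the result.
    fn pack(&self) -> u128;
    /// Inverse of `pack`; `None` for invalid bit patterns.
    fn unpack(bits: u128) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
enum Cell { Dead, Alive }

impl BitPack for Cell {
    const BITS: u32 = 1;
    fn pack(&self) -> u128 {
        match self { Cell::Dead => 0, Cell::Alive => 1 }
    }
    fn unpack(bits: u128) -> Option<Self> {
        match bits { 0 => Some(Cell::Dead), 1 => Some(Cell::Alive), _ => None }
    }
}

fn main() {
    assert_eq!(Cell::unpack(Cell::Alive.pack()), Some(Cell::Alive));
}
```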

@oli-obk (Contributor) commented Apr 20, 2019

Seems like this might be better handled by a "bit-packed serialization trait", implemented via a custom or built-in derive, that would give you the number of bits required to serialize a type and allow (de)serializing a value into a bit stream.

The problem with that is that it's very expensive. I believe the motivation of this RFC is to have an essentially zero-cost operation (read one integer from memory) that gives you the discriminant.

@clarfon (Contributor) commented Apr 24, 2019

There was a mention in my generic integers RFC of the concept of "bit sizes" for types, which could be used to make bit fields work. This seems like a special case of that for enums, which is a bit odd to me.
