Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option<char> should be represented as just one 32 bit value #5977

Closed
Kimundi opened this issue Apr 20, 2013 · 16 comments · Fixed by #45225
Closed

Option<char> should be represented as just one 32 bit value #5977

Kimundi opened this issue Apr 20, 2013 · 16 comments · Fixed by #45225
Labels
A-codegen Area: Code generation A-unicode Area: Unicode C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@Kimundi
Copy link
Member

Kimundi commented Apr 20, 2013

A char is a u32, but Unicode is standardized to only take up 21bit, so there is more than enough space for that optimization.

Might be useful for specifying code page conversion tables with holes in them.

@msullivan
Copy link
Contributor

In the same vein as #6791. Is a reasonable optimization but probably not critical. We do similar optimizations for pointer types, I think.

@thestinger
Copy link
Contributor

Okay, this would mean representing Some(char) as the zero tag and None with the non-zero tag. If we fix char to actually always be 21-bit, that would work.

@thestinger
Copy link
Contributor

Once #8974 lands, we can go ahead with this.

@Thiez
Copy link
Contributor

Thiez commented Sep 4, 2013

If the compiler knew about possible values a type could take perhaps it could make these optimizations automatically? E.g. an ASCII character can have values in the range 0-127, so Option<ASCII> could automatically get optimized to 1 byte.
The following could be optimized to a single byte as well.

enum T {
    On,
    Off
}
enum Repeats {
    Zero
    One(T),
    Two(T,T),
    Three(T,T,T)
    Four(T,T,T,T)
    Five(T,T,T,T,T)
}

Because Repeats has 6 variants it can use 3 tag bits, and T uses 1 bit, so the largest variant uses 5 bits for the data, it all fits in a byte.

@huonw
Copy link
Member

huonw commented Sep 4, 2013

@Thiez what happens for:

match repeats {
    Five(ref x, _, _, _, _) => {}
    _ => {}
}

it'd have to be able to handle sub-word references.

@thestinger
Copy link
Contributor

This will only work for Option. The compiler knows the high bits will always be zero for char so it can reuse one of them as the tag, with zero representing a Some(char) and non-zero representing None. It can't actually set any of the bits or references won't work.

@glaebhoerl
Copy link
Contributor

I think it would work for any enum with up to 2047 nullary variants and one variant containing a char.

@thestinger
Copy link
Contributor

If you set any of the bits in the char, a reference to the char will be corrupt.

@bluss
Copy link
Member

bluss commented Sep 4, 2013

Can this case be extended to bool as well? it has 7 unused bits.

@glaebhoerl
Copy link
Contributor

@thestinger: Yes. But if you have:

enum MaybeChar {
    HasChar(char),
    NoChar_0,
    NoChar_1,
    NoChar_2,
    ...,
    NoChar_N
}

You can freely use up to 11 bits to represent which variant it is, with the restriction that zero is reserved for HasChar.

@alexcrichton
Copy link
Member

We now have range asserts on char, so this is more possible than it was before.

@huonw
Copy link
Member

huonw commented Sep 9, 2014

Similar to #14540.

@Yoric
Copy link
Contributor

Yoric commented Jan 22, 2016

If someone can mentor me through this bug, I'd be interested in giving it a try.

@pnkfelix
Copy link
Member

@Yoric one starting point to look would be src/librustc_trans/trans/adt.rs ; you might look over how PR #19765 changed that file to expand how we search for potential discriminant representations in an enum.

Yoric pushed a commit to Yoric/rust that referenced this issue Feb 2, 2016
Yoric pushed a commit to Yoric/rust that referenced this issue Feb 3, 2016
@mrhota
Copy link
Contributor

mrhota commented Aug 18, 2016

FYI @pnkfelix I rebased @Yoric's work onto @eddyb's recent PR #35764

I'll wait for that to land, then I'll issue a PR for Yoric's rebased work.

mrhota pushed a commit to mrhota/rust that referenced this issue Sep 3, 2016
mrhota pushed a commit to mrhota/rust that referenced this issue Sep 3, 2016
@frewsxcv
Copy link
Member

For anyone wanting to see the status of this, here's code you can run:

fn main() {
    println!("{}", ::std::mem::size_of::<Option<char>>());
}

playpen

it currently prints '8' (bytes)

bors added a commit that referenced this issue Oct 14, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Oct 29, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 13, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
bors added a commit that referenced this issue Nov 19, 2017
Refactor type memory layouts and ABIs, to be more general and easier to optimize.

To combat combinatorial explosion, type layouts are now described through 3 orthogonal properties:
* `Variants` describes the plurality of sum types (where applicable)
  * `Single` is for one inhabited/active variant, including all C `struct`s and `union`s
  * `Tagged` has its variants discriminated by an integer tag, including C `enum`s
  * `NicheFilling` uses otherwise-invalid values ("niches") for all but one of its inhabited variants
* `FieldPlacement` describes the number and memory offsets of fields (if any)
  * `Union` has all its fields at offset `0`
  * `Array` has offsets that are a multiple of its `stride`; guarantees all fields have one type
  * `Arbitrary` records all the field offsets, which can be out-of-order
* `Abi` describes how values of the type should be passed around, including for FFI
  * `Uninhabited` corresponds to no values, associated with unreachable control-flow
  * `Scalar` is ABI-identical to its only integer/floating-point/pointer "scalar component"
  * `ScalarPair` has two "scalar components", but only applies to the Rust ABI
  * `Vector` is for SIMD vectors, typically `#[repr(simd)]` `struct`s in Rust
  * `Aggregate` has arbitrary contents, including all non-transparent C `struct`s and `union`s

Size optimizations implemented so far:
* ignoring uninhabited variants (i.e. containing uninhabited fields), e.g.:
  * `Option<!>` is 0 bytes
  * `Result<T, !>` has the same size as `T`
* using arbitrary niches, not just `0`, to represent a data-less variant, e.g.:
  * `Option<bool>`, `Option<Option<bool>>`, `Option<Ordering>` are all 1 byte
  * `Option<char>` is 4 bytes
* using a range of niches to represent *multiple* data-less variants, e.g.:
  * `enum E { A(bool), B, C, D }` is 1 byte

Code generation now takes advantage of `Scalar` and `ScalarPair` to, in more cases, pass around scalar components as immediates instead of indirectly, through pointers into temporary memory, while avoiding LLVM's "first-class aggregates", and there's more untapped potential here.

Closes #44426, fixes #5977, fixes #14540, fixes #43278.
flip1995 pushed a commit to flip1995/rust that referenced this issue Sep 24, 2020
…tthiaskrgr

Add lint panic in result

### Change
Adding a new "restriction" lint that will emit a warning when using "panic", "unimplemented" or "unreachable" in a function of type option/result.

### Motivation
Some codebases must avoid crashes at all costs, and hence functions of type option/result must return an error instead of crashing.

### Test plan
Running:
TESTNAME=panic_in_result cargo uitest ---

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-unicode Area: Unicode C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

14 participants