Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add niche optimization #49

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Add niche optimization #49

wants to merge 8 commits into from

Conversation

Kijewski
Copy link

@Kijewski Kijewski commented Sep 5, 2022

This PR adds niche optimization. This lets the compiler use values outside of the valid integer range to store e.g. other enum variants. This makes Option<u7> one byte.

The internal representation is own the same as the related primitive type, i.e. every u30 can be transmuted to u32. The other direction is possible if the u32 is representable in the u30 range.

This PR adds niche optimization. This lets the compiler use values
outside of the valid integer range to store e.g. other enum variants.
This makes `Option<u7>` one byte.

The internal representation is own the same as the related primitive
type, i.e. every u30 can be transmuted to u32. The other direction is
possible if the u32 is representable in the u30 range.
@chrysn
Copy link
Member

chrysn commented Sep 5, 2022

Nice; with the struct trick it should not make build times too bad. Thanks for implementing this!

I'll definitely give it a try and report back here.

(I might also look into whether types that previously made little sense (eg. n31 that is an i32 but guaranteed to be < 0) can be implemented using this, as ultimately that's where I want to go.)

Copy link
Collaborator

@kjetilkjeka kjetilkjeka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR. I think this is great!

Do you know if we have managed to not expose any of the underlying implementation details such that this can be considered backwards compatible? Or have we given any guarantees that this will break? If you're not sure I will try to dig into it one of the next days.

Would it make sense to add some sort of test to verify that the niche optimization is being made when possible? Maybe there are other optimization we would like to implement down the road, then such test would be helpful to make sure we don't break anything. Let me know what you think.

I will try to check out the actual build time one of the next days and I also hope @chrysn gets a chance to comment after testing it.

src/niche/signed.rs Outdated Show resolved Hide resolved
src/niche/signed.rs Outdated Show resolved Hide resolved
src/niche/unsigned.rs Outdated Show resolved Hide resolved
This was referenced Sep 13, 2022
This speeds up the compilation by 50% (YMMV), because code is only
generated if used.

The methods should generate trivial instructions for most methods, so
there should be no drawbacks in using this annotation.
@Kijewski
Copy link
Author

Do you know if we have managed to not expose any of the underlying implementation details such that this can be considered backwards compatible?

I code it fully self-contained. The API was not changed in any way, nothing added, nothing removed.

But any code that relied on the previous, undocumented behavior, that you could set the unused bits to any value, will break. This could only be done on unsafe code using type casts, so I don't think that matters.

Would it make sense to add some sort of test to verify that the niche optimization is being made when possible?

static_assertions could be used to ensure that the optimizations work. I'll have a look what the best way is to add it.

I will try to check out the actual build time one of the next days

I added a commit to this PR, that makes every method #[inline] and cuts the compilation time in half: 1c7a8ef. (It should probably be it's own PR, though.)

@ironhaven
Copy link

So I just finished testing how this pull affects compile times and here are the results. The test was I ran cargo check --timings on the hello world binary crate with this library as a git dependancy and took the time to check the library alone.

Computer Master branch Pull branch Percentage
Win 10 Ryzen 3600x (6 core 4Ghz) 10.09s 11.02s +9.21%
Manjaro arm Pinebook Pro (2 cores 2Ghz, 4 cores 1.5Ghz) 57.12s 63.6s +1134%

test data

My testing shows this pull will increase compile time by about 10%. That is the raw data.

@Kijewski
Copy link
Author

I've added a test that checks the layout (size + alignment) of uX is the same as its primitive type, and that the niche optimization for Option<uX> is used.

@Kijewski
Copy link
Author

Kijewski commented Sep 16, 2022

I optimized the macro invocation, and used the same test as @ironhaven (best of three runs)

master pr percentage
8.75s 8.87s +1.4%

cargo 1.63.0 (fd9c4297c 2022-07-01), AMD Ryzen 7 2700X (8 cores @ 3.7GHz)

The macro optimization could easily be used without adding niche optimization, so it would be a lie for me to claim this PR comes without a cost.

@kjetilkjeka
Copy link
Collaborator

@chrysn do you think you will have a chance to test it out before merge? Or should I try to move this on without?

@chrysn
Copy link
Member

chrysn commented Sep 23, 2022

I've attempted a larger trial run, but that got stuck in too many places where I'd need to introduce additional .into() calls to make things practical. (I still plan to use it there, but it'll have to wait).

I've done a few tests more on the micro side, finding that

  • Result<u31, ()> and other ZSTs (like single-inhabitant repr(Rust) enums) are good / small (4 byte),
  • Result<u31, T> with a multi-valued enums are large (8 byte),
  • and even Result<u31, T> with a repr(u32) enums with a value in which the MSB-set is large (8 byte).

That doesn't make me particularly happy, but it's still an improvement, and most importantly, to my understanding, further improvements would not need to be done here, but would be up to the Rust compiler.

So I have no clear recommendation -- it's definitely not a "no", but it's not the enthusiastic "yes" I hoped to provide either. Given you were about to merge even without my tests, I'd guess that's a "go ahead".

@Kijewski
Copy link
Author

Kijewski commented Sep 23, 2022

and even Result<u31, T> with a repr(u32) enums with a value in which the MSB-set is large (8 byte).

@chrysn, that should be fixed by rust-lang/rust#94075. I don't know when this change lands in stable.

@chrysn
Copy link
Member

chrysn commented Sep 23, 2022

I should have tested on nightly right away -- a Result<u31, multi-valued-enum> is apparently optimized starting with current beta, 1.65-to-be. What it still can't do is optimizing for enums with values (eg. enum Error { Val1 = 0xffffffff, Val2 = 0xff00ffff }), like we'd have if we introduced a i32neg type that represents the inclusive range from -1 to -2**32 (where Result<u31, i32neg> would be practically u32 but treatable like a result).

So ... well, one step closer even. Thanks, I like it :-)

@dacut
Copy link

dacut commented Jan 17, 2023

I was trying to see what was still needed for this PR. Seems like it's just build times that are a concern?

Here's the latest on my M1 MacBook Pro, if this is of any help (hello world cargo binary using uX as a dep, with just enough code so it's not elided, cargo clean; cargo check --timings):

  • master: Min 5.72 s/Mean 5.78 s/Max 5.93 s
  • pr-niche: Min 5.79 s/Mean 5.94 s/Max 6.05 s

It's a 3% gain, but with some overlap in the data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants