Add niche optimization #49

Kijewski · 2022-09-05T15:01:13Z

This PR adds niche optimization. This lets the compiler use values outside of the valid integer range to store e.g. other enum variants. This makes Option<u7> one byte.

The internal representation is own the same as the related primitive type, i.e. every u30 can be transmuted to u32. The other direction is possible if the u32 is representable in the u30 range.

This PR adds niche optimization. This lets the compiler use values outside of the valid integer range to store e.g. other enum variants. This makes `Option<u7>` one byte. The internal representation is own the same as the related primitive type, i.e. every u30 can be transmuted to u32. The other direction is possible if the u32 is representable in the u30 range.

chrysn · 2022-09-05T16:41:17Z

Nice; with the struct trick it should not make build times too bad. Thanks for implementing this!

I'll definitely give it a try and report back here.

(I might also look into whether types that previously made little sense (eg. n31 that is an i32 but guaranteed to be < 0) can be implemented using this, as ultimately that's where I want to go.)

kjetilkjeka

Thanks a lot for this PR. I think this is great!

Do you know if we have managed to not expose any of the underlying implementation details such that this can be considered backwards compatible? Or have we given any guarantees that this will break? If you're not sure I will try to dig into it one of the next days.

Would it make sense to add some sort of test to verify that the niche optimization is being made when possible? Maybe there are other optimization we would like to implement down the road, then such test would be helpful to make sure we don't break anything. Let me know what you think.

I will try to check out the actual build time one of the next days and I also hope @chrysn gets a chance to comment after testing it.

src/niche/signed.rs

src/niche/unsigned.rs

This speeds up the compilation by 50% (YMMV), because code is only generated if used. The methods should generate trivial instructions for most methods, so there should be no drawbacks in using this annotation.

Kijewski · 2022-09-14T12:31:51Z

Do you know if we have managed to not expose any of the underlying implementation details such that this can be considered backwards compatible?

I code it fully self-contained. The API was not changed in any way, nothing added, nothing removed.

But any code that relied on the previous, undocumented behavior, that you could set the unused bits to any value, will break. This could only be done on unsafe code using type casts, so I don't think that matters.

Would it make sense to add some sort of test to verify that the niche optimization is being made when possible?

static_assertions could be used to ensure that the optimizations work. I'll have a look what the best way is to add it.

I will try to check out the actual build time one of the next days

I added a commit to this PR, that makes every method #[inline] and cuts the compilation time in half: 1c7a8ef. (It should probably be it's own PR, though.)

ironhaven · 2022-09-16T07:04:37Z

So I just finished testing how this pull affects compile times and here are the results. The test was I ran cargo check --timings on the hello world binary crate with this library as a git dependancy and took the time to check the library alone.

Computer	Master branch	Pull branch	Percentage
Win 10 Ryzen 3600x (6 core 4Ghz)	10.09s	11.02s	+9.21%
Manjaro arm Pinebook Pro (2 cores 2Ghz, 4 cores 1.5Ghz)	57.12s	63.6s	+1134%

test data

My testing shows this pull will increase compile time by about 10%. That is the raw data.

Kijewski · 2022-09-16T13:11:40Z

I've added a test that checks the layout (size + alignment) of uX is the same as its primitive type, and that the niche optimization for Option<uX> is used.

Kijewski · 2022-09-16T13:26:07Z

I optimized the macro invocation, and used the same test as @ironhaven (best of three runs)

master	pr	percentage
8.75s	8.87s	+1.4%

cargo 1.63.0 (fd9c4297c 2022-07-01), AMD Ryzen 7 2700X (8 cores @ 3.7GHz)

The macro optimization could easily be used without adding niche optimization, so it would be a lie for me to claim this PR comes without a cost.

kjetilkjeka · 2022-09-23T09:00:08Z

@chrysn do you think you will have a chance to test it out before merge? Or should I try to move this on without?

chrysn · 2022-09-23T14:08:46Z

I've attempted a larger trial run, but that got stuck in too many places where I'd need to introduce additional .into() calls to make things practical. (I still plan to use it there, but it'll have to wait).

I've done a few tests more on the micro side, finding that

Result<u31, ()> and other ZSTs (like single-inhabitant repr(Rust) enums) are good / small (4 byte),
Result<u31, T> with a multi-valued enums are large (8 byte),
and even Result<u31, T> with a repr(u32) enums with a value in which the MSB-set is large (8 byte).

That doesn't make me particularly happy, but it's still an improvement, and most importantly, to my understanding, further improvements would not need to be done here, but would be up to the Rust compiler.

So I have no clear recommendation -- it's definitely not a "no", but it's not the enthusiastic "yes" I hoped to provide either. Given you were about to merge even without my tests, I'd guess that's a "go ahead".

Kijewski · 2022-09-23T14:14:31Z

and even Result<u31, T> with a repr(u32) enums with a value in which the MSB-set is large (8 byte).

@chrysn, that should be fixed by rust-lang/rust#94075. I don't know when this change lands in stable.

chrysn · 2022-09-23T20:54:00Z

I should have tested on nightly right away -- a Result<u31, multi-valued-enum> is apparently optimized starting with current beta, 1.65-to-be. What it still can't do is optimizing for enums with values (eg. enum Error { Val1 = 0xffffffff, Val2 = 0xff00ffff }), like we'd have if we introduced a i32neg type that represents the inclusive range from -1 to -2**32 (where Result<u31, i32neg> would be practically u32 but treatable like a result).

So ... well, one step closer even. Thanks, I like it :-)

dacut · 2023-01-17T00:00:25Z

I was trying to see what was still needed for this PR. Seems like it's just build times that are a concern?

Here's the latest on my M1 MacBook Pro, if this is of any help (hello world cargo binary using uX as a dep, with just enough code so it's not elided, cargo clean; cargo check --timings):

master: Min 5.72 s/Mean 5.78 s/Max 5.93 s
pr-niche: Min 5.79 s/Mean 5.94 s/Max 6.05 s

It's a 3% gain, but with some overlap in the data.

Kijewski added 2 commits September 5, 2022 16:51

Only use public interface in tests

79323d7

chrysn mentioned this pull request Sep 5, 2022

wishlist: Allow optimization of uninhabited values #27

Open

Kijewski added 2 commits September 5, 2022 19:23

FIXUP: distinguish between signed and unsigned bytes

191510c

FIXUP: combine signed_bytes and unsigned_bytes

36acdeb

Kijewski mentioned this pull request Sep 7, 2022

Add niche optimization for NaiveTime chronotope/chrono#811

Closed

kjetilkjeka reviewed Sep 13, 2022

View reviewed changes

src/niche/signed.rs Outdated Show resolved Hide resolved

src/niche/signed.rs Outdated Show resolved Hide resolved

src/niche/unsigned.rs Outdated Show resolved Hide resolved

This was referenced Sep 13, 2022

Serde support #17

Open

Maintenance status accurate? #29

Open

Kijewski added 2 commits September 14, 2022 13:58

FIXUP: combine into one file

01c5bf3

Make every method #[inline]

1c7a8ef

This speeds up the compilation by 50% (YMMV), because code is only generated if used. The methods should generate trivial instructions for most methods, so there should be no drawbacks in using this annotation.

Add layout test

dadb1db

Cleanup: remove lib core

65b6a7e

chrysn mentioned this pull request Sep 26, 2022

Range types for integers (or refinement types?) rust-lang/rfcs#671

Open

chrysn mentioned this pull request Feb 4, 2023

Display traits don't mask #52

Closed

GnomedDev mentioned this pull request Jul 4, 2023

Niche optimisation JonathanWoollett-Light/ux2#18

Open

chrysn mentioned this pull request Feb 14, 2024

Decide mask-before vs. mask-after for public interfaces #67

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add niche optimization #49

Add niche optimization #49

Kijewski commented Sep 5, 2022

chrysn commented Sep 5, 2022

kjetilkjeka left a comment

Kijewski commented Sep 14, 2022

ironhaven commented Sep 16, 2022

Kijewski commented Sep 16, 2022

Kijewski commented Sep 16, 2022 •

edited

Loading

kjetilkjeka commented Sep 23, 2022

chrysn commented Sep 23, 2022

Kijewski commented Sep 23, 2022 •

edited

Loading

chrysn commented Sep 23, 2022

dacut commented Jan 17, 2023

Add niche optimization #49

Are you sure you want to change the base?

Add niche optimization #49

Conversation

Kijewski commented Sep 5, 2022

chrysn commented Sep 5, 2022

kjetilkjeka left a comment

Choose a reason for hiding this comment

Kijewski commented Sep 14, 2022

ironhaven commented Sep 16, 2022

Kijewski commented Sep 16, 2022

Kijewski commented Sep 16, 2022 • edited Loading

kjetilkjeka commented Sep 23, 2022

chrysn commented Sep 23, 2022

Kijewski commented Sep 23, 2022 • edited Loading

chrysn commented Sep 23, 2022

dacut commented Jan 17, 2023

Kijewski commented Sep 16, 2022 •

edited

Loading

Kijewski commented Sep 23, 2022 •

edited

Loading