Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: Switch to deterministic BTreeMap/BTreeSet #235

Merged
merged 1 commit into from
Jun 8, 2024
Merged

Conversation

ReillyBrogan
Copy link
Contributor

@ReillyBrogan ReillyBrogan commented May 6, 2024

While testing ccache I noticed that cache hit rate was extremely low. After some debugging I noticed that the CFLAGS and other environmental variables and macro scripts were not being set deterministically. This is problematic for ccache as it hashes the compiler arguments as part of the key and the order matters to get the same key on a future run.

The cause of this non-determinism is the Rust stdlib HashMap and HashSet, for which the default hasher used a random seed. To solve this switch to BTreeMap and BTreeSet from the stdlib. Since non-determinism is just generally a bad idea for build tools and package management software replace ALL uses of HashMap and HashSet repo-wide.

Tested by building a package with ccache enabled and using ccache --show-stats in the package recipe to verify that the hit rate was now 100% and that the build stage time had dropped to just the time it spent linking.

@ermo ermo added the type: enhancement New feature or request label May 6, 2024
@ermo ermo added this to the oxide-prealpha1 milestone May 6, 2024
@ReillyBrogan ReillyBrogan marked this pull request as draft May 7, 2024 22:26
@ReillyBrogan ReillyBrogan changed the title all: Switch to deterministic IndexMap all: Switch to deterministic BTreeMap/BTreeSet May 7, 2024
@ReillyBrogan ReillyBrogan marked this pull request as ready for review May 8, 2024 16:25
@ermo
Copy link
Member

ermo commented May 20, 2024

@tarkah Your input on this would be appreciated. I don't see any reason why it shouldn't go in, but maybe I'm missing something...?

@tarkah
Copy link
Collaborator

tarkah commented May 21, 2024

This sounds good in theory. Let me dig through each change tomorrow to confirm there's no sneaky regressions due to change from Hash/Eq to Ord

@ermo
Copy link
Member

ermo commented May 22, 2024

This sounds good in theory. Let me dig through each change tomorrow to confirm there's no sneaky regressions due to change from Hash/Eq to Ord

Any luck with this... @tarkah ?

While testing ccache I noticed that cache hit rate was extremely low. After some debugging I noticed that the CFLAGS and other environmental variables and macro scripts were not being set deterministically. This is problematic for ccache as it hashes the compiler arguments as part of the key and the order matters to get the same key on a future run.

The cause of this non-determinism is the Rust stdlib HashMap and HashSet, for which the default hasher used a random seed. To solve this switch to BTreeMap and BTreeSet from the stdlib. Since non-determinism is just generally a bad idea for build tools and package management software replace ALL uses of HashMap and HashSet repo-wide.

Tested by building a package with ccache enabled and using `ccache --show-stats` in the package recipe to verify that the hit rate was now 100% and that the build stage time had dropped to just the time it spent linking.

Signed-off-by: Reilly Brogan <reilly@reillybrogan.com>
@ikeycode ikeycode merged commit 8ca5186 into main Jun 8, 2024
2 checks passed
@ikeycode ikeycode deleted the fix-determinism branch June 8, 2024 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants