By choosing primes which allow for faster division, we can shave off some time. The exact performance depends on the architecture. The worst possible case is `aarch64`, where (due to GHC shortcomings) we cannot benefit from faster division algorithms at all. Nevertheless, there are some modest gains, mostly because the divisor is now passed unboxed and remainders are forced, so they are unboxed `Int#` as well. Here are numbers on macOS M2:

Running the same benchmark on `x86_64` demonstrates much more pronounced benefits, up to 3x faster:
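
As background on why the choice of prime can matter at all, here is a minimal sketch of a division-free remainder. It is an illustration only: the description above does not say which primes or which algorithm this change uses, so the Mersenne prime `2^31 - 1` and the names `mersenne31` and `remMersenne31` are hypothetical. The sketch assumes a 64-bit `Int` and non-negative inputs. The bang pattern forces the accumulator, which is the kind of strictness that typically lets GHC keep the value as an unboxed `Int#`.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Bits (shiftR, (.&.))

-- Hypothetical choice for illustration: the Mersenne prime 2^31 - 1.
-- Its bit pattern is 31 ones, so it doubles as a mask for the low 31 bits.
mersenne31 :: Int
mersenne31 = 2147483647

-- Remainder modulo 2^31 - 1 for non-negative inputs, using only shifts,
-- masks and additions. Since 2^31 is congruent to 1 modulo 2^31 - 1,
-- hi * 2^31 + lo is congruent to hi + lo, and the sum is strictly smaller
-- than the input whenever hi >= 1, so the loop terminates.
remMersenne31 :: Int -> Int
remMersenne31 = go
  where
    go !x
      | x > mersenne31  = go ((x `shiftR` 31) + (x .&. mersenne31))
      | x == mersenne31 = 0
      | otherwise       = x

main :: IO ()
main = do
  let inputs = [0, 1, mersenne31 - 1, mersenne31, mersenne31 + 1, maxBound]
  -- Compare against the ordinary division-based remainder.
  print (all (\x -> remMersenne31 x == x `rem` mersenne31) inputs)
```

Whether a trick like this beats a hardware division instruction depends on the architecture and on the code GHC generates, which is consistent with the platform-dependent results described above.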