Use the "nearly divisionless" algorithm on all targets. #37920
We have multipliedFullWidth available everywhere now, so we don't need to carry around the old implementation on 32b targets. Also adds a few more benchmarks for random number generation.
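For readers unfamiliar with the technique, here is a minimal sketch of a "nearly divisionless" bounded random integer built on `multipliedFullWidth(by:)`. It is an illustration only, not the standard library's actual implementation; the function name `boundedRandom` and its signature are hypothetical.

```swift
// Hypothetical helper for illustration; not the stdlib's real code.
// Returns a uniformly distributed value in 0..<upperBound.
func boundedRandom<G: RandomNumberGenerator>(
  upperBound: UInt64, using generator: inout G
) -> UInt64 {
  precondition(upperBound > 0, "upperBound must be nonzero")
  // Multiply a full-width random word by the bound. The high half of the
  // 128-bit product is the candidate result, already in 0..<upperBound.
  var product = upperBound.multipliedFullWidth(by: generator.next())
  if product.low < upperBound {
    // Rare path: only here do we pay for a division.
    // threshold == 2^64 mod upperBound, computed with wrapping negation.
    let threshold = (0 &- upperBound) % upperBound
    // Reject candidates whose low half falls in the biased region.
    while product.low < threshold {
      product = upperBound.multipliedFullWidth(by: generator.next())
    }
  }
  return product.high
}
```

The common path costs one full-width multiply and one comparison; the remainder is computed only when the low half of the product is smaller than the bound, which is unlikely when the bound is small relative to 2^64.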
@swift-ci please benchmark
@swift-ci please test
…rks.

When these are visible compile-time constants, the compiler is smart enough to evaluate the division in the "nearly divisionless" algorithm, which makes it completely divisionless. That's good, but it obscures what the runtime performance of the algorithm will be when the bounds are _not_ available as compile-time constants. Thus, for some of the newly-added benchmarks, we pass the upper bound through `identity` to hide it from the optimizer (this is imperfect, but it's the simplest tool we have).

We don't want to do this for all the tests for two reasons:
- compile-time constant bounds are a common case that should still be reflected in our testing
- we don't want to perturb existing benchmark results more than we have to.
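As a rough sketch of the distinction described above, the following contrasts a compile-time constant bound with one passed through an opaque function. The `identity` defined here is a stand-in written for this example, and the two `sum…` functions are hypothetical; the actual benchmarks in the PR may be structured differently.

```swift
// Stand-in for the benchmark utility: a non-inlined pass-through, so the
// optimizer cannot see the value that comes out of it.
@inline(never)
func identity<T>(_ value: T) -> T { value }

// Bound is a visible constant: the compiler may fold the division away.
func sumConstantBound(iterations: Int) -> Int {
  var generator = SystemRandomNumberGenerator()
  var total = 0
  for _ in 0..<iterations {
    total &+= Int.random(in: 0..<1000, using: &generator)
  }
  return total
}

// Bound is hidden behind identity: the runtime path of the algorithm,
// including any division, is what gets measured.
func sumOpaqueBound(iterations: Int) -> Int {
  var generator = SystemRandomNumberGenerator()
  let upperBound = identity(1000)
  var total = 0
  for _ in 0..<iterations {
    total &+= Int.random(in: 0..<upperBound, using: &generator)
  }
  return total
}
```

Both functions produce values from the same range; only what the optimizer can prove about the bound differs.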
@swift-ci please benchmark
[swift-ci benchmark report; result tables not preserved. Sections: Performance (x86_64) at -O, -Osize, and -Onone; code size at -O, -Osize, and -swiftlibs.]
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.
@swift-ci test
The stdlib changes look good to me. I wonder if for the benchmarks it makes sense to add separate IntegerOpaque benchmarks like the ones you did for Double so we can see clearly which integer benchmarks are constant ranges vs. going through the algorithm. I was going to ask to land these benchmarks separately, but I don't think CI runs 32b benchmarks for this to matter for this PR.
Is it possible to reduce the number of benchmarks to a representative subset? Alternatively, you can keep all the benchmarks, but add a |
This is the reduced set of benchmarks! I'm not sure why we would reduce it further if we care about performance at all. I'm pretty sure that we report and track fewer benchmark results for the entirety of the Swift compiler and standard library than we did for a single API in Accelerate when I was working on it(!) A reasonable set of "all" the benchmarks, just for random integers, would include the full matrix of (every builtin integer type + representative custom types) x (fast and slow generators) x (various ranges--tiny, about half, nearly all) x (compile time constant vs unknown bounds). I'm probably leaving a few dimensions out. The benchmarks are fast--much, much faster to run than the test suite--so I don't see what we gain by stripping it down below the minimum representative sample, and there's a lot to lose in doing so.
ok, fine with me
Resolves #53302