Use the "nearly divisionless" algorithm on all targets. #37920

stephentyrone · 2021-06-15T14:55:29Z

We have multipliedFullWidth available everywhere now, so we don't need to carry around the old implementation on 32b targets.

Also adds a few more benchmarks for random number generation.

Resolves #53302

We have multipliedFullWidth available everywhere now, so we don't need to carry around the old implementation on 32b targets. Also adds a few more benchmarks for random number generation.

stephentyrone · 2021-06-15T14:55:40Z

@swift-ci please benchmark

stephentyrone · 2021-06-15T14:55:47Z

@swift-ci please test

…rks. When these are visible compile-time constants, the compiler is smart enough to evaluate the division in the "nearly divisionless" algorithm, which makes it completely divisionless. That's good, but it obscures what the runtime performance of the algorithm will be when the bounds are _not_ available as compile-time constants. Thus, for some of the newly-added benchmarks, we pass the upper bound through `identity` to hide it from the optimizer (this is imperfect, but it's the simplest tool we have). We don't want to do this for all the tests for two reasons: - compile-time constant bounds are a common case that should still be reflected in our testing - we don't want to perturb existing benchmark results more than we have to.

stephentyrone · 2021-06-15T15:32:23Z

@swift-ci please benchmark

swift-ci · 2021-06-15T16:15:52Z

Performance (x86_64): -O

Regression	OLD	NEW	DELTA	RATIO
DictionaryOfAnyHashableStrings_insert	3262	5558	+70.4%	0.59x
FlattenListLoop	1630	2495	+53.1%	0.65x (?)
Set.isDisjoint.Box25	362	514	+42.0%	0.70x (?)
DictionaryKeysContainsNative	23	27	+17.4%	0.85x (?)
StringRemoveDupes	263	293	+11.4%	0.90x (?)
UTF8Decode_InitFromBytes_ascii_as_ascii	512	556	+8.6%	0.92x (?)
NSStringConversion.UTF8	927	1000	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
RandomDoubleLCG	216	200	-7.4%	1.08x (?)
NSStringConversion.Rebridge.LongUTF8	54	50	-7.4%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
RandomDouble01Def	36200	36200	36200	—
RandomDouble01LCG	142	147	144	—
RandomDoubleOpaqueDef	36100	36400	36200	—
RandomDoubleOpaqueLCG	543	543	543	—
RandomInt64Def	65100	67400	66100	—
RandomInt64LCG	1691	1743	1709	—
RandomInt8Def	70300	71700	71200	—
RandomInt8LCG	638	638	638	—

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
RandomValues.o	2763	8003	+189.6%	0.35x

Performance (x86_64): -Osize

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	2495	1645	-34.1%	1.52x (?)
FlattenListFlatMap	6003	4375	-27.1%	1.37x (?)
DataToStringLargeUnicode	7250	6500	-10.3%	1.12x (?)

Added	MIN	MAX	MEAN	MAX_RSS
RandomDouble01Def	36100	36100	36100	—
RandomDouble01LCG	514	514	514	—
RandomDoubleOpaqueDef	35400	35400	35400	—
RandomDoubleOpaqueLCG	524	524	524	—
RandomInt64Def	64400	67200	66033	—
RandomInt64LCG	1727	1788	1747	—
RandomInt8Def	71200	72800	71967	—
RandomInt8LCG	653	666	657	—

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
RandomValues.o	2419	6885	+184.6%	0.35x

Performance (x86_64): -Onone

Improvement	OLD	NEW	DELTA	RATIO
StringInterpolationSmall	4390	3960	-9.8%	1.11x (?)
StringToDataMedium	5300	4900	-7.5%	1.08x (?)
String.data.LargeUnicode	137	127	-7.3%	1.08x (?)
DataCreateMediumArray	6380	5920	-7.2%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
RandomDouble01Def	68500	68600	68567	—
RandomDouble01LCG	30286	30434	30358	—
RandomDoubleOpaqueDef	83900	84000	83967	—
RandomDoubleOpaqueLCG	44450	44510	44472	—
RandomInt64Def	110300	113400	112133	—
RandomInt64LCG	45408	45714	45522	—
RandomInt8Def	108000	111600	109267	—
RandomInt8LCG	44031	44432	44212	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️⏱	`RandomInt64LCG` execution took at least 1690 μs. _{Decrease the workload of RandomInt64LCG by a factor of 2 (10), to be less than 1000 μs.}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

stephentyrone · 2021-06-15T17:13:58Z

@swift-ci test

Azoy

The stdlib changes look good to me. I wonder if for the benchmarks it makes sense to add separate IntegerOpaque benchmarks like the ones you did for Double so we can see clearly which integer benchmarks are constant ranges vs. going through the algorithm. I was going to ask to land these benchmarks separately, but I don't think CI runs 32b benchmarks for this to matter for this PR.

eeckstein · 2021-06-15T17:59:23Z

Is it possible to reduce the number of benchmarks to a representative subset?
We cannot add benchmarks for every permutation of every language feature. That would explode the benchmark runtime.

Alternatively, you can keep all the benchmarks, but add a .skip attribute, which excludes the from the regular run.

stephentyrone · 2021-06-15T18:33:00Z

This is the reduced set of benchmarks! I'm not sure why we would reduce it further if we care about performance at all. I'm pretty sure that report and track fewer benchmark results for the entirety of the Swift compiler and standard library than we did for a single API in Accelerate when I was working on it(!) A reasonable set of "all" the benchmarks, just for random integers, would include the full matrix of (every builtin integer type + representative custom types) x (fast and slow generators) x (various ranges--tiny, about half, nearly all) x (compile time constant vs unknown bounds). I'm probably leaving a few dimensions out. The benchmarks are fast--much, much faster to run than the test suite--so I don't see what we gain by stripping it down below the minimum representative sample, and there's a lot to lose in doing so.

eeckstein · 2021-06-15T19:46:31Z

ok, fine with me

Use the "nearly divisionless" algorithm on all targets.

a2a8ede

We have multipliedFullWidth available everywhere now, so we don't need to carry around the old implementation on 32b targets. Also adds a few more benchmarks for random number generation.

stephentyrone requested a review from Azoy June 15, 2021 14:56

Azoy approved these changes Jun 15, 2021

View reviewed changes

stephentyrone merged commit f538b49 into swiftlang:main Jun 16, 2021

stephentyrone deleted the unify-integer-random-algorithm branch June 16, 2021 00:27

stephentyrone mentioned this pull request Jun 15, 2021

[SR-10912] Use Lemire's Nearly Divisionless Random Integer Generation on 32-bit Systems #53302

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use the "nearly divisionless" algorithm on all targets. #37920

Use the "nearly divisionless" algorithm on all targets. #37920

Uh oh!

stephentyrone commented Jun 15, 2021 •

edited by AnthonyLatsis

Loading

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

swift-ci commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

Azoy left a comment

Uh oh!

eeckstein commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

eeckstein commented Jun 15, 2021

Uh oh!

Uh oh!

Use the "nearly divisionless" algorithm on all targets. #37920

Use the "nearly divisionless" algorithm on all targets. #37920

Uh oh!

Conversation

stephentyrone commented Jun 15, 2021 • edited by AnthonyLatsis Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

swift-ci commented Jun 15, 2021

Performance (x86_64): -O

Code size: -O

Performance (x86_64): -Osize

Code size: -Osize

Performance (x86_64): -Onone

Code size: -swiftlibs

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

Azoy left a comment

Choose a reason for hiding this comment

Uh oh!

eeckstein commented Jun 15, 2021

Uh oh!

stephentyrone commented Jun 15, 2021

Uh oh!

eeckstein commented Jun 15, 2021

Uh oh!

Uh oh!

stephentyrone commented Jun 15, 2021 •

edited by AnthonyLatsis

Loading