Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix normal random skew towards mean #4539

Merged
merged 1 commit into from
Oct 3, 2023

Conversation

ranisalt
Copy link
Member

Pull Request Prelude

Changes Proposed

The current code for normal distribution is skewed and tends to generate much more times the central element than the rest of the distribution, more than is expected of a normal distribution.

Comparison of results generating 2^27 random values between 0 and 20 (click to expand)
>> TFS normal random:
0	* (800928)
1	**** (2126010)
2	***** (2982546)
3	******* (4027064)
4	********* (5222481)
5	************ (6495756)
6	************** (7770920)
7	***************** (8940450)
8	****************** (9872740)
9	******************* (10476202)
10	******************************** (16807803)
11	******************* (10474301)
12	****************** (9872670)
13	***************** (8924956)
14	************** (7778402)
15	************ (6496375)
16	********* (5210797)
17	******* (4026227)
18	***** (2983148)
19	**** (2126458)
20	* (801494)
>> Fixed normal random:
0	* (838123)
1	**** (2227927)
2	***** (3127189)
3	******** (4214630)
4	********** (5465489)
5	************ (6800608)
6	*************** (8147614)
7	***************** (9362439)
8	******************* (10342027)
9	******************** (10982638)
10	********************* (11204545)
11	******************** (10982142)
12	******************* (10342200)
13	***************** (9359764)
14	*************** (8140484)
15	************ (6804640)
16	********** (5465333)
17	******** (4215876)
18	***** (3128459)
19	**** (2226383)
20	* (839218)

This happens because of the following path:

if (v < 0.0) {
	increment = diff / 2;
} else if (v > 1.0) {
	increment = (diff + 1) / 2;
}

This causes values below or above two standard deviations (stdev is 0.25) from the mean (0.5) to simply be replaced by the central element. In normal distributions, about 5% of the results will be farther than two standard deviations from the mean, and those would all generate the central element instead.

image

The proper solution is to just discard the result and generate a new value. The while loop will only run once in 95% of the calls. There is an infinitely small chance of an infinite loop.

The new code is also roughly 30% faster (on a Ryzen 7 4800H chip) due to it performing less branching and having only one return.

@EPuncker EPuncker added the enhancement Increase or improvement in quality, value, or extent label Sep 25, 2023
@ranisalt ranisalt merged commit 5b1b24a into otland:master Oct 3, 2023
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Increase or improvement in quality, value, or extent
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants