Utils: Reimplement util.GetRandomString to avoid modulo bias #64481

DanCech · 2023-03-09T07:13:35Z

We discovered that the util.GetRandomString function exhibits a bias, and when used to generate alphanumeric strings will produce the characters 0-7 with approximately 25% higher frequency on average than the other characters (8-9, A-Z, a-z) in the allowed character list.

This is due to modulo bias, the fact that when we calculate the value of a random byte with value 0-255 module the number of allowed characters (62 by default), we get 4 input values that will produce each of the numbers 8-61 (mapped to characters 8-9, A-Z and a-z), but 5 inputs that will produce the numbers 0-7 (mapped to characters 0-7).

When we restrict the set of output characters to the numbers 0-9 we see a smaller effect, because there are 25 input values that produce the values 5-9 but 26 input values that map to the values 0-4, creating a 4% bias toward the numbers 0-4.

Further reading on modulo bias:

This PR reimplements util.GetRandomString following best practices to produce unbiased output regardless of the allowed characters. It applies the standard method of avoiding modulo bias, by discarding random values that are larger than the largest multiple of the number of characters in the output set that is smaller than 255.

It also adds 2 different tests to verify that the function is working correctly by comparing the relative frequencies of the highest and lowest-frequency characters in the output, and by using a chi-squared test.

We generate ~25% more random bytes than output characters to avoid multiple calls to rand.Read (which are slow), but if that isn't enough we will read again (and again if needed) until we have enough random bytes to generate the output string. In my testing the performance is very similar to the existing function, in part due to some efficiencies in go offsetting the need to read more bytes with rand.Read.

We use util.GetRandomString for various purposes including generating random filenames but also for generating cryptographic salt values and api key secrets, so the impact of this bias on the security of Grafana needed to be assessed.

@kelnage performed an analysis of the reduction in entropy of the current generated strings, with the following results:

I've done an analysis using this widely used formula: E = log_2(R^L), where R is the size of the character space ([a-zA-Z0-9] having 62 possible chars, [0-7] having 8), L is the length of the string, and E is the number of bits of entropy available, and the number of guesses G required can be estimated by G = 2^E.

Based on that formula, for the 32-character strings, in the worst-case (where the GetRandomString algorithm only selected from [0-7] - very improbable, but possible given the above), the resulting secret would still be expected to have ~96 bits of entropy (compared with the expected case of 190 bits with all 62 chars). This is still well above the common recommendations for passwords (such as Okta, who suggest aiming for at least 60 bits), and would require in the order of ~10^28 guesses.

That drops to ~30 bits of entropy (~1 billion guesses) for a 10-character string - but since those appear to generally be used as salts (where the encryption strength should not depend on the randomness of the salt) or short-lived temporary passwords/IDs and again, this is a worst-case analysis, this is not a cause for concern.

@jmatosgrafana performed analysis on the current and proposed implementations:

Computing randomness stats on a 10 million series of GetRandomString(20) with the ent tool via ent stats_10millions.txt gives:

Entropy = 5.942557 bits per byte.

The expected entropy can be computed in Python with math.log2(62): 5.954196310386875

For GetRandomString(20) it means losing only 17% (pow(2, 20*(5.954196310386875-5.942557))), and 8% for GetRandomString(10) (pow(2, 10*(5.954196310386875-5.942557))).

Given those numbers, the security impact can be considered insignificant.

With this PR and a 1 million GetRandom(20) series, ent tool now returns

Entropy = 5.954195 bits per byte
Arithmetic mean value of data bytes is 86.8884

This is extremely close to the theoretical numbers: 5.954196310386875 for entropy (math.log2(62)) and 86.88709677419355 for arithmetic mean (5387/62, when adding up the ascii values for 0-9,a-z and A-Z)

At this point our assessment is that the current bias does not constitute a security issue as we still have sufficient entropy for the ways we use the generated strings. This PR improves that entropy value and brings our measured entropy closer to the theoretical values.

grafanabot · 2023-04-04T07:42:48Z

This pull request was removed from the 9.5.0 milestone because 9.5.0 is currently being released.

DanCech · 2023-04-17T03:08:07Z

@zserge @sakjur can you take a look at this when you have a moment?

sakjur

I tried running the tests a bunch and fuzzing this. With fuzzing, n below zero and alphabets larger than 255 bytes misbehave (…which arguably isn't very likely unless there's repetition), otherwise this seems to work as intended. Those aren't problems for the ways we're using it, so it doesn't have to be addressed.

When running the test suite I noticed that particularly the alphanumeric test suite has a relatively noticeable flakiness that I don't think we should merge. I'm considering suggesting running your checks say three times and failing only if two of those runs fail. That should reduce the flakiness to "very low", even if it's still there as a consequence of what it's testing.
We should also have a note about it in the test 🤔

sakjur · 2023-04-17T09:10:45Z

pkg/util/encoding_test.go

+	// Ensure there is no more than 5% variance between lowest and highest frequency characters
+	assert.LessOrEqual(t, float64(max-min)/float64(min), 0.05, "Variance between lowest and highest frequency characters must be no more than 5%")
+
+	// Ensure chi-squared value is lower than the critical bound
+	// 99.9% probability for 61 degrees of freedom
+	assert.Less(t, chiSquared, 100.888, "Chi squared value must be less than the 99.9% critical bound")


I tried to run this 10 000 times (it timed out after 10 minutes, I'm a little unsure how many rounds it got through, but since 1000 runs took a little over a minute, I don't think the lost runs on the end are too numerous) and it failed 165 times for the variance check and 3 times for the chi check, for a little over 1.5% failure rate.

I think that's a bit too frequent.

I'll tweak the tests to make them a bit less sensitive, they're probably tighter than they need to be right now.

I relaxed the test criteria to 10% variance and 99.99% chi squared, 5k tests without any failures. The old code fails both chi squared and the string variance test. It passes the digits variance test which is expected because the skew was less than 10% when generating digits.

sakjur

LGTM!

* reimplement GetRandomString, add tests that results are unbiased (cherry picked from commit 7e765c8)

* reimplement GetRandomString, add tests that results are unbiased

…#66970) Utils: Reimplement util.GetRandomString to avoid modulo bias (#64481) * reimplement GetRandomString, add tests that results are unbiased (cherry picked from commit 7e765c8) Co-authored-by: Dan Cech <dcech@grafana.com>

…#66969) Utils: Reimplement util.GetRandomString to avoid modulo bias (#64481) * reimplement GetRandomString, add tests that results are unbiased (cherry picked from commit 7e765c8) Co-authored-by: Dan Cech <dcech@grafana.com>

DanCech added type/bug area/backend add to changelog backport v9.4.x Mark PR for automatic backport to v9.4.x labels Mar 9, 2023

DanCech added this to the 9.5.0 milestone Mar 9, 2023

DanCech requested a review from a team as a code owner March 9, 2023 07:13

DanCech requested review from zserge, mildwonkey and yangkb09 and removed request for a team March 9, 2023 07:13

DanCech changed the title ~~Utils: reimplement util.GetRandomString to avoid modulo bias~~ Utils: Reimplement util.GetRandomString to avoid modulo bias Mar 9, 2023

DanCech requested review from zserge, mildwonkey, yangkb09 and sakjur and removed request for zserge, mildwonkey and yangkb09 March 20, 2023 02:10

grafanabot removed this from the 9.5.0 milestone Apr 4, 2023

sakjur reviewed Apr 17, 2023

View reviewed changes

DanCech added 2 commits April 19, 2023 23:05

reimplement GetRandomString, add tests that results are unbiased

663c0cb

relax test criteria

8656fcd

DanCech force-pushed the get-random-string branch from 224eecb to 8656fcd Compare April 20, 2023 03:58

DanCech added this to the 9.5.x milestone Apr 20, 2023

DanCech requested a review from sakjur April 20, 2023 04:01

sakjur approved these changes Apr 20, 2023

View reviewed changes

sakjur added the backport v9.5.x Bot will automatically open backport PR label Apr 20, 2023

DanCech merged commit 7e765c8 into main Apr 20, 2023
13 checks passed

DanCech deleted the get-random-string branch April 20, 2023 14:24

grafanabot mentioned this pull request Apr 20, 2023

[v9.4.x] Utils: Reimplement util.GetRandomString to avoid modulo bias #66969

Merged

grafanabot pushed a commit that referenced this pull request Apr 20, 2023

Utils: Reimplement util.GetRandomString to avoid modulo bias (#64481)

c2f0b4a

* reimplement GetRandomString, add tests that results are unbiased (cherry picked from commit 7e765c8)

grafanabot mentioned this pull request Apr 20, 2023

[v9.5.x] Utils: Reimplement util.GetRandomString to avoid modulo bias #66970

Merged

grafanabot pushed a commit that referenced this pull request Apr 20, 2023

Utils: Reimplement util.GetRandomString to avoid modulo bias (#64481)

d329f91

* reimplement GetRandomString, add tests that results are unbiased (cherry picked from commit 7e765c8)

mdvictor pushed a commit that referenced this pull request Apr 21, 2023

Utils: Reimplement util.GetRandomString to avoid modulo bias (#64481)

db17cc6

* reimplement GetRandomString, add tests that results are unbiased

guicaulada modified the milestones: 9.5.x, 9.5.0 Apr 24, 2023

zerok modified the milestones: 9.5.0, 10.0.0 May 3, 2023

zerok modified the milestones: 10.0.0, 10.0.0-preview May 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utils: Reimplement util.GetRandomString to avoid modulo bias #64481

Utils: Reimplement util.GetRandomString to avoid modulo bias #64481

DanCech commented Mar 9, 2023

grafanabot commented Apr 4, 2023

DanCech commented Apr 17, 2023

sakjur left a comment

sakjur Apr 17, 2023

DanCech Apr 20, 2023

DanCech Apr 20, 2023

sakjur left a comment

Utils: Reimplement util.GetRandomString to avoid modulo bias #64481

Utils: Reimplement util.GetRandomString to avoid modulo bias #64481

Conversation

DanCech commented Mar 9, 2023

grafanabot commented Apr 4, 2023

DanCech commented Apr 17, 2023

sakjur left a comment

Choose a reason for hiding this comment

sakjur Apr 17, 2023

Choose a reason for hiding this comment

DanCech Apr 20, 2023

Choose a reason for hiding this comment

DanCech Apr 20, 2023

Choose a reason for hiding this comment

sakjur left a comment

Choose a reason for hiding this comment