Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use java.util.Random #1428

Closed
bhickey opened this issue Dec 21, 2015 · 23 comments
Closed

Don't use java.util.Random #1428

bhickey opened this issue Dec 21, 2015 · 23 comments
Labels

Comments

@bhickey
Copy link
Contributor

@bhickey bhickey commented Dec 21, 2015

There's no good reason to use the stock generator provided by java.util.Random. Instead, use java.security.SecureRandom.

@cskau
Copy link

@cskau cskau commented Feb 23, 2016

Usually the reason why someone would use a "non-secure" random over a "secure" one is performance.

In essence the (cryptographically) "non-secure" variant produces pseudo-random numbers as fast as the computer can loop through the calculation. But the pseudo- part means it is technically deterministic and potentially predictable, though to a human it'll look perfectly random.
A cryptographically secure RNG is supposed to at least rely on a true random source to make it truly unpredictable, but at the cost of potentially waiting for the source to generate input.
Referering to the doc:

Note: Depending on the implementation, the generateSeed and nextBytes methods may block as entropy is being gathered, for example, if they need to read from /dev/random on various unix-like operating systems.

( Source: SecureRandom )

So every time you need a random number (shuffling cards, coin toss, AI making a choice, etc..) you end up having to wait a potentially long time until 'entropy has been gathered' before your application can resume running. This is usually a deal-breaker for applications like games.

So why would anyone ever use SecureRandom? Well, the doc also states:

Instances of java.util.Random are not cryptographically secure. Consider instead using SecureRandom to get a cryptographically secure pseudo-random number generator for use by security-sensitive applications.

( Source: Random )

The operative words here are obviously 'cryptography' and 'security-sensitive'.

So if all cases where the project uses Random is for cryptography then yes, using SecureRandom would most likely the best choice.
However, if Random is being used for some of the common gaming applications outlined above, then I think Random is perfectly reasonable.

I think the next step here would be to examine if the former, the latter or both are the case.
Do you have any examples of Random being incorrectly applied in the source?

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 24, 2016

Do you have any examples of Random being incorrectly applied in the source?

No matter how perfect your code, the choice to use java.util.Random is fundamentally flawed for any antagonistic game. If a player can figure out the internal state of your RNG, they can cheat by predicting any random element of gameplay: the ordering of cards in the deck, what their deck will look like after a shuffle, the result of coin flips.

java.util.Random is a 48-bit linear congruential generator with a 32-bit truncation. After a small number of emissions you can trivially predict future states. It's more or less unsuitable for any application. The guy who wrote it has been catching flak over it for about 20 years and counting.

Let's do some math. Generally M:tG decks are composed of 60 cards. There are 60! possible shuffles, which is approximately 2^272. Great, this tells us that some shuffles are just impossible. There are still a huge number of shuffles, so this isn't a problem, right? Wrong. A 48-bit state space is small enough to brute force. There are 60 possible first cards I can draw, 59 second cards, and so on. This is actually much larger than (60 choose 7). Crunching the numbers we get 103163592470400 opening hands. But wait, we only have 281474976710656 possible shuffles. On average each opening hand corresponds to 2.5 initial RNG states. As soon as I draw my first card, the RNG state is fully determined with high probability. Then the cheating can begin.

Practically speaking, only the most bored, destructive player would bother doing this. That isn't an excuse for leaving trivially fixable bugs in place.

@cskau
Copy link

@cskau cskau commented Feb 24, 2016

Brendan, thank you for elaborating on your concerns.

If we're concerned about an adversary getting access to the internal state of the RNG, it is of course not worth discussing any case where they have access the machine it's running on. Hence we can excludes any client code from this discussion.
So if we're talking about server code we can hypothesise whether it's practical for anyone to observe all RNG output. I personally think this is less likely - but for the sake of argument, let's say it is.

I cannot speak to the feasibility, and much less the practicality, of deriving the internal state from an observed subsequence, but if it is indeed an issue here, this is a property of all deterministic PRNGs.
In that case SecureRandom sadly isn't our saviour either:

Many SecureRandom implementations are in the form of a pseudo-random number generator (PRNG), which means they use a deterministic algorithm to produce a pseudo-random sequence from a true random seed. Other implementations may produce true random numbers, and yet others may use a combination of both techniques.

( Source: SecureRandom )

I actually think the biggest point there is that a lot of this ultimately comes down to implementation details.
As such I think we should skip the general discussion about the merits of each class and rather focus on particular usage, as that will give us something to base a discussion on.

I think it would be helpful to identify particular code where SecureRandom would be a better choice. Does anyone have such a case they'd like to raise here?

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 24, 2016

Let's look at Library.shuffle():

public void shuffle() {
    UUID[] shuffled = library.toArray(new UUID[0]);
    for (int n = shuffled.length - 1; n > 0; n--) {
        int r = rnd.nextInt(n);
        UUID temp = shuffled[n];
        shuffled[n] = shuffled[r];
        shuffled[r] = temp;
    }
    library.clear();
    library.addAll(Arrays.asList(shuffled));
}

What's wrong with this code? It is a correct implementation of the Knuth shuffle, though if we change the type of library to LinkedList, the whole block reduces down to:

public void shuffle() {
    Collections.shuffle(library, rnd);
}

Looks good to me, right? Nope. rnd is initialized using the default constructor for Random, here's the implementation:

public Random() {
    this(System.currentTimeMillis());
}

All of a sudden we've reduced the number of possible initial states from 2^48 to tens or hundreds of thousands. This makes a brute force attack dead simple. You could try protecting against this:

rnd = new Random(seedWeGotFromSecureRandom);

This is an improvement, but a state space of 2^48 is just too darn small. Once you show me 7 cards off the top of my deck, I can just run shuffles locally until I hit on the random see that gives me those same cards. Once that happens, I know the full ordering of my deck: when to scry, when to fetch, when to clash.

My original point stands: The is never a defensible reason for using java.util.Random's default generator. Furthermore, if you pick and choose between Random and SecureRandom, mistakes will sneak in and open the game up to attacks. I work on an open source roguelike and watched a player decode and manipulate our RNG state just for fun. Don't underestimate the tenacity of bored people.

If you're worried about performance, write some benchmarks and show the impact on the game. Otherwise waxing poetic about the hypothetical differences between Random and SecureRandom is pointless. When /dev/random starts blocking long enough to screw up your system, you've got bigger problems on your hand.

@quercitron
Copy link
Contributor

@quercitron quercitron commented Feb 25, 2016

Just curious, what is seed size for SecureRandom? Cannot find it, maybe it depends on the implementation.

@thechucklingatom
Copy link

@thechucklingatom thechucklingatom commented Feb 25, 2016

One other question, maybe I am just missing something, but if you are just running shuffles locally until you hit the first seven cards, there will be many cases that start with the same seven cards. I would assume you are very unlikely to get the correct one very often, if at all. Unless you want to take all the possible combinations that start with those seven cards, but then you are basically just crunching probabilities and that isn't against any rules.

@quercitron
Copy link
Contributor

@quercitron quercitron commented Feb 25, 2016

How I see it:
Random seed size: 2^48 = 281 474 976 710 656, so the number of possible deck shuffles is limited by that number
Number of possible first 7 cards: 60 * 59 * .. * 54 = 60! / (60 - 7)! = 1 946 482 876 800
So there are 281 474 976 710 656 / 1 946 482 876 800 = 144.6 seeds per hand on average
But when draw 8th card you reduce it to 2.7 seeds per hand on average
And after 9th card it's 0.05 seeds per hand on average, so you almost always know precisely the seed (and your library, and how it will look after shuffles)

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 25, 2016

quercitron: Spot on.

I believe that default algorithm for SecureRNG is SHA1PRNG, but to be safe you can specifically request it new SecureRandom("SHA1PRNG"). It gets a random seed from the OS, concatenates it with a 64-bit counter, hashes the result and emits the lower 64-bits of the hash. In the most pessimistic case it's guaranteed to have at least 64-bits of state.

From this we end up with 9.5m hands per state. Even better, by expanding the state we've made a brute force attack computationally infeasible. Assuming the generator is properly seeded, the state space is much larger than 64-bits.

@thechucklingatom
Copy link

@thechucklingatom thechucklingatom commented Feb 25, 2016

While that math looks fine to me, that is referring to each card doesn't have a duplicate in the deck. If you are running a deck filled with four of's you are more likely to get duplicates of the same starting hand, since instead of just one way to build 7 cards you have many. I would imagine this would make it harder to brute force the whole deck. I don't disagree that someone could brute force it to cheat, but my question is it even worth their time and why does it matter if one person wants to waste their time?

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 25, 2016

thechucklingatom: Remember that each card is assigned a UUID by the server.

People will go to amazing lengths to cheat at online games. Besides, even if someone isn't likely to exploit it, the issue is trivially fixable with no real world downside.

@thechucklingatom
Copy link

@thechucklingatom thechucklingatom commented Feb 25, 2016

I do realize that, I forgot that might be visible though. Thank you. Since it's trivial should we expect a pull request from you then?

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 25, 2016

@thechucklingatom I'm more than happy to do it, but I'll need to get a copyright release from my employer which can take a couple of weeks.

@thechucklingatom
Copy link

@thechucklingatom thechucklingatom commented Feb 25, 2016

No worries, was just wondering. This has been super interesting for me to watch this one. Never spent much time looking into random.

@LevelX2
Copy link
Contributor

@LevelX2 LevelX2 commented Feb 25, 2016

Two questions:

  1. Isn't it correct that a player to be able to predict the upcoming sequence of cards he has to know the order the cards were in before the last shuffle started? So at least a double shuffle with different seeds should prevent all predictability?

  2. On the other hand wouldn't simply also creating and using a new Random() after shuffling of every 7 cards prevent to predict the sequence any further?

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 25, 2016

@LevelX2 Yes, that would work, but please don't do it. There isn't a good reason to add a ton of complexity rather a straightforward fix.

I'll get a PR out for this, but it'll probably be a couple of weeks.

@quercitron
Copy link
Contributor

@quercitron quercitron commented Feb 25, 2016

I don't know if any use of Random should be replaced in the code, but there are next candidates:

Library class - deck shuffling
ExpansionSet, DraftCube classes - boosters generation

Any other places?

@LevelX2
Copy link
Contributor

@LevelX2 LevelX2 commented Feb 25, 2016

For sure also coin toss.

But I wonder how log a sequence of true/false has to be to know where the sequence of the random created 1/0 values is located to be able to predict the next coin toss.

So for sure here is no need to change that imho.

@quercitron
Copy link
Contributor

@quercitron quercitron commented Feb 25, 2016

But I wonder how log a sequence of true/false has to be to predict the next coin toss.

Seed size for Random is 2^48, so I guess about 48 coin tosses.

@LevelX2
Copy link
Contributor

@LevelX2 LevelX2 commented Feb 25, 2016

Seed size for Random is 2^48, so I guess about 48 coin tosses.

Why does this restrict / determine that there are no longer identical true/false sequences with a length > 48 will happen multiple times in the whole random sequence?

@quercitron
Copy link
Contributor

@quercitron quercitron commented Feb 25, 2016

Why does this restrict / determine that there are no longer identical true/false sequences with a length > 48 will happen multiple times in the whole random sequence?

Oh, you mean when adversary doesn't know position of his subsequence in the random sequence? I think in this case he doesn't need much more information, but it's harder to brute force all sequences.

For example if his subsequence starts from position N, there are N times more possible subsequences he has to consider, and he should know additional log_2(N) true/false results.

@cskau
Copy link

@cskau cskau commented Feb 26, 2016

Brendan, please note that I am decidedly not arguing that Random is mathematically sound or even cryptographically secure. I hope that is not how my comments have come across.

Rather, our fundamental disagreement is about the insistence on such generalisations as:

The is never a defensible reason for using java.util.Random's default generator.

No matter how perfect your code, the choice to use java.util.Random is fundamentally flawed for any antagonistic game.

There's no good reason to use the stock generator provided by java.util.Random.

I, personally, think the official Java doc linked to multiple times already offers support against this.
But perhaps our differences ultimately boils down to this:

If you're worried about performance, write some benchmarks and show the impact on the game. Otherwise waxing poetic about the hypothetical differences between Random and SecureRandom is pointless. When /dev/random starts blocking long enough to screw up your system, you've got bigger problems on your hand.

First of all let me point out that I don't think benchmarks are in any way conclusive here - just because I might show it doesn't impact my particular setup, doesn't mean it won't impact yours.

With that in mind, I think by far the easiest way to test how much you can stress your system level entropy source on Linux is to try running:

cat /dev/random | xxd

I've done this repeatedly for both my home computer and my company workstation, and I get roughly 100-200 bytes before it blocks for a significant amount of time.

But of course, while SecureRandom often relies on /dev/random internally, the above does not translate directly into the performance of Java code. So to get a better idea of that, I also wrote a really naive piece of Java code derived from the SecureRandom doc:

for (int i=0; i < 100; i++) {
  byte seed[] = random.generateSeed(20);
  random.nextBytes(bytes);
}

Unfortunately that too I found trivial to get to block for two minutes at a time.

But as mentioned, please don't take any of these benchmarks as exactly how it will behave on your system.

After all that I'd rather point you to this old Java bug describing how other real people have had real issues with SecureRandom blocking:
http://bugs.java.com/view_bug.do?bug_id=6521844

(Actually I'd be happy to forward you a thread by some of our esteemed colleagues who independently ran into the exact same issue internally. If nothing else it's an interesting read in it's own right.)

Furthermore, if you pick and choose between Random and SecureRandom, mistakes will sneak in and open the game up to attacks.

I don't think picking either over the other guarantees us mistakes won't sneak in and that the game is protected from attacks.

The most important thing in my eyes is evaluating each case with the trade-offs of each method in mind. Only then can we decide the right tool for the job.

I'm glad we're now at a point where we're discussing concrete cases like the shuffling code in part because I think your argument about the state space is convincing. This does sound like a place where it's worth making trade-offs to replace Random with something else.

If I can make one plea it is that the outcome of this issue won't be to blanket search-replace all instances of Random with SecureRandom.
There are likely cases where SecureRandom is worth waiting for to ensure security. But there are just as likely cases where we can't accept a server grinding to a halt just because 20 people were playing at the same time.
And then there are probably yet other cases where what we really need is a third option of a larger state space but without the blocking true-random seed.

I hope it helps.

Cheers

@bhickey
Copy link
Contributor Author

@bhickey bhickey commented Feb 27, 2016

@cskau Hey Clement,

If I can make one plea it is that the outcome of this issue won't be to blanket search-replace all instances of Random with SecureRandom.

Promised. Thanks for the other comments.

@rkfg
Copy link
Member

@rkfg rkfg commented Mar 26, 2016

FWIW, I had serious issues with SecureRandom popping out from nowhere. One day I had my Tomcat instance at work starting for too long, like, 5 minutes. Before it was up in 20 seconds with several servlets inside (about 5). Digging the issue I've found this article. As you can see, it suggests to basically switch to the insecure random data source via the Java security properties. And I did that. And Tomcat started to behave properly afterwards.

So yeah, every generalization is fundamentally wrong (including this one of course, hehe). SecureRandom may easily hang the entire app or any operation if it suddenly is out of data. Do you really want users to wait for their secure shuffle to complete or let them just play the darn game? Don't forget also that, as in my case, SecureRandom may use the pseudorandom data source and thus may not be better than plain Random.

I think it's better not to replace Random with its secure counterpart but just add randomness to it with seed manipulations. For example, multiply the current timestamp by free system memory and current heap size and we're pretty good to go. It's a trivial change with no downsides (that I'm able to see).

@JayDi85 JayDi85 closed this Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants