Don't use java.util.Random #1428
Comments
|
Usually the reason why someone would use a "non-secure" random over a "secure" one is performance. In essence the (cryptographically) "non-secure" variant produces pseudo-random numbers as fast as the computer can loop through the calculation. But the pseudo- part means it is technically deterministic and potentially predictable, though to a human it'll look perfectly random.
( Source: SecureRandom ) So every time you need a random number (shuffling cards, coin toss, AI making a choice, etc..) you end up having to wait a potentially long time until 'entropy has been gathered' before your application can resume running. This is usually a deal-breaker for applications like games. So why would anyone ever use SecureRandom? Well, the doc also states:
( Source: Random ) The operative words here are obviously 'cryptography' and 'security-sensitive'. So if all cases where the project uses Random is for cryptography then yes, using SecureRandom would most likely the best choice. I think the next step here would be to examine if the former, the latter or both are the case. |
No matter how perfect your code, the choice to use java.util.Random is fundamentally flawed for any antagonistic game. If a player can figure out the internal state of your RNG, they can cheat by predicting any random element of gameplay: the ordering of cards in the deck, what their deck will look like after a shuffle, the result of coin flips. java.util.Random is a 48-bit linear congruential generator with a 32-bit truncation. After a small number of emissions you can trivially predict future states. It's more or less unsuitable for any application. The guy who wrote it has been catching flak over it for about 20 years and counting. Let's do some math. Generally M:tG decks are composed of 60 cards. There are 60! possible shuffles, which is approximately 2^272. Great, this tells us that some shuffles are just impossible. There are still a huge number of shuffles, so this isn't a problem, right? Wrong. A 48-bit state space is small enough to brute force. There are 60 possible first cards I can draw, 59 second cards, and so on. This is actually much larger than (60 choose 7). Crunching the numbers we get 103163592470400 opening hands. But wait, we only have 281474976710656 possible shuffles. On average each opening hand corresponds to 2.5 initial RNG states. As soon as I draw my first card, the RNG state is fully determined with high probability. Then the cheating can begin. Practically speaking, only the most bored, destructive player would bother doing this. That isn't an excuse for leaving trivially fixable bugs in place. |
|
Brendan, thank you for elaborating on your concerns. If we're concerned about an adversary getting access to the internal state of the RNG, it is of course not worth discussing any case where they have access the machine it's running on. Hence we can excludes any client code from this discussion. I cannot speak to the feasibility, and much less the practicality, of deriving the internal state from an observed subsequence, but if it is indeed an issue here, this is a property of all deterministic PRNGs.
( Source: SecureRandom ) I actually think the biggest point there is that a lot of this ultimately comes down to implementation details. I think it would be helpful to identify particular code where SecureRandom would be a better choice. Does anyone have such a case they'd like to raise here? |
|
Let's look at Library.shuffle(): What's wrong with this code? It is a correct implementation of the Knuth shuffle, though if we change the type of library to Looks good to me, right? Nope. All of a sudden we've reduced the number of possible initial states from 2^48 to tens or hundreds of thousands. This makes a brute force attack dead simple. You could try protecting against this: This is an improvement, but a state space of 2^48 is just too darn small. Once you show me 7 cards off the top of my deck, I can just run shuffles locally until I hit on the random see that gives me those same cards. Once that happens, I know the full ordering of my deck: when to scry, when to fetch, when to clash. My original point stands: The is never a defensible reason for using java.util.Random's default generator. Furthermore, if you pick and choose between Random and SecureRandom, mistakes will sneak in and open the game up to attacks. I work on an open source roguelike and watched a player decode and manipulate our RNG state just for fun. Don't underestimate the tenacity of bored people. If you're worried about performance, write some benchmarks and show the impact on the game. Otherwise waxing poetic about the hypothetical differences between Random and SecureRandom is pointless. When |
|
Just curious, what is seed size for SecureRandom? Cannot find it, maybe it depends on the implementation. |
|
One other question, maybe I am just missing something, but if you are just running shuffles locally until you hit the first seven cards, there will be many cases that start with the same seven cards. I would assume you are very unlikely to get the correct one very often, if at all. Unless you want to take all the possible combinations that start with those seven cards, but then you are basically just crunching probabilities and that isn't against any rules. |
|
How I see it: |
|
quercitron: Spot on. I believe that default algorithm for From this we end up with 9.5m hands per state. Even better, by expanding the state we've made a brute force attack computationally infeasible. Assuming the generator is properly seeded, the state space is much larger than 64-bits. |
|
While that math looks fine to me, that is referring to each card doesn't have a duplicate in the deck. If you are running a deck filled with four of's you are more likely to get duplicates of the same starting hand, since instead of just one way to build 7 cards you have many. I would imagine this would make it harder to brute force the whole deck. I don't disagree that someone could brute force it to cheat, but my question is it even worth their time and why does it matter if one person wants to waste their time? |
|
thechucklingatom: Remember that each card is assigned a UUID by the server. People will go to amazing lengths to cheat at online games. Besides, even if someone isn't likely to exploit it, the issue is trivially fixable with no real world downside. |
|
I do realize that, I forgot that might be visible though. Thank you. Since it's trivial should we expect a pull request from you then? |
|
@thechucklingatom I'm more than happy to do it, but I'll need to get a copyright release from my employer which can take a couple of weeks. |
|
No worries, was just wondering. This has been super interesting for me to watch this one. Never spent much time looking into random. |
|
Two questions:
|
|
@LevelX2 Yes, that would work, but please don't do it. There isn't a good reason to add a ton of complexity rather a straightforward fix. I'll get a PR out for this, but it'll probably be a couple of weeks. |
|
I don't know if any use of Random should be replaced in the code, but there are next candidates: Library class - deck shuffling Any other places? |
|
For sure also coin toss. But I wonder how log a sequence of true/false has to be to know where the sequence of the random created 1/0 values is located to be able to predict the next coin toss. So for sure here is no need to change that imho. |
Seed size for Random is 2^48, so I guess about 48 coin tosses. |
Why does this restrict / determine that there are no longer identical true/false sequences with a length > 48 will happen multiple times in the whole random sequence? |
Oh, you mean when adversary doesn't know position of his subsequence in the random sequence? I think in this case he doesn't need much more information, but it's harder to brute force all sequences. For example if his subsequence starts from position N, there are N times more possible subsequences he has to consider, and he should know additional log_2(N) true/false results. |
|
Brendan, please note that I am decidedly not arguing that Random is mathematically sound or even cryptographically secure. I hope that is not how my comments have come across. Rather, our fundamental disagreement is about the insistence on such generalisations as:
I, personally, think the official Java doc linked to multiple times already offers support against this.
First of all let me point out that I don't think benchmarks are in any way conclusive here - just because I might show it doesn't impact my particular setup, doesn't mean it won't impact yours. With that in mind, I think by far the easiest way to test how much you can stress your system level entropy source on Linux is to try running:
I've done this repeatedly for both my home computer and my company workstation, and I get roughly 100-200 bytes before it blocks for a significant amount of time. But of course, while SecureRandom often relies on /dev/random internally, the above does not translate directly into the performance of Java code. So to get a better idea of that, I also wrote a really naive piece of Java code derived from the SecureRandom doc: Unfortunately that too I found trivial to get to block for two minutes at a time. But as mentioned, please don't take any of these benchmarks as exactly how it will behave on your system. After all that I'd rather point you to this old Java bug describing how other real people have had real issues with SecureRandom blocking: (Actually I'd be happy to forward you a thread by some of our esteemed colleagues who independently ran into the exact same issue internally. If nothing else it's an interesting read in it's own right.)
I don't think picking either over the other guarantees us mistakes won't sneak in and that the game is protected from attacks. The most important thing in my eyes is evaluating each case with the trade-offs of each method in mind. Only then can we decide the right tool for the job. I'm glad we're now at a point where we're discussing concrete cases like the shuffling code in part because I think your argument about the state space is convincing. This does sound like a place where it's worth making trade-offs to replace Random with something else. If I can make one plea it is that the outcome of this issue won't be to blanket search-replace all instances of Random with SecureRandom. I hope it helps. Cheers |
|
@cskau Hey Clement,
Promised. Thanks for the other comments. |
|
FWIW, I had serious issues with SecureRandom popping out from nowhere. One day I had my Tomcat instance at work starting for too long, like, 5 minutes. Before it was up in 20 seconds with several servlets inside (about 5). Digging the issue I've found this article. As you can see, it suggests to basically switch to the insecure random data source via the Java security properties. And I did that. And Tomcat started to behave properly afterwards. So yeah, every generalization is fundamentally wrong (including this one of course, hehe). SecureRandom may easily hang the entire app or any operation if it suddenly is out of data. Do you really want users to wait for their secure shuffle to complete or let them just play the darn game? Don't forget also that, as in my case, SecureRandom may use the pseudorandom data source and thus may not be better than plain Random. I think it's better not to replace Random with its secure counterpart but just add randomness to it with seed manipulations. For example, multiply the current timestamp by free system memory and current heap size and we're pretty good to go. It's a trivial change with no downsides (that I'm able to see). |
There's no good reason to use the stock generator provided by java.util.Random. Instead, use java.security.SecureRandom.
The text was updated successfully, but these errors were encountered: