Permalink
Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
104 lines (83 sloc) 5.72 KB

Performance comparisons of C++ RNGs

C++11 introduces a nice random header which contains a variety of generators and distributions, including the ubiquitous Mersenne Twister 19937 algorithm in 32 and 64bit versions. They're typically used like this:

std::mt19937 generator;
std::uniform_int_distribution<int> distribution(0,42); // beware inclusive ranges!
int random_number1 = generator;
int random_number2 = distribution(generator);

When called without the distribution, the output is between 0 and their maximum. To evaluate the runtime penalty of the distributions, they're measured separately from the generators.

This article compares some of the C++11 generators, their boost counterparts, "/dev/urandom", Intel's hardware RdRand and good old rand(). No thorough evaluation is being done on the quality of the random numbers!

Call times

First the pure call times are compared, without using them with the distributions. The runtime is for generating 10'000'000 ints and writing them into a std::vector, which is a typical use case. To make sure the RNG speed is measured and not the vector writing overhead, a constant number is written into the vector, aka "xkcd random" to serve as a baseline. Here's the data for GCC 4.8. The plot contains additional data for clang and ICC.

PRNG rng.max() runtime [ms] runtime
baseline_int - 3.731 0.12
baseline_uint_fast64_t - 7.694 0.25
std::default_random_engine 2^31-2 50.92 1.64
std::mt19937 2^32-1 31.059 1.00
std::mt19937_64 2^64-1 36.901 1.19
std::minstd_rand 2^31-2 50.75 1.63
boost::random::mt19937 2^32-1 19.708 0.63
boost::random::mt19937_64 2^64-1 31.91 1.03
RdRand 2^32-1 364.842 11.75
/dev/urandom 2^32-1 2308.13 74.31
rand() 2^31-1 69.271 2.23

call times

  • Most RNGs are fast, especially Boost. On my machine, writing a boost::random::mt19937 into a vector takes just a little over 6 cycles!
  • Hardware numbers are disappointingly slow.
  • The inclusion of rand() is just for comparison, keep in mind that it usually has horrible numeric characteristics and commonly (at least under mingw) only has a 15bit range.
  • The overhead for handling the vector is relatively small, so the random number generation is dominating this comparison, as expected.
  • Very little compiler difference.

uniform_int_distribution

For generating uniform integers there's uniform_int_distribution<> which does that "using magic" (quote from STL). While this is perfectly uniform, it can be slow. There are faster but "wronger" alternatives like rng()%range. Be aware of the number bias it introduces though! This should be correct though if range is cleanly divisible by rng().max().

PRNG runtime [ms] runtime time(dist(rng)) / time(rng()) time(modulo) / time(rng())
std::default_random_engine 186.057 0.96 3.65 0.99
std::mt19937 194.047 1.00 6.25 1.02
std::mt19937_64 168.855 0.87 4.58 0.92
std::minstd_rand 185.565 0.96 3.66 0.99
boost::random::mt19937 96.529 0.50 4.90 1.14
boost::random::mt19937_64 199.588 1.03 6.25 1.01
RdRand 430.178 2.22 1.18 0.99
/dev/urandom 2359.62 12.16 1.02 1.00

Plot is for uniform_int_distribution<> runtimes: integers

  • There is a huge overhead from the uniform_int_distribution, boost and nonboost!
  • The rng()%range trick is very fast. So pick your guns depending on your random needs.
  • Very little compiler difference.

Floats and doubles

The PRNGs are plugged into float and double uniform_real_distribution and measured filling a 10'000'000 sized vector:

PRNG type runtime [ms] runtime [arb]
default_random_engine float 42.2 0.95
mt19937 float 44.2 1.00
mt19937_64 float 85.2 1.93
minstd_rand float 42.2 0.95
boost mt19937 float 48.4 1.10
boost mt19937_64 float 74.9 1.69
/dev/urandom float 2291.3 51.84
RdRand float 393.6 8.90
default_random_engine double 97.3 2.20
mt19937 double 87.9 1.99
mt19937_64 double 87.6 1.98
minstd_rand double 97.4 2.20
boost mt19937 double 108 2.44
boost mt19937_64 double 74.6 1.69
/dev/urandom double 4577.7 103.57
RdRand double 782.5 17.70

TODO: comparison to call times

A little warning: I encountered an odd performance drop (around a factor of 10!) with the specific combination of clang compiler, mt19937 (32 and 64bit) and uniform_real_distribution (float and double). But I could only reproduce this with clang (3.4), and that particular combination. Do a little testing beforehand if this applies to you. Link to stackoverflow discussion.

Versions, source code

  • g++ 4.8.1 on 64bit linux
  • ICC 14.0.2 20140120
  • clang 3.3
  • boost 1.53

Source code here