## Pseudo-Random Number Generation in Simulation

Make sure you are able to run the following cell. If an error occurs, it means that some of the needed packages are missing. In that case, install the missing packages and restart the notebook to continue with this tutorial.

In [None]:
# for random distributions, random number generators, statistics
import random
import numpy as np
import scipy.stats as stats

# the simulation
import simulus

# for data visualization
import matplotlib.pyplot as plt
%matplotlib inline

# for animation inside the notebook
from ipywidgets import interact
import ipywidgets as widgets


For this tutorial, we will use the `scipy.stats` module to generate random numbers. Python's native `random` module can also be used for this purpose. In fact, `simulus` uses the `random` module as its core random number generator. But the `scipy.stats` module contains a huge number of probability distributions (more than 120 of them!) along with many useful statistical functions, which are extremely useful for simulation tasks and more than sufficient for our tutorial.

The `scipy.stats` module uses the pseudo-random number generator provided by `numpy`. Both Python's `random` module and the legacy `numpy.random` interface used in this tutorial (`numpy.random.seed()` and `RandomState`) use the Mersenne Twister as the core generator. This generator produces floats with 53-bit precision and has a period of $2^{19937}-1$, which is far longer than any simulation run will need.
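
As a quick sanity check of the two properties mentioned above, the following sketch (not part of the original tutorial) inspects both generators. It assumes the legacy global `numpy.random` state, which is the one controlled by `np.random.seed()`.

```python
import random
import numpy as np

# the legacy global numpy generator reports the name of its core algorithm
algo = np.random.get_state()[0]
print(algo)

# a float with 53-bit precision in [0, 1) is an integer multiple of 2**-53,
# so scaling it by 2**53 must give a whole number
u = random.random()
v = float(np.random.random_sample())
print((u * 2**53).is_integer(), (v * 2**53).is_integer())
```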

### Random Variables in `scipy.stats`

The `scipy.stats` module contains a large number of probability distributions. Let's first examine a few. One of the most common distributions used in simulation is the exponential distribution. The function `expon(scale)` creates an exponential random variable, where the parameter 'scale' is the mean of the distribution.

In [None]:
x = stats.expon(scale=2)
print(x)
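
Simulation models often specify an exponential distribution by its rate (for example, the number of arrivals per unit of time) rather than by its mean. Since 'scale' is the mean, the conversion is simply `scale = 1/rate`. A minimal sketch, where the value of `rate` is made up for illustration:

```python
import scipy.stats as stats

rate = 0.5                        # hypothetical arrival rate: 0.5 customers per time unit
rv = stats.expon(scale=1 / rate)  # 'scale' is the mean, that is, 1/rate
print(rv.mean())                  # the mean inter-arrival time
```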

We can easily find the mean, the median, and the standard deviation of the random variable:

In [None]:
x.mean(), x.median(), x.std()

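We can also use the `stats(moments)` method to find the mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k'), where the argument 'moments' specifies which ones we would like to have. Note that the kurtosis reported here is the excess (Fisher) kurtosis, which is zero for a normal distribution. In the following example, we want all of them: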

In [None]:
m, v, s, k = x.stats(moments='mvsk')
print('mean=%g, var=%g, skew=%g, kurtosis=%g' % (m, v, s, k))
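
For the exponential distribution, all of these quantities have simple closed forms in terms of the scale $\theta$: the mean and the standard deviation are both $\theta$, the median is $\theta \ln 2$, the variance is $\theta^2$, the skew is 2, and the excess kurtosis is 6. The following sketch compares them against the values computed by `scipy.stats`:

```python
import numpy as np
import scipy.stats as stats

scale = 2.0
rv = stats.expon(scale=scale)

# closed-form values for an exponential distribution with the given scale
expected = {
    'mean': scale,
    'median': scale * np.log(2),
    'std': scale,
    'var': scale ** 2,
    'skew': 2.0,
    'kurtosis': 6.0,  # excess (Fisher) kurtosis
}

# values computed by scipy.stats
m, v, s, k = rv.stats(moments='mvsk')
computed = {
    'mean': float(m),
    'median': float(rv.median()),
    'std': float(rv.std()),
    'var': float(v),
    'skew': float(s),
    'kurtosis': float(k),
}

for name in expected:
    print('%s: expected=%g, computed=%g' % (name, expected[name], computed[name]))
```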

We can also plot the probability density function (pdf) and the cumulative distribution function (cdf) of the random variable. In the following, we first define a couple of functions for plotting:

In [None]:
# a generic function for plotting a random variable
def plot_rv(rv, title, xmin=None, xmax=None):
    # find the range we'd like to draw (ppf() is the inverse cdf)
    if xmin is None:
        xmin = rv.ppf(0.01)
    if xmax is None:
        xmax = rv.ppf(0.99)

    # get the data points for pdf and cdf
    xs = np.linspace(xmin, xmax, 100)
    ys = rv.pdf(xs)
    yys = rv.cdf(xs)
    
    plt.fill_between(xs, ys, color='#7fc97f', alpha=0.7)
    plt.plot(xs, ys, color='#7fc97f', lw=3, alpha=0.9, label='pdf')

    plt.fill_between(xs, yys, color='#beaed4', alpha=0.7)
    plt.plot(xs, yys, color='#beaed4', lw=3, alpha=0.9, label='cdf')
    
    plt.title(title)
    plt.xlim(xmin, xmax)
    plt.ylim(0)
    plt.legend()
    plt.show()

In [None]:
# create an exponentially distributed random variable with 
# the given scale (mean) and use the above function to plot it 
def plot_expon(scale):
    rv = stats.expon(scale=scale)
    plot_rv(rv, "exponential (scale=%g)" % scale)

Now we are ready to plot the exponential distribution:

In [None]:
plot_expon(2.0)
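
The plotting range used by `plot_rv()` comes from `ppf()`, the inverse of the cdf: `ppf(q)` returns the value below which a fraction `q` of the probability mass lies. A small sketch to illustrate the relationship, using the closed form $-\theta \ln(1-q)$ for the exponential distribution with scale $\theta$:

```python
import numpy as np
import scipy.stats as stats

scale = 2.0
rv = stats.expon(scale=scale)

q = np.array([0.01, 0.5, 0.99])
xq = rv.ppf(q)                    # quantiles for the given probabilities
closed = -scale * np.log(1 - q)   # closed-form inverse cdf of the exponential

print(xq)
print(rv.cdf(xq))                 # applying the cdf recovers the probabilities
```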

We can also dynamically change the 'scale' parameter (using a slider) and plot the distribution on the fly. In the following, we can move the slider to change the scale value.

In [None]:
slider = widgets.FloatSlider(min=0.1, max=5, value=2)
interact(plot_expon, scale=slider)
None

Let's do the same for two other distributions just for fun:

In [None]:
def plot_gamma(a, scale):
    rv = stats.gamma(a=a, scale=scale)
    plot_rv(rv, "gamma (a=%g, scale=%g)" % (a,scale))

slider_a = widgets.FloatSlider(min=0.1, max=5, value=2.5)
slider_scale = widgets.FloatSlider(min=0.1, max=5, value=0.4)
interact(plot_gamma, a=slider_a, scale=slider_scale)
None

In [None]:
def plot_norm(loc, scale):
    rv = stats.norm(loc=loc, scale=scale)
    plot_rv(rv, "normal (loc=%g, scale=%g)" % (loc, scale))
    
slider_loc = widgets.FloatSlider(min=-2, max=2, value=0)
slider_scale = widgets.FloatSlider(min=0.1, max=2, value=1)
interact(plot_norm, loc=slider_loc, scale=slider_scale)
None

### Generating Random Variates

Once a random variable is created, one can use the `rvs(size)` method to draw random samples from the given distribution, where the 'size' argument specifies the number of samples. To get repeatable results, we should first set a random seed. Recall that the `scipy.stats` module uses the pseudo-random number generator provided by the `numpy` module. Therefore, we should set the random seed using the `numpy.random.seed()` function (as opposed to `random.seed()`, which seeds Python's `random` module).

In [None]:
x = stats.expon(scale=2.0)

# get the first 1000 random samples and show the first 3 
np.random.seed(13579)
xs1 = x.rvs(1000)
print('the first 1000 samples: %r...' % xs1[:3])

# get another 1000 random samples and show the first 3
np.random.seed(24680)
xs2 = x.rvs(1000)
print('the second 1000 samples: %r...' % xs2[:3])

The random seed determines the random sequence. If we reuse the random seed, we should be able to get the same random sequence.

In [None]:
np.random.seed(13579)
xs1 = x.rvs(1000)
print('repeat the first 1000 samples: %r...' % xs1[:3])

np.random.seed(24680)
xs2 = x.rvs(1000)
print('repeat the second 1000 samples: %r...' % xs2[:3])
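
Rather than comparing the printed values by eye, we can check the repeatability programmatically. A minimal sketch using the same two seeds as above:

```python
import numpy as np
import scipy.stats as stats

rv = stats.expon(scale=2.0)

np.random.seed(13579)
a = rv.rvs(1000)

np.random.seed(13579)   # same seed as for 'a'
b = rv.rvs(1000)

np.random.seed(24680)   # a different seed
c = rv.rvs(1000)

print(np.array_equal(a, b))  # same seed, same sequence
print(np.array_equal(a, c))  # different seed, different sequence
```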

To make sure that the random numbers are indeed drawn from the expected random distribution, we can plot the histogram of the samples and compare that with the true distribution.

In [None]:
plt.figure(figsize=(10,5))

xmin, xmax = x.ppf(0.001), x.ppf(0.999)
xs = np.linspace(xmin, xmax, 100)
ys = x.pdf(xs)

axs = plt.subplot(1, 2, 1)
axs.hist(xs1, alpha=0.5, bins='auto', density=True)
axs.plot(xs, ys, 'r-')
axs.set_xlim(xmin, xmax)
axs.set_title("histogram of xs1")

axs = plt.subplot(1, 2, 2)
axs.hist(xs2, alpha=0.5, bins='auto', density=True)
axs.plot(xs, ys, 'r-')
axs.set_xlim(xmin, xmax)
axs.set_title("histogram of xs2")

plt.show()
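
A histogram gives a visual check. As a complement, one could also compare the sample mean with the expected mean and run a Kolmogorov-Smirnov test against the cdf of the random variable. This is only a sketch of one possible numerical check; it passes the generator directly through the `random_state` argument of `rvs()` instead of seeding the global one.

```python
import numpy as np
import scipy.stats as stats

rv = stats.expon(scale=2.0)
samples = rv.rvs(1000, random_state=np.random.RandomState(13579))

# compare the samples against the cdf of the intended distribution
result = stats.kstest(samples, rv.cdf)

print('sample mean=%g (expected %g)' % (samples.mean(), rv.mean()))
print('KS statistic=%g, p-value=%g' % (result.statistic, result.pvalue))
```

A small KS statistic (and a p-value that is not close to zero) indicates that the samples are consistent with the exponential distribution.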

### Repeatable Random Sequences for Simulation

Simulus uses Python's `random` module as the default random number generator. In order to obtain repeatable results, one needs to set the random seed using `random.seed()` before calling any simulus functions (such as creating a simulator). We call this random seed (13579 in the example) *the global random seed*.

The following example shows a simple instance of simulus that uses the default random number generator in the `random` module. The example generates a series of random integers between 0 and 99. 

In [None]:
random.seed(13579) # the global random seed

sim = simulus.simulator()
x = [random.randint(0,99) for _ in range(10)]
print(x)

The above cell can run repeatedly and the same random sequence (34, 3, 21, ...) will be generated each time. This is because we set the same global random seed using `random.seed(13579)` each time we run the cell. 

If we are using the `scipy.stats` module, we also need to set the random seed for the `numpy.random` module. We can do this by drawing a 32-bit integer from the random number generator in the `random` module and using it as the seed for the `numpy.random` module. In doing so, the cell can run repeatedly with the same results.

In [None]:
random.seed(13597) # the global random seed

s = random.randrange(2**32)
print("numpy.random's seed=%d" % s)
np.random.seed(s)

sim = simulus.simulator()
x = stats.randint(0, 99).rvs(10) # the upper bound is exclusive: values fall in 0..98
print(x)
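
The same idea can be wrapped in a small helper. The sketch below is independent of simulus; the function name `draw()` is made up for illustration. It uses a dedicated `RandomState` object rather than the global `numpy.random` state, and it uses `stats.randint(0, 100)` so that, like `random.randint(0, 99)`, the value 99 is included (the upper bound of `stats.randint` is exclusive).

```python
import random
import numpy as np
import scipy.stats as stats

def draw(global_seed, n=10):
    """Hypothetical helper: seed 'random', derive a numpy seed, draw n integers."""
    random.seed(global_seed)
    s = random.randrange(2**32)      # a 32-bit seed for numpy
    rng = np.random.RandomState(s)
    return s, stats.randint(0, 100).rvs(n, random_state=rng)

s_a, x_a = draw(13579)
s_b, x_b = draw(13579)
print(s_a == s_b, np.array_equal(x_a, x_b))
```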

In simulus, the global random seed determines all the random sequences of the simulation. This satisfies most simulation needs in terms of repeatability. However, there are cases where this may not be sufficient.

For example, we may want to have a unique random sequence specific to a simulator. This happens when we run our model in parallel with multiple simulators, say, one for each CPU core. So if a machine has 24 cores, we may have 24 simulators running simultaneously on the machine. And if we have 100 such machines, we can instantiate a simulation with 2400 simulators, each dealing with one twenty-four hundredth of a large model. In this scenario, each simulator should have its own random sequence.

A simulator's random sequence is uniquely determined by the global random seed and the name of the simulator. Simulus guarantees that as long as we choose the same global random seed and the same name for a simulator, we obtain the same random sequence regardless of where we run the simulator (even on a different CPU core or a different machine in the cluster).

The following example shows how to use the simulator-specific random number generator (as opposed to using the default one in the `random` module). The example again generates a series of random integers between 0 and 99. This sequence of integers is unique to the simulator named 'myname', and only changes if we change the global random seed.

In [None]:
random.seed(13579) # the global random seed

sim = simulus.simulator('myname')
x = [sim.rng().randint(0,99) for _ in range(10)] # simulator-specific random sequence
print(x)
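
To build some intuition for how a random sequence can depend on both a global seed and a name, here is a minimal sketch of one possible scheme: hash the pair and use the digest to seed a private `random.Random` instance. This is an assumption made purely for illustration and is not necessarily how simulus derives its simulator-specific generators; `named_rng()` and `first_draws()` are hypothetical helpers.

```python
import hashlib
import random

def named_rng(global_seed, name):
    """Hypothetical helper: a private generator keyed by (global seed, name)."""
    key = ('%d/%s' % (global_seed, name)).encode('utf-8')
    digest = hashlib.sha256(key).digest()
    # use the first 8 bytes of the digest as an integer seed
    return random.Random(int.from_bytes(digest[:8], 'big'))

def first_draws(global_seed, name, n=10):
    rng = named_rng(global_seed, name)
    return [rng.randint(0, 99) for _ in range(n)]

print(first_draws(13579, 'myname'))
print(first_draws(13579, 'myname'))     # same seed and name: same sequence
print(first_draws(13579, 'othername'))  # a different name gives a different sequence
```

Because the hash depends only on the seed and the name, the resulting sequence does not depend on the machine or the process in which it is generated.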

Even within a simulator, we may need to use separate random sequences, a.k.a. *random streams*, for example, one for the random customer inter-arrival time and the other for the service time. Having multiple random streams would make it easier to debug the models, since one can keep the same random sequence for each part of the model, even if we have to change the other parts. 

In the following example, we create two `numpy` random number generators, 'rng1' and 'rng2', using two 32-bit random integers, 's1' and 's2', drawn from the simulator-specific random sequence. We attach the random number generators to the random variables, 'rv1' and 'rv2', by setting the `random_state` of the respective random variables.

In [None]:
random.seed(13579) # the global random seed

sim = simulus.simulator('myname')

s1 = sim.rng().randrange(2**32)
rng1 = np.random.RandomState(s1)
print("create rng1 with seed=%d" % s1)

s2 = sim.rng().randrange(2**32)
rng2 = np.random.RandomState(s2)
print("create rng2 with seed=%d" % s2)

rv1 = stats.randint(0, 99)
rv1.random_state = rng1

rv2 = stats.randint(0, 99)
rv2.random_state = rng2

print('stream1: %r' % rv1.rvs(10))
print('stream2: %r' % rv2.rvs(10))
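
As an aside, newer versions of `numpy` offer another way to derive several streams from a single seed: `numpy.random.SeedSequence` can spawn child seeds, each of which can initialize its own generator via `numpy.random.default_rng()`. The sketch below is an alternative to the approach used in this tutorial, not a replacement for it; it does not involve simulus, and `make_streams()` is a hypothetical helper. Note that `default_rng()` uses a different core generator than `RandomState`, so the numbers will differ from those above.

```python
import numpy as np
import scipy.stats as stats

def make_streams(global_seed, n_streams):
    """Hypothetical helper: derive n_streams generators from one seed."""
    children = np.random.SeedSequence(global_seed).spawn(n_streams)
    return [np.random.default_rng(child) for child in children]

gen1, gen2 = make_streams(13579, 2)
stream1 = stats.randint(0, 99).rvs(10, random_state=gen1)
stream2 = stats.randint(0, 99).rvs(10, random_state=gen2)

print('stream1: %r' % stream1)
print('stream2: %r' % stream2)
```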

Now if we change one random stream, say, by changing the random variable 'rv1' to be one with a geometric distribution and also changing the number of random samples drawn from the distribution (from 10 to 20), the other random stream will not change.

In [None]:
random.seed(13579) # the global random seed

sim = simulus.simulator('myname')

s1 = sim.rng().randrange(2**32)
rng1 = np.random.RandomState(s1)
print("create rng1 with seed=%d" % s1)

s2 = sim.rng().randrange(2**32)
rng2 = np.random.RandomState(s2)
print("create rng2 with seed=%d" % s2)

rv1 = stats.geom(0.25)
rv1.random_state = rng1

rv2 = stats.randint(0, 99)
rv2.random_state = rng2

print('stream1: %r' % rv1.rvs(20))
print('stream2: %r' % rv2.rvs(10))
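
We can verify this property programmatically. The following sketch does not require simulus: it uses a `random.Random` instance as a stand-in for `sim.rng()` (an assumption made only to keep the example self-contained) and checks that the second stream is unaffected by what we do with the first.

```python
import random
import numpy as np
import scipy.stats as stats

def run(rv1, n1):
    parent = random.Random(13579)  # stand-in for sim.rng()
    rng1 = np.random.RandomState(parent.randrange(2**32))
    rng2 = np.random.RandomState(parent.randrange(2**32))
    stream1 = rv1.rvs(n1, random_state=rng1)
    stream2 = stats.randint(0, 99).rvs(10, random_state=rng2)
    return stream1, stream2

# change both the distribution and the number of samples of the first stream
_, before = run(stats.randint(0, 99), 10)
_, after = run(stats.geom(0.25), 20)

print(np.array_equal(before, after))  # the second stream stays the same
```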

### Using Python Generators

One difference between simulation modeling and statistical analysis is that simulation tends to use the random numbers one at a time. For example, we draw one random number to represent the time it takes for the next customer to arrive at the queue, and then we draw another random number to represent the time it takes for the customer to be serviced. The random numbers are drawn from different probability distributions.

This can be accomplished using generators. Generators in Python are functions that can be paused and resumed on the fly, each time returning an object, which can be used in an iteration. In our case, the returned objects are random numbers. To create a generator, one simply defines a function and uses the `yield` statement instead of `return`.
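
As a minimal illustration of the mechanism, the following sketch defines a generator that yields uniform random numbers one at a time from its own private `random.Random` instance. The name `uniform_generator()` is made up for this example.

```python
import itertools
import random

def uniform_generator(seed):
    rng = random.Random(seed)  # private generator, independent of the global one
    while True:
        # execution pauses here and resumes at the next call to next()
        yield rng.random()

gen = uniform_generator(13579)
first = next(gen)                        # draw one number
rest = list(itertools.islice(gen, 3))    # draw the next three
print(first, rest)
```

Calling `next()` resumes the function until it reaches the next `yield`, so the infinite loop never runs ahead of what the caller asks for.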

In the following example, we define a generator function `exp_generator()`, which returns one random number at a time drawn from an exponential distribution with the given mean. We create a separate random stream for each generator instance: one for the inter-arrival time and the other for the service time. The random streams are "attached" to the simulator named 'myname'. As a result, each generator has its own repeatable random stream, as long as we use the same global random seed.

In [None]:
random.seed(13579) # the global random seed

def exp_generator(sim, mean):
    s = sim.rng().randrange(2**32)
    rv = stats.expon(scale=mean)
    rv.random_state = np.random.RandomState(s)
    while True:
        # generate 100 random numbers at a time
        for x in rv.rvs(100):
            yield x

sim = simulus.simulator('myname')
inter_arrival_time = exp_generator(sim, 1.0)
service_time = exp_generator(sim, 0.8)

for i in range(5):
    print("%d: iatime=%g, svctime=%g" % 
          (i, next(inter_arrival_time), next(service_time)))
print('...')
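
Drawing the random numbers in batches of 100 is only a performance choice; it should not change the sequence seen by the model. The sketch below illustrates this with a variant of the generator that takes the seed directly and exposes the batch size as a parameter. It does not require simulus; the `random.Random` instance stands in for `sim.rng()`, and the `seed` and `batch_size` parameters are assumptions made for this example.

```python
import itertools
import random
import numpy as np
import scipy.stats as stats

def exp_generator(seed, mean, batch_size=100):
    rv = stats.expon(scale=mean)
    rng = np.random.RandomState(seed)
    while True:
        # generate batch_size random numbers at a time
        for x in rv.rvs(batch_size, random_state=rng):
            yield x

parent = random.Random(13579)   # stand-in for sim.rng()
seed = parent.randrange(2**32)

one_at_a_time = list(itertools.islice(exp_generator(seed, 1.0, batch_size=1), 5))
batched = list(itertools.islice(exp_generator(seed, 1.0, batch_size=100), 5))

print(np.allclose(one_at_a_time, batched))
```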