# Lab 9: Random Numbers and Random Walks

In [None]:
import numpy as np
import matplotlib.pyplot as plt

Random numbers play an important role in many computational methods. ``numpy`` has methods to create "pseudo-random" numbers. Run the following code:

In [None]:
rng = np.random.default_rng()
for i in range(10):
    print(rng.random())

Now, run it again... and again. What do you notice?

These numbers are called pseudo-random because they aren't truly random (a numerical scheme is used to calculate them), but they behave as random in the sense that there is no discernible correlation between successive numbers. ``rng = np.random.default_rng()`` creates a random number generator, a Python object that we will use to create random numbers. Then, ``rng.random()`` samples a random number uniformly distributed between 0 and 1. However, it can be difficult to test your code if new random numbers are created on every run. So, we can introduce what is called a seed, an integer that we use to start the calculations of these pseudo-random numbers. In the code below, replace ``...`` with an integer (any integer) and run it:

In [None]:
rng = np.random.default_rng(...)
for i in range(10):
    print(rng.random())

Ok, this looks just as random as before. The power shows when you run the code again... and again. (Do it.) So, what's the difference? When we give the random number generator a fixed value (called a seed), it starts its calculations from the same state every time, which means the "pseudo-random" numbers that are created are always the same. When we ran the code before without a seed, ``numpy`` drew fresh, unpredictable entropy from your operating system to choose its seed, which is why those results are not repeatable.
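To make the effect of a seed concrete, here is a minimal sketch that builds two generators from the same seed and one generator with no seed. The seed value ``12345`` and the variable names are arbitrary choices for illustration, not part of the lab.

```python
import numpy as np

SEED = 12345  # arbitrary illustrative seed; any integer works

# Two separate generators started from the same seed
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

first_run = rng_a.random(5)
second_run = rng_b.random(5)

print(first_run)
print(second_run)
print("seeded runs identical:", np.array_equal(first_run, second_run))

# A generator with no seed starts from fresh entropy each time
rng_c = np.random.default_rng()
unseeded = rng_c.random(5)
print("unseeded run matches seeded run:", np.array_equal(first_run, unseeded))
```

The two seeded arrays should match element by element, while the unseeded array should differ.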

A histogram is useful in visualizing random numbers. To use this, we need to first create an array of random numbers. Our random number generator has a useful way to do this:

In [None]:
random_vals = rng.random(100)
print(random_vals)

The argument to ``rng.random( )`` is the size of the array that it creates. We can visualize this in two ways: first as a simple plot, to show that the series appears to be random, and then as a histogram that shows the distribution of these random values.

In [None]:
plt.figure()
plt.plot(random_vals, 'o')
plt.figure()
plt.hist(random_vals)
plt.show()

With only 100 values, the histogram looks ragged rather than flat. In the cell below, put all the code needed to create one million random numbers and create a histogram.

As expected, you should see approximately 100,000 points in each of the 10 bins. However, it seems a little crude to plot only 10 bins with a million data points. ``plt.hist(random_vals, 100)`` will create 100 bins. See what happens. And, run the code again, and again. You'll notice that there are slight fluctuations in the distributions. This is because the random numbers are indeed random.
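If you want to put a number on those fluctuations, one option (a sketch, not something the lab requires) is to count the samples per bin with ``np.histogram`` instead of plotting them. With 100 bins and one million samples, each bin expects 10,000 counts, and the spread between bins should be roughly the square root of that, about 100. The variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
n_samples = 1_000_000
n_bins = 100

random_vals = rng.random(n_samples)

# np.histogram returns the counts per bin and the bin edges, without plotting
counts, bin_edges = np.histogram(random_vals, bins=n_bins, range=(0.0, 1.0))

expected = n_samples / n_bins
print("expected count per bin:", expected)
print("smallest and largest counts:", counts.min(), counts.max())
print("standard deviation of counts:", counts.std())
print("square root of expected count:", np.sqrt(expected))
```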

Another useful distribution is the standard normal distribution, also known as a Gaussian with zero mean and standard deviation of 1. To use this, we can simply repeat our steps but with ``rng.standard_normal(size)`` as the function to generate an array of standard normal random numbers. In the cell below, copy all the code you'll need (starting with ``rng = np.random.default_rng()``) to create a histogram of one million standard normal random numbers.
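As a quick numerical check to go alongside your histogram, the sketch below computes the sample mean and standard deviation, which should land close to 0 and 1. It also computes the fraction of samples within one standard deviation of the mean, which for a Gaussian is about 68%. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
normal_vals = rng.standard_normal(1_000_000)

sample_mean = normal_vals.mean()
sample_std = normal_vals.std()
# Fraction of samples with absolute value below 1 (within one standard deviation)
fraction_within_one = np.mean(np.abs(normal_vals) < 1.0)

print("sample mean:", sample_mean)
print("sample standard deviation:", sample_std)
print("fraction within one standard deviation:", fraction_within_one)
```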

Once you're happy with your result, copy and paste your code into your submission notebook and make sure it runs.

A **random walk** is a model for diffusion, where a walker takes steps that are defined by a random number generator. Run the following code:

In [None]:
for i in range(10):
    print(rng.choice((-1,1)))

Run the code again and again. What appears to be happening?

We can be a little more precise because ``rng.choice((-1,1), 1000)`` creates 1000 random numbers. Make a histogram to confirm that the random number generator samples +1 and -1 with a 50/50 probability.
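Besides the histogram, you can count the outcomes directly. The sketch below uses ``np.unique`` with ``return_counts=True`` to tally how often each value appears; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
steps = rng.choice((-1, 1), 1000)

# values holds the distinct entries; counts holds how many times each appears
values, counts = np.unique(steps, return_counts=True)

for value, count in zip(values, counts):
    print("step", value, "appeared", count, "times, fraction", count / steps.size)
```

Both fractions should be close to 0.5, though not exactly equal to it on any single run.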

In a random walk, the location of the walker changes by +1 or -1 at each step, so we may look to do something like:

In [None]:
location = np.zeros(4)

location[0] = 0

location[1] = location[0] + rng.choice((-1,1))
location[2] = location[1] + rng.choice((-1,1))
location[3] = location[2] + rng.choice((-1,1))

plt.figure()
plt.plot(location, '-o')

Run the code many times to confirm that it appears to be a random walk that starts at zero, and takes a random step of +1 or -1 at every time step. We want to both have many more steps and to automate this with a ``for`` loop (and make sure your loop starts on the correct index). Create and plot a random walk with 100 steps.

Even better, we can write a function. Create a function with:
- Input ``N``, the number of steps in the random walk
- Output is a random walk as a ``numpy`` array of length ``N`` (initialize with ``np.zeros(N)`` then set the terms using a loop)

We can then plot it:
- ``plt.figure()``
- ``plt.plot(your_function_name(100))``
- ``plt.show()``

Using your function name, these three lines would create a plot of a random walk.
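If you want something to compare your loop-based function against, here is a loop-free sketch that builds the same kind of walk with ``np.cumsum`` (a running sum of the steps). This is an alternative technique for cross-checking, not the loop approach the lab asks you to write. The function name ``random_walk_cumsum`` and the choice to pass ``rng`` as an argument are assumptions made for this example.

```python
import numpy as np

def random_walk_cumsum(N, rng):
    """Return a length-N walk that starts at 0 and takes N - 1 steps of +1 or -1."""
    steps = rng.choice((-1, 1), N - 1)
    walk = np.zeros(N)
    # Position after k steps is the sum of the first k steps
    walk[1:] = np.cumsum(steps)
    return walk

rng = np.random.default_rng()
walk = random_walk_cumsum(100, rng)
print(walk[:10])
```

A walk built either way should start at 0 and change by exactly 1 in size between neighboring entries, which you can confirm with ``np.diff``.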

Once you like your results, copy and paste your code to the submission notebook and make sure it works.

Random walks become useful when we aggregate many walks and take averages. To do this, it's useful to have a matrix as a data structure. Replace ``your_function_here`` below with the name of your random walk function and run:

In [None]:
matrix_of_walks = np.zeros((2000, 100))
matrix_of_walks[0, :] = your_function_here(100)

time = np.arange(100)

plt.figure()
plt.plot(time, matrix_of_walks[0,:], color='blue')
plt.plot(time, matrix_of_walks[1,:], color='black')
plt.show()

- The first line creates a 2000 x 100 matrix. In this matrix, the first index is a trial number (0 to 1999) and the second index is a time index (0 to 99).
- The second line sets the first row of the matrix ``[0,:]`` equal to a random walk of 100 steps. This fills in trial number 0 at all 100 time indices.
- We then create a time array of 100 points from 0 to 99.
- The plots show two things: in blue, you see a random walk; in black, you see only a flat line at zero (because the code I've provided doesn't fill in the other rows of the matrix, so they still hold their initial zeros).

Now, we want to use a loop to create all 2000 random walks, and set them equal to different random walks. Next, we want to plot some of them. It may stress your computer to plot all 2000 walks, but we can plot, say 100 of them. To do this, you can use your ``plt.plot(time, matrix_of_walks[i,:])`` within a loop to give Python 100 plot statements! The statement ``matrix_of_walks[i,:]`` is an array that represents trial i and all 100 steps in the random walk.
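To see the row and column indexing on something small enough to print, here is a sketch with 5 trials of 8 steps each. The function ``stand_in_walk`` is a hypothetical placeholder for your own random walk function, and the small sizes are chosen only for readability.

```python
import numpy as np

rng = np.random.default_rng()

def stand_in_walk(N):
    """Hypothetical placeholder for your own function: length-N walk starting at 0."""
    walk = np.zeros(N)
    walk[1:] = np.cumsum(rng.choice((-1, 1), N - 1))
    return walk

n_trials, n_steps = 5, 8
small_matrix = np.zeros((n_trials, n_steps))

# Fill one row (one trial) per pass through the loop
for i in range(n_trials):
    small_matrix[i, :] = stand_in_walk(n_steps)

print(small_matrix)
print("trial 2, all time steps:", small_matrix[2, :])
print("all trials at time index 3:", small_matrix[:, 3])
```

The same pattern scales up to the 2000 x 100 matrix by changing ``n_trials`` and ``n_steps``.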

In random walks, the most useful quantity is called the mean-squared-displacement, or msd for short. This is the average, taken over all trials, of the squared position (square first, then average). First, let's look at the arrays ``matrix_of_walks[:,0]``; ``matrix_of_walks[:,1]``; and ``matrix_of_walks[:,2]``. The ``:`` means we are looking at all trials, and the 0, 1, and 2 mean we are looking at the initial location, the location after one step, and the location after two steps.

In [None]:
print(matrix_of_walks[:,0])
print(matrix_of_walks[:,1])
print(matrix_of_walks[:,2])

We can also square these arrays,

In [None]:
print(matrix_of_walks[:,0]**2)
print(matrix_of_walks[:,1]**2)
print(matrix_of_walks[:,2]**2)

Finally, we can find the average (msd):

In [None]:
print(np.mean(matrix_of_walks[:,0]**2))
print(np.mean(matrix_of_walks[:,1]**2))
print(np.mean(matrix_of_walks[:,2]**2))

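The first two results should make sense: every walk starts at 0, and after one step every walk sits at +1 or -1, so its squared position is exactly 1. After that, randomness ensues. We want to do this for all 100 time steps.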

To do this we will need to:
- initialize an array of length 100 to store all our msd values
- Use a ``for`` loop to calculate each msd value and store it in our array
- Create a plot of msd vs. time
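The first two steps of that recipe can be sketched as follows. To keep the example self-contained, the matrix is built in one shot with ``np.cumsum`` along each row rather than with your function, which is a shortcut assumed here and not the lab's method. The name ``walks`` stands in for ``matrix_of_walks``, and the one-line ``np.mean(..., axis=0)`` version is included only as a cross-check of the loop.

```python
import numpy as np

rng = np.random.default_rng()
n_trials, n_steps = 2000, 100

# Stand-in for matrix_of_walks: each row starts at 0 and takes steps of +1 or -1
walks = np.zeros((n_trials, n_steps))
walks[:, 1:] = np.cumsum(rng.choice((-1, 1), (n_trials, n_steps - 1)), axis=1)

# msd at each time index: average over trials of the squared position
msd = np.zeros(n_steps)
for t in range(n_steps):
    msd[t] = np.mean(walks[:, t] ** 2)

# Same calculation without a loop: average down each column
msd_vectorized = np.mean(walks ** 2, axis=0)

print(msd[:5])
print("loop and vectorized agree:", np.allclose(msd, msd_vectorized))
```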

Finally, we can ask what we should expect. Statistical analysis of the random walk problem finds that
$${\rm msd} = 2 D ({\rm time}),$$
where $D$ is called the diffusion coefficient, and in these units, $D = 0.5$. So, create a plot that has two curves: msd vs. time from your random walks, and the mathematical model with $D = 0.5$.
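Here is a short sketch of where that result comes from. It assumes the steps $s_i = \pm 1$ are independent and equally likely, and that each step takes one unit of time. The symbols $x_n$ (position after $n$ steps), $s_i$ (the $i$-th step), and $\langle \cdot \rangle$ (average over trials) are notation introduced here for the argument.

$$x_n = \sum_{i=1}^{n} s_i, \qquad \langle x_n^2 \rangle = \sum_{i=1}^{n} \langle s_i^2 \rangle + \sum_{i \neq j} \langle s_i s_j \rangle = n + 0 = n.$$

Every $s_i^2$ equals 1, so the first sum is $n$. Independent steps with zero mean give $\langle s_i s_j \rangle = \langle s_i \rangle \langle s_j \rangle = 0$ for $i \neq j$, so the second sum vanishes. Comparing ${\rm msd} = n$ with ${\rm msd} = 2 D ({\rm time})$ at ${\rm time} = n$ gives $D = 0.5$.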

**Exercise**: Gaussian (standard normal) random walks. It's a little disjointed to see steps of just +1 or -1. So, instead of ``rng.choice((-1,1))``, let's use ``rng.standard_normal()`` to determine the step size. Now, not only can the step be positive or negative, but its size can vary. This will lead to smoother (and sometimes crazier) looking results. In the cell below, include all the code needed to:
- Write a function that takes N as an input and outputs an array of length N that is a random walk whose steps are drawn from the standard normal distribution.
- Create a matrix of 2000 trials, each of 100 steps
- Plot 100 of these trials
- Calculate msd at each of the 100 time steps
- Plot msd vs. time and include a second plot with the $D = 0.5$ model curve.

When you are satisfied with your results, copy and paste your code into the submission notebook and make sure it works!