### Python Module of the Week: `random`
---
Author: James D. Triveri

The **random** module is part of Python's standard library. It provides functions to generate random numbers and to randomly sample from existing collections of objects.

The generated numbers are pseudo-random: they come from a deterministic algorithm, so setting a seed makes the generator produce the same sequence every time:


In [None]:

import random 

random.seed(516)

r = []
for i in range(5):
    val = random.random()
    r.append(val)

print(f"r: {r}")


If I set the seed again to `516`, the same list will be generated:

In [None]:

random.seed(516)

r = []
for i in range(5):
    val = random.random()
    r.append(val)

print(f"r: {r}")
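`random.seed` resets the module-level generator, which is shared by every piece of code that calls `random`. As a minimal sketch, assuming we want a reproducible sequence without touching that shared state, we can create dedicated `random.Random` instances (the names `rng_a` and `rng_b` are illustrative only):

```python
import random

# Two independent generators created with the same seed.
rng_a = random.Random(516)
rng_b = random.Random(516)

seq_a = [rng_a.random() for _ in range(5)]
seq_b = [rng_b.random() for _ in range(5)]

# Identical seeds yield identical sequences.
print(f"seq_a == seq_b: {seq_a == seq_b}")
```

Each instance exposes the same methods as the module (`random`, `uniform`, `choice`, and so on), so the remaining examples would work the same way on `rng_a`.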


If I do not set the seed, the list of numbers will be different:


In [None]:

r = []
for i in range(5):
    val = random.random()
    r.append(val)

print(f"r: {r}")



In [None]:


r = []
for i in range(5):
    val = random.random()
    r.append(val)


print(f"r: {r}")


`random.uniform(a, b)` generates a random float between `a` and `b`:

In [None]:

# Generate 10 floats between 0-50.
r = []
for i in range(10):
    val = random.uniform(0, 50) 
    r.append(val)

for j in r:
    print(j)
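The Python documentation describes `random.uniform(a, b)` as equivalent to `a + (b - a) * random()`. The sketch below illustrates that relationship by re-seeding before each draw; the variable names `direct` and `scaled` are only for illustration:

```python
import random

a, b = 0, 50

random.seed(516)
direct = random.uniform(a, b)

# Re-seed so that random.random() returns the same underlying draw.
random.seed(516)
scaled = a + (b - a) * random.random()

print(f"direct: {direct}")
print(f"scaled: {scaled}")
```

Both values should agree, since they are built from the same underlying draw in [0, 1).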
    

`random.randint` generates random integers falling within [a, b] (inclusive of endpoints):

In [None]:

# Generate 10 ints between 0-50.
r = []
for i in range(10):
    val = random.randint(0, 50) 
    r.append(val)

for j in r:
    print(j)
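The inclusive upper endpoint is what separates `random.randint` from `random.randrange`, which follows the same half-open convention as `range`. A small sketch, assuming 10,000 draws are enough to reach both endpoints:

```python
import random

random.seed(516)

# randint(0, 50) can return 50; randrange(0, 50) stops at 49.
randint_draws = [random.randint(0, 50) for _ in range(10_000)]
randrange_draws = [random.randrange(0, 50) for _ in range(10_000)]

print(f"randint   min/max: {min(randint_draws)}, {max(randint_draws)}")
print(f"randrange min/max: {min(randrange_draws)}, {max(randrange_draws)}")
```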

`random.choice` will return an element at random from a sequence:

In [None]:

fruits = ["apple", "orange", "kiwi", "banana", "blueberry", "strawberry", "pineapple", "peach"]

random.choice(fruits)


`random.choices` returns a list of `k` elements sampled with replacement, so the same element can appear more than once:

In [None]:

fruits = ["apple", "orange", "kiwi", "banana", "blueberry", "strawberry", "pineapple", "peach"]

random.choices(fruits, k=10)


Different weights can be assigned to each element by passing in a `weights` list of the same length as the sequence:

In [None]:

fruits = ["apple", "orange", "kiwi", "banana", "blueberry", "strawberry", "pineapple", "peach"]
weights = [.75, .19, .01, .01, .01, .01, .01, .01]

random.choices(fruits, weights=weights, k=20)
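To check that the weights behave as expected, we can draw a larger sample and tabulate the observed frequencies. This is a rough sketch using `collections.Counter`; the sample size of 10,000 is an arbitrary choice:

```python
import random
from collections import Counter

fruits = ["apple", "orange", "kiwi", "banana", "blueberry", "strawberry", "pineapple", "peach"]
weights = [.75, .19, .01, .01, .01, .01, .01, .01]

random.seed(516)
n_draws = 10_000
draws = random.choices(fruits, weights=weights, k=n_draws)

# Observed proportion of each fruit, most frequent first.
counts = Counter(draws)
for fruit, count in counts.most_common():
    print(f"{fruit:<12}{count / n_draws:.3f}")
```

The observed proportions should land close to the specified weights, with "apple" near 0.75 and "orange" near 0.19.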

`random.sample` chooses `k` elements from a population sequence without replacement, so no position in the sequence is selected twice. If `k` is larger than the length of the sequence, a `ValueError` is raised:

In [None]:

vals = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

random.sample(vals, k=5)
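The error case can be demonstrated by requesting one more element than the population holds. In this sketch the `raised` flag is only there to make the outcome explicit:

```python
import random

vals = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

try:
    # Ask for 11 elements from a population of 10.
    random.sample(vals, k=len(vals) + 1)
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```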


`random.shuffle` shuffles a sequence in-place:

In [None]:

a = list(range(0, 33, 3))

print(f"a before shuffle: {a}")

# Call random.shuffle (in-place operation).
random.shuffle(a)

print(f"a after shuffle: {a}")
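Because `random.shuffle` modifies its argument and returns `None`, the original order is lost. If the original list needs to be preserved, one option is `random.sample` with `k` equal to the length of the list, which returns a new list in random order:

```python
import random

a = list(range(0, 33, 3))

# random.sample returns a new list; a is left untouched.
shuffled = random.sample(a, k=len(a))

print(f"a (unchanged): {a}")
print(f"shuffled copy: {shuffled}")
```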


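`random.gauss` produces samples from a normal distribution (bell curve) with mean `mu` and standard deviation `sigma`. With `mu=0` and `sigma=1`, this is the standard normal distribution: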

In [None]:

import matplotlib.pyplot as plt

r = []
for i in range(100000):
    val = random.gauss(mu=0, sigma=1)
    r.append(val)


# Plot histogram of standard normal random samples. 
fig, ax = plt.subplots(1, 1)
ax.set_title("standard normal random samples (mean=0, sigma=1)")
ax.hist(r, 35, edgecolor="white")
plt.show()


For comparison, a histogram of random uniform values:

In [None]:

g = []
for i in range(100000):
    val = random.gauss(mu=0, sigma=1)
    g.append(val)


u = []
for i in range(100000):
    val = random.uniform(a=0, b=10)
    u.append(val)



fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Plot histogram of standard normal random samples. 
ax[0].set_title("standard normal random samples (mean=0, sigma=1)", size=10)
ax[0].hist(g, 30, edgecolor="white")

# Plot histogram of uniform random samples. 
ax[1].set_title("uniform random samples a=0, b=10", size=10)
ax[1].hist(u, 30, edgecolor="white")
plt.show()



Short video on the Gaussian distribution:

- https://www.youtube.com/watch?v=rzFX5NWojp0

For a list of data $x$ of length $n$, we can compute the mean (average) as follows:


$$
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_{i}
$$


The sample variance is defined as:

$$
s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$


The sample standard deviation is the square root of the sample variance:

$$
s = \sqrt{s^2}
$$



For data that follows a normal distribution, the empirical rule states:

- ~68% of the data fall within +/- 1 s.d. of the mean.
- ~95% of the data fall within +/- 2 s.d. of the mean.
- ~99.7% of the data fall within +/- 3 s.d. of the mean.
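The empirical rule can be checked by simulation. The sketch below draws standard normal samples with `random.gauss` and counts the share falling within 1, 2 and 3 standard deviations of the mean; the seed and sample size are arbitrary choices:

```python
import random

random.seed(516)
n = 100_000
samples = [random.gauss(0, 1) for _ in range(n)]

# For a standard normal, "within k s.d. of the mean" means |value| <= k.
fractions = {}
for k in (1, 2, 3):
    within = sum(1 for v in samples if abs(v) <= k)
    fractions[k] = within / n
    print(f"within +/- {k} s.d.: {fractions[k]:.4f}")
```

The printed fractions should be close to 0.68, 0.95 and 0.997, with small differences due to sampling noise.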


In [None]:

import statistics

x = [4, 7, 3, 6, 5]

# Sample mean, sample variance (n - 1 denominator) and sample standard deviation.
x_bar = statistics.mean(x)
x_var = statistics.variance(x)
x_std = statistics.stdev(x)

print(f"x_bar: {x_bar:.5f}")
print(f"x_var: {x_var:.5f}")
print(f"x_std: {x_std:.5f}")
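The formulas for the mean, variance and standard deviation can also be evaluated directly and compared against the `statistics` module. This is a sketch that translates the formulas term by term, using the same data `x`; the names `s2` and `s` mirror the notation in the formulas:

```python
import math
import statistics

x = [4, 7, 3, 6, 5]
n = len(x)

# Mean: sum of the values divided by n.
x_bar = sum(x) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(s2)

print(f"x_bar: {x_bar:.5f}")
print(f"s2   : {s2:.5f}")
print(f"s    : {s:.5f}")

# Compare against the standard library implementations.
print(math.isclose(s2, statistics.variance(x)))
print(math.isclose(s, statistics.stdev(x)))
```

Note that `statistics.variance` and `statistics.stdev` use the $n - 1$ denominator; `statistics.pvariance` and `statistics.pstdev` divide by $n$ instead.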
