# Numpy.random Package

In [2]:
# Import matplotlib.
import matplotlib.pyplot as plt

# Make matplotlib render plots inline in the notebook.
%matplotlib inline

# Import numpy.
import numpy as np

## Purpose of Package
- The numpy.random package is a sub-package of numpy.
- Its primary use is to produce pseudo-random numbers, bytes, lists or arrays.
- It can also be used to randomise the order of existing lists or arrays, or to make a random selection from a list or array.
- Probability distributions can be specified in order to dictate the likelihood of each number appearing.

## Simple Random Data
- Used to produce random numbers, lists or arrays.

#### Numpy.random.rand
- Accepts as input one or more integers, which give the dimensions of the output.
- If one integer is provided, the function returns a one-dimensional array of that many random numbers between 0 (inclusive) and 1 (exclusive).
- Numbers are drawn from a uniform probability distribution, so every value in the range is equally likely to come up.

In [None]:
np.random.rand(7)

- If more than one integer input is provided, these are used as the dimensions of the output array.

In [None]:
np.random.rand(5,3)

- By using this to produce many random numbers and plotting the results in a histogram we can see that the numbers are in fact evenly distributed between 0 and 1.

In [None]:
plt.hist(np.random.rand(10000))
plt.show()

#### Numpy.random.randn
- Operates similarly to numpy.random.rand but draws its results from the standard normal distribution (mean 0, standard deviation 1).

In [None]:
plt.hist(np.random.randn(10000))
plt.show()
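
Because `randn` draws from the standard normal distribution, samples with a different mean and standard deviation can be obtained by scaling and shifting the output. The following is a minimal sketch; the names `mu` and `sigma`, their values, and the fixed seed are illustrative choices rather than part of the original notebook.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

mu, sigma = 5.0, 2.0  # illustrative target mean and standard deviation

# Multiply by sigma to stretch the spread, then add mu to move the centre.
samples = sigma * np.random.randn(100000) + mu

print(samples.mean())  # close to mu
print(samples.std())   # close to sigma
```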

#### Numpy.random.randint
- Accepts as inputs a floor value, a ceiling value and the size of the output array.
- Returns an integer or array of integers from the floor value (inclusive) up to the ceiling value (exclusive).
- If no ceiling value is given, the values range from zero (inclusive) up to the single value provided (exclusive).

In [None]:
np.random.randint(2,high=9,size=(4,3))
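
The half-open range is easy to get wrong, so a short sketch may help. It assumes the goal is to simulate a six-sided die, which is an illustrative use and not taken from the notebook.

```python
import numpy as np

# With a single bound, values run from 0 up to (but not including) that bound.
zero_based = np.random.randint(6, size=1000)
print(zero_based.min(), zero_based.max())

# With two bounds, the lower bound is included and the upper bound is excluded,
# so the upper bound must be 7 for the value 6 to be possible.
die = np.random.randint(1, 7, size=1000)
print(np.unique(die))
```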

#### Numpy.random.choice
- Accepts as inputs an integer or array to select from, the size of the array to be returned, and an optional array giving the probability of each element.
- If an integer n is provided, the output is selected from the integers 0 to n - 1.
- Returns an array of the specified size selected from the values provided.

In [None]:
np.random.choice(9,4)
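
The probability array and sampling without replacement are not shown in the cell above, so here is a minimal sketch of both. The `colours` array and the weights are made-up illustrative data.

```python
import numpy as np

colours = np.array(["red", "green", "blue"])  # illustrative data

# Weighted selection: p holds one probability per element and must sum to 1.
weighted = np.random.choice(colours, size=10, p=[0.7, 0.2, 0.1])
print(weighted)

# Selection without replacement: each value can appear at most once.
unique_pick = np.random.choice(9, size=4, replace=False)
print(unique_pick)
```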

#### Numpy.random.bytes
- Accepts an integer as input.
- Returns a bytes object of that length, filled with random bytes.

In [None]:
np.random.bytes(7)
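
The raw output can be hard to read because some bytes display as escape sequences. As a small sketch, converting the result to a list shows each byte as an integer from 0 to 255.

```python
import numpy as np

raw = np.random.bytes(7)

print(type(raw))   # the built-in bytes type
print(len(raw))    # 7
print(list(raw))   # each byte as an integer between 0 and 255
```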

## Permutations

#### Shuffle
- Accepts as an input an array or list.
- Shuffles the order of the elements of the array or list.
- This function does not return anything; the array or list is modified in place, so its original order is lost.

In [12]:
# Print a random one-dimensional array
arr1 = np.random.rand(6)
print(arr1)
print()

# Use shuffle to rearrange the elements of the array
np.random.shuffle(arr1)
print(arr1)

[0.68923564 0.92954594 0.91811655 0.97530172 0.39700197 0.26262609]

[0.39700197 0.68923564 0.97530172 0.92954594 0.91811655 0.26262609]


- If the array is multidimensional the elements are shuffled on the first dimension only, so the content of the sub-arrays will stay the same but the order of the sub-arrays will be shuffled.
- For a two-dimensional array this means that the rows of the array will change order but the content of each row will remain the same.

In [11]:
# Print a random array
arr2 = np.random.rand(4,2)
print(arr2)
print()

# Use shuffle to rearrange the rows of the array
np.random.shuffle(arr2)
print(arr2)

[[0.80214764 0.14376682]
 [0.70426097 0.70458131]
 [0.21879211 0.92486763]
 [0.44214076 0.90931596]]

[[0.21879211 0.92486763]
 [0.44214076 0.90931596]
 [0.80214764 0.14376682]
 [0.70426097 0.70458131]]
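
Since shuffle works in place, assigning its result to a variable is a common mistake. The sketch below shows what is returned and one way to keep the original order, by shuffling a copy; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6)
returned = np.random.shuffle(arr)  # arr itself is reordered

print(returned)  # None, because nothing is returned
print(arr)       # the same six values in a new order

# To preserve the original, shuffle a copy instead.
original = np.arange(6)
shuffled_copy = original.copy()
np.random.shuffle(shuffled_copy)
print(original)  # unchanged
```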


#### Permutation
- Accepts as an input an integer or array, x.
- If the input is an integer the function returns an array with x elements, 0 to (x - 1), in a shuffled order.

In [9]:
print(np.random.permutation(8))

[4 7 1 0 5 3 2 6]


- If the input is multidimensional, the elements will be shuffled in the first dimension only, as with the shuffle function above.
- This function differs from the shuffle function in that it returns the newly shuffled array and does not alter the original array.

In [7]:
# Print a random array
arr3 = np.random.rand(4,3)
print(arr3)
print()

# Use permutation function to produce a new array which is a rearranged version of the previous array
print(np.random.permutation(arr3))
print()

# Print the original array to confirm that it has not changed.
print(arr3)

[[0.62210877 0.43772774 0.78535858]
 [0.77997581 0.27259261 0.27646426]
 [0.80187218 0.95813935 0.87593263]
 [0.35781727 0.50099513 0.68346294]]

[[0.35781727 0.50099513 0.68346294]
 [0.62210877 0.43772774 0.78535858]
 [0.80187218 0.95813935 0.87593263]
 [0.77997581 0.27259261 0.27646426]]

[[0.62210877 0.43772774 0.78535858]
 [0.77997581 0.27259261 0.27646426]
 [0.80187218 0.95813935 0.87593263]
 [0.35781727 0.50099513 0.68346294]]
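
One use of the integer form is to reorder two related arrays in the same way. This is a sketch under the assumption that `features` and `labels` (hypothetical names and data) hold one row and one label per item.

```python
import numpy as np

features = np.arange(10).reshape(5, 2)  # illustrative data: 5 rows
labels = np.array([0, 1, 2, 3, 4])      # one label per row

# permutation(5) returns the indices 0 to 4 in a random order.
order = np.random.permutation(len(labels))

# Indexing both arrays with the same order keeps each row paired with its label.
shuffled_features = features[order]
shuffled_labels = labels[order]

print(shuffled_features)
print(shuffled_labels)
```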


## Distributions

#### 1. Numpy.random.uniform
- This function selects random numbers from a uniform distribution.
- In a uniformly distributed dataset, each number has an equal chance of occurring.
- The function accepts as inputs a floor value (inclusive), a ceiling value (exclusive) and the size of the array to be returned.
- The output of the function is an array of the specified size containing values from the uniform distribution described.

In [None]:
# Plot a uniform probability distribution
plt.hist(np.random.uniform(3, 13, 10000))
plt.show()

- This distribution is used when no outcome is more probable than any other. Rolling a fair die is the discrete analogue, since each face is equally likely.
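
As a rough check of the points above, the sketch below draws a large sample and compares it with what the bounds imply. The bounds match the plot above; the fixed seed and sample size are illustrative choices.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

low, high = 3, 13
samples = np.random.uniform(low, high, 100000)

print(samples.min(), samples.max())  # both inside the interval from low to high
print(samples.mean())                # close to (low + high) / 2

# Stretching and shifting the output of rand gives the same distribution.
scaled = low + (high - low) * np.random.rand(100000)
print(scaled.mean())
```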

#### 2. Numpy.random.normal
- This function selects random numbers from a normal distribution.
- A normal distribution has mean mu and standard deviation sigma. The mean is the centre of the distribution and the standard deviation measures how widely the data is spread around the mean; about 68% of values fall within one standard deviation of it.
- The mean value has the highest probability of occurring, and the probability decreases with distance from the mean.
- The function accepts as inputs mu, sigma, and the size of the array to be returned.
- The output of the function is an array of the specified size containing values from the normal distribution described.

In [None]:
# Plot a normal probability distribution
plt.hist(np.random.normal(5, 2, 10000))
plt.show()

- This distribution is also known as a bell curve.
- The Central Limit Theorem (CLT) states that if you add a large number of independent random variables, the distribution of the sum will be approximately normal under certain conditions.
- The normal distribution has a wide variety of uses. It is often used to describe random variables whose distributions are not known, and it often occurs in nature.
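
The Central Limit Theorem can be illustrated by adding together uniform random numbers, which are not normally distributed on their own. This is a sketch only; the number of terms, the number of sums and the seed are arbitrary illustrative values.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

n_terms = 30    # how many uniform values are added together
n_sums = 50000  # how many sums are generated

# Each row holds n_terms uniform draws; summing across a row gives one sum.
sums = np.random.rand(n_sums, n_terms).sum(axis=1)

# A uniform value on [0, 1) has mean 1/2 and variance 1/12,
# so the sum has mean n_terms / 2 and variance n_terms / 12.
print(sums.mean(), n_terms / 2)
print(sums.std(), np.sqrt(n_terms / 12))

# For a normal distribution about 68% of values lie within one standard deviation.
within_one_sd = np.mean(np.abs(sums - sums.mean()) < sums.std())
print(within_one_sd)
```

Passing `sums` to `plt.hist` should show the familiar bell shape.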

#### 3. Numpy.random.exponential
- This function selects random numbers from an exponential distribution.
- An exponential distribution has scale beta. Beta is the reciprocal of the rate parameter alpha (beta = 1/alpha), which defines the shape of the curve.
- The function accepts as inputs the scale, which may be any non-negative number, and the size of the array to be returned.
- The output of the function is an array of the specified size containing values from the exponential distribution described.

In [None]:
# Plot an exponential probability distribution
plt.hist(np.random.exponential(2, 10000))
plt.show()

- This distribution is used where lower values occur very frequently with the probability tailing off at higher values.
- The exponential probability distribution is often used to model time elapsed between events.
- Another example sometimes given is the distribution of earnings within a country, where low incomes are common and high incomes are rare.
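
A minimal sketch of the waiting-time interpretation follows. It assumes an illustrative event rate of 0.5 events per unit time; because the function takes the scale rather than the rate, the rate is inverted first.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

rate = 0.5        # illustrative: average number of events per unit time
scale = 1 / rate  # numpy expects the scale, which is the mean gap between events

waits = np.random.exponential(scale, 100000)  # simulated gaps between events

print(waits.mean())            # close to scale
print(np.mean(waits < scale))  # close to 1 - exp(-1), roughly 0.63
```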

#### 4. Numpy.random.gamma
- This function selects random numbers from a gamma distribution.
- A gamma distribution has shape k and scale theta.
- The function accepts as inputs k, theta, and the size of the array to be returned.
- The output of the function is an array of the specified size containing values from the gamma distribution described.

In [None]:
# Plot a Gamma probability distribution
plt.hist(np.random.gamma(8,5,10000))
plt.show()

- This is an important distribution because of its relationship to the exponential and normal distributions.
- It can be shown that a gamma distribution with shape k = 1 is an exponential distribution with the same scale: Gamma(1, theta) = Exponential(theta).
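
The link to the exponential distribution can be checked empirically. The sketch below compares quartiles of gamma samples with shape 1 against exponential samples with the same scale; the scale value, sample size and seed are illustrative.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

theta = 5.0  # illustrative scale
n = 100000

gamma_k1 = np.random.gamma(1, theta, n)  # shape k = 1
expo = np.random.exponential(theta, n)   # same scale

# If the two distributions match, their quartiles should be close.
quartiles_gamma = np.percentile(gamma_k1, [25, 50, 75])
quartiles_expo = np.percentile(expo, [25, 50, 75])
print(quartiles_gamma)
print(quartiles_expo)

# For general shape k the mean of a gamma distribution is k * theta.
samples = np.random.gamma(8, theta, n)
print(samples.mean())  # close to 8 * theta
```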

#### 5. Numpy.random.laplace
- This function selects random numbers from a Laplace distribution.
- A Laplace distribution has position mu and scale alpha.
- The Laplace distribution is essentially two exponential distributions placed back to back, mirrored about mu.
- The function accepts as inputs mu, alpha, and the size of the array to be returned.
- The output of the function is an array of the specified size containing values from the Laplace distribution described.

In [None]:
# Plot a Laplace probability distribution
plt.hist(np.random.laplace(1,0.5,10000))
plt.show()

- For many problems in economics and health sciences, this distribution seems to model the data better than the standard Gaussian distribution.
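
The description of the Laplace distribution as two mirrored exponential distributions suggests a simple check: the distance of each sample from mu should behave like an exponential variable with scale alpha. This is a sketch using the same parameters as the plot above, with an illustrative seed.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

mu, alpha = 1.0, 0.5
samples = np.random.laplace(mu, alpha, 100000)

# Folding the samples about mu leaves only the distance from the centre.
distances = np.abs(samples - mu)

print(distances.mean())       # close to alpha, the mean of an exponential with that scale
print(np.mean(samples < mu))  # close to 0.5, because the two halves mirror each other
```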

## Seeds
- The outputs produced by numpy.random are pseudo-random, meaning that they are not truly random.
- The package takes a set starting number, known as the seed, and then uses an algorithm to generate a pseudo-random number.
- These pseudo-random outputs are produced by a Mersenne Twister pseudo-random generator.
- Typically the seed is taken from the computer system itself, or the computer's clock. As these are constantly changing, a different pseudo-random number is produced each time the pseudo-random generator runs.
- If desired, a seed can be specified. Using the same seed for the generator will produce the same sequence of pseudo-random numbers each time.
- A specified seed stays in effect until the generator is seeded again: every later call continues the same deterministic sequence, and the generator does not revert to a seed drawn from the computer system.
- If the seed and the generator algorithm are known, the pseudo-random output can be predicted. For this reason pseudo-random generators should not be used for security purposes.


In [6]:
# Seed the generator
np.random.seed(1234)
print(np.random.rand())

# Call the generator again without reseeding; this returns the next value in the seeded sequence
print(np.random.rand())

# Seed the generator with the same seed to confirm that the same output will be produced
np.random.seed(1234)
print(np.random.rand())

0.1915194503788923
0.6221087710398319
0.1915194503788923
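
The sketch below extends the cell above to several values, showing that reseeding replays the whole sequence and not only the first number. The second half uses `np.random.default_rng`, which is not covered in this notebook and assumes NumPy 1.17 or later; it creates a separate generator object instead of seeding the shared global one.

```python
import numpy as np

# Seed, then draw three values.
np.random.seed(1234)
first_run = np.random.rand(3)

# Reseeding with the same value replays the same three values.
np.random.seed(1234)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))  # True

# Assumption: NumPy 1.17 or later. Each generator object carries its own state.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
draws_a = rng_a.random(3)
draws_b = rng_b.random(3)
print(np.array_equal(draws_a, draws_b))  # True
```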


## References
- NumPy developers. Numpy. https://docs.scipy.org/doc/numpy/reference/routines.random.html
- Wikipedia. Normal distribution. https://en.wikipedia.org/wiki/Normal_distribution
- Open Science Notebook. How to randomly shuffle an array in python using numpy. https://www.science-emergence.com/Articles/How-to-randomly-shuffle-an-array-in-python-using-numpy/
- Introduction to probability, statistics and random processes. https://www.probabilitycourse.com/chapter4/4_0_0_intro.php
- Real Python. Generating Random Data in Python (Guide). https://realpython.com/python-random/