<a href="https://colab.research.google.com/github/sc22lg/COMP3611_Machine_Learning/blob/main/COMP3611_Simulating_Stochastic_Processes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">COMP3611 - Simulating Stochastic Processes</span> by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Marc de Kamps and University of Leeds</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

### Learning Outcomes

In Numpy, it is fairly straightforward to generate samples from a distribution. This notebook is a kind of self-assessment: are you able to use random number generators and generate data according to some common distributions?

At the end of this notebook you should be able to:
- Select a random generator for the uniform distribution
- Set its seed
- Inspect simple distributions visually
- Implement Bernoulli and Binomial distributions, or use Numpy functions to that effect
- Explain how samples from a uniform distribution can be transformed into samples of other distributions


### Preparation

- Consult the following link as an entry for random process generation: https://numpy.org/doc/stable/reference/random/index.html
- Read up on random number generators. The Wikipedia entry is a decent enough start: https://en.wikipedia.org/wiki/Random_number_generation. For an introduction to the more technical aspects, the book *Numerical Recipes in C*, by Press *et al* contains an informative chapter.

In [1]:
from numpy.random import default_rng
import numpy as np
rng = default_rng(seed = 12)
vals = rng.uniform(size=10)
print(vals)

[0.25082446 0.94675294 0.18932038 0.17929141 0.34988924 0.23054125
 0.67044574 0.11507938 0.89630937 0.85813049]


### Numpy's random module
To generate samples according to various distributions, *numpy.random* is convenient. Above you see an example of how to generate 10 random values.

**Exercise**: note down some of the values that were printed. Now go to the top menu of the notebook, select *Kernel*
and select *Restart & Clear Output*. Run the notebook again. Comment on whether you observe the same values.
If the numbers change every time you restart the notebook, is this desirable?
- Comment on situations where this is potentially valuable.

There are also good reasons for this to be undesirable.

- State at least one reason why this is undesirable behaviour.
- If this is undesirable behaviour, investigate how you can use a seed to prevent it.

### Comments:
Previous values: [0.25082446 0.94675294 0.18932038 0.17929141 0.34988924 0.23054125
 0.67044574 0.11507938 0.89630937 0.85813049]

Values after restart:

This is potentially valuable in programs where randomness is part of the function, e.g. a program that shuffles data.
This could be undesirable in situations where random numbers are needed but the results must also be reproducible, e.g. when debugging or when comparing experiments.
This behaviour can be prevented by passing a fixed seed to the random generator, as in `default_rng(seed = 12)` above.
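
The effect of the seed can be checked directly. The following is a minimal sketch, assuming only `numpy`; the names `rng_a`, `rng_b` and `unseeded` are illustrative and not part of the original notebook. Two generators created with the same seed should produce identical sequences, while a generator created without a seed is initialised from the operating system and will typically differ between runs.

```python
import numpy as np
from numpy.random import default_rng

# Two independent generators created with the same seed
rng_a = default_rng(seed=12)
rng_b = default_rng(seed=12)

first = rng_a.uniform(size=5)
second = rng_b.uniform(size=5)

# Same seed, same sequence: the comparison is True
print(np.array_equal(first, second))

# No seed given: the values are expected to change from run to run
unseeded = default_rng()
print(unseeded.uniform(size=5))
```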

### Common Sense: looking at your data.
**Exercise**: Generate 10000 data points and histogram them in the interval $[0,1]$. The histogram should be approximately flat.

In [3]:
import plotly.express as px

In [4]:
data_points = rng.uniform(size=10000)
df = data_points.tolist()
fig = px.histogram(df)
fig.show()

**Exercise**: Generate 10000 data points $(x,y)$ that are uniformly distributed in the interval $[0,1] \times [0,1]$. Make a scatter plot of the sample and comment on whether the result is as expected.


In [8]:
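data_points = rng.uniform(size=(10000,2))
# Plot the first column (x) against the second (y); passing the raw list
# would instead plot each column against the row index
fig = px.scatter(x = data_points[:, 0], y = data_points[:, 1])
fig.show()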

**Exercise**: Investigate how you can change the aspect ratio of a plot in the notebook and how you can change the dpi (dots per inch). Experiment to make a plot that you find visually attractive: change marker color and type, as well as size. Save the plot as a pdf file.

In [10]:
data_points_x = rng.uniform(size=10000).tolist()
data_points_y = rng.uniform(size=10000).tolist()
fig = px.scatter(x = data_points_x, y = data_points_y)
fig.show()

**Exercise:** Now that you can simulate uniformly distributed numbers, it is not hard to simulate a Bernoulli process. Simulate a sample of 1000 uniformly distributed events ($[0,1]$) and think of a way of converting them into a sequence generated by a Bernoulli process with $\mu = 0.4$. Implement this in Python and perform some common sense checks of whether the result is reasonable.

In [11]:
mu = 0.4
count = 1000
distribution = rng.choice([0, 1], count, p=[1-mu, mu])
df = distribution.tolist()
fig = px.histogram(df)
fig.show()

The count of 0s is 584 and the count of 1s is 416.

584/1000 = 0.584, which is approximately 0.6 (1-mu)

416/1000 = 0.416, which is approximately 0.4 (mu)
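
The cell above draws the 0s and 1s with `rng.choice`. The conversion from uniform samples that the exercise asks for can be sketched as follows, assuming the same `mu` and `count`: a uniform value $u \in [0,1)$ falls below $\mu$ with probability $\mu$, so thresholding at $\mu$ yields a 1 with probability $\mu$ and a 0 otherwise. The names `u` and `bernoulli` are illustrative.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=12)

mu = 0.4
count = 1000

# Uniform samples in [0, 1)
u = rng.uniform(size=count)

# A sample is a "success" (1) when it falls below mu, otherwise 0
bernoulli = (u < mu).astype(int)

# Common sense check: the fraction of 1s should be close to mu
print(bernoulli.mean())
```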

**Exercise**: Design and implement a function to simulate a Bernoulli sample of arbitrary length. Use this as a basis for a function that generates binomial samples. Generate a sample of 10000 $\mbox{Bin}(N=10,p=0.25)$ events. Plot a histogram of the sample that you generated.

In [17]:
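def bernoulli_sample(N, p):
  # Draw N independent Bernoulli(p) trials as an array of 0s and 1s
  return rng.choice([0, 1], N, p=[1-p, p])

def binomial_sample(N, p, count):
  # Each binomial draw is the number of successes in N Bernoulli(p) trials
  return [int(bernoulli_sample(N, p).sum()) for _ in range(count)]

N = 10
p = 0.25
count = 10000

totals = binomial_sample(N, p, count)

fig = px.histogram(totals)
fig.show()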


**Exercise**: Research the *numpy.random.binomial* method. Use the method to repeat the previous experiment. Also, research whether a method *numpy.random.bernoulli* exists. If not, explain why not.
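
One possible approach is sketched below. It assumes the `Generator.binomial` method of a `default_rng` generator rather than the legacy `numpy.random.binomial` function named in the exercise; both take the parameters `n`, `p` and `size`. Numpy does not provide a `bernoulli` function, presumably because a Bernoulli trial is simply a binomial draw with `n = 1`.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=12)

N = 10
p = 0.25
count = 10000

# Each entry is the number of successes in N trials with success probability p
totals = rng.binomial(n=N, p=p, size=count)

# Common sense check: the sample mean should be close to N * p = 2.5
print(totals.mean())

# A Bernoulli sample is the special case n = 1
bernoulli = rng.binomial(n=1, p=p, size=count)
print(bernoulli.mean())
```

The array `totals` can be passed to `px.histogram` in the same way as in the previous cell.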

## The Moral of the Story (so far)

The really hard bit about generating random numbers is the generation of uniformly distributed random numbers. Once this is in place, discrete stochastic processes can be modelled relatively easily, although numpy.random provides a convenient front end for most distributions. Feel free to experiment with other distributions.
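
The same idea extends to continuous distributions. As an illustration that is not part of the original exercises, the sketch below uses the inverse transform method for the exponential distribution: if $u$ is uniform on $[0,1)$, then $x = -\ln(1-u)/\lambda$ has cumulative distribution $1 - e^{-\lambda x}$. The value of `rate` ($\lambda$) is an arbitrary choice made for the example.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=12)

rate = 2.0      # lambda, chosen arbitrarily for this illustration
count = 10000

# Uniform samples in [0, 1), so 1 - u lies in (0, 1] and the log is finite
u = rng.uniform(size=count)

# Apply the inverse of the exponential CDF F(x) = 1 - exp(-rate * x)
samples = -np.log(1.0 - u) / rate

# Common sense check: the sample mean should be close to 1 / rate
print(samples.mean())
```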