# Computer Lab 3 


# Monte Carlo integration and the Central Limit Theorem



## Learning Objectives

- Use `numpy` to model stochastic processes
- Use `matplotlib` to visualise results
- Understand and apply Monte Carlo integration
- Understand and apply the Central Limit Theorem

This material has been adapted from the following sources:
- [TowardsAI](https://github.com/towardsai/tutorials)
- [Hands-On Reinforcement Learning With Python](https://github.com/sudharsan13296/Hands-On-Reinforcement-Learning-With-Python)


Author:
- Micaela Matta

Let's go with some imports: 🚀

In [None]:
##########################################
#                                        #
#               IMPORTS                  #
#                                        #
##########################################

# run this cell before any of the cells below, 
# or you'll get a NameError for trying to use a module that's not been imported!
import numpy as np
import math
import random
import matplotlib.pyplot as plt
%matplotlib inline

## Using numpy as a source of pseudorandom numbers


As seen in the previous CL, `numpy`’s random module (`numpy.random`) provides functions for generating pseudorandom numbers drawn from a range of probability distributions.

For more information see also the [documentation](https://numpy.org/doc/1.16/reference/routines.random.html).

In [None]:
N = 1992
np.random.seed(N) # we set up the random seed for reproducibility

A 3x3 array of random integers drawn uniformly from `0` to `99` (the upper bound `100` is excluded):

In [None]:
np.random.randint(0, 100,(3,3)) 

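Because `randint` excludes its upper bound, a quick sketch can confirm which values actually occur. It uses a separate `np.random.RandomState` object (seed `0`, chosen arbitrarily) so the notebook's global seed is left untouched:

```python
import numpy as np

# A separate RandomState keeps the notebook's global seed untouched
rs = np.random.RandomState(0)

# randint(low, high, size) draws integers from the half-open interval [low, high)
draws = rs.randint(0, 100, size=100_000)
print(draws.min(), draws.max())  # 100 is never drawn

# To make 100 a possible value, raise the upper bound by one
inclusive = rs.randint(0, 101, size=100_000)
print(inclusive.max())
```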
A 3x3 array of random floats drawn uniformly from the half-open interval `[0, 1)`:

In [None]:
np.random.rand(3,3)

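`np.random.rand` and `np.random.randn` differ by one letter but draw from very different distributions: uniform on `[0, 1)` versus the standard normal distribution. A short comparison sketch, again using a local `RandomState` with an arbitrary seed and sample size:

```python
import numpy as np

rs = np.random.RandomState(0)  # local generator; the global seed is not affected

u = rs.rand(100_000)   # uniform floats in [0, 1)
z = rs.randn(100_000)  # standard normal: mean 0, standard deviation 1

# Uniform draws stay inside [0, 1); normal draws are unbounded
print(u.min() >= 0, u.max() < 1)
print(z.min() < 0, z.max() > 1)

# Theoretical values: uniform mean 0.5, std 1/sqrt(12) ≈ 0.289; normal mean 0, std 1
print(u.mean(), u.std())
print(z.mean(), z.std())
```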
In Jupyter, appending `?` to a function name displays its documentation:

In [None]:
np.random.normal?

A 1D array of `N` random numbers distributed uniformly between `0` and `1`:

In [None]:
np.random.uniform(0,1,N)

A 1D array of `N` normally distributed random numbers with mean `0` and standard deviation `1`:

In [None]:
np.random.normal(0,1,N)

A single random number from a uniform distribution:

In [None]:
np.random.uniform(0,1) 

A single random number from a normal distribution:

In [None]:
np.random.normal(0,1)

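The cells above use NumPy's legacy functions, which share one global random state (the linked documentation is for NumPy 1.16). NumPy 1.17 and later also provide a `Generator` object, created with `np.random.default_rng`, that keeps its own state. The sketch below shows rough equivalents, reusing the seed `1992` only for illustration. Note that the same seed does not produce the same numbers as `np.random.seed`, because the two interfaces use different underlying bit generators by default.

```python
import numpy as np

# A Generator carries its own state, independent of np.random.seed
rng = np.random.default_rng(1992)

ints = rng.integers(0, 100, size=(3, 3))  # like randint: high is excluded by default
unif = rng.random((3, 3))                 # like rand: floats in [0, 1)
norm = rng.standard_normal((3, 3))        # like randn
u = rng.uniform(0, 1, 1992)               # like np.random.uniform(0, 1, N)
g = rng.normal(0, 1, 1992)                # like np.random.normal(0, 1, N)

# Two generators built from the same seed produce identical streams
a = np.random.default_rng(1992).random(5)
b = np.random.default_rng(1992).random(5)
print(np.array_equal(a, b))  # True
```

Which interface to use is a design choice; this notebook keeps the legacy functions so that `np.random.seed` controls every cell.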
### Random seed


Setting a random seed in `numpy.random` initialises the pseudorandom number generator to a known state, so the same sequence of random numbers is generated each time you run your code. This is useful for debugging, since it lets you reproduce the same results, and for comparing different runs or versions of an algorithm on exactly the same random data. Conversely, changing the seed lets you explore the range of outcomes that a stochastic algorithm can produce.

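Both uses of a seed show up in a small sketch. The helper `mean_of_uniform_sample` is hypothetical, written only for this illustration: re-running it with the same seed reproduces the result exactly, while different seeds give slightly different estimates of the same quantity (the mean of a uniform distribution on `[0, 1)`, which is 0.5).

```python
import numpy as np

def mean_of_uniform_sample(seed, n=1000):
    """Hypothetical helper: mean of n uniform [0, 1) draws after seeding with `seed`."""
    np.random.seed(seed)
    return np.random.uniform(0, 1, n).mean()

# Same seed: identical result every time (reproducibility)
first = mean_of_uniform_sample(10)
second = mean_of_uniform_sample(10)
print(first == second)  # True

# Different seeds: different, but similar, results (exploring variability)
results = [mean_of_uniform_sample(s) for s in range(5)]
print(results)
```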

Let's visualise the effect of different random seeds using a plot:

In [None]:
x = np.linspace(0,1,10)
np.random.seed(10)
y1 = np.random.uniform(0,1,10)
np.random.seed(2022)
y2 = np.random.uniform(0,1,10)
np.random.seed(10)
y3 = np.random.uniform(0,1,10)

plt.plot(x, y1, label='seed = 10')
plt.plot(x, y2, label='seed = 2022')
plt.plot(x, y3, label='seed = 10')

plt.legend()
plt.show()

We only see 2 lines because `y1` and `y3` were generated with the same seed and are therefore identical (the `y3` line is drawn exactly on top of `y1`):

In [None]:
y1==y3

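`y1 == y3` compares the arrays element by element and returns an array of booleans. To get a single `True`/`False` answer, reduce it with `.all()` or use `np.array_equal`, as in this sketch that regenerates the two series with the seeds used above:

```python
import numpy as np

np.random.seed(10)
y1 = np.random.uniform(0, 1, 10)
np.random.seed(10)
y3 = np.random.uniform(0, 1, 10)

print(y1 == y3)                # element-wise: an array of 10 booleans
print((y1 == y3).all())        # True only if every element matches
print(np.array_equal(y1, y3))  # single boolean; also checks that the shapes agree
```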
## Estimating the value of $π$ using Monte Carlo

As you have seen in the last lecture, the Monte Carlo method can also be used to estimate the value of $π$ (3.141592...).

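Consider a quarter of a circle of radius $r = 1$, centred at the origin and enclosed by the 1×1 square with corners (0,0) and (1,1). The area of the quarter circle is $\frac{1}{4}πr^2 = π/4$, and since the square has area 1, the ratio between the area of the quarter circle and the area of the square is also $π/4$.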

In [None]:
seed = 5331
np.random.seed(seed)

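square_size = 1  # side of the square (the cells below assume a unit square)
sample_size = 100  # number of random points; try different values!
arc = np.linspace(0, np.pi/2, 100)  # angles used to draw the quarter-circle boundary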

Next, we generate `sample_size` random points uniformly distributed over the square, between (0,0) and (1,1):

In [None]:
x = np.random.uniform(0,1,sample_size)
y = np.random.uniform(0,1,sample_size)

Points that fall inside the quarter circle (i.e. at a distance of at most 1 from the origin) are coloured yellow; the others are coloured blue. We keep track of the indices of the points that fall inside the circle.

In [None]:
plt.gca().set_aspect('equal')  # same scale on both axes, so the arc looks circular

# lists of point indices
in_circle = []       # points inside the quarter circle
outside_circle = []  # points in the square but outside the circle

# iterate over each point
for i in range(sample_size):

    # check whether the point lies within distance 1 of the origin
    if math.sqrt(x[i]**2 + y[i]**2) <= 1:
        in_circle.append(i)
        # plot in yellow
        plt.plot(x[i], y[i], '.y')
    else:
        outside_circle.append(i)
        # plot in blue
        plt.plot(x[i], y[i], '.b')

# plot the arc of the circumference as a red line
plt.plot(np.cos(arc), np.sin(arc), 'r', lw=3)

plt.show()

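Plotting each point inside a Python loop is convenient for the picture but slow for large `sample_size`. A minimal vectorised sketch of the same estimate follows; the function name `estimate_pi` is hypothetical. It compares $x^2 + y^2$ with 1 instead of taking a square root, which selects the same points because both sides are non-negative.

```python
import numpy as np

def estimate_pi(sample_size, seed=None):
    """Hypothetical helper: Monte Carlo estimate of pi from points in the unit square."""
    if seed is not None:
        np.random.seed(seed)
    x = np.random.uniform(0, 1, sample_size)
    y = np.random.uniform(0, 1, sample_size)
    # Boolean mask: True where the point lies inside the quarter circle of radius 1
    inside = x**2 + y**2 <= 1
    # inside.mean() is the fraction of points inside, an estimate of pi/4
    return 4 * inside.mean()

print(estimate_pi(100, seed=5331))
print(estimate_pi(1_000_000, seed=5331))
```

With `seed=5331` and a sample size of 100, it draws `x` and `y` in the same order as the cells above, so it should reproduce the loop's estimate; much larger sample sizes stay fast because nothing is plotted.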
If we divide the number of points inside the circle, `N_in_circle`, by the total number of points, `sample_size`, we obtain an approximation of the ratio of the areas calculated above, $π/4$. Multiplying this fraction by 4 therefore gives an estimate of $π$.

In [None]:
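# number of points that fell inside the quarter circle
N_in_circle = len(in_circle)

# estimate pi as 4 × (fraction of points inside the circle)
pi_estimate = 4 * N_in_circle / sample_size

print(f"The approximate value of π is {pi_estimate}")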

<div class="alert alert-success"><b>Check your understanding: </b>
   
    
- What happens to your estimate of $π$ as you increase `sample_size`?  
- How large does `sample_size` need to be to provide a good estimate of the first 4 significant digits of $π$? 
    
</div>

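The π estimate is one instance of Monte Carlo integration: an integral over $[a, b]$ equals $(b - a)$ times the average value of the integrand at uniformly distributed random points, and the sample mean of $f(u_i)$ approximates that average. A minimal sketch, assuming a vectorised integrand `f` (the helper `mc_integrate` is hypothetical):

```python
import numpy as np

def mc_integrate(f, a, b, n, seed=None):
    """Hypothetical helper: estimate the integral of f over [a, b] from n uniform samples.

    Uses integral_a^b f(x) dx = (b - a) * E[f(U)] with U ~ Uniform(a, b),
    replacing the expectation by a sample mean.
    """
    if seed is not None:
        np.random.seed(seed)
    u = np.random.uniform(a, b, n)
    values = f(u)
    estimate = (b - a) * values.mean()
    # Standard error of the estimate; shrinks like 1/sqrt(n)
    std_error = (b - a) * values.std(ddof=1) / np.sqrt(n)
    return estimate, std_error

# Integral of x**2 over [0, 1] is exactly 1/3
est, err = mc_integrate(lambda x: x**2, 0.0, 1.0, 100_000, seed=0)
print(est, err)

# Quarter-circle area: integral of 4*sqrt(1 - x**2) over [0, 1] is pi
pi_est, pi_err = mc_integrate(lambda x: 4 * np.sqrt(1 - x**2), 0.0, 1.0, 100_000, seed=0)
print(pi_est, pi_err)
```

The second call integrates the same quarter circle used in the dart-throwing estimate. The returned standard error shrinks roughly like $1/\sqrt{n}$, which is worth keeping in mind when thinking about the questions above.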
## Demonstrating the Central Limit Theorem


The Central Limit Theorem (CLT) states that, if we draw samples of size $n$ from a population with mean $\mu$ and finite variance $\sigma^2$, then for sufficiently large $n$ the distribution of the sample mean $\bar{X}_n$ is approximately normal, centred on the population mean $\mu$ with standard deviation $\sigma/\sqrt{n}$. This holds whatever the shape of the population distribution.

We will demonstrate the validity of the CLT with an example.

<div class="alert alert-success"><b> Task 1.1: </b>
The central limit theorem

- What does each line of code in the cell below do? **Hint:** run each line of code separately, and print each variable to get a better understanding of the code.

- What happens as you change the parameters `population_size`, `number_of_samples` and `sample_size`? Record the change in the plot below as you change the parameters, and describe what you observe relating the behaviour to what you have learnt about the central limit theorem. 

- **Label each plot you save to show which parameters you used to produce it.**

- **Upload your plots as a single file and fill in the text box on KEATS.**


</div>

In [None]:
population_size = 100   # what happens if you change this number?
number_of_samples = 10  # what happens if you change this number?
sample_size = 5         # what happens if you change this number?

np.random.seed(440241)  # don't change the seed

# population: uniformly distributed values in [0, 1), i.e. clearly not normal
population = np.random.rand(population_size)
# array to store the mean of each sample
sample_means = np.zeros(number_of_samples)

for i in range(number_of_samples):
    # pick sample_size random indices (with replacement) from 0 to population_size - 1
    c = np.random.randint(0, population_size, sample_size)
    # store the mean of the sampled values
    sample_means[i] = population[c].mean()

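The loop draws one sample at a time. NumPy can also draw all the indices at once as a 2D array and average each row. A sketch with the hypothetical helper `sample_means_vectorised`, using separate variable names so it does not overwrite the exercise's `population` or `sample_means`:

```python
import numpy as np

def sample_means_vectorised(population, number_of_samples, sample_size):
    """Hypothetical helper: means of random samples drawn with replacement."""
    # One row per sample: indices in 0 .. population.size - 1
    indices = np.random.randint(0, population.size, size=(number_of_samples, sample_size))
    # Fancy indexing keeps the (number_of_samples, sample_size) shape; average each row
    return population[indices].mean(axis=1)

np.random.seed(440241)
demo_population = np.random.rand(100)
demo_means = sample_means_vectorised(demo_population, 10, 5)
print(demo_means.shape)  # (10,)
```

The individual values need not match the loop's output exactly, even with the same seed, because the two versions may consume the random stream differently.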
Now we use seaborn and matplotlib to visualise the results:

In [None]:
import seaborn as sns

# we plot a histogram of the sample means together with its KDE (kernel density estimate)
# for more info about the KDE, see the explanation in the cell below
# (sns.histplot replaces sns.distplot, which is deprecated)
sns.histplot(sample_means, bins=36, kde=True)

# size of x and y axis ticks
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

# title and axis labels
plt.title('Histogram of sample means', fontsize=20)
plt.xlabel('Sample mean', fontsize=20)
plt.ylabel('Count', fontsize=20)

# uncomment this to save the plot to a file (it must come before plt.show())
#plt.savefig('CLT_100_10_5.png', dpi=300)
plt.show()

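The CLT also predicts how wide the histogram should be: the standard deviation of the sample means should be close to $\sigma/\sqrt{n}$. A sketch that checks this for a few sample sizes, using a larger uniform population and more samples than the exercise (sizes chosen arbitrarily so the statistics are stable, and variable names chosen so the exercise variables are not overwritten):

```python
import numpy as np

np.random.seed(0)
clt_population = np.random.rand(100_000)  # uniform population: clearly not normal
mu = clt_population.mean()
sigma = clt_population.std()              # population standard deviation

n_draws = 5_000                           # number of samples for each sample size
results = {}
for n in (5, 50, 500):
    # each row holds the indices of one sample of size n, drawn with replacement
    idx = np.random.randint(0, clt_population.size, size=(n_draws, n))
    means = clt_population[idx].mean(axis=1)
    predicted = sigma / np.sqrt(n)        # CLT prediction for the spread of the means
    results[n] = (means.mean(), means.std(), predicted)
    print(f"n={n:3d}  mean of means={means.mean():.4f} (mu={mu:.4f})  "
          f"std of means={means.std():.4f} (sigma/sqrt(n)={predicted:.4f})")
```

The agreement is approximate and improves as `n_draws` grows; the mean of the sample means stays close to `mu` for every `n`, while their spread narrows like $1/\sqrt{n}$.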
<div class="alert alert-block alert-info">
<b> FYI: Kernel Density Estimation (KDE) plots</b>

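- Kernel Density Estimation (KDE) is a smoothing technique used to estimate the probability density function of a data set.
- Instead of counting points in discrete bins, as a histogram does, KDE places a smooth "kernel" (typically a Gaussian) on each data point and adds them up, normalised so that the total area is 1, producing a continuous curve.

</div>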

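A Gaussian KDE can be written in a few lines of NumPy: centre a normal curve of width `bandwidth` on every data point and average the curves. This is a sketch for intuition, not seaborn's implementation, which may choose the bandwidth and other details differently. The helper `gaussian_kde_1d` is hypothetical, and the bandwidth uses the common normal-reference rule of thumb $1.06\,\hat{\sigma}\,n^{-1/5}$:

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Hypothetical helper: evaluate a Gaussian KDE of `data` at the points in `grid`."""
    data = np.asarray(data)[:, None]  # shape (n_data, 1)
    grid = np.asarray(grid)[None, :]  # shape (1, n_grid)
    z = (grid - data) / bandwidth     # scaled distances, shape (n_data, n_grid)
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Average the kernels centred on each data point
    return kernels.mean(axis=0)

np.random.seed(0)
data = np.random.normal(0, 1, 500)
grid = np.linspace(-5, 5, 1001)

# Normal-reference rule of thumb for a Gaussian kernel's bandwidth
bandwidth = 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)
density = gaussian_kde_1d(data, grid, bandwidth)

# A density should integrate to about 1 over a wide enough grid (Riemann sum)
total_area = density.sum() * (grid[1] - grid[0])
print(total_area)
```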
<div class="alert alert-block alert-info">
    <b> Key points:</b> 


- Monte Carlo algorithms (MCAs) use repeated random sampling to estimate quantities that are hard to compute directly, such as integrals (e.g. the area used to estimate $π$ above) or the probability of an event occurring.
- MCAs are used in a wide range of fields, including finance, physics, chemistry, and engineering.
- MCAs can be used to optimise parameters, simulate physical systems, and evaluate integrals numerically.
- In chemistry, MCAs are used to model complex chemical systems and simulate the behaviour of molecules.

</div>
**next notebook: `CL3_NB2_polymers.ipynb`**