# MATH 360 Python Assignment 2

* See [Mathematical Python](https://patrickwalls.github.io/mathematicalpython/) for an introduction to Python and Jupyter
* See [Introduction to Mathematical Modelling](https://ubcmath.github.io/MATH360/) for more examples
* Write solutions in the cells with `YOUR CODE HERE`
* Do **not** import any packages (other than the standard packages in the cell below)
* Run the tests to verify your solutions
* There are **hidden tests**, so your solutions may still be incorrect even if they pass the tests below
* Submit your `.ipynb` notebook file to Canvas (download from Syzygy to your machine and upload to Canvas)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import scipy.integrate as spi

## Problem 1: Probability Density Functions

### Part 1a (2 marks)

Let $X \sim U(1,5)$ be a uniform random variable with parameters $a=1$ and $b=5$ and let $f_X$ be the corresponding probability density function. Create the function $f_X$ and save the result as `f1a`.
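As an illustration with different parameters than this problem, a uniform density can be built from `scipy.stats.uniform`, where `loc` is the left endpoint $a$ and `scale` is the width $b-a$. The name `f_example` and the interval $[2,7]$ are hypothetical and chosen only for this sketch:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: density of U(2,7); loc = a, scale = b - a
a, b = 2, 7
f_example = lambda x: stats.uniform.pdf(x, loc=a, scale=b - a)

print(f_example(3))                        # a point inside [2,7]
print(f_example(np.array([0, 2, 5, 8])))   # also accepts NumPy arrays
```

Because the function accepts arrays, it can be passed directly to `plt.plot` as in the plotting cells of this notebook.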

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify f1a is a function with positive values. (1 mark)
assert callable(f1a)
assert f1a(0) >= 0
assert f1a(1) >= 0
assert f1a(3) >= 0
assert f1a(10) >= 0
print("Test 1: Success!")

In [None]:
# Test 2: Verify f1a returns the correct values. This cell contains hidden tests. (1 mark)

In [None]:
# Plot the result
x = np.linspace(0,6,200)
plt.figure(figsize=(6,3)), plt.plot(x,f1a(x)), plt.grid(True)
plt.show()

### Part 1b (2 marks)

Let $X \sim Exp(1/2)$ be an exponential random variable with parameter $\lambda=1/2$ and let $f_X$ be the corresponding probability density function. Create the function $f_X$ and save the result as `f1b`.
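A common pitfall is that `scipy.stats.expon` is parametrized by `scale`, which equals $1/\lambda$ rather than $\lambda$. The sketch below uses a hypothetical rate $\lambda=2$ and a hypothetical name `g_example`:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: density of Exp(lambda) with lambda = 2
lam = 2
g_example = lambda x: stats.expon.pdf(x, scale=1 / lam)  # scale = 1/lambda

print(g_example(0))    # the density at 0 equals lambda
print(g_example(-1))   # the density is zero for negative x
```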

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify f1b is a function with positive values. (1 mark)
assert callable(f1b)
assert f1b(0) >= 0
assert f1b(1) >= 0
assert f1b(10) >= 0
print("Test 1: Success!")

In [None]:
# Test 2: Verify f1b returns the correct values. This cell contains hidden tests. (1 mark)

In [None]:
# Plot the result
x = np.linspace(0,10,200)
plt.figure(figsize=(6,3)), plt.plot(x,f1b(x)), plt.grid(True)
plt.show()

### Part 1c (2 marks)

Let $X \sim N(2,3)$ be a normal random variable with mean $\mu=2$ and variance $\sigma^2=3$ and let $f_X$ be the corresponding probability density function. Create the function $f_X$ and save the result as `f1c`.
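Note that `scipy.stats.norm` takes the standard deviation as `scale`, not the variance, so a variance $\sigma^2$ must be converted with a square root. A minimal sketch for a hypothetical $N(0,4)$ (the name `h_example` is made up for illustration):

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: density of N(0,4), i.e. mean 0 and variance 4
mu, var = 0, 4
h_example = lambda x: stats.norm.pdf(x, loc=mu, scale=np.sqrt(var))

# Compare with the formula 1/sqrt(2*pi*var) at the mean
print(h_example(0), 1 / np.sqrt(2 * np.pi * var))
```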

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify f1c is a function with positive values. (1 mark)
assert callable(f1c)
assert f1c(0) >= 0
assert f1c(1) >= 0
assert f1c(10) >= 0
print("Test 1: Success!")

In [None]:
# Test 2: Verify f1c returns the correct values. This cell contains hidden tests. (1 mark)

In [None]:
# Plot the result
x = np.linspace(-6,10,200)
plt.figure(figsize=(6,3)), plt.plot(x,f1c(x)), plt.grid(True)
plt.show()

## Problem 2: Computing Probabilities

### Part 2a (2 marks)

Let $X$ be a normal random variable with mean $\mu=2$ and variance $\sigma^2=1.2$. Compute the probability $P(X \geq 1)$. Save the result as `P2a`.
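A tail probability $P(X \geq c)$ equals $1 - F_X(c)$, where $F_X$ is the cumulative distribution function. The sketch below shows two equivalent ways to compute it for a hypothetical $N(0,4)$ and $c=1$; the variable names are assumptions for illustration:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: X ~ N(0,4), compute P(X >= 1)
mu, var, c = 0, 4, 1
sigma = np.sqrt(var)   # scale is the standard deviation

P_cdf = 1 - stats.norm.cdf(c, loc=mu, scale=sigma)   # 1 - F(c)
P_sf = stats.norm.sf(c, loc=mu, scale=sigma)         # survival function
print(P_cdf, P_sf)
```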

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P2a is a number between 0 and 1. (1 mark)
assert P2a > 0
assert P2a < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P2a is the correct value. This cell contains hidden tests. (1 mark)
assert abs(P2a - 0.82) < 0.01, "P2a rounded to 2 decimal places is 0.82."

### Part 2b (2 marks)

Let $X$ be an exponential random variable with parameter $\lambda = 1.5$. Compute the probability $P(0.5 \leq X \leq 1.5)$. Save the result as `P2b`.
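For an interval, $P(a \leq X \leq b) = F_X(b) - F_X(a)$. A minimal sketch for a hypothetical exponential variable with $\lambda=2$ on the interval $[0.5,1]$, checked against the closed form $e^{-\lambda a} - e^{-\lambda b}$:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: X ~ Exp(2), compute P(0.5 <= X <= 1)
lam, a, b = 2, 0.5, 1
P_interval = stats.expon.cdf(b, scale=1 / lam) - stats.expon.cdf(a, scale=1 / lam)

# Closed form for comparison
P_exact = np.exp(-lam * a) - np.exp(-lam * b)
print(P_interval, P_exact)
```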

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P2b is a number between 0 and 1. (1 mark)
assert P2b > 0
assert P2b < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P2b is the correct value. This cell contains hidden tests. (1 mark)
assert abs(P2b - 0.37) < 0.01, "P2b rounded to 2 decimal places is 0.37."

### Part 2c (2 marks)

Let $X$ be a uniform random variable with range $[-1,6]$. Compute the probability $P(2 \leq X \leq 3)$. Save the result as `P2c`.
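A probability can also be checked by integrating the density numerically with `scipy.integrate.quad`. The sketch below does this for a hypothetical uniform variable on $[0,4]$ and the interval $[1,2]$, and compares the result with the difference of CDF values:

```python
import scipy.stats as stats
import scipy.integrate as spi

# Hypothetical example: X ~ U(0,4), compute P(1 <= X <= 2)
pdf = lambda t: stats.uniform.pdf(t, loc=0, scale=4)

P_quad, abserr = spi.quad(pdf, 1, 2)   # quad returns (value, error estimate)
P_cdf = stats.uniform.cdf(2, loc=0, scale=4) - stats.uniform.cdf(1, loc=0, scale=4)
print(P_quad, P_cdf)
```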

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P2c is a number between 0 and 1. (1 mark)
assert P2c > 0
assert P2c < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P2c is the correct value. This cell contains hidden tests. (1 mark)
assert abs(P2c - 0.14) < 0.01, "P2c rounded to 2 decimal places is 0.14."

## Problem 3: Sampling

### Part 3a (2 marks)

Let $X$ be a uniform random variable with range $[0,10]$. Generate $4000$ random samples of $X$. Save the result as a NumPy array called `x3a`.
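One possible way to draw uniform samples is NumPy's random generator. The sketch below uses a hypothetical range $[2,7]$, a hypothetical sample size, and a fixed seed so the output is reproducible; other approaches such as `scipy.stats.uniform.rvs` should work equally well:

```python
import numpy as np

# Hypothetical example: 1000 samples of U(2,7)
rng = np.random.default_rng(0)            # seeded only for reproducibility
samples = rng.uniform(2, 7, size=1000)    # arguments are low, high, size

print(samples.size, samples.mean())       # sample mean should be near 4.5
```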

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify x3a is a NumPy array with 4000 entries. (1 mark)
assert isinstance(x3a,np.ndarray) , "x3a should be a NumPy array."
assert x3a.size == 4000 , "x3a should have 4000 entries."
print("Test 1: Success!")

In [None]:
# Test 2: Verify x3a contains samples of U(0,10). This cell contains hidden tests. (1 mark)
assert abs(np.mean(x3a) - 5) < 0.5 , "Sample mean is close to 5."
assert abs(np.var(x3a,ddof=1) - 100/12) < 0.5 , "Unbiased sample variance is near 100/12"

In [None]:
### Plot histogram with the probability density function.
plt.hist(x3a,density=True,bins=50)
plt.plot([0,10],[1/10,1/10],'r')
plt.show()

### Part 3b (2 marks)

Let $X$ be a normal random variable with mean $\mu = 1$ and variance $\sigma^2 = 6$. Generate $2000$ random samples of $X$. Save the result as a NumPy array called `x3b`.
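As with the density, the normal sampler takes the standard deviation, so the variance must be converted first. A minimal sketch for a hypothetical $N(0,4)$ with a made-up sample size:

```python
import numpy as np

# Hypothetical example: 1000 samples of N(0,4)
rng = np.random.default_rng(0)                      # seeded only for reproducibility
mu, var = 0, 4
z = rng.normal(loc=mu, scale=np.sqrt(var), size=1000)

# Sample mean and unbiased sample variance should be near 0 and 4
print(z.mean(), z.var(ddof=1))
```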

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify x3b is a NumPy array with 2000 entries. (1 mark)
assert isinstance(x3b,np.ndarray) , "x3b should be a NumPy array."
assert x3b.size == 2000 , "x3b should have 2000 entries."
print("Test 1: Success!")

In [None]:
# Test 2: Verify x3b contains samples of N(1,6). This cell contains hidden tests. (1 mark)
assert abs(np.mean(x3b) - 1) < 0.1 , "Sample mean is close to 1."
assert abs(np.var(x3b,ddof=1) - 6) < 0.5 , "Unbiased sample variance is near 6"

In [None]:
### Plot histogram with the probability density function.
plt.hist(x3b,density=True,bins=50)
x = np.linspace(-8,10,200)
y = stats.norm.pdf(x,loc=1,scale=6**0.5)
plt.plot(x,y)
plt.show()

### Part 3c (2 marks)

Suppose $Y = X_1 + X_2 X_3$, where $X_1$ is a standard normal random variable, $X_2$ is an exponential random variable with mean $\mu=5$, and $X_3$ is a normal random variable with mean $\mu=4$ and variance $\sigma^2=10$. Assume $X_1$, $X_2$ and $X_3$ are independent.

Generate $5000$ samples of $Y$ and save the result as `x3c`.
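Samples of a combination of independent random variables can be generated by sampling each variable separately and combining the arrays elementwise. The sketch below uses a different, hypothetical combination $Z = W_1 W_2 + W_3$ (all names and parameters are assumptions). Note that NumPy's exponential sampler takes `scale`, which is the mean $1/\lambda$:

```python
import numpy as np

# Hypothetical example: Z = W1*W2 + W3 with independent
# W1 ~ Exp(mean 2), W2 ~ N(3,1), W3 ~ N(0,1)
rng = np.random.default_rng(0)
N = 5000
W1 = rng.exponential(scale=2, size=N)   # scale is the mean, 1/lambda
W2 = rng.normal(3, 1, size=N)
W3 = rng.normal(0, 1, size=N)

Z = W1 * W2 + W3                        # elementwise operations on arrays
print(Z.mean())                         # near E[W1]E[W2] + E[W3] = 6
```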

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify x3c is a NumPy array with 5000 entries. (1 mark)
assert isinstance(x3c,np.ndarray) , "x3c should be a NumPy array."
assert x3c.size == 5000 , "x3c should have 5000 entries."
print("Test 1: Success!")

In [None]:
# Test 2: Verify x3c contains samples of Y. This cell contains hidden tests. (1 mark)
assert abs(np.mean(x3c) - 20) < 0.75 , "Sample mean is close to 20."

In [None]:
### Plot histogram
plt.figure(figsize=(6,3))
plt.hist(x3c,density=True,bins=50), plt.grid(True)
plt.show()

## Problem 4: Kernel Density Estimation

Run the cell below to import the data in the file `data2.txt` and plot the histogram:

In [None]:
x4 = np.loadtxt('data2.txt')
plt.hist(x4,bins=50), plt.grid(True)
plt.show()

### Part 4a (2 marks)

Use the function `scipy.stats.gaussian_kde` to approximate the probability density function of the distribution of the data in `x4`. Use the parameter value `bw_method=0.5`. Save the result as `kde4a`.
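For reference, `scipy.stats.gaussian_kde` returns a callable object that evaluates the estimated density, and a scalar `bw_method` is used as the bandwidth factor. The sketch below runs on synthetic data rather than `x4`; the data, the value `0.3` and the name `kde_example` are assumptions for illustration:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: 500 standard normal samples
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 500)

kde_example = stats.gaussian_kde(data, bw_method=0.3)

print(kde_example(0.0))                      # returns an array, even for one point
print(kde_example(np.linspace(-3, 3, 5)))    # evaluate on a grid of points
print(kde_example.factor)                    # the bandwidth factor in use
```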

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify kde4a is a function with positive values. (1 mark)
assert callable(kde4a)
assert kde4a(0) >= 0
assert kde4a(1) >= 0
assert kde4a(10) >= 0
print("Test 1: Success!")

In [None]:
# Test 2: Verify kde4a returns the correct values. This cell contains hidden tests. (1 mark)
assert abs(kde4a(5) - 0.20776092) < 1e-8

In [None]:
# Plot kde with a histogram of the data
x = np.linspace(0,15,100)
plt.plot(x,kde4a(x),'r')
plt.hist(x4,bins=50,density=True,color='b',alpha=0.5), plt.grid(True)
plt.show()

The figure above should show that the kernel density estimate is over-smoothed, so the value `bw_method=0.5` is too large for this data. Let's try again with `bw_method=0.1`.

### Part 4b (2 marks)

Use the function `scipy.stats.gaussian_kde` to approximate the probability density function of the distribution of the data in `x4`. Use the parameter value `bw_method=0.1`. Save the result as `kde4b`.

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify kde4b is a function with positive values. (1 mark)
assert callable(kde4b)
assert kde4b(0) >= 0
assert kde4b(1) >= 0
assert kde4b(10) >= 0
print("Test 1: Success!")

In [None]:
# Test 2: Verify kde4b returns the correct values. This cell contains hidden tests. (1 mark)
assert abs(kde4b(5) - 0.34668468) < 1e-8

In [None]:
# Plot the kde with a histogram of the data
x = np.linspace(0,15,100)
plt.plot(x,kde4b(x),'r')
plt.hist(x4,bins=50,density=True,color='b',alpha=0.5), plt.grid(True)
plt.show()

### Part 4c (2 marks)

The distribution is bimodal. Approximate the `x` values of the two local maximum values of the kernel density function `kde4b` computed with `bw_method=0.1`. Save the smaller value as `x4c1` and the larger value as `x4c2`.
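One simple way to approximate the locations of two local maxima is to evaluate the KDE on a fine grid, split the grid at a point between the modes, and take `np.argmax` on each side. The sketch below uses synthetic bimodal data; the data, the bandwidth and the split point (which would normally be read off a plot) are all assumptions:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical bimodal data: mixture of N(-2, 0.25) and N(3, 0.25)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 0.5, 500)])
kde = stats.gaussian_kde(data, bw_method=0.2)

# Evaluate the KDE on a fine grid
x = np.linspace(-6, 8, 1401)
y = kde(x)

# Assumed split point between the two modes (read off a plot)
split = 0.5
left = x < split
peak1 = x[left][np.argmax(y[left])]     # location of the left maximum
peak2 = x[~left][np.argmax(y[~left])]   # location of the right maximum
print(peak1, peak2)
```

The accuracy of this approach is limited by the grid spacing, so a finer grid gives a better approximation.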

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify x4c1 is the correct value. This cell contains hidden tests. (1 mark)
assert round(x4c1) == 5

In [None]:
# Test 2: Verify x4c2 is the correct value. This cell contains hidden tests. (1 mark)
assert round(x4c2) == 9

### Part 4d (2 marks)

The distribution is bimodal. Let $X$ be a random variable whose probability density function is given by the kernel density function `kde4b` computed with `bw_method=0.1`. Compute the probability $P(5 \leq X \leq 9)$. Save the result as `P4d`.
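A KDE can be integrated over an interval either with the method `integrate_box_1d` of the `gaussian_kde` object or numerically with `scipy.integrate.quad`. The sketch below compares the two on synthetic data with the default bandwidth; the data and the interval $[-1,1]$ are assumptions for illustration:

```python
import numpy as np
import scipy.stats as stats
import scipy.integrate as spi

# Hypothetical data: 2000 standard normal samples
rng = np.random.default_rng(0)
data = rng.normal(0, 1, 2000)
kde = stats.gaussian_kde(data)

a, b = -1, 1
P_box = kde.integrate_box_1d(a, b)                      # built-in integration
P_quad, abserr = spi.quad(lambda t: kde(t)[0], a, b)    # numerical quadrature
print(P_box, P_quad)
```

The lambda extracts the single entry `kde(t)[0]` because the KDE returns an array while `quad` expects a scalar-valued function.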

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P4d is a value between 0 and 1. (1 mark)
assert P4d > 0
assert P4d < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P4d is the correct value. This cell contains hidden tests. (1 mark)
assert abs(P4d - 0.5) < 0.05

## Problem 5: Modelling with Random Variables

Suppose you want to get a coffee between classes. The time it takes to get a coffee can be modelled with the following variables:

* the total walking time $X_1$ (in minutes) from class, to the coffee shop, and to your next class is a normal random variable with mean 2 and variance 0.5.
* the total waiting time $X_2$ (in minutes) to queue for coffee, order, and receive your drink is an exponential random variable with mean 5.

The mean value of the total walking plus waiting time is 7 minutes and the time between classes is 10 minutes. What is the probability that you are able to obtain a coffee between classes while arriving at the next class on time?

In other words, let $Y = X_1 + X_2$ be the total walking plus waiting time. We want to estimate $P(Y \leq 10)$.

Let's compute the probability in two ways: Monte Carlo and KDE.

### Part 5a (2 marks)

Compute $N=10000$ samples of $Y$ and count the number of values less than or equal to 10. Use this to compute the probability $P(Y \leq 10)$. Save the result as `P5a`.
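The Monte Carlo estimate of a probability is the fraction of samples that satisfy the condition, which is the mean of a boolean array. A minimal sketch for a hypothetical sum $S = W_1 + W_2$ with $W_1 \sim N(1, 0.25)$ and $W_2$ exponential with mean 2 (parameters and names are assumptions, not the values of this problem):

```python
import numpy as np

# Hypothetical example: estimate P(S <= 4) for S = W1 + W2
rng = np.random.default_rng(0)
N = 10000
W1 = rng.normal(1, np.sqrt(0.25), size=N)   # scale is the standard deviation
W2 = rng.exponential(scale=2, size=N)       # scale is the mean
S = W1 + W2

P_mc = np.mean(S <= 4)    # fraction of samples with S <= 4
print(P_mc)
```

Equivalently, `np.sum(S <= 4) / N` counts the samples and divides by the total.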

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P5a is a value between 0 and 1. (1 mark)
assert P5a > 0
assert P5a < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P5a is the correct value. This cell contains hidden tests. (1 mark)
assert round(P5a,1) == 0.8
print("Test 2: Success!")

### Part 5b (2 marks)

Compute $N=1000$ samples of $Y$ (note that this is a smaller number than part 5a) and use `scipy.stats.gaussian_kde` to compute the kernel density function. Plot the KDE with the histogram of the data to choose a good value of `bw_method`. Integrate the KDE to approximate the probability $P(Y \leq 10)$. Save the result as `P5b`.
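To turn a KDE into a probability of the form $P(S \leq c)$, the density can be integrated from $-\infty$ to $c$. The sketch below does this with `integrate_box_1d` for a hypothetical sum $S = W_1 + W_2$ with $W_1 \sim N(1, 0.25)$ and $W_2$ exponential with mean 2. It uses the default bandwidth only for brevity; in practice the bandwidth should be chosen by comparing the KDE with a histogram:

```python
import numpy as np
import scipy.stats as stats

# Hypothetical example: estimate P(S <= 4) from a KDE of 1000 samples
rng = np.random.default_rng(0)
N = 1000
S = rng.normal(1, np.sqrt(0.25), size=N) + rng.exponential(scale=2, size=N)

kde = stats.gaussian_kde(S)                  # default bandwidth (an assumption)
P_kde = kde.integrate_box_1d(-np.inf, 4)     # integral of the KDE up to 4
print(P_kde)
```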

In [None]:
# YOUR CODE HERE

In [None]:
# Test 1: Verify P5b is a value between 0 and 1. (1 mark)
assert P5b > 0
assert P5b < 1
print("Test 1: Success!")

In [None]:
# Test 2: Verify P5b is the correct value. This cell contains hidden tests. (1 mark)
assert round(P5b,1) == 0.8
print("Test 2: Success!")