# Assignment 04 Due: Thursday 9/28

## Problem 1

For this assignment we will look at the probability density function (PDF) of the normal distribution and how we can use it to calculate probabilities. Import the numpy, scipy.stats, scipy.integrate, and matplotlib.pyplot libraries.

In [1]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy import integrate
%matplotlib notebook

## Problem 2

Plot the PDF of a normal distribution that has a mean of 10 and a sigma of 0.5 over the range 5 to 15. Use the scipy function
> scipy.stats.norm.pdf(x, loc, scale)

to evaluate the PDF. Here x should be a numpy array of 1000 points from 5 to 15. The parameters loc and scale are the mean and sigma of the normal distribution. Your plot should look like what is shown below.

<div>
<img src="attachment:P2.png" width="400"/>
</div>

In [2]:
xs = np.linspace(5,15,1000)
pdfs = st.norm.pdf(xs, 10, 0.5)
plt.plot(xs,pdfs)
plt.title("Normal Distribution")

Text(0.5, 1.0, 'Normal Distribution')

## Problem 3

We can calculate the probability that our value lies in a particular range. For the normal distribution, the width of the range around the mean, measured in sigmas, is directly related to the probability of finding our value in that range.

For example, if our value x lies in the range $\mu - \sigma \le x \le \mu +\sigma$, where $\mu$ is the mean value of the distribution (10 in this assignment), then the probability of our value x lying in that range is about $68.3\%$.

The probability that x lies in the range $\mu - 2\sigma \le x \le \mu + 2\sigma$ is about $95.4\%$. As our range becomes wider, the probability that it contains our value x increases. This can be seen from the plot below (taken from [wikipedia](https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Standard_deviation_diagram_micro.svg)).

![P4.png](attachment:P4.png)

We will first use the cumulative distribution function (CDF) from scipy to verify these probabilities.

Use
>scipy.stats.norm.cdf(x,loc,scale)

where the inputs are:
* x is the upper limit; the cdf returns the probability of all values up to and including x
* loc is the mean value of the normal distribution
* scale is the sigma of the normal distribution
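
The reason a difference of two CDF values gives the probability for a range can be written out explicitly. The notation here is an assumption introduced for illustration: $f$ stands for the PDF and $F$ for the CDF of the normal distribution.

$$P(a < x \le b) = F(b) - F(a), \qquad F(x) = \int_{-\infty}^{x} f(t)\,dt$$

With $a = \mu - k\sigma$ and $b = \mu + k\sigma$, this gives the probability of lying within $k$ sigmas of the mean.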

Use the scipy cdf function to evaluate the probability in the following ranges:
* mean $-\sigma < x <$ mean $+ \sigma$
* mean $-2\sigma < x <$ mean $+ 2\sigma$
* mean $-3\sigma < x <$ mean $+ 3\sigma$

In [3]:
print("Probability of being between mean-sigma and mean+sigma")
st.norm.cdf(10.5,10,0.5) - st.norm.cdf(9.5,10,0.5)

Probability of being between mean-sigma and mean+sigma


0.6826894921370859

In [4]:
print("Probability of being between mean- 2sigma and mean+ 2sigma")
st.norm.cdf(11,10,0.5) - st.norm.cdf(9,10,0.5)

Probability of being between mean- 2sigma and mean+ 2sigma


0.9544997361036416

In [5]:
print("Probability of being between mean - 3sigma and mean + 3sigma")
st.norm.cdf(11.5,10,0.5) - st.norm.cdf(8.5,10,0.5)

Probability of being between mean - 3sigma and mean + 3sigma


0.9973002039367398
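
The three cells above repeat the same calculation with different limits. As a minimal sketch, the calculation could be wrapped in a small helper and looped over the number of sigmas. The helper name `prob_within` and the dictionary `probs` are hypothetical names introduced here, not part of the assignment.

```python
import scipy.stats as st

mean, sigma = 10, 0.5  # values used throughout this assignment


def prob_within(k, mean, sigma):
    """Probability that x lies within k sigma of the mean (hypothetical helper)."""
    upper = st.norm.cdf(mean + k * sigma, mean, sigma)
    lower = st.norm.cdf(mean - k * sigma, mean, sigma)
    return upper - lower


# Evaluate the 1-, 2-, and 3-sigma ranges in one pass
probs = {k: prob_within(k, mean, sigma) for k in (1, 2, 3)}
for k, p in probs.items():
    print(f"within {k} sigma: {p:.4f}")
```

The printed values should match the three results above when rounded to four decimal places.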

## Problem 4

Define a function that takes an array x, and numbers a and b, which correspond to the mean and sigma of a normal distribution, and returns the evaluation of

$$\frac{1}{b\sqrt{2\pi}}\exp\left[{-\frac{1}{2}\left(\frac{(x-a)}{b}\right)^2}\right] $$

In [6]:
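def normal_pdf(x,a,b):
    """Normal PDF with mean a and sigma b, evaluated at x."""
    pdf = np.exp(-(x-a)**2 / (2*b*b))    # exponential term
    pdf = pdf / (np.sqrt(2*np.pi) * b)   # normalization factor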
    return pdf
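
A quick way to gain confidence in a function like this is to check a few properties that follow directly from the formula. The sketch below repeats the definition so that it runs on its own; the variable names `peak`, `expected_peak`, `left`, and `right` are assumptions introduced for illustration.

```python
import numpy as np


def normal_pdf(x, a, b):
    """Normal PDF with mean a and sigma b, evaluated at x."""
    pdf = np.exp(-(x - a)**2 / (2 * b * b))
    pdf = pdf / (np.sqrt(2 * np.pi) * b)
    return pdf


# At x = a the exponential is 1, so the value is the normalization factor
peak = normal_pdf(10, 10, 0.5)
expected_peak = 1 / (0.5 * np.sqrt(2 * np.pi))
print(np.isclose(peak, expected_peak))

# The PDF is symmetric about the mean
left = normal_pdf(9.5, 10, 0.5)
right = normal_pdf(10.5, 10, 0.5)
print(np.isclose(left, right))
```

Both checks should print True if the function implements the formula above.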
    

## Problem 5

Make a numpy array called xx that contains 1000 points over the range 5 to 15. Evaluate your Gaussian function from the previous problem on xx with mean = 10 and sigma = 0.5, the same values as in Problem 2, and call the resulting array yy.

Then plot yy vs. xx as markers, and compare it to the PDF curve you found in Problem 2 by plotting both on the same canvas. See below for what your plot should look like.

<div>
<img src="attachment:P5.png" width="400"/>
</div>


In [7]:
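xx = np.linspace(5,15,1000)            # 1000 points, as the problem specifies
yy = st.norm.pdf(xx, 10, 0.5)          # scipy PDF, as in Problem 2
mypdfs = normal_pdf(xx, 10, 0.5)       # Gaussian function from Problem 4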

In [8]:
# Plot the Gaussian function as markers and the scipy PDF as a line
fig = plt.figure('Normal PDF')
ax = fig.add_axes([0.1,0.1,0.8,0.8])
ax.set_title('Normal Distribution')
ax.plot(xx, mypdfs, linestyle=' ', marker='o', label='Gauss Function')
ax.plot(xx, yy, linestyle='-', label='PDF From Problem 2')
ax.set_xlabel('x')
ax.set_xlim(5, 15)
ax.legend();
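
The plot gives a visual comparison; the agreement can also be checked numerically. The following is a minimal sketch that repeats the function definition so it runs on its own. The names `max_diff` and `agree` are hypothetical.

```python
import numpy as np
import scipy.stats as st


def normal_pdf(x, a, b):
    """Normal PDF with mean a and sigma b, evaluated at x."""
    pdf = np.exp(-(x - a)**2 / (2 * b * b))
    pdf = pdf / (np.sqrt(2 * np.pi) * b)
    return pdf


xx = np.linspace(5, 15, 1000)
ours = normal_pdf(xx, 10, 0.5)
scipys = st.norm.pdf(xx, 10, 0.5)

# Largest pointwise difference between the two curves
max_diff = np.max(np.abs(ours - scipys))
# True when the arrays agree within numpy's default tolerances
agree = np.allclose(ours, scipys)
print(agree, max_diff)
```

Any difference should be at the level of floating-point round-off.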

## Problem 6

Because the CDF is just the integral of the PDF, we should be able to integrate our Gaussian function over a particular range and obtain the same probabilities we found in Problem 3.

Use
>scipy.integrate.quad(function, a, b, args=(mean, sigma))

where function is our Gaussian function, a is the lower limit of the integral, b is the upper limit of the integral, and args holds the remaining arguments taken by our Gaussian function. In our case these are mean = 10 and sigma = 0.5. This returns a tuple with two elements. The first is the value of the integral (i.e. our probability) and the second is an estimate of the absolute error of the integration. Remember that numerical integrations are approximations, so there is an error associated with the calculation.

Using scipy.integrate.quad, integrate your Gaussian function over the three ranges specified in Problem 3.

**Do you end up with the same probabilities?**

In [9]:
print("Probability of being between mean-sigma and mean+sigma")
integrate.quad(normal_pdf, 9.5, 10.5, args = (10,0.5))[0]

Probability of being between mean-sigma and mean+sigma


0.682689492137086

In [10]:
print("Probability of being between mean - 2sigma and mean + 2sigma")
integrate.quad(normal_pdf, 9, 11, args = (10,0.5))[0]

Probability of being between mean - 2sigma and mean + 2sigma


0.9544997361036414

In [11]:
print("Probability of being between mean - 3sigma and mean + 3sigma")
integrate.quad(normal_pdf, 8.5, 11.5, args = (10,0.5))[0]

Probability of being between mean - 3sigma and mean + 3sigma


0.9973002039367399

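Yes. The integrals agree with the CDF probabilities from Problem 3 to about 15 significant digits; the remaining differences are at the level of floating-point round-off.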
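
The comparison can also be made in a single loop that keeps the error estimate returned by quad next to the CDF result. This is a sketch under the assignment's values (mean = 10, sigma = 0.5); the list `results` and the loop variable names are hypothetical.

```python
import numpy as np
import scipy.stats as st
from scipy import integrate


def normal_pdf(x, a, b):
    """Normal PDF with mean a and sigma b, evaluated at x."""
    pdf = np.exp(-(x - a)**2 / (2 * b * b))
    pdf = pdf / (np.sqrt(2 * np.pi) * b)
    return pdf


mean, sigma = 10, 0.5
results = []
for k in (1, 2, 3):
    lo, hi = mean - k * sigma, mean + k * sigma
    # quad returns (integral value, estimate of the absolute error)
    value, abserr = integrate.quad(normal_pdf, lo, hi, args=(mean, sigma))
    from_cdf = st.norm.cdf(hi, mean, sigma) - st.norm.cdf(lo, mean, sigma)
    results.append((k, value, abserr, from_cdf))
    print(f"{k} sigma: quad = {value:.6f}, cdf = {from_cdf:.6f}, "
          f"quad error estimate = {abserr:.1e}")
```

If the two methods agree, the quad and cdf columns should be identical at the printed precision.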