# Assignment 04 Due: Thursday 9/28

# Problem 1

For this assignment we will look at the probability density function (PDF) of the normal distribution and how we can use it to calculate probabilities. Import the numpy, scipy.stats, scipy.integrate, and matplotlib.pyplot libraries.

In [1]:
import numpy as np
import scipy.stats as st
import scipy.integrate
import matplotlib.pyplot as plt
%matplotlib notebook

# Problem 2

Plot the PDF of a normal distribution that has a mean of 10 and a sigma of 0.5 over the range 5 to 15. Use the scipy function
> scipy.stats.norm.pdf(x, loc, scale)

to evaluate the PDF. Here x should be a numpy array of 1000 points from 5 to 15. The parameters loc and scale are the mean and sigma of the normal distribution. Your plot should look like what is shown below.

<div>
<img src="attachment:P2.png" width="400"/>
</div>
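
As a quick sanity check on these inputs, the sketch below builds the grid with np.linspace, which is one way (an assumption here, not a requirement of the assignment) to get exactly 1000 points including both endpoints. It then compares the largest PDF value on the grid with the analytic peak height $1/(\sigma\sqrt{2\pi})$. The names x_demo, pdf_demo, and peak_expected are illustrative only.

```python
import numpy as np
import scipy.stats as st

# Hypothetical demo grid: 1000 evenly spaced points, endpoints included
x_demo = np.linspace(5, 15, 1000)

# loc is the mean and scale is the sigma of the normal distribution
pdf_demo = st.norm.pdf(x_demo, loc=10, scale=0.5)

# Analytic height of the peak at x = mean
peak_expected = 1 / (0.5 * np.sqrt(2 * np.pi))

print("number of points:", x_demo.size)
print("first and last point:", x_demo[0], x_demo[-1])
print("largest PDF value on the grid:", pdf_demo.max())
print("analytic peak height:", peak_expected)
```

The largest value on the grid is slightly below the analytic peak because no grid point falls exactly on the mean.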

In [2]:
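mean = 10
sigma = 0.5

# 1000 evenly spaced points from 5 to 15 (both endpoints included)
x = np.linspace(5, 15, 1000)

# Normal PDF evaluated at each x (loc = mean, scale = sigma)
pdf = st.norm.pdf(x, mean, sigma)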


In [3]:
fig_1 = plt.plot(x, pdf, color='blue')
plt.xlabel('X')
plt.ylabel('Probability Density')
plt.title('Normal Distribution PDF')

plt.show()


# Problem 3

We can calculate the probability that our value lies in a particular range. For the normal distribution, the width of the range, measured in sigmas from the mean, directly determines the probability of finding our value in that range.

For example, if the range is $\mu - \sigma \le x \le \mu +\sigma$, where $\mu$ is the mean of the distribution (10 in this assignment), then the probability that our value x lies in that range is about $68.3\%$.

The probability that x lies in the range $\mu - 2\sigma \le x \le \mu + 2\sigma$ is about $95.4\%$. As the range becomes wider, the probability that it contains our value x increases. This can be seen from the plot below (taken from [Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Standard_deviation_diagram_micro.svg)).

![P4.png](attachment:P4.png)
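
The relation between the number of sigmas and the probability can be written in closed form: for any normal distribution, the probability of lying within k sigmas of the mean is $\mathrm{erf}(k/\sqrt{2})$, independent of the mean and sigma. The sketch below evaluates this with scipy.special.erf. The helper name prob_within_k_sigma is made up for this illustration.

```python
import numpy as np
from scipy.special import erf

def prob_within_k_sigma(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return erf(k / np.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within_k_sigma(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```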

We will first use the cumulative distribution function (CDF) from scipy to verify these probabilities.

The function is
>scipy.stats.norm.cdf(x,loc,scale)

where the inputs are:
* x is the upper limit: the cdf returns the probability of all values up to and including x
* loc is the mean value of the normal distribution
* scale is the sigma of the normal distribution
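
To see how these inputs behave, here is a minimal sketch using an interval chosen only for illustration (9.0 to 10.5, which is not one of the assigned ranges). Half of the probability lies below the mean, and the probability of an interval is the difference of two CDF values. The variable names are assumptions for this example.

```python
import scipy.stats as st

mu_demo, sigma_demo = 10, 0.5

# Probability of all values up to and including the mean
p_below_mean = st.norm.cdf(mu_demo, mu_demo, sigma_demo)

# Probability of an interval: CDF at the upper edge minus CDF at the lower edge
p_interval = (st.norm.cdf(10.5, mu_demo, sigma_demo)
              - st.norm.cdf(9.0, mu_demo, sigma_demo))

print("P(x <= mean):", p_below_mean)
print("P(9.0 < x < 10.5):", p_interval)
```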

Use the scipy cdf function to evaluate the probability in the following ranges:
* $\mu - \sigma < x < \mu + \sigma$
* $\mu - 2\sigma < x < \mu + 2\sigma$
* $\mu - 3\sigma < x < \mu + 3\sigma$

In [4]:
range_1 = st.norm.cdf(mean + sigma, mean, sigma) - st.norm.cdf(mean - sigma, mean, sigma)
range_2 = st.norm.cdf(mean + 2*sigma, mean, sigma) - st.norm.cdf(mean - 2*sigma, mean, sigma)
range_3 = st.norm.cdf(mean + 3*sigma, mean, sigma) - st.norm.cdf(mean - 3*sigma, mean, sigma)


print("Probability in range 1:", range_1)
print("Probability in range 2:", range_2)
print("Probability in range 3:", range_3)

Probability in range 1: 0.6826894921370859
Probability in range 2: 0.9544997361036416
Probability in range 3: 0.9973002039367398


# Problem 4

Define a function that takes an array x, and numbers a and b, which correspond to the mean and sigma of a normal distribution, and returns the evaluation of

$$\frac{1}{b\sqrt{2\pi}}\exp\left[{-\frac{1}{2}\left(\frac{(x-a)}{b}\right)^2}\right] $$

In [5]:
def evaluate_normal_distribution(x, a, b):
    """Return the normal PDF with mean a and sigma b, evaluated at x."""
    result = (1 / (b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - a) / b)**2)
    return result
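
One way to check a hand-written formula like this is to compare it against scipy.stats.norm.pdf on the same grid. The sketch below redefines the formula under the illustrative name gauss_demo so that it runs on its own. The grid and parameter values are assumptions taken from Problem 2.

```python
import numpy as np
import scipy.stats as st

def gauss_demo(x, a, b):
    """Normal PDF with mean a and sigma b (illustrative re-definition)."""
    return (1 / (b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - a) / b) ** 2)

x_check = np.linspace(5, 15, 1000)

# True if the hand-written formula matches scipy to floating-point tolerance
agrees = np.allclose(gauss_demo(x_check, 10, 0.5),
                     st.norm.pdf(x_check, loc=10, scale=0.5))
print("matches scipy.stats.norm.pdf:", agrees)
```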



    

# Problem 5

Make a numpy array called xx that contains 1000 points over the range 5 to 15. Evaluate your Gaussian function from the previous problem with the arguments xx, mean, and sigma, where mean = 10 and sigma = 0.5, the same as in Problem 2. Call the resulting array yy.

Then plot yy vs. xx as markers, and compare it to the PDF curve you found in Problem 2 by plotting both on the same canvas. See below for what your plot should look like.

<div>
<img src="attachment:P5.png" width="400"/>
</div>


In [6]:
# 1000 points over the range 5 to 15, as the problem asks
xx = np.linspace(5, 15, 1000)
yy = evaluate_normal_distribution(xx, 10, 0.5)


In [7]:
fig = plt.figure('Problem 5 - Normal Distribution')
ax = fig.add_axes([0.1,0.1,0.8,0.8])
ax.plot(xx,yy,linestyle = '', marker = 'o',markersize=4, label = 'Gauss Function')
ax.plot(x,pdf,'r', label='PDF from Problem 2')

ax.legend()
ax.set_xlabel('x');


# Problem 6

Because the CDF is just the integral of the PDF, we should be able to integrate our Gaussian function over a particular range and obtain the same probabilities we found in Problem 3.

Use
>scipy.integrate.quad(function, a, b, args=(mean, sigma))

where function is our Gaussian function, a is the lower limit of the integral, b is the upper limit of the integral, and args are the extra arguments taken by our Gaussian function. In our case these are mean = 10 and sigma = 0.5. This returns a tuple with two elements. The first is the value of the integral (i.e. our probability) and the second is an estimate of the absolute error of the integration. Remember that numerical integrations are approximations, so there is an error associated with the calculation.

Using scipy.integrate.quad, integrate your Gaussian function over the three ranges specified in Problem 3.
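
As a minimal sketch of how the two returned values can be unpacked, the example below integrates over a one-sided range (from the mean to one sigma above it) that is not one of the assigned ranges. The function name gauss_demo and the variable names are assumptions made for this illustration.

```python
import numpy as np
import scipy.integrate

def gauss_demo(x, a, b):
    """Normal PDF with mean a and sigma b (illustrative re-definition)."""
    return (1 / (b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - a) / b) ** 2)

# quad passes the integration variable as the first argument and
# the entries of args as the remaining arguments (here a=10, b=0.5)
value, abs_err = scipy.integrate.quad(gauss_demo, 10.0, 10.5, args=(10.0, 0.5))

print(f"integral = {value:.6f}")
print(f"estimated absolute error = {abs_err:.1e}")
```

Unpacking into two names avoids indexing the result later, and keeps the error estimate available for comparison.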

**Do you end up with the same probabilities?**

In [8]:
import scipy.integrate

integration = scipy.integrate.quad(evaluate_normal_distribution, 5, 15, args=(10, 0.5))
print(integration)

(1.0000000000000002, 8.67103130711643e-10)


In [9]:
#Range 1

integration1 = scipy.integrate.quad(evaluate_normal_distribution, mean-sigma, mean+sigma, args=(10, 0.5))

print(integration1)

(0.682689492137086, 7.579375928402476e-15)


In [10]:
#Range 2

integration2 = scipy.integrate.quad(evaluate_normal_distribution, mean - 2*sigma, mean + 2*sigma, args=(10, 0.5))

print(integration2)

(0.9544997361036412, 1.840356045641616e-11)


In [11]:
#Range 3

integration3 = scipy.integrate.quad(evaluate_normal_distribution, mean - 3*sigma, mean + 3*sigma, args=(10, 0.5))

print(integration3)

(0.9973002039367399, 1.1072256503105314e-14)


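Yes, the integrals match the probabilities from Problem 3 to within the numerical error. It also makes sense that integrating over the full plotted range, 5 to 15, gives a probability of 1: that range spans 10 sigma on either side of the mean, so essentially all of the probability lies inside it.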
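
The comparison can also be done programmatically instead of by eye. The sketch below, which is one possible check rather than part of the required solution, loops over the three ranges and tests the quad result against the CDF difference with np.isclose. The names gauss_demo, mu, sig, and results are illustrative.

```python
import numpy as np
import scipy.integrate
import scipy.stats as st

mu, sig = 10.0, 0.5

def gauss_demo(x, a, b):
    """Normal PDF with mean a and sigma b (illustrative re-definition)."""
    return (1 / (b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - a) / b) ** 2)

results = []
for k in (1, 2, 3):
    lower, upper = mu - k * sig, mu + k * sig
    # Probability from numerical integration of the PDF
    from_quad, err = scipy.integrate.quad(gauss_demo, lower, upper, args=(mu, sig))
    # Probability from the difference of two CDF values
    from_cdf = st.norm.cdf(upper, mu, sig) - st.norm.cdf(lower, mu, sig)
    results.append(np.isclose(from_quad, from_cdf))
    print(f"{k} sigma: quad = {from_quad:.6f}, cdf = {from_cdf:.6f}")

print("all ranges agree:", all(results))
```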