# CPS600 - Python Programming for Finance 
<img src="https://www.syracuse.edu/wp-content/themes/g6-carbon/img/syracuse-university-seal.svg?ver=6.3.9" style="width: 200px;"/>

# Probability & Distributions

###  October 11, 2018

Today we'll talk some more about probability. We'll do some exercises using the chalkboard and Python together. We need to understand some concepts so that we can complete the homework.

First, let's remember our motivation:

In [15]:
#Import for your convenience.
import pandas as pd

# Imports for stats
import numpy as np
import scipy.special

# Imports for plotting
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_notebook

# Outputting in the notebook.
output_notebook()

# First of four figures
p1 = figure(title="Normal Distribution (μ=0, σ=0.5)",tools="save",
            background_fill_color="#E8DDCB")

#Parameters for the normal distribution
mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2

p1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p1.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
p1.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF")

p1.legend.location = "center_right"
p1.legend.background_fill_color = "darkgrey"
p1.xaxis.axis_label = 'x'
p1.yaxis.axis_label = 'Pr(x)'


#Second of four figures
p2 = figure(title="Log Normal Distribution (μ=0, σ=0.5)", tools="save",
            background_fill_color="#E8DDCB")

#Parameters
mu, sigma = 0, 0.5

measured = np.random.lognormal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8.0, 1000)
pdf = 1/(x* sigma * np.sqrt(2*np.pi)) * np.exp(-(np.log(x)-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((np.log(x)-mu)/(np.sqrt(2)*sigma)))/2

p2.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p2.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
p2.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF")

p2.legend.location = "center_right"
p2.legend.background_fill_color = "darkgrey"
p2.xaxis.axis_label = 'x'
p2.yaxis.axis_label = 'Pr(x)'


# Third
p3 = figure(title="Gamma Distribution (k=1, θ=2)", tools="save",
            background_fill_color="#E8DDCB")

# Parameters
k, theta = 1.0, 2.0
#k, theta = 0.5, 1.0

measured = np.random.gamma(k, theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 20.0, 1000)
pdf = x**(k-1) * np.exp(-x/theta) / (theta**k * scipy.special.gamma(k))
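# scipy.special.gammainc is already the regularized lower incomplete gamma
# function, so it gives the CDF directly (no division by gamma(k)).
cdf = scipy.special.gammainc(k, x/theta)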

p3.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
p3.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
p3.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF")

p3.legend.location = "center_right"
p3.legend.background_fill_color = "darkgrey"
p3.xaxis.axis_label = 'x'
p3.yaxis.axis_label = 'Pr(x)'


# Fourth
p4 = figure(title="Weibull Distribution (λ=1, k=1.25)", tools="save",
            background_fill_color="#E8DDCB")

# Parameters
lam, k = 1, 1.25

measured = lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k)
hist, edges = np.histogram(measured, density=True, bins=50)

x = np.linspace(0.0001, 8, 1000)
pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k)
cdf = 1 - np.exp(-(x/lam)**k)

p4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="#036564", line_color="#033649")
p4.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
p4.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF")

p4.legend.location = "center_right"
p4.legend.background_fill_color = "darkgrey"
p4.xaxis.axis_label = 'x'
p4.yaxis.axis_label = 'Pr(x)'


# Pass the figures as a list; gridplot arranges them into ncols columns.
show(gridplot([p1, p2, p3, p4], ncols=2, plot_width=400, plot_height=400, toolbar_location=None))

In [5]:
# Getting the data
df = pd.read_csv('distroCSV.csv')


# Getting histogram
hist, edges = np.histogram(df.distro1.values, density=True, bins=50)

# Create figure
q1 = figure(title="distro1",tools="save",
            background_fill_color="#E8DDCB")

# Create glyphs ("quad"rilateral)
q1.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")

# Show the figure.
#show(q1)

hist, edges = np.histogram(df.distro2.values, density=True, bins=20)
q2 = figure(title="distro2",tools="save",
            background_fill_color="#E8DDCB")
q2.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
hist, edges = np.histogram(df.distro3.values, density=True, bins=100)
q3 = figure(title="distro3",tools="save",
            background_fill_color="#E8DDCB")
q3.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
hist, edges = np.histogram(df.distro4.values, density=True, bins=100)
q4 = figure(title="distro4",tools="save",
            background_fill_color="#E8DDCB")
q4.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
# Pass the figures as a list; gridplot arranges them into ncols columns.
show(gridplot([q1, q2, q3, q4], ncols=2, plot_width=400, plot_height=400, toolbar_location=None))

In [22]:
logDistro2 = np.log(df.distro2.values)
hist, edges = np.histogram(logDistro2, density=True, bins=50)
q5 = figure(title="log transform of distro2",tools="save",
            background_fill_color="#E8DDCB", plot_width=400, plot_height=300)
q5.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
        fill_color="#036564", line_color="#033649")
print(np.var(logDistro2), np.mean(logDistro2))
show(q5)

2.302939011904085 1.9722945520423396


*And finally the exercises*:

This is a very rich example. Let's look at each one of these distributions to understand what is going on.
1. [Normal](https://en.wikipedia.org/wiki/Normal_distribution)
2. [Log Normal](https://en.wikipedia.org/wiki/Log-normal_distribution)
3. [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution)
4. [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution)

**Exercise 1**

Work out the parameter estimates for the *Gamma* distribution.

***Answer:***

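Given a random sample with mean $\bar{x}$ and standard deviation $s$, we match the sample moments to the *Gamma* mean $\alpha\theta$ and variance $\alpha\theta^2$ (here $\alpha$ is the shape, called $k$ in the code above, and $\theta$ is the scale). Solving those two equations gives the method-of-moments estimates:
$$ \hat{\alpha} = \left(\frac{\bar{x}}{s}\right)^2 $$
$$ \hat{\theta} = \frac{s^2}{\bar{x}} $$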
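
As a quick check of these formulas, here is a minimal sketch that applies them to simulated data. The helper name `gamma_moment_estimates`, the seed, and the sample size are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def gamma_moment_estimates(sample):
    """Method-of-moments estimates (shape, scale) for a Gamma sample."""
    sample = np.asarray(sample, dtype=float)
    x_bar = sample.mean()
    s2 = sample.var(ddof=1)      # sample variance s^2
    shape_hat = x_bar**2 / s2    # (x_bar / s)^2
    scale_hat = s2 / x_bar       # s^2 / x_bar
    return shape_hat, scale_hat

# Simulate from the same Gamma(k=1, theta=2) used in the third figure.
rng = np.random.default_rng(0)   # seed chosen arbitrarily for repeatability
demo = rng.gamma(shape=1.0, scale=2.0, size=10_000)
k_hat, theta_hat = gamma_moment_estimates(demo)
print(k_hat, theta_hat)
```

With a sample this large, the two estimates should land close to the true values of 1 and 2.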

**Exercise 2**

Work out the parameter estimates for the *Weibull* distribution.

***Answer:***
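
The Weibull mean $\lambda\,\Gamma(1 + 1/k)$ and variance $\lambda^2\left[\Gamma(1 + 2/k) - \Gamma(1 + 1/k)^2\right]$ do not invert in closed form, so one possible approach (an assumption, not necessarily the one worked in class) is to solve for $k$ numerically from the ratio of variance to squared mean, then recover $\lambda$ from the mean. The helper name and the bracket $[0.1, 50]$ for $k$ are illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln

def weibull_moment_estimates(sample):
    """Numerical method-of-moments estimates (lam, k) for a Weibull sample."""
    sample = np.asarray(sample, dtype=float)
    x_bar = sample.mean()
    cv2 = sample.var(ddof=1) / x_bar**2   # squared coefficient of variation

    def cv2_gap(k):
        # Model value Gamma(1 + 2/k) / Gamma(1 + 1/k)^2 - 1 minus the sample value.
        return np.exp(gammaln(1 + 2 / k) - 2 * gammaln(1 + 1 / k)) - 1 - cv2

    # Assumed bracket: the root finder needs the true k to lie inside it.
    k_hat = brentq(cv2_gap, 0.1, 50.0)
    lam_hat = x_bar / np.exp(gammaln(1 + 1 / k_hat))
    return lam_hat, k_hat

# Simulate from the Weibull(lam=1, k=1.25) used in the fourth figure.
rng = np.random.default_rng(0)            # seed chosen arbitrarily
demo = 1.0 * rng.weibull(1.25, size=10_000)
lam_hat, k_hat = weibull_moment_estimates(demo)
print(lam_hat, k_hat)
```

The squared coefficient of variation depends only on $k$, which is why the two-parameter problem reduces to a one-dimensional root search.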



**Exercise 3**

Load the CSV file provided. Plot a histogram for each column. Guess which of the four families we discussed generated each column (you don't get to do this IRL). Using your guess and your parameter formulas, plot the appropriate PDF and CDF functions for each column.

**Total probability**

The simplest expression of the law of total probability is

$$ 1 = P(A) + P(A^c). $$

For more useful statements, see below.

*Example* Suppose that you toss a fair coin 5 times. What is the probability that you will get at least one heads? 
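
A short sketch of this example: "at least one heads" is the complement of "all tails", so the complement rule gives the answer directly, and a brute-force enumeration of the 32 equally likely sequences confirms it. The variable names are illustrative.

```python
from itertools import product

n_tosses = 5

# Complement rule: P(at least one heads) = 1 - P(all tails).
p_complement = 1 - 0.5**n_tosses

# Brute force: list every sequence of H/T and count those containing an H.
outcomes = list(product("HT", repeat=n_tosses))
p_enumerated = sum("H" in o for o in outcomes) / len(outcomes)

print(p_complement, p_enumerated)
```

Both numbers equal $31/32$.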

**Joint Probability**

If $A$ and $B$ are events, then the joint probability $P(A\cap B)$ is the probability of the event $A \cap B$.

*Example* If we roll two dice, what is the probability that both come up $3$? Harder: what is the probability that both come up the same?
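
Both questions can be checked by listing the 36 equally likely outcomes and counting. This is a minimal sketch; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event over the equally likely rolls."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p_both_three = prob(lambda r: r == (3, 3))   # one outcome out of 36
p_doubles = prob(lambda r: r[0] == r[1])     # six outcomes out of 36

print(p_both_three, p_doubles)
```

The second event is the union of six disjoint events like the first, which is why its probability is six times larger.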

**Independent Events**

We say that events $A$ and $B$ are *independent* if $P(A\cap B) = P(A) \cdot P(B)$.

*Example* If we roll one die or toss one coin, and then roll or toss another, then the outcomes are *independent*.

*Non-example* The *sum* of the rolls of two dice is not independent of either of the two individual rolls.
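
The product rule can be tested directly on the two-dice sample space. In this sketch the events "first die is 3", "second die is 3", and "sum is 8" are illustrative choices. Note that the non-example is a statement about the sum as a whole: a few particular events, such as "sum is 7", happen to satisfy the product rule anyway.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event over the 36 equally likely rolls."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

first_is_3 = lambda r: r[0] == 3
second_is_3 = lambda r: r[1] == 3
sum_is_8 = lambda r: r[0] + r[1] == 8

# Two separate dice: the joint probability equals the product.
dice_independent = (
    prob(lambda r: first_is_3(r) and second_is_3(r))
    == prob(first_is_3) * prob(second_is_3)
)

# First die versus the sum: the product rule fails.
sum_independent = (
    prob(lambda r: first_is_3(r) and sum_is_8(r))
    == prob(first_is_3) * prob(sum_is_8)
)

print(dice_independent, sum_independent)  # True False
```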

**Conditional Probability**

The *conditional probability of $A$ given $B$* is given by

$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0 $$

*Example* What is the conditional probability of rolling a sum of $8$ or more on two dice given that the outcome of the first die is $3$?

What is the conditional probability of flipping a heads given that a tails has just been tossed?
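
For the dice question, the definition can be applied by counting outcomes: restrict to the rolls where the first die is $3$, then see how many of those have a sum of $8$ or more. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))

given = [r for r in rolls if r[0] == 3]      # event B: first die is 3
both = [r for r in given if sum(r) >= 8]     # event A and B together

p_b = Fraction(len(given), len(rolls))
p_a_and_b = Fraction(len(both), len(rolls))
p_a_given_b = p_a_and_b / p_b                # definition of P(A | B)

# Unconditional probability of a sum of 8 or more, for comparison.
p_a = Fraction(sum(1 for r in rolls if sum(r) >= 8), len(rolls))

print(p_a_given_b, p_a)
```

The conditional value $1/3$ differs from the unconditional $5/12$, so knowing the first die changes the odds for the sum.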

### Total probability... Please copy from blackboard
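
For reference, the standard two-event form of the law is sketched here (the blackboard version may use different notation). It conditions on whether a second event $B$ occurs:

$$ P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c) $$

More generally, if $B_1, \dots, B_n$ are mutually exclusive events that together cover the whole sample space, then

$$ P(A) = \sum_{i=1}^{n} P(A \mid B_i)P(B_i) $$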

**Bayes' Theorem**

We can also, of course, ask for the conditional probability of $B$ given $A$. *THIS IS EMPHATICALLY NOT THE SAME AS $P(A\mid B)$*. But look at what the definition gives us:

$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)} $$

In particular, we can calculate $P(A \cap B)$ in two different ways...

The simplest expression of Bayes' theorem is this:

$$ P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A)} $$


**Monty Hall**

Let's do the famous [*Monty Hall Problem*](https://en.wikipedia.org/wiki/Monty_Hall_problem).
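
Before working it out by hand, the game can be simulated. This sketch assumes the standard rules: three doors, one car, and a host who always opens a door that is neither the contestant's pick nor the car. The function name `play`, the seed, and the trial count are illustrative.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(0)   # seed chosen arbitrarily for repeatability
trials = 20_000
win_stay = sum(play(False, rng) for _ in range(trials)) / trials
win_switch = sum(play(True, rng) for _ in range(trials)) / trials
print(win_stay, win_switch)
```

The two win rates should come out near $1/3$ for staying and $2/3$ for switching.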

**Virus Screening**

Let's illustrate the difference between $P(A \mid B)$ and $P(B \mid A)$.
>A patient goes to see a doctor. The doctor performs a test with 99 percent reliability--that is, 99 percent of people who are sick test positive and 99 percent of the healthy people test negative. The doctor knows that only 1 percent of the people in the country are sick. Now the question is: if the patient tests positive, what are the chances the patient is sick?

To solve this, we first set some notation.
* $A$ is the event of having the disease
* $B$ is the event of testing positive for the disease
* We have been asked to calculate $P(A \mid B)$
* We know $P(A) = .01$, since 1 percent of the population is sick.
* We know $P(B \mid A) = .99$.
* We know $P(B \mid A^c) = .01$.

The next step is to put all of this together into Bayes' rule:

$$ P(A \mid B) =  \frac{P(B \mid A)P(A)}{P(B)}$$

Almost there...what's missing? What should we do?
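
One way to finish, sketched with the figures from the problem statement: the missing piece is $P(B)$, which we can get by splitting on sick versus healthy, $P(B) = P(B \mid A)P(A) + P(B \mid A^c)P(A^c)$. The variable names are illustrative.

```python
p_sick = 0.01                 # P(A): prevalence
p_pos_given_sick = 0.99       # P(B | A)
p_pos_given_healthy = 0.01    # P(B | A^c)

# Total probability of a positive test, over sick and healthy patients.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B).
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos

print(p_pos, p_sick_given_pos)
```

Even with a 99 percent reliable test, a positive result here means only a 50 percent chance of being sick, because true positives from the small sick group are matched by false positives from the much larger healthy group.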

**Maximum Likelihood**

The likelihood of a parameter $p$ given some observations $X$ is defined to be the probability of observing the data given the parameter.

$$ L(p;X) = P(X \mid p)$$

*Example* Suppose we toss a coin $5$ times and it comes up heads $3$ times. What is the most *likely* value for the coin's bias?

In [3]:

np.random.binomial(5, .6, 100).mean()

2.99
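
The coin example can also be checked numerically. This is a minimal sketch that evaluates the binomial likelihood $L(p; X) = \binom{5}{3} p^3 (1-p)^2$ on a grid of candidate values for $p$ and picks the largest; the grid resolution is an arbitrary choice.

```python
import numpy as np
from scipy.special import comb

n, heads = 5, 3

# Candidate values for the coin's bias p.
p_grid = np.linspace(0, 1, 1001)

# Binomial likelihood of seeing `heads` heads in `n` tosses for each p.
likelihood = comb(n, heads) * p_grid**heads * (1 - p_grid)**(n - heads)

p_hat = p_grid[np.argmax(likelihood)]
print(p_hat)
```

The maximum sits at the observed fraction of heads, $3/5$.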