# Notebook 8: More Discrete Random Variables and More Distributions 
***

In this notebook we'll get some more practice identifying and working with the Binomial, Negative Binomial, Geometric, and Poisson distributions. 

We'll need NumPy and Matplotlib for this notebook, so let's load them.  We'll also probably need SciPy's `binom` function for computing binomial coefficients, as well as Python's `factorial` function and the constant `e`.  

In [12]:
import numpy as np 
import matplotlib.pyplot as plt 
from scipy.special import binom
from math import factorial 
from math import e 
%matplotlib inline

First, let's remember what our discrete distributions *mean*.

Which of these distributions deal with Bernoulli trials? (A Bernoulli trial has two outcomes, success or failure, and repeated trials are independent.)

1. Binomial - number of successes in $n$ independent Bernoulli trials

2. Negative binomial - number of trials needed to reach a specified number of successes

3. Geometric - number of Bernoulli trials needed to get the first success

4. Poisson - number of events that occur in a fixed interval, given an average rate of events

Formally:
1. Binomial - number of successes in $n$ independent trials:
$$
X \sim \text{Bin} \left( n, p \right) \\
n: \text{ number of trials} \\
p: \text{ probability of success on each trial} \\
P \left( X = k \right) = {n \choose k} p^k \left( 1 - p \right)^{n-k}, \quad k = 0, 1, \ldots, n\\
{n \choose k} = \frac{n!}{k! \left( n - k\right)!}\\
$$

2. Negative binomial - number of trials needed to reach $k$ successes
$$
X \sim \text{NB} \left( k, p \right) \\
k: \text{ number of successes required} \\
p: \text{ probability of success on each trial} \\
P \left( X = n \right) = {n-1 \choose k-1} p^k \left( 1 - p \right)^{n-k}, \quad n = k, k+1, \ldots\\
$$

3. Geometric - number of Bernoulli trials needed to get the first success
$$
X \sim \text{Geo} \left( p \right) \\
P \left(X = k \right) = \left(1 - p \right)^{k-1}p, \quad k = 1, 2, \ldots\\
$$

4. Poisson - number of events that occur in a fixed interval
$$
X \sim \text{Pois} \left( \lambda \right)\\
\lambda: \text{ average number of events per interval}\\
P \left( X = k \right) = e^{-\lambda} \frac{\lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots
$$
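As a quick cross-check of the formulas above, here is a minimal sketch that codes each PMF directly with Python's standard `math` module (`math.comb` needs Python 3.8 or later). The function names `binomial_pmf`, `neg_binomial_pmf`, `geometric_pmf`, and `poisson_pmf` are hypothetical helpers introduced for illustration; the rest of the notebook does not use them.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p): k successes in n independent trials
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def neg_binomial_pmf(n, k, p):
    # P(X = n) for X ~ NB(k, p): the k-th success arrives on trial n (n >= k)
    return comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    # P(X = k) for X ~ Geo(p): the first success arrives on trial k (k >= 1)
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Pois(lam): k events in an interval with mean lam
    return exp(-lam) * lam**k / factorial(k)
```

For example, `binomial_pmf(3, 20, 0.04)` reproduces the Exercise 1 Part A value of about 0.0364, and `geometric_pmf(k, p)` agrees with `neg_binomial_pmf(k, 1, p)`, since the geometric distribution is the negative binomial with a single required success.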

### Exercise 1 - Defective Hard Drives 
*** 

A factory manufactures solid-state hard drives for Seagate Technology.  Approximately $4\%$ of hard drives that come off of a particular assembly line are defective. For each of the scenarios below: 

1. Define an appropriate random variable and distribution for the experiment. 
2. State the values that the random variable can take on. 
3. Find the probability that the random variable takes on the value $X = 3$. 
4. State any assumptions that you need to make. 

**Part A**: Out of 20 drives, $k$ of them are defective. 

In [33]:
#1. X = number of defective drives among the 20; X ~ Bin(n = 20, p = 0.04)
#2. Values X can take on: 0, 1, 2, ..., 20
#3. n = 20, p = 0.04, k = 3 (the value asked for in the question)
#4. Each drive is defective independently, with the same probability p = 0.04

In [34]:
# x ~ bin(n, p)
n1a = 20
p1a = 0.04
k1a = 3

(factorial(n1a)/(factorial(k1a) * factorial(n1a - k1a))) * (p1a ** k1a) * (1 - p1a)**(n1a - k1a)

0.036449853488319195

**Part B**: The number of defective drives made that day, where the rate of defective parts per day is 10. 

In [4]:
#1. X ~ Pois(lambda = 10): we are given an average rate of defective drives per day
#2. Values X can take on: 0, 1, 2, 3, ... (the Poisson model has no upper bound)
#3. P(X = 3) with lambda = 10, k = 3
#4. Defective drives occur independently, at a constant average rate throughout the day
#   (each drive is equally likely to be defective)

In [32]:
# P(3) poisson
l1b = 10
k1b = 3
(e ** (-l1b)) * (l1b**k1b)/(factorial(k1b))

0.007566654960414146

**Part C**: While we observe the assembly line, the first defective drive observed is the $k$th drive observed overall.

In [None]:
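#1. Geometric: X = number of drives observed until the first defective one
#2. Values X can take on: 1, 2, 3, ... (no upper bound in the model)
#3. P(X = 3) = (1 - p)^2 * p, with p = 0.04
#4. Each drive is defective independently, with the same probability p = 0.04
p1c = 0.04
k1c = 3
((1 - p1c) ** (k1c - 1)) * p1c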

**Part D**: While we observe the assembly line, the third defective drive observed is the $k$th drive observed overall.
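One possible way to fill in Part D, following the same four steps as Parts A-C and assuming (as before) that drives are defective independently with probability 0.04: $X \sim \text{NB}(3, 0.04)$, where $X$ is the index of the drive on which the third defective appears, so $X$ takes the values $3, 4, 5, \ldots$. The sketch below evaluates the negative binomial formula at $X = 3$; the variable names `p1d`, `k1d`, `n1d` are illustrative and simply follow the notebook's naming pattern.

```python
from math import comb

# X ~ NB(k, p): X = index of the drive on which the k-th defective appears
p1d = 0.04  # probability that a single drive is defective
k1d = 3     # we wait for the third defective drive
n1d = 3     # P(X = 3): the first three drives observed are all defective

prob_1d = comb(n1d - 1, k1d - 1) * p1d**k1d * (1 - p1d) ** (n1d - k1d)
print(prob_1d)
```

Here $P(X = 3) = {2 \choose 2}(0.04)^3 = 0.04^3 = 6.4 \times 10^{-5}$, because the first three drives must all be defective.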

### Exercise 2 - Winning Concert Tickets 
*** 

You and a friend want to go to a concert, but unfortunately only one ticket is still available. The man who sells the tickets decides to toss a coin until heads appears. In each toss heads appears with probability $p$, where $0 < p < 1$, independent of each of the previous tosses. If the number of tosses needed is odd, your friend is allowed to buy the ticket; otherwise you can buy it. Would you agree to this arrangement?

**Part A**: What discrete distribution would be useful in solving this problem? 

In [14]:
# Geometric distribution: X = number of tosses until the first heads, X ~ Geo(p)

**Part B**: In a minute we'll compute the approximate probability that you win the concert tickets, but before doing so, can you solve this problem by intuition (and a small amount of math)? 

In [None]:
# pf1 = p           : probability that my friend gets the ticket (heads on toss 1)
# pm1 = (1 - p) * p : probability that I get the ticket on toss 2 (tails, then heads)
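Extending the idea in the comments above to every toss: your friend wins whenever the first heads lands on an odd toss, so summing the geometric PMF over the odd values gives a geometric series. A short worked derivation, assuming the tosses are independent as stated:

$$
P(\text{friend wins}) = \sum_{j=0}^{\infty} P(X = 2j + 1) = \sum_{j=0}^{\infty} (1 - p)^{2j} p = \frac{p}{1 - (1 - p)^2} = \frac{p}{p(2 - p)} = \frac{1}{2 - p}
$$

Because $0 < p < 1$, we have $1 < 2 - p < 2$, so $\frac{1}{2} < \frac{1}{2 - p} < 1$: your friend is always favored, and you win with probability $\frac{1 - p}{2 - p} < \frac{1}{2}$. For a fair coin ($p = 0.5$) your friend wins with probability $2/3$, so the arrangement is not in your favor.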

**Part C**: Write a Python function that takes in the probability of heads on the coin and returns the probability mass function of the random variable described above.  You can safely assume that nobody wants to stick around for more than 100 coin flips, so only consider up to and including the 100th coin flip.  So, the output of this function should be an array or list of length 100.

In [37]:
def pmf_geo(p):
    # P(X = k) = (1 - p)^(k - 1) * p for k = 1, ..., 100; pmf[0] is P(X = 1)
    pmf = [((1-p) ** (k - 1)) * p for k in range(1, 101)]
    
    return pmf 

**Part D**: Use the function you wrote in **Part C** to estimate the probabilities that your friend or you win the ticket for different values of the bias of the coin.  Use $p = 0.25$, $p = 0.5$, and $p = 0.75$. 

In [38]:
pmf_geo(0.5)

[0.5,
 0.25,
 0.125,
 0.0625,
 0.03125,
 0.015625,
 0.0078125,
 0.00390625,
 0.001953125,
 0.0009765625,
 0.00048828125,
 0.000244140625,
 0.0001220703125,
 6.103515625e-05,
 3.0517578125e-05,
 1.52587890625e-05,
 7.62939453125e-06,
 3.814697265625e-06,
 1.9073486328125e-06,
 9.5367431640625e-07,
 4.76837158203125e-07,
 2.384185791015625e-07,
 1.1920928955078125e-07,
 5.960464477539063e-08,
 2.9802322387695312e-08,
 1.4901161193847656e-08,
 7.450580596923828e-09,
 3.725290298461914e-09,
 1.862645149230957e-09,
 9.313225746154785e-10,
 4.656612873077393e-10,
 2.3283064365386963e-10,
 1.1641532182693481e-10,
 5.820766091346741e-11,
 2.9103830456733704e-11,
 1.4551915228366852e-11,
 7.275957614183426e-12,
 3.637978807091713e-12,
 1.8189894035458565e-12,
 9.094947017729282e-13,
 4.547473508864641e-13,
 2.2737367544323206e-13,
 1.1368683772161603e-13,
 5.684341886080802e-14,
 2.842170943040401e-14,
 1.4210854715202004e-14,
 7.105427357601002e-15,
 3.552713678800501e-15,
 1.7763568394002505e-15,
 ...]
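To turn the PMF into the two winning probabilities, sum the odd-toss entries (your friend) and the even-toss entries (you). A minimal sketch follows; `ticket_probs` is a hypothetical helper, and `pmf_geo` is repeated from Part C so the block runs on its own. Because only 100 tosses are kept, each sum falls short of its exact value by at most $(1 - p)^{100}$.

```python
def pmf_geo(p):
    # Same as Part C: P(X = k) = (1 - p)^(k - 1) * p for k = 1, ..., 100
    return [((1 - p) ** (k - 1)) * p for k in range(1, 101)]

def ticket_probs(p):
    pmf = pmf_geo(p)
    # pmf[0] is P(X = 1), so even list indices correspond to odd toss counts
    friend = sum(pmf[0::2])  # first heads on toss 1, 3, 5, ...
    you = sum(pmf[1::2])     # first heads on toss 2, 4, 6, ...
    return friend, you

for p in (0.25, 0.5, 0.75):
    friend, you = ticket_probs(p)
    print(f"p = {p}: friend ~ {friend:.4f}, you ~ {you:.4f}")
```

With these three biases your friend's share comes out to roughly 0.571, 0.667, and 0.800, so your friend is favored in every case.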

### Exercise 3 - Implementing and Sampling the Geometric Distribution 
***

**Part A**: Write a function `flips_until_heads` that simulates the coin-flipping scenario in **Exercise 2**. Your function should take as its sole argument the probability $p$ of flipping Heads for the coin and return the number of flips observed when you find your first Heads.  

In [None]:
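coin = np.array(["H", "T"])

def flips_until_heads(p):
    # Flip a coin with P(Heads) = p until the first Heads; return the number of flips
    counter = 0
    while True:
        counter += 1
        flip = np.random.choice(coin, p=[p, 1 - p])
        if flip == "H":
            return counter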

**Part B**: Now, run many trials of the experiment and count how many trials result in each value of the random variable.  Make a _density_ histogram of the results, using $p=0.25$. 
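A minimal simulation sketch for Part B, assuming NumPy is available. It re-implements `flips_until_heads` with a uniform draw (a draw below $p$ counts as Heads) instead of sampling from the string array `coin`; the two are equivalent in distribution. The seed, `num_trials`, and variable names are arbitrary choices for illustration.

```python
import numpy as np

def flips_until_heads(p):
    # Flip until the first Heads; each flip is Heads with probability p
    counter = 1
    while np.random.random() >= p:  # a uniform draw below p counts as Heads
        counter += 1
    return counter

np.random.seed(0)  # fixed seed so the run is reproducible
p = 0.25
num_trials = 10_000
samples = np.array([flips_until_heads(p) for _ in range(num_trials)])

# Relative frequency of each observed value k (index 0 is never hit)
empirical = np.bincount(samples) / num_trials
print(empirical[1:6])
```

To draw the density histogram, something like `plt.hist(samples, bins=np.arange(0.5, samples.max() + 1.5), density=True)` puts one unit-width bin centered on each integer, so the bar heights equal the relative frequencies in `empirical`.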

**Part C**: Use the function you wrote in **Exercise 2** to make a bar plot of the probability mass function of $X$.  Does it look like the density histogram from **Part B**?  If not, run your simulation in **Part B** for more trials.  Does the situation improve?  Again, use $p=0.25$ in your code.
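For Part C, one way to put a number on "does it look like the histogram" is the largest gap between the empirical frequencies and `pmf_geo(0.25)` over $k = 1, \ldots, 100$. The sketch below swaps in NumPy's `Generator.geometric`, which samples the number of trials until the first success (support $1, 2, \ldots$), for speed instead of calling `flips_until_heads` in a loop; `max_pmf_gap` is a hypothetical helper name, and `pmf_geo` is repeated from Exercise 2 so the block runs on its own.

```python
import numpy as np

def pmf_geo(p):
    # Exercise 2, Part C: P(X = k) = (1 - p)^(k - 1) * p for k = 1, ..., 100
    return [((1 - p) ** (k - 1)) * p for k in range(1, 101)]

def max_pmf_gap(p, num_trials, seed=0):
    # Draw geometric samples: number of trials until the first success
    rng = np.random.default_rng(seed)
    samples = rng.geometric(p, size=num_trials)
    # Empirical relative frequency of k = 1, ..., 100 (values above 100 ignored)
    counts = np.bincount(samples, minlength=102)[1:101]
    empirical = counts / num_trials
    return np.max(np.abs(empirical - np.array(pmf_geo(p))))

for n in (100, 1_000, 100_000):
    print(n, max_pmf_gap(0.25, n))
```

The gap should shrink roughly like $1/\sqrt{N}$ as the number of trials $N$ grows, which is the improvement Part C asks about. For the bar plot itself, `plt.bar(range(1, 101), pmf_geo(0.25))` draws the PMF on the same axes as the histogram.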

### Exercise 4 - Predicting Space Shuttle Disasters 
***

The space shuttle _Challenger_ disaster occurred in January 1986 when one of six O-rings failed and caused the main fuel tank to explode.  The failure of the O-ring was likely due to the low temperature at the time of the launch.  Further analysis shows that the probability of an O-ring failure as a function of temperature is given by 

$$
p(T) = \frac{e^{a+bT}}{1 + e^{a+bT}}
$$

where $a = 5.085$ and $b = -0.1156$, and $T$ is the temperature (in degrees Fahrenheit) at the time of the launch of the shuttle. At the time of the _Challenger_ launch the temperature was $T = 31^{\circ}F$, corresponding to a probability of O-ring failure of $p(31) = 0.8178$. 
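The logistic formula is easy to check numerically. A small sketch using only the `math` module (the function name `o_ring_failure_prob` is illustrative) reproduces both failure probabilities quoted in this exercise:

```python
from math import exp

a = 5.085    # fitted intercept from the text
b = -0.1156  # fitted slope (per degree Fahrenheit) from the text

def o_ring_failure_prob(T):
    # p(T) = e^(a + bT) / (1 + e^(a + bT)), with T in degrees Fahrenheit
    z = a + b * T
    return exp(z) / (1 + exp(z))

print(round(o_ring_failure_prob(31), 4))  # 0.8178 (Challenger launch)
print(round(o_ring_failure_prob(81), 4))  # 0.0137 (temperature used in Part C)
```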

**Part A**: Let $X$ be the number of failing O-rings at launch temperature $31^{\circ} F$.  Assume that the failure of each of the six O-rings is independent. What type of probability distribution does $X$ have, and what are the values of its parameters? 

**Part B**: Calculate (by hand) the probability $P(X \geq 1)$ that at least one O-ring fails.
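A worked version of Parts A and B under the stated independence assumption: each of the six O-rings fails with the same probability $p(31) = 0.8178$, so the number of failures is binomial and takes the values $0, 1, \ldots, 6$. The complement of "at least one fails" is "none fail":

$$
X \sim \text{Bin} \left( 6, 0.8178 \right)\\
P \left( X \geq 1 \right) = 1 - P \left( X = 0 \right) = 1 - \left( 1 - 0.8178 \right)^6 = 1 - 0.1822^6 \approx 1 - 3.66 \times 10^{-5} \approx 0.99996
$$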

**Part C**:  Let us assume that all space shuttles will be launched at $81^{\circ}F$.  With this temperature, the probability of an O-ring failure is equal to $p(81) = 0.0137$. 

What is the probability that during 23 launches no O-ring will fail, but that at least one O-ring will fail during the 24th launch of the space shuttle? 

**Part D**: What is the probability that no O-ring fails during 24 launches? 
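A sketch for Parts C and D, assuming the six O-rings fail independently within a launch and that launches are independent of each other. Treating "a launch with at least one O-ring failure" as a success, the number of the first such launch is geometric, so Part C is the geometric PMF evaluated at 24, and Part D is the probability of 24 clean launches in a row. Variable names are illustrative.

```python
p81 = 0.0137  # per-O-ring failure probability at 81 degrees F (from Part C)
n_rings = 6   # O-rings per launch, assumed to fail independently

# A single launch is clean when X ~ Bin(6, p81) equals 0
q_ok = (1 - p81) ** n_rings
# A single launch has at least one O-ring failure
q_fail = 1 - q_ok

# Part C: 23 clean launches, then at least one failure on launch 24
part_c = q_ok ** 23 * q_fail

# Part D: all 24 launches are clean
part_d = q_ok ** 24

print(f"Part C: {part_c:.4f}")
print(f"Part D: {part_d:.4f}")
```

This gives roughly 0.0118 for Part C and 0.137 for Part D.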