In [2]:
from scipy import stats

## Probability Distribution Practice

### Lt Col Horton

In this notebook, I will provide some practice problems that will help reinforce your understanding of probability distributions. Throughout this practice, I highly encourage you to refer to the `scipy` documentation online. While you are there, feel free to explore areas we don't cover in this notebook (particularly plotting and randomization). 

For each of the tasks below **_1)_** define a random variable that will help you answer the question; **_2)_** state the distribution and parameters of that random variable; **_3)_** determine the expected value and variance of that random variable. 

I will demonstrate using **_1.1_** below. 

#### Problem 1

The T-6 training aircraft is used during UPT. Suppose that each training sortie, independently of the others, ends with a maintenance-related failure with probability 0.01 (a rate of 1 per 100 sorties). 

**_1.1_** Find the probability of no maintenance failures in 15 sorties. 

**_1.2_** Find the probability of at least two maintenance failures in 15 sorties. 

**_1.3_** Find the probability of at least 30 successful (no mx failures) sorties before the first failure.

**_1.4_** Find the probability of at least 50 successful sorties before the third failure. 



##### Demonstration using 1.1

Find the probability of no maintenance failures in 15 sorties.

$X$: the number of maintenance failures in 15 sorties. 

$X\sim \textsf{Bin}(n=15, p=0.01)$

$E(X) = np = 15*0.01 = 0.15$

$V(X) = np(1-p) = 15*0.01*0.99 = 0.1485$

Probability of no maintenance failures, $P(X=0)$:  

In [3]:
stats.binom.pmf(0,15,0.01)

0.8600583546412885

The probability of getting no maintenance failures in 15 sorties is about 0.86. It is worth taking a moment to make sure this value makes sense. Since failures are fairly unlikely, and the **expected** number of failures (0.15) is close to 0, then 0 failures should be a fairly likely outcome and a probability of 0.86 makes sense. 
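The hand calculations above can be cross-checked with `scipy`. This is a minimal sketch using the same parameters; the variable names are my own and purely illustrative.

```python
from scipy import stats

n_sorties = 15   # number of sorties (trials)
p_fail = 0.01    # probability of a maintenance failure on one sortie

# Mean and variance of Bin(n, p) as computed by scipy
mean_x = stats.binom.mean(n_sorties, p_fail)
var_x = stats.binom.var(n_sorties, p_fail)

# P(X = 0) also equals (1 - p)^n directly
p_zero = (1 - p_fail) ** n_sorties

print(mean_x, var_x, p_zero)
```

The three printed values should agree with $E(X)$, $V(X)$, and $P(X=0)$ above.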

**1.2**  Find the probability of at least 2 maintenance failures in 15 sorties

In [5]:
round(1 - stats.binom.cdf(1, 15, 0.01), 5)  # P(X >= 2) = 1 - P(X <= 1)

0.00963

**1.3** Find the probability of at least 30 successful (no mx failures) sorties before the first failure

In [6]:
stats.binom.cdf(0, 30, 0.01)

0.7397003733882802
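The same question can be framed with a geometric random variable. The sketch below assumes $G$ is the sortie on which the first failure occurs, so "at least 30 successful sorties before the first failure" is the event $G > 30$. The variable names are illustrative.

```python
from scipy import stats

p_fail = 0.01

# scipy's geom counts the trial on which the first "success" occurs (support 1, 2, ...).
# Here a "success" is a maintenance failure, so G is the sortie of the first failure.
# At least 30 clean sorties first  <=>  G > 30.
p_geom = stats.geom.sf(30, p_fail)

# Same event via the binomial: zero failures in the first 30 sorties.
p_binom = stats.binom.pmf(0, 30, p_fail)

print(p_geom, p_binom)
```

Both values equal $0.99^{30}$, so either framing is acceptable as long as the random variable is defined clearly.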

**1.4** Find the probability of at least 50 successful sorties before the third failure. 

In [10]:
print("Using the negative binomial distribution:", 1 - stats.nbinom.cdf(49, 3, 0.01))

Using the negative binomial distribution: 0.9846473742663409
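The parameterization of `nbinom` is easy to get backwards, so a cross-check is worthwhile. In the sketch below, $Y$ is the number of successful sorties before the third failure; the equivalent binomial event is "at most 2 failures in the first 52 sorties". Variable names are illustrative.

```python
from scipy import stats

p_fail = 0.01

# scipy's nbinom(n, p) counts "failures" before the n-th "success".
# Here scipy's "success" is a maintenance failure (p = 0.01) and scipy's
# "failures" are clean sorties, so Y = clean sorties before the 3rd mx failure.
p_nbinom = stats.nbinom.sf(49, 3, p_fail)   # P(Y >= 50) = P(Y > 49)

# Equivalent binomial statement: at most 2 mx failures in the first 52 sorties.
p_binom = stats.binom.cdf(2, 52, p_fail)

print(p_nbinom, p_binom)
```

If the two numbers disagree, the roles of "success" and "failure" have probably been swapped.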


#### Problem 2

On a given Saturday, suppose vehicles arrive at the USAFA North Gate according to a Poisson process at a rate of 40 arrivals per hour. 

**_2.1_** Find the probability no vehicles arrive in 10 minutes. 

**_2.2_** Find the probability at least 50 vehicles arrive in an hour. 

**_2.3_** Find the probability that at least 5 minutes will pass before the next arrival.

**_2.4_** Find the probability that the next vehicle will arrive between 2 and 10 minutes from now. 

**_2.5_** Find the probability that at least 7 minutes will pass before the next arrival, given that 2 minutes have already passed. Compare this answer to **_2.3_**. This is an example of the *memoryless* property of the exponential distribution.

**_2.6_** Fill in the blank. There is a probability of 90% that the next vehicle will arrive within __ minutes. This value is known as the 90th percentile of the random variable. 


Poisson distribution: the probability of a given number of events occurring in a fixed time interval, given the expected number of events in that interval, $\mu$, and assuming independent arrivals.

**2.1** Find the probability no vehicles arrive in 10 min

In [27]:
stats.poisson.pmf(0, 40/6)

0.0012726338013398079

**2.2** Find the probability that at least 50 vehicles arrive in an hour.

In [28]:
1 - stats.poisson.cdf(49, 40)

0.07033506665939493

**2.3** Find the probability that at least 5 minutes will pass before the next arrival

In [29]:
stats.poisson.cdf(0, 40/12)

0.035673993347252395
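"No arrivals in 5 minutes" and "the wait for the next arrival exceeds 5 minutes" are the same event, so the Poisson and exponential distributions should give the same answer. A minimal sketch, with illustrative variable names:

```python
import math
from scipy import stats

rate_per_min = 40 / 60     # arrivals per minute
t = 5                      # minutes

# No arrivals in t minutes (Poisson count with mean rate * t)
p_poisson = stats.poisson.pmf(0, rate_per_min * t)

# Waiting time exceeds t minutes (exponential with scale = 1 / rate)
p_expon = stats.expon.sf(t, scale=1 / rate_per_min)

print(p_poisson, p_expon, math.exp(-rate_per_min * t))
```

All three values are $e^{-\lambda t}$ with $\lambda t = 40/12$.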

**2.4** Find the probability that the next vehicle will arrive between 2 and 10 minutes from now

In [30]:
lambda_exp = 40/60
stats.expon.cdf(10, scale=1/lambda_exp) - stats.expon.cdf(2, scale=1/lambda_exp)

0.2623245043143869

**2.5** Find the probability that at least 7 minutes will pass before the next arrival, given that 2 minutes have already passed. Compare this answer to **2.3**. This is an example of the *memoryless* property of the exponential distribution.

In [31]:
rate = 1/lambda_exp  # despite the name, this is the scale (mean minutes between arrivals), not the rate
(1 - stats.expon.cdf(7, scale = rate))/(1 - stats.expon.cdf(2, scale = rate))

0.03567399334725243
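To see the memoryless property more generally, the sketch below computes $P(T \geq s + 5 \mid T \geq s)$ for several elapsed times $s$. The helper `conditional_survival` is my own illustrative function, not part of `scipy`.

```python
from scipy import stats

scale_min = 60 / 40   # mean minutes between arrivals (1 / rate)

def conditional_survival(s, t, scale):
    """P(T >= s + t | T >= s) for an exponential waiting time T."""
    return stats.expon.sf(s + t, scale=scale) / stats.expon.sf(s, scale=scale)

# Memoryless property: the elapsed time s does not matter.
for s in (0, 2, 10, 30):
    print(s, conditional_survival(s, 5, scale_min))
```

Every row should show the same probability as **2.3**, regardless of how long we have already waited.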

**2.6** Fill in the blank. There is a probability of 90% that the next vehicle will arrive within __ minutes. This value is known as the 90th percentile of the random variable. 

In [34]:
stats.expon.ppf(0.9, scale=rate)

3.453877639491069
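The percentile can also be found by hand by solving $1 - e^{-\lambda t} = 0.9$ for $t$. A short sketch comparing the closed form to `ppf`, with illustrative variable names:

```python
import math
from scipy import stats

rate_per_min = 40 / 60

# Solve 1 - exp(-rate * t) = 0.9 for t
t90_closed_form = -math.log(1 - 0.9) / rate_per_min

# ppf is the inverse of the cdf, so feeding the result back should return 0.9
t90_scipy = stats.expon.ppf(0.9, scale=1 / rate_per_min)
check = stats.expon.cdf(t90_scipy, scale=1 / rate_per_min)

print(t90_closed_form, t90_scipy, check)
```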

#### Problem 3

Suppose there are 12 male and 7 female cadets in a classroom. I select 5 completely at random (without replacement). 

**_3.1_** Find the probability I select no female cadets. 

**_3.2_** Find the probability I select more than 2 female cadets. 

**3.1**

In [35]:
M = 19 # total number
n = 7 # number females
N = 5 # number of draws
k = 0 # number we are wondering about

stats.hypergeom.pmf(k, M, n, N)

0.06811145510835913
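The argument order of `hypergeom` is a common source of mistakes, so it helps to check it against a direct counting argument. In the sketch below, `hypergeom_pmf` is my own illustrative helper built from binomial coefficients.

```python
from math import comb
from scipy import stats

total, females, draws = 19, 7, 5
males = total - females

def hypergeom_pmf(k):
    """P(exactly k females in the sample), by counting combinations."""
    return comb(females, k) * comb(males, draws - k) / comb(total, draws)

# scipy's argument order is (k, M, n, N) = (k, population, tagged items, draws)
for k in range(draws + 1):
    print(k, hypergeom_pmf(k), stats.hypergeom.pmf(k, total, females, draws))
```

For $k = 0$ the counting argument gives $\binom{12}{5} / \binom{19}{5} = 792/11628$, which matches the value above.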

**3.2**

In [38]:
k = 2
round(1 - stats.hypergeom.cdf(k, M, n, N), 4)  # P(X > 2) = 1 - P(X <= 2)

0.2366

#### Problem 4

Suppose PFT scores in the cadet wing follow a normal distribution with mean 330 and standard deviation 50. 

**_4.1_** Find the probability a randomly selected cadet has a PFT score higher than 450. 

**_4.2_** Find the probability a randomly selected cadet has a PFT score within 2 standard deviations of the mean.

**_4.3_** Find $a$ and $b$ such that 90% of PFT scores will be between $a$ and $b$. 

**_4.4_** Find the probability a randomly selected cadet has a PFT score higher than 450 given he/she is among the top 10% of cadets. 

**_4.1_** Find the probability a randomly selected cadet has a PFT score higher than 450. 

In [43]:
mean = 330
sd = 50
1 - stats.norm.cdf(450, loc=mean, scale=sd)

0.008197535924596155

**_4.2_** Find the probability a randomly selected cadet has a PFT score within 2 standard deviations of the mean.

In [44]:
lower_bound = 230
upper_bound = 430 # we want the probability it is in between these two bounds

stats.norm.cdf(upper_bound, mean, sd) - stats.norm.cdf(lower_bound, mean, sd)

0.9544997361036416
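Rather than hard-coding the bounds, the same probability can be computed for any number of standard deviations. The sketch below uses an illustrative helper, `within_k_sd`, and compares it to the standard normal.

```python
from scipy import stats

mean, sd = 330, 50

def within_k_sd(k):
    """P(|X - mean| <= k * sd) for X ~ Normal(mean, sd)."""
    return stats.norm.cdf(mean + k * sd, mean, sd) - stats.norm.cdf(mean - k * sd, mean, sd)

# Standardizing shows the answer does not depend on mean or sd
p_standard = stats.norm.cdf(2) - stats.norm.cdf(-2)

print(within_k_sd(2), p_standard)
```

The two values agree because "within 2 standard deviations" is the event $-2 \leq Z \leq 2$ for the standard normal $Z$.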

**_4.3_** Find $a$ and $b$ such that 90% of PFT scores will be between $a$ and $b$. 

In [45]:
a = stats.norm.ppf(0.05, mean, sd)
b = stats.norm.ppf(0.95, mean, sd)

print("a:", a)
print("b:", b)

a: 247.75731865242636
b: 412.2426813475736
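Note that $a$ and $b$ are not unique; the equal-tailed interval above is just the conventional choice. As a cross-check, `scipy` distributions provide an `interval` method that returns the same equal-tailed bounds. A minimal sketch:

```python
from scipy import stats

mean, sd = 330, 50

# Central interval containing 90% of the distribution
a, b = stats.norm.interval(0.90, loc=mean, scale=sd)

# Same bounds via z-scores: mean +/- z * sd with z = ppf(0.95) of the standard normal
z = stats.norm.ppf(0.95)

print(a, b, mean - z * sd, mean + z * sd)
```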


**_4.4_** Find the probability a randomly selected cadet has a PFT score higher than 450 given he/she is among the top 10% of cadets. 

In [52]:
(1 - stats.norm.cdf(450, mean, sd)) / 0.1

0.08197535924596155
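Dividing by 0.1 works only because a score above 450 is already in the top 10%. The sketch below makes that step explicit by computing the cutoff score first; the variable names are illustrative.

```python
from scipy import stats

mean, sd = 330, 50

# Score that separates the top 10% from the rest (the 90th percentile)
cutoff = stats.norm.ppf(0.90, mean, sd)

# Since 450 > cutoff, {X > 450} is contained in {X > cutoff}, so
# P(X > 450 | X > cutoff) = P(X > 450) / P(X > cutoff)
p_cond = stats.norm.sf(450, mean, sd) / stats.norm.sf(cutoff, mean, sd)

print(cutoff, p_cond)
```

If the score in question were below the cutoff, the numerator would instead be $P(X > \text{cutoff})$ and the conditional probability would be 1.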

#### Problem 5

Suppose time until computer errors on the F-35 follows a Gamma distribution with mean 20 hours and variance 10.  

**_5.1_** Find the probability that 50 hours pass without a computer error. 

**_5.2_** Find the probability that 75 hours pass without a computer error, given that 25 hours have already passed. Does the memoryless property apply to the Gamma distribution? 

**_5.3_** Find $a$ and $b$: There is a 95% probability time until next computer error will be between $a$ and $b$.  

**_5.1_** Find the probability that 50 hours pass without a computer error. 

In [54]:
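alpha = 40   # shape: mean^2 / variance = 400 / 10
beta = 1/2   # scale: variance / mean = 10 / 20 (equivalently, rate = 2)

# scale must be passed by keyword; the third positional argument of stats.gamma is loc
stats.gamma.sf(50, alpha, scale=beta)

The result is on the order of $10^{-12}$, which makes sense: 50 hours is about 9.5 standard deviations ($\sqrt{10} \approx 3.16$ hours) above the mean of 20 hours.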
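The shape and scale come from matching the stated mean and variance. The sketch below shows that calculation and confirms the result with `scipy`; note that `scale` is passed by keyword because the third positional argument of `stats.gamma` is `loc`. Variable names are illustrative.

```python
from scipy import stats

target_mean, target_var = 20, 10

# For a Gamma with shape a and scale s: mean = a * s, variance = a * s**2
scale = target_var / target_mean      # 0.5
shape = target_mean / scale           # 40

# Pass scale by keyword: the third positional argument of scipy's gamma is loc
dist = stats.gamma(shape, scale=scale)

print(shape, scale, dist.mean(), dist.var())
```

Checking `dist.mean()` and `dist.var()` against the problem statement is a quick way to catch a mis-specified parameter.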

**_5.2_** Find the probability that 75 hours pass without a computer error, given that 25 hours have already passed. Does the memoryless property apply to the Gamma distribution? 

In [55]:
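# scale passed by keyword; sf (the survival function, 1 - cdf) avoids round-off in the far tail
stats.gamma.sf(75, alpha, scale=beta) / stats.gamma.sf(25, alpha, scale=beta)

The result is vanishingly small, many orders of magnitude below $P(X > 50)$ from **_5.1_**. If the Gamma distribution were memoryless, the two would be equal, so the memoryless property does not apply here.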

**_5.3_** Find $a$ and $b$: There is a 95% probability time until next computer error will be between $a$ and $b$.  
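One possible approach is sketched below, assuming an equal-tailed interval (2.5% in each tail). Other choices of $a$ and $b$ also satisfy the problem. The shape and scale are the values implied by a mean of 20 and a variance of 10.

```python
from scipy import stats

shape, scale = 40, 0.5   # Gamma with mean 20 and variance 10

# Equal-tailed choice: 2.5% below a and 2.5% above b
a = stats.gamma.ppf(0.025, shape, scale=scale)
b = stats.gamma.ppf(0.975, shape, scale=scale)

# The probability between a and b should come back as 0.95
coverage = stats.gamma.cdf(b, shape, scale=scale) - stats.gamma.cdf(a, shape, scale=scale)

print("a:", a)
print("b:", b)
print("coverage:", coverage)
```

As a sanity check, the interval should contain the mean of 20 hours and the printed coverage should be 0.95.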