In [2]:
from scipy import stats

## Probability Distribution Practice (Solution)

### Lt Col Horton

In this notebook, I will provide some practice problems that will help reinforce your understanding of probability distributions. Throughout this practice, I highly encourage you to refer to the `scipy` documentation online. While you are there, feel free to explore areas we don't cover in this notebook (particularly plotting and randomization). 

For each of the tasks below **_1)_** define a random variable that will help you answer the question; **_2)_** state the distribution and parameters of that random variable; **_3)_** determine the expected value and variance of that random variable. 

I will demonstrate using **_1.1_** below. 

#### Problem 1

The T-6 training aircraft is used during UPT. Suppose that on each training sortie, aircraft return with a maintenance-related failure at a rate of 1 per 100 sorties. 

**_1.1_** Find the probability of no maintenance failures in 15 sorties. 

**_1.2_** Find the probability of at least two maintenance failures in 15 sorties. 

**_1.3_** Find the probability of at least 30 successful (no mx failures) sorties before the first failure.

**_1.4_** Find the probability of at least 50 successful sorties before the third failure. 



##### Demonstration using 1.1

Find the probability of no maintenance failures in 15 sorties.

$X$: the number of maintenance failures in 15 sorties. 

$X\sim \textsf{Bin}(n=15, p=0.01)$

$E(X) = np = 15*0.01 = 0.15$

$V(X) = np(1-p) = 15*0.01*0.99 = 0.1485$

Probability of no maintenance failures, $P(X=0)$:  

In [5]:
stats.binom.pmf(0,15,0.01)

0.8600583546412884

The probability of no maintenance failures in 15 sorties is about 0.86. It is worth taking a moment to make sure this value makes sense. Failures are fairly unlikely and the **expected** number of failures (0.15) is close to 0, so 0 failures should be a likely outcome and a probability of 0.86 is reasonable. 
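
As a quick sanity check on the demonstration, the binomial probability of zero failures has the closed form $(1-p)^n$, and `scipy` can report the mean and variance directly. The sketch below is one way to confirm the hand calculations; the variable names (`n`, `p`, `p_zero`, and so on) are illustrative and not part of the original notebook.

```python
from scipy import stats

n, p = 15, 0.01  # sorties and per-sortie failure probability from Problem 1

# P(X = 0) from scipy and from the closed form (1 - p)^n
p_zero = stats.binom.pmf(0, n, p)
p_zero_closed = (1 - p) ** n

# Mean and variance reported by scipy, to compare with np and np(1-p)
mean, var = stats.binom.stats(n, p, moments="mv")

print(p_zero, p_zero_closed)
print(float(mean), float(var))
```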

##### 1.2

Find the probability of at least two maintenance failures in 15 sorties.

$X$: the number of maintenance failures in 15 sorties. 

$X\sim \textsf{Bin}(n=15, p=0.01)$

$E(X) = np = 15*0.01 = 0.15$

$V(X) = np(1-p) = 15*0.01*0.99 = 0.1485$

Probability of at least two maintenance failures, $P(X\geq 2)$:  

In [6]:
1-stats.binom.cdf(1,15,0.01)

0.009629773443364797
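
Writing `1 - cdf` works here, but `scipy` distributions also expose a survival function, `sf`, which returns $P(X > k)$ directly and tends to be more accurate when the tail probability is very small. A minimal sketch of the same calculation both ways (variable names are illustrative):

```python
from scipy import stats

# P(X >= 2) = P(X > 1) for X ~ Bin(15, 0.01)
via_cdf = 1 - stats.binom.cdf(1, 15, 0.01)
via_sf = stats.binom.sf(1, 15, 0.01)

print(via_cdf, via_sf)
```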

##### 1.3

Find the probability of at least 30 successful (no mx failures) sorties before the first failure.

###### Using binomial:

$X$: the number of maintenance failures in 30 sorties. 

$X\sim \textsf{Bin}(n=30, p=0.01)$

$E(X) = np = 30*0.01 = 0.3$

$V(X) = np(1-p) = 30*0.01*0.99 = 0.297$

###### Using negative binomial:

$Y$: the number of successful sorties before the first failure.

$Y\sim \textsf{NegBin}(r=1, p=0.01)$

$E(Y) = \frac{r}{p}-r = 1/0.01 - 1 = 99$

$V(Y) = \frac{r(1-p)}{p^2}= \frac{1(1-0.01)}{0.01^2} = 9900$

Probability of no failures in 30 sorties, $P(X= 0)$ or $P(Y\geq 30)$:

In [11]:
#binomial
print(stats.binom.pmf(0,30,0.01))
#negative binomial
print(1-stats.nbinom.cdf(29,1,0.01))

0.7397003733882804
0.7397003733882803
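
With $r=1$ the negative binomial reduces to a geometric distribution, so the same probability can be checked two more ways. One convention to keep in mind: `scipy.stats.geom` counts the number of trials up to and including the first "success" (here, the first maintenance failure), so at least 30 clean sorties means the first failure occurs on trial 31 or later. The sketch below assumes that convention; variable names are illustrative.

```python
from scipy import stats

p = 0.01  # probability a sortie ends in a maintenance failure

# Closed form: the first 30 sorties must all be failure-free
closed_form = (1 - p) ** 30

# geom counts trials up to and including the first failure,
# so sf(30) = P(first failure occurs after trial 30)
via_geom = stats.geom.sf(30, p)

print(closed_form, via_geom)
```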


##### 1.4

Find the probability of at least 50 successful sorties before the third failure.

$Y$: the number of successful sorties before the third failure. 

$Y\sim \textsf{NegBin}(r=3, p=0.01)$

$E(Y) = \frac{3}{0.01}-3 = 297$

$V(Y) = \frac{3(1-0.01)}{0.01^2} = 29700$

Probability of at least 50 successful sorties before third failure, $P(Y\geq 50)$:  

In [14]:
1-stats.nbinom.cdf(49,3,0.01)

0.9846473742663409
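
This answer can also be cross-checked with a binomial: the third failure comes after at least 50 successful sorties exactly when the first 52 sorties contain at most 2 failures. The sketch below compares the two calculations; the variable names are illustrative.

```python
from scipy import stats

p = 0.01  # probability a sortie ends in a maintenance failure

# Negative binomial: P(Y >= 50) = P(Y > 49), successes before the 3rd failure
via_nbinom = stats.nbinom.sf(49, 3, p)

# Binomial: at most 2 failures among the first 52 sorties
via_binom = stats.binom.cdf(2, 52, p)

print(via_nbinom, via_binom)
```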

#### Problem 2

On a given Saturday, suppose vehicles arrive at the USAFA North Gate according to a Poisson process at a rate of 40 arrivals per hour. 

**_2.1_** Find the probability no vehicles arrive in 10 minutes. 

**_2.2_** Find the probability at least 50 vehicles arrive in an hour. 

**_2.3_** Find the probability that at least 5 minutes will pass before the next arrival.

**_2.4_** Find the probability that the next vehicle will arrive between 2 and 10 minutes from now. 

**_2.5_** Find the probability that at least 7 minutes will pass before the next arrival, given that 2 minutes have already passed. Compare this answer to **_2.3_**. This is an example of the *memoryless* property of the exponential distribution.

**_2.6_** Fill in the blank. There is a probability of 90% that the next vehicle will arrive within __ minutes. This value is known as the 90th percentile of the random variable. 


##### 2.1

Find the probability no vehicles arrive in 10 minutes.

$X$: the number of arrivals in 10 minutes. 

$X\sim \textsf{Pois}(\lambda = 40/6 = 6.67)$

$E(X) = \lambda = 6.67$

$V(X) = \lambda = 6.67$

Probability of no arrivals in 10 minutes, $P(X=0)$:  

In [16]:
stats.poisson.pmf(0,40/6)

0.0012726338013398079
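
The Poisson parameter is the hourly rate scaled to the length of the window, and $P(X=0)$ has the closed form $e^{-\lambda}$. A small sketch of that scaling, with illustrative variable names:

```python
import math
from scipy import stats

rate_per_hour = 40
minutes = 10

# Expected number of arrivals in the 10-minute window
lam = rate_per_hour * minutes / 60

p_none = stats.poisson.pmf(0, lam)
p_none_closed = math.exp(-lam)  # Poisson pmf at 0 is exp(-lambda)

print(lam, p_none, p_none_closed)
```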

##### 2.2

Find the probability at least 50 vehicles arrive in an hour. 

$X$: the number of arrivals in an hour. 

$X\sim \textsf{Pois}(\lambda = 40)$

$E(X) = \lambda = 40$

$V(X) = \lambda = 40$

Probability of at least 50 arrivals in an hour, $P(X\geq 50)$:  

In [17]:
1-stats.poisson.cdf(49,40)

0.07033506665939493

##### 2.3

Find the probability that at least 5 minutes will pass before the next arrival.

###### Using Poisson

$X$: the number of arrivals in 5 minutes. 

$X\sim \textsf{Pois}(\lambda = 40/12 = 3.33)$

$E(X) = \lambda = 3.33$

$V(X) = \lambda = 3.33$

###### Using Exponential

$Y$: the time until the next arrival (in minutes)

$Y\sim \textsf{Exp}(\lambda= 40/60 = 0.667)$

$E(Y) = \frac{1}{\lambda} = 1.5$

$V(Y) = \frac{1}{\lambda^2} = 2.25$

Probability of at least 5 minutes until next arrival, $P(X=0)$ or $P(Y\geq 5)$:  

In [25]:
print(stats.poisson.pmf(0,40/12))
lambda_exp=2/3
print(1-stats.expon.cdf(5,scale=1/lambda_exp))

0.035673993347252395
0.03567399334725241


##### 2.4

Find the probability that the next vehicle will arrive between 2 and 10 minutes from now. 

$Y$: the time until the next arrival (in minutes)

$Y\sim \textsf{Exp}(\lambda= 40/60 = 0.667)$

$E(Y) = \frac{1}{\lambda} = 1.5$

$V(Y) = \frac{1}{\lambda^2} = 2.25$

Probability next arrival will be between 2 and 10 minutes from now, $P(2\leq Y \leq 10)$:  

In [26]:
lambda_exp=2/3
print(stats.expon.cdf(10,scale=1/lambda_exp)-stats.expon.cdf(2,scale=1/lambda_exp))

0.2623245043143869
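
Because `scipy` parameterizes the exponential by `scale` $=1/\lambda$ rather than by the rate, it can help to "freeze" the distribution once and reuse it. The sketch below does that and checks the result against the closed form $e^{-2\lambda} - e^{-10\lambda}$; the names `lam` and `arrival` are illustrative.

```python
import math
from scipy import stats

lam = 40 / 60  # arrivals per minute

# Frozen exponential distribution; scipy expects scale = 1 / rate
arrival = stats.expon(scale=1 / lam)

p_between = arrival.cdf(10) - arrival.cdf(2)
closed_form = math.exp(-lam * 2) - math.exp(-lam * 10)

print(p_between, closed_form)
print(arrival.mean(), arrival.var())  # compare with 1/lambda and 1/lambda^2
```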


##### 2.5

Find the probability that at least 7 minutes will pass before the next arrival, given 2 minutes have already passed.

$Y$: the time until the next arrival (in minutes)

$Y\sim \textsf{Exp}(\lambda= 40/60 = 0.667)$

$E(Y) = \frac{1}{\lambda} = 1.5$

$V(Y) = \frac{1}{\lambda^2} = 2.25$

Probability of at least 7 minutes until next arrival, given 2 minutes have passed, $P(Y\geq 7|Y\geq 2) = \frac{P(Y\geq 7, Y\geq 2)}{P(Y\geq 2)} = \frac{P(Y\geq 7)}{P(Y\geq 2)}$:  

In [45]:
lambda_exp=2/3
print((1-stats.expon.cdf(7,scale=1/lambda_exp))/(1-stats.expon.cdf(2,scale=1/lambda_exp)))

0.03567399334725243
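
This matches the answer to **_2.3_** (about 0.0357): having already waited 2 minutes does not change the distribution of the remaining wait. The sketch below checks the memoryless identity $P(Y\geq s+t \mid Y\geq s) = P(Y\geq t)$ for $s=2$, $t=5$; variable names are illustrative.

```python
from scipy import stats

arrival = stats.expon(scale=1.5)  # scale = 1 / lambda with lambda = 2/3

# P(Y >= 7 | Y >= 2) versus P(Y >= 5)
conditional = arrival.sf(7) / arrival.sf(2)
unconditional = arrival.sf(5)

print(conditional, unconditional)
```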


##### 2.6

Fill in the blank. There is a probability of 90% that the next vehicle will arrive within __ minutes.

$Y$: the time until the next arrival (in minutes)

$Y\sim \textsf{Exp}(\lambda= 40/60 = 0.667)$

$E(Y) = \frac{1}{\lambda} = 1.5$

$V(Y) = \frac{1}{\lambda^2} = 2.25$

Need to find $y$ such that, $P(Y \leq y) = 0.9$:  

In [33]:
lambda_exp=2/3
print(stats.expon.ppf(0.9,scale=1/lambda_exp))

3.453877639491069
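
For the exponential distribution the percentile also has a closed form: solving $1 - e^{-\lambda y} = q$ gives $y = -\ln(1-q)/\lambda$. The sketch below compares that expression with `ppf` and confirms that plugging the answer back into the CDF returns 0.9; variable names are illustrative.

```python
import math
from scipy import stats

lam = 2 / 3  # arrivals per minute
q = 0.9

closed_form = -math.log(1 - q) / lam
via_ppf = stats.expon.ppf(q, scale=1 / lam)

# Round trip: the CDF evaluated at the percentile should return q
round_trip = stats.expon.cdf(via_ppf, scale=1 / lam)

print(closed_form, via_ppf, round_trip)
```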


#### Problem 3

Suppose there are 12 male and 7 female cadets in a classroom. I select 5 completely at random (without replacement). 

**_3.1_** Find the probability I select no female cadets. 

**_3.2_** Find the probability I select more than 2 female cadets. 

##### 3.1

Find the probability I select no female cadets. 

$X$: the number of female cadets selected in my sample of 5. 

$X\sim \textsf{Hypergeometric}(M=19, n=7, N=5)$

$E(X) = \frac{nN}{M} = 1.842$

$V(X) = \frac{nN(M-n)(M-N)}{M^2(M-1)} = 0.905$

Probability of no female cadets, $P(X=0)$:  

In [29]:
stats.hypergeom.pmf(0,19,7,5)

0.06811145510835913
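
The argument order of `stats.hypergeom` (population size, number of "success" items, number of draws) is easy to mix up, so a direct counting check is useful: selecting no female cadets means all 5 come from the 12 male cadets, giving $\binom{12}{5}/\binom{19}{5}$. A sketch with illustrative variable names:

```python
from math import comb
from scipy import stats

total, females, draws = 19, 7, 5

# All 5 selected cadets come from the 12 male cadets
by_counting = comb(total - females, draws) / comb(total, draws)
via_scipy = stats.hypergeom.pmf(0, total, females, draws)

print(by_counting, via_scipy)
```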

##### 3.2

Find the probability I select more than 2 female cadets. 

$X$: the number of female cadets selected in my sample of 5. 

$X\sim \textsf{Hypergeometric}(M=19, n=7, N=5)$

$E(X) = \frac{nN}{M} = 1.842$

$V(X) = \frac{nN(M-n)(M-N)}{M^2(M-1)} = 0.905$

Probability of more than 2 female cadets, $P(X> 2)$:  

In [32]:
# P(X > 2) = 1 - P(X <= 2), so use the cdf (not the pmf) at 2
print(round(1-stats.hypergeom.cdf(2,19,7,5),4))

0.2366

#### Problem 4

Suppose PFT scores in the cadet wing follow a normal distribution with mean 330 and standard deviation 50. 

**_4.1_** Find the probability a randomly selected cadet has a PFT score higher than 450. 

**_4.2_** Find the probability a randomly selected cadet has a PFT score within 2 standard deviations of the mean.

**_4.3_** Find $a$ and $b$ such that 90% of PFT scores will be between $a$ and $b$. 

**_4.4_** Find the probability a randomly selected cadet has a PFT score higher than 450 given he/she is among the top 10% of cadets. 

##### 4.1

Find the probability a randomly selected cadet has a PFT score higher than 450. 

$X$: PFT score of a randomly selected cadet. 

$X\sim \textsf{Norm}(\mu = 330, \sigma = 50)$

$E(X) = \mu = 330$

$V(X) = \sigma^2 = 2500$

Probability of PFT score higher than 450, $P(X>450)$:  

In [41]:
1-stats.norm.cdf(450,330,50)

0.008197535924596155
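
The same probability can be obtained by standardizing: a score of 450 is $z = (450-330)/50 = 2.4$ standard deviations above the mean, so $P(X>450) = P(Z>2.4)$ for a standard normal $Z$. A short sketch, with illustrative variable names:

```python
from scipy import stats

mu, sigma = 330, 50
z = (450 - mu) / sigma  # standardized score

direct = stats.norm.sf(450, loc=mu, scale=sigma)
standardized = stats.norm.sf(z)  # standard normal tail beyond z

print(z, direct, standardized)
```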

##### 4.2

Find the probability a randomly selected cadet has a PFT score within 2 standard deviations of the mean.

$X$: PFT score of a randomly selected cadet. 

$X\sim \textsf{Norm}(\mu = 330, \sigma = 50)$

$E(X) = \mu = 330$

$V(X) = \sigma^2 = 2500$

Probability of PFT score within 2 sds of mean, $P(230 \leq X \leq 430)$:  

In [38]:
stats.norm.cdf(430,330,50)-stats.norm.cdf(230,330,50)

0.9544997361036416
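
This value does not depend on the particular mean and standard deviation: any normal random variable falls within 2 standard deviations of its mean with probability about 0.954. The sketch below compares the PFT calculation with the standard normal; variable names are illustrative.

```python
from scipy import stats

k = 2  # number of standard deviations

standard = stats.norm.cdf(k) - stats.norm.cdf(-k)
pft = stats.norm.cdf(330 + k * 50, 330, 50) - stats.norm.cdf(330 - k * 50, 330, 50)

print(standard, pft)
```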

##### 4.3

Find $a$ and $b$ such that 90% of PFT scores will be between $a$ and $b$. 

$X$: PFT score of a randomly selected cadet. 

$X\sim \textsf{Norm}(\mu = 330, \sigma = 50)$

$E(X) = \mu = 330$

$V(X) = \sigma^2 = 2500$

90% interval of $X$:

In [51]:
print(stats.norm.ppf(0.05,330,50))
print(stats.norm.ppf(0.95,330,50))

247.75731865242636
412.2426813475736
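
Note that $a$ and $b$ are not unique; the answer above is the equal-tailed choice, with 5% of scores below $a$ and 5% above $b$. `scipy` offers the same equal-tailed interval through the `interval` method. The sketch below uses it and confirms the coverage; variable names are illustrative.

```python
from scipy import stats

mu, sigma = 330, 50

# Equal-tailed interval containing 90% of the distribution
a, b = stats.norm.interval(0.90, loc=mu, scale=sigma)

# Coverage check: probability mass between a and b
coverage = stats.norm.cdf(b, mu, sigma) - stats.norm.cdf(a, mu, sigma)

print(a, b, coverage)
```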


##### 4.4

Find the probability a randomly selected cadet has a PFT score higher than 450 given he/she is among the top 10% of cadets.  

$X$: PFT score of a randomly selected cadet. 

$X\sim \textsf{Norm}(\mu = 330, \sigma = 50)$

$E(X) = \mu = 330$

$V(X) = \sigma^2 = 2500$

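Probability of PFT score higher than 450 given the cadet is in the top 10%, $P(X > 450|X>x_{.9})$, where $P(X \leq x_{.9})=0.9$. Since $x_{.9}\approx 394 < 450$, the event $\{X>450\}$ is contained in $\{X>x_{.9}\}$, so $P(X > 450|X>x_{.9}) = \frac{P(X>450)}{0.1}$: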

In [42]:
(1-stats.norm.cdf(450,330,50))/0.1

0.08197535924596155
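
Dividing by 0.1 relies on the 90th percentile being below 450, so that every score above 450 is automatically in the top 10%. The sketch below computes the percentile explicitly and forms the conditional probability from its definition; variable names are illustrative.

```python
from scipy import stats

mu, sigma = 330, 50

x90 = stats.norm.ppf(0.9, mu, sigma)  # 90th percentile of PFT scores

# P(X > 450 and X > x90) is the tail beyond the larger of the two cutoffs
p_joint = stats.norm.sf(max(450, x90), mu, sigma)
p_cond = p_joint / stats.norm.sf(x90, mu, sigma)

print(x90, p_cond)
```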

#### Problem 5

Suppose time until computer errors on the F-35 follows a Gamma distribution with mean 20 hours and variance 10.  

**_5.1_** Find the probability that 50 hours pass without a computer error. 

**_5.2_** Find the probability that 75 hours pass without a computer error, given that 25 hours have already passed. Does the memoryless property apply to the Gamma distribution? 

**_5.3_** Find $a$ and $b$: There is a 95% probability time until next computer error will be between $a$ and $b$.  

##### 5.1

Find the probability that 50 hours pass without a computer error. 

$X$: time until next computer error. 

$X\sim \textsf{Gamma}(\alpha = 40, \lambda = 2)$

$E(X) = 20$

$V(X) = 10$
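
The parameters follow from the Gamma moments $E(X)=\alpha/\lambda$ and $V(X)=\alpha/\lambda^2$, which give $\lambda = E(X)/V(X) = 2$ and $\alpha = E(X)^2/V(X) = 40$. In `scipy`, `stats.gamma` takes the shape first, then `loc`, then `scale` $=1/\lambda$, so the scale is best passed by keyword. The sketch below, with illustrative variable names, recovers the parameters and confirms the moments.

```python
from scipy import stats

mean_target, var_target = 20, 10

rate = mean_target / var_target        # lambda = mean / variance
shape = mean_target ** 2 / var_target  # alpha = mean^2 / variance

# Pass scale by keyword: the third positional argument of stats.gamma is loc
errors = stats.gamma(shape, scale=1 / rate)

print(shape, rate)
print(errors.mean(), errors.var())
```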

Probability that 50 hours pass without error, $P(X\geq 50)$:  

In [49]:
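# scale must be passed by keyword: the third positional argument of stats.gamma is loc
# sf gives P(X > 50) directly and avoids cancellation in 1 - cdf
stats.gamma.sf(50,40,scale=1/2)

The result is about $3\times 10^{-12}$. This makes sense: 50 hours is roughly 9.5 standard deviations ($\sqrt{10}\approx 3.16$) above the mean of 20 hours.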

##### 5.2

Find the probability that 75 hours pass without a computer error, given that 25 hours have already passed. Does the memoryless property apply to the Gamma distribution?

$X$: time until next computer error. 

$X\sim \textsf{Gamma}(\alpha = 40, \lambda = 2)$

$E(X) = 20$

$V(X) = 10$

Probability that 75 hours pass without error, given 25 have already passed $P(X\geq 75|X\geq 25) = \frac{P(X\geq 75, X\geq 25)}{P(X\geq 25)} = \frac{P(X\geq 75)}{P(X\geq 25)}$:  

In [48]:
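# scale must be passed by keyword; sf avoids the cancellation in 1 - cdf for tiny tails
stats.gamma.sf(75,40,scale=1/2)/stats.gamma.sf(25,40,scale=1/2)

The result is vanishingly small (far below $10^{-20}$). If the Gamma distribution were memoryless, this conditional probability would equal $P(X\geq 50)$ from **_5.1_**. The two values differ by many orders of magnitude, so the memoryless property does not apply to the Gamma distribution.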

##### 5.3

Find $a$ and $b$: There is a 95% probability time until next computer error will be between $a$ and $b$.  

$X$: time until next computer error. 

$X\sim \textsf{Gamma}(\alpha = 40, \lambda = 2)$

$E(X) = 20$

$V(X) = 10$

95% interval of $X$:  

In [50]:
# scale must be passed by keyword: the third positional argument of stats.gamma is loc
print(round(stats.gamma.ppf(0.025,40,scale=1/2),4))
print(round(stats.gamma.ppf(0.975,40,scale=1/2),4))

14.2883
26.6571
