# Discrete Distributions Exercises

In [1]:
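import pandas as pd
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats resolves on every SciPy version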


# 1. 
(3.5.11)<br>
A computer system uses passwords that are exactly six characters, and each character is one of the 26 letters (a-z) or 10 integers (0-9).<br>
Suppose that 10,000 users of the system have unique passwords. A hacker randomly selects (with replacement) 100,000 passwords from the potential set, and a match to a user’s password is called a hit.<br>

## 1.A - What is the distribution of the number of hits?
### Answer
Let <em>X</em> be a random variable that denotes the number of hits, i.e. the number of selected passwords that match a user's password. Then <em>X</em> follows a binomial distribution, since the selections form a series of independent Bernoulli trials whose outcome is either "hit" or "no hit".<br>
<em>n</em> is the number of trials, which in this case is the number of passwords the hacker selects, 100,000.<br>
<em>p</em> is the probability of a hit on a single trial. Each of the six positions can hold any of 36 characters and characters may repeat, so there are $36^6$ possible passwords. Of these, 10,000 belong to users, so $p = 10^4 / 36^6$.

$$ 
    X \sim B(n=100000, p=\frac{10^4}{36^6})
$$

## 1.B - What is the probability of no hits?
Getting no hits is the same as $X = 0$ for the binomial distribution. We calculate the probability using the PMF.

In [2]:
x = 0
n = 100000
p = pow(10,4) / pow(36,6)

p_no_hits = sp.stats.binom.pmf(x, n, p)
round(p_no_hits, 2)

0.63

## 1.C - What are the mean and variance of the hits?

In [3]:
hits_mean = sp.stats.binom.mean(n, p)
hits_variance = sp.stats.binom.var(n, p)

print(f"Mean: {round(hits_mean, 2)}")
print(f"Variance: {round(hits_variance, 2)}")

Mean: 0.46
Variance: 0.46
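
The mean and variance above are nearly equal, which is what one would expect when <em>n</em> is large and <em>p</em> is tiny: the binomial is then close to a Poisson distribution with $\lambda = np$. The following is a minimal sketch of that cross-check using only the standard library; the variable names are illustrative and not part of the original exercise.

```python
import math

n = 100_000                 # passwords selected by the hacker
p = 10**4 / 36**6           # probability that one selection is a hit

lam = n * p                 # Poisson rate used for the approximation (assumption: n large, p small)

exact_no_hits = (1 - p) ** n        # binomial P(X = 0)
approx_no_hits = math.exp(-lam)     # Poisson P(X = 0)

print(f"lambda = {lam:.4f}")
print(f"exact  P(X = 0) = {exact_no_hits:.6f}")
print(f"approx P(X = 0) = {approx_no_hits:.6f}")
```

The two probabilities should agree to several decimal places, which is why the mean and variance of the hits come out almost identical.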


# 2
(3.5.13)<br>
Because not all airline passengers show up for their reserved seat, an airline sells 125 tickets
for a flight that holds only 120 passengers.<br>
The probability that a passenger does not show up is 0.10, and the passengers behave independently.

## 2.A - What is the probability that every passenger who shows up can take the flight?

### Answer
We know that the passengers behave independently and that there are two outcomes for each: show up or do not show up.<br>
Let <em>X</em> be a random variable that denotes the number of passengers who do not show up. Then <em>X</em> follows a binomial distribution, since we have a series of independent Bernoulli trials where the outcome for each ticket sold is either "show up" or "do not show up".<br>
<em>n</em> is the number of trials, which in this case is the number of tickets sold for the flight<br>
<em>p</em> is the probability of not showing up, which from the description is 0.10<br>

$$
 X \sim B(n=125, p=0.10)
$$

For every passenger who shows up to get one of the 120 seats when 125 tickets have been sold, at least 5 of the ticket holders must not show up. <br>
$$P(X \geq 5) = 1 - P(X < 5) = 1 - P(X \leq 4)$$

In [4]:
n = 125
p = 0.1
p_all_can_take_flight = 1 - sp.stats.binom.cdf(4, n, p)
print(f"{round(p_all_can_take_flight, 3)}")

0.996
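
As a side note, `1 - cdf(k)` can also be written with the survival function `sf(k)`, which SciPy defines as `1 - cdf` and describes as sometimes more accurate. A minimal sketch, assuming the same `n` and `p` as in the cell above; the variable names `via_cdf` and `via_sf` are illustrative:

```python
from scipy import stats

n = 125   # tickets sold
p = 0.1   # probability that a ticket holder does not show up

# P(X >= 5) written two equivalent ways
via_cdf = 1 - stats.binom.cdf(4, n, p)
via_sf = stats.binom.sf(4, n, p)   # survival function: P(X > 4)

print(round(via_cdf, 3), round(via_sf, 3))
```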


## 2.B - What is the probability that a flight departs with empty seats?
### Answer
If the flight has 120 seats and we sell 125 tickets, then at least 6 ticket holders must not show up for the flight to depart with empty seats.
$$ P(X \geq 6) = 1 - P(X < 6) = 1 - P(X \leq 5) $$

In [5]:
p_flight_with_empty_seats = 1 - sp.stats.binom.cdf(5, n, p)
print(f"{round(p_flight_with_empty_seats, 3)}")

0.989


# 3 
(3.6.6)<br>
A player of a video game is confronted with a series of opponents and has an 80% probability of defeating each one.<br> 
Success with any opponent is independent of previous encounters. Until defeated, the player continues to contest opponents.

## 3.A - What is the PMF of the number of opponents contested in the game?
### Answer
We essentially have a geometric distribution, which is a special case of the negative binomial distribution where we count the number of trials until the first success.<br> 
Here a trial is an encounter with an opponent and the "success" that ends the game is the player being defeated.<br>
Quick overview: 
- Binomial Distribution
    - Models # of successes in a fixed number of trials
- Negative Binomial Distribution
    - Models # of trials until a fixed number of successes
- Geometric Distribution
    - Models # of trials until the first success

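The PMF of a geometric distribution is
$$
 P(X=x) = (1-p)^{x-1} p, \quad x = 1, 2, 3, \ldots
$$
With $p = 1 - 0.8 = 0.2$, the probability that the player is defeated in a given encounter, this becomes $P(X=x) = 0.8^{x-1} \cdot 0.2$.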
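
As a quick illustration, the geometric PMF can be evaluated directly. This is a sketch that assumes $p = 0.2$, the probability that the player loses an encounter; `geom_pmf` is a hypothetical helper, not a library function.

```python
def geom_pmf(x: int, p: float) -> float:
    """P(X = x) for a geometric variable counting trials up to and including the first success."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p


p = 0.2  # assumed: probability that the player is defeated in one encounter

for x in range(1, 6):
    print(x, round(geom_pmf(x, p), 4))

# The probabilities over x = 1, 2, ... should sum to (almost) 1
total = sum(geom_pmf(x, p) for x in range(1, 200))
print(round(total, 6))
```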

## 3.B - What is the probability a player defeats at least two opponents in a game?
### Answer
A geometric distribution models the number of trials until the first success. Here each trial is an encounter with an opponent and the "success" is the player being defeated, so: <br>
Let <em>X</em> denote the number of opponents contested and <em>p</em> be the probability that the player is defeated in an encounter, which is $1-P_{defeat~opponent} = 1 - 0.8 = 0.2$ <br>
The last opponent contested is the one who defeats the player, so the player defeats $X - 1$ opponents. Defeating at least 2 opponents therefore means contesting at least 3:
$$
 P(X - 1 \geq 2) = P(X \geq 3) = 1 - P(X \leq 2) = 0.8^2
$$

In [6]:
p = 0.2
p_defeat_2_or_more = 1 - sp.stats.geom.cdf(2, p)  # P(X >= 3): the first two opponents are both defeated
print(f"{round(p_defeat_2_or_more, 2)}")

0.64


## 3.C - What is the expected number of opponents contested in a game?
The expected number of opponents contested, counting the final encounter in which the player is defeated, is the mean of the geometric distribution with $p = 0.2$, i.e. $1/p$.

In [7]:
expected_opponents = sp.stats.geom.mean(0.2) # Number of expected trials until first success
print(f"{expected_opponents}")

5.0


## 3.D - What is the probability that a player contests four or more opponents in a game?
### Answer
We are looking for 
$$
P(X \geq 4) = 1 - P(X < 4) = 1 - P(X \leq 3)
$$

In [8]:
p = 0.2
p_four_or_more = 1- sp.stats.geom.cdf(3, p)
print(f"{p_four_or_more}")

0.512


## 3.E - What is the expected number of game plays until a player contests four or more opponents?
### Answer
The probability of contesting 4 or more opponents in a single game is 
$$
P(X \geq 4)
$$
We now let <em>Y</em> be a random variable that denotes the number of games played until the first game in which the player contests 4 or more opponents. Games are independent, so <em>Y</em> is geometric with success probability $P(X \geq 4)$, and the expected number of games is the mean of this new geometric distribution.

In [9]:
expected_games_until_4_or_more_contested = sp.stats.geom.mean(p_four_or_more)
print(f"{round(expected_games_until_4_or_more_contested,2)}") # 1.95 ~ 2 games 

1.95
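
The value can be sanity-checked by simulation. The sketch below is a Monte Carlo estimate under the stated assumptions (80% chance of beating each opponent, independent encounters and games); the function names, seed and number of repetitions are arbitrary choices for illustration.

```python
import random


def opponents_contested(rng: random.Random, p_win: float = 0.8) -> int:
    """Play one game and return how many opponents were contested (including the one that wins)."""
    contested = 1
    while rng.random() < p_win:   # player defeats the current opponent and faces another
        contested += 1
    return contested


def games_until_four_or_more(rng: random.Random) -> int:
    """Count games played until the first game with four or more opponents contested."""
    games = 1
    while opponents_contested(rng) < 4:
        games += 1
    return games


rng = random.Random(0)            # fixed seed so the run is repeatable
repetitions = 20_000
estimate = sum(games_until_four_or_more(rng) for _ in range(repetitions)) / repetitions
print(round(estimate, 2))         # should land close to 1 / 0.512
```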


# 4 
(3.8.5) <br>
Astronomers treat the number of stars in a given volume of space as a Poisson random variable.<br>
The density in the Milky Way Galaxy in the vicinity of our solar system is one star per 16 cubic light-years.

## 4.A - What is the probability of two or more stars in 16 cubic light-years?
### Answer
We know the number of stars in a given volume of space is a Poisson random variable. <br>
Let <em>X</em> be a random variable that denotes the number of stars in 16 cubic light-years. At a density of one star per 16 cubic light-years, $\lambda = 1$. We then need to find
$$
P(X \geq 2) = 1-P(X < 2) = 1-P(X \leq 1)
$$

In [12]:
mu = 1
p_two_or_more = 1 - sp.stats.poisson.cdf(1, mu)
print(f"{p_two_or_more}")

0.26424111765711533


# 5
(3.S16)<br>
A congested computer network has a 1% chance of losing a data packet that must be resent,<br>
and packet losses are independent events. An e-mail message requires 100 packets.

## 5.A - What is the distribution of the number of packets in an e-mail message that must be resent? Include the parameter values.
### Answer
Each data packet can either be sent (not resent) or resent <strong>and</strong> the packet losses are independent.<br>
Since we have a binary outcome of each trial (sending a single packet) and those events are independent, then the number of packets in an e-mail that must be resent follows a Binomial Distribution<br>
Let <em>X</em> be a RV that denotes the number of data packets that must be resent, then <em>X</em> follows a Binomial Distribution where<br>
$$
X \sim B(n=100, p=0.01)
$$
<em>n</em> is the number of packets sent<br>
<em>p</em> is the probability of losing a packet, which must then be resent

## 5.B - What is the probability that at least one packet is resent?
### Answer
The probability of resending at least one packet is 
$$
P(X \geq 1) = 1 - P(X = 0)
$$

In [17]:
p = 0.01 
n = 100
p_one_or_more = 1-sp.stats.binom.cdf(0, n, p)
print(f"{round(p_one_or_more, 2)}")

0.63
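
Because only $X = 0$ is excluded, the result can also be checked by hand with $n = 100$ and $p = 0.01$:
$$
P(X \geq 1) = 1 - (1 - p)^{n} = 1 - 0.99^{100} \approx 0.634
$$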


## 5.C - What is the probability that two or more packets are resent?
### Answer
$$
P(X \geq 2) = 1 - P(X \leq 1)
$$

In [18]:
p_two_or_more = 1-sp.stats.binom.cdf(1,n,p)
print(f"{round(p_two_or_more, 2)}")

0.26


## 5.D What are the mean and standard deviation of the number of packets that must be resent?
### Answer

In [21]:
mean = sp.stats.binom.mean(n, p)
standard_deviation = sp.stats.binom.std(n,p)
print(f"Mean: {round(mean, 4)}, Standard Deviation: {standard_deviation}")

Mean: 1.0, Standard Deviation: 0.99498743710662


# 6 
(3.S28) A manufacturer of a consumer electronics product expects 2% of units to fail during the warranty
period. A sample of 500 independent units is tracked for warranty performance.

## 6.A What is the probability that none fails during the warranty period?
### Answer
The units are independent, so the failure of one unit does not affect the others. <br>
Each unit either fails or does not fail during the warranty period, i.e. a binary outcome, and the failures are independent.<br>
Let <em>X</em> be a RV that denotes the number of failures in the warranty period, then <em>X</em> follows a binomial distribution <br>
$$
X \sim B(n=500, p=0.02)
$$
<em>n</em> is the number of units<br>
<em>p</em> is the probability of failure<br>

We are looking for 
$$
P(X = 0)
$$

In [24]:
n = 500
p = 0.02
p_no_failures = sp.stats.binom.pmf(0, n, p)  # P(X = 0)
print(f"{p_no_failures:.2e}")  # scientific notation: the value is too small to show with two decimals

4.10e-05


## 6.B  What is the expected number of failures during the warranty period?
### Answer
The expected number of failures during the warranty period is the mean of the distribution

In [25]:
expected_failure_count = sp.stats.binom.mean(n, p)
print(f"{expected_failure_count}")

10.0


## 6.C What is the probability that more than two units fail during the warranty period?
### Answer
We are looking for 
$$
P(X > 2) = 1-P(X \leq 2)
$$

In [27]:
p_more_than_two = 1 - sp.stats.binom.cdf(2, n, p)
print(f"{round(p_more_than_two, 4)}")  # four decimals, so the result is not rounded up to 1.0

0.9974


# 7
The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are known to have
contracted this disease, what is the probability that

## 7.A at least 10 survive
### Answer
It is assumed that the recovery of a patient is independent of the other patients. In this situation a patient either recovers or does not, i.e. a binary outcome. <br>
Let <em>X</em> be a RV that denotes the number of recovered patients, then <em>X</em> follows a binomial distribution<br>
$$
X \sim B(n=15, p=0.4)
$$
<em>n</em> is the number of patients<br>
<em>p</em> is the probability of recovery<br>

$$
P(X \geq 10) = 1 - P(X \leq 9)
$$

In [29]:
n = 15
p = 0.4
p_atleast_10_recovery = 1 - sp.stats.binom.cdf(9, n, p)  # complement of P(X <= 9)
print(f"{round(p_atleast_10_recovery, 2)}")

0.03


## 7.B from 3 to 8 survive
### Answer
$$
P(3 \leq X \leq 8) = P(X < 9) - P(X \leq 2) = P(X \leq 8) - P(X \leq 2)
$$

In [30]:
p_3_to_8_recovery = sp.stats.binom.cdf(8, n, p) - sp.stats.binom.cdf(2, n, p)
print(f"{round(p_3_to_8_recovery, 2)}")

0.88
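
An equivalent way to get the same probability is to add up the PMF over the values 3 through 8. The sketch below does this with the standard library only; `binom_pmf` is a hypothetical helper written for this check.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 15    # patients
p = 0.4   # probability that a patient recovers

p_3_to_8 = sum(binom_pmf(k, n, p) for k in range(3, 9))  # k = 3, ..., 8
print(round(p_3_to_8, 2))
```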


## 7.C Exactly 5
### Answer
$$
P(X = 5)
$$

In [31]:
p_5 = sp.stats.binom.pmf(5, n ,p)
round(p_5,2)

0.19

## 7.D Find the mean and variance
### Answer

In [33]:
mean = sp.stats.binom.mean(n, p)
variance = sp.stats.binom.var(n, p)
print(f"Mean: {mean}, Variance: {round(variance, 2)}")

Mean: 6.0, Variance: 3.6


# 8
A large chain retailer purchases a certain kind of electronic device from a manufacturer.
The manufacturer indicates that the defective rate of the device is 3%.

## 8.A The inspector of the retailer randomly picks 20 items from a shipment. What is the probability that there will be at least one defective item among them?
### Answer
It is assumed that items are defective independently of each other. Each item is either defective or not, i.e. a binary outcome.<br>
Let <em>X</em> be a RV that denotes the number of defective items, then <em>X</em> follows a binomial distribution <br>
$$
X \sim B(n=20, p=0.03)
$$
<em>n</em> is the number of items picked from the shipment<br>
<em>p</em> is the probability of an item being defective<br>

$$
P(X \geq 1) = 1 - P(X = 0)
$$

In [40]:
n = 20
p = 0.03
p_atleast_one_defective = 1 - sp.stats.binom.cdf(0, n, p)
round(p_atleast_one_defective, 3)

0.456

## 8.B Suppose that the retailer receives 10 shipments in a month and the inspector randomly tests 20 devices per shipment. What is the probability that there will be 3 shipments containing at least one defective device?
### Answer
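Now let <em>Y</em> be a RV that denotes the number of shipments, out of the 10 received, in which the inspector finds at least one defective device among the 20 tested. Assuming the shipments are independent, each shipment is a Bernoulli trial with success probability $P(X \geq 1)$ from 8.A, so <em>Y</em> follows a binomial distribution<br> 
$$
Y \sim B(n=10, p=P(X \geq 1))
$$

We are looking for the probability of exactly 3 such shipments
$$
P(Y = 3)
$$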

In [41]:
p = p_atleast_one_defective
n = 10
p_3_shipments_atleast_one_defective =  sp.stats.binom.pmf(3, n, p)
round(p_3_shipments_atleast_one_defective, 2)

0.16
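
The two stages of this calculation can be written out explicitly. The sketch below uses only the standard library and assumes, as the answer does, that shipments are independent; the variable names are illustrative.

```python
from math import comb

# Stage 1: one shipment, 20 devices tested, 3% defective rate
p_defective = 0.03
devices_tested = 20
p_flagged = 1 - (1 - p_defective) ** devices_tested   # P(at least one defective in the sample)

# Stage 2: 10 shipments, each flagged independently with probability p_flagged
shipments = 10
k = 3
p_exactly_3 = comb(shipments, k) * p_flagged**k * (1 - p_flagged) ** (shipments - k)

print(round(p_flagged, 3), round(p_exactly_3, 2))
```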

# 9
High flows result in the closure of a causeway. From past records, the road is closed for this reason on 10
days during a 20-year period.<br>
At an adjoining village, there is concern about the closure of the causeway because it provides the only access. <br>
The villagers assume that the probability of a closure of the road for more than one day during a year is less than 0.10. <br>
Is this correct? Please show using the Poisson distribution.

## Answer to 9
Let <em>X</em> be a RV that denotes the number of days the road is closed during a year, then <em>X</em> follows a Poisson distribution<br>
$$
X \sim \mathrm{Poi}\left(\lambda = \frac{10~\text{days}}{20~\text{years}} = 0.5~\text{days per year}\right)
$$
$\lambda$ denotes the average number of days the road is closed in a year<br>

We must check whether
$$
P(X>1) < 0.10 \iff 1-P(X \leq 1) < 0.10
$$
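
Written out with the Poisson PMF $P(X = k) = e^{-\lambda}\lambda^{k}/k!$ and $\lambda = 0.5$, the probability in question is
$$
P(X > 1) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-0.5}\,(1 + 0.5) \approx 0.0902
$$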


In [46]:
_lambda = 10/20
p_more_than_one_day = 1-sp.stats.poisson.cdf(1, _lambda) 
villagers_correct  = p_more_than_one_day < 0.10
print(f"Villagers are correct: {villagers_correct}, P(x > 1): {round(p_more_than_one_day,4)}")

Villagers are correct: True, P(x > 1): 0.0902


# 10
A company performs inspection on shipments from suppliers in order to detect nonconforming products.<br>
Assume a lot contains 1000 items and 1% is nonconforming.<br> 
Assuming that the number of nonconforming products in the sample is binomial,<br> what sample size is needed so that the probability of choosing at
least one nonconforming item in the sample is at least 0.9?

## Answer to 10
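A possible approach, sketched here under the binomial assumption stated in the exercise: let <em>X</em> be a RV that denotes the number of nonconforming items in a sample of size <em>n</em>, so $X \sim B(n, p=0.01)$. We need the smallest <em>n</em> such that
$$
P(X \geq 1) = 1 - 0.99^{n} \geq 0.9 \iff n \geq \frac{\ln 0.1}{\ln 0.99} \approx 229.1
$$
The code below evaluates the closed form and cross-checks it with a direct search; the variable names are illustrative.

```python
import math

p = 0.01        # fraction of nonconforming items in the lot
target = 0.9    # required probability of at least one nonconforming item in the sample

# Closed form: smallest integer n with 1 - (1 - p)**n >= target
n_closed_form = math.ceil(math.log(1 - target) / math.log(1 - p))

# Direct search as a cross-check
n = 1
while 1 - (1 - p) ** n < target:
    n += 1

print(n, n_closed_form)
```

Under these assumptions a sample of 230 items is the smallest that meets the requirement.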


# 11
The number of errors in a textbook follows a Poisson distribution with a mean of 0.01 error per page.
What is the probability that there are three or fewer errors in 100 pages?

## Answer to 11
Let <em>X</em> be a RV that denotes the number of errors in 100 pages, then <em>X</em> follows a Poisson distribution<br>
$$
X \sim \mathrm{Poi}\left(\lambda = 0.01~\frac{\text{errors}}{\text{page}} \times 100~\text{pages} = 1\right)
$$
$\lambda$ is the average number of errors per 100 pages

We are looking for
$$
P(X \leq 3)
$$

In [None]:
errors_per_page = 0.01
pages = 100
_lambda = errors_per_page*pages
p_3_or_less = sp.stats.poisson.cdf(3, _lambda)
round(p_3_or_less,2)
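
The cell above has no recorded output. As a cross-check that does not depend on SciPy, the sketch below sums the Poisson PMF term by term; `poisson_cdf` is a hypothetical helper written for this purpose.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


lam = 0.01 * 100   # 0.01 errors per page over 100 pages
print(round(poisson_cdf(3, lam), 4))
```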