In [2]:
import numpy as np
from scipy.stats import binom, norm

### Discrete Random Variable : Binomial Random Variable

#### Scenario : A multiple choice test has 10 questions, each with 5 possible answers, only one of which is correct. A student who did not study is absolutely clueless, and therefore uses an independent random guess to answer each of the 10 questions.
#### Let X be the number of questions the student gets right.

#### X is a binomial random variable with n = 10 and p = 0.2 (one correct answer out of 5 choices, so p = 1/5)
#### Why is X binomial?
1. Fixed number of trials (n=10)
2. All trials are independent.
3. Only two outcomes per trial (the student gets the question either right or wrong)
4. Probability of success is the same in each trial (p = 0.2)
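
The four conditions above can be checked informally with a small simulation. This is only a sketch: the number of simulated students, the seed, and the convention that option 0 is the correct answer are arbitrary assumptions, not part of the original scenario.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed, chosen arbitrarily
n_students, n_questions, n_choices = 100_000, 10, 5

# Each guess is a uniform pick among the 5 options; treat option 0 as the correct one.
guesses = rng.integers(0, n_choices, size=(n_students, n_questions))
scores = (guesses == 0).sum(axis=1)   # number of correct answers per simulated student

print("Mean score:", scores.mean())                    # theory: n * p = 2
print("Share with score <= 4:", (scores <= 4).mean())  # compare with P(X <= 4)
```

The simulated mean should land close to n * p = 2, which is what a Binomial(10, 0.2) variable predicts.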

#### X can take any integer value from 0 to 10: the student may get no questions right, all 10 right, or any number in between.
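
To make the range of X concrete, the following sketch lists P(X = k) for every possible value of k using `binom.pmf`. The variable names `n`, `p`, `k` and `pmf` are illustrative only.

```python
import numpy as np
from scipy.stats import binom

n, p = 10, 0.2             # values from the scenario above
k = np.arange(n + 1)       # support of X: 0, 1, ..., 10
pmf = binom.pmf(k, n, p)   # P(X = k) for each k

for value, prob in zip(k, pmf):
    print(f"P(X = {value:2d}) = {prob:.6f}")

print("Total probability:", pmf.sum())            # should be 1 up to rounding
print("Expected value n * p:", binom.mean(n, p))
```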

#### Problem 1 : What is the probability that the student will get no more than 4 questions right, P(X ≤ 4)?

#### Here we will use SciPy's binom distribution. Calling binom(n, p) returns a frozen distribution object; its cdf method computes the Cumulative Distribution Function, P(X ≤ k), for a given value k.

In [20]:
# Frozen Binomial(n=10, p=0.2) distribution object (not a CDF value, despite the name)
X_cdf = binom(10, 0.2)

#### Solution 1 : The student should get no more than 4 questions right, so X takes a value from 0 to 4, i.e. 0, 1, 2, 3 or 4. We pass 4 to the cdf method of the binom distribution, which returns the cumulative probability P(X ≤ 4) = P(X = 0) + ... + P(X = 4).

In [21]:
X_4 = X_cdf.cdf(4)
X_4

0.9672065024000001
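
As a cross-check, the cumulative probability can also be obtained by summing the individual probabilities P(X = 0) through P(X = 4). A minimal sketch, where the names `dist`, `by_sum` and `by_cdf` are illustrative:

```python
import numpy as np
from scipy.stats import binom

dist = binom(10, 0.2)                   # same parameters as the cell above

by_sum = dist.pmf(np.arange(5)).sum()   # P(X=0) + P(X=1) + ... + P(X=4)
by_cdf = dist.cdf(4)                    # P(X <= 4) in one call

print(by_sum, by_cdf)
print("Match:", np.isclose(by_sum, by_cdf))
```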

#### Problem 2 : What is the probability that the student gets more than 2 questions right: P(X > 2)?

#### Solution 2 : The student should get more than 2 questions right, so X takes a value from 3 to 10, i.e. 3, 4, 5, 6, 7, 8, 9 or 10. The cdf method gives P(X ≤ k), so P(X > 2) = P(X ≤ 10) - P(X ≤ 2). Because 10 is the largest value X can take, cdf(10) equals 1, so this difference is the same as 1 - P(X ≤ 2).

In [22]:
print(X_cdf.cdf(10) - X_cdf.cdf(2))

0.3222004735999999
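
The same upper-tail probability can be computed with the complement rule or with SciPy's survival function `sf`, which returns P(X > k) directly. A short sketch with illustrative variable names:

```python
from scipy.stats import binom

dist = binom(10, 0.2)              # same parameters as above

p_complement = 1 - dist.cdf(2)     # 1 - P(X <= 2)
p_survival = dist.sf(2)            # survival function: P(X > 2)

print(p_complement, p_survival)
```

Both values should agree with the result above up to floating-point rounding.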


### Continuous Random Variable

#### Finding z-score given probability

In [11]:
norm.ppf(.9798)

2.0496356380817193

#### Finding probability given z-score

In [14]:
norm.cdf(2.0496356380817193)

0.9798
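
The two cells above are inverse operations: `norm.ppf` maps a left-tail probability to a z-score, and `norm.cdf` maps it back. The sketch below checks the round trip on a few probabilities chosen here purely for illustration.

```python
import numpy as np
from scipy.stats import norm

probs = np.array([0.025, 0.5, 0.9798, 0.975])   # illustrative left-tail probabilities

z = norm.ppf(probs)        # z-score for each probability
recovered = norm.cdf(z)    # should reproduce probs up to float error

print(z)
print(np.allclose(recovered, probs))
```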

#### Problem 1 : Adult male height (X) follows (approximately) a normal distribution with a mean of 69 inches and a standard deviation of 2.8 inches. (a) What proportion of males are less than 65 inches tall? In other words, what is P(X < 65)? (b) What proportion of males are more than 75 inches tall? In other words, what is P(X > 75)? (c) What proportion of males are between 66 and 72 inches tall? In other words, what is P(66 < X < 72)?

#### Solution 1 : Here values of the continuous random variable are given. First we standardize each value by computing its z-score: z = (x - mean) / sd. Then we look up the probability with the norm.cdf function. Because norm.cdf is cumulative and returns the "less than" probability P(Z < z), we adjust for the question: use 1 - cdf for a "more than" probability and the difference of two cdf values for a "between" probability.

In [48]:
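def cal_z_score(mean, sd, x):
    # Standardize: convert a raw value x into a z-score
    return (x - mean) / sd

def cal_normal_value(mean, sd, z):
    # Unstandardize: convert a z-score back into a raw value
    return (sd * z) + mean

def get_z_score_from_prob(prob):
    # z-score whose left-tail (cumulative) probability is prob
    return norm.ppf(prob)

def get_prob_from_z_score(z):
    # Left-tail (cumulative) probability P(Z < z)
    return norm.cdf(z)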

In [49]:
mean_height = 69.0
sd_height = 2.8
x = [65, 75, 66, 72]  # heights (inches) needed for parts a, b and c
p = []                # p[i] will hold P(X < x[i])
for i in x:
    p.append(get_prob_from_z_score(cal_z_score(mean_height, sd_height, i)))
p

[0.07656372550983476,
 0.9839377143961717,
 0.14198838587545587,
 0.8580116141245442]

In [30]:
# Answer for part a: P(X < 65), as a percentage
p[0] * 100

7.656372550983476

In [31]:
# Answer for part b: P(X > 75) = 1 - P(X < 75), as a percentage
(1 - p[1]) * 100

1.6062285603828275

In [32]:
# Answer for part c: P(66 < X < 72) = P(X < 72) - P(X < 66), as a percentage
(p[3] - p[2]) * 100

71.60232282490884
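
An alternative that skips the manual standardization step is to pass the mean and standard deviation to `norm` through its `loc` and `scale` parameters. The sketch below recomputes parts (a) to (c) that way; the name `height` and the `p_a`, `p_b`, `p_c` variables are illustrative. The results are proportions, not percentages.

```python
from scipy.stats import norm

height = norm(loc=69.0, scale=2.8)      # frozen normal: mean 69 in, sd 2.8 in

p_a = height.cdf(65)                    # P(X < 65)
p_b = height.sf(75)                     # P(X > 75) = 1 - P(X <= 75)
p_c = height.cdf(72) - height.cdf(66)   # P(66 < X < 72)

print(p_a, p_b, p_c)
```

Multiplying each value by 100 should reproduce the percentages computed above.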

#### Problem 2 : Recall that adult male height follows a normal distribution with a mean of 69 inches and a standard deviation of 2.8 inches. (a) How tall must a male be in order to be among the shortest 0.5% of males? (b) How tall must a male be in order to be among the tallest 0.25% of males?

#### Solution 2 : Here probabilities are given and we need to find the corresponding height. First we get the z-score with the norm.ppf() function, then unstandardize it: x = sd * z-score + mean. norm.ppf expects a left-tail (cumulative) probability, so for the tallest 0.25% we pass 1 - 0.0025 = 0.9975.

In [45]:
given_p = [0.005, 1 - 0.0025]  # left-tail probabilities for parts a and b

In [46]:
# Answer for part a:
cal_normal_value(mean_height, sd_height, get_z_score_from_prob(given_p[0]))

61.78767795006308

In [47]:
# Answer for part b:
cal_normal_value(mean_height, sd_height, get_z_score_from_prob(given_p[1]))

76.85969455136267
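
The same cutoffs can be obtained without standardizing by hand, assuming a frozen `norm` distribution built with `loc` and `scale`. In this sketch `ppf` handles the lower tail and `isf` (inverse survival function) handles the upper tail, which avoids writing `1 - 0.0025` explicitly. Variable names are illustrative.

```python
from scipy.stats import norm

height = norm(loc=69.0, scale=2.8)     # frozen normal: mean 69 in, sd 2.8 in

shortest_cutoff = height.ppf(0.005)    # 0.5% of males are shorter than this
tallest_cutoff = height.isf(0.0025)    # 0.25% of males are taller than this

print(shortest_cutoff, tallest_cutoff)

# Sanity check: feeding the cutoffs back in should recover the tail probabilities.
print(height.cdf(shortest_cutoff), height.sf(tallest_cutoff))
```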