### Normal Distribution

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/1200px-Standard_deviation_diagram.svg.png" width="400">

$P(a \leq X \leq b) = \int_a^b \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} dx$
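The integral above has no elementary closed form, so in practice it is evaluated numerically. The following is a minimal sketch, assuming composite Simpson's rule is accurate enough for the intervals of interest; the helper names `normal_pdf` and `normal_prob` and the default `n=1000` are illustrative choices, not part of the original notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_prob(a, b, mu, sigma, n=1000):
    """Approximate P(a <= X <= b) with composite Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, n):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        weight = 4 if i % 2 else 2
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

# Probability of falling within one standard deviation of the mean
print(round(normal_prob(-1, 1, 0, 1), 4))  # 0.6827
```

Any interval can be passed the same way, for example `normal_prob(68, 82, 75, 7)` for a mean of 75 and a standard deviation of 7.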

___
*Example*

The test scores of a physics class with 800 students are normally distributed with a mean of 75 and a standard deviation of 7.
1. What percentage of the class has a test score between 68 and 82?
2. Approximately how many students have a test score between 61 and 89?
3. What is the probability that a student chosen at random has a test score between 54 and 75?
4. Approximately how many students have a test score greater than or equal to 96?

*Solution*

$\mu = 75 \\
\sigma = 7$

![image-3.png](attachment:image-3.png)

1. Percentage of the class with a test score between 68 and 82, i.e. within $\mu \pm \sigma$:
$\\ P(68 \leq X \leq 82) = 34.134\% + 34.134\% = 68.268\%\\$
Or, by the formula above:
$P(68 \leq X \leq 82) = \int_{68}^{82} \frac{e^{-\frac{(x - 75)^2}{2\cdot 7^2}}}{7 \sqrt{2\pi}}dx = 0.68268949$


2. Number of students with a test score between 61 and 89, i.e. within $\mu \pm 2\sigma$:
$\\P(61 \leq X \leq 89) = 13.591\% + 34.134\% + 34.134\% + 13.591\% = 95.45\% \\
P(61 \leq X \leq 89) = \int_{61}^{89} \frac{e^{-\frac{(x - 75)^2}{2\cdot 7^2}}}{7 \sqrt{2\pi}}dx = 95.44997\% \\$
$N = 0.9544997 \times 800 = 763.6 \approx 764$ students


3. Probability that a randomly chosen student has a test score between 54 and 75, i.e. between $\mu - 3\sigma$ and $\mu$:
$P(54 \leq X \leq 75) = 2.14\% + 13.591\% + 34.134\% = 49.865\% \\
P(54 \leq X \leq 75) = \int_{54}^{75} \frac{e^{-\frac{(x - 75)^2}{2\cdot 7^2}}}{7 \sqrt{2\pi}}dx = 49.865\%$


4. Number of students with a test score greater than or equal to 96, i.e. at least $3\sigma$ above the mean:
$P(X \geq 96) = 0.135\% \\
P(X \geq 96) = \int_{96}^{\infty} \frac{e^{-\frac{(x - 75)^2}{2\cdot 7^2}}}{7 \sqrt{2\pi}}dx = 0.13498903\% \\$
$N = 0.0013498903 \times 800 = 1.0799 \approx 1$ student
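The four answers can be cross-checked in a few lines. This is a sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Test scores: mean 75, standard deviation 7, class of 800 students
scores = NormalDist(mu=75, sigma=7)
n_students = 800

p1 = scores.cdf(82) - scores.cdf(68)   # between 68 and 82
p2 = scores.cdf(89) - scores.cdf(61)   # between 61 and 89
p3 = scores.cdf(75) - scores.cdf(54)   # between 54 and 75
p4 = 1 - scores.cdf(96)                # 96 or higher

print('1. {:.3%} of the class'.format(p1))
print('2. about {:.0f} students'.format(p2 * n_students))
print('3. probability {:.5f}'.format(p3))
print('4. about {:.0f} student(s)'.format(p4 * n_students))
```

Each interval probability is a difference of two cdf values, which is the same area the integrals above compute.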

### Cumulative Probability

A cumulative distribution function (cdf) gives the area under the density curve to the left of a point of interest, that is, the accumulated probability $P(X \leq x)$. For continuous probability distributions, probability equals area under the curve, and the total area is 1.

The probability density function (pdf), $f(x)$, describes the shape of the distribution (for example uniform, exponential, or normal).

#### Uniform
$f(x) = \frac{1}{b - a}, \quad a \leq x \leq b$ - pdf

$A_{L} = P(X \leq x) = \frac{x - a}{b - a}, \quad a \leq x \leq b$ - cdf
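As a small illustration of the uniform pdf and cdf, here is a sketch in plain Python. The function names and the interval $[0, 10]$ are hypothetical values chosen only for the example.

```python
def uniform_pdf(x, a, b):
    """Density of a uniform distribution on [a, b]; zero outside the interval."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical example: X uniform on [0, 10]
print(uniform_pdf(2.5, 0, 10))                           # 0.1
print(uniform_cdf(2.5, 0, 10))                           # 0.25
print(uniform_cdf(7.5, 0, 10) - uniform_cdf(2.5, 0, 10)) # 0.5
```

The last line shows that the probability of an interval is the difference of two left-hand areas.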

#### Exponential

$f(x) = \lambda\cdot e^{-\lambda\cdot x}, \quad x \geq 0 \\
\lambda = \frac{1}{\mu}$

$A_{L} = P(X \leq x) = 1 - e^{-\lambda\cdot x} \\
A_{R} = 1 - A_{L} = e^{-\lambda\cdot x}$

$A = P(a < X < b) = P(X < b) - P(X < a) = (1 - e^{-\lambda\cdot b}) - (1 - e^{-\lambda\cdot a}) = e^{-\lambda\cdot a} - e^{-\lambda\cdot b}$

$A = P(a \leq X \leq b) = P(a < X < b)$. Here $P(X = a) = 0$ because a single point corresponds to a line, which has zero area under the curve.
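The exponential cdf and the interval probability can be sketched as follows. The function names and the mean of 5 are hypothetical, used only to illustrate $\lambda = 1 / \mu$.

```python
import math

def exp_cdf(x, lam):
    """Left-hand area P(X <= x) for an exponential distribution with rate lam."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def exp_interval(a, b, lam):
    """P(a < X < b) as the difference of two left-hand areas."""
    return exp_cdf(b, lam) - exp_cdf(a, lam)

# Hypothetical example: mean mu = 5, so lam = 1 / mu = 0.2
lam = 1 / 5
left = exp_cdf(3, lam)           # area to the left of 3
right = 1 - left                 # area to the right of 3
between = exp_interval(2, 6, lam)

print('P(X <= 3)    = {:.4f}'.format(left))
print('P(X > 3)     = {:.4f}'.format(right))
print('P(2 < X < 6) = {:.4f}'.format(between))
```

Because a single point carries no probability, the same function also gives $P(a \leq X \leq b)$.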

#### Normal

For a generic normal distribution with density $f$, mean $\mu$ and standard deviation $\sigma$, the cumulative distribution function is:
$F(x)=\Phi \left({\frac{x-\mu}{\sigma}}\right)={\frac{1}{2}}\left[1+\operatorname {erf} \left({\frac{x-\mu }{\sigma {\sqrt{2}}}}\right)\right]$

___
*Example 1*

In a certain plant, the time taken to assemble a car is a random variable, $X$, having a normal distribution with a mean of $20$ hours and a standard deviation of $2$ hours. What is the probability that a car can be assembled at this plant in:

1. Less than $19.5$ hours?
2. Between $20$ and $22$ hours?

*Code*

In [1]:
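import math

# Assembly time: normal with a mean of 20 hours and a standard deviation of 2 hours
m, sd = 20, 2
h1 = 19.5        # upper bound for question 1
h2, h3 = 20, 22  # interval bounds for question 2

# Normal cdf F(x) = P(X <= x), written with the error function
cdf = lambda x: 0.5 * (1 + math.erf((x - m) / (sd * math.sqrt(2))))

# Question 1: area to the left of 19.5
print('Probability that a car can be assembled in less than 19.5 hours = {:.3f}'.format(cdf(h1)))
# Question 2: area between 20 and 22, i.e. F(22) - F(20)
print('Probability that a car can be assembled in between 20 and 22 hours = {:.3f}'.format(cdf(h3) - cdf(h2)))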

Probability that a car can be assembled in less than 19.5 hours = 0.401
Probability that a car can be assembled in between 20 and 22 hours = 0.341


___
*Example 2*

The final grades for a Physics exam taken by a large group of students have a mean of $\mu = 70$ and a standard deviation of $\sigma = 10$. If we can approximate the distribution of these grades by a normal distribution, what percentage of the students:

1. Scored higher than $80$ (i.e., have a grade $> 80$)?
2. Passed the test (i.e., have a grade $\geq 60$)?
3. Failed the test (i.e., have a grade $< 60$)?

*Code*

In [3]:
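import math

# Exam grades: approximately normal with a mean of 70 and a standard deviation of 10
m, sd = 70, 10
s1 = 80  # threshold for question 1
s2 = 60  # pass mark for questions 2 and 3

# Normal cdf F(x) = P(X <= x), written with the error function
cdf = lambda x: 0.5 * (1 + math.erf((x - m) / (sd * math.sqrt(2))))

# Question 1: area to the right of 80, i.e. 1 - F(80)
print('Percentage of the students scored higher than 80 = {}'.format(round((1 - cdf(s1)) * 100, 2)))
# Question 2: area to the right of 60, i.e. 1 - F(60)
print('Percentage of the students passed the test = {}'.format(round((1 - cdf(s2)) * 100, 2)))
# Question 3: area to the left of 60, i.e. F(60)
print('Percentage of the students failed the test = {}'.format(round(cdf(s2) * 100, 2)))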

Percentage of the students scored higher than 80 = 15.87
Percentage of the students passed the test = 84.13
Percentage of the students failed the test = 15.87
