## <font color=red>Problem</font>
***

In a study of the heights of a city's residents, the data were found to follow an **approximately normal distribution**, with **mean 1.70 m** and **standard deviation 0.10 m**. Using this information, compute the following probabilities:

> **A.** Probability that a randomly selected person is shorter than 1.80 meters.

> **B.** Probability that a randomly selected person is between 1.60 meters and 1.80 meters tall.

> **C.** Probability that a randomly selected person is taller than 1.90 meters.

## <font color=green>Normal distribution</font>
***

The normal distribution is one of the most widely used distributions in statistics. It is a continuous distribution in which the frequency curve of a quantitative variable is bell-shaped and symmetrical about its mean.

![Normal](https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img001.png)

### Important features

1. It is symmetrical around the mean;

2. The total area under the curve equals 1 (100%);

3. The measures of central tendency (mean, median and mode) have the same value;

4. The tails of the curve extend to infinity in both directions and, theoretically, never touch the $x$ axis;

5. The standard deviation defines the flatness and width of the distribution. Wider and flatter curves have higher standard deviation values;

6. The distribution is defined by its mean and standard deviation;

7. The probability will always be equal to the area under the curve, bounded by the lower and upper limits.
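Some of the features listed above can be checked numerically. The sketch below is only an illustration: it assumes the mean and standard deviation from the problem statement, uses `scipy.integrate.quad` for the area, and integrates over ten standard deviations on each side of the mean, where the remaining tail area is negligible.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.70, 0.10  # values taken from the problem statement

# Feature 1: symmetry around the mean, f(mu - d) equals f(mu + d)
d = 0.25
symmetric = np.isclose(norm.pdf(mu - d, loc=mu, scale=sigma),
                       norm.pdf(mu + d, loc=mu, scale=sigma))

# Feature 2: the area under the curve is 1
# (integrated over mu +/- 10 sigma; the tails beyond that are negligible)
total_area, _ = quad(norm.pdf, mu - 10 * sigma, mu + 10 * sigma, args=(mu, sigma))

# Feature 5: a larger standard deviation gives a lower, wider curve
peak_narrow = norm.pdf(mu, loc=mu, scale=sigma)
peak_wide = norm.pdf(mu, loc=mu, scale=2 * sigma)

print(symmetric, round(total_area, 6), peak_narrow > peak_wide)
```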


# $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Where:

$x$ = normal variable

$\sigma$ = standard deviation

$\mu$ = mean
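As a quick check of the density function, the sketch below evaluates it by hand, with the normalizing constant written as $\frac{1}{\sigma\sqrt{2\pi}}$, and compares the result with `scipy.stats.norm.pdf`. The helper name `normal_pdf` is hypothetical, and the values of $\mu$ and $\sigma$ are the ones from the problem statement.

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

mu, sigma = 1.70, 0.10  # values from the problem statement

by_hand = normal_pdf(1.80, mu, sigma)
by_scipy = norm.pdf(1.80, loc=mu, scale=sigma)

# Both values should agree up to floating-point error
print(by_hand, by_scipy)
```

Note that a density value is not a probability: it can exceed 1 when $\sigma$ is small, as it does here. Only areas under the curve are probabilities.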

The probability is obtained from the area under the curve, bounded by the specified lower and upper limits. An example can be seen in the figure below.

![alt text](https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img002.png)


To obtain the area above, integrate the density function over the given interval, as in the equation below:

# $$P(L_i<x<L_s) = \int_{L_i}^{L_s}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx$$

Where:

$x$ = normal variable

$\sigma$ = standard deviation

$\mu$ = mean

$L_i$ = lower limit

$L_s$ = upper limit
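The integral can also be evaluated numerically. The following sketch assumes `scipy.integrate.quad` as the integrator and uses the limits of Problem B ($L_i = 1.60$, $L_s = 1.80$) as an example; it then compares the result with the difference of two cumulative probabilities.

```python
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 1.70, 0.10       # values from the problem statement
lower, upper = 1.60, 1.80    # L_i and L_s

# Numerical integration of the density between the two limits
area, abs_error = quad(norm.pdf, lower, upper, args=(mu, sigma))

# Same area as a difference of cumulative probabilities
via_cdf = norm.cdf(upper, loc=mu, scale=sigma) - norm.cdf(lower, loc=mu, scale=sigma)

print(round(area, 4), round(via_cdf, 4))  # 0.6827 0.6827
```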


### Standardized tables

Standardized tables were created to make it easier to obtain the areas under the normal curve and to eliminate the need to solve definite integrals.

To look up a value in a standardized table, first transform the variable into the standardized variable $Z$.

The variable $Z$ measures how many standard deviations a value of the original variable lies from the mean.

# $$Z = \frac{x-\mu}{\sigma}$$

Where:

$x$ = normal variable with mean $\mu$ and standard deviation $\sigma$

$\sigma$ = standard deviation

$\mu$ = mean
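A small sketch of what standardization buys us, assuming the values from the problem statement: the cumulative probability of $Z$ under the standard normal matches the cumulative probability of $x$ under the original distribution, and the transformation can be inverted with $x = \mu + Z\sigma$.

```python
from scipy.stats import norm

mu, sigma = 1.70, 0.10  # values from the problem statement
x = 1.80

# Standardize: number of standard deviations between x and the mean
z = (x - mu) / sigma

p_standardized = norm.cdf(z)                 # standard normal (mean 0, sd 1)
p_direct = norm.cdf(x, loc=mu, scale=sigma)  # same probability on the original scale

# Inverse transformation, from Z back to the original variable
x_back = mu + z * sigma

print(round(p_standardized, 4), round(p_direct, 4), round(x_back, 2))
```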

### Constructing a standardized table
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

In [61]:
import pandas as pd
import numpy as np
from scipy.stats import norm

# Rows hold Z to one decimal place (0.00, 0.10, ..., 3.90);
# columns hold the second decimal place (0.00, 0.01, ..., 0.09).
standardized_normal_table = pd.DataFrame(
    index=["{0:0.2f}".format(i / 100) for i in range(0, 400, 10)],
    columns=["{0:0.2f}".format(i / 100) for i in range(0, 10)])

# Each cell is the cumulative probability P(Z <= row + column)
for index in standardized_normal_table.index:
    for column in standardized_normal_table.columns:
        Z = np.round(float(index) + float(column), 2)
        standardized_normal_table.loc[index, column] = "{0:0.4f}".format(norm.cdf(Z))

standardized_normal_table.rename_axis('Z', axis='columns', inplace=True)

standardized_normal_table

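| Z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.10 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.20 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.30 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.40 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.50 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.60 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.70 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.80 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.90 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |

Only the first 10 of the 40 rows generated by the code (Z from 0.00 to 3.99) are shown here.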


<img src='https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img003.png' width='250px'>

The table above gives the area under the curve between $-\infty$ and a point $Z$ standard deviations above the mean. Because these are standardized values, $\mu = 0$ and $\sigma = 1$.
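Reading the table means splitting $Z$ into its first decimal place (the row) and its second decimal place (the column). The table only covers non-negative $Z$, so for negative values the symmetry of the curve gives $P(Z < -z) = 1 - P(Z < z)$. The sketch below illustrates this lookup; `table` and `table_lookup` are hypothetical names, and the table is rebuilt here with numeric values so that the block runs on its own.

```python
import pandas as pd
from scipy.stats import norm

# Rebuild the table with numeric labels and values
rows = [round(i / 10, 1) for i in range(40)]     # 0.0, 0.1, ..., 3.9
cols = [round(j / 100, 2) for j in range(10)]    # 0.00, 0.01, ..., 0.09
table = pd.DataFrame(
    [[round(float(norm.cdf(r + c)), 4) for c in cols] for r in rows],
    index=rows, columns=cols)

def table_lookup(z):
    """Area between -infinity and z read from the table (valid for |z| < 4)."""
    hundredths = int(round(abs(z) * 100))  # e.g. 1.96 -> 196
    row_position = hundredths // 10        # first decimal place selects the row
    col_position = hundredths % 10         # second decimal place selects the column
    area = float(table.iloc[row_position, col_position])
    # Negative z: use the symmetry of the curve around zero
    return area if z >= 0 else round(1 - area, 4)

print(table_lookup(1.0), table_lookup(-1.0), table_lookup(2.0))
```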

## <font color = 'blue'> Example: How tall are you? </font>

In a study of the heights of a city's residents, the data were found to follow an **approximately normal distribution**, with **mean 1.70 m** and **standard deviation 0.10 m**. Using this information, compute the following probabilities:

> **A.** Probability that a randomly selected person is shorter than 1.80 meters.

> **B.** Probability that a randomly selected person is between 1.60 meters and 1.80 meters tall.

> **C.** Probability that a randomly selected person is taller than 1.90 meters.

### Problem A - Identification of the area under the curve

<img style='float: left' src='https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img004.png' width='350px'>

### Get the $Z$ standardized variable

In [62]:
mean = 1.7
std_deviation = 0.1
x = 1.8

z = (x - mean) / std_deviation
z

1.0000000000000009

### Solution 1 - Using table

In [63]:
probability = 0.8413  # table value for Z = 1.00

### Solution 2 - Using Scipy

In [64]:
from scipy.stats import norm

norm.cdf(z)

0.8413447460685431

### Problem B - Identification of the area under the curve

<img style='float: left' src='https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img005.png' width='350px'>

### Get the $Z$ standardized variable

In [65]:
# Z for the mean itself (1.70), the center of the curve
z = (1.7 - mean) / std_deviation
z

0.0

In [66]:
z = (1.8 - mean) / std_deviation
z

1.0000000000000009

### Solution 1 - Using table

In [67]:
probability_1_80 = 0.8413  # table value for Z = 1.00, from Problem A
probability_mean = 0.5     # area to the left of the mean (Z = 0.00)
p1 = probability_1_80 - probability_mean  # area between 1.70 and 1.80

# The curve is symmetrical, so the area between 1.60 and 1.70 also equals p1;
# the interval from 1.60 to 1.80 therefore has area 2 * p1
probability = p1 * 2
probability

0.6826000000000001

In [68]:
mean = 1.7
std_deviation = 0.1
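superior_z = (1.8 - mean) / std_deviation
superior_z = round(superior_z, 2)  # rounding removes floating-point noise (1.0000000000000009 -> 1.0)

inferior_z = (1.6 - mean) / std_deviation
inferior_z = round(inferior_z, 2)

# Same symmetry argument, with norm.cdf standing in for the table lookup:
# since inferior_z = -superior_z, P(Z < inferior_z) = 1 - P(Z < superior_z)
probability = norm.cdf(superior_z) - (1 - norm.cdf(superior_z))

probability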

0.6826894921370859

### Solution 2 - Using scipy

In [69]:
probability = norm.cdf(superior_z) - norm.cdf(inferior_z)
probability

0.6826894921370859

### Problem C - Identifying the area under the curve

<img style='float: left' src='https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img006.png' width='350px'>

### Get the $Z$ standardized variable

In [70]:
z = (1.9 - mean) / std_deviation
z

1.9999999999999996

### Solution 1 - Using table

In [72]:
probability = 1 - 0.9772  # table value for Z = 2.00
probability

0.022800000000000042

### Solution 2 - Using Scipy

In [74]:
probability = 1 - norm.cdf(z)
probability

0.02275013194817921

In [75]:
probability = norm.cdf(-z)
probability

0.022750131948179216
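As a cross-check, the three answers can also be computed without standardizing by hand. The sketch below assumes a frozen `scipy.stats.norm` distribution (the variable name `heights` is hypothetical) and uses the survival function `sf`, which returns `1 - cdf`, for Problem C.

```python
from scipy.stats import norm

# Frozen distribution with the mean and standard deviation from the problem
heights = norm(loc=1.70, scale=0.10)

p_a = heights.cdf(1.80)                      # A: shorter than 1.80 m
p_b = heights.cdf(1.80) - heights.cdf(1.60)  # B: between 1.60 m and 1.80 m
p_c = heights.sf(1.90)                       # C: taller than 1.90 m (1 - cdf)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))  # 0.8413 0.6827 0.0228
```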