# NORMAL DISTRIBUTION

The normal distribution is one of the most widely used distributions in statistics. It is a continuous distribution in which the frequency curve of a quantitative variable is bell-shaped and symmetric about the mean.

In probability theory, a normal (or Gaussian, Gauss, or Laplace–Gauss) distribution is a type of continuous probability distribution for a real-valued random variable [[1]](https://en.wikipedia.org/wiki/Normal_distribution).

![Normal](https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img001.png)

# Important features of a normal distribution

1. The normal distribution shape is symmetric around the mean value;


2. The total area under the curve is 1 (100%);


3. The measures of central tendency (mean, median and mode) have the same value;


4. The tails of the curve extend to infinity in both directions and, theoretically, never touch the $x$ axis;


5. The standard deviation defines the flatness and width of the distribution: wider, flatter curves have a larger standard deviation;


6. The distribution is defined by its mean and standard deviation;


7. The probability always equals the area under the curve bounded by the lower and upper limits.
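Several of the features above can be checked numerically. The sketch below uses `scipy.stats.norm` with an illustrative mean of 1.70 and two illustrative standard deviations (0.05 and 0.20); these values are assumptions chosen only for demonstration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu = 1.70                              # illustrative mean (assumption)
narrow_sigma, wide_sigma = 0.05, 0.20  # illustrative standard deviations (assumptions)

# Feature 1: points equally far from the mean have the same density
is_symmetric = np.isclose(norm.pdf(mu - 0.3, mu, wide_sigma),
                          norm.pdf(mu + 0.3, mu, wide_sigma))

# Feature 2: the total area is 1; mean +/- 10 standard deviations
# captures essentially all of it
total_area, _ = quad(norm.pdf, mu - 10 * wide_sigma, mu + 10 * wide_sigma,
                     args=(mu, wide_sigma))

# Feature 3: mean and median coincide
same_center = np.isclose(norm.mean(loc=mu, scale=wide_sigma),
                         norm.median(loc=mu, scale=wide_sigma))

# Feature 5: a larger standard deviation gives a lower, flatter peak
peak_narrow = norm.pdf(mu, mu, narrow_sigma)
peak_wide = norm.pdf(mu, mu, wide_sigma)

print(is_symmetric, total_area, same_center, peak_narrow > peak_wide)
```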

# $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Where:

$x$ = normal variable

$\sigma$ = standard deviation

$\mu$ = mean
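As a minimal sketch, the density can be evaluated by hand and compared with `scipy.stats.norm.pdf`. The normalizing constant is $\frac{1}{\sigma\sqrt{2\pi}}$. The function name `normal_pdf` and the input values (1.80, 1.70, 0.1) are illustrative assumptions.

```python
import math
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

# Hand-written formula versus scipy's implementation
manual = normal_pdf(1.80, mu=1.70, sigma=0.1)
reference = norm.pdf(1.80, loc=1.70, scale=0.1)
print(manual, reference)
```

Note that a density value can exceed 1, as it does here; only the area under the curve is a probability.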

The probability is obtained from the area under the curve, bounded by the specified lower and upper limits. See an example below:

![alt text](https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img002.png)

Example: the probability of selecting a person whose height is between 1.60 (lower limit) and 1.80 (upper limit) is the blue area under the curve.

To obtain this area, integrate the density function over the given interval, as in the equation below:

# $$P(L_i<x<L_s) = \int_{L_i}^{L_s}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx$$

Where:

$x$ = normal variable

$\sigma$ = standard deviation

$\mu$ = mean

$L_i$ = lower limit

$L_s$ = upper limit
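The integral can also be approximated numerically. The sketch below uses `scipy.integrate.quad` on the interval from 1.60 to 1.80 and compares the result with the difference of two cumulative probabilities. The mean (1.70) and standard deviation (0.1) are borrowed from the worked example later in this notebook.

```python
from scipy.integrate import quad
from scipy.stats import norm

mean, std = 1.70, 0.1       # values from the worked example in this notebook
lower, upper = 1.60, 1.80   # L_i and L_s

# Numerically integrate the density between the two limits
area, abs_error = quad(norm.pdf, lower, upper, args=(mean, std))

# The same probability expressed as a difference of cumulative probabilities
cdf_difference = (norm.cdf(upper, loc=mean, scale=std)
                  - norm.cdf(lower, loc=mean, scale=std))

print(area, cdf_difference)
```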

# Standard normal tables

Standardized tables were created to make it easier to obtain areas under the normal curve and to eliminate the need to solve definite integrals.


To look up a value in a standardized table, first transform the variable into the standardized variable $Z$.


The variable $Z$ expresses how far a value of the original variable lies from the mean, measured in standard deviations.


# $$Z = \frac{x-\mu}{\sigma}$$


Where:

$x$ = normal variable with mean $\mu$ and std $\sigma$

$\sigma$ = standard deviation

$\mu$ = mean
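A short sketch of the transformation: standardizing a few heights and confirming that the standard normal CDF of $Z$ matches the CDF of the original variable. The array of heights is an illustrative assumption; the mean and standard deviation are those of the example that follows.

```python
import numpy as np
from scipy.stats import norm

mean, std = 1.70, 0.1                                 # values from the example below
heights = np.array([1.50, 1.60, 1.70, 1.80, 1.90])    # illustrative heights

# Standardize: distance from the mean in units of standard deviation
z = (heights - mean) / std

# Both routes give the same cumulative probabilities
p_standard = norm.cdf(z)                              # standard normal applied to Z
p_direct = norm.cdf(heights, loc=mean, scale=std)     # original scale

print(np.round(z, 2))
print(np.allclose(p_standard, p_direct))
```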

## Example

In a study of the heights of the residents of a city, the data set was found to follow an **approximately normal distribution**, with a **mean of 1.70** meters and a **standard deviation of 0.1** meters. With this information, obtain the following probabilities:

> **A.** The probability that a person, selected at random, is shorter than 1.80 meters.

> **B.** The probability that a person, selected at random, is between 1.60 meters and 1.80 meters tall.

> **C.** The probability that a person, selected at random, is taller than 1.90 meters.

### Problem A - Identification of the area under the curve

Probability that a person, selected at random, is shorter than 1.80 meters.

<img style='float: left' src='https://caelum-online-public.s3.amazonaws.com/1178-estatistica-parte2/01/img004.png' width='350px'>

#### Solution 01: Using standard normal tables

In [8]:
import pandas as pd
import numpy as np
from scipy.stats import norm


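# Build an empty table: rows are Z to one decimal place (0.00, 0.10, ..., 3.90),
# columns are the second decimal place (0.00, 0.01, ..., 0.09)
standard_normal_table = pd.DataFrame(
    [],
    index=['{0:0.2f}'.format(i/100) for i in range(0, 400, 10)],
    columns=['{0:0.2f}'.format(i/100) for i in range(0, 10)]
    )

# Fill each cell with P(Z <= row + column), formatted to four decimal places
for index in standard_normal_table.index:
    for column in standard_normal_table.columns:
        Z = np.round(float(index) + float(column), 2)
        standard_normal_table.loc[index, column] = "{0:0.4f}".format(norm.cdf(Z))

# Label the column axis as Z for display
standard_normal_table.rename_axis('Z', axis='columns', inplace=True)

standard_normal_table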

| Z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359 |
| 0.10 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5596 | 0.5636 | 0.5675 | 0.5714 | 0.5753 |
| 0.20 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141 |
| 0.30 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517 |
| 0.40 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879 |
| 0.50 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224 |
| 0.60 | 0.7257 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7517 | 0.7549 |
| 0.70 | 0.7580 | 0.7611 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852 |
| 0.80 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7995 | 0.8023 | 0.8051 | 0.8078 | 0.8106 | 0.8133 |
| 0.90 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389 |


In [11]:
mean = 1.70
std = 0.1
x = 1.80

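# Standardize x: number of standard deviations between x and the mean
z = (x - mean) / std
print(z)
# z rounds to 1.00, so read the table at row '1.00', column '0.00'
print('probability =', standard_normal_table.loc['1.00', '0.00'])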

1.0000000000000009
probability = 0.8413


#### Solution 02: Using scipy method

In [14]:
from scipy.stats import norm

probability = norm.cdf(z)
probability

0.8413447460685431

### Problem B

Probability that a person, selected at random, is between 1.60 meters and 1.80 meters tall.

In [33]:
mean = 1.70
std = 0.1
x1 = 1.60
x2 = 1.80

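# Standardize both limits
z1 = (x1 - mean) / std
z2 = (x2 - mean) / std
print(z1)
print(z2)

# P(z1 < Z < z2) is the difference between the two cumulative probabilities
probability = norm.cdf(z2) - norm.cdf(z1)
probability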

-0.9999999999999987
1.0000000000000009


0.6826894921370857
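As a rough cross-check of this result, the same probability can be estimated by simulation. The sketch below draws one million heights with NumPy's random generator; the sample size and the seed are arbitrary assumptions, and the estimate only approximates the exact value.

```python
import numpy as np

# Seed and sample size are arbitrary choices for reproducibility
rng = np.random.default_rng(seed=42)
heights = rng.normal(loc=1.70, scale=0.1, size=1_000_000)

# Fraction of simulated people between 1.60 and 1.80 meters
fraction = np.mean((heights > 1.60) & (heights < 1.80))
print(fraction)
```

The estimate should land close to 0.68, since the interval covers one standard deviation on each side of the mean.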

### Problem C

Probability that a person, selected at random, is taller than 1.90 meters.

In [34]:
mean = 1.70
std = 0.1
x = 1.90

# Standardize x
z = (x - mean) / std

# Upper tail: P(Z > z) = 1 - P(Z <= z)
probability = 1 - norm.cdf(z)
probability

0.02275013194817921
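An alternative sketch for upper-tail probabilities uses the survival function `norm.sf`, which returns $1 - \text{CDF}$ directly. Passing `loc` and `scale` lets scipy do the standardization, so no explicit $Z$ is needed.

```python
from scipy.stats import norm

mean, std = 1.70, 0.1

# Survival function: P(X > 1.90) on the original scale
upper_tail = norm.sf(1.90, loc=mean, scale=std)

# Equivalent complement of the cumulative probability
complement = 1 - norm.cdf(1.90, loc=mean, scale=std)

print(upper_tail, complement)
```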

See also:

- https://s3.amazonaws.com/assets.datacamp.com/production/course_14568/slides/chapter3.pdf
- https://teslaconcursos.com.br/wp-content/uploads/2016/05/Estatistica.pdf