### Normal Distribution

* One of the most widely used continuous distributions in analytics.
* The normal distribution is observed in many naturally occurring measures, such as birth weight, height, and intelligence.

#### Probability Density Function

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Where
* f(x) is used to represent a probability density function
* x is any value of the continuous variable, where -∞ < x < ∞
* e denotes the mathematical constant approximated by 2.71828
* π is a mathematical constant approximated by 3.14159
* μ and σ are the mean and standard deviation of the normal distribution
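
As a minimal sketch, the density formula can be evaluated term by term and compared with `scipy.stats.norm.pdf`. The helper name `normal_pdf` and the values μ = 68, σ = 12, x = 80 are illustrative assumptions, not part of the definition.

```python
import math
from scipy import stats

def normal_pdf(x, mu, sigma):
    """Evaluate the normal density at x directly from the formula (hypothetical helper)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

mu, sigma = 68.0, 12.0   # illustrative parameters
x = 80.0                 # illustrative evaluation point

by_hand = normal_pdf(x, mu, sigma)
by_scipy = stats.norm.pdf(x, loc=mu, scale=sigma)
print(by_hand, by_scipy)  # the two values should agree to floating-point precision
```

Note that SciPy calls the mean `loc` and the standard deviation `scale`.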


For a continuous random variable, the probability density function f(x) does not give the probability that the variable equals x; it gives the density (relative likelihood) of values near x.
Since the probability at any single point of a continuous distribution is zero, probabilities are expressed as integrals of the density. The cumulative distribution function is

$P(X \leq x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$
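
To illustrate the integral relationship, the sketch below integrates the standard normal density numerically with `scipy.integrate.quad` and compares the result with `scipy.stats.norm.cdf`. The upper limit x = 1.0 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import integrate, stats

x = 1.0  # illustrative upper limit

# Numerically integrate the standard normal pdf from -infinity to x
area, abs_err = integrate.quad(stats.norm.pdf, -np.inf, x)
cdf_value = stats.norm.cdf(x)
print(area, cdf_value)  # should agree up to the integration error

# The density integrates to 1 over the whole real line
total_area, _ = integrate.quad(stats.norm.pdf, -np.inf, np.inf)
print(total_area)
```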


#### Standardizing a normal variable

Compute Z by subtracting the mean μ from a normally distributed variable X and dividing by the standard deviation σ:

Z = (X - μ) / σ

Z is expressed in standard units. It follows the standard normal distribution, Z ~ N(0, 1), so it always has mean 0 and standard deviation 1.

Its pdf is given by

$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$

for all $z \in \mathbb{R}$

The factor $\frac{1}{\sqrt{2\pi}}$ ensures that the area under the pdf is 1.
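
The standardization Z = (X - μ) / σ can be checked numerically. The sketch below assumes illustrative values μ = 68 and σ = 12 and a fixed random seed; it shows that a probability computed from the standardized value matches the one computed directly, and that standardized samples have mean close to 0 and standard deviation close to 1.

```python
import numpy as np
from scipy import stats

mu, sigma = 68.0, 12.0  # illustrative parameters
x = 90.0                # illustrative value of X

# The probability via the standardized value equals the direct computation
z = (x - mu) / sigma
p_standard = stats.norm.cdf(z)
p_direct = stats.norm.cdf(x, loc=mu, scale=sigma)
print(p_standard, p_direct)

# Standardized samples should have mean near 0 and standard deviation near 1
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
samples = rng.normal(mu, sigma, size=100_000)
z_samples = (samples - mu) / sigma
print(z_samples.mean(), z_samples.std())
```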

* For a normal distribution, μ is the location parameter, which centres the distribution on the horizontal axis.
* σ is the scale parameter, which defines the spread of the normal distribution.
* The normal distribution has no shape parameter, since every normal curve is bell-shaped and symmetric.


### Properties
1. Theoretical normal density functions are defined between -∞ and ∞
2. There are two parameters, location (μ which is the mean) and scale (σ which is standard deviation).
3. It is symmetrical (bell-shaped) around the mean, so mean = median = mode.
4. Areas between specific values are measured in terms of μ and σ
5. Any linear transformation of a normal random variable is also a normal random variable.
6. If X1 and X2 are independent normal random variables with means μ1 and μ2 and variances $\sigma_1^2$ and $\sigma_2^2$, then X1 + X2 is also normally distributed, with mean μ1 + μ2 and variance $\sigma_1^2 + \sigma_2^2$.
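
The additivity property for independent normal variables can be illustrated by simulation. This is only a sketch: the parameter values, sample size, and seed are arbitrary assumptions, and sample statistics match the theoretical values only approximately.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only for reproducibility

mu1, sigma1 = 10.0, 3.0   # illustrative parameters for X1
mu2, sigma2 = -4.0, 4.0   # illustrative parameters for X2
n = 200_000

x1 = rng.normal(mu1, sigma1, size=n)
x2 = rng.normal(mu2, sigma2, size=n)  # drawn independently of x1
total = x1 + x2

print(total.mean(), mu1 + mu2)                  # sample mean vs. mu1 + mu2
print(total.var(), sigma1 ** 2 + sigma2 ** 2)   # sample variance vs. sigma1^2 + sigma2^2
```

Note that the variances add, not the standard deviations: here the sum has standard deviation 5, not 7.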

| Range of the random variable | Area under the normal curve (probability) |
| --------------------------------- | --------------------------|
| $\mu - \sigma \leq x \leq \mu + \sigma$ | 0.6827 |
| $\mu - 2\sigma \leq x \leq \mu + 2\sigma$ | 0.9545 |
| $\mu - 3\sigma \leq x \leq \mu + 3\sigma$ | 0.9973 |
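
The areas within one, two, and three standard deviations of the mean can be reproduced from the standard normal cdf. Because the interval is expressed in units of σ, the result does not depend on the particular μ and σ. The dictionary name `areas` is just an illustrative choice.

```python
from scipy import stats

# Area between mu - k*sigma and mu + k*sigma, computed on the standard normal scale
areas = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sigma: {area:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```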

### Some important functions in Python for solving normal distribution problems

#### 1. Cumulative Distribution Function (cdf)

1. scipy.stats.norm.cdf(z)                 # P(Z <= z), where z is a standardized value

2. stats.norm.cdf(z2) - stats.norm.cdf(z1) # P(z1 <= Z <= z2)

3. stats.norm.isf(0.99) # Inverse survival function: returns the z for which P(Z > z) = 0.99
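
A short sketch of how these functions relate to one another, using an arbitrary value z = 1.5. Besides `cdf` and `isf`, it uses `sf` (survival function, 1 - cdf) and `ppf` (percent point function, the inverse of cdf), which are also methods of `scipy.stats.norm`.

```python
from scipy import stats

z = 1.5  # illustrative standardized value

left = stats.norm.cdf(z)    # P(Z <= z)
right = stats.norm.sf(z)    # P(Z > z), equal to 1 - cdf(z)
print(left, right, left + right)

# ppf inverts cdf, and isf inverts sf: both recover the original z
z_from_ppf = stats.norm.ppf(left)
z_from_isf = stats.norm.isf(right)
print(z_from_ppf, z_from_isf)

# By symmetry of the standard normal, isf(p) equals -ppf(p)
print(stats.norm.isf(0.99), -stats.norm.ppf(0.99))
```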

### EXERCISE

A survey on the use of smartphones in India found that smartphone users spend an average of 68 minutes a day sending messages, with a standard deviation of 12 minutes.

* Assume that the time spent sending messages follows a normal distribution.
* a) What proportion of smartphone users spend more than 90 minutes a day sending messages?
* b) What proportion of users spend less than 20 minutes?
* c) What proportion of users spend between 50 minutes and 100 minutes?

### Compute Z by subtracting the mean μ from the normally distributed variable and dividing by the standard deviation σ

In [1]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

In [2]:
# a) More than 90 minutes a day sending messages: P(X > 90)
z = (90-68)/12
1-stats.norm.cdf(z)

0.03337650758481725

In [3]:
#less than 20 minutes
z=(20-68)/12
stats.norm.cdf(z)

3.167124183311986e-05

In [4]:
#between 50 minutes and 100 minutes
z1 = (50-68)/12
z2 = (100-68)/12
stats.norm.cdf(z2)-stats.norm.cdf(z1)

0.9293624181635521
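
As an alternative sketch, the same three proportions can be obtained without standardizing by hand, by passing the mean and standard deviation to SciPy as `loc` and `scale`. The variable name `usage` is an illustrative choice; the results should match the values computed above.

```python
from scipy import stats

# "Frozen" normal distribution with mean 68 minutes and standard deviation 12 minutes
usage = stats.norm(loc=68, scale=12)

p_more_than_90 = usage.sf(90)                # a) P(X > 90)
p_less_than_20 = usage.cdf(20)               # b) P(X < 20)
p_between = usage.cdf(100) - usage.cdf(50)   # c) P(50 < X < 100)

print(p_more_than_90, p_less_than_20, p_between)
```

Using `sf` for upper-tail probabilities avoids the subtraction `1 - cdf(...)`, which can lose precision when the tail probability is very small.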

### EXERCISE

The mean salary of Data Scientists working in Chennai, India is calculated to be 7,00,000 INR, with a standard deviation of 90,000 INR. Assume that the salary of Data Scientists follows a normal distribution.

* a) What is the probability that a Data Scientist in Chennai has a salary more than 10,00,000 INR?
* b) What is the probability that a Data Scientist in Chennai has a salary between 6,00,000 & 9,00,000 INR?
* c) What is the probability that a Data Scientist in Chennai has a salary less than 4,00,000 INR?

In [5]:
# Standardize a salary of 10,00,000 INR
z = (1000000-700000)/90000
z

3.3333333333333335

In [6]:
#more than 10,00,000
1-stats.norm.cdf(z)

0.0004290603331967846

In [7]:
#probability that a Data Scientist in Chennai has a salary between 6,00,000 & 9,00,000
z1 = (600000-700000)/90000
z2 = (900000-700000)/90000
(stats.norm.cdf(z2)-stats.norm.cdf(z1))

0.8536055914064735

In [8]:
# probability that a Data Scientist in Chennai has a salary less than 4,00,000
z = (400000-700000)/90000
stats.norm.cdf(z)

0.0004290603331968372

### EXERCISE

The fill amount in 2-liter soft drink bottles is normally distributed, with a mean of 2.0 liters and a standard deviation of 0.05 liter. If the bottles contain less than 95% of the listed net content (1.90 liters, in our case), the manufacturer may be subject to penalty by the state office of consumer affairs. Bottles that have a net content above 2.1 liters may cause excess spillage upon opening. What is the proportion of bottles that will contain

* a) between 1.9 and 2.0 liters
* b) between 1.9 and 2.1 liters
* c) below 1.9 liters or above 2.1 liters
* d) At least how much soft drink is contained in 99% of the bottles?

In [9]:
# a) Between 1.9 and 2.0 liters
z1 = (1.9-2.0)/0.05
z2 = (2.0-2.0)/0.05
stats.norm.cdf(z2)-stats.norm.cdf(z1)

0.4772498680518209

In [10]:
#between 1.9 and 2.1 liters
z1 = (1.9-2.0)/0.05
z2 = (2.1-2.0)/0.05
stats.norm.cdf(z2)-stats.norm.cdf(z1)

0.9544997361036418

In [11]:
#below 1.9 liters or above 2.1 liters
z1 = (1.9-2.0)/0.05
z2 = (2.1-2.0)/0.05
stats.norm.cdf(z1)+(1-stats.norm.cdf(z2))

0.045500263896358195

In [12]:
# d) At least how much soft drink is contained in 99% of the bottles?
# We need the volume x with P(X >= x) = 0.99; isf(0.99) returns the corresponding (negative) z
z = stats.norm.isf(0.99)
round(2.0 + z*0.05, 4)

1.8837
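
For part (d), a sketch under the reading that the question asks for the volume that 99% of bottles meet or exceed, that is, the x with P(X >= x) = 0.99 (the 1st percentile of the fill distribution). The variable names are illustrative.

```python
from scipy import stats

# Fill amount: mean 2.0 liters, standard deviation 0.05 liter
fill = stats.norm(loc=2.0, scale=0.05)

# Volume exceeded by 99% of bottles: the x with P(X > x) = 0.99
x_min = fill.isf(0.99)
print(round(x_min, 4))   # about 1.8837 liters

# Sanity checks: 99% of the mass lies above x_min, and it is the 1st percentile
print(fill.sf(x_min), fill.ppf(0.01))
```

Under this reading the answer lies below the mean, since only 1% of bottles should fall short of it.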