# Variance and Standard Deviation

## Notation

Variance of a random variable is denoted by $ \sigma^2 $ or $ \sigma_{x}^2 $. Notation for standard deviation is $ \sigma $ or $ \sigma_{x} $.

## Variance

The variance of a random variable can be roughly interpreted as the average squared distance from the mean across all the outcomes you would get in the long run, over all possible samples.
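In symbols, this interpretation corresponds to the standard definition below. The names $X$ (the random variable), $\mu$ (its mean) and $E[\cdot]$ (the expected value) are introduced here for illustration only:

$$
\sigma^2 = \operatorname{Var}(X) = E\left[(X - \mu)^2\right], \qquad \mu = E[X]
$$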

The variance is the average of the squared differences from the mean. To figure out the variance, first calculate the difference between each point and the mean; then, square and average the results.
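The steps above can be sketched in plain Python. This is a minimal illustration that reuses the dandelion measurements from the Example section and divides by the number of points (the population form of the variance). The variable names are chosen for this sketch only.

```python
# Measurements in inches (the dandelion data from the Example section)
values = [3, 4, 5, 4, 11, 6]

mean = sum(values) / len(values)

# Step 1: difference between each point and the mean
differences = [x - mean for x in values]

# Step 2: square each difference
squared = [d ** 2 for d in differences]

# Step 3: average the squared differences (dividing by n: population variance)
variance = sum(squared) / len(squared)

print(mean, variance)
```

For this data the mean is 5.5 and the variance comes out to roughly 6.92.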

Variance is expressed in the square of the data's units (for example, square inches for data measured in inches), so it is not on the same scale as the values themselves.

Variance is a mathematical measure of dispersion. Because it is expressed in squared units rather than the original units of the data set, it is difficult to visualize and apply in a real-world sense. Finding the variance is often just the last step before finding the standard deviation, although variance values are also used directly in finance and in statistical formulas.

## Standard Deviation

In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range. 

The standard deviation of a random variable, statistical population, data set, or probability distribution is the square root of its variance.
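As a quick check of this relationship, the sketch below takes the square root of a variance and compares it with a direct standard deviation calculation. It uses Python's standard `statistics` module and the population forms (`pvariance`, `pstdev`). The data values are the dandelion measurements from the Example section.

```python
import math
import statistics

values = [3, 4, 5, 4, 11, 6]

# Population variance, then its square root
pop_variance = statistics.pvariance(values)
pop_std = math.sqrt(pop_variance)

# statistics.pstdev computes the same quantity directly
direct_std = statistics.pstdev(values)

print(pop_std, direct_std)
```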

Standard deviation is on the same scale as the values in the given data set and is therefore expressed in the same units.
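One way to see the difference in units is to convert the data to another unit and watch how each measure changes. The sketch below is illustrative only. It converts inches to centimeters (1 inch = 2.54 cm) and compares the two measures before and after. The names `INCH_TO_CM`, `std_ratio` and `var_ratio` are made up for this example.

```python
import statistics

INCH_TO_CM = 2.54

inches = [3, 4, 5, 4, 11, 6]
centimeters = [x * INCH_TO_CM for x in inches]

# Standard deviation scales with the unit itself (factor 2.54)
std_ratio = statistics.pstdev(centimeters) / statistics.pstdev(inches)

# Variance scales with the unit squared (factor 2.54 ** 2 = 6.4516)
var_ratio = statistics.pvariance(centimeters) / statistics.pvariance(inches)

print(std_ratio, var_ratio)
```

The standard deviation follows the unit conversion directly, while the variance changes by the square of the conversion factor, which is why it is harder to relate back to the original measurements.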

Standard deviation, which is expressed in the original units of the data set, is much more intuitive and closer to the values of the original data set. It is most often used to analyze demographics or population samples to gain a sense of what is **normal in the population**. 

## Example

Assume we have some dandelions measuring 3, 4, 5, 4, 11 and 6 inches. We can calculate the mean and standard deviation of this set with Python:

```python
import pandas as pd

dataset = pd.DataFrame([3, 4, 5, 4, 11, 6])
print('Mean: ', dataset.mean())
print('Standard Deviation: ', dataset.std())
```

```
Mean:  0    5.5
dtype: float64
Standard Deviation:  0    2.880972
dtype: float64
```


The standard deviation is about 2.88 inches. That means that for the sample, any dandelion within 2.88 inches of the mean (5.5 inches), that is, between about 2.62 and 8.38 inches, is ‘normal’.
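The 2.88 figure is the sample standard deviation: pandas' `std()` defaults to `ddof=1`, which divides by n - 1 rather than n. The sketch below reproduces both variants with Python's standard `statistics` module instead of pandas, so it runs without extra dependencies. The variable names are chosen for this illustration.

```python
import statistics

dandelions = [3, 4, 5, 4, 11, 6]

sample_std = statistics.stdev(dandelions)       # divides by n - 1 (matches pandas' default)
population_std = statistics.pstdev(dandelions)  # divides by n

print(round(sample_std, 2), round(population_std, 2))
# prints: 2.88 2.63
```

Which variant is appropriate depends on whether the six dandelions are treated as a sample from a larger population or as the whole population.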

## Outliers

![title](Bell-curve.png)

In a normal distribution, about 68% of the population (or values) falls within 1 standard deviation (1σ) of the mean and about 95% falls within 2σ. Values that differ from the mean by 2σ or more are often treated as outliers; stricter conventions use 3σ.
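The coverage figures for a normal distribution can be checked numerically, and the same idea can be used to flag candidate outliers in a data set. The sketch below relies on `statistics.NormalDist` (Python 3.8+). The helper names `coverage` and `flag_outliers` and the `threshold` parameter are hypothetical, introduced only for this example. The cutoff is left as a parameter because the appropriate value depends on the convention in use.

```python
from statistics import NormalDist, mean, stdev

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def flag_outliers(values, threshold):
    """Return the values whose distance from the mean is at least threshold sample std devs."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) >= threshold * s]

print(round(coverage(1), 4), round(coverage(2), 4))
# prints: 0.6827 0.9545

dandelions = [3, 4, 5, 4, 11, 6]
print(flag_outliers(dandelions, threshold=1.5))  # prints: [11]
print(flag_outliers(dandelions, threshold=2))    # prints: []
```

Note that in a sample this small, an extreme value inflates the standard deviation itself, so the 11-inch dandelion is flagged at a 1.5σ cutoff but not at 2σ.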