<a href="https://colab.research.google.com/github/keshavvprabhu/statistics_tutorials/blob/main/standard_deviation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is Standard Deviation?

Standard deviation measures the ***spread*** of values in a sample. We can use the formula below to calculate the standard deviation of a given sample:

\begin{align}
s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}
\end{align}

where:

* $x_i$ = the $i$-th value in the sample
* $n$ = sample size
* $\bar{x}$ = sample mean
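As a sketch, the formula can be translated term by term into a small Python function. The function name `sample_std_dev` and the example data are illustrative choices, not part of the original notebook. The result is compared with the standard library's `statistics.stdev`, which uses the same $n-1$ denominator. For contrast, `statistics.pstdev` divides by $n$ (the population form), which is why it returns a smaller value here.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation s, following the formula with the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    x_bar = sum(values) / n                                  # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (x_i - x_bar)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(sample_std_dev(data))       # same value as statistics.stdev(data)
print(statistics.stdev(data))
print(statistics.pstdev(data))    # divides by n instead of n - 1
```

For this data the squared deviations sum to 32, so $s = \sqrt{32/7} \approx 2.138$, while the population version gives $\sqrt{32/8} = 2$.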



A standard deviation is neither good nor bad in itself; it simply describes how spread out the values in a sample are.

The higher the standard deviation, the more spread out the values are. Conversely, the lower the standard deviation, the more tightly the values are clustered around the mean.
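A quick way to see this is to compare two hypothetical samples that share the same mean but differ in how far their values sit from it. This sketch uses only `statistics.mean` and `statistics.stdev` from the standard library; the datasets are made up for illustration.

```python
import statistics

tight = [48, 49, 50, 51, 52]  # values close to the mean
wide = [10, 30, 50, 70, 90]   # same mean, values far from it

for name, data in (("tight", tight), ("wide", wide)):
    print(f"{name}: mean={statistics.mean(data)}, stdev={statistics.stdev(data):.4f}")
# tight: mean=50, stdev=1.5811
# wide: mean=50, stdev=31.6228
```

Both samples are centered at 50, yet the wider sample's standard deviation is about twenty times larger.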


# Why is the Standard Deviation important?

Standard deviation is important because it tells us how spread out the values are in a given dataset.

Whenever we analyze a dataset, we are usually interested in the following metrics:

* The center of the dataset
* The spread of the values in the dataset

By knowing where the center is located and how spread out the values are, we can gain a good understanding of the distribution of values in any dataset.
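Center and spread are separate pieces of information, which is why both are needed. In the sketch below (hypothetical data), adding a constant to every value moves the center but leaves the spread unchanged, so knowing only one of the two metrics would not distinguish the samples fully.

```python
import statistics

original = [1, 2, 3, 4, 5]
shifted = [x + 100 for x in original]  # same values, moved up by a constant

print("original:", statistics.mean(original), statistics.stdev(original))
print("shifted: ", statistics.mean(shifted), statistics.stdev(shifted))
```

The means differ by exactly 100, while the two standard deviations are the same.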


## Visualization
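Box plots are a good way to visualize the spread of a dataset. Note that a box plot displays the median and quartiles (the interquartile range) rather than the standard deviation itself.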
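A box plot is drawn from quartile-based summary statistics. As a sketch that avoids assuming any plotting library, the numbers behind a box plot can be computed with `statistics.quantiles` (available since Python 3.8), here on the same values used in the code cell of this notebook. This uses the function's default `"exclusive"` method; plotting tools may use a different interpolation rule, so their quartiles can differ somewhat.

```python
import statistics

data = [1, 10, 100, 1000, 10000]

# Three cut points split the data into four groups: Q1, median, Q3
q1, median, q3 = statistics.quantiles(data, n=4)
five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "max": max(data),
}
print(five_number_summary)
print("IQR:", q3 - q1)  # the height of the box
```

With this method, the box runs from 5.5 to 5500 with the median at 100, which shows how skewed these values are.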

## Coefficient of Variation
One way to judge whether a standard deviation is high is to compare it to the mean of the dataset.

The coefficient of variation ($C_v$) measures how spread out the values in a dataset are relative to the mean. It is calculated as:

\begin{align}
    C_v = \frac{s}{\bar{x}}
\end{align}

where:

* $s$ = sample standard deviation
* $\bar{x}$ = mean of the dataset

When $C_v > 1$, the standard deviation is larger than the mean, which indicates that the values in the dataset are widely spread relative to their mean.
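The ratio is straightforward to wrap in a helper. This is a sketch: the function name `coefficient_of_variation` is hypothetical, it uses the sample standard deviation as in the formula above, and it refuses a zero mean, where $C_v$ is undefined. $C_v$ is generally only interpreted for data with a positive mean. The first sample matches the values used in the code cell of this notebook.

```python
import statistics


def coefficient_of_variation(values):
    """C_v = s / x_bar, using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("C_v is undefined when the mean is zero")
    return statistics.stdev(values) / mean


wide = [1, 10, 100, 1000, 10000]
tight = [48, 49, 50, 51, 52]
print(f"wide:  C_v = {coefficient_of_variation(wide):.4f}")   # above 1: widely spread
print(f"tight: C_v = {coefficient_of_variation(tight):.4f}")  # well below 1
```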



```python
import math
import statistics
```

```python
# The sample used in this worked example (stored as a tuple)
numbers = (1, 10, 100, 1000, 10000)
sample_size = len(numbers)
sample_mean = statistics.mean(numbers)
print(f"Numbers: {numbers}")
print(f"Sample Size: {sample_size}")
print(f"Mean of the Sample: {sample_mean}")

# Sum of squared deviations from the mean (named so it does not shadow the built-in sum)
sum_sq_dev = 0
for number in numbers:
    sum_sq_dev += (number - sample_mean) ** 2

# Sample standard deviation: divide by n - 1, then take the square root
std_dev = math.sqrt(sum_sq_dev / (sample_size - 1))
print(f"Calculated Standard Deviation: {std_dev}")
print(f"Standard Deviation (statistics.stdev): {statistics.stdev(numbers)}")
print(f"Coefficient of Variation: {std_dev / sample_mean}")
```

```text
Numbers: (1, 10, 100, 1000, 10000)
Sample Size: 5
Mean of the Sample: 2222.2
Calculated Standard Deviation: 4368.044093184042
Standard Deviation (statistics.stdev): 4368.044093184042
Coefficient of Variation: 1.9656394983278025
```
