# Variance and standard deviation

## 1. Computing the variance:
It is important to have some understanding of what commonly used functions are doing under the hood. Though you may already know how to compute variances, this is a beginner course that does not assume so. In this exercise, we will explicitly compute the variance of the petal length of _Iris versicolor_ using the equations discussed in the videos. We will then compute it with `np.var()` and compare the two results.
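For reference, here is the standard definition of the population variance, which is what the steps below compute. This assumes the videos use the population form (dividing by $n$ rather than $n-1$), which is also what `np.var()` computes by default. Here $x_1, \dots, x_n$ are the data values and $\bar{x}$ is their mean:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
$$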

### Instructions:
* Create an array called `differences` that holds the difference between each petal length (`versicolor_petal_length`) and the mean petal length. The variable `versicolor_petal_length` is already in your namespace as a NumPy array, so you can take advantage of NumPy's vectorized operations.
* Square each element in this array. For example, `x**2` squares each element in the array `x`. Store the result as `diff_sq`.
* Compute the mean of the elements in `diff_sq` using `np.mean()`. Store the result as `variance_explicit`.
* Compute the variance of `versicolor_petal_length` using `np.var()`. Store the result as `variance_np`.
* Print both `variance_explicit` and `variance_np` in one `print` call to make sure they are consistent.
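Before running these steps on the iris data, it can help to try them on a tiny array whose variance is easy to check by hand. The values in `x` below are illustrative toy data, not the iris measurements: the mean is 2.5, the squared deviations are 2.25, 0.25, 0.25 and 2.25, and their mean is 1.25.

```python
import numpy as np

# Toy data standing in for versicolor_petal_length (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Step 1: deviation of each element from the mean (vectorized subtraction)
differences = x - np.mean(x)

# Step 2: square each deviation element-wise
diff_sq = differences**2

# Step 3: the variance is the mean of the squared deviations
variance_explicit = np.mean(diff_sq)

# Step 4: NumPy's built-in variance for comparison
variance_np = np.var(x)

print(variance_explicit, variance_np)  # 1.25 1.25
```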

```python
# Import pandas
import pandas as pd
# Import plotting modules
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Set default Seaborn style
sns.set()
# Import numpy
import numpy as np
```

```python
# Load iris file
# file_path = '13_Statistical_Thinking_1/_datasets/'  # path for IntelliJ
file_path = '../_datasets/'                            # path for Jupyter
iris = pd.read_csv(file_path + 'iris.csv')

# Filter all rows with versicolor in the species column
versicolor = iris.loc[iris['species'] == 'versicolor']
# Select the petal length column (the third column, at position 2)
versicolor_petal_length = versicolor.iloc[:, 2]
```

```python
# Array of differences to mean: differences
differences = versicolor_petal_length - np.mean(versicolor_petal_length)

# Square the differences: diff_sq
diff_sq = differences**2

# Compute the mean square difference: variance_explicit
variance_explicit = np.mean(diff_sq)

# Compute the variance using NumPy: variance_np
variance_np = np.var(versicolor_petal_length)

# Print the results
print(variance_explicit, variance_np)
```

Output:

```
0.21640000000000004 0.21640000000000004
```
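The two values agree because `np.var()` uses `ddof=0` by default, so it divides the sum of squared deviations by $n$, exactly as the mean of the squared differences does. Passing `ddof=1` divides by $n - 1$ instead, giving the sample variance. Keep this in mind when switching libraries: pandas' `Series.var()` method defaults to `ddof=1`. The sketch below uses illustrative toy data and checks both forms against the definition with `np.isclose()`, which is a more robust way to compare floating-point results than reading printed digits.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative toy data

# Population variance: sum of squared deviations divided by n (NumPy's default, ddof=0)
pop_var = np.var(x)
# Sample variance: divide by n - 1 instead (ddof=1)
sample_var = np.var(x, ddof=1)

# Rebuild both from the definition to show where ddof enters
sum_sq = np.sum((x - np.mean(x))**2)
n = len(x)
assert np.isclose(pop_var, sum_sq / n)
assert np.isclose(sample_var, sum_sq / (n - 1))

print(pop_var, sample_var)
```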


## 2. The standard deviation and the variance:
As mentioned in the video, the standard deviation is the square root of the variance. You will see this for yourself by computing the standard deviation with `np.std()` and comparing it with the square root of the variance computed by `np.var()`.
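In symbols, with $x_1, \dots, x_n$ the data values and $\bar{x}$ their mean, the (population) standard deviation is:

$$
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
$$

Because it is a square root of squared deviations, $\sigma$ is expressed in the same units as the data (here, the units of petal length), whereas the variance is in squared units.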

### Instructions:
* Compute the variance of the data in the `versicolor_petal_length` array using `np.var()` and store it in a variable called `variance`.
* Print the square root of this value.
* Print the standard deviation of the data in the `versicolor_petal_length` array using `np.std()`.
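A quick check of this relationship on illustrative toy data is sketched below. It shows that `variance**(1/2)` and `np.sqrt(variance)` are interchangeable, and that the identity also holds for the sample versions as long as the same `ddof` is passed to both functions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative toy data

variance = np.var(x)

# Two equivalent ways to take the square root of the variance
sqrt_pow = variance**(1/2)
sqrt_np = np.sqrt(variance)

# np.std computes the same quantity directly
std_np = np.std(x)

assert np.isclose(sqrt_pow, std_np)
assert np.isclose(sqrt_np, std_np)

# The identity also holds for the sample (ddof=1) versions when ddof matches
assert np.isclose(np.std(x, ddof=1), np.sqrt(np.var(x, ddof=1)))

print(std_np)
```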

```python
# Compute the variance: variance
variance = np.var(versicolor_petal_length)

# Print the square root of the variance
print(variance**(1/2))

# Print the standard deviation
print(np.std(versicolor_petal_length))
```

Output:

```
0.4651881339845203
0.4651881339845203
```
