### Variance
Variance is a statistical measure of how spread out the numbers in a dataset are. It is the average of the squared distances between each number and the mean (average), so it also reflects how far the numbers lie from one another.
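
As a sketch of this definition, the population variance of $N$ values $x_1, \dots, x_N$ can be written as follows. The symbols $N$, $x_i$, $\mu$, and $\sigma^2$ are notation introduced here for illustration, not taken from the original text:

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
$$

This is the form that divides by the number of values, which is what the worked example in this section does.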

Let's use a simple example to understand how variance helps in statistics. Imagine you have the heights of students in two different classes:

Class A: 150 cm, 155 cm, 160 cm, 165 cm, 170 cm <br/>
Class B: 155 cm, 155 cm, 155 cm, 155 cm, 155 cm

Now, let's calculate the variance for each class to see how spread out the heights are:

**Class A:**
1. Find the mean (average) height:
   Mean = (150 + 155 + 160 + 165 + 170) / 5 = 160 cm

2. Calculate the squared differences from the mean for each height:
   (150 - 160)^2 = 100, (155 - 160)^2 = 25, (160 - 160)^2 = 0, (165 - 160)^2 = 25, (170 - 160)^2 = 100 

3. Find the average of these squared differences:
   Variance = (100 + 25 + 0 + 25 + 100) / 5 = 50 cm²

**Class B:**
1. Find the mean (average) height: since all heights are the same, the mean is simply 155 cm.

2. Calculate the squared differences from the mean for each height:
   (155 - 155)^2 = 0, (155 - 155)^2 = 0, (155 - 155)^2 = 0, (155 - 155)^2 = 0, (155 - 155)^2 = 0

3. Find the average of these squared differences:
   Variance = (0 + 0 + 0 + 0 + 0) / 5 = 0 cm²

Now, let's see how variance helps:

- **Class A has a variance of 50 cm²:** This indicates that the heights in Class A are somewhat spread out from the average (160 cm). Some students are taller, and some are shorter.

- **Class B has a variance of 0:** This indicates that all students in Class B have the same height (155 cm), so there's no spread. The heights are not varying; they are all the same.

In summary, variance helps us understand the spread or variability in a set of data. A higher variance suggests more variability, while a lower variance suggests less variability or more uniformity.
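
The three steps used for Class A and Class B can be reproduced in a few lines of NumPy. This is a minimal sketch: the helper name `population_variance` and the variable names are chosen here for illustration, and the function divides by the number of values, as the worked example does.

```python
import numpy as np

# Heights in cm, taken from the worked example above
class_a = np.array([150, 155, 160, 165, 170])
class_b = np.array([155, 155, 155, 155, 155])


def population_variance(values):
    """Average of the squared differences from the mean (divides by n)."""
    mean = values.mean()                   # step 1: mean height
    squared_diffs = (values - mean) ** 2   # step 2: squared differences
    return squared_diffs.mean()            # step 3: average of those


var_a = population_variance(class_a)
var_b = population_variance(class_b)

print(var_a)  # 50.0
print(var_b)  # 0.0
```

The results match the hand calculation: 50 cm² for Class A and 0 for Class B.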

#### Import relevant libraries

In [1]:
import numpy as np
import pandas as pd

#### Load data into dataframe

In [2]:
covid_data = pd.read_csv("data/covid-data.csv")

#### Inspect the dataframe

In [3]:
covid_data.head(5)

| | iso_code | continent | location | date | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | ... | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | excess_mortality_cumulative_absolute | excess_mortality_cumulative | excess_mortality | excess_mortality_cumulative_per_million |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 24/02/2020 | 5 | 5 | | | | | ... | | | 37.746 | 0.5 | 64.83 | 0.511 | | | | |
| 1 | AFG | Asia | Afghanistan | 25/02/2020 | 5 | 0 | | | | | ... | | | 37.746 | 0.5 | 64.83 | 0.511 | | | | |
| 2 | AFG | Asia | Afghanistan | 26/02/2020 | 5 | 0 | | | | | ... | | | 37.746 | 0.5 | 64.83 | 0.511 | | | | |
| 3 | AFG | Asia | Afghanistan | 27/02/2020 | 5 | 0 | | | | | ... | | | 37.746 | 0.5 | 64.83 | 0.511 | | | | |
| 4 | AFG | Asia | Afghanistan | 28/02/2020 | 5 | 0 | | | | | ... | | | 37.746 | 0.5 | 64.83 | 0.511 | | | | |


In [4]:
covid_data = covid_data[['iso_code','continent','location','date','total_cases','new_cases']]

In [5]:
covid_data.head(5)

| | iso_code | continent | location | date | total_cases | new_cases |
|---|---|---|---|---|---|---|
| 0 | AFG | Asia | Afghanistan | 24/02/2020 | 5 | 5 |
| 1 | AFG | Asia | Afghanistan | 25/02/2020 | 5 | 0 |
| 2 | AFG | Asia | Afghanistan | 26/02/2020 | 5 | 0 |
| 3 | AFG | Asia | Afghanistan | 27/02/2020 | 5 | 0 |
| 4 | AFG | Asia | Afghanistan | 28/02/2020 | 5 | 0 |


In [6]:
covid_data.dtypes

iso_code       object
continent      object
location       object
date           object
total_cases     int64
new_cases       int64
dtype: object

In [7]:
covid_data.shape

(5818, 6)

#### Check the variance of the new_cases column using NumPy's var function (population variance, ddof=0 by default)

In [8]:
data_variance = np.var(covid_data["new_cases"])
data_variance

451321915.9280954

#### Check the variance of the new_cases column using the pandas var method (sample variance, ddof=1 by default)

In [9]:
covid_data["new_cases"].var()

451399502.6421969
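
The NumPy and pandas results above differ slightly because the two libraries use different defaults: `np.var` divides by n (`ddof=0`, population variance), while the pandas `var` method divides by n - 1 (`ddof=1`, sample variance). The sketch below illustrates this on the small Class A heights rather than on the COVID dataset; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Class A heights (cm) from the worked example, used as a small stand-in dataset
heights = pd.Series([150, 155, 160, 165, 170])
values = heights.to_numpy()
n = len(values)

# NumPy default: population variance, divides by n (ddof=0)
numpy_default = np.var(values)

# pandas default: sample variance, divides by n - 1 (ddof=1)
pandas_default = heights.var()

# Passing ddof explicitly makes each library reproduce the other's result
numpy_sample = np.var(values, ddof=1)
pandas_population = heights.var(ddof=0)

print(numpy_default, pandas_population)  # 50.0 50.0
print(pandas_default, numpy_sample)      # 62.5 62.5

# The two definitions differ only by the factor n / (n - 1)
print(np.isclose(pandas_default, numpy_default * n / (n - 1)))  # True
```

With 5818 rows, the factor n / (n - 1) is 5818 / 5817, which is consistent with the small gap between the two outputs above. Passing `ddof` explicitly is one way to make sure both libraries report the same quantity.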