### <b>Standard Deviation</b>

Both variance and standard deviation measure the variability, or spread, of the data points in a dataset. Variance is expressed in squared units, whereas standard deviation, which is the square root of the variance, is in the original units of the data. The choice between them often depends on the context and the need for interpretability.
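
In symbols, for a dataset of $n$ values $x_1, \dots, x_n$, the population (divide-by-$n$) forms of the mean, variance, and standard deviation can be written as follows. The symbols $\mu$ and $\sigma$ are notation assumed here for illustration; they do not appear elsewhere in this section.

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
$$

The sample version of the variance divides by $n - 1$ instead of $n$.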

Let's use a simple example to understand how standard deviation helps in statistics. Consider the test scores of two students, Alex and Bailey, in three subjects: Math, English, and Science.

**Alex's Scores:**
- Math: 80
- English: 85
- Science: 75

**Bailey's Scores:**
- Math: 90
- English: 82
- Science: 88

Now, let's calculate the standard deviation for each student's scores to see how spread out the scores are:

**Alex's Scores:**
1. Find the mean (average) score:
   Mean = (80 + 85 + 75) / 3  = 80 

2. Calculate the squared differences from the mean for each score:
   (80 - 80)^2 = 0, (85 - 80)^2 = 25, (75 - 80)^2 = 25 

3. Find the average of these squared differences:
   Variance = (0 + 25 + 25) / 3 = 16.67

4. Take the square root of the variance to get the standard deviation:
   Standard Deviation = sqrt(16.67) = 4.08

**Bailey's Scores:**
1. Find the mean score:
   Mean = (90 + 82 + 88) / 3 = 86.67

2. Calculate the squared differences from the mean for each score:
   (90 - 86.67)^2 = 11.11, (82 - 86.67)^2 = 21.78, (88 - 86.67)^2 = 1.78

3. Find the average of these squared differences:
   Variance = (11.11 + 21.78 + 1.78) / 3 = 11.56

4. Take the square root of the variance to get the standard deviation:
   Standard Deviation = sqrt(11.56) = 3.40

Now, let's see how standard deviation helps:

- **Alex's Scores have a standard deviation of approximately 4.08:** On average, Alex's scores sit about 4 points away from the mean (80). This is the larger of the two values, so Alex's scores are slightly more spread out.

- **Bailey's Scores have a standard deviation of approximately 3.40:** Bailey's scores sit a little closer to the mean (86.67). The standard deviation is slightly smaller, so Bailey is both the higher scorer and the more consistent one.

In summary, standard deviation helps us understand the variability or spread in a set of data. A higher standard deviation suggests more variability, while a lower standard deviation suggests less variability or more consistency. It gives us a sense of how much the individual scores deviate from the average.
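
The four steps used for each student can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `population_std` is hypothetical, and it uses the divide-by-n (population) form, matching the hand calculation.

```python
import math

def population_std(scores):
    """Return the population standard deviation (divides by n, not n - 1)."""
    mean = sum(scores) / len(scores)                    # step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in scores]   # step 2: squared differences
    variance = sum(squared_diffs) / len(scores)         # step 3: average of those
    return math.sqrt(variance)                          # step 4: square root

alex = [80, 85, 75]
bailey = [90, 82, 88]

alex_sd = population_std(alex)
bailey_sd = population_std(bailey)

print(f"Alex:   {alex_sd:.2f}")
print(f"Bailey: {bailey_sd:.2f}")
```

Working from the unrounded mean avoids the small rounding drift that creeps in when intermediate values such as 86.67 are reused by hand.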

#### Import relevant libraries

In [1]:
import numpy as np
import pandas as pd

#### Load data into dataframe

In [2]:
covid_data = pd.read_csv("data/covid-data.csv")

#### Inspect the dataframe

In [3]:
covid_data.head(5)


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,24/02/2020,5,5,,,,,...,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,25/02/2020,5,0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,26/02/2020,5,0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,27/02/2020,5,0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,28/02/2020,5,0,,,,,...,,,37.746,0.5,64.83,0.511,,,,


In [4]:
covid_data = covid_data[['iso_code','continent','location','date','total_cases','new_cases']]

In [5]:
covid_data.head(5)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases
0,AFG,Asia,Afghanistan,24/02/2020,5,5
1,AFG,Asia,Afghanistan,25/02/2020,5,0
2,AFG,Asia,Afghanistan,26/02/2020,5,0
3,AFG,Asia,Afghanistan,27/02/2020,5,0
4,AFG,Asia,Afghanistan,28/02/2020,5,0


In [6]:
covid_data.dtypes

iso_code       object
continent      object
location       object
date           object
total_cases     int64
new_cases       int64
dtype: object

In [7]:
covid_data.shape

(5818, 6)

#### Compute the population standard deviation of a dataset using the std function in numpy

In [8]:
data_sd = np.std(covid_data["new_cases"])
data_sd

21244.338444114834

#### Compute the sample standard deviation of a dataset using the std method in pandas

In [9]:
covid_data["new_cases"].std()

21246.164421895
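
The two results differ slightly because the libraries use different defaults: `np.std` divides by n (`ddof=0`), while the pandas `std` method divides by n - 1 (`ddof=1`). A minimal sketch of the difference, using a small made-up series rather than the COVID data:

```python
import numpy as np
import pandas as pd

# Illustrative values only; not taken from covid-data.csv.
sample = pd.Series([5, 0, 0, 0, 0, 12, 7])

population_sd = np.std(sample)   # NumPy default: ddof=0, divides by n
sample_sd = sample.std()         # pandas default: ddof=1, divides by n - 1

# Passing ddof explicitly makes the two libraries agree.
numpy_sample_sd = np.std(sample, ddof=1)
pandas_population_sd = sample.std(ddof=0)

print(population_sd, pandas_population_sd)
print(sample_sd, numpy_sample_sd)
```

The two forms differ by a factor of sqrt(n / (n - 1)). With the 5,818 rows in this dataframe that factor is roughly 1.00009, which accounts for the gap between 21244.34 and 21246.16.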