## Tutorial :: The numbers behind COVID-19

**CONCERN**

You are working for Qantas as a data analyst. Your manager welcomes the reopening of Australia's borders, but before restarting operations he wants an overview of how different countries have been dealing with the pandemic, so that the airline can restart as safely as possible.

1. **Q**uestion
2. **D**ata
3. **A**nalysis
4. **V**isualisation
5. **I**nsight

<img src="graphics/QDAVI_cycle_sm.png" width="50%" />

### 1. Question

How is each country doing in terms of new cases, ICU patients and vaccination?

### 2. Data

We will use the file `owid-covid-data.csv`, located in the week-7 data folder.
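The OWID file contains many more columns than this tutorial uses. A minimal sketch, assuming the file keeps the column names used later in this notebook (`location`, `date`, `new_cases`, `new_deaths`, `people_fully_vaccinated`), loads only those columns with `usecols`. A small in-memory CSV stands in for the real file so the snippet runs on its own; its rows are made-up placeholders, not real figures.

```python
import io

import pandas as pd

# Columns this tutorial actually uses (assumed to exist in owid-covid-data.csv)
COLUMNS = ["location", "date", "new_cases", "new_deaths", "people_fully_vaccinated"]

# Placeholder CSV text standing in for the real file; values are illustrative only
sample_csv = io.StringIO(
    "iso_code,continent,location,date,new_cases,new_deaths,people_fully_vaccinated\n"
    "AUS,Oceania,Australia,01/01/2021,10,0,\n"
    "BRA,South America,Brazil,01/01/2021,500,20,\n"
)

# usecols skips every other column while parsing, which saves memory on large files
df = pd.read_csv(sample_csv, usecols=COLUMNS)
print(df.columns.tolist())
```

To apply this to the real data, pass the path to `owid-covid-data.csv` in place of `sample_csv`.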

In [None]:
# Libraries for the analysis
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Load the dataset (replace ??? with the path to owid-covid-data.csv)
df = pd.read_csv(???)
df

#### Clean/preprocess data

In [None]:
# Check data types
df.dtypes

In [None]:
# Convert date to a date data type
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df

**Tip:** The `format` string must match how the dates are written in the file. If it does not, `pd.to_datetime` raises an error.
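A small illustration of this tip, using two made-up dates written day-first: the matching format parses cleanly, a mismatched one raises a `ValueError`, and `errors="coerce"` turns unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Two illustrative dates written as day/month/year
dates = pd.Series(["25/12/2021", "01/02/2021"])

# Matching format: day first, then month, then a four-digit year
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed.dt.month.tolist())  # [12, 2]

# Mismatched format: "%Y-%m-%d" expects the year first and dashes, so parsing fails
try:
    pd.to_datetime(dates, format="%Y-%m-%d")
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print("Format mismatch:", err)

# errors="coerce" replaces values that do not match the format with NaT instead of raising
coerced = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")
print(coerced.isna().all())  # True
```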

In [None]:
# Check data types to confirm the conversion
df.dtypes

In [None]:
# Check the data descriptive statistics
df.describe()

In [None]:
# Check the missing values
df.isnull().sum()

### 3. Analysis

Filtering the data by country and grouping it by month gives a clearer picture than the raw daily records.
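The filter-then-group steps in this section are repeated for each country. A minimal sketch of folding them into one reusable function follows; `monthly_summary` is a hypothetical helper name, and the toy frame uses made-up numbers with the same column names as the OWID file. Daily counts (`new_cases`, `new_deaths`) are summed per month, while `people_fully_vaccinated` is a running total, so, assuming it never decreases, its monthly maximum is the latest level reported that month.

```python
import pandas as pd


def monthly_summary(data: pd.DataFrame, country: str, year: int) -> pd.DataFrame:
    """Hypothetical helper: filter one country and year, then aggregate daily rows by month."""
    # Keep only the rows for the requested country and year
    mask = (data["location"] == country) & (data["date"].dt.year == year)
    subset = data.loc[mask]
    # Daily counts are summed; the cumulative vaccination total keeps its monthly maximum
    return (
        subset.groupby(subset["date"].dt.month)
        .agg({"new_cases": "sum", "new_deaths": "sum", "people_fully_vaccinated": "max"})
        .rename_axis("month")
    )


# Toy data with made-up values, shaped like the OWID columns used in this tutorial
toy = pd.DataFrame({
    "location": ["Australia", "Australia", "Australia", "Brazil"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-02-01", "2021-01-01"]),
    "new_cases": [10, 5, 7, 100],
    "new_deaths": [0, 1, 0, 3],
    "people_fully_vaccinated": [0, 0, 50, 0],
})

au_summary = monthly_summary(toy, "Australia", 2021)
print(au_summary)
```

Once `date` has been converted, a call such as `monthly_summary(df, "Brazil", 2021)` should produce the same table as the Brazil filter and group cells, without needing the extra `year` and `month` columns.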

In [None]:
# Create new columns with the year and month
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df

In [None]:
# Find the unique values for locations
df["location"].unique()

In [None]:
# Filter by Australia 2021
au_df = df[(df["location"] == "Australia") & (df["year"] == 2021)]
au_df

In [None]:
# Group Australian data by month: daily counts are summed,
# while the cumulative vaccination total keeps its monthly maximum
au_df_grouped = au_df[["new_cases", "new_deaths", "people_fully_vaccinated", "month"]].groupby("month").agg(
    {"new_cases": "sum", "new_deaths": "sum", "people_fully_vaccinated": "max"}
)
au_df_grouped

In [None]:
# Filter by Brazil 2021
br_df = df[(df["location"] == "Brazil") & (df["year"] == 2021)]
br_df

In [None]:
# Group Brazilian data by month: daily counts are summed,
# while the cumulative vaccination total keeps its monthly maximum
br_df_grouped = br_df[["new_cases", "new_deaths", "people_fully_vaccinated", "month"]].groupby("month").agg(
    {"new_cases": "sum", "new_deaths": "sum", "people_fully_vaccinated": "max"}
)
br_df_grouped

### 4. Visualisation

In [None]:
# Visualise the countries: one panel per metric
# constrained_layout spaces the panels and the suptitle after everything is drawn
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(20, 5), constrained_layout=True)
fig.suptitle("Countries COVID-19 overview 2021", fontweight="bold", size=25)

# (column, panel title, legend position) for each subplot
panels = [
    ("new_cases", "New cases", "upper right"),
    ("new_deaths", "New deaths", "upper right"),
    ("people_fully_vaccinated", "People fully vaccinated", "upper left"),
]

for ax, (column, title, legend_loc) in zip(axs, panels):
    ax.set_title(title, fontweight="bold", size=15)
    au_df_grouped[column].plot(ax=ax, label="Australia")
    br_df_grouped[column].plot(ax=ax, label="Brazil")
    ax.set_xlabel("Month")
    ax.legend(loc=legend_loc)

plt.show()

### 5. Insights

- Is it easier to compare the countries with our approach?
- Is the way we are analysing the data fair and ethical?
- Can you think of a different way to analyse the data we already have?
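One common angle on the fairness question: Brazil has a far larger population than Australia, so comparing raw totals mixes population size with how each country coped. A minimal sketch of normalising by population follows. It assumes the data has a `population` column (the OWID file usually does, and may already include per-capita columns such as `new_cases_per_million`; check `df.columns` first), and the numbers are made-up placeholders.

```python
import pandas as pd

# Made-up monthly totals for two countries; "population" is an assumed column
monthly = pd.DataFrame({
    "location": ["Australia", "Brazil"],
    "new_cases": [2_000, 40_000],
    "population": [25_000_000, 200_000_000],
})

# Cases per million people puts countries of different sizes on the same scale
monthly["new_cases_per_million"] = monthly["new_cases"] * 1_000_000 / monthly["population"]
print(monthly[["location", "new_cases_per_million"]])
```

The same division works for `new_deaths` and `people_fully_vaccinated` (per hundred is a common scale for vaccination).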