# Covid-19 Report

In this project we go through the COVID-19 data from [Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) to build a full world status report. The project is divided into three parts:

* Setting up the data
* Exploratory data analysis
* Build the report

## Setting up the data

We start the project by loading the needed packages and the Johns Hopkins University data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from fpdf import FPDF
import plotly.express as px
from datetime import datetime, timedelta

confirmed_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
death_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
recovered_link = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'

confirmed_df = pd.read_csv(confirmed_link)
confirmed_df.tail()
```

| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | | West Bank and Gaza | 31.9522 | 35.2332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 | 703228 |
| 285 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 | 535 |
| 286 | | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 | 11945 |
| 287 | | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 343012 | 343012 | 343079 | 343079 | 343079 | 343135 | 343135 | 343135 | 343135 | 343135 |
| 288 | | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 263921 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264127 | 264276 | 264276 |


```python
death_df = pd.read_csv(death_link)
death_df.tail()
```

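| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 284 | | West Bank and Gaza | 31.9522 | 35.2332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 | 5708 |
| 285 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 286 | | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 | 2159 |
| 287 | | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 | 4057 |
| 288 | | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5663 | 5668 | 5668 | 5668 | 5668 | 5668 | 5668 | 5668 | 5671 | 5671 |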


```python
recovered_df = pd.read_csv(recovered_link)
recovered_df.tail()
```

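| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 2/28/23 | 3/1/23 | 3/2/23 | 3/3/23 | 3/4/23 | 3/5/23 | 3/6/23 | 3/7/23 | 3/8/23 | 3/9/23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 269 | | West Bank and Gaza | 31.9522 | 35.2332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 270 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 271 | | Yemen | 15.552727 | 48.516388 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 272 | | Zambia | -13.133897 | 27.849332 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 273 | | Zimbabwe | -19.015438 | 29.154857 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |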


We now have three dataframes, holding the confirmed, death and recovered cases. Each has one row per location (a country, or a province/state within a country) and one column per date. The confirmed and death dataframes have 289 rows (indices 0 to 288), while the recovered dataframe has only 274 (indices 0 to 273), so recovered counts are not available for every location. The most recent recovered values shown in the tail above are also all zero.
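The row counts only tell us that some locations are missing from the recovered data, not which ones. A minimal sketch of how one might list them, using a hypothetical helper `missing_locations` and small toy frames in place of the real `confirmed_df` and `recovered_df`: it does a left merge on the (Province/State, Country/Region) key with `indicator=True` and keeps the `left_only` rows.

```python
import pandas as pd

KEY = ["Province/State", "Country/Region"]

def missing_locations(full: pd.DataFrame, partial: pd.DataFrame) -> pd.DataFrame:
    """Return the location keys present in `full` but absent from `partial` (hypothetical helper)."""
    # Replace missing provinces with "" so country-level rows compare as equal strings.
    left = full[KEY].fillna("")
    right = partial[KEY].fillna("").drop_duplicates()
    merged = left.merge(right, on=KEY, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", KEY].reset_index(drop=True)

# Toy frames standing in for confirmed_df and recovered_df (made-up rows).
confirmed = pd.DataFrame({
    "Province/State": [None, None, "Ontario"],
    "Country/Region": ["Brazil", "Yemen", "Canada"],
    "1/22/20": [0, 0, 0],
})
recovered = confirmed.iloc[:2].copy()  # pretend the third location has no recovered row

print(len(confirmed), len(recovered))
print(missing_locations(confirmed, recovered))
```

On the real data the same call would be `missing_locations(confirmed_df, recovered_df)`, run on the wide frames before reshaping.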

Let us reshape the data into long format with `pd.melt`, so that we have one column with the dates and one column with the cases.

```python
confirmed_df = pd.melt(confirmed_df, id_vars=confirmed_df.columns[0:4], value_vars=confirmed_df.columns[4:],
                       var_name='Date', value_name='Cases')
confirmed_df.tail()
```

| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 330322 | | West Bank and Gaza | 31.9522 | 35.2332 | 3/9/23 | 703228 |
| 330323 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 3/9/23 | 535 |
| 330324 | | Yemen | 15.552727 | 48.516388 | 3/9/23 | 11945 |
| 330325 | | Zambia | -13.133897 | 27.849332 | 3/9/23 | 343135 |
| 330326 | | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 264276 |
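As a sanity check on the reshape: `pd.melt` emits one row per (location, date) pair and stacks the date columns one after another, so 289 locations × 1143 date columns (1/22/20 through 3/9/23) give 330,327 rows, which matches the last index 330326 above. The same ordering is why a single country's rows end up 289 index positions apart. A toy wide frame with made-up values shows the pattern:

```python
import pandas as pd

# Toy wide frame: 2 locations x 3 date columns (case values are made up).
wide = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Yemen", "Zambia"],
    "Lat": [15.552727, -13.133897],
    "Long": [48.516388, 27.849332],
    "3/7/23": [1, 10],
    "3/8/23": [2, 20],
    "3/9/23": [3, 30],
})

# Same call shape as the notebook: first four columns identify the location.
long_df = pd.melt(wide, id_vars=wide.columns[0:4], value_vars=wide.columns[4:],
                  var_name="Date", value_name="Cases")

# One output row per (location, date) pair, grouped date by date.
n_locations, n_dates = len(wide), len(wide.columns[4:])
assert len(long_df) == n_locations * n_dates  # 2 * 3 = 6
print(long_df[["Country/Region", "Date", "Cases"]])
```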


```python
death_df = pd.melt(death_df, id_vars=death_df.columns[0:4], value_vars=death_df.columns[4:],
                   var_name='Date', value_name='Cases')
death_df.tail()
```

| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 330322 | | West Bank and Gaza | 31.9522 | 35.2332 | 3/9/23 | 5708 |
| 330323 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 3/9/23 | 0 |
| 330324 | | Yemen | 15.552727 | 48.516388 | 3/9/23 | 2159 |
| 330325 | | Zambia | -13.133897 | 27.849332 | 3/9/23 | 4057 |
| 330326 | | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 5671 |


```python
recovered_df = pd.melt(recovered_df, id_vars=recovered_df.columns[0:4], value_vars=recovered_df.columns[4:],
                       var_name='Date', value_name='Cases')
recovered_df.tail()
```

| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 313177 | | West Bank and Gaza | 31.9522 | 35.2332 | 3/9/23 | 0 |
| 313178 | | Winter Olympics 2022 | 39.9042 | 116.4074 | 3/9/23 | 0 |
| 313179 | | Yemen | 15.552727 | 48.516388 | 3/9/23 | 0 |
| 313180 | | Zambia | -13.133897 | 27.849332 | 3/9/23 | 0 |
| 313181 | | Zimbabwe | -19.015438 | 29.154857 | 3/9/23 | 0 |
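The three melt cells repeat the same call, and the resulting `Date` column still holds strings such as '3/9/23', which sort lexically rather than chronologically. One possible refactor, sketched with a hypothetical helper `to_long` and a toy frame, melts a raw wide frame and parses the dates in one step. It assumes the column layout seen above (four location columns followed by date columns) and month/day/two-digit-year date labels.

```python
import pandas as pd

def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Melt a wide JHU-style frame and parse its date labels (hypothetical helper).

    Assumes the first four columns are Province/State, Country/Region, Lat, Long
    and every remaining column is a date label like '3/9/23'.
    """
    long_df = pd.melt(wide, id_vars=wide.columns[0:4], value_vars=wide.columns[4:],
                      var_name="Date", value_name="Cases")
    # '%m/%d/%y' = month/day/two-digit year, e.g. '3/9/23' -> 2023-03-09.
    long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")
    return long_df

# Toy wide frame with a single location and two date columns.
toy = pd.DataFrame({
    "Province/State": [None], "Country/Region": ["Zambia"],
    "Lat": [-13.133897], "Long": [27.849332],
    "3/8/23": [343135], "3/9/23": [343135],
})
print(to_long(toy))
```

On the real data it would replace the separate melt cells, e.g. `confirmed_df = to_long(pd.read_csv(confirmed_link))`.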


```python
confirmed_df[confirmed_df['Country/Region']=='Brazil']
```

| | Province/State | Country/Region | Lat | Long | Date | Cases |
|---|---|---|---|---|---|---|
| 31 | | Brazil | -14.235 | -51.9253 | 1/22/20 | 0 |
| 320 | | Brazil | -14.235 | -51.9253 | 1/23/20 | 0 |
| 609 | | Brazil | -14.235 | -51.9253 | 1/24/20 | 0 |
| 898 | | Brazil | -14.235 | -51.9253 | 1/25/20 | 0 |
| 1187 | | Brazil | -14.235 | -51.9253 | 1/26/20 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 328913 | | Brazil | -14.235 | -51.9253 | 3/5/23 | 37081209 |
| 329202 | | Brazil | -14.235 | -51.9253 | 3/6/23 | 37076053 |
| 329491 | | Brazil | -14.235 | -51.9253 | 3/7/23 | 37076053 |
| 329780 | | Brazil | -14.235 | -51.9253 | 3/8/23 | 37076053 |
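The Brazil series is cumulative, and it is not strictly increasing: it drops from 37,081,209 on 3/5/23 to 37,076,053 on 3/6/23. A minimal sketch, using a hypothetical helper `daily_new_cases` and a toy frame built from the values shown above, of turning one country's cumulative series into daily new cases: sum over Province/State first (some countries span several rows per date), then take the day-to-day difference. It assumes `Date` still holds the raw 'm/d/yy' strings, as in the melted frames. Downward revisions like Brazil's come out as negative values, which the report would need to handle explicitly.

```python
import pandas as pd

def daily_new_cases(long_df: pd.DataFrame, country: str) -> pd.Series:
    """Daily new cases for one country from a long, cumulative frame (hypothetical helper)."""
    rows = long_df[long_df["Country/Region"] == country].copy()
    # Assumes raw 'm/d/yy' strings; parse so the series sorts chronologically.
    rows["Date"] = pd.to_datetime(rows["Date"], format="%m/%d/%y")
    # Sum provinces/states so there is one cumulative value per date.
    cumulative = rows.groupby("Date")["Cases"].sum().sort_index()
    # diff() is NaN on the first date; fall back to that date's cumulative count.
    return cumulative.diff().fillna(cumulative).astype(int)

# Toy long frame using the last Brazil values shown in the output above.
toy = pd.DataFrame({
    "Province/State": [None] * 4,
    "Country/Region": ["Brazil"] * 4,
    "Date": ["3/5/23", "3/6/23", "3/7/23", "3/8/23"],
    "Cases": [37081209, 37076053, 37076053, 37076053],
})
print(daily_new_cases(toy, "Brazil").tolist())  # [37081209, -5156, 0, 0]
```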
