# Analyzing COVID-19 data



The code snippets for importing and merging the data are based on https://medium.com/@jcharistech/data-cleaning-a-practical-example-with-coronavirus-dataset-using-pandas-and-schedule-for-14abf485c881

The data is imported from https://raw.githubusercontent.com/CSSEGISandData

Read their terms of use! When I last checked, the data was provided strictly for public use for academic or research purposes.

## Import data

In [2]:
import pandas as pd

confirmed_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
recovered_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"
death_cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"

raw_confirmed = pd.read_csv(confirmed_cases_url)
raw_recovered = pd.read_csv(recovered_cases_url)
raw_deaths = pd.read_csv(death_cases_url)

Let's have a look at a few entries of the raw data.

In [3]:
raw_confirmed.head(8)

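| | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 3/5/20 | 3/6/20 | 3/7/20 | 3/8/20 | 3/9/20 | 3/10/20 | 3/11/20 | 3/12/20 | 3/13/20 | 3/14/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Thailand | 15.0 | 101.0 | 2 | 3 | 5 | 7 | 8 | 8 | ... | 47 | 48 | 50 | 50 | 50 | 53 | 59 | 70 | 75 | 82 |
| 1 | NaN | Japan | 36.0 | 138.0 | 2 | 1 | 2 | 2 | 4 | 4 | ... | 360 | 420 | 461 | 502 | 511 | 581 | 639 | 639 | 701 | 773 |
| 2 | NaN | Singapore | 1.2833 | 103.8333 | 0 | 1 | 3 | 3 | 4 | 5 | ... | 117 | 130 | 138 | 150 | 150 | 160 | 178 | 178 | 200 | 212 |
| 3 | NaN | Nepal | 28.1667 | 84.25 | 0 | 0 | 0 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 4 | NaN | Malaysia | 2.5 | 112.5 | 0 | 0 | 0 | 3 | 4 | 4 | ... | 50 | 83 | 93 | 99 | 117 | 129 | 149 | 149 | 197 | 238 |
| 5 | British Columbia | Canada | 49.2827 | -123.1207 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 13 | 21 | 21 | 27 | 32 | 32 | 39 | 46 | 64 | 64 |
| 6 | New South Wales | Australia | -33.8688 | 151.2093 | 0 | 0 | 0 | 0 | 3 | 4 | ... | 22 | 26 | 28 | 38 | 48 | 55 | 65 | 65 | 92 | 112 |
| 7 | Victoria | Australia | -37.8136 | 144.9631 | 0 | 0 | 0 | 0 | 1 | 1 | ... | 10 | 10 | 11 | 11 | 15 | 18 | 21 | 21 | 36 | 49 |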


## Adaptation

Adapt the following steps to your own needs.

## Convert data

Convert the input data from its wide layout (one column per date) into a long layout (one row per region and date), which is better suited to our analysis. Note that the Lat/Long values are dropped here.
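To make the reshaping concrete, here is a minimal sketch on a hypothetical two-row frame that mimics the layout of the source files. The frame `wide` and its values are made up for illustration; only the column names follow the real data.

```python
import pandas as pd

# Hypothetical toy frame mimicking the wide layout of the source files
wide = pd.DataFrame({
    "Province/State": [None, "Victoria"],
    "Country/Region": ["Japan", "Australia"],
    "Lat": [36.0, -37.8136],
    "Long": [138.0, 144.9631],
    "1/22/20": [2, 0],
    "1/23/20": [1, 0],
})

# drop() without inplace returns a new frame, so `wide` itself is left untouched
long_df = (
    wide.drop(columns=["Lat", "Long"])
        .melt(id_vars=["Province/State", "Country/Region"],
              var_name="Date", value_name="Confirmed")
)
print(long_df)
```

Each date column becomes its own block of rows, so two regions and two dates yield four rows.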

In [20]:
def get_n_melt_data(raw_data, case_type):
    # Drop 'Lat' and 'Long' on a copy so the raw frame stays intact and the cell can be re-run
    trimmed = raw_data.drop(['Lat', 'Long'], axis=1)
    # Turn the date columns into rows: one row per province/country and date
    melted_df = trimmed.melt(id_vars=['Province/State', 'Country/Region'],
                             var_name="Date", value_name=case_type)
    return melted_df

melted_confirmed = get_n_melt_data(raw_confirmed, "Confirmed")
melted_recovered = get_n_melt_data(raw_recovered, "Recovered")
melted_deaths = get_n_melt_data(raw_deaths, "Deaths")

# join() aligns on the row index, so this assumes all three files list their rows in the same order
final_df = melted_confirmed.join(melted_recovered['Recovered']).join(melted_deaths['Deaths'])
final_df.head(10)

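| | Province/State | Country/Region | Date | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|---|
| 0 | NaN | Thailand | 1/22/20 | 2 | 0 | 0 |
| 1 | NaN | Japan | 1/22/20 | 2 | 0 | 0 |
| 2 | NaN | Singapore | 1/22/20 | 0 | 0 | 0 |
| 3 | NaN | Nepal | 1/22/20 | 0 | 0 | 0 |
| 4 | NaN | Malaysia | 1/22/20 | 0 | 0 | 0 |
| 5 | British Columbia | Canada | 1/22/20 | 0 | 0 | 0 |
| 6 | New South Wales | Australia | 1/22/20 | 0 | 0 | 0 |
| 7 | Victoria | Australia | 1/22/20 | 0 | 0 | 0 |
| 8 | Queensland | Australia | 1/22/20 | 0 | 0 | 0 |
| 9 | NaN | Cambodia | 1/22/20 | 0 | 0 | 0 |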
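Combining the three frames with `join` pairs rows by their index, which only works if every file lists its rows in the same order. If that cannot be guaranteed, a key-based `merge` may be a safer alternative. The sketch below uses hypothetical toy frames (`confirmed`, `recovered`, and their values are invented) in which the row order deliberately differs.

```python
import pandas as pd

keys = ["Province/State", "Country/Region", "Date"]

# Hypothetical toy frames; the rows of `recovered` are deliberately in a different order
confirmed = pd.DataFrame({
    "Province/State": ["Victoria", "Queensland"],
    "Country/Region": ["Australia", "Australia"],
    "Date": ["1/22/20", "1/22/20"],
    "Confirmed": [5, 7],
})
recovered = pd.DataFrame({
    "Province/State": ["Queensland", "Victoria"],
    "Country/Region": ["Australia", "Australia"],
    "Date": ["1/22/20", "1/22/20"],
    "Recovered": [1, 2],
})

# Index-based join: pairs rows by index label, so Victoria receives Queensland's value
by_position = confirmed.join(recovered["Recovered"])

# Key-based merge: pairs rows by the key columns, regardless of row order
by_key = confirmed.merge(recovered, on=keys, how="left")

print(by_position)
print(by_key)
```

According to the pandas documentation, rows whose key is null on both sides are matched against each other in a merge, which is relevant here because Province/State is empty for many countries.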


Now we can look up the Confirmed, Recovered, and Deaths numbers by the key columns Province/State, Country/Region, and Date.
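As a sketch of such a lookup, the following example selects one value with a boolean mask over the key columns. The frame `df` is a hypothetical stand-in with the same columns as `final_df`, and converting the dates with `pd.to_datetime` is an optional extra step that is not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy frame with the same columns as final_df
df = pd.DataFrame({
    "Province/State": [None, "Victoria", None, "Victoria"],
    "Country/Region": ["Japan", "Australia", "Japan", "Australia"],
    "Date": ["1/22/20", "1/22/20", "1/23/20", "1/23/20"],
    "Confirmed": [2, 0, 1, 0],
    "Recovered": [0, 0, 0, 0],
    "Deaths": [0, 0, 0, 0],
})

# Parse the month/day/two-digit-year strings so dates sort and compare correctly
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Boolean mask over the key columns; isna() matches the missing province
mask = (
    df["Province/State"].isna()
    & (df["Country/Region"] == "Japan")
    & (df["Date"] == pd.Timestamp("2020-01-23"))
)
japan_confirmed = df.loc[mask, "Confirmed"].iloc[0]
print(japan_confirmed)
```

Countries without a province need `isna()` rather than an equality test, because a comparison with a missing value never evaluates to true.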

In [17]:
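# Without parentheses this displays the bound method together with the frame's repr;
# call final_df.keys() to get only the column labels.
final_df.keys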

<bound method NDFrame.keys of               Province/State                    Country/Region      Lat  \
0                        NaN                          Thailand  15.0000   
1                        NaN                             Japan  36.0000   
2                        NaN                         Singapore   1.2833   
3                        NaN                             Nepal  28.1667   
4                        NaN                          Malaysia   2.5000   
5           British Columbia                            Canada  49.2827   
6            New South Wales                         Australia -33.8688   
7                   Victoria                         Australia -37.8136   
8                 Queensland                         Australia -28.0167   
9                        NaN                          Cambodia  11.5500   
10                       NaN                         Sri Lanka   7.0000   
11                       NaN                           Germany  51.000