# COVID-19 Daily Cases Using Open Data

##### Author: Laura G. Funderburk
##### Date: July 28 2020
##### Last modified: July 29 2020

### Intro

In this notebook, I visualize COVID-19 daily cases in Canada and other countries using Python.

### Source

COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University https://github.com/CSSEGISandData/COVID-19.

Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis; published online Feb 19. https://doi.org/10.1016/S1473-3099(20)30120-1.

This notebook uses the API described in https://github.com/CSSEGISandData/COVID-19/issues/851.

In [None]:
# Importing libraries
import requests as r 
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# For pandas < 1.0, pd.json_normalize does not exist; import the function instead:
# from pandas.io.json import json_normalize

Now we are going to download the data using the API. 

In [None]:
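# API endpoint with the reported deaths time series
API_LINK = "https://covid19api.herokuapp.com/deaths"
# Pull data; raise an error if the request fails instead of parsing an error page
response = r.get(API_LINK, timeout=30)
response.raise_for_status()
json_data = response.json()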

Once we have downloaded the data, we load it into a pandas dataframe, using the list of records stored under the `locations` key.

In [None]:
confirmed_df = pd.json_normalize(json_data, record_path=['locations'])
# For pandas < 1.0, use the imported function instead:
# confirmed_df = json_normalize(json_data, record_path=['locations'])
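To see what `record_path` does, here is a small sketch on a hypothetical payload. We assume only the rough shape used above (a top-level `locations` list whose items contain nested `coordinates` and `history` dicts); the values are made up. Each item becomes one row, and nested dicts are flattened into dotted column names, which is where the `history.` and `coordinates.` prefixes come from.

```python
import pandas as pd

# Hypothetical payload: only mimics the assumed shape of the API response.
sample = {
    "locations": [
        {"country": "Canada", "province": "Ontario",
         "coordinates": {"lat": 51.25, "long": -85.32},
         "history": {"1/22/20": 0, "1/23/20": 0}},
        {"country": "Canada", "province": "Quebec",
         "coordinates": {"lat": 52.94, "long": -73.55},
         "history": {"1/22/20": 0, "1/23/20": 1}},
    ]
}

# record_path selects the list that becomes rows; nested dicts are flattened
# into dotted column names such as "coordinates.lat" and "history.1/22/20".
df = pd.json_normalize(sample, record_path=["locations"])
print(df.columns.tolist())
print(df)
```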

We then set the `country` column as the index.

In [None]:
confirmed_df.set_index('country',inplace=True)

This is what our data looks like:

In [None]:
confirmed_df.head()

Next, we remove the "coordinates." and "history." prefixes from the column names; this makes plotting and manipulating the data easier.

We define a function that removes a given prefix for us.

In [None]:
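# Define a function that removes a prefix (e.g. "history.") from the column names
def drop_prefix(self, prefix):
    # Strip only the exact prefix; str.lstrip would remove any leading characters found in `prefix`
    self.columns = [c[len(prefix):] if c.startswith(prefix) else c for c in self.columns]
    return self

# Attach the function to DataFrame so it can be called as df.drop_prefix(...)
pd.DataFrame.drop_prefix = drop_prefix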
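A caveat about prefix removal with `str.lstrip`: it treats its argument as a set of characters, not as a prefix, and keeps stripping leading characters as long as they appear anywhere in that set. The sketch below uses illustrative column names (we assume a `country_code` column only for the demonstration) to show how `lstrip("coordinates.")` mangles a column that merely starts with `c` and `o`, while an exact prefix check leaves it alone.

```python
import pandas as pd

# Illustrative column labels, not necessarily the API's exact columns.
cols = pd.Index(["history.1/22/20", "coordinates.lat", "country_code", "province"])

# lstrip removes leading characters while they belong to the set {c, o, r, d, i, n, a, t, e, s, .}
stripped = cols.str.lstrip("coordinates.")

# Prefix-safe alternative: remove the prefix only when the label starts with it.
def remove_prefix(index, prefix):
    return pd.Index([c[len(prefix):] if c.startswith(prefix) else c for c in index])

safe = remove_prefix(cols, "coordinates.")
print(stripped.tolist())  # "country_code" loses its leading "co"
print(safe.tolist())
```

On Python 3.9 and later, `str.removeprefix` does the same job for a single string.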

Next, we clean up the data: we remove the prefixes and sort the date columns in chronological order.

In [None]:
# Define a function that removes the "history." and "coordinates." prefixes and orders the date columns in ascending order
def order_dates(flat_df):
    """Take a dataframe of daily COVID-19 counts and return a dataframe
    whose date columns are sorted chronologically, with the "history."
    and "coordinates." prefixes removed from the column names."""
    # Drop prefixes (drop_prefix modifies flat_df in place)
    flat_df.drop_prefix('history.')
    flat_df.drop_prefix('coordinates.')
    # Isolate the date columns (assumes the first 6 columns are not dates)
    # and convert their labels to datetime
    sub = flat_df.iloc[:, 6:].copy()
    sub.columns = pd.to_datetime(sub.columns)
    # Sort the date columns chronologically
    sub2 = sub.reindex(sorted(sub.columns), axis=1)
    # After an alphabetical sort, the non-date columns come last; keep the last 5
    sub3 = flat_df.reindex(sorted(flat_df.columns), axis=1).iloc[:, -5:]
    # Concatenate the dates and the metadata side by side
    final = pd.concat([sub2, sub3], axis=1, sort=False)
    return final
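Converting the labels with `pd.to_datetime` before sorting matters because date strings sort lexicographically, character by character. The sketch below uses illustrative labels in the m/d/yy form; we assume this matches the API's `history` keys. Sorted as strings, October lands before February; sorted as datetimes, the order is chronological, and `reindex` applies that order to dataframe columns.

```python
import pandas as pd

# Illustrative date labels in m/d/yy form.
labels = ["1/22/20", "10/1/20", "2/1/20"]

# Sorting raw strings is lexicographic: "10/1/20" lands before "2/1/20".
print(sorted(labels))

# Parsing the labels first gives a true chronological order.
dates = pd.to_datetime(labels, format="%m/%d/%y")
chronological = dates.sort_values()
print(chronological)

# The same idea applied to dataframe columns, as in order_dates.
df = pd.DataFrame([[0, 5, 1]], columns=dates)
df = df.reindex(sorted(df.columns), axis=1)
print(df)
```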

In [None]:
final = order_dates(confirmed_df)
final.head()

In [None]:
condition = final.index=='US'
final[condition]

In [None]:
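# Keep Canada's rows, label them by province, transpose so dates become rows, drop the 4 trailing metadata rows, and make the counts numeric
transposed_final = final[final.index == 'Canada'].set_index("province").T.iloc[:-4].astype(float)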

In [None]:
transposed_final.head()
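To make the reshaping step concrete, here is a toy version of it. The table is hypothetical: one row per (country, province) and one column per date, indexed by `country` as above. Filtering on the index, moving `province` into the index, and transposing yields one row per date and one column per province, which is the layout plotly expects for a per-province time series.

```python
import pandas as pd

# Hypothetical wide table: one row per (country, province), one column per date.
wide = pd.DataFrame(
    {"province": ["Ontario", "Quebec", ""],
     "2020-03-01": [1, 0, 5],
     "2020-03-02": [2, 1, 7]},
    index=pd.Index(["Canada", "Canada", "France"], name="country"),
)

# Keep Canada, use provinces as labels, then transpose so dates become rows.
canada = wide[wide.index == "Canada"].set_index("province").T
print(canada)
```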

## Visualizing Total (Cumulative) Cases per Province

Run the cell below to get the provinces. 

In [None]:
transposed_final.columns

Select one of the provinces from the list, and enter it in the `province` variable in the code below.

In [None]:
province = "British Columbia"
px.scatter(transposed_final,
           x=transposed_final.index,
           y=province,
           title="Cumulative reported deaths in " + province,
           labels={"x": "Time (daily)",
                   province: "Number of reported deaths in " + province})

## Observations

For most provinces, the cumulative number of reported deaths increases between January and July 2020.

## Total Cumulative and Non-Cumulative Cases in the Country

In this section, we visualize cumulative and daily (non-cumulative) reported deaths in Canada.

In [None]:
transposed_final.head(1)

In [None]:
# Sum across provinces to get the national cumulative total for each day
transposed_final["TotalDailyCase"] = transposed_final.sum(axis=1)
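Here `axis=1` sums across the columns of each row, i.e. across provinces for each date. One notebook pitfall: if the cell is run a second time, `sum(axis=1)` also adds the existing `TotalDailyCase` column and doubles the total. The sketch below, on hypothetical numbers, sums an explicit list of province columns so that re-running it is harmless.

```python
import pandas as pd

# Hypothetical cumulative counts: rows are dates, columns are provinces.
cumulative = pd.DataFrame(
    {"Ontario": [1, 3, 6], "Quebec": [0, 2, 2]},
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
)

provinces = ["Ontario", "Quebec"]
# axis=1 sums across columns within each row, i.e. across provinces for each date.
cumulative["TotalDailyCase"] = cumulative[provinces].sum(axis=1)

# Running the line again gives the same result, because only province columns are summed.
cumulative["TotalDailyCase"] = cumulative[provinces].sum(axis=1)
print(cumulative)
```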

In [None]:
transposed_final.tail()

In [None]:
px.scatter(transposed_final,
          x=transposed_final.index,
          y="TotalDailyCase",
          title="Total (cumulative) COVID-19 reported deaths in Canada",
          labels={"x":"Time (daily)",
                 "TotalDailyCase": "Total number of reported deaths"})

To get more granularity, we look at non-cumulative (daily) reported deaths, computed as the difference between consecutive days.

___

## Non-cumulative cases in Canada

In [None]:
non_cumulative = transposed_final.diff(axis=0)
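`diff` turns a cumulative series into daily counts by subtracting the previous row from each row; the first row has no predecessor and becomes `NaN`. The sketch below uses hypothetical totals. It also shows that `cumsum` undoes `diff` once the first value is restored. If a cumulative count is ever revised downward, `diff` produces a negative value for that day.

```python
import pandas as pd

# Hypothetical cumulative national totals for four consecutive days.
total = pd.Series(
    [1, 5, 8, 8],
    index=pd.date_range("2020-03-01", periods=4, freq="D"),
    name="TotalDailyCase",
)

# diff() subtracts the previous day's value; the first day has no predecessor, so it is NaN.
daily = total.diff()
print(daily)

# cumsum() recovers the cumulative series once the first value is filled back in.
restored = daily.fillna(total.iloc[0]).cumsum()
print(restored)
```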

In [None]:
non_cumulative.tail()

In [None]:
# Plot daily (non-cumulative) deaths, with the y-axis fixed to [0, 3000]
fig = px.line(non_cumulative,
              x=non_cumulative.index,
              y="TotalDailyCase",
              title="Daily (non-cumulative) COVID-19 reported deaths in Canada",
              labels={"x": "Time (days)",
                      "TotalDailyCase": "Non-cumulative reported daily deaths"})
fig.update_layout(yaxis=dict(range=[0, 3000]))
fig.show()

## Final remarks

We observe a first wave of reported COVID-19 deaths from March to June 2020, followed by what seems to be a second wave.