![alt text](https://github.com/callysto/callysto-sample-notebooks/blob/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg?raw=true)  


## Plotting cumulative COVID-19 cases per country on a logarithmic scale

In this notebook we will plot the cumulative number of confirmed COVID-19 cases per country and the cumulative number of deaths per country, together with the base-10 logarithm of each series.


### What is a logarithmic scale?

A logarithmic scale is a nonlinear scale often used when the data span a large range of values. Instead of increasing in equal increments, each interval increases by a factor of the base of the logarithm. Base 10 and base $e$ are the most common choices. In this notebook, we will use base 10. 

Let's say you have a variable $y$ which [grows exponentially](https://en.wikipedia.org/wiki/Exponential_growth), that is, 

on the first day, $y=10$, 

on the second day, $y = 100$, 

on the third day, $y = 1000$...

This means that every day the value of $y$ is multiplied by ten.
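
As a small illustration of the example above, the sketch below generates the first few days of a quantity that is multiplied by ten each day. The names `days` and `values` are made up for this illustration.

```python
# Hypothetical illustration: a quantity that is multiplied by 10 every day
days = [1, 2, 3, 4, 5]
values = [10 ** day for day in days]

for day, value in zip(days, values):
    print(f"day {day}: y = {value}")

# For exponential growth, the ratio between consecutive days is constant
ratios = [values[i + 1] / values[i] for i in range(len(values) - 1)]
print(ratios)
```

On a linear axis the later values dwarf the earlier ones, which is the problem a logarithmic scale addresses.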

### Why logarithmic scale?

Using a logarithmic scale is useful when the largest numbers in the data are hundreds or thousands of times larger than the smallest numbers. 

In our previous example, 

on the first day, $\log_{10}(y) = 1$, 

on the second day, $\log_{10}(y) = 2$, 

and on the third day, $\log_{10}(y) = 3$.
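
The same calculation can be checked in a few lines of Python. This is only a sketch using the three example values above; the variable names are illustrative.

```python
import math

# The three example values from the text
values = [10, 100, 1000]

# Base-10 logarithm of each value
logs = [math.log10(v) for v in values]
print(logs)

# Differences between consecutive logarithms: equal steps on the log scale
steps = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
print(steps)
```

Values that differ by a constant factor end up evenly spaced once the logarithm is taken.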

### The number of confirmed COVID-19 cases grows exponentially

Many articles, [including this one](https://ourworldindata.org/coronavirus), have noted that the number of confirmed cases is growing exponentially: every day the number of confirmed cases is multiplied by some factor $x$. This factor varies from country to country. In this notebook we will explore this for individual countries. 
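
To make the daily factor $x$ concrete, here is a minimal sketch that estimates it from a short series of cumulative counts. The numbers are invented for illustration and are not real case data.

```python
# Made-up cumulative case counts for five consecutive days (not real data)
cumulative_cases = [100, 130, 169, 220, 286]

# Daily growth factor: today's total divided by yesterday's total
growth_factors = [
    today / yesterday
    for yesterday, today in zip(cumulative_cases, cumulative_cases[1:])
]

for day, factor in enumerate(growth_factors, start=2):
    print(f"day {day}: x = {factor:.2f}")
```

If the factors stay roughly constant from day to day, the series is growing approximately exponentially.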

Press the Run button to run the next cell.


In [None]:
import requests as r
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import cufflinks as cf
import numpy as np
import plotly.graph_objs as go
# Command to display graphics correctly in a Jupyter notebook
cf.go_offline()
print("Success!")

We will begin by downloading the data via an API developed by [Omar Laraqui](https://github.com/Omaroid).

In [None]:
# Get the latest data
# Confirmed
try:
    API_response_confirmed = r.get("https://covid19api.herokuapp.com/confirmed")
    data = API_response_confirmed.json() # Check the JSON Response Content documentation below
    confirmed_df = json_normalize(data,record_path=["locations"])
    
    print("Confirmed cases download was successful!")
except Exception:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")
# Deaths
try:
    API_response_death = r.get("https://covid19api.herokuapp.com/deaths")
    data1 = API_response_death.json() # Check the JSON Response Content documentation below
    death_df = json_normalize(data1,record_path=["locations"])
    
    print("Death cases download was successful!")
except Exception:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")
# Latest
try:
    API_summary = r.get("https://covid19api.herokuapp.com/latest")
    data2 = API_summary.json()
    summary  = json_normalize(data2)
    print("Latest cases download was successful!")
except Exception:
    print("Error: check GitHub is functioning appropriately, check https://covid19api.herokuapp.com/ is not down, check fields were not renamed")

Now that we have downloaded the data, let's take a look at our dataframes:

In [None]:
print("Confirmed cases, first 5 entries")
confirmed_df.head(5)

In [None]:
print("Death cases, first 5 entries")
death_df.head(5)

In [None]:
print("Summary data, latest cases")
summary

Let's flatten the data: we expand the nested `history` field (shown inside `{}` above) so that each date gets its own column. 
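
The following sketch shows what flattening does to a single nested record. The record is invented and only imitates the assumed shape of one entry under `locations`; the real API response may contain additional fields.

```python
import pandas as pd

# Made-up record imitating the assumed shape of one entry in "locations"
sample = {
    "locations": [
        {
            "country": "Exampleland",
            "province": "",
            "coordinates": {"lat": "0.0", "long": "0.0"},
            "history": {"1/22/20": 1, "1/23/20": 3},
            "latest": 3,
        }
    ]
}

# Nested dictionaries become columns named "<parent>.<key>"
flat = pd.json_normalize(sample["locations"])
print(sorted(flat.columns))
```

Each date inside `history` becomes a separate column whose name starts with `history.`.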

In [None]:
# Flattening the data 
flat_confirmed = json_normalize(data=data['locations'])
flat_death = json_normalize(data=data1['locations'])
flat_confirmed.set_index('country', inplace=True)
flat_death.set_index('country', inplace=True)


Let's take a look at the first few entries.

In [None]:
print("Flattened confirmed cases")
flat_confirmed.head(5)

In [None]:
print("Flattened death cases")
flat_death.head(5)

Next we need to remove the "history." prefix from the date columns and sort the dates in ascending order. 
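
As a sketch of the idea on a toy dataframe, the snippet below removes a prefix from column labels and converts the remaining date labels to datetimes. The column names are invented, and the `%m/%d/%y` date format is an assumption about how the API writes its dates.

```python
import pandas as pd

# Toy dataframe with two prefixed date columns and one ordinary column
df = pd.DataFrame(
    [[1, 2, 3]],
    columns=["history.1/22/20", "history.1/23/20", "latest"],
)

prefix = "history."
# Slice the prefix off only where a label actually starts with it
df.columns = [c[len(prefix):] if c.startswith(prefix) else c for c in df.columns]
print(list(df.columns))

# Pitfall: str.lstrip treats its argument as a set of characters,
# so it can remove more than the prefix
print(repr("history.story".lstrip(prefix)))

# Convert the date labels (assumed month/day/two-digit-year) to datetimes
dates = pd.to_datetime(list(df.columns[:2]), format="%m/%d/%y")
print(dates)
```

Converting the labels to datetimes matters for ordering, because date strings sorted as text do not come out in chronological order.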

In [None]:
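# Define a function that drops a prefix (e.g. "history.") from the column names
def drop_prefix(self, prefix):
    # Note: str.lstrip(prefix) strips any leading characters contained in prefix,
    # not the prefix itself, so we slice the prefix off explicitly
    self.columns = [c[len(prefix):] if c.startswith(prefix) else c for c in self.columns]
    return self

# Attach the function to DataFrame so it can be called as a method
pd.core.frame.DataFrame.drop_prefix = drop_prefix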

# Define function which removes the "history." prefix and orders the date columns in ascending order
def order_dates(flat_df):

    # Drop prefix
    flat_df.drop_prefix('history.')
    # Isolate the date columns and convert their labels to datetime format
    sub = flat_df.iloc[:,3:-2].copy()
    sub.columns = pd.to_datetime(sub.columns)
    # Sort the date columns chronologically
    sub2 = sub.reindex(sorted(sub.columns), axis=1)
    # Take the last five columns after sorting by name (expected to be the non-date fields)
    sub3 = flat_df.reindex(sorted(flat_df.columns),axis=1).iloc[:,-5:]
    # Concatenate
    final = pd.concat([sub2,sub3], axis=1, sort=False)
    return final

final_confirmed = order_dates(flat_confirmed)

final_deaths = order_dates(flat_death)



In [None]:
print("Reformatted confirmed")
final_confirmed.head(5)

In [None]:
print("Reformatted deaths")
final_deaths.head(5)

## Visualizing the data

In the next few cells we will reshape the data once more and build an interactive plot of the totals alongside their base-10 logarithm. 
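
One detail worth noting before plotting: the logarithm of zero is undefined, and cumulative counts often start at zero. A minimal sketch of one way to handle this with NumPy, using made-up counts, is:

```python
import numpy as np

# Made-up cumulative counts, including days with zero cases
totals = np.array([0, 0, 1, 10, 100])

# Start from an all-NaN float array and fill in log10 only where totals > 0
log_totals = np.full(totals.shape, np.nan)
np.log10(totals, where=totals > 0, out=log_totals)
print(log_totals)
```

Entries left as `NaN` are simply skipped when the curve is drawn, so days with zero cases do not produce errors or infinite values.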

In [None]:
from ipywidgets import widgets, Button, VBox, HBox,Layout
from IPython.display import display, Javascript, Markdown, HTML

def run_4cell( b ):
    
    display(Javascript('IPython.notebook.execute_cell_range(IPython.notebook.get_selected_index()+1,\
    IPython.notebook.get_selected_index()+2)'))  

    
def plot_log_function(country,final_df,type_case):
    
    latest_arr = []
    date_arr = []
    for item in final_df[final_df.index==country].iloc[:,0:-5].columns:
        date_arr.append(item)
        latest_arr.append(final_df[final_df.index==country][item].sum())

    final_confirmed_red = pd.DataFrame({"Date":date_arr,"CumulativeTotal":latest_arr})

    
    
    x = final_confirmed_red.Date
    y = final_confirmed_red.CumulativeTotal

    npy = np.array(y.to_list())
    l_y = np.log10(npy, where=0<npy, out=np.nan*npy)


    trace1 = go.Bar(x=x,y=y,name=country)
    trace2 = go.Scatter(x=x,y=l_y,name='Log ' + str(country),yaxis='y2')
    layout = go.Layout(
        title= ('Number of ' + str(type_case) + ' cases for ' + str(country)),
        yaxis=dict(title='Total Number of ' + str(type_case) + ' cases',\
                   titlefont=dict(color='blue'), tickfont=dict(color='blue')),
        yaxis2=dict(title='Logarithmic curve', titlefont=dict(color='red'), \
                    tickfont=dict(color='red'), overlaying='y', side='right'),
        showlegend=False)
    fig = go.Figure(data=[trace1,trace2],layout=layout)
    fig.update_yaxes(showgrid=True)
    fig.show()   
    
countries_regions = final_confirmed.index.unique().tolist()


style = {'description_width': 'initial'}

# UI
CD_button = widgets.Button(
        button_style='success',
        description="Choose Country", 
        layout=Layout(width='15%', height='30px'),
        style=style
    )    

all_the_widgets = [widgets.Combobox(
        # value='John',
        placeholder='Choose country',
        options = countries_regions, 
        description ='Country/Region:',
        ensure_option=True,
        style=style,
        disabled=False
    )]
# Connect widget to function - run subsequent cells
CD_button.on_click( run_4cell )

#### Exercise

In the cell below, type the first few letters of a country you are interested in. 

Press the "Choose Country" button to visualize the logarithmic curve for that country. 

In [None]:
tab2 = VBox(children=[HBox(children=all_the_widgets),
                          CD_button])
display(tab2)

In [None]:
country = all_the_widgets[0].value

plot_log_function(country,final_confirmed,"confirmed")
plot_log_function(country,final_deaths,"death")

### Observations

Try multiple countries and compare the red curve with the logarithmic values against the actual values. 

For example, try China, US, Canada, and Italy. How does the number of actual cases change? Remember that we are computing log base 10, which means that the log scale tells us by how many factors of 10 the numbers of confirmed cases and deaths have changed over time. 
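
One way to read the logarithmic curve quantitatively: if a series grows exponentially, its base-10 logarithm is a straight line, and the slope of that line gives the daily growth factor. The sketch below uses a synthetic series with a known factor of 1.25 per day (not real case data) and fits a line with `np.polyfit`.

```python
import numpy as np

# Synthetic series: starts at 5 and is multiplied by 1.25 each day (not real data)
days = np.arange(10)
cases = 5 * 1.25 ** days

# Fit a straight line to log10(cases) as a function of the day number
slope, intercept = np.polyfit(days, np.log10(cases), 1)

# A slope s on the log10 scale corresponds to a daily factor of 10**s
growth_factor = 10 ** slope
# Number of days needed for the series to double
doubling_time = np.log10(2) / slope

print(f"daily growth factor: {growth_factor:.3f}")
print(f"doubling time: {doubling_time:.2f} days")
```

For real data the logarithmic curve is rarely a perfect line, so a fit like this is best applied to a stretch where the curve looks roughly straight.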



[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)