# Descriptive and predictive analysis on COVID-19 using Growth Factor

<h2>Covid-19 Growth Factor by Country</h2>

Please watch the following 9-minute video on exponential growth and the spread of disease: https://www.youtube.com/watch?v=Kas0tIxDvrg

Versions before Version 16 explored the Savitzky–Golay filter (https://en.wikipedia.org/wiki/File:Lissage_sg3_anim.gif) for smoothing the data. However, that filter seems to be too good at retaining the original trends in the data, so this version switches to a different smoothing technique.
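
For intuition, the smoother used in this version (the `smoother` helper defined further down) is a centered weighted moving average: the neighbor `i` days away gets weight `w**abs(i)`, and the weights are normalized to sum to 1. The sketch below reproduces that idea with plain NumPy on a toy series. The names `smoothing_kernel` and `smooth_interior` are illustrative and not part of the notebook, and only interior points (those with a full window on both sides) are returned.

```python
import numpy as np

def smoothing_kernel(w=0.5, imax=5):
    """Weights w**|i| for i = -imax..imax, normalized to sum to 1."""
    offsets = np.arange(-imax, imax + 1)
    weights = w ** np.abs(offsets).astype(float)
    return weights / weights.sum()

def smooth_interior(values, w=0.5, imax=5):
    """Centered weighted moving average over points that have a full window."""
    kernel = smoothing_kernel(w, imax)
    # The kernel is symmetric, so the convolution is a plain weighted average.
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

spike = [0, 0, 4, 0, 0]  # toy data: a single spike
print(smoothing_kernel(w=0.5, imax=1).tolist())        # [0.25, 0.5, 0.25]
print(smooth_interior(spike, w=0.5, imax=1).tolist())  # [1.0, 2.0, 1.0]
```

A smaller `w` concentrates the weight on the central point (less smoothing), while a larger `imax` widens the window.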

**Disclaimer**: This notebook demonstrates a very simple mathematical model: a differential equation called the logistic equation, which is a special case of the Bernoulli equation. The purpose of this notebook was to illustrate mathematical modeling with simple ordinary differential equations to my introductory mathematical modeling class. I am not a health expert, and this notebook should not be taken too seriously.

<h2>Introduction to the maths: Exponential vs Logistic</h2>

The spread of infectious disease can be modeled using a logistic curve rather than an exponential curve. The growth starts exponentially, but must slow down after some point called the **inflection point**. The inflection point is essentially the midpoint of the spread. We will model the number of confirmed cases using a logistic curve. Let's look at the equation for such a curve, the differential equation for which this curve is a solution, and the graph of the curve.

### Logistic Curve Graph

![Log Curve Graph by 3Blue1Brown](https://resus.me/wp-content/uploads/2020/03/2C034D87-1EDF-4EC9-9BDE-F3D7475950B0-1024x572.png)

### Logistic Function

A **logistic function** or **logistic curve** is a function of the form: ![LogisticFunction](https://andymath.com/wp-content/uploads/2019/08/Logistic-Function.jpg)
where

* x_0 = the inflection point,
* N = the curve's maximum value, and
* k = growth rate or steepness of the curve.
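
The formula above is only available as an image, so the sketch below assumes the standard form f(x) = N / (1 + exp(-k (x - x_0))) with the three parameters just listed. The parameter values are illustrative only. It checks three properties of that form: the curve is at half its maximum at the inflection point, it is nearly exponential long before the inflection point, and it levels off just below N long after it.

```python
import numpy as np

def logistic(x, N, k, x0):
    """Logistic curve with maximum N, steepness k and inflection point x0."""
    return N / (1.0 + np.exp(-k * (x - x0)))

N, k, x0 = 1000.0, 0.3, 30.0  # illustrative values only

# At the inflection point the curve is exactly halfway to its maximum.
print(logistic(x0, N, k, x0))  # 500.0

# Long before x0 the curve is close to the exponential N * exp(k * (x - x0)).
early = np.arange(0, 10)
exponential = N * np.exp(k * (early - x0))
relative_gap = np.abs(logistic(early, N, k, x0) - exponential) / exponential
print(relative_gap.max())

# Long after x0 the curve flattens out just below N.
late_value = logistic(80.0, N, k, x0)
print(late_value)
```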

For reference: https://en.wikipedia.org/wiki/Logistic_function.

The logistic function is a solution of the following first-order, non-linear ordinary differential equation, called the logistic differential equation:

![General form of the logistic equation](https://wikimedia.org/api/rest_v1/media/math/render/svg/56ea9da1d67a28910d8550db9f54255e70b75714)

From the differential equation, stability of solutions and equilibria can be explored. However, this may not be directly helpful in predicting confirmed cases, so let's keep things simple for now and just look at the growth metrics.
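
As a quick numerical sanity check, the sketch below assumes the logistic differential equation written with the parameters used above, df/dx = k f (1 - f/N). This notation is an assumption, since the rendered image may show the normalized form with N = k = 1. It compares a finite-difference derivative of the logistic curve with the right-hand side of that equation, and evaluates the right-hand side at the two equilibria f = 0 and f = N. All parameter values are illustrative.

```python
import numpy as np

def logistic(x, N, k, x0):
    return N / (1.0 + np.exp(-k * (x - x0)))

N, k, x0 = 1000.0, 0.3, 30.0      # illustrative values only
x = np.linspace(0.0, 60.0, 6001)  # fine grid, step 0.01
f = logistic(x, N, k, x0)

lhs = np.gradient(f, x)           # numerical df/dx
rhs = k * f * (1.0 - f / N)       # right-hand side of the assumed ODE

max_error = np.max(np.abs(lhs - rhs))
print(max_error)                  # small compared with the peak slope k * N / 4

# The right-hand side vanishes at the equilibria f = 0 and f = N.
at_zero = k * 0.0 * (1.0 - 0.0 / N)
at_N = k * N * (1.0 - N / N)
print(at_zero, at_N)
```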

## Explanation of the maths

Now that we have seen the math, let's explore the following growth metrics for the confirmed cases for each country:
* Growth Factor
* Growth Ratio
* Growth Rate
* 2nd Derivative

The **growth factor** on day N is the number of new confirmed cases on day N divided by the number of new confirmed cases on day N-1. In other words, it is (confirmed cases on day N minus confirmed cases on day N-1) divided by (confirmed cases on day N-1 minus confirmed cases on day N-2).

The **growth ratio** on day N is the number of confirmed cases on day N divided by the number of confirmed cases on day N-1.

The **growth rate** is simply the first derivative. In this notebook it is computed as the first derivative of the logarithm of the confirmed cases.
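
To make these definitions concrete, here is a small worked example on a made-up series of cumulative confirmed cases (the numbers are hypothetical, not taken from the dataset). It follows the definitions above directly, without any smoothing.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative confirmed cases for five consecutive days.
confirmed = pd.Series([100, 150, 225, 300, 350], dtype=float)

daily_new = confirmed.diff()                             # C_n - C_(n-1)
growth_factor = daily_new / daily_new.shift(1)           # new cases today / new cases yesterday
growth_ratio = confirmed / confirmed.shift(1)            # C_n / C_(n-1)
growth_rate = np.gradient(np.log(confirmed.to_numpy()))  # derivative of log(C)

summary = pd.DataFrame({
    "confirmed": confirmed,
    "daily_new": daily_new,
    "growth_factor": growth_factor,
    "growth_ratio": growth_ratio,
    "growth_rate": growth_rate,
})
print(summary.round(3))
```

In this toy series the growth factor is 1.5 on the third day (new cases still accelerating), exactly 1.0 on the fourth day (new cases level), and about 0.67 on the fifth day (new cases slowing down), even though the growth ratio stays above 1 throughout because the cumulative count never decreases.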

We will use these growth metrics to gain insight into which countries may have already hit their inflection points. For example, if a country's growth factor has stabilized around 1.0, this can be a sign that the country has reached its inflection point. We will then use curve fitting to fit a logistic curve (similar to the one above) to the number of confirmed cases for each country. This may help us predict whether a country has hit its inflection point, and therefore when it will reach a possible maximum number of confirmed cases.
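
A minimal sketch of such a curve fit is shown below. It assumes SciPy is available (the notebook itself does not import it) and uses `scipy.optimize.curve_fit` on synthetic data generated from a known logistic curve with mild noise, so that the recovered parameters can be compared with the true ones. The parameter values and starting guesses are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit  # assumption: SciPy is installed

def logistic(x, N, k, x0):
    return N / (1.0 + np.exp(-k * (x - x0)))

# Synthetic "confirmed cases": a known logistic curve with 1% multiplicative noise.
rng = np.random.default_rng(0)
days = np.arange(60, dtype=float)
true_N, true_k, true_x0 = 10000.0, 0.2, 30.0
cases = logistic(days, true_N, true_k, true_x0) * rng.normal(1.0, 0.01, days.size)

# Rough starting guesses: plateau near the largest value seen, midpoint mid-series.
p0 = [cases.max(), 0.1, days.mean()]
(fit_N, fit_k, fit_x0), _ = curve_fit(logistic, days, cases, p0=p0, maxfev=10000)

print(f"N = {fit_N:.0f}, k = {fit_k:.3f}, x0 = {fit_x0:.1f}")
```

The synthetic series here extends well past its inflection point. When the data stop before the inflection point, the fitted maximum N is typically poorly constrained, so such fits should be read with caution.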

**The growth factor tells us the curvature of the data.** The 2nd derivative of the data tells us whether the cases are growing at an accelerating or decelerating rate. From calculus, you may remember that the 2nd derivative is used to test for concavity and to find inflection points. The inflection point is where the curve changes concavity. We can look at these growth metrics to estimate where that change happens.
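
The link between the growth factor, the 2nd derivative and the inflection point can be illustrated on an idealized curve. The sketch below uses a logistic curve with made-up parameters whose inflection point lies between day 30 and day 31. It shows that the growth factor stays above 1 up to the inflection point and drops below 1 afterwards, while the numerical 2nd derivative changes sign at the same place. Real data are noisy, which is why the notebook smooths these metrics.

```python
import numpy as np
import pandas as pd

def logistic(x, N, k, x0):
    return N / (1.0 + np.exp(-k * (x - x0)))

days = np.arange(61)
# Illustrative curve whose inflection point falls between day 30 and day 31.
confirmed = pd.Series(logistic(days, N=1000.0, k=0.3, x0=30.5), index=days)

daily_new = confirmed.diff()
growth_factor = daily_new / daily_new.shift(1)
second_derivative = pd.Series(
    np.gradient(np.gradient(confirmed.to_numpy())), index=days
)

# Last day on which new cases were still larger than the day before.
last_accelerating_day = growth_factor[growth_factor > 1].index.max()
print(last_accelerating_day)

# The 2nd derivative is positive before the inflection point and negative after it.
print(second_derivative.loc[[29, 30, 31, 32]])
```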

The bigger picture will be to correlate this with preventative efforts such as quarantines, closing of schools, etc. It will also be interesting to see the growth factor as a feature in an ML model.

---

Work in progress, initially forked from [dferhadi](https://www.kaggle.com/dferhadi/covid-19-predictions-growth-factor-and-calculus)

# The Analysis

<h2>Setting up</h2>

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import os
# {dirname for dirname, _, filenames in os.walk('/kaggle/input') for filename in filenames}

In [None]:
dir_location = '/kaggle/input/novel-corona-virus-2019-dataset/'
%ls {dir_location}

In [None]:
# Skip the original header row and use consistent column names instead.
global_data = pd.read_csv(dir_location + 'covid_19_data.csv',
                          skiprows=1,
                          names=['ObservationDate', 'Province_State', 'Country_Region',
                                 'LastUpdate', 'Confirmed', 'Deaths', 'Recovered'])

In [None]:
global_data.sample(5)

<h2>Definitions</h2>

Defining Classes and functions

In [None]:
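class CountryAnalisys:
    """Holds the rows of the dataset that belong to one country (or to the whole world)."""

    def __init__(self, df, country_name):
        self.country_name = country_name
        self.status = "active"
        # 'World' (case-insensitive) keeps every row; any other name filters on Country_Region.
        # Use the dataframe passed in rather than the global variable.
        if self.country_name.lower() == 'world':
            self.data = df
        else:
            self.data = df[df['Country_Region'] == self.country_name]

    ## Prints
    def __repr__(self):
        return "Country name {cn}. Status: {st}.".format(cn=self.country_name,
                                                         st=self.status)

    def __str__(self):
        # len(self.data) counts rows in the dataset, not confirmed cases.
        return "Country: {cn}\nTotal records: {tr}\n".format(cn=self.country_name,
                                                             tr=len(self.data))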
class helper_fns:
    @staticmethod
    def smoother(inputdata, w=0.5, imax=5):
        """Centered weighted moving average: the neighbor i rows away gets weight w**|i|."""
        # nan_to_num maps NaN to 0 and +/-inf to very large finite numbers.
        data = pd.DataFrame(np.nan_to_num(1.0 * inputdata))
        smoothed = 1.0 * data
        normalization = 1
        for i in range(-imax, imax + 1):
            if i == 0:
                continue
            smoothed += (w ** abs(i)) * data.shift(i, axis=0)
            normalization += w ** abs(i)
        smoothed /= normalization
        # The first and last imax rows are NaN because shift() has no neighbors there.
        return smoothed

    @staticmethod
    def growth_factor(factor_series):
        """New cases on day i divided by new cases on day i-1."""
        factor_iminus1 = factor_series.shift(1, axis=0)
        factor_iminus2 = factor_series.shift(2, axis=0)
        return (factor_series - factor_iminus1) / (factor_iminus1 - factor_iminus2)

    @staticmethod
    def growth_ratio(confirmed):
        """Confirmed cases on day i divided by confirmed cases on day i-1."""
        confirmed_iminus1 = confirmed.shift(1, axis=0)
        return confirmed / confirmed_iminus1
class ChangeRates(CountryAnalisys):
    """Aggregates the counts per date and computes the smoothed growth metrics."""

    def __init__(self, df, country_name):
        super().__init__(df, country_name)
        data = self.data.assign(activeCases=lambda x: x.Confirmed - x.Deaths - x.Recovered)
        # Sum over provinces/states (and countries, for 'World') for each observation date.
        self.data_agg = pd.pivot_table(data,
                        values=['activeCases', 'Confirmed', 'Recovered', 'Deaths'],
                        index=['ObservationDate'],
                        aggfunc='sum')
        confirmed = self.data_agg.Confirmed
        # 2nd derivative of the confirmed cases
        self.dderivative = helper_fns.smoother(np.gradient(np.gradient(confirmed)), 0.5, 7)
        # (confirmed[i] - confirmed[i-1]) / (confirmed[i-1] - confirmed[i-2])
        self.growthFactor = helper_fns.smoother(helper_fns.growth_factor(confirmed), w=0.5, imax=5)
        # confirmed[i] / confirmed[i-1]
        self.growthRatio = helper_fns.smoother(helper_fns.growth_ratio(confirmed), w=0.5, imax=5)
        # First derivative of log(confirmed), i.e. the logarithmic growth rate
        self.growthRate = helper_fns.smoother(np.gradient(np.log(confirmed)), 0.5, 3)

In [None]:
def country_plot(data, country_name=None):
    """
    Plot the growth metrics and the reported counts for the selected country.

    Parameters
    ----------
    data : DataFrame
        The Novel Corona Virus 2019 dataset, as loaded into `global_data`.
    country_name : str, default=None
        A country name from the unique values of `Country_Region`.
        If None is passed, the whole world is considered.

    Returns
    -------
    None
        Draws a grid with the growth factor, growth rate, 2nd derivative,
        growth ratio and the reported counts.

    Examples
    --------
    >>> country_plot(global_data, 'Italy')
    """
    if country_name is None:
        country_name = 'World'
    country_data_ob = ChangeRates(data, country_name)

    fig3 = plt.figure(constrained_layout=False, figsize=(20, 14))
    gs = fig3.add_gridspec(3, 2)
    fig3.suptitle(country_name)

    # Bottom row: reported counts over time.
    columns = ['Confirmed', 'activeCases', 'Recovered', 'Deaths']
    f3_ax1 = fig3.add_subplot(gs[2:, :])
    f3_ax1.set_title('Reports in {}'.format(country_name))
    f3_ax1.plot(country_data_ob.data_agg[columns])
    f3_ax1.tick_params(axis='x', labelrotation=90)
    f3_ax1.legend(columns)

    # Top two rows: one panel per growth metric.
    panels = [
        (gs[0, 0], 'Growth Factor', country_data_ob.growthFactor),
        (gs[0, 1], 'Growth Rate', country_data_ob.growthRate),
        (gs[1, 0], '2nd derivative', country_data_ob.dderivative),
        (gs[1, 1], 'Growth Ratio', country_data_ob.growthRatio),
    ]
    for position, title, series in panels:
        ax = fig3.add_subplot(position)
        ax.set_title(title)
        ax.plot(series)

<h2>Viz</h2>

In [None]:
country_plot(global_data, 'World')

In [None]:
country_plot(global_data, 'US')

In [None]:
country_plot(global_data, 'Mainland China')

In [None]:
country_plot(global_data, 'Italy')

In [None]:
country_plot(global_data, 'Spain')

In [None]:
country_plot(global_data, 'Mexico')

In [None]:
country_plot(global_data, 'Canada')