## Goal

* In this kernel, I fit actual CoViD-19 data (by country) to the SIR-D model.

* I use the Tree-structured Parzen Estimator (TPE) algorithm to estimate $\beta$, $\gamma$ and $\delta$.

* For an explanation of the SIR(-D) model, play around with this interactive kernel: [Simple SIR w/ equation solver](https://www.kaggle.com/hotessy/for-beginners-simple-sir-w-equation-solver)

* Fork a version of this script, [SIR Solver Hyperopt](https://www.kaggle.com/hotessy/sir-solver-hyperopt/), and add it as a utility script. The `Learner` class is implemented there.
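
As a rough illustration of what the SIR-D model describes, here is a minimal sketch of the standard textbook form, $dS/dt = -\beta S I / N$, $dI/dt = \beta S I / N - \gamma I - \delta I$, $dR/dt = \gamma I$, $dD/dt = \delta I$, integrated with a simple forward-Euler scheme. The function names, the step size and the parameter values are assumptions made for illustration; the `Learner` class in the utility script may parameterise or solve the model differently.

```python
def sird_step(state, beta, gamma, delta, dt):
    """Advance (S, I, R, D) by one forward-Euler step of size dt."""
    s, i, r, d = state
    n = s + i + r + d                        # total population, constant over time
    new_infections = beta * s * i / n * dt   # S -> I
    recoveries = gamma * i * dt              # I -> R
    deaths = delta * i * dt                  # I -> D
    return (s - new_infections,
            i + new_infections - recoveries - deaths,
            r + recoveries,
            d + deaths)


def simulate_sird(s_0, i_0, r_0, d_0, beta, gamma, delta, days, steps_per_day=10):
    """Return one (S, I, R, D) tuple per day, starting with day 0."""
    state = (float(s_0), float(i_0), float(r_0), float(d_0))
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = sird_step(state, beta, gamma, delta, dt)
        trajectory.append(state)
    return trajectory


# Illustrative values only (assumptions, not fitted parameters)
trajectory = simulate_sird(s_0=1e4, i_0=1, r_0=0, d_0=0,
                           beta=0.3, gamma=0.1, delta=0.01, days=60)
```

Fitting then amounts to searching for the $\beta$, $\gamma$ and $\delta$ whose simulated curves lie closest to the reported ones.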

In [None]:
import numpy as np 
import pandas as pd 
from pathlib import Path

import cufflinks as cf

## (from https://www.kaggle.com/hotessy/sir-solver-hyperopt/)
from sir_solver_hyperopt import Learner

In [None]:
pd.set_option('display.max_rows', 500)
pd.set_option('use_inf_as_na', True)
cf.set_config_file(offline=True, theme='solar');

## Reading CoViD-19 data

In [None]:
path = Path("../input/novel-corona-virus-2019-dataset/")

In [None]:
recovered_df = (pd.read_csv(path/'time_series_covid_19_recovered.csv')
                .drop(columns=['Lat', 'Long'])
                .groupby('Country/Region')
                .sum())

deaths_df = (pd.read_csv(path/'time_series_covid_19_deaths.csv')
             .drop(columns=['Lat', 'Long'])
             .groupby('Country/Region')
             .sum())

confirmed_df = (pd.read_csv(path/'time_series_covid_19_confirmed.csv')
                .drop(columns=['Lat', 'Long'])
                .groupby('Country/Region')
                .sum())

## Creating a `Learner` Instance 

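`MyLearner` subclasses `Learner` from the script [sir_solver_hyperopt](https://www.kaggle.com/hotessy/sir-solver-hyperopt/) and overrides its data loaders to read from the DataFrames above, starting at the date of the first confirmed case.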

In [None]:
class MyLearner(Learner):
    
    def __init__(self, country):
        self.first_case_date = (confirmed_df.filter(items=[country], axis=0).iloc[0] > 0).idxmax()
        super().__init__(country)
    
    def load_confirmed(self, country):
        return confirmed_df.filter(items=[country], axis=0).iloc[0][self.first_case_date:]

    def load_recovered(self, country):
        return recovered_df.filter(items=[country], axis=0).iloc[0][self.first_case_date:]

    def load_dead(self, country):
        return deaths_df.filter(items=[country], axis=0).iloc[0][self.first_case_date:]

In [None]:
learner = MyLearner('India')

In [None]:
pd.DataFrame(data=[learner.infected, learner.recovered, learner.dead]).T.head()

#### Considering Day 0 as the date when the first infected patient was reported

A window of at most 14 days is considered because, according to experts, the incubation period of the virus is between 1 and 14 days (about 5 days on average).
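
To see what a 14-day rolling maximum picks out, here is a small sketch on a made-up series. The counts and dates in `toy_cases` are hypothetical and not taken from the dataset; only the pandas calls mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical cumulative case counts, one value per day from the first reported case
toy_cases = pd.Series(
    [1, 1, 2, 3, 3, 5, 8, 8, 13, 21, 21, 34, 55, 89, 144, 233],
    index=pd.date_range("2020-01-30", periods=16, freq="D"),
)

# rolling(14).max() is NaN until a full 14-day window is available,
# so after dropna() the first value is the maximum over days 0..13.
window_max = toy_cases.rolling(14).max().dropna()
initial_value = window_max.iloc[0]

print(window_max.index[0].date(), initial_value)
```

Because cumulative counts never decrease, the first complete 14-day maximum is simply the count on the 14th day.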

In [None]:
first_case_date = (learner.infected > 0).idxmax()
first_recovery_date = (learner.recovered > 0).idxmax()

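# Maximum over the first full 14-day window; .iloc[0] selects by position,
# since the Series index holds date labels rather than integers.
i_0 = learner.infected[first_case_date:].rolling(14).max().dropna().iloc[0]
r_0 = learner.recovered[first_recovery_date:].rolling(14).max().dropna().iloc[0]
d_0 = learner.dead[first_recovery_date:].rolling(14).max().dropna().iloc[0]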

print(i_0, r_0 + d_0)

## Training

**Note**: Training can take considerable time, depending on the country and `max_evals`. It is better to commit the notebook and save the output as a CSV file.

In [None]:
learner.train(s_0=1e4, i_0=1, r_0=1, weight=0.3, max_evals=50)

In [None]:
print(learner.country.upper())
print(f"Γ = {learner.Γ}")
print(f"β (standardised) = {learner.β}")
print(f"Reproduction Rate (standardised) = {learner.β/learner.Γ}")

In [None]:
fig, data = learner.plot()

## Output

In [None]:
data.to_csv(f'{learner.country}.csv')

In [None]:
fig.show()

## References:

1. Theory:
    1. https://www.maa.org/book/export/html/115606
    2. https://kingaa.github.io/clim-dis/parest/parest.html
    3. https://web.stanford.edu/~jhj1/teachingdocs/Jones-on-R0.pdf
    4. https://www.nature.com/articles/srep46076.pdf
    5. https://www.math.uzh.ch/li/index.php?file&key1=41327
    
    
2. Code:
    1. https://www.lewuathe.com/covid-19-dynamics-with-sir-model.html
    2. https://towardsdatascience.com/infection-modeling-part-1-87e74645568a