#  Covid-19 Prediction Model

------------------------------------------------

In [16]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests

from datetime import datetime
from sklearn.model_selection import train_test_split
from scipy.optimize import curve_fit

In [None]:
res = requests.get('https://coronavirus-tracker-api.herokuapp.com/confirmed', timeout=30)
res.status_code  # 200 means the request succeeded

We used the API at coronavirus-tracker-api.herokuapp.com, calling `requests.get` to receive the data, and then parsed the JSON response into Python dictionaries and lists with `res.json()`.

In [None]:
resjson = res.json()
#resjson

In [None]:
#resjson['locations']

We indexed through the nested dictionaries to find the key 'history', which holds a dictionary mapping each date to its number of confirmed cases.

We then defined a function so that we can reuse it for any other country.
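The notebook's code only reads the keys `locations`, `country`, `latest` and `history`, so the response is assumed to look roughly like the hypothetical payload below (all values are invented for illustration, and real entries may carry more fields). The sketch also sums the histories when a country appears in more than one entry, for example once per province; the `province` key and that summing are our own assumptions about how repeated entries should be combined.

```python
# Hypothetical payload mirroring only the keys the notebook reads;
# the numbers and the 'province' field are made up for illustration.
sample = {
    "locations": [
        {"country": "Nepal", "latest": 2, "history": {"1/22/20": 0, "1/23/20": 1, "1/24/20": 2}},
        {"country": "Canada", "province": "Ontario", "latest": 3, "history": {"1/22/20": 1, "1/23/20": 3}},
        {"country": "Canada", "province": "Quebec", "latest": 2, "history": {"1/22/20": 0, "1/23/20": 2}},
    ]
}

def history_for(payload, name):
    """Sum the {date: confirmed} dicts of every entry whose 'country' equals `name`."""
    total = {}
    for location in payload["locations"]:
        if location["country"] == name:
            for date, count in location["history"].items():
                total[date] = total.get(date, 0) + count
    return total

print(history_for(sample, "Nepal"))   # {'1/22/20': 0, '1/23/20': 1, '1/24/20': 2}
print(history_for(sample, "Canada"))  # {'1/22/20': 1, '1/23/20': 5}
```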

In [None]:
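def country(country):
    """Return the {date: confirmed} history for `country`, summing over all of its
    entries (e.g. provinces/states) if the API lists it more than once."""
    history = {}
    for location in resjson['locations']:
        if location['country'] == country:
            for date, count in location['history'].items():
                history[date] = history.get(date, 0) + count
    if not history:
        raise ValueError(f"No data found for country {country!r}")
    return history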


In [None]:
history = country('Nepal')

In [None]:
#print(history)

In [None]:
#wc = pd.DataFrame(list(history.items()), columns=['date','confirmed'])
#wc.head()

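We made a DataFrame from the history and then added a `days` column using the `map` function.

The datetime module was used to parse each date string (month/day/two-digit year) and convert it into the number of days since January 22, 2020, which gives a numeric time axis suitable for curve fitting.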
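The conversion done by the `map`/`lambda` call can be checked on a couple of dates. The helper name `days_since_start` is our own, and the `pd.to_datetime` variant at the end is a suggested vectorized alternative, not what the notebook itself uses.

```python
from datetime import datetime

import pandas as pd

START = datetime.strptime('1/22/20', '%m/%d/%y')  # day 0 of the time axis

def days_since_start(date_str):
    """Convert an 'm/d/yy' date string into whole days elapsed since January 22, 2020."""
    return (datetime.strptime(date_str, '%m/%d/%y') - START).days

print(days_since_start('1/22/20'))  # 0
print(days_since_start('3/1/20'))   # 39: 10 days to Feb 1, plus 29 days in February 2020

# Vectorized alternative over a whole column with pandas
dates = pd.Series(['1/22/20', '1/23/20', '3/1/20'])
days = (pd.to_datetime(dates, format='%m/%d/%y') - pd.Timestamp('2020-01-22')).dt.days
print(days.tolist())  # [0, 1, 39]
```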

In [None]:
#wc['days']= wc['date'].map(lambda x : (datetime.strptime(x, '%m/%d/%y') - datetime.strptime("1/22/20", '%m/%d/%y')).days  )
#wc[['date','days','confirmed']]

In [None]:
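def maketable(history):
    """Build a DataFrame from a {date: confirmed} dict, e.g. maketable(country('Nepal'))."""
    wc = pd.DataFrame(list(history.items()), columns=['date', 'confirmed'])
    start = datetime.strptime('1/22/20', '%m/%d/%y')
    # Days elapsed since January 22, 2020 for each 'm/d/yy' date string
    wc['days'] = wc['date'].map(lambda x: (datetime.strptime(x, '%m/%d/%y') - start).days)
    # Sort chronologically so the unshuffled train/test split below is a split in time
    return wc.sort_values('days').reset_index(drop=True)

wc = maketable(country('Nepal'))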


------------------------------------------------

### The Gompertz curve or Gompertz function

The Gompertz curve, or Gompertz function, is a type of mathematical model for a time series and is named after Benjamin Gompertz (1779-1865). It is a sigmoid function which describes growth as being slowest at the start and end of a given time period. The right-hand (future value) asymptote of the function is approached much more gradually by the curve than the left-hand (lower valued) asymptote. This is in contrast to the simple logistic function, in which both asymptotes are approached by the curve symmetrically. It is a special case of the generalised logistic function. The function was originally designed to describe human mortality, but has since been modified for use in biology, for example to describe populations.
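The asymmetry described above can be checked numerically. Inverting each curve for the time at which it reaches a fraction p of its upper asymptote shows how long the climb from 10% to 50% takes compared with the climb from 50% to 90%. This is a minimal sketch using the notebook's Gompertz parameterization and a logistic curve with the same growth rate c and shift t_0; the helper names are our own.

```python
import math

def gompertz_time(p, c=1.0, t_0=0.0):
    """Time at which a * exp(-exp(-c * (t - t_0))) reaches the fraction p of its asymptote a."""
    return t_0 - math.log(-math.log(p)) / c

def logistic_time(p, c=1.0, t_0=0.0):
    """Time at which a / (1 + exp(-c * (t - t_0))) reaches the fraction p of a."""
    return t_0 + math.log(p / (1 - p)) / c

for name, f in [("Gompertz", gompertz_time), ("logistic", logistic_time)]:
    early = f(0.5) - f(0.1)  # time to climb from 10% to 50% of the asymptote
    late = f(0.9) - f(0.5)   # time to climb from 50% to 90% of the asymptote
    print(f"{name}: 10%->50% takes {early:.3f}, 50%->90% takes {late:.3f}")
# Gompertz: 10%->50% takes 1.201, 50%->90% takes 1.884
# logistic: 10%->50% takes 2.197, 50%->90% takes 2.197
```

The Gompertz curve needs noticeably longer for the upper half of its climb, while the logistic curve takes the same time on both sides.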

$$f(t) = a\,e^{-b e^{-ct}}$$

where

- $a$ is an asymptote, since

  $$\lim_{t \to \infty} a\,e^{-b e^{-ct}} = a\,e^{0} = a$$

- $b$ sets the displacement along the x-axis (translates the graph to the left or right). Symmetry is when $b = \log(2)$.
- $c$ sets the growth rate (y scaling).
- $e$ is Euler's number ($e = 2.71828\ldots$).
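The notebook's `gompertz` function writes the same curve as $Q(t) = a\,e^{-e^{-c(t - t_0)}}$, replacing $b$ with a shift $t_0$ along the time axis. A short derivation (our own, not part of the original text) connects the two forms and shows what $t_0$ means for the fitted curves:

$$
b = e^{c t_0} \quad\Longrightarrow\quad a\,e^{-b e^{-ct}} = a\,e^{-e^{c t_0} e^{-ct}} = a\,e^{-e^{-c(t - t_0)}} = Q(t)
$$

With $u = e^{-c(t - t_0)}$, so that $u' = -c\,u$:

$$
Q'(t) = c\,u\,Q(t), \qquad Q''(t) = c^2\,u\,(u - 1)\,Q(t)
$$

$Q''$ vanishes at $u = 1$, i.e. at $t = t_0 = \ln(b)/c$, where

$$
Q(t_0) = a\,e^{-1} \approx 0.368\,a
$$

So $t_0$ is the day of fastest daily growth, reached when the cumulative count is about 37% of the plateau $a$.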

------------------------------------------------

In [None]:
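def gompertz(a, c, t, t_0):
    """Gompertz curve a * exp(-exp(-c * (t - t_0))): plateau a, growth rate c, inflection day t_0."""
    t = np.asarray(t, dtype=float)  # accept lists as well as arrays
    Q = a * np.exp(-np.exp(-c * (t - t_0)))
    return Q

x = list(wc['days'])
y = list(wc['confirmed'])

# Chronological split (no shuffling): first 85% of days for fitting, last 15% held out
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.85, test_size=0.15, shuffle=False)

# Extend the prediction range past the observed data, up to day 199
x_test_added = x_test + list(range(max(x_test) + 1, 200))

# curve_fit calls f(xdata, *params), so the independent variable must come first;
# the lambda reorders the arguments to match gompertz(a, c, t, t_0).
# Bounds are (a, c, t_0): a in [5, 30 * max(y_train)], c in [0, 0.15], t_0 in [0, 245].
popt, pcov = curve_fit(lambda t, a, c, t_0: gompertz(a, c, t, t_0), x_train, y_train,
                       method='trf', bounds=([5, 0, 0], [30 * max(y_train), 0.15, 245]))
a, estimated_c, estimated_t_0 = popt
y_pred = gompertz(a, estimated_c, x_train + x_test_added, estimated_t_0)

y_pred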

In [None]:
plt.plot(x_train+x_test_added, y_pred, linewidth=2, label='predicted') 
plt.plot(x, y, linewidth=2, color='r', linestyle='dotted', label='confirmed')
plt.title('prediction vs confirmed data on covid-19 cases in Nepal\n')
plt.xlabel('days since January 22 2020')
plt.ylabel('confirmed positive cases')
plt.legend(loc='upper left') 

#### After fitting the curve, we see a projection of confirmed cases for the first 200 days after January 22, 2020. We can observe how the trend changes over time: the cumulative count is approaching the upper plateau of the curve, and the curve is expected to flatten within the next 50 days. The total number of confirmed cases is projected to level off at approximately 1,500.
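Statements such as "expected to flatten within the next 50 days" can be made concrete from the fitted parameters. Solving a·exp(-exp(-c(t - t_0))) = p·a for t gives t = t_0 - ln(-ln p)/c, the day on which the curve reaches the fraction p of its plateau. This is a minimal sketch: the parameter values are hypothetical, not the notebook's actual fit, and using 99% of the plateau as the meaning of "flat" is our own choice.

```python
import numpy as np

def gompertz(a, c, t, t_0):
    """Same parameterization as the notebook: a * exp(-exp(-c * (t - t_0)))."""
    return a * np.exp(-np.exp(-c * (np.asarray(t, dtype=float) - t_0)))

def day_reaching_fraction(c, t_0, p):
    """Day t at which the Gompertz curve reaches the fraction p (0 < p < 1) of its plateau a."""
    return t_0 - np.log(-np.log(p)) / c

# Hypothetical fitted values, for illustration only
a, c, t_0 = 1500.0, 0.1, 100.0

t99 = day_reaching_fraction(c, t_0, 0.99)
print(round(float(t99), 1))                       # 146.0
print(round(float(gompertz(a, c, t99, t_0)), 1))  # 1485.0 (99% of a)
print(round(float(gompertz(a, c, t_0, t_0)), 1))  # 551.8 (a / e at the inflection day)
```

With the notebook's own fit, passing `estimated_c` and `estimated_t_0` and subtracting the last observed day (`max(x)`) would give the number of days until the projected curve is 99% flat.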

-----------------------------------------------------------------------------------------------------------------

##### Next, we train a similar model to predict Covid-19 cases for the US.

In [None]:
us = maketable(country("US"))

In [None]:
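x = list(us['days'])
y = list(us['confirmed'])

# Chronological split (no shuffling): first 85% of days for fitting, last 15% held out
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.85, test_size=0.15, shuffle=False)

# Extend the prediction range past the observed data, up to day 224
x_test_added = x_test + list(range(max(x_test) + 1, 225))

# curve_fit passes the independent variable first, so reorder the arguments for gompertz(a, c, t, t_0).
# The plateau a is constrained to [1,500,000, 2 * max(y_train)], so 2 * max(y_train) must exceed 1,500,000.
popt, pcov = curve_fit(lambda t, a, c, t_0: gompertz(a, c, t, t_0), x_train, y_train,
                       method='trf', bounds=([1500000, 0, 0], [2 * max(y_train), 0.1, 175]))
a, estimated_c, estimated_t_0 = popt
y_pred = gompertz(a, estimated_c, x_train + x_test_added, estimated_t_0)

y_pred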

In [None]:
plt.plot(x_train + x_test_added, y_pred, linewidth=2, label='predicted', color='y')
plt.plot(x, y, linewidth=2, color='r', linestyle='dotted', label='confirmed')
plt.title('prediction vs confirmed data on covid-19 cases in US\n')
plt.xlabel('days since January 22 2020')
plt.ylabel('confirmed positive cases')
plt.legend(loc='upper left') 

#### We trained a model to predict Covid-19 cases for the US and plotted the curve. The predicted total is approximately 2 million cases, about 250,000 more than the current count.
#### We can observe how the trend changes over time: the curve is flattening and is expected to flatten completely within the next 100 days.

----------------------------------------------------------------

### Discussion

We were able to create a model that predicts Covid-19 cases using the Gompertz function, and to observe how the trend has changed and is expected to change over time, including how and when the curve flattens. The prediction is approximate and many complex variables are probably involved, but it does give us a reasonable idea of the trend.
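One way to measure how close the fit is would be to score it on the last 15% of days, which `train_test_split` already holds out as `x_test`/`y_test` but which the notebook never evaluates. The sketch below does this with scikit-learn's `mean_squared_error` and `r2_score` on synthetic data (a known Gompertz curve plus noise, not real case counts); the starting guess `p0` and the bounds are our own choices for this example.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def gompertz(a, c, t, t_0):
    """Same parameterization as the notebook: a * exp(-exp(-c * (t - t_0)))."""
    return a * np.exp(-np.exp(-c * (np.asarray(t, dtype=float) - t_0)))

# Synthetic cumulative counts: a known Gompertz curve plus Gaussian noise (illustration only)
rng = np.random.default_rng(0)
days = np.arange(120)
cases = gompertz(1500.0, 0.08, days, 90.0) + rng.normal(0.0, 10.0, days.size)

# Same chronological 85/15 split as the notebook
x_train, x_test, y_train, y_test = train_test_split(
    days, cases, train_size=0.85, test_size=0.15, shuffle=False)

# p0 is our own starting point; the notebook relies on the bounds alone
popt, _ = curve_fit(lambda t, a, c, t_0: gompertz(a, c, t, t_0), x_train, y_train,
                    p0=[2 * y_train.max(), 0.05, x_train.mean()], method='trf',
                    bounds=([1, 0, 0], [10 * y_train.max(), 1, 300]))

# Score the forecast only on days the fit never saw
y_hat = gompertz(popt[0], popt[1], x_test, popt[2])
rmse = np.sqrt(mean_squared_error(y_test, y_hat))
r2 = r2_score(y_test, y_hat)
print(f"held-out RMSE: {rmse:.1f} cases, R^2: {r2:.3f}")
```

On real data, a low held-out error would support the claim that the projected trend is close to what actually happened in the days the model did not see.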

Going further, we can easily create a prediction model for any other country: we simply pass the country's name to `country("country's name")` to extract the data we need for that country, although the `curve_fit` bounds may need to be adjusted for each country, as they were for Nepal and the US.
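The per-country steps could be wrapped in a single function. This is a minimal sketch: `fit_country_table` is a hypothetical name, it expects a DataFrame shaped like the output of `maketable` (columns `days` and `confirmed`), and it leaves the bounds to the caller because the notebook tunes them per country. The demo uses a noise-free synthetic table, not real data.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sklearn.model_selection import train_test_split

def gompertz(a, c, t, t_0):
    """Same parameterization as the notebook: a * exp(-exp(-c * (t - t_0)))."""
    return a * np.exp(-np.exp(-c * (np.asarray(t, dtype=float) - t_0)))

def fit_country_table(table, lower, upper, horizon, p0=None):
    """Fit a Gompertz curve to a maketable()-style DataFrame (columns 'days', 'confirmed').

    lower/upper are the (a, c, t_0) bounds. Returns (popt, days 0..horizon-1, predictions).
    """
    table = table.sort_values('days')
    # Fit on the first 85% of days, as in the notebook
    x_train, _, y_train, _ = train_test_split(
        table['days'].tolist(), table['confirmed'].tolist(),
        train_size=0.85, test_size=0.15, shuffle=False)
    popt, _ = curve_fit(lambda t, a, c, t_0: gompertz(a, c, t, t_0), x_train, y_train,
                        p0=p0, method='trf', bounds=(lower, upper))
    days = np.arange(horizon)
    return popt, days, gompertz(popt[0], popt[1], days, popt[2])

# Demo on a noise-free synthetic table (made-up curve, not real data)
demo = pd.DataFrame({'days': np.arange(100)})
demo['confirmed'] = gompertz(1500.0, 0.1, demo['days'], 60.0)
popt, days, pred = fit_country_table(demo, [1000, 0.05, 40], [3000, 0.2, 90], horizon=200)
print(np.round(popt, 3))
```

With the notebook's helpers, the first argument would be something like `maketable(country('Nepal'))`, with bounds chosen for that country; the bounds in the demo only suit the synthetic curve.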

I plan to continue improving the model by adding a logistic function alongside the Gompertz function and comparing the two side by side.
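A possible starting point for that comparison, sketched under our own assumptions: a logistic function written with the same `(a, c, t, t_0)` argument order as the notebook's `gompertz`, so it can be passed to `curve_fit` the same way (with the arguments reordered so that `t` comes first). The side-by-side fit below uses synthetic Gompertz-shaped data, so it only illustrates the mechanics, not which model suits real case counts.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(a, c, t, t_0):
    """Notebook's Gompertz curve: plateau a, growth rate c, inflection day t_0."""
    return a * np.exp(-np.exp(-c * (np.asarray(t, dtype=float) - t_0)))

def logistic(a, c, t, t_0):
    """Logistic curve with the same (a, c, t, t_0) argument order; symmetric about t_0."""
    return a / (1.0 + np.exp(-c * (np.asarray(t, dtype=float) - t_0)))

# At the inflection day the logistic curve is at a/2, the Gompertz curve at a/e
print(float(logistic(1.0, 0.1, 50, 50)))            # 0.5
print(round(float(gompertz(1.0, 0.1, 50, 50)), 4))  # 0.3679

# Side by side on synthetic Gompertz-shaped data (illustration only)
t = np.arange(120)
y = gompertz(1500.0, 0.08, t, 60.0)
sse = {}
for name, model in [("gompertz", gompertz), ("logistic", logistic)]:
    # curve_fit needs the independent variable first, hence the reordering lambda
    popt, _ = curve_fit(lambda x, a, c, t_0: model(a, c, x, t_0), t, y, p0=[1500, 0.1, 60])
    sse[name] = float(np.sum((model(popt[0], popt[1], t, popt[2]) - y) ** 2))
print(sse)
```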

----------------------------------------------------------------------------------------------