### In this notebook, I will try to forecast Covid-19 cases in India. I am going to use a curve-fitting method to predict the number of cases.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

I will predict two things: cumulative total cases and daily cases.<br/>
* Cumulative Total Cases : To predict total cases, I will try to fit a ***Sigmoid***-like curve to the available time series data  
**Sigmoid**-like Curve : $$ f_{a,b,c,d}(x) =  \frac{a}{1 + be^{-cx+d} }  $$ 
* Daily Cases : To predict daily cases, I will try to fit a ***Gaussian***-like curve to the available time series data  
**Gaussian**-like Curve : $$ g_{a,b,c}(x) =  \frac{a}{c}e^{-\frac{(x-b)^2}{c^2}}  $$ 

In [None]:
def sigmoid(x,a,b,c,d):
    return a/(1+b*np.exp(-c*x+d))

def gaussian(x,a,b,c):
    return (a/c)*np.exp(-(x-b)*(x-b)/(c*c))

* So in case of Sigmoid we need to estimate parameters $a,b,c,d$ by curve fitting  
* and in case of Gaussian we need to estimate parameters $a,b,c$ by curve fitting   
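
One thing worth keeping in mind about the sigmoid above: $b$ and $d$ enter only through the product $be^{d}$, because $be^{-cx+d} = (be^{d})e^{-cx}$. The short check below is a sketch that redefines `sigmoid` so it runs on its own and uses made-up parameter values (not fitted ones). It shows two different $(b,d)$ pairs producing the same curve.

```python
import numpy as np

def sigmoid(x, a, b, c, d):
    return a / (1 + b * np.exp(-c * x + d))

x = np.arange(1, 201)

# Illustrative values only, not fitted to the India data.
a, b, c, d = 280000.0, 2.0, 0.08, 9.0

curve_1 = sigmoid(x, a, b, c, d)
# Fold d into b: same product b*exp(d), but d is now zero.
curve_2 = sigmoid(x, a, b * np.exp(d), c, 0.0)

same_curve = np.allclose(curve_1, curve_2)
print(same_curve)
```

So the fit can determine $a$, $c$ and the product $be^{d}$, but the individual values of $b$ and $d$ it returns are not unique and should not be interpreted on their own.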

Now let us first look at the number of cases as time series data

In [None]:
nld_df = pd.read_csv('/kaggle/input/covid19-corona-virus-india-dataset/nation_level_daily.csv')
yt = nld_df['totalconfirmed'].values
yd = nld_df['dailyconfirmed'].values
X = np.arange(0,yt.shape[0])+1
plt.figure(dpi=150)
plt.subplot(121)
plt.title('Total Cases: India')
plt.bar(X,yt)
plt.xlabel('Days')
plt.ylabel('Cases')
plt.subplot(122)
plt.title('Daily Cases: India')
plt.bar(X,yd)
plt.xlabel('Days')
plt.ylabel('Cases')

So, we will now fit a Sigmoid-like curve to the total cases and a Gaussian-like curve to the daily cases  
We use SciPy's *curve_fit* function to fit the curves to the data plotted above 
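
As a quick illustration of how `curve_fit` is called, here is a minimal sketch on synthetic data. The parameter values and the starting guess are made up for the example. The data is generated from the Gaussian-like curve itself, without noise, so the fit should recover the parameters used to generate it.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, b, c):
    return (a / c) * np.exp(-(x - b) * (x - b) / (c * c))

# Synthetic, noise-free data from known parameters (illustrative values).
true_params = (1000.0, 50.0, 20.0)
x = np.arange(1, 101)
y = gaussian(x, *true_params)

# p0 is the starting guess for (a, b, c). curve_fit returns the best-fit
# parameters and their estimated covariance matrix.
popt, pcov = curve_fit(f=gaussian, xdata=x, ydata=y, p0=[900.0, 45.0, 15.0])

# Largest gap between the fitted curve and the data it was fitted to.
max_abs_error = np.max(np.abs(gaussian(x, *popt) - y))
print(popt, max_abs_error)
```

On real case counts the data is noisy and the starting guess `p0` matters more, so a fit that fails to converge is often a sign that `p0` is far from a sensible scale.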

In [None]:
from scipy.optimize import curve_fit

def cfplot(func,X,y,p0,T,dpi,title):
    # Fit func to the observed data, starting the optimiser from p0
    popt,_ = curve_fit(f=func,xdata=X,ydata=y,p0=p0)
    # Evaluate the fitted curve on days 1..T (this extends past the observed data)
    y_pred = func(np.arange(1,T+1),*popt)
    x_ax1 = np.arange(len(y_pred))
    plt.figure(dpi=dpi)
    plt.plot(x_ax1,y_pred,color='blue',label='Prediction')
    x_ax2 = np.arange(y.shape[0])
    plt.bar(x_ax2,y,width=0.8,color='red',label='Actual')
    plt.xlabel('Days')
    plt.ylabel('Cases')
    ##plt.axvline(x=x_ax2[-1],color='orange')
    plt.legend()
    plt.title(title)
    # Mean squared error over the observed days (assumes T >= len(y))
    loss = int(np.mean((y-y_pred[:y.shape[0]])**2))
    xlim = plt.gca().get_xlim()
    ylim = plt.gca().get_ylim()
    plt.text(0.6*xlim[1],0.5*ylim[1],'MSE : '+str(loss))
    plt.show()
    return popt

## Total Confirmed Cases : India

In [None]:
nld_df = pd.read_csv('/kaggle/input/covid19-corona-virus-india-dataset/nation_level_daily.csv')
y = nld_df['totalconfirmed'].values
X = np.arange(0,y.shape[0])+1
popt = cfplot(sigmoid,X,y,[1,1,1,1],400,125,'India Total Cases')

### As the fitted curve above suggests, total cases in India are predicted to level off around : <span style="color:green">Day 175</span>
### And total confirmed cases are predicted to reach around : <span style="color:red">2,80,000</span>
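
For the sigmoid, the parameter $a$ is the level the curve approaches as $x$ grows (when $b>0$ and $c>0$), so the predicted final total can be read directly from `popt[0]`. The sketch below uses a hypothetical helper, `sigmoid_plateau`, and made-up parameter values rather than the fitted ones. It also solves for the day on which the curve reaches a chosen fraction of that level.

```python
import numpy as np

def sigmoid(x, a, b, c, d):
    return a / (1 + b * np.exp(-c * x + d))

def sigmoid_plateau(popt, fraction=0.99):
    """Return (plateau, day): the asymptote a and the day the curve reaches
    `fraction` of it. Assumes b > 0, c > 0 and 0 < fraction < 1."""
    a, b, c, d = popt
    # Solve a / (1 + b*exp(-c*x + d)) = fraction * a for x.
    day = (d + np.log(b * fraction / (1 - fraction))) / c
    return a, day

# Illustrative parameters only, not the values fitted in this notebook.
example_popt = (280000.0, 1.0, 0.1, 10.0)
plateau, day = sigmoid_plateau(example_popt)
print(plateau, day)
```

Because the curve only approaches $a$ asymptotically, the "day" depends on the fraction chosen, which is why a threshold such as 99% is needed to name a single day.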

## Daily Confirmed Cases : India

In [None]:
y = nld_df['dailyconfirmed'].values
X = np.arange(0,y.shape[0])+1  # days start at 1, matching the grid cfplot predicts on
popt = cfplot(gaussian,X,y,[100,100,100],400,125,'India Daily Cases')

### As the fitted curve above suggests, daily cases in India are predicted to peak at around :  <span style="color:green">Day 170</span>
### And the maximum number of daily confirmed cases is predicted to be around : <span style="color:red">18,000</span>
----
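
For the Gaussian-like curve, the peak day and peak height follow directly from the parameters: the exponent is zero at $x=b$, so the maximum is $g(b)=a/c$. The sketch below uses a hypothetical helper, `gaussian_peak`, and made-up parameter values rather than the fitted ones. It reads both numbers from a parameter tuple and checks them against a brute-force search over days.

```python
import numpy as np

def gaussian(x, a, b, c):
    return (a / c) * np.exp(-(x - b) * (x - b) / (c * c))

def gaussian_peak(popt):
    """Return (peak_day, peak_height) for g(x) = (a/c) * exp(-(x-b)^2 / c^2)."""
    a, b, c = popt
    return b, a / c

# Illustrative parameters only, not the values fitted in this notebook.
example_popt = (5000.0, 60.0, 25.0)
peak_day, peak_height = gaussian_peak(example_popt)

# Brute-force check: evaluate the curve on every day and find the maximum.
days = np.arange(1, 201)
values = gaussian(days, *example_popt)
brute_force_day = days[np.argmax(values)]
print(peak_day, peak_height, brute_force_day)
```

Note that $c$ here is not the usual standard deviation: writing the exponent as $-(x-b)^2/(2\sigma^2)$ gives $\sigma = c/\sqrt{2}$.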

## State Wise Forecast

In [None]:
sld_df = pd.read_csv('/kaggle/input/covid19-corona-virus-india-dataset/complete.csv')
mhd_df = sld_df[sld_df['Name of State / UT']=='Maharashtra']
gujd_df = sld_df[sld_df['Name of State / UT']=='Gujarat']
tnd_df = sld_df[sld_df['Name of State / UT']=='Tamil Nadu']
wbd_df = sld_df[sld_df['Name of State / UT']=='West Bengal']
df_list = [mhd_df,gujd_df,tnd_df,wbd_df]
st_name = ['Maharashtra','Gujarat','Tamil Nadu','West Bengal']
for name,s in zip(st_name,df_list):
    y = np.array(s['Total Confirmed cases'])
    X = np.arange(0,y.shape[0])+1
    popt = cfplot(sigmoid,X,y,[1,1,1,1],120,75,'Total Cases: '+name)