# Time Series Analysis of COVID-19 Cases in Kerala: ARIMA Model

The first case of the COVID-19 pandemic in Kerala, which was also the first confirmed case in all of India, was confirmed in Thrissur on 30 January 2020. On 12 May 2021, Kerala reported its largest single-day spike, with 43,529 new cases. As of 26 July 2021, the state had recorded 32,83,116 confirmed cases, 31,29,628 recoveries (95.32%) and 16,170 deaths. The test positivity rate stood at 10.59% (12.46% cumulative), three times the national average.

In this notebook, I perform **Time Series Analysis** on the [Latest Covid-19 Confirmed Cases Kerala](http://www.kaggle.com/anandhuh/covid19-confirmed-cases-kerala) dataset and use an **ARIMA** model to predict Covid-19 confirmed cases for the next 2 weeks.

## Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# default size for all figures
plt.rcParams['figure.figsize'] = (14, 7)

import warnings
warnings.filterwarnings('ignore')

## Reading dataset

In [None]:
# loading data into dataframe
df = pd.read_csv('../input/covid19-confirmed-cases-kerala/Covid 19 Confirmed Cases-Kerala.csv')

In [None]:
# parse date strings to datetime type
# (infer_datetime_format is deprecated; pandas >= 2.0 infers a consistent format by default)
df['Date'] = pd.to_datetime(df['Date'])

# set Date column as index
df = df.set_index('Date')
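ARIMA treats consecutive rows as equally spaced observations. Before modelling, it can help to confirm that the parsed `Date` index has exactly one row per day. The sketch below assumes the data is meant to be daily. `is_contiguous_daily` is a hypothetical helper, and it is shown on toy dates rather than the Kerala data:

```python
import pandas as pd


def is_contiguous_daily(index: pd.DatetimeIndex) -> bool:
    """Return True if `index` is sorted, duplicate-free and has one entry per day."""
    if len(index) == 0:
        return True
    # the gap-free daily range that the index should match element by element
    expected = pd.date_range(index.min(), index.max(), freq="D")
    return bool(index.equals(expected))


# hypothetical toy dates; on the real data you would pass df.index
toy_full = pd.DatetimeIndex(["2021-07-01", "2021-07-02", "2021-07-03"])
toy_gappy = pd.DatetimeIndex(["2021-07-01", "2021-07-03"])
print(is_contiguous_daily(toy_full))   # True
print(is_contiguous_daily(toy_gappy))  # False
```

If it returns False for `df.index`, `df.asfreq('D')` would expose the missing days as NaN rows.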

In [None]:
# first five rows of dataframe
df.head()

In [None]:
# last 5 rows of dataframe
df.tail()

## Exploratory Data Analysis

In [None]:
# column names
df.columns

In [None]:
# shape of the dataframe
df.shape

In [None]:
# concise summary of dataframe
df.info()

In [None]:
# checking for null values
df.isnull().sum()

In [None]:
# descriptive statistics of data
df.describe()

In [None]:
# plot graph

plt.xlabel('Dates')
plt.ylabel('Confirmed Cases')
plt.title('Date vs Confirmed Cases')
plt.plot(df, color='b')

## Determining Rolling Statistics

In [None]:
# rolling mean
rolmean = df.rolling(window=3).mean()
rolmean.head()

In [None]:
# rolling standard deviation
rolstd = df.rolling(window=3).std()
rolstd.head()
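The sketch below makes the `window=3` statistics concrete, using made-up numbers rather than the Kerala data. Each value is computed from the current row and the two rows before it. That is why the first two rows of `rolmean` and `rolstd` are NaN:

```python
import pandas as pd

# made-up values standing in for the 'Confirmed' column
toy = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# each window covers the current row and the 2 rows before it,
# so the first 2 results are NaN (not enough values yet)
toy_rolmean = toy.rolling(window=3).mean()
toy_rolstd = toy.rolling(window=3).std()  # sample standard deviation (ddof=1)

print(toy_rolmean.tolist())          # [nan, nan, 20.0, 30.0, 40.0]
print(toy_rolstd.round(6).tolist())  # [nan, nan, 10.0, 10.0, 10.0]
```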

In [None]:
# plotting rolling statistics
org = plt.plot(df, color='b', label='Original')
mean = plt.plot(rolmean, color='r', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label='Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)

## Augmented Dickey-Fuller Test

In [None]:
# perform the Augmented Dickey-Fuller (ADF) test
# null hypothesis: the series has a unit root, i.e. it is non-stationary
from statsmodels.tsa.stattools import adfuller

print('Results of Augmented Dickey-Fuller Test')

dftest = adfuller(df.Confirmed, autolag='AIC')

dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)'%key] = value
    
print(dfoutput)

In [None]:
# Wrapping visual and statistical tools in a single function

def test_stationarity(timeseries):
    
    # Determining rolling statistics
    rolmean = timeseries.rolling(window=3).mean()
    rolstd = timeseries.rolling(window=3).std()
    
    # Plot rolling statistics
    org = plt.plot(timeseries, color='b', label='Original')
    mean = plt.plot(rolmean, color='r', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    
    # perform Augmented Dickey-Fuller test
    print('Results of Augmented Dickey-Fuller Test')
    dftest = adfuller(timeseries.Confirmed, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value    
    print(dfoutput)

In [None]:
test_stationarity(df)

- Since the **test statistic is greater than the 5% critical value** (or, equivalently, the p-value > 0.05), we cannot reject the ADF null hypothesis of a unit root, so the data is **not stationary**
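The rule in the bullet above can be written as a tiny helper. This sketch uses a hypothetical `adf_decision` function and made-up input tuples laid out like the output of `adfuller`. Note the direction of the comparison: ADF statistics are usually negative, and the null is rejected only when the statistic falls below the critical value.

```python
def adf_decision(adf_result, level="5%"):
    """Apply the ADF decision rule to a tuple laid out like statsmodels' adfuller output.

    Returns (reject_by_statistic, reject_by_pvalue). Rejecting the null means there is
    no evidence of a unit root, so the series is treated as stationary.
    """
    stat, pvalue, crit = adf_result[0], adf_result[1], adf_result[4]
    alpha = float(level.rstrip("%")) / 100
    reject_by_statistic = stat < crit[level]  # must be MORE negative than the critical value
    reject_by_pvalue = pvalue < alpha
    return reject_by_statistic, reject_by_pvalue


# made-up results for illustration (not output from this notebook)
toy_crit = {"1%": -3.44, "5%": -2.87, "10%": -2.57}
print(adf_decision((-1.20, 0.67, 3, 540, toy_crit)))    # (False, False) -> not stationary
print(adf_decision((-4.50, 0.0002, 3, 540, toy_crit)))  # (True, True) -> stationary
```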

## Converting non-stationary data to stationary data

#### Subtracting the simple moving average

In [None]:
movingAverage = df.rolling(window=3).mean()
df_minus_movingAverage = df - movingAverage
df_minus_movingAverage.head(7)

In [None]:
# dropping nan values
df_minus_movingAverage.dropna(inplace=True)
df_minus_movingAverage.head()
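Subtracting a trailing 3-day mean is a fixed linear filter. The sketch below uses made-up numbers (not the Kerala data). It checks that `x - x.rolling(window=3).mean()` equals `(2*x_t - x_{t-1} - x_{t-2}) / 3`, and that the first two rows are NaN, which is why they are dropped:

```python
import pandas as pd

# made-up levels standing in for df['Confirmed'] (not the Kerala data)
toy_levels = pd.Series([100.0, 130.0, 170.0, 220.0, 260.0])

toy_ma = toy_levels.rolling(window=3).mean()
toy_detrended = toy_levels - toy_ma  # same operation as df - movingAverage

# closed form of the same filter: (2*x_t - x_{t-1} - x_{t-2}) / 3
toy_closed_form = (2 * toy_levels - toy_levels.shift(1) - toy_levels.shift(2)) / 3

print(toy_detrended.dropna().round(4).tolist())  # [36.6667, 46.6667, 43.3333]
```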

In [None]:
# dropping nan values
movingAverage.dropna(inplace=True)
movingAverage

In [None]:
test_stationarity(df_minus_movingAverage)

- Since the **p-value < 0.05**, we reject the null hypothesis of a unit root, so the transformed data is now **stationary**

In [None]:
# selecting ARMA orders (p, q) by information criterion (BIC by default)

from statsmodels.tsa.stattools import arma_order_select_ic
arma_order_select_ic(df_minus_movingAverage)

## ARIMA Model

In [None]:
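# legacy ARIMA API: statsmodels.tsa.arima_model was removed in statsmodels 0.13,
# so this cell needs statsmodels < 0.13 to run as written
from statsmodels.tsa.arima_model import ARIMA

print('Plotting ARIMA Model')
# ARMA(4, 2) on the detrended series (d=0: no extra differencing inside the model)
model = ARIMA(df_minus_movingAverage, order=(4,0,2))
results_ARIMA = model.fit(disp=1)
plt.plot(df_minus_movingAverage, color='b')
plt.plot(results_ARIMA.fittedvalues, color='r')
# RSS between the fitted values and the series the model was actually fitted on
plt.title('RSS: {:1.4f}'.format(sum((results_ARIMA.fittedvalues - df_minus_movingAverage['Confirmed'])**2)))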
plt.show()

### Getting predictions

In [None]:
# to pandas series
pred_ARIMA_diff = pd.Series(results_ARIMA.fittedvalues, copy=True)
pred_ARIMA_diff

In [None]:
#to dataframe

pred_ARIMA = pred_ARIMA_diff.to_frame()
pred_ARIMA.tail()

In [None]:
# converting back to the original scale by adding the moving averages
# (iloc[:, 0] selects the single fitted-value column whatever its name)
model_values = pred_ARIMA.iloc[:, 0] + movingAverage['Confirmed']
model_values.tail()

In [None]:
# plotting fitted model
plt.plot(df, color='b', label='Original')
plt.plot(model_values, color='r', label='Model')
plt.xlabel('Date')
plt.ylabel('Confirmed Cases')
plt.legend(loc='best')
plt.show(block=False)

In [None]:
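# the model was fitted on df_minus_movingAverage, which has 2 fewer rows than df
# (the NaN rows from the 3-day window were dropped), so predict from position 1 up to
# 14 steps past its last position; the plot shows the detrended series, not raw counts
end = len(df_minus_movingAverage) - 1 + 14
results_ARIMA.plot_predict(1, end)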

In [None]:
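# forecast the next 2 weeks; the legacy forecast() returns (forecasts, std errors, conf. intervals)
fc = results_ARIMA.forecast(steps=14)
forecast = fc[0]  # point forecasts, still on the detrended (series minus moving average) scale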
forecast

In [None]:
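# new dataframe with the last 3 rows of the original dataframe
new_df = df.tail(3)
new_df

# convert each forecast back to the original scale: the model predicts
# d_t = x_t - (x_t + x_{t-1} + x_{t-2})/3, so x_t = (3*d_t + x_{t-1} + x_{t-2})/2
for fc in forecast:
    s = new_df.iloc[-2:].sum()
    value = ((3*fc)+s)/2
    # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
    new_df = pd.concat([new_df, value.to_frame().T], ignore_index=True)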
    
# printing first five rows of new dataframe
new_df.head()
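The loop above inverts the moving-average detrending one step at a time. With $d_t$ the modelled value and $x_t$ the level,

$$d_t = x_t - \frac{x_t + x_{t-1} + x_{t-2}}{3} = \frac{2x_t - x_{t-1} - x_{t-2}}{3} \quad\Longrightarrow\quad x_t = \frac{3d_t + x_{t-1} + x_{t-2}}{2}.$$

The sketch below runs a round trip on made-up levels. It uses a hypothetical helper, `invert_ma3_detrending`, to check that this recursion recovers the original values exactly:

```python
import pandas as pd


def invert_ma3_detrending(detrended, history):
    """Rebuild levels x_t from d_t = x_t - (x_t + x_{t-1} + x_{t-2}) / 3.

    detrended: the d_t values in time order.
    history: the two levels observed just before the first d_t, as [x_{t-2}, x_{t-1}].
    """
    levels = list(history)
    for d in detrended:
        # x_t = (3*d_t + x_{t-1} + x_{t-2}) / 2
        levels.append((3 * d + levels[-1] + levels[-2]) / 2)
    return levels[len(history):]


# round trip on made-up levels (not the Kerala data)
toy_levels = pd.Series([100.0, 130.0, 170.0, 220.0, 260.0])
toy_d = (toy_levels - toy_levels.rolling(window=3).mean()).dropna()
rebuilt = invert_ma3_detrending(toy_d.tolist(), history=toy_levels.iloc[:2].tolist())
print([round(v, 6) for v in rebuilt])  # [170.0, 220.0, 260.0]
```

Each rebuilt value feeds into the next. Over a 14-day horizon, errors in the early forecasts therefore carry forward into the later ones.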

In [None]:
# dates for the 2 weeks (14 days) following the last observed date
date = pd.date_range(df.index[-1] + pd.Timedelta(days=1), periods=14)
date

In [None]:
# forecast dataframe: forecast dates plus the 14 rebuilt values (rows 3 onwards of new_df)
forcast_df = pd.DataFrame({'Date': date,
                           'Confirmed': new_df['Confirmed'][3:].to_numpy()})

# set Date column as index (date is already datetime, so no parsing is needed)
forcast_df = forcast_df.set_index('Date')

In [None]:
# Predicted Covid-19 Confirmed Cases for next 2 weeks
forcast_df

In [None]:
# Prediction Plotting

orgi = plt.plot(df, color='b', label='Original')
predi = plt.plot(forcast_df, color='r', label='Predicted')
plt.legend(loc='best')
plt.title('Covid Confirmed Cases Prediction')
plt.show(block=False)

## **Thank You**