# Overview

### **Context**
###### Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - World Health Organization

###### The number of new cases is increasing day by day around the world. This dataset has information from the states and union territories of India at a daily level.

### **I use time series analysis to understand the data better and to answer questions that may arise.**

## So what is a Time Series?

###   1. **A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.**
###   2. **An observed time series can be decomposed into three components:** 
       * the trend (long term direction)
       * the seasonal (systematic, calendar related movements) 
       * the irregular (unsystematic, short term fluctuations).
###   3. **Time series analysis is a set of statistical techniques for analyzing time series data, including trend analysis. Time series data is a series of observations recorded at particular time periods or intervals.** 
       
## **How to do a time series analysis?**

#### Step 1: Visualize the Time Series (it is essential to analyze the trends before building any kind of time series model)
#### Step 2: Stationarize/Decompose the Series
#### Step 3: Find Optimal Parameters
#### Step 4: Build ARIMA Model
#### Step 5: Make Predictions

# Import libraries & Read file 

In [None]:
import os
import itertools
import warnings

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_csv("../input/covid19-in-india/covid_19_india.csv")

# Insights of Data

In [None]:
df.head()

In [None]:
# Drop the empty 'Unnamed: 9' column if it is present (no error if it is absent)
df = df.drop(columns=['Unnamed: 9'], errors='ignore')
df.shape

In [None]:
df.isnull().sum()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
# numeric_only=True skips the text columns (required in pandas >= 2.0)
df.corr(numeric_only=True)

In [None]:
print(df['Deaths'].unique())

In [None]:
# Some 'Deaths' values contain a non-breaking space ('\xa0');
# strip it and convert the whole column to integers
df['Deaths'] = (df['Deaths'].astype(str)
                .str.replace('\xa0', '', regex=False)
                .astype('int64'))
len(df['Deaths'])


In [None]:
print(df['Deaths'].unique())

In [None]:
df['Deaths'] = df['Deaths'].astype('int64')

In [None]:
df['State/UnionTerritory'].unique()

In [None]:
len(df['State/UnionTerritory'].unique())

In [None]:
def drop_star(df):
    # Drop (in place) rows whose 'State/UnionTerritory' value ends with '***'
    mask = df['State/UnionTerritory'].str.endswith('***', na=False)
    df.drop(df.index[mask.to_numpy()], inplace=True)

drop_star(df)
df['State/UnionTerritory'].unique()

In [None]:
len(df['State/UnionTerritory'].unique())

# Data Visualizations

In [None]:
df['Cured'].plot(alpha=0.8)
df['Deaths'].plot(alpha=0.3)
df['Confirmed'].plot(alpha=0.5)
plt.show()

In [None]:
df.groupby('State/UnionTerritory')['Confirmed'].plot()
plt.show()
df.groupby('State/UnionTerritory')['Deaths'].plot()
plt.show()

In [None]:
df['Datetime'] = df['Date']+' '+df['Time']

* Combining the Date and Time columns into one Datetime column makes the time series analysis easier.

In [None]:
l = df.groupby('State/UnionTerritory')
current = l.last()

In [None]:
fig ,ax = plt.subplots(figsize= (12,8))
plt.title('Top 10 Contaminated States')
current = current.sort_values("Confirmed",ascending=False)[:10]
p = sns.barplot(ax=ax,x= current.index,y=current['Confirmed'])
p.set_xticklabels(labels = current.index,rotation=90)
p.set_yticklabels(labels=(p.get_yticks()*1).astype(int))
plt.show()

* ### Maharashtra is the most affected state, followed by Karnataka and Andhra Pradesh with approximately equal numbers of cases.

In [None]:
l = df.groupby('State/UnionTerritory')
current = l.last()
current = current.sort_values("Confirmed",ascending=False)

In [None]:
df['Date'].min(), df['Date'].max()

# Time Series Analysis For RAJASTHAN State


In [None]:
Raj = df.loc[df['State/UnionTerritory'] == 'Rajasthan'].copy()  # copy avoids SettingWithCopyWarning
Raj.head()

In [None]:
Raj.shape

* Checking the data for any null/missing values

In [None]:
Raj.isnull().sum()

In [None]:
Raj.columns

### - Dropping all other columns and keeping only the combined "Date+Time" and the "Confirmed" cases

In [None]:
cols=['Sno', 'Time', 'State/UnionTerritory',
       'ConfirmedIndianNational', 'ConfirmedForeignNational', 'Cured',
       'Deaths']
Raj['Date'] = Raj['Date']+' '+Raj['Time']
Raj.drop(cols, axis=1, inplace=True)
Raj= Raj.sort_values('Date')
Raj.isnull().sum()

In [None]:
Raj.head()

In [None]:
Raj.index

### - The initial index is just the row number, so let's change it to **"Date"**.

In [None]:
Raj = Raj.groupby('Date')['Confirmed'].sum().reset_index()


In [None]:
Raj = Raj.set_index('Date')
Raj.index = pd.to_datetime(Raj.index)
Raj.index

### - Resampling with 'W' groups the data into weekly bins that end on Sunday; `.mean()` then averages the values within each week.

In [None]:
y = Raj['Confirmed'].resample('W').mean()
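A small sketch of what `resample('W').mean()` does, on made-up daily numbers (the dates and values are assumptions for illustration): `'W'` is the pandas alias for weekly bins anchored on Sunday, and each weekly value is the mean of the days in that week, labelled with the week's closing Sunday.

```python
import pandas as pd

# Hypothetical daily counts 1..14 starting on Monday 2020-03-02
daily = pd.Series(range(1, 15),
                  index=pd.date_range('2020-03-02', periods=14, freq='D'))

weekly = daily.resample('W').mean()
print(weekly)
# 2020-03-08 -> 4.0  (mean of 1..7, Mon 2 Mar to Sun 8 Mar)
# 2020-03-15 -> 11.0 (mean of 8..14)
```

Weeks that contain no rows come out as NaN, which is why the notebook forward-fills the weekly series.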

In [None]:
y.index

In [None]:
y = y.ffill()  # fill weeks with no data using the previous week's value
y['2020':]

In [None]:
Raj.plot(figsize=(16, 6))
plt.show()

### - The graph above shows the initial series, with a clearly increasing trend and possible seasonality in the data.

### Now let's plot the decomposition, which shows:
   - the original data
   - the trend in the data
   - the seasonality
   - the residual
     

In [None]:
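# Figure size for the decomposition plot
plt.rcParams['figure.figsize'] = 18, 8
# 'period' replaces the older 'freq' argument, which newer statsmodels releases no longer accept
decomposition = sm.tsa.seasonal_decompose(y, period=20, model='additive')
fig = decomposition.plot()
plt.show()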

### But why do we decompose a time series?
###### When we decompose a time series into components, we usually combine the trend and cycle into a single trend-cycle component (sometimes called the trend for simplicity). Often this is done to help improve understanding of the time series, but it can also be used to improve forecast accuracy.

### Types of decomposition:
   - Multiplicative: The components multiply together to make the time series. If you have an increasing trend, the amplitude of seasonal activity increases. Everything becomes more exaggerated.
   - Additive: In an additive time series, the components add together to make the time series.
   
(Here we used additive.)

In [None]:
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))

(We used SARIMAX.)
### -> Seasonal AutoRegressive Integrated Moving Average
#### One of the methods available in Python to model and predict future points of a time series is SARIMAX, which stands for Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors.

### -> What does an ARIMA model do?
#### ARIMA stands for AutoRegressive Integrated Moving Average. An ARIMA model is a class of statistical models for analyzing and forecasting time series data. It explicitly caters to a suite of standard structures in time series data, and as such provides a simple yet powerful method for making skillful time series forecasts.

### -> How to select a good ARIMA model
#### Rules for identifying ARIMA models: general seasonal models look like ARIMA(0,1,1)x(0,1,1), etc. To identify the order of differencing and whether to include a constant: if the series has positive autocorrelations out to a high number of lags (say, 10 or more), then it probably needs a higher order of differencing.

In [None]:
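best_aic, best_params = float('inf'), None
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit(disp=False)
            print('ARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
            # Keep track of the combination with the lowest AIC
            if results.aic < best_aic:
                best_aic, best_params = results.aic, (param, param_seasonal)
        except Exception:
            # Skip combinations that fail to fit
            continue
print('Lowest AIC: ARIMA{}x{} - AIC:{}'.format(*best_params, best_aic))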

### -> We choose the combination with the lowest AIC from the output above. In this case that is ARIMA(0, 1, 1)x(1, 1, 1, 12) - AIC: 588.9188045652764

In [None]:
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(0, 1, 1),
                                seasonal_order=(1, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()

### -> Plot the one-step-ahead predictions against the training data to check how well the model fits.

In [None]:
pred = results.get_prediction(start=pd.to_datetime('2020-08-02'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2020':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Confirmed Cases')
plt.legend()
plt.show()

### -> Forecast for the next 50 steps (50 weeks, since the series is weekly), with confidence intervals.

In [None]:
pred_uc = results.get_forecast(steps=50)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Confirmed Cases')
plt.legend()
plt.show()

# Thank You !!
