# COVID PREDICTION USING ML

**Machine Learning**: Machine learning is the science of making computers learn and act like humans by feeding them data and information, without being explicitly programmed.

**Types of Machine Learning**:
1. Supervised: the machine learns from labeled data, i.e. inputs paired with known outputs
2. Unsupervised: the machine finds hidden patterns in unlabeled data

**Algorithms**:
An algorithm is the method used to train a model on the training dataset and evaluate it on the testing dataset, e.g. linear regression, random forest and support vector machine (SVM).

In this project we aim to predict the spread of COVID-19 in India and in the rest of Asia using linear and polynomial regression, ARIMA, an artificial neural network (a multilayer perceptron) and support vector machines (SVM), and to compare the algorithms.

## Code: 

***Importing important libraries***

In [1]:
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import numpy as np

***Reading dataset***

In [2]:
covid=pd.read_csv('WHO.csv')


***Extracting India's dataset for COVID19***

In [None]:
india_case=covid[covid["Country"]=="India"].copy()
index=np.arange(len(india_case))
india_case=india_case.set_index(index)
india_case.head()

***Defining a function to plot learning curves***

In [None]:
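import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def plot_learning_curves(model, X, y):
    # Hold out 30% of the data as a fixed validation set
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=10)
    train_errors, val_errors = [], []
    # Refit the model on the first m training samples and record both errors
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))

    # Plot RMSE (square root of MSE) against the number of training samples used
    sizes = np.arange(1, len(X_train))
    plt.plot(sizes, np.sqrt(train_errors), color="#175F90", linewidth=1.8, label="train")
    plt.plot(sizes, np.sqrt(val_errors), color="#5FBA83", linewidth=1.8, label="val")
    plt.legend(loc="upper right", fontsize=12)
    plt.xlabel("Training set size", fontsize=12)
    plt.ylabel("RMSE", fontsize=12)
    plt.show()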

***Defining a function to calculate mean absolute percentage error (MAPE)***

In [None]:
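def MAPE(y,y_pred):
    # Mean absolute percentage error as a fraction; multiply by 100 to get a percentage.
    # Both inputs are flattened so that (n,) and (n,1) arrays are compared element by element.
    y=np.asarray(y,dtype=float).reshape(-1)
    y_pred=np.asarray(y_pred,dtype=float).reshape(-1)
    return np.mean(np.abs(y-y_pred)/y)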
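As a quick check of the formula (mean of |actual - predicted| / actual), the sketch below repeats the same computation on three made-up values; the numbers are illustrative only, not COVID data. The function returns a fraction, which the notebook multiplies by 100 to report a percentage, and it is undefined whenever an actual value is zero. The lowercase name `mape` is used here only to keep the sketch separate from the notebook's `MAPE`.

```python
import numpy as np

def mape(y, y_pred):
    # Same formula as MAPE() above: mean of |actual - predicted| / actual, as a fraction
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y - y_pred) / y)

actual = [100, 200, 400]
predicted = [110, 180, 400]
# Per-point errors: 10/100 = 0.10, 20/200 = 0.10, 0/400 = 0.00
error = mape(actual, predicted)
print(round(error * 100, 2), "%")                 # 6.67 %
print("accuracy:", round((1 - error) * 100, 2))   # accuracy: 93.33
```

The "accuracy" reported throughout the notebook is simply 1 - MAPE expressed as a percentage.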

## India COVID Prediction

We use the seaborn and matplotlib libraries for data visualisation.

In [None]:
import seaborn as sns
import matplotlib
from matplotlib import pyplot as plt

In [None]:
# Plot cumulative confirmed cases against date
sns.set(style="darkgrid", rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="total_cases",data=india_case)
plt.show()

In [None]:
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="Cumulative_deaths",data=india_case)
plt.show()

We are interested in the period when COVID-19 was at its peak, so we restrict the data to the months from May to September 2020.

In [None]:
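# Convert the date strings to datetime so the filter compares real dates rather than text
# (assumes the dates are in a format pd.to_datetime parses unambiguously, e.g. YYYY-MM-DD)
india_case["date"]=pd.to_datetime(india_case["date"])
# Keep 1 May 2020 up to (but not including) 30 September 2020
india_case2=india_case[(india_case["date"]>="2020-05-01") & (india_case["date"]<"2020-09-30")].copy()
india_case2.head()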

We use the scikit-learn library for the machine learning models. A model cannot take date strings as input, so each date is replaced by a numeric day index (0, 1, 2, ...) before training.
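One way to turn dates into a numeric model input is to count the days elapsed since the first date. The notebook itself uses the row position (0, 1, 2, ...) as the day index, which gives the same numbers as long as there is exactly one row per day with no gaps. The sketch below, on made-up dates with one missing day, shows where the two differ; the column names `day_index` and `row_index` are illustrative.

```python
import pandas as pd

# Illustrative data with one missing day (2020-05-03)
df = pd.DataFrame({"date": ["2020-05-01", "2020-05-02", "2020-05-04"],
                   "total_cases": [100, 120, 160]})

# Parse the strings into real datetime values
df["date"] = pd.to_datetime(df["date"])

# Days elapsed since the first date: respects gaps in the calendar
df["day_index"] = (df["date"] - df["date"].min()).dt.days

# Row position, as used in the notebook: ignores gaps
df["row_index"] = range(len(df))

print(df[["date", "day_index", "row_index"]])
```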

In [None]:
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import train_test_split
import datetime as dt

To apply an algorithm, the dataset must be divided into a training set and a testing set. We use scikit-learn's `train_test_split` to hold out 25% of the data for testing and train on the remaining 75%.
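The sketch below shows what `test_size=0.25` does on 100 illustrative points: scikit-learn holds out 25 samples for testing and keeps 75 for training, chosen at random. The `random_state=0` seed is an assumption added here only to make the split repeatable. Because a random split mixes early and late days, a chronological split (train on the first 75% of days, test on the rest) is a common alternative for time series; it is shown for comparison only and is not what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(100)      # day index 0..99
y = x * 10.0            # illustrative target values

# Random split, as in the notebook (seed added for repeatability)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
print(len(x_train), len(x_test))   # 75 25

# Chronological split: train on the earliest 75% of days, test on the rest
cut = int(len(x) * 0.75)
x_train_t, x_test_t = x[:cut], x[cut:]
print(x_train_t.max() < x_test_t.min())   # True
```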


### Linear regression Model

In [None]:
# Replace each date with a numeric day index (0, 1, 2, ...) so it can be used as a model input
india_case2=india_case2.copy()
india_case2['dates']=np.arange(len(india_case2))
india_x=india_case2['dates'].values
india_y=india_case2['total_cases']        # cumulative confirmed cases
india_z=india_case2['Cumulative_deaths']  # cumulative deaths

***Splitting data***

In [None]:
india_x_train,india_x_test,india_y_train,india_y_test=train_test_split(india_x,india_y,test_size=0.25)
india_x_train2,india_x_test2,india_z_train,india_z_test=train_test_split(india_x,india_z,test_size=0.25)

***Training and predicting***

In [None]:
from sklearn.linear_model import LinearRegression
lr=LinearRegression()
# Fit on total cases, then predict the held-out days
lr.fit(np.array(india_x_train).reshape(-1,1),np.array(india_y_train).reshape(-1,1))
lr_india_y_pred=lr.predict(np.array(india_x_test).reshape(-1,1))
# Refit the same estimator on cumulative deaths
lr.fit(np.array(india_x_train2).reshape(-1,1),np.array(india_z_train).reshape(-1,1))
lr_india_z_pred=lr.predict(np.array(india_x_test2).reshape(-1,1))

***Plot***

In [None]:
f, axes = plt.subplots(1, 2)
lr_india_y_pred=np.array(lr_india_y_pred).reshape(-1)
lr_india_z_pred=np.array(lr_india_z_pred).reshape(-1)
# Newer seaborn versions accept x and y only as keyword arguments
sns.scatterplot(x=india_x_test,y=india_y_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[0]).set_title('Total confirmed cases')
sns.lineplot(x=india_x_test,y=lr_india_y_pred,color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=india_x_test2,y=india_z_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[1]).set_title('Cumulative deaths')
sns.lineplot(x=india_x_test2,y=lr_india_z_pred,color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])

***Mean absolute Percentage Error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(india_y_test,lr_india_y_pred)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(india_y_test,lr_india_y_pred))*100," %\n")
print("MAPE of deaths is ",MAPE(india_z_test,lr_india_z_pred)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(india_z_test,lr_india_z_pred))*100," %")

***R square***

The plot shows that the linear model does not fit well: the actual data seems to grow exponentially, while the fitted model is a straight line. The coefficient of determination (R square) quantifies how well the model's predictions match the test data; 100% is a perfect fit.
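For reference, R square = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes it by hand on illustrative numbers and checks it against scikit-learn's `r2_score`. A value of 1 means a perfect fit, and the score can be negative when the model is worse than always predicting the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # 0.25 + 0 + 0.25 + 0 = 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # mean is 6: 9 + 1 + 1 + 9 = 20
r2_manual = 1 - ss_res / ss_tot                  # 1 - 0.5 / 20 = 0.975

print(r2_manual, r2_score(y_true, y_pred))
```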

In [None]:
from sklearn.metrics import r2_score
score_lr_india1=r2_score(india_y_test, lr_india_y_pred)*100
score_lr_india2=r2_score(india_z_test, lr_india_z_pred)*100
print('R square score for total cases=',score_lr_india1)
print('R square score for cumulative deaths=',score_lr_india2)

***Cross validation***

In [None]:
from sklearn.model_selection import cross_val_score
lr_scores_india1 = cross_val_score(lr, np.array(india_x_train).reshape(-1,1), np.array(india_y_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",lr_scores_india1)
print("Mean of cross validation scores for total cases= ",np.mean(lr_scores_india1),'\n')
lr_scores_india2 = cross_val_score(lr, np.array(india_x_train2).reshape(-1,1), np.array(india_z_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",lr_scores_india2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(lr_scores_india2))

***Learning Curve***

In [None]:
plot_learning_curves(lr,np.array(india_x_train).reshape(-1,1),np.array(india_y_train).reshape(-1,1))

### Polynomial Regression

***Adding polynomial feature***

In [None]:
from sklearn.preprocessing import PolynomialFeatures
poly=PolynomialFeatures(degree=4)
pr_x=poly.fit_transform(np.array(india_x_train).reshape(-1,1))
pr_x2=poly.fit_transform(np.array(india_x_train2).reshape(-1,1))
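`PolynomialFeatures(degree=4)` expands a single input x into the columns 1, x, x², x³ and x⁴, so an ordinary `LinearRegression` on these columns fits a degree-4 polynomial. The sketch below shows the expansion for x = 2 and compares a linear and a degree-4 fit on a made-up, exponential-looking curve (not COVID figures). Chaining the two steps with `make_pipeline` is a design choice made here, not in the notebook; it guarantees that test data is transformed the same way as the training data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Feature expansion for a single value x = 2; columns are 1, x, x^2, x^3, x^4
expanded = PolynomialFeatures(degree=4).fit_transform([[2]])
print(expanded)

# Illustrative exponential-looking data
x = np.arange(100).reshape(-1, 1)
y = np.exp(x.ravel() / 20)

linear = LinearRegression().fit(x, y)
poly_model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression()).fit(x, y)

# R^2 on the training data: the degree-4 model contains the linear model as a special case
print("linear R2:", linear.score(x, y))
print("degree-4 R2:", poly_model.score(x, y))
```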

***Training the model and predicting***

In [None]:
Pr=LinearRegression()
Pr.fit(pr_x,np.array(india_y_train).reshape(-1,1))
pr_india1=Pr.predict(poly.fit_transform(np.array(india_x_test).reshape(-1,1)))
pr_pred1=np.array(pr_india1).reshape(-1)

In [None]:
Pr.fit(pr_x2,np.array(india_z_train).reshape(-1,1))
pr_india2=Pr.predict(poly.fit_transform(np.array(india_x_test2).reshape(-1,1)))
pr_pred2=np.array(pr_india2).reshape(-1)

***Plot***

In [None]:
f, axes = plt.subplots(1, 2)
sns.scatterplot(x=india_x_test,y=india_y_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[0]).set_title('Total confirmed cases')
sns.lineplot(x=india_x_test,y=pr_pred1,color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=india_x_test2,y=india_z_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[1]).set_title('Cumulative deaths')
sns.lineplot(x=india_x_test2,y=pr_pred2,color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(india_y_test,pr_pred1)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(india_y_test,pr_pred1))*100," %\n")
print("MAPE of deaths is ",MAPE(india_z_test,pr_pred2)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(india_z_test,pr_pred2))*100," %")

***R square***

In [None]:
score_pr_india1=r2_score(india_y_test, pr_pred1)*100
score_pr_india2=r2_score(india_z_test, pr_pred2)*100
print('R square score for total cases=',score_pr_india1)
print('R square score for cumulative deaths=',score_pr_india2)

***Cross Validation***

In [None]:
pr_scores_india1 = cross_val_score(Pr, pr_x, np.array(india_y_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",pr_scores_india1)
print("Mean of cross validation scores for total cases= ",np.mean(pr_scores_india1),'\n')
pr_scores_india2 = cross_val_score(Pr,pr_x2, np.array(india_z_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",pr_scores_india2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(pr_scores_india2))

***Learning curve***

In [None]:
plot_learning_curves(Pr,pr_x,np.array(india_y_train).reshape(-1,1))


### ARIMA Model

***Performing the Augmented Dickey-Fuller (ADF) test***

In [None]:
#Performing ADF test
from statsmodels.tsa.stattools import adfuller
from numpy import log
adf_data=india_case2['total_cases']
result = adfuller(adf_data)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])

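The null hypothesis of the ADF test is that the series has a unit root, i.e. it is non-stationary. If the p-value in the results above is less than the significance level (0.05), the null hypothesis is rejected and the series can be treated as stationary; otherwise the series must be differenced before fitting an ARIMA model.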

***To find the value of 'd', plotting the autocorrelation function (ACF)***

As a rule of thumb, if more than 10 of the autocorrelations are positive, the series is still non-stationary and needs further differencing.

In [None]:
import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
sns.set(rc={'figure.figsize':(18,15)})
# Original Series
fig, axes = plt.subplots(4, 2)
axes[0, 0].plot(adf_data); axes[0, 0].set_title('Original Series')
plot_acf(adf_data, ax=axes[0, 1])

# 1st Differencing
axes[1, 0].plot(adf_data.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(adf_data.diff().dropna(), ax=axes[1, 1])

# 2nd Differencing
axes[2, 0].plot(adf_data.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(adf_data.diff().diff().dropna(), ax=axes[2, 1])

#3rd Differencing
axes[3, 0].plot(adf_data.diff().diff().diff()); axes[3, 0].set_title('3rd Order Differencing')
plot_acf(adf_data.diff().diff().diff().dropna(), ax=axes[3, 1])

plt.show()

From the results above, the ACF of the 1st-order differenced series still has more than 10 positive autocorrelations, so it needs further differencing. Therefore d = 2.

***To find the value of 'p', plotting the partial autocorrelation function (PACF)***

In [None]:
# PACF plot of 2nd differenced series
sns.set(rc={'figure.figsize':(18,5)})

fig, axes = plt.subplots(1, 2)
axes[0].plot(adf_data.diff().diff()); axes[0].set_title('2nd Differencing')
# Partial autocorrelations lie in [-1, 1]; keep negative lags visible
axes[1].set(ylim=(-1.1,1.1))
plot_pacf(adf_data.diff().diff().dropna(), ax=axes[1])

plt.show()

Only one lag lies well above the significance band in the PACF, therefore p = 1.

***To find the value of 'q', plotting the ACF of the 2nd-order differenced series***

In [None]:
sns.set(rc={'figure.figsize':(18,5)})
fig, axes = plt.subplots(1, 2)
axes[0].plot(adf_data.diff().diff()); axes[0].set_title('2nd Differencing')
axes[1].set(ylim=(-1.1,1.1))  # autocorrelations lie in [-1, 1]
plot_acf(adf_data.diff().diff().dropna(), ax=axes[1])

plt.show()

The model below uses q = 2, so together with p = 1 and d = 2 it is an ARIMA(1,2,2) model.

***Training and forecasting using ARIMA***

In [None]:
# statsmodels.tsa.arima_model has been removed in newer statsmodels releases; use the newer ARIMA class
from statsmodels.tsa.arima.model import ARIMA
sns.set(rc={'figure.figsize':(15,10)})

warnings.filterwarnings('ignore')
#Splitting dataset: first 80% of days for training, last 20% for testing
split=int(len(india_case2)*0.8)
arima_train=india_case2[:split]['total_cases']
arima_test=india_case2[split:]['total_cases']
arima_death_train=india_case2[:split]['Cumulative_deaths']
arima_death_test=india_case2[split:]['Cumulative_deaths']

# ARIMA(1,2,2) model for confirmed cases
confirmed = ARIMA(arima_train.values, order=(1,2,2)).fit()
# Forecast the test period with a 95% confidence interval
conf_pred = confirmed.get_forecast(steps=len(arima_test))
conf_ci = conf_pred.conf_int(alpha=0.05)

# Make as pandas series
conf_fc = pd.Series(conf_pred.predicted_mean, index=arima_test.index)
conf_low = pd.Series(conf_ci[:, 0], index=arima_test.index)
conf_high = pd.Series(conf_ci[:, 1], index=arima_test.index)

# ARIMA(1,2,2) model for deaths
deaths = ARIMA(arima_death_train.values, order=(1,2,2)).fit()
# Forecast the test period with a 95% confidence interval
death_pred = deaths.get_forecast(steps=len(arima_death_test))
death_ci = death_pred.conf_int(alpha=0.05)

# Make as pandas series
death_fc = pd.Series(death_pred.predicted_mean, index=arima_death_test.index)
death_low = pd.Series(death_ci[:, 0], index=arima_death_test.index)
death_high = pd.Series(death_ci[:, 1], index=arima_death_test.index)


# Plot
f, axes = plt.subplots(1, 2)
axes[0].plot(arima_train, label='training')
axes[0].plot(arima_test, label='actual')
axes[0].plot(conf_fc, label='forecast')
axes[0].fill_between(conf_low.index, conf_low, conf_high, 
                 color='k', alpha=.15)
axes[0].set_title('Confirmed cases: actual vs forecast')
axes[0].set_ylabel('Confirmed')
axes[0].legend(loc='upper left', fontsize=8)
axes[1].plot(arima_death_train, label='training')
axes[1].plot(arima_death_test, label='actual')
axes[1].plot(death_fc, label='forecast')
axes[1].fill_between(death_low.index, death_low, death_high, 
                 color='k', alpha=.15)
axes[1].set_ylabel('Deaths')
axes[1].set_title('Deaths: actual vs forecast')
axes[1].legend(loc='upper left', fontsize=8)
plt.show()

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(arima_test,conf_fc)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(arima_test,conf_fc))*100," %\n")
print("MAPE of deaths is ",MAPE(arima_death_test,death_fc)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(arima_death_test,death_fc))*100," %")


### Multilayer Perceptron using keras

***Scaling data and splitting into training and testing***

In [None]:
## For confirmed cases

from sklearn.preprocessing import MinMaxScaler
#Take 10 steps to predict the 11th data point
n_steps = 10
n_features = 1 
#Splitting training and testing data
size_x_Train=len(india_x)-n_steps
mlp_india_y=np.array(india_y).reshape(-1,1)
mlp_train=mlp_india_y[:size_x_Train]
mlp_test=mlp_india_y[size_x_Train:]
#Scaling data
mlp_scaler=MinMaxScaler()
mlp_scaler=mlp_scaler.fit(mlp_train)
mlp_train=mlp_scaler.transform(mlp_train)
mlp_test=mlp_scaler.transform(mlp_test)

***Using TimeseriesGenerator***

In [None]:
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
n_input=n_steps
generator = TimeseriesGenerator(mlp_train,mlp_train,length = n_input,batch_size=1)
print(len(generator))
for i in range(len(generator)-2,len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))
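`TimeseriesGenerator` turns the scaled series into (window, next value) pairs: each input is `n_input` consecutive values and the target is the value that follows. The NumPy sketch below builds the same kind of pairs by hand with a hypothetical helper `make_windows`, using a window of 3 on a toy series so the output is easy to read. It only explains the idea and is not meant to replace the Keras utility.

```python
import numpy as np

def make_windows(series, n_steps):
    # Hypothetical helper: each input is n_steps consecutive values, the target is the next value
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    # Add a trailing feature axis, matching the (samples, n_steps, n_features) layout used above
    return X.reshape(-1, n_steps, 1), y

X, y = make_windows([10, 20, 30, 40, 50, 60], n_steps=3)
print(X.shape)                    # (3, 3, 1)
print(X[0].ravel(), "->", y[0])   # [10. 20. 30.] -> 40.0
```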

***Defining neural network***

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten

model = Sequential()
model.add(Dense(192, activation='relu',input_shape=(n_input,n_features)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mse')

In [None]:
model.summary()

***Preparing a validation set***

In [None]:
val_set = np.append(mlp_train[-1],mlp_test)
val_set=val_set.reshape(n_steps+1,1)
val_set

In [None]:
n_input = n_steps
n_features = 1
validation_gen = TimeseriesGenerator(val_set,val_set,length = n_input,batch_size=1)

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=25,restore_best_weights=True)

# fit the model
model.fit(generator,validation_data=validation_gen,epochs=200,callbacks=[early_stop],steps_per_epoch=10)

In [None]:
pd.DataFrame(model.history.history).plot(title="Loss vs epochs curve")

***Using the last 10 training points as the first test batch***

In [None]:
# list of predictions
mlp_pred = []

# last `n_input` points from training set
test_batch = mlp_train[-n_input:].reshape(1,n_input,n_features)

test_batch.shape

***Forecasting the 10 test days and 50 further unseen days***

In [None]:

# forecast the number of confirmed cases in India for the validation set and the next 50 days

for i in range(n_steps+50):
    batch_pred = model.predict(test_batch)[0]
    mlp_pred.append(batch_pred)
    test_batch = np.append(test_batch[:,1:,:],[[batch_pred]],axis=1)

mlp_pred
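The loop above is a recursive (iterated) multi-step forecast: each prediction is appended to the input window and fed back in to predict the next step, so errors can accumulate over the 60 steps. The sketch below reproduces the same window-shifting logic with a stand-in model, the hypothetical function `toy_model`, which simply continues the most recent step size, so the mechanics can be checked without training a network. `recursive_forecast` is likewise a hypothetical helper.

```python
import numpy as np

def toy_model(window):
    # Hypothetical stand-in for model.predict: continue the most recent step size
    return window[-1] + (window[-1] - window[-2])

def recursive_forecast(history, n_input, horizon, predict):
    # Start from the last n_input observed values
    window = list(history[-n_input:])
    preds = []
    for _ in range(horizon):
        next_value = predict(np.array(window))
        preds.append(next_value)
        # Drop the oldest value and append the new prediction, like np.append(test_batch[:,1:,:], ...)
        window = window[1:] + [next_value]
    return np.array(preds)

print(recursive_forecast([1, 2, 3, 4, 5], n_input=3, horizon=4, predict=toy_model))   # [6 7 8 9]
```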

In [None]:
# apply inverse transformations on scaled data
mlp_pred = mlp_scaler.inverse_transform(mlp_pred)
mlp_pred[:,0]

In [None]:
# Day indices of the 10 test days followed by 50 future days
last_day=size_x_Train
future=np.concatenate([india_x[last_day:], np.arange(len(india_x),len(india_x)+50)])
print(future)

In [None]:
mlp_conf_df = pd.DataFrame(columns=["Confirmed","Confirmed_predicted"],index=future)
mlp_conf_df.loc[:,"Confirmed_predicted"] = mlp_pred[:,0]
# Actual values exist only for the 10 test days; the 50 future days are left empty
mlp_conf_df.loc[:,"Confirmed"] = india_case2.iloc[size_x_Train:]['total_cases'].values.tolist() + [None]*50

***Repeating the same steps for deaths***

In [None]:
## For death cases

mlp_deaths_india=np.array(india_z).reshape(-1,1)
mlp_deaths_train=mlp_deaths_india[:size_x_Train]
mlp_deaths_test=mlp_deaths_india[size_x_Train:]
#Scaling data
mlp_scaler2=MinMaxScaler()
mlp_scaler2=mlp_scaler2.fit(mlp_deaths_train)
mlp_deaths_train=mlp_scaler2.transform(mlp_deaths_train)
mlp_deaths_test=mlp_scaler2.transform(mlp_deaths_test)

In [None]:
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
n_input=n_steps
generator = TimeseriesGenerator(mlp_deaths_train,mlp_deaths_train,length = n_input,batch_size=1)
for i in range(len(generator)-2,len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))

In [None]:
val_set = np.append(mlp_deaths_train[-1],mlp_deaths_test)
val_set=val_set.reshape(n_steps+1,1)

In [None]:
n_input = n_steps
n_features = 1
validation_gen = TimeseriesGenerator(val_set,val_set,length = n_input,batch_size=1)

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=25,restore_best_weights=True)

# fit the model (this continues training the network already fitted on confirmed cases)
model.fit(generator,validation_data=validation_gen,epochs=200,callbacks=[early_stop],steps_per_epoch=10)

In [None]:
# list of predictions
mlp_pred2 = []

# last `n_input` points from training set
test_batch = mlp_deaths_train[-n_input:].reshape(1,n_input,n_features)

test_batch.shape

In [None]:

# forecast the number of deaths in India for the validation set and the next 50 days

for i in range(n_steps+50):
    batch_pred = model.predict(test_batch)[0]
    mlp_pred2.append(batch_pred)
    test_batch = np.append(test_batch[:,1:,:],[[batch_pred]],axis=1)

In [None]:
# apply inverse transformations on scaled data
mlp_pred2 = mlp_scaler2.inverse_transform(mlp_pred2)
mlp_pred2[:,0]

In [None]:
mlp_deaths_df = pd.DataFrame(columns=["Deaths","Deaths_predicted"],index=future)
mlp_deaths_df.loc[:,"Deaths_predicted"] = mlp_pred2[:,0]
# Actual values exist only for the 10 test days; the 50 future days are left empty
mlp_deaths_df.loc[:,"Deaths"] = india_case2.iloc[size_x_Train:]['Cumulative_deaths'].values.tolist() + [None]*50

In [None]:
f, axes = plt.subplots(1, 2)
mlp_conf_df.plot(title="Confirmed Predictions for next 50 days-MLP",ax=axes[0])
mlp_deaths_df.plot(title="Death Predictions for next 50 days-MLP",ax=axes[1])

In [None]:
# Combine training history, test actuals and MLP forecasts on one day index
present=np.append(np.arange(future[0]),future)
mlp = pd.DataFrame(columns=["Confirmed","Confirmed_predicted","Deaths","Deaths_predicted"],index=present)
# Actual values up to and including the first test day (.loc label slicing is inclusive)
mlp.loc[:future[0],"Confirmed"] = india_case2.iloc[:size_x_Train+1]['total_cases'].values.tolist()
mlp.loc[:future[0],"Deaths"] = india_case2.iloc[:size_x_Train+1]['Cumulative_deaths'].values.tolist()
# No predictions exist for the training days
nan_list=[None]*size_x_Train
mlp.loc[:,"Confirmed_predicted"] =nan_list+ mlp_pred[:,0].tolist()
mlp.loc[:,"Deaths_predicted"] =nan_list+ mlp_pred2[:,0].tolist()
mlp

In [None]:
fig, axes = plt.subplots(1,2)
mlp.plot(y='Confirmed',ax=axes[0],linewidth=2).set_title('Time Forecasting')
mlp.plot(y='Confirmed_predicted',linestyle='dashed',ax=axes[0],linewidth=2)
mlp.plot(y='Deaths',ax=axes[1],linewidth=2).set_title('Time Forecasting')
mlp.plot(y='Deaths_predicted',linestyle='dashed',ax=axes[1],linewidth=2)

In [None]:
print("MAPE of confirmed cases is ",MAPE(mlp_conf_df["Confirmed"][:n_steps],mlp_conf_df["Confirmed_predicted"][:n_steps])*100," %")
print("Accuracy for confirmed cases is: ",(1-MAPE(mlp_conf_df["Confirmed"][:n_steps],mlp_conf_df["Confirmed_predicted"][:n_steps]))*100," %\n")
print("MAPE of deaths is ",MAPE(mlp_deaths_df["Deaths"][:n_steps],mlp_deaths_df["Deaths_predicted"][:n_steps])*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(mlp_deaths_df["Deaths"][:n_steps],mlp_deaths_df["Deaths_predicted"][:n_steps]))*100," %")

### Support Vector Machines 

***Importing the SVR model from sklearn SVM module and training the model***

In [None]:
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler
scalerX1 = MinMaxScaler()
scalerY1 = MinMaxScaler()
scalerX2 = MinMaxScaler()
scalerY2 = MinMaxScaler()
svr_x_train1 = scalerX1.fit_transform(np.array(india_x_train).reshape(-1,1))
svr_y_train1 = scalerY1.fit_transform(np.array(india_y_train).reshape(-1,1))
svr_x_test1 = scalerX1.transform(np.array(india_x_test).reshape(-1,1))
svr_y_test1 = scalerY1.transform(np.array(india_y_test).reshape(-1,1))
svr_x_train2 = scalerX2.fit_transform(np.array(india_x_train2).reshape(-1,1))
svr_z_train = scalerY2.fit_transform(np.array(india_z_train).reshape(-1,1))
svr_x_test2 = scalerX2.transform(np.array(india_x_test2).reshape(-1,1))
svr_z_test = scalerY2.transform(np.array(india_z_test).reshape(-1,1))
svm=SVR(kernel="rbf", C=1,epsilon=0.01,gamma='scale')
svm.fit(svr_x_train1,svr_y_train1)
svr_india_pred1=scalerY1.inverse_transform(np.array(svm.predict(svr_x_test1)).reshape(-1,1))
svr_india_pred1=np.array(svr_india_pred1).reshape(-1)
svm.fit(svr_x_train2,svr_z_train)
svr_india_pred2=scalerY2.inverse_transform(np.array(svm.predict(svr_x_test2)).reshape(-1,1))
svr_india_pred2=np.array(svr_india_pred2).reshape(-1)

***Plotting model vs actual***

In [None]:
f, axes = plt.subplots(1, 2)
sns.scatterplot(x=india_x_test,y=india_y_test,color='#A7A7A7',s=75,label="Actual",ax=axes[0])
sns.lineplot(x=india_x_test,y=np.array(svr_india_pred1).reshape(-1),color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=india_x_test2,y=india_z_test,color='#A7A7A7',s=75,label="Actual",ax=axes[1])
sns.lineplot(x=india_x_test2,y=np.array(svr_india_pred2).reshape(-1),color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])

***R square***

In [None]:
print("R square score for total cases",r2_score(india_y_test, svr_india_pred1)*100)
print("R square score for cumulative deaths",r2_score(india_z_test, svr_india_pred2)*100)

***Cross Validation***

In [None]:
svr_scores1 = cross_val_score(svm, svr_x_train1, svr_y_train1.ravel(), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",svr_scores1)
print("Mean of cross validation scores for total cases= ",np.mean(svr_scores1),'\n')
# Cross-validate on the scaled deaths training data
svr_scores2 = cross_val_score(svm, svr_x_train2, svr_z_train.ravel(), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",svr_scores2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(svr_scores2))

***Learning curve***

In [None]:
plot_learning_curves(svm,np.array(svr_x_train1).reshape(-1,1),np.array(svr_y_train1).reshape(-1,1))

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(india_y_test,svr_india_pred1)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(india_y_test,svr_india_pred1))*100,"\n")
print("MAPE of deaths is ",MAPE(india_z_test,svr_india_pred2)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(india_z_test,svr_india_pred2))*100)

### Asian countries (except India) Prediction

In [None]:
# Country names must match the spelling used in the dataset's 'Country' column
asia=['China','India','Hong Kong','Indonesia','Pakistan','Japan','Philippines','Vietnam','Turkey','Iran','Thailand','Myanmar','South Korea','Iraq','Afghanistan','Saudi Arabia','Malaysia','Yemen','Nepal','Sri Lanka','Kazakhstan','Syria','Jordan','Israel','Singapore','Laos','Lebanon','Oman','United Arab Emirates','Kuwait','Georgia','Mongolia','Armenia','Qatar','Russia','Bahrain','Timor-Leste','Cyprus','Bhutan','Maldives','Brunei','Taiwan']
asia_case=covid[covid['Country'].isin(asia)]

In [None]:
#Select all countries except India 
except_india_asia=asia_case[asia_case['Country']!='India']
from sklearn.model_selection import train_test_split
import datetime as dt
#except_india_asia['date']=pd.to_datetime(except_india_asia['date'])

In [None]:
import warnings
warnings.filterwarnings('ignore')
#except_india_asia['date']=except_india_asia['date'].map(dt.datetime.toordinal)

Our dataset contains countries from every continent. We need, for each date, the total number of cases and deaths across the selected Asian countries (excluding India). `DataFrame.groupby` achieves this: it works like GROUP BY in SQL, and `.sum()` adds up the values of all countries reported on the same date.
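A small illustration with made-up numbers: three countries reporting on two dates collapse into one row per date, with cases and deaths summed across countries. Passing `numeric_only=True` is an assumption added here so that the text column `Country` is dropped rather than concatenated. The equivalent SQL would be `SELECT date, SUM(total_cases), SUM(Cumulative_deaths) FROM covid GROUP BY date`.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-05-01", "2020-05-01", "2020-05-01", "2020-05-02", "2020-05-02", "2020-05-02"],
    "Country": ["China", "Japan", "Nepal", "China", "Japan", "Nepal"],
    "total_cases": [100, 50, 5, 110, 60, 7],
    "Cumulative_deaths": [4, 2, 0, 5, 2, 0],
})

# One row per date; numeric columns are summed over all countries on that date
grouped = df.groupby(["date"], as_index=False, sort=False).sum(numeric_only=True)
print(grouped)
```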

In [None]:
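# Sum cases and deaths over all countries for each date; text columns such as 'Country' are dropped
grouped_data=except_india_asia.groupby(['date'],as_index=False,sort=False).sum(numeric_only=True)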
grouped_data

***Data Visualisation***

In [None]:
# Plot cumulative confirmed cases against date
sns.lineplot(x="date",y="total_cases",data=grouped_data)
plt.show()

In [None]:
# Plot cumulative deaths against date
sns.set(rc={'figure.figsize':(15,10)})
sns.lineplot(x="date",y="Cumulative_deaths",data=grouped_data)
plt.show()

In [None]:
# Replace each date with a numeric day index (0, 1, 2, ...)
grouped_data['datess']=np.arange(len(grouped_data))
asiaE_x=grouped_data['datess'].values
asiaE_y=grouped_data['total_cases']
asiaE_z=grouped_data['Cumulative_deaths']
asiaE_x_train,asiaE_x_test,asiaE_y_train,asiaE_y_test=train_test_split(asiaE_x,asiaE_y,test_size=0.25)
asiaE_x_train2,asiaE_x_test2,asiaE_z_train,asiaE_z_test=train_test_split(asiaE_x,asiaE_z,test_size=0.25)

***Fitting the linear model on training dataset and predicting***

In [None]:
lr=LinearRegression()
lr.fit(np.array(asiaE_x_train).reshape(-1,1),np.array(asiaE_y_train).reshape(-1,1))
lr_asiaE_pred1=lr.predict(np.array(asiaE_x_test).reshape(-1,1))
lr.fit(np.array(asiaE_x_train2).reshape(-1,1),np.array(asiaE_z_train).reshape(-1,1))
lr_asiaE_pred2=lr.predict(np.array(asiaE_x_test2).reshape(-1,1))

***Plot - Asia(except India)***

In [None]:
lr_asiaE_pred1=np.array(lr_asiaE_pred1).reshape(-1)
lr_asiaE_pred2=np.array(lr_asiaE_pred2).reshape(-1)
f, axes = plt.subplots(1, 2)
sns.scatterplot(x=asiaE_x_test,y=asiaE_y_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[0])
sns.lineplot(x=asiaE_x_test,y=lr_asiaE_pred1,color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=asiaE_x_test2,y=asiaE_z_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[1])
sns.lineplot(x=asiaE_x_test2,y=lr_asiaE_pred2,color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])
plt.show()

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(asiaE_y_test,lr_asiaE_pred1)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(asiaE_y_test,lr_asiaE_pred1))*100,"\n")
print("MAPE of deaths is ",MAPE(asiaE_z_test,lr_asiaE_pred2)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(asiaE_z_test,lr_asiaE_pred2))*100)

***Calculating R square***

In [None]:
score_lr_asiaE1=r2_score(asiaE_y_test, lr_asiaE_pred1)*100
score_lr_asiaE2=r2_score(asiaE_z_test, lr_asiaE_pred2)*100
print('R square score for total cases=',score_lr_asiaE1)
print('R square score for cumulative deaths=',score_lr_asiaE2)

***Cross Validation***

In [None]:
lr_scores_asiaE1 = cross_val_score(lr, np.array(asiaE_x_train).reshape(-1,1), np.array(asiaE_y_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",lr_scores_asiaE1)
print("Mean of cross validation scores for total cases= ",np.mean(lr_scores_asiaE1),'\n')
lr_scores_asiaE2 = cross_val_score(lr, np.array(asiaE_x_train2).reshape(-1,1), np.array(asiaE_z_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",lr_scores_asiaE2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(lr_scores_asiaE2))

***Learning curve***

In [None]:
plot_learning_curves(lr,np.array(asiaE_x_train).reshape(-1,1),np.array(asiaE_y_train).reshape(-1,1))

### Polynomial Regression

***Adding extra feature***

In [None]:
# Reuses the degree-4 `poly` transformer defined in the India section
pr_x_asia=poly.fit_transform(np.array(asiaE_x_train).reshape(-1,1))
pr_x_asia2=poly.fit_transform(np.array(asiaE_x_train2).reshape(-1,1))

***Training the model***

In [None]:
Pr=LinearRegression()
Pr.fit(pr_x_asia,np.array(asiaE_y_train).reshape(-1,1))
pr_asiaE_pred1=np.array(Pr.predict(poly.fit_transform(np.array(asiaE_x_test).reshape(-1,1)))).reshape(-1)
Pr.fit(pr_x_asia2,np.array(asiaE_z_train).reshape(-1,1))
pr_asiaE_pred2=np.array(Pr.predict(poly.fit_transform(np.array(asiaE_x_test2).reshape(-1,1)))).reshape(-1)

***Plot***

In [None]:
f, axes = plt.subplots(1, 2)
sns.scatterplot(x=asiaE_x_test,y=asiaE_y_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[0])
sns.lineplot(x=asiaE_x_test,y=pr_asiaE_pred1,color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=asiaE_x_test2,y=asiaE_z_test,color='#A7A7A7',s=75,legend="brief",label="Actual",ax=axes[1])
sns.lineplot(x=asiaE_x_test2,y=pr_asiaE_pred2,color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])
plt.show()

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(asiaE_y_test,pr_asiaE_pred1)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(asiaE_y_test,pr_asiaE_pred1))*100,"\n")
print("MAPE of deaths is ",MAPE(asiaE_z_test,pr_asiaE_pred2)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(asiaE_z_test,pr_asiaE_pred2))*100)

***R square***

In [None]:
score_pr_asiaE1=r2_score(asiaE_y_test, pr_asiaE_pred1)*100
score_pr_asiaE2=r2_score(asiaE_z_test, pr_asiaE_pred2)*100
print('R square score for total cases=',score_pr_asiaE1)
print('R square score for cumulative deaths=',score_pr_asiaE2)

***Cross Validation***

In [None]:
pr_scores_asiaE1 = cross_val_score(Pr, pr_x_asia, np.array(asiaE_y_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",pr_scores_asiaE1)
print("Mean of cross validation scores for total cases= ",np.mean(pr_scores_asiaE1),"\n")
pr_scores_asiaE2 = cross_val_score(Pr, pr_x_asia2, np.array(asiaE_z_train).reshape(-1,1), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",pr_scores_asiaE2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(pr_scores_asiaE2))

***Learning Curve***

In [None]:
plot_learning_curves(Pr,pr_x_asia,np.array(asiaE_y_train).reshape(-1,1))


### ARIMA Model 

In [None]:
#Performing ADF test
from statsmodels.tsa.stattools import adfuller
from numpy import log
adf_data=asiaE_y
result = adfuller(adf_data)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])

In [None]:
import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
sns.set(rc={'figure.figsize':(18,15)})
# Original Series
fig, axes = plt.subplots(4, 2)
axes[0, 0].plot(adf_data); axes[0, 0].set_title('Original Series')
plot_acf(adf_data, ax=axes[0, 1])

# 1st Differencing
axes[1, 0].plot(adf_data.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(adf_data.diff().dropna(), ax=axes[1, 1])

# 2nd Differencing
axes[2, 0].plot(adf_data.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(adf_data.diff().diff().dropna(), ax=axes[2, 1])

#3rd Differencing
axes[3, 0].plot(adf_data.diff().diff().diff()); axes[3, 0].set_title('3rd Order Differencing')
plot_acf(adf_data.diff().diff().diff().dropna(), ax=axes[3, 1])

plt.show()
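
Each `.diff()` subtracts the previous value, so differencing d times removes a polynomial trend of degree d. The sketch shows this on a hypothetical quadratic series using `np.diff`, which gives the same values as `Series.diff()` without the leading NaNs: one difference still leaves a linear trend, two leave a constant. This is the reasoning behind d = 2 for the confirmed-case ARIMA model further down.

```python
import numpy as np

t = np.arange(10)
# Hypothetical series with a quadratic trend, like accelerating cumulative counts
series = 3.0 * t**2 + 5.0 * t + 7.0

first = np.diff(series)        # 6t + 2 for t = 1..9: still trending (linear)
second = np.diff(series, n=2)  # constant 6: trend removed
print(first)
print(second)
```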

In [None]:
# PACF plot of 2nd differenced series
sns.set(rc={'figure.figsize':(18,5)})

fig, axes = plt.subplots(1, 2)
axes[0].plot(adf_data.diff().diff()); axes[0].set_title('2nd Differencing')
axes[1].set(ylim=(0,5))
plot_pacf(adf_data.diff().diff().dropna(), ax=axes[1])

plt.show()

In [None]:
plt.rcParams.update({'figure.figsize':(18,5), 'figure.dpi':100})

fig, axes = plt.subplots(1, 2)
axes[0].plot(adf_data.diff().diff()); axes[0].set_title('2nd Differencing')
axes[1].set(ylim=(0,1.2))
plot_acf(adf_data.diff().diff().dropna(), ax=axes[1])

plt.show()
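
The ACF plots show the sample autocorrelation at each lag. Below is a minimal NumPy sketch of the standard (non-adjusted) estimator, r_k = Σ(x_t - x̄)(x_{t+k} - x̄) / Σ(x_t - x̄)², using a hypothetical helper `sample_acf`; `plot_acf` additionally draws confidence bands around these values.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0 .. r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([d[: len(d) - k] @ d[k:] / denom for k in range(nlags + 1)])

# An alternating series is strongly negatively correlated at lag 1
x = np.array([1.0, -1.0] * 50)
print(sample_acf(x, 2))  # r_0 = 1, r_1 = -0.99, r_2 = 0.98
```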

In [None]:
import warnings
# statsmodels.tsa.arima_model.ARIMA was deprecated and later removed; use the newer ARIMA class
from statsmodels.tsa.arima.model import ARIMA
sns.set(rc={'figure.figsize':(15,10)})

warnings.filterwarnings('ignore')
# Splitting dataset chronologically: first 80% for training, last 20% for testing
split=int(len(grouped_data)*0.8)
arima_train=grouped_data[:split]['total_cases']
arima_test=grouped_data[split:]['total_cases']
arima_death_train=grouped_data[:split]['Cumulative_deaths']
arima_death_test=grouped_data[split:]['Cumulative_deaths']

# ARIMA(1,2,2) model for confirmed cases
confirmed = ARIMA(arima_train, order=(1,2,2)).fit()
# Forecast over the test period with a 95% confidence interval
conf_pred = confirmed.get_forecast(steps=len(arima_test))
conf_ci = np.asarray(conf_pred.conf_int(alpha=0.05))

# Make as pandas series aligned with the test index
conf_fc = pd.Series(np.asarray(conf_pred.predicted_mean), index=arima_test.index)
conf_low = pd.Series(conf_ci[:, 0], index=arima_test.index)
conf_high = pd.Series(conf_ci[:, 1], index=arima_test.index)

# ARIMA(1,1,1) model for deaths
deaths = ARIMA(arima_death_train, order=(1,1,1)).fit()
# Forecast over the test period with a 95% confidence interval
death_pred = deaths.get_forecast(steps=len(arima_death_test))
death_ci = np.asarray(death_pred.conf_int(alpha=0.05))

# Make as pandas series aligned with the test index
death_fc = pd.Series(np.asarray(death_pred.predicted_mean), index=arima_death_test.index)
death_low = pd.Series(death_ci[:, 0], index=arima_death_test.index)
death_high = pd.Series(death_ci[:, 1], index=arima_death_test.index)


# Plot
f, axes = plt.subplots(1, 2)
axes[0].plot(arima_train, label='training')
axes[0].plot(arima_test, label='actual')
axes[0].plot(conf_fc, label='forecast')
axes[0].fill_between(conf_low.index, conf_low, conf_high, 
                 color='k', alpha=.15)
axes[0].set_title('Confirmed cases: actual vs forecast')
axes[0].set_ylabel('Confirmed')
axes[0].legend(loc='upper left', fontsize=8)
axes[1].plot(arima_death_train, label='training')
axes[1].plot(arima_death_test, label='actual')
axes[1].plot(death_fc, label='forecast')
axes[1].fill_between(death_low.index, death_low, death_high, 
                 color='k', alpha=.15)
axes[1].set_ylabel('Deaths')
axes[1].set_title('Deaths: actual vs forecast')
axes[1].legend(loc='upper left', fontsize=8)
plt.show()
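
With d = 2, the ARIMA model for confirmed cases works on second differences, and its forecasts are turned back into cumulative counts by undoing the differencing. statsmodels does this internally; the sketch below, with a hypothetical helper `undifference_twice`, only shows the arithmetic y_t = 2·y_{t-1} - y_{t-2} + Δ²y_t. It also shows that when the forecast second differences are constant, the level forecast extrapolates a quadratic trend.

```python
import numpy as np

def undifference_twice(history, d2_forecast):
    """Rebuild levels from forecast second differences: y_t = 2*y_{t-1} - y_{t-2} + d2_t."""
    levels = list(history[-2:])  # the last two observed levels seed the recursion
    for d2 in d2_forecast:
        levels.append(2 * levels[-1] - levels[-2] + d2)
    return np.array(levels[2:])

t = np.arange(10)
history = t**2  # hypothetical series whose second difference is always 2
print(undifference_twice(history, [2, 2, 2]))  # next values of t**2: 100, 121, 144
```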

In [None]:
print("MAPE of confirmed cases is ",MAPE(arima_test,conf_fc)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(arima_test,conf_fc))*100,"\n")
print("MAPE of deaths is ",MAPE(arima_death_test,death_fc)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(arima_death_test,death_fc))*100)

### Multilayer Perceptron

In [None]:
## For confirmed cases

from sklearn.preprocessing import MinMaxScaler
#Take 10 steps to predict the 11th data point
n_steps = 10
n_features = 1 
#Splitting training and testing data
size_x_Train=len(asiaE_x)-n_steps
mlp_asia_y=np.array(asiaE_y).reshape(-1,1)
mlp_train=mlp_asia_y[:size_x_Train]
mlp_test=mlp_asia_y[size_x_Train:]
#Scaling data
mlp_scaler=MinMaxScaler()
mlp_scaler=mlp_scaler.fit(mlp_train)
mlp_train=mlp_scaler.transform(mlp_train)
mlp_test=mlp_scaler.transform(mlp_test)
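
`MinMaxScaler` maps each column to [0, 1] with (x - min) / (max - min), using the minimum and maximum of the data it was fitted on. Because the scaler is fitted on the training window only and cumulative counts keep growing, the test values end up above 1, so the network is asked to predict outside the range it was trained on. A NumPy sketch of the transform and its inverse, with made-up numbers and hypothetical helper names:

```python
import numpy as np

def minmax_transform(x, lo, hi):
    return (x - lo) / (hi - lo)

def minmax_inverse(z, lo, hi):
    return z * (hi - lo) + lo

train = np.array([10.0, 20.0, 30.0, 50.0])  # hypothetical cumulative counts
test = np.array([60.0, 70.0])               # later, larger values
lo, hi = train.min(), train.max()           # fitted on training data only

z_test = minmax_transform(test, lo, hi)
print(z_test)  # 1.25 and 1.5: both above the [0, 1] training range
print(minmax_inverse(z_test, lo, hi))
```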

In [None]:
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
n_input=n_steps
generator = TimeseriesGenerator(mlp_train,mlp_train,length = n_input,batch_size=1)
print(len(generator))
for i in range(len(generator)-2,len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))
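
`TimeseriesGenerator(data, targets, length=n_input)` pairs each window of `n_input` consecutive values with the value that follows it. The NumPy sketch below, using a hypothetical helper `make_windows`, builds the same pairs for a 1-D series. With 2-D data of shape `(n, 1)`, as in the cell above, the generator also keeps the feature axis, and with `batch_size=1` it yields one pair per batch.

```python
import numpy as np

def make_windows(series, length):
    """Pair each window series[i:i+length] with the next value series[i+length]."""
    series = np.asarray(series)
    X = np.stack([series[i:i + length] for i in range(len(series) - length)])
    y = series[length:]
    return X, y

X, y = make_windows(np.arange(15), length=10)
print(X.shape, y.shape)  # 5 windows of 10 values, 5 targets
print(X[0], "=>", y[0])  # first window 0..9 predicts 10
```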

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Dense(192, activation='relu',input_shape=(n_input,n_features)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mse')
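
Because `Dense` acts on the last axis, the first three layers are applied to each of the 10 time steps separately, and `Flatten` then turns the `(10, 64)` output into 640 features for the final layer. The arithmetic below counts trainable parameters under that reading (weights plus biases per layer); `model.summary()` should report the same total if the layers match the cell above. `dense_params` is a hypothetical helper.

```python
def dense_params(in_dim, units):
    """Weights plus biases of a Dense layer acting on the last axis."""
    return in_dim * units + units

n_input, n_features = 10, 1
layers = [
    ("Dense(192)", dense_params(n_features, 192)),  # applied to each time step
    ("Dense(64)", dense_params(192, 64)),
    ("Dense(64)", dense_params(64, 64)),
    ("Flatten", 0),                                  # (10, 64) -> 640, no weights
    ("Dense(1)", dense_params(n_input * 64, 1)),
]
for name, count in layers:
    print(f"{name:<11}{count}")
total = sum(count for _, count in layers)
print("Total:", total)
```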

In [None]:
val_set = np.append(mlp_train[-1],mlp_test)
val_set=val_set.reshape(n_steps+1,1)
val_set

In [None]:
n_input = n_steps
n_features = 1
validation_gen = TimeseriesGenerator(val_set,val_set,length = n_input,batch_size=1)

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=25,restore_best_weights=True)

# fit the model
model.fit(generator,validation_data=validation_gen,epochs=200,callbacks=[early_stop],steps_per_epoch=10)

In [None]:
pd.DataFrame(model.history.history).plot(title="Loss vs epochs curve")

In [None]:
# list of predictions
mlp_pred = []

# last `n_input` points from training set
test_batch = mlp_train[-n_input:].reshape(1,n_input,n_features)

test_batch.shape

In [None]:
# forecast confirmed cases for the validation window (n_steps days) and the next 50 days

for i in range(n_steps+50):
    batch_pred = model.predict(test_batch)[0]
    mlp_pred.append(batch_pred)
    test_batch = np.append(test_batch[:,1:,:],[[batch_pred]],axis=1)

mlp_pred
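
The loop above is a recursive multi-step forecast: each prediction is appended to the input window and the oldest value is dropped, so errors compound over the 60 predicted steps (10 validation days plus 50 future days). Below is a framework-free sketch of the same loop, in which a hypothetical linear-extrapolation function stands in for `model.predict`.

```python
import numpy as np

def recursive_forecast(predict_next, window, steps):
    """Feed each prediction back into the input window to forecast several steps ahead."""
    window = list(window)
    preds = []
    for _ in range(steps):
        nxt = float(predict_next(np.array(window)))
        preds.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the prediction
    return preds

def extrapolate(window):
    """Hypothetical stand-in for model.predict: continue the last step linearly."""
    return 2 * window[-1] - window[-2]

print(recursive_forecast(extrapolate, [1, 2, 3], steps=3))
```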

In [None]:
# apply inverse transformations on scaled data
mlp_pred = mlp_scaler.inverse_transform(mlp_pred)
mlp_pred[:,0]

In [None]:
last_day=size_x_Train
future=asiaE_x[last_day:]
for i in range(len(asiaE_x),len(asiaE_x)+50):
    future=np.append(future,i)
print(future)

In [None]:
mlp_conf_df = pd.DataFrame(columns=["Confirmed","Confirmed_predicted"],index=future)
mlp_conf_df.loc[:,"Confirmed_predicted"] = mlp_pred[:,0]
test_set=grouped_data.iloc[size_x_Train:]['total_cases'].values.tolist()
# Actual values exist only for the n_steps test days; the 50 future days stay empty (None)
mlp_conf_df.loc[:,"Confirmed"] = test_set + [None]*50

In [None]:
## For death cases

mlp_deaths_asia=np.array(asiaE_z).reshape(-1,1)
mlp_deaths_train=mlp_deaths_asia[:size_x_Train]
mlp_deaths_test=mlp_deaths_asia[size_x_Train:]
#Scaling data
mlp_scaler2=MinMaxScaler()
mlp_scaler2=mlp_scaler2.fit(mlp_deaths_train)
mlp_deaths_train=mlp_scaler2.transform(mlp_deaths_train)
mlp_deaths_test=mlp_scaler2.transform(mlp_deaths_test)

In [None]:
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
n_input=n_steps
generator = TimeseriesGenerator(mlp_deaths_train,mlp_deaths_train,length = n_input,batch_size=1)
for i in range(len(generator)-2,len(generator)):
    x, y = generator[i]
    print('%s => %s' % (x, y))

In [None]:
val_set = np.append(mlp_deaths_train[-1],mlp_deaths_test)
val_set=val_set.reshape(n_steps+1,1)

In [None]:
n_input = n_steps
n_features = 1
validation_gen = TimeseriesGenerator(val_set,val_set,length = n_input,batch_size=1)

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss',patience=25,restore_best_weights=True)

# fit the model (this continues training the same `model` already fitted on confirmed cases)
model.fit(generator,validation_data=validation_gen,epochs=200,callbacks=[early_stop],steps_per_epoch=10)

In [None]:
# list of predictions
mlp_pred2 = []

# last `n_input` points from training set
test_batch = mlp_deaths_train[-n_input:].reshape(1,n_input,n_features)

test_batch.shape

In [None]:

# forecast deaths for the validation window (n_steps days) and the next 50 days

for i in range(n_steps+50):
    batch_pred = model.predict(test_batch)[0]
    mlp_pred2.append(batch_pred)
    test_batch = np.append(test_batch[:,1:,:],[[batch_pred]],axis=1)

In [None]:
# apply inverse transformations on scaled data
mlp_pred2 = mlp_scaler2.inverse_transform(mlp_pred2)
mlp_pred2[:,0]

In [None]:
mlp_deaths_df = pd.DataFrame(columns=["Deaths","Deaths_predicted"],index=future)
mlp_deaths_df.loc[:,"Deaths_predicted"] = mlp_pred2[:,0]
test_set=grouped_data.iloc[size_x_Train:]['Cumulative_deaths'].values.tolist()
# Actual values exist only for the n_steps test days; the 50 future days stay empty (None)
mlp_deaths_df.loc[:,"Deaths"] = test_set + [None]*50

In [None]:
f, axes = plt.subplots(1, 2)
mlp_conf_df.plot(title="Confirmed Predictions for next 50 days-MLP",ax=axes[0])
mlp_deaths_df.plot(title="Death Predictions for next 50 days-MLP",ax=axes[1])

In [None]:
present=np.append(np.arange(future[0]),future)
mlp = pd.DataFrame(columns=["Confirmed","Confirmed_predicted","Deaths","Deaths_predicted"],index=present)
mlp.loc[:future[0],"Confirmed"] = grouped_data.iloc[:size_x_Train+1]['total_cases'].values.tolist()
mlp.loc[:future[0],"Deaths"] = grouped_data.iloc[:size_x_Train+1]['Cumulative_deaths'].values.tolist()
# Pad the predictions with None for every day before the forecast window starts
nan_list = [None] * (len(present) - len(mlp_pred))
mlp.loc[:,"Confirmed_predicted"] = nan_list + mlp_pred[:,0].tolist()
mlp.loc[:,"Deaths_predicted"] = nan_list + mlp_pred2[:,0].tolist()
mlp

In [None]:
fig, axes = plt.subplots(1,2)
mlp.plot(y='Confirmed',ax=axes[0],linewidth=2).set_title('Confirmed cases: time forecasting')
mlp.plot(y='Confirmed_predicted',linestyle='dashed',ax=axes[0],linewidth=2)
mlp.plot(y='Deaths',ax=axes[1],linewidth=2).set_title('Deaths: time forecasting')
mlp.plot(y='Deaths_predicted',linestyle='dashed',ax=axes[1],linewidth=2)

In [None]:
print("MAPE of confirmed cases is ",MAPE(mlp_conf_df["Confirmed"].iloc[:n_steps],mlp_conf_df["Confirmed_predicted"].iloc[:n_steps])*100," %")
print("Accuracy for confirmed cases is: ",(1-MAPE(mlp_conf_df["Confirmed"].iloc[:n_steps],mlp_conf_df["Confirmed_predicted"].iloc[:n_steps]))*100,"\n")
print("MAPE of deaths is ",MAPE(mlp_deaths_df["Deaths"].iloc[:n_steps],mlp_deaths_df["Deaths_predicted"].iloc[:n_steps])*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(mlp_deaths_df["Deaths"].iloc[:n_steps],mlp_deaths_df["Deaths_predicted"].iloc[:n_steps]))*100)

### Support Vector Machine

***Training SVR and predicting***

In [None]:
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler
scalerX1 = MinMaxScaler()
scalerY1 = MinMaxScaler()
scalerX2 = MinMaxScaler()
scalerY2 = MinMaxScaler()
svr_x_train1 = scalerX1.fit_transform(np.array(asiaE_x_train).reshape(-1,1))
svr_y_train1 = scalerY1.fit_transform(np.array(asiaE_y_train).reshape(-1,1))
svr_x_test1 = scalerX1.transform(np.array(asiaE_x_test).reshape(-1,1))
svr_y_test1 = scalerY1.transform(np.array(asiaE_y_test).reshape(-1,1))
svr_x_train2 = scalerX2.fit_transform(np.array(asiaE_x_train2).reshape(-1,1))
svr_z_train = scalerY2.fit_transform(np.array(asiaE_z_train).reshape(-1,1))
svr_x_test2 = scalerX2.transform(np.array(asiaE_x_test2).reshape(-1,1))
svr_z_test = scalerY2.transform(np.array(asiaE_z_test).reshape(-1,1))
svm=SVR(kernel="rbf", C=10,epsilon=0.001,gamma='scale')
svm.fit(svr_x_train1,svr_y_train1.ravel())  # SVR expects a 1-D target
svr_asia_pred1=scalerY1.inverse_transform(np.array(svm.predict(svr_x_test1)).reshape(-1,1))
svr_asia_pred1=np.array(svr_asia_pred1).reshape(-1)
svm.fit(svr_x_train2,svr_z_train.ravel())
svr_asia_pred2=scalerY2.inverse_transform(np.array(svm.predict(svr_x_test2)).reshape(-1,1))
svr_asia_pred2=np.array(svr_asia_pred2).reshape(-1)
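
An RBF-kernel SVR predicts f(x) = Σᵢ αᵢ·k(xᵢ, x) + b with k(x, x′) = exp(-γ(x - x′)²), and with `gamma='scale'` scikit-learn sets γ = 1 / (n_features · X.var()). The sketch shows that the kernel is close to 1 near the training inputs and vanishes far outside them, so if the test days lie well beyond the training range (which depends on how `asiaE_x_test` was split) an RBF SVR falls back towards the constant b. The 50 scaled training inputs are made up, and `rbf_kernel` is a hypothetical helper.

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    """RBF kernel for scalar inputs: exp(-gamma * (x - x')^2)."""
    return np.exp(-gamma * (x - x_prime) ** 2)

# Hypothetical training inputs: 50 day indices scaled to [0, 1] by MinMaxScaler
x_train = np.linspace(0.0, 1.0, 50)
gamma = 1.0 / (1 * x_train.var())  # gamma='scale' with one feature

near = rbf_kernel(0.9, 1.0, gamma)  # a point inside the training range
far = rbf_kernel(3.0, 1.0, gamma)   # a point far beyond the last training day
print(f"gamma={gamma:.2f}, k(near)={near:.3f}, k(far)={far:.2e}")
```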

***Plot SVR Model***

In [None]:
f, axes = plt.subplots(1, 2)
sns.scatterplot(x=asiaE_x_test,y=asiaE_y_test,color='#A7A7A7',s=75,label="Actual",ax=axes[0])
sns.lineplot(x=asiaE_x_test,y=svr_asia_pred1,color="#A4D54E",linewidth=2.25,label="Predicted",ax=axes[0])
sns.scatterplot(x=asiaE_x_test2,y=asiaE_z_test,color='#A7A7A7',s=75,label="Actual",ax=axes[1])
sns.lineplot(x=asiaE_x_test2,y=svr_asia_pred2,color="#F94350",linewidth=2.25,label="Predicted",ax=axes[1])

***Mean absolute percentage error***

In [None]:
print("MAPE of confirmed cases is ",MAPE(asiaE_y_test,svr_asia_pred1)*100, " %")
print("Accuracy for confirmed cases is: ",(1-MAPE(asiaE_y_test,svr_asia_pred1))*100,"\n")
print("MAPE of deaths is ",MAPE(asiaE_z_test,svr_asia_pred2)*100, " %")
print("Accuracy for deaths is: ",(1-MAPE(asiaE_z_test,svr_asia_pred2))*100)

***R Square***

In [None]:
print("R squared score for total cases",r2_score(asiaE_y_test, svr_asia_pred1)*100)
print("R squared score for cumulative deaths",r2_score(asiaE_z_test, svr_asia_pred2)*100)

***Cross Validation***

In [None]:
svr_scores1 = cross_val_score(svm, svr_x_train1, svr_y_train1.ravel(), scoring='r2', cv=5)
print("Cross Validation scores for total cases :",svr_scores1)
print("Mean of cross validation scores for total cases= ",np.mean(svr_scores1),'\n')
# Deaths use their own scaled training data (svr_x_train2, svr_z_train)
svr_scores2 = cross_val_score(svm, svr_x_train2, svr_z_train.ravel(), scoring='r2', cv=5)
print("Cross Validation scores for cumulative deaths :",svr_scores2)
print("Mean of cross validation scores for cumulative deaths= ",np.mean(svr_scores2))

### Inference:
This is an important result:
1. If we exclude India from the Asian dataset, the remaining data follows a linear trend.
2. Because of this linear trend, linear regression achieves very high accuracy.
3. India was the worst-hit country in Asia during COVID-19.
4. For this particular trend, the SVR kernel should be linear rather than the RBF kernel used above.
5. Both SVR and linear regression perform well.