## Covid-19 stats and predictions with LSTM neural networks
---
Last updated: 30/07/2021

### Introduction

This notebook presents a comparative analysis and future predictions of covid-19 cases and deaths for 18 countries:
- 9 European countries: Spain, Belgium, France, Germany, Italy, Netherlands, Portugal, Sweden, Switzerland
- 9 world countries: Argentina, Brazil, Canada, India, Iran, Mexico, Russia, United Kingdom, United States (US)

The primary (measured) variables I will work with, as obtained from the repository files, are:

    - number of confirmed cases
    - number of recovered cases
    - number of deaths
    
Then I will calculate and investigate these additional variables:

    - active cases
    - mortality(%)
    - growth rates of cases and deaths
    
Finally, I will implement multivariate, multistep LSTM predictive models for some of these variables. The data series of all countries are used together as inputs (multivariate), which means each output is a function of its own history and of the other countries' series. The idea is that, since the selected countries are at different stages of the pandemic, the RNN can improve its predictions for the curves lagging behind by using what it learns from those further ahead.

I hope you will find it interesting. Please upvote me if you do!!!


### Version release notes

Note that a new version is saved every time the notebook is run, mostly to update it with the latest data. I have typically done this every two or three days since I started this notebook in April 2020. I will not comment on those data-only version updates.

-----------------------------------------------
#### Version 205 (update on 1/04/2021)

Added a bidirectional layer wrapper to the deep layer in neural networks 1 and 2 (cumulative confirmed cases and deaths). Significant improvement in the quality and reproducibility of predictions.

-----------------------------------------------
#### Version 181 (update on 10/11/2020)

Replaced China series with Argentina. 

-----------------------------------------------
#### Version 151 (update on 13/10/2020)
 
Added plots of the neural networks' training loss, to better understand the training process and results, and adjusted the number of epochs accordingly.

Also experimented with different activation functions (elu, selu, relu, swish) for the neural networks' output (dense) layer.

-----------------------------------------------
#### Version 125 (update on 13/09/2020)

Added the countries' populations so that confirmed cases and deaths are converted to numbers per million people before the data is used to train the LSTM neural networks. Results seem to be more reliable (a larger share of the predictions look good). Minor, scattered changes to update dependent code blocks.

Active cases charts (per country) also removed due to the poor quality of the series for some countries.

-----------------------------------------------
#### Version 124 (update on 11/09/2020)

Mortality charts removed; only the latest (current) mortality values are kept.

-----------------------------------------------
#### Version 68 (update on 14/08/2020)

Code modified to read the data files directly from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (previously they were read from a pre-processed version on Kaggle):

    https://github.com/CSSEGISandData/COVID-19

These files are updated daily, so you get an up-to-date execution any time you run the notebook. They also contain cumulative numbers of confirmed cases, recovered cases and deaths for 188 countries in the world, so whilst I use a subset of 18 countries, you can easily fork the notebook and tailor it to your needs.
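To make the repository layout more concrete, here is a minimal sketch of the tidy-up applied to those files, run on a small hand-made frame instead of the real download. The column names follow the CSSE time series files; the provinces, coordinates and counts are invented for illustration.

```python
import pandas as pd

# Toy frame in the CSSE wide layout; the figures are made up for illustration.
raw = pd.DataFrame({
    "Province/State": ["Alberta", "Ontario", None],
    "Country/Region": ["Canada", "Canada", "Spain"],
    "Lat": [53.9, 51.2, 40.4],
    "Long": [-116.6, -85.3, -3.7],
    "1/22/20": [0, 1, 0],
    "1/23/20": [2, 3, 1],
})

# Keep only the country label and the date columns.
tidy = raw.drop(columns=["Province/State", "Lat", "Long"])
tidy = tidy.rename(columns={"Country/Region": "Country"})

# Sum provinces into one row per country, then put the dates on the index.
tidy = tidy.groupby("Country").sum().T
tidy.index = pd.to_datetime(tidy.index, format="%m/%d/%y")

print(tidy)
```

The result has one row per date and one column per country, which is the shape the rest of the notebook works with.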

## Data load and pre-processing
---

### Library imports and general settings

In [None]:
# # Set the random seed for reproducibility
# seed_value = 98765
# from numpy.random import seed
# seed(seed_value)


# Libraries import
import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout, Bidirectional 

import warnings
warnings.filterwarnings("ignore")

plt.style.use('seaborn-whitegrid')
              
# Set precision to two decimals
pd.set_option("display.precision", 2)

# Define date format for charts like Apr 16 or Mar 8
my_date_fmt = mdates.DateFormatter('%b %e')

### Read files and tidy up

In [None]:
# Download files from GitHub
# Note: on_bad_lines='warn' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in pandas 2.0
cases_url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
df_cases = pd.read_csv(cases_url, on_bad_lines='warn')

deaths_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
df_deaths = pd.read_csv(deaths_url, on_bad_lines='warn')

recovered_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'
df_recovered = pd.read_csv(recovered_url, on_bad_lines='warn')

In [None]:
# Drop Province/State, Lat and Long
df_cases.drop(columns=['Province/State', 'Lat', 'Long'], inplace=True)
df_deaths.drop(columns=['Province/State', 'Lat', 'Long'], inplace=True)
df_recovered.drop(columns=['Province/State', 'Lat', 'Long'], inplace=True)

# Rename Country/Region as Country
df_cases.rename(columns={'Country/Region' : 'Country'}, inplace=True)
df_deaths.rename(columns={'Country/Region' : 'Country'}, inplace=True)
df_recovered.rename(columns={'Country/Region' : 'Country'}, inplace=True)

# Some countries (Australia, Canada...) report data by province so we need to aggregate it
df_cases = df_cases.groupby(by='Country').sum()
df_deaths = df_deaths.groupby(by='Country').sum()
df_recovered = df_recovered.groupby(by='Country').sum()

# Transpose dataframes and make the date column index of datetime type
df_cases = df_cases.T
df_cases.index = pd.to_datetime(df_cases.index)
df_deaths = df_deaths.T
df_deaths.index = pd.to_datetime(df_deaths.index)
df_recovered = df_recovered.T
df_recovered.index = pd.to_datetime(df_recovered.index)

In [None]:
# Get last date in the set
last_date = df_cases.tail(1).index[0]
print('Last date in the dataset: ' + str(datetime.date(last_date)))

### Country selection

In [None]:
# List of countries for this work
country_list = ['Belgium', 'France', 'Germany', 'Italy', 'Netherlands', 'Portugal', 'Spain', 'Sweden', 'Switzerland',  
                'Argentina', 'Brazil', 'Canada', 'India', 'Iran',  'Mexico', 'Russia', 'United Kingdom', 'US']
clist1 = ['Belgium', 'France', 'Germany', 'Italy', 'Netherlands', 'Portugal', 'Spain', 'Sweden', 'Switzerland']
clist2 = ['Argentina', 'Brazil', 'Canada', 'India', 'Iran', 'Mexico', 'Russia', 'United Kingdom', 'US']

In [None]:
# Extract selection of countries
df_cases = df_cases[country_list]
df_recovered = df_recovered[country_list]
df_deaths = df_deaths[country_list]

### Compute active cases

In [None]:
# Active cases = Confirmed cases - Recovered cases - Deaths
df_active = pd.DataFrame(columns=df_cases.columns, index=df_cases.index)
for x in country_list:
    df_active[x] = df_cases[x] - df_recovered[x] - df_deaths[x] 

### Compute mortality

In [None]:
# Mortality(%) = 100 * Deaths / Cases
df_mortality = pd.DataFrame(columns=df_cases.columns, index=df_cases.index)
for x in country_list:
    df_mortality[x] = 100 * df_deaths[x] / df_cases[x] 

### Compute growth rates

In [None]:
# Compute daily variation of confirmed and active cases, and deaths
df_cases_diff = pd.DataFrame(columns=df_cases.columns, index=df_cases.index)
df_active_diff = pd.DataFrame(columns=df_active.columns, index=df_active.index)
df_deaths_diff = pd.DataFrame(columns=df_deaths.columns, index=df_deaths.index)

for x in country_list:
    df_cases_diff[x] = df_cases[x].diff()
    df_active_diff[x] = df_active[x].diff()
    df_deaths_diff[x] = df_deaths[x].diff()
    
df_cases_diff.fillna(value=0, inplace=True)
df_active_diff.fillna(value=0, inplace=True)
df_deaths_diff.fillna(value=0, inplace=True)

### Remove outliers

In [None]:
# Cumulative confirmed cases and deaths cannot decrease, so negative daily changes (data corrections) are set to zero
df_cases_diff[df_cases_diff < 0] = 0
df_deaths_diff[df_deaths_diff < 0] = 0

## Descriptive statistics
---

Let's have a look at where each country is in its specific pandemic expansion. 

The following variables will be displayed for each of the 18 countries:
- Confirmed cases, active cases and deaths
- Mortality(%)
- Growth rates: confirmed cases and deaths

### Evolution of covid-19 cases

In [None]:
# First batch of 9 countries: EVOLUTION of CASES (1 of 2)
fig1, ax1 = plt.subplots(3,3, figsize=(36,15))
fig1.subplots_adjust(top=0.93)
i = 0
j = 0
for x in clist1:
  ax1[i,j].set_title(x, fontsize='x-large')
  ax1[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax1[i,j].xaxis.set_major_locator(plt.MultipleLocator(28)) 
  ax1[i,j].plot(df_cases.index, df_cases[x], color='navy', linewidth=1.5, label='Confirmed cases')
  ax1[i,j].plot(df_active.index, df_active[x], color='skyblue', linewidth=1.5, label='Active cases')
  ax1[i,j].plot(df_recovered.index, df_recovered[x], color='lime', linewidth=1.5, label='Recovered cases')
  ax1[i,j].plot(df_deaths.index, df_deaths[x], color='coral', linewidth=2, label='Deaths')
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax1[0,0].legend(loc='upper left', fontsize='large')
fig1.suptitle('Evolution of covid-19 cases by country (Europe)', fontsize='xx-large')  
fig1.autofmt_xdate(rotation=45, ha='right')
plt.show()

In [None]:
# Second batch of 9 countries: EVOLUTION of CASES (2 of 2)
fig2, ax2 = plt.subplots(3,3, figsize=(36,15))
fig2.subplots_adjust(top=0.93)
i = 0
j = 0
for x in clist2:
  ax2[i,j].set_title(x, fontsize='x-large')
  ax2[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax2[i,j].xaxis.set_major_locator(plt.MultipleLocator(28)) 
  ax2[i,j].plot(df_cases.index, df_cases[x], color='navy', linewidth=1.5, label='Confirmed cases')
  ax2[i,j].plot(df_active.index, df_active[x], color='skyblue', linewidth=1.5, label='Active cases')
  ax2[i,j].plot(df_recovered.index, df_recovered[x], color='lime', linewidth=1.5, label='Recovered cases')
  ax2[i,j].plot(df_deaths.index, df_deaths[x], color='coral', linewidth=1.5, label='Deaths')  
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax2[0,0].legend(loc='upper left', fontsize='large')
fig2.suptitle('Evolution of covid-19 cases by country (World excluding Europe)', fontsize='xx-large')  
fig2.autofmt_xdate(rotation=45, ha='right')
plt.show()

### Mortality

Mortality is calculated as the number of deaths divided by the number of confirmed cases, expressed as a percentage. As both the number of deaths and the number of confirmed cases vary with time, we can only plot an "instantaneous mortality" chart. The real mortality will only be known once the pandemic has been eradicated, when the total numbers of cases and deaths are known.

After some thought, from version 124 onwards I have removed the mortality charts and kept only the "current" (latest) mortality values, which correspond to the up-to-date cumulative numbers of cases and deaths.
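As a quick illustration of the difference between the mortality series and its latest value, here is a small sketch with invented cumulative counts for a single hypothetical country.

```python
import pandas as pd

# Invented cumulative counts for one country over four days.
dates = pd.date_range("2020-04-01", periods=4, freq="D")
cases = pd.Series([100, 200, 400, 500], index=dates)
deaths = pd.Series([1, 4, 10, 15], index=dates)

# Mortality (%) computed for every day in the series.
mortality = 100 * deaths / cases

# The "current" mortality is simply the last value of that series.
current_mortality = mortality.iloc[-1]
print(mortality.round(2).tolist(), current_mortality)
```

The value changes from day to day because both the numerator and the denominator are still growing, which is why only the latest figure is reported.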

In [None]:
# Mortality(%) of covid-19 in Europe
print('Mortality(%) in Europe')
df_mortality[clist1].tail(1)

In [None]:
# Mortality(%) of covid-19 in the world (excl. Europe)
print('Mortality(%) in the world (excl.Europe)')
df_mortality[clist2].tail(1)

### Confirmed cases growth rate

The daily change in the number of confirmed cases is calculated by subtracting the value on day t-1 from the value on day t.

It represents the rate of growth, that is, how quickly (or slowly) the number of detected cases is changing.

Since the growth rate is the first difference of a cumulative, non-decreasing variable (confirmed cases), it must always be positive or zero.
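In practice, reporting corrections can make a cumulative series dip, which produces a negative daily change. The sketch below shows the difference calculation and the clean-up on an invented series; the numbers are not real data.

```python
import pandas as pd

# Invented cumulative series with a downward correction on the fourth day.
cumulative = pd.Series([10, 15, 25, 22, 30])

# Daily change: value on day t minus value on day t-1.
# The first day has no predecessor, so its change is set to zero.
daily = cumulative.diff().fillna(0)

# Negative changes come from corrections, so they are set to zero.
daily_clean = daily.clip(lower=0)
print(daily_clean.tolist())
```

`clip(lower=0)` has the same effect as assigning zero to every negative entry with a boolean mask.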

In [None]:
# First batch of 9 countries: DAILY VARIATION of CONFIRMED CASES (1 of 2)

fig1, ax1 = plt.subplots(3,3, figsize=(36,15))
fig1.subplots_adjust(top=0.93)
i = 0
j = 0

for x in clist1:
  ax1[i,j].set_title(x, fontsize='x-large')
  ax1[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax1[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax1[i,j].bar(df_cases_diff.index,  df_cases_diff[x], color='grey', alpha=0.2, label='Cases growth rate')
  ax1[i,j].plot(df_cases_diff.index,  df_cases_diff[x].rolling(window=7).mean(), color='navy', linewidth=1.5, label='7-day MA')
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax1[0,0].legend(loc='upper left', fontsize='large')
fig1.suptitle('Daily variation of covid-19 confirmed cases by country (Europe)', fontsize='xx-large')  
fig1.autofmt_xdate(rotation=45, ha='right')
plt.show()

In [None]:
# Second batch of 9 countries: DAILY VARIATION of CONFIRMED CASES (2 of 2)

fig2, ax2 = plt.subplots(3,3, figsize=(36,15))
fig2.subplots_adjust(top=0.93)
i = 0
j = 0

for x in clist2:
  ax2[i,j].set_title(x, fontsize='x-large')
  ax2[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax2[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax2[i,j].bar(df_cases_diff.index,  df_cases_diff[x], color='grey', alpha=0.2, label='Cases growth rate')
  ax2[i,j].plot(df_cases_diff.index,  df_cases_diff[x].rolling(window=7).mean(), color='navy', linewidth=1.5, label='7-day MA')
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax2[0,0].legend(loc='upper left', fontsize='large')
fig2.suptitle('Daily variation of covid-19 confirmed cases by country (World excluding Europe)', fontsize='xx-large')  
fig2.autofmt_xdate(rotation=45, ha='right')
plt.show()

### Deaths growth rate

The daily variation of deaths (or deaths growth rate) tells us how quickly the number of deaths due to covid-19 is increasing (or decreasing).

As with the confirmed cases growth rate, the deaths growth rate can only be positive or zero, never negative.
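The growth rate charts in this notebook overlay a 7-day moving average on the daily bars. As a small sketch of what that line represents, here is the calculation on an invented series with a weekly reporting pattern.

```python
import pandas as pd

# Invented daily deaths with low weekend reporting (two weeks of data).
daily_deaths = pd.Series([10, 12, 11, 13, 12, 2, 3, 11, 12, 12, 14, 13, 3, 2])

# Trailing 7-day mean; the first six values are NaN because the window is incomplete.
ma7 = daily_deaths.rolling(window=7).mean()
print(ma7.dropna().round(2).tolist())
```

Because each window always contains exactly one of every weekday, the weekend dips are averaged out and the trend is easier to read.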

In [None]:
# First batch of 9 countries: DAILY VARIATION of DEATHS (1 of 2)

fig1, ax1 = plt.subplots(3,3, figsize=(36,15))
fig1.subplots_adjust(top=0.93)
i = 0
j = 0

for x in clist1:
  ax1[i,j].set_title(x, fontsize='x-large')
  ax1[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax1[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax1[i,j].bar(df_deaths_diff.index,  df_deaths_diff[x], color='grey', alpha=0.2, label='Deaths growth rate')
  ax1[i,j].plot(df_deaths_diff.index,  df_deaths_diff[x].rolling(window=7).mean(), color='coral', linewidth=2, label='7-day MA')
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax1[0,0].legend(loc='upper left', fontsize='large')
fig1.suptitle('Daily variation of covid-19 deaths by country (Europe)', fontsize='xx-large')  
fig1.autofmt_xdate(rotation=45, ha='right')
plt.show()

In [None]:
# Second batch of 9 countries : DAILY VARIATION of DEATHS (2 of 2)

fig2, ax2 = plt.subplots(3,3, figsize=(36,15))
fig2.subplots_adjust(top=0.93)
i = 0
j = 0

for x in clist2:
  ax2[i,j].set_title(x, fontsize='x-large')
  ax2[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax2[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))  
  ax2[i,j].bar(df_deaths_diff.index,  df_deaths_diff[x], color='grey', alpha=0.2, label='Deaths growth rate')
  ax2[i,j].plot(df_deaths_diff.index,  df_deaths_diff[x].rolling(window=7).mean(), color='coral', linewidth=2, label='7-day MA')
  if j<2:
    j = j + 1
  else:
    j = 0
    i = i + 1

ax2[0,0].legend(loc='upper left', fontsize='large')
fig2.suptitle('Daily variation of covid-19 deaths by country (World excluding Europe)', fontsize='xx-large')  
fig2.autofmt_xdate(rotation=45, ha='right')
plt.show()

## Data series normalization by countries population
---

The purpose of this part is to recalculate the variables under study by dividing each country's confirmed cases and deaths time series by its population, giving numbers per million people, which reduces the range of values across the set.

This allows for a more convenient comparison of infection levels between countries, and it also seems to give better results when fed into the predictive models.
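A minimal sketch of the per-million calculation follows, using two hypothetical countries `A` and `B` with invented counts and populations. The notebook's own cells use floor division (`//`), which rounds down to whole numbers; this sketch keeps the fractional part.

```python
import pandas as pd

# Invented cumulative cases and illustrative populations for two countries.
cases = pd.DataFrame({"A": [1_000, 5_000], "B": [20_000, 100_000]})
population = pd.Series({"A": 10_000_000, "B": 200_000_000})

# Multiply first, then divide column-wise; the division aligns on country name.
cases_per_million = cases.mul(1_000_000).div(population, axis="columns")
print(cases_per_million)
```

Country `B` has twenty times more cases than `A`, yet both end up with the same figures per million, which is the point of the normalization.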

### Exogenous data: country populations

In [None]:
# Countries populations
# Source: https://www.worldometers.info/world-population/population-by-country/

pop = {}


pop['Belgium'] = 11589623
pop['France'] = 65273511
pop['Germany'] = 83783942
pop['Italy'] = 60461826
pop['Netherlands'] = 17134872
pop['Portugal'] = 10196709
pop['Spain'] = 46754778
pop['Sweden'] = 10099265
pop['Switzerland'] = 8654622
pop['Argentina'] = 45195774
pop['Brazil'] = 212559417
pop['Canada'] = 37600000
pop['India'] = 1380004385
pop['Iran'] = 83992949
pop['Mexico'] = 128932753
pop['Russia'] = 145934462
pop['United Kingdom'] = 67886011
pop['US'] = 331002651

pop

### Calculate confirmed cases and deaths per million

In [None]:
# Calculate nbr of cases per million people
df_cases_per_million = pd.DataFrame(columns=df_cases.columns, index=df_cases.index)
for x in df_cases_per_million.columns:
  df_cases_per_million[x] = 1000000 * df_cases[x] // pop[x]

print('Nbr of covid-19 cases per million people')
print(df_cases_per_million.tail(1))

In [None]:
# Calculate nbr of deaths per million people
df_deaths_per_million = pd.DataFrame(columns=df_deaths.columns, index=df_deaths.index)
for x in df_deaths_per_million.columns:
  df_deaths_per_million[x] = 1000000 * df_deaths[x] // pop[x]

print('Nbr of covid-19 deaths per million people')
print(df_deaths_per_million.tail(1))

### Plot cases and deaths per million people

In [None]:
fig, ax = plt.subplots(1,2, figsize=(28,6))

# Axis 0: cases per million
for x in df_cases.columns:
  ax[0].bar(x, df_cases_per_million[x].tail(1))

# ax[0].set_ylabel('Number of cases per million')
ax[0].set_xticklabels(country_list, rotation=45, horizontalalignment='right')
ax[0].set_title('Covid-19 confirmed cases per million people as of ' + str(last_date), fontsize='x-large')

# Axis 1: deaths per million
for x in df_cases.columns:
  ax[1].bar(x, df_deaths_per_million[x].tail(1))

# ax[1].set_ylabel('Number of deaths per million')
ax[1].set_xticklabels(country_list, rotation=45, horizontalalignment='right', fontsize='large')
ax[1].set_title('Covid-19 deaths per million people as of ' +  str(last_date), fontsize='x-large')

plt.show()

## Predictive models of confirmed cases and deaths
---

I will now build, train and evaluate two LSTM-based predictive models for the main cumulative variables. I use mean squared error (MSE) as the loss function, both to set the neural networks' optimization objective and to monitor their learning over the epochs.
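For reference, here is a small sketch of the quantity being minimized, computed with NumPy on invented values. The helper name `mean_squared_error` is my own for this illustration; the networks below use the built-in Keras loss of the same name.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences over all samples and outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Two samples with two outputs each (invented values in the scaled 0-1 range).
y_true = [[0.2, 0.4], [0.6, 0.8]]
y_pred = [[0.1, 0.4], [0.6, 1.0]]
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Squaring the differences penalizes large misses much more than small ones, which suits smooth cumulative curves.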

This has been an iterative exercise of investigating model parameters, tuning hyperparameters and selecting activation functions. Do not expect all the choices below to make sense right away, as some are the result of a lengthy trial-and-error effort.

Based on various tests, it also appears that the LSTM networks work better when the different time series have a similar range of values. Because of this, I will feed the per-million variables to the predictive models to train them and make predictions. In order to plot the resulting charts with meaningful (real) values, I will then reverse the normalization by multiplying the predicted data by each country's population and dividing by one million.
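The two scaling steps and their reversal can be sketched as follows. This is a NumPy-only illustration that mimics what a min-max scaler fitted on the training rows does; the series, the populations and the "model output" are all invented.

```python
import numpy as np

# Invented per-million series for two countries (rows are days).
train = np.array([[0.0, 10.0], [50.0, 20.0], [100.0, 30.0]])
population = np.array([2_000_000, 50_000_000])  # hypothetical populations

# Min-max scaling fitted on the training rows only, column by column.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min
scaled = (train - col_min) / col_range

# A made-up model output in scaled units. It may exceed 1 because a
# cumulative series keeps growing beyond the training maximum.
predicted_scaled = np.array([[1.1, 1.2]])

# Undo the min-max scaling, then undo the per-million normalization.
predicted_per_million = predicted_scaled * col_range + col_min
predicted_counts = predicted_per_million * population / 1_000_000
print(predicted_counts)
```

The order matters: the min-max scaling is reversed first because it was applied last.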

In [None]:
#########################################################################
# Prediction model parameters (Confirmed cases and deaths)
#########################################################################

# Number of features Xi (Countries)
NBR_FEATURES = len(country_list)

# Number of predictions (days)
NBR_PREDICTIONS = 90      

# Size of TRAIN and TEST samples
NBR_SAMPLES = len(df_cases)
NBR_TRAIN_SAMPLES = NBR_SAMPLES - NBR_PREDICTIONS
NBR_TEST_SAMPLES = NBR_SAMPLES - NBR_TRAIN_SAMPLES

# Number of input steps [x(t-1), x(t-2), x(t-3)...] to predict an output y(t)
TIME_STEPS = 60

# Number of training sequences (each TIME_STEPS long) processed per gradient update
BATCH_SIZE = 8

# Number of training cycles
EPOCHS = 100

print('Prediction model parameters for confirmed cases and deaths')
print('..........................................................')
print('NBR_SAMPLES: ', NBR_SAMPLES)
print('NBR_TRAIN_SAMPLES: ', NBR_TRAIN_SAMPLES)
print('NBR_TEST_SAMPLES: ', NBR_TEST_SAMPLES)
print('NBR_PREDICTIONS: ', NBR_PREDICTIONS)
print()
print('NBR_FEATURES: ', NBR_FEATURES)
print('TIME_STEPS:', TIME_STEPS)
print('BATCH_SIZE: ', BATCH_SIZE)
print('EPOCHS: ', EPOCHS)
print('..........................................................')

### Build, train and evaluate RNN for confirmed cases

In [None]:
# Process of CONFIRMED CASES 

# Split dataset into train and test subsets (copies, so later column assignments do not touch the source frames)
df_train_1 = df_cases_per_million.iloc[0:NBR_TRAIN_SAMPLES, 0:NBR_FEATURES].copy()
df_test_1 = df_cases.iloc[NBR_TRAIN_SAMPLES:, 0:NBR_FEATURES].copy()

# Normalize train data (range: 0 - 1); the scaler is fitted on the train subset only
sc1 = MinMaxScaler(feature_range = (0, 1))
sc1.fit(df_train_1)
sc_df_train_1 = sc1.transform(df_train_1)

# Prepare training sequences
X_train_1 = []
y_train_1 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_1.append(sc_df_train_1[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_1.append(sc_df_train_1[i, 0:NBR_FEATURES])
   
X_train_1, y_train_1 = np.array(X_train_1), np.array(y_train_1)
X_train_1 = np.reshape(X_train_1, (X_train_1.shape[0], X_train_1.shape[1], NBR_FEATURES))

In [None]:
# Build the RNN, dropout helps prevent overfitting

# Initialize structure
RNN1 = Sequential()

# Build layers: an LSTM layer and a bidirectional LSTM layer, each followed by dropout, then a dense output
RNN1.add(LSTM(units = 512, return_sequences = True, input_shape = (X_train_1.shape[1], NBR_FEATURES)))
RNN1.add(Dropout(0.25))
RNN1.add(Bidirectional(LSTM(units = 256), merge_mode='ave'))
RNN1.add(Dropout(0.25))
RNN1.add(Dense(units = NBR_FEATURES, activation='selu'))

RNN1.summary()

In [None]:
%%time
# Compile the RNN 
RNN1.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Train the RNN
history_RNN1 = RNN1.fit(X_train_1, y_train_1, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# Convert the training history to a dataframe
history_RNN1_df = pd.DataFrame(history_RNN1.history)

# use Pandas native plot method
history_RNN1_df['loss'].plot(figsize=(8,4), title='MSE for "Confirmed Cases" neural network training (evaluation model)', xlabel='EPOCH', color='brown');

In [None]:
# Use now the full dataframe to predict / evaluate the model
df_full_1 = df_cases_per_million.copy()

# Scale full dataset (use same scaler fitted with train data earlier)
df_full_1 = sc1.transform(df_full_1)

X_test_1 = []
for i in range(NBR_TRAIN_SAMPLES, NBR_SAMPLES):
    X_test_1.append(df_full_1[i-TIME_STEPS:i, 0:NBR_FEATURES])

X_test_1 = np.array(X_test_1)
X_test_1 = np.reshape(X_test_1, (X_test_1.shape[0], X_test_1.shape[1], NBR_FEATURES))

# Make predictions
predicted_values_1 = RNN1.predict(X_test_1)
predicted_values_1 = sc1.inverse_transform(predicted_values_1)

# Reverse per million scaling
i = 0
for x in country_list:
  df_test_1[x + '_Predicted'] = predicted_values_1[:,i] * pop[x] / 1000000
  df_train_1[x] = df_train_1[x] * pop[x] / 1000000
  i = i + 1

In [None]:
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_train_1.index, df_train_1[x], color='blue', linewidth=1.5, label='Train')
  ax[i,j].plot(df_test_1.index, df_test_1[x], color='grey', linewidth=1.5, alpha=0.5, label='Test')
  ax[i,j].plot(df_test_1.index, df_test_1[x + '_Predicted'], color='indigo', linestyle=':', linewidth=2, label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='large')
  if j<2:
    j = j + 1
  else:
    i = i + 1
    j = 0

fig.suptitle(str(NBR_PREDICTIONS) + '-day prediction of covid-19 cases vs. training and validation data', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

### Build, train and evaluate RNN for deaths

In [None]:
# Process of DEATHS 
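# Split dataset into train and test subsets (copies, so later column assignments do not touch the source frames)
df_train_2 = df_deaths_per_million.iloc[0:NBR_TRAIN_SAMPLES, 0:NBR_FEATURES].copy()
df_test_2 = df_deaths.iloc[NBR_TRAIN_SAMPLES:, 0:NBR_FEATURES].copy()

# Normalize train data (range: 0 - 1); the scaler is fitted on the train subset only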
sc2 = MinMaxScaler(feature_range = (0, 1))
sc2.fit(df_train_2)
sc_df_train_2 = sc2.transform(df_train_2)

# Prepare training sequences
X_train_2 = []
y_train_2 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_2.append(sc_df_train_2[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_2.append(sc_df_train_2[i, 0:NBR_FEATURES])
   
X_train_2, y_train_2 = np.array(X_train_2), np.array(y_train_2)
X_train_2 = np.reshape(X_train_2, (X_train_2.shape[0], X_train_2.shape[1], NBR_FEATURES))

In [None]:
# Build the RNN, dropout helps prevent overfitting

# Initialize structure
RNN2 = Sequential()

# Build layers: an LSTM layer and a bidirectional LSTM layer, each followed by dropout, then a dense output
RNN2.add(LSTM(units = 512, return_sequences = True, input_shape = (X_train_2.shape[1], NBR_FEATURES)))
RNN2.add(Dropout(0.25))
RNN2.add(Bidirectional(LSTM(units = 256), merge_mode='ave'))
RNN2.add(Dropout(0.25))
RNN2.add(Dense(units = NBR_FEATURES, activation='selu'))

RNN2.summary()

In [None]:
%%time
# Compile the RNN
RNN2.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Train the RNN
history_RNN2 = RNN2.fit(X_train_2, y_train_2, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# Convert the training history to a dataframe
history_RNN2_df = pd.DataFrame(history_RNN2.history)

# use Pandas native plot method
history_RNN2_df['loss'].plot(figsize=(8,4), title='MSE for "Deaths" neural network training (evaluation model)', xlabel='EPOCH', color='brown');

In [None]:
# Use now the full dataframe to predict / evaluate the model
df_full_2 = df_deaths_per_million.copy()

# Scale full dataset (use same scaler fitted with train data earlier)
df_full_2 = sc2.transform(df_full_2)

X_test_2 = []
for i in range(NBR_TRAIN_SAMPLES, NBR_SAMPLES):
    X_test_2.append(df_full_2[i-TIME_STEPS:i, 0:NBR_FEATURES])

X_test_2 = np.array(X_test_2)
X_test_2 = np.reshape(X_test_2, (X_test_2.shape[0], X_test_2.shape[1], NBR_FEATURES))

# Make predictions
predicted_values_2 = RNN2.predict(X_test_2)
predicted_values_2 = sc2.inverse_transform(predicted_values_2)

# Reverse per million scaling
i = 0
for x in country_list:
  df_test_2[x + '_Predicted'] = predicted_values_2[:,i] * pop[x] / 1000000
  df_train_2[x] = df_train_2[x] * pop[x] / 1000000
  i = i + 1

In [None]:
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_train_2.index, df_train_2[x], color='coral', linewidth=1.5, label='Train')
  ax[i,j].plot(df_test_2.index, df_test_2[x], color='grey', linewidth=1.5, alpha=0.5, label='Test')
  ax[i,j].plot(df_test_2.index, df_test_2[x + '_Predicted'], color='black', linestyle=':', linewidth=2, label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='large')
  if j<2:
    j = j + 1
  else:
    i = i + 1
    j = 0

fig.suptitle(str(NBR_PREDICTIONS) + '-day prediction of covid-19 deaths vs. training and validation data', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

## Future predictions of confirmed cases and deaths
---

The above predictions look quite impressive. However, it is necessary to explain here that the predicted values are single-step predictions.

The LSTM neural network predicts y(t) from X(t-1), X(t-2), ..., X(t-TIME_STEPS). I trained it with sequences built from the first NBR_TRAIN_SAMPLES days. The validation, however, builds each input sequence from actual data for the last NBR_PREDICTIONS days (indices NBR_TRAIN_SAMPLES to NBR_SAMPLES - 1), so every prediction is only one step ahead of real observations.
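The way those single-step samples are assembled can be sketched with a small helper. The function name `make_windows` and the toy series are assumptions for this illustration; the loop mirrors the one used in the training cells above.

```python
import numpy as np

def make_windows(series, time_steps):
    """Build (X, y) pairs: each X holds time_steps rows, y is the row that follows."""
    X, y = [], []
    for i in range(time_steps, len(series)):
        X.append(series[i - time_steps:i])
        y.append(series[i])
    return np.array(X), np.array(y)

# 10 days, 2 features (countries); values invented for illustration.
series = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(series, time_steps=3)
print(X.shape, y.shape)  # (7, 3, 2) (7, 2)
```

Every target row `y[k]` is paired with the three real rows that precede it, so no prediction ever depends on an earlier prediction.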

So it is not bad, but it is not enough. What I really want is to make multi-step predictions. More precisely, I would like to predict NBR_PREDICTIONS (90) days ahead, starting the day after the last date in the dataset. I can do that by training the LSTM network with all the data available (up to the date of execution) and then making recurrent predictions, feeding each new single-step prediction back as an input for the next one.
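Here is a minimal sketch of that feedback loop, with a toy stand-in for the trained network. Both `toy_one_step_model` and `recursive_forecast` are hypothetical names for this illustration; the toy model just extrapolates the last daily change and is not the LSTM.

```python
import numpy as np

def toy_one_step_model(window):
    """Stand-in for the trained network: extrapolates the last observed daily change."""
    return window[-1] + (window[-1] - window[-2])

def recursive_forecast(history, time_steps, n_predictions, one_step_model):
    """Predict n_predictions steps ahead by feeding each output back as an input."""
    window = np.array(history[-time_steps:], dtype=float)
    forecasts = []
    for _ in range(n_predictions):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Drop the oldest row and append the new prediction.
        window = np.vstack([window[1:], next_value])
    return np.array(forecasts)

# Four days of invented data for two countries.
history = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]])
forecast = recursive_forecast(history, time_steps=3, n_predictions=3,
                              one_step_model=toy_one_step_model)
print(forecast)
```

After the first step the window already contains a predicted row, so errors can accumulate as the horizon grows. That is the price of predicting beyond the available data.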

Let's see this at work.

In [None]:
#########################################################################
# Future prediction model parameters for confirmed cases and deaths
#########################################################################

# Number of features Xi (Countries)
NBR_FEATURES = len(country_list)

# Number of predictions (days)
NBR_PREDICTIONS = 90

# Size of TRAIN samples (the full dataset is used for training here)
NBR_SAMPLES = len(df_cases)
NBR_TRAIN_SAMPLES = NBR_SAMPLES

# Number of input steps [x(t-1), x(t-2), x(t-3)...] to predict an output y(t)
TIME_STEPS = 60

# Number of training sequences (each TIME_STEPS long) processed per gradient update
BATCH_SIZE = 8

# Number of training cycles
EPOCHS = 250

print('Future prediction model parameters for confirmed cases and deaths')
print('.................................................................')
print('NBR_SAMPLES: ', NBR_SAMPLES)
print('NBR_TRAIN_SAMPLES: ', NBR_TRAIN_SAMPLES)
print('NBR_PREDICTIONS: ', NBR_PREDICTIONS)
print()
print('TIME_STEPS:', TIME_STEPS)
print('NBR_FEATURES: ', NBR_FEATURES)
print('BATCH_SIZE: ', BATCH_SIZE)
print('EPOCHS: ', EPOCHS)
print('.................................................................')

### Retrain confirmed cases RNN and make future predictions

In [None]:
# Use full dataset as train data - CONFIRMED CASES
df_train_1 = df_cases_per_million.copy()

# Create empty dataframe with NBR_PREDICTIONS samples
start_date = df_train_1.index[-1] + timedelta(days=1)
ind = pd.date_range(start_date, periods=NBR_PREDICTIONS, freq='D')
df_pred_1 = pd.DataFrame(index=ind, columns=df_train_1.columns)
df_pred_1.fillna(value=0, inplace=True)

# Normalize train data (range: 0 - 1)
sc1 = MinMaxScaler(feature_range = (0, 1))
sc1.fit(df_train_1)
sc_df_train_1 = sc1.transform(df_train_1)

# Prepare training sequences
X_train_1 = []
y_train_1 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_1.append(sc_df_train_1[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_1.append(sc_df_train_1[i, 0:NBR_FEATURES])

X_train_1, y_train_1 = np.array(X_train_1), np.array(y_train_1)
X_train_1 = np.reshape(X_train_1, (X_train_1.shape[0], X_train_1.shape[1], NBR_FEATURES))

In [None]:
%%time
# Will reuse RNN1 already defined and validated earlier
RNN1.summary()

# Retrain the RNN with all available data
history_RNN1 = RNN1.fit(X_train_1, y_train_1, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# Convert the training history to a dataframe
history_RNN1_df = pd.DataFrame(history_RNN1.history)

# use Pandas native plot method
history_RNN1_df['loss'].plot(figsize=(8,4), title='MSE for "Confirmed Cases" neural network training (future predictive model)', xlabel='EPOCH', color='brown');

In [None]:
# Make predictions 
LSTM_predictions_scaled_1 = list()
batch = sc_df_train_1[-TIME_STEPS:]
current_batch = batch.reshape((1, TIME_STEPS, NBR_FEATURES))

for i in range(len(df_pred_1)):   
    LSTM_pred_1 = RNN1.predict(current_batch)[0]
    LSTM_predictions_scaled_1.append(LSTM_pred_1) 
    current_batch = np.append(current_batch[:,1:,:],[[LSTM_pred_1]],axis=1)
    
# Reverse downscaling
LSTM_predictions_1 = sc1.inverse_transform(LSTM_predictions_scaled_1)
df_pred_1 = pd.DataFrame(data=LSTM_predictions_1, index=df_pred_1.index, columns=df_pred_1.columns)

# Reverse per million scaling
for x in country_list:
    df_pred_1[x] = df_pred_1[x] * pop[x] / 1000000    

In [None]:
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_cases.index, df_cases[x], color='blue', linewidth=1.5, label='Actual data')
  ax[i,j].plot(df_pred_1.index, df_pred_1[x], color='indigo', linewidth=2, linestyle=':', label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='large')
  if j<2:
    j = j + 1
  else:
    i = i + 1
    j = 0

fig.suptitle(str(NBR_PREDICTIONS) + '-day future prediction of covid-19 cases by country', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

### Retrain deaths RNN and make future predictions

In [None]:
# Use full dataset as train data - DEATHS
df_train_2 = df_deaths_per_million.copy()

# Create empty dataframe with NBR_PREDICTIONS samples
start_date = df_train_2.index[-1] + timedelta(days=1)
ind = pd.date_range(start_date, periods=NBR_PREDICTIONS, freq='D')
df_pred_2 = pd.DataFrame(index=ind, columns=df_train_2.columns)
df_pred_2.fillna(value=0, inplace=True)

# Normalize train data (range: 0 - 1)
sc2 = MinMaxScaler(feature_range = (0, 1))
sc2.fit(df_train_2)
sc_df_train_2 = sc2.transform(df_train_2)

# Prepare training sequences
X_train_2 = []
y_train_2 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_2.append(sc_df_train_2[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_2.append(sc_df_train_2[i, 0:NBR_FEATURES])

X_train_2, y_train_2 = np.array(X_train_2), np.array(y_train_2)
X_train_2 = np.reshape(X_train_2, (X_train_2.shape[0], X_train_2.shape[1], NBR_FEATURES))

In [None]:
%%time
# Will reuse RNN2 already defined and validated earlier
RNN2.summary()

# Retrain the RNN with all available data (fit() continues from the weights learned earlier)
history_RNN2 = RNN2.fit(X_train_2, y_train_2, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# Convert the training history to a dataframe
history_RNN2_df = pd.DataFrame(history_RNN2.history)

# use Pandas native plot method
history_RNN2_df['loss'].plot(figsize=(8,4), title='MSE for "Deaths" neural network training (future predictive model)', xlabel='EPOCH', color='brown');

In [None]:
# Make predictions 
LSTM_predictions_scaled_2 = list()
batch = sc_df_train_2[-TIME_STEPS:]
current_batch = batch.reshape((1, TIME_STEPS, NBR_FEATURES))

for i in range(len(df_pred_2)):   
    LSTM_pred_2 = RNN2.predict(current_batch)[0]
    LSTM_predictions_scaled_2.append(LSTM_pred_2) 
    current_batch = np.append(current_batch[:,1:,:],[[LSTM_pred_2]],axis=1)
    
# Reverse downscaling
LSTM_predictions_2 = sc2.inverse_transform(LSTM_predictions_scaled_2)
df_pred_2 = pd.DataFrame(data=LSTM_predictions_2, index=df_pred_2.index, columns=df_pred_2.columns)

# Reverse per million scaling
for x in country_list:
    df_pred_2[x] = df_pred_2[x] * pop[x] / 1000000

In [None]:
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_deaths.index, df_deaths[x], color='coral', linewidth=1.5, label='Actual data')
  ax[i,j].plot(df_pred_2.index, df_pred_2[x], color='black', linewidth=2, linestyle=':', label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='large')
  if j<2:
    j = j + 1
  else:
    i = i + 1
    j = 0

fig.suptitle(str(NBR_PREDICTIONS) + '-day future prediction of covid-19 deaths by country', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

Sometimes the predictions are plainly wrong: the values predicted for the cumulative variables (cases and deaths) decrease, which is not possible. I do not know exactly why this happens, but nothing in the network constrains its outputs to be non-decreasing and training is stochastic, so another run (or two) usually brings the desired results.
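
One way to spot such a run automatically, rather than by inspecting the plots, is to check whether each predicted cumulative series is non-decreasing. The sketch below uses a toy array in place of `LSTM_predictions_1`; the helper name `is_non_decreasing` is hypothetical, and clipping with `np.maximum.accumulate` is an assumed cosmetic post-processing step, not something this notebook does:

```python
import numpy as np

def is_non_decreasing(pred):
    """Hypothetical helper: True for each column (country) whose
    predicted values never decrease from one day to the next."""
    return np.all(np.diff(pred, axis=0) >= 0, axis=0)

# Toy predictions: 4 days x 2 countries; the second column dips on day 3
toy_pred = np.array([[10.0, 5.0],
                     [11.0, 6.0],
                     [12.0, 5.5],
                     [13.0, 7.0]])

print(is_non_decreasing(toy_pred).tolist())  # [True, False]

# Optional clean-up: replace each value by the running maximum of its column
fixed = np.maximum.accumulate(toy_pred, axis=0)
print(fixed[:, 1].tolist())  # [5.0, 6.0, 6.0, 7.0]
```

The check only flags a bad run; the running maximum hides the dip but does not make the underlying model any better, so rerunning the training remains the more honest fix.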

## Predictive model of confirmed cases growth rate
---

Finally, I will implement and train an LSTM-based predictive model of the cases growth rate. 

I initially wanted to implement a future prediction model of the active cases growth rate (in fact, early versions of this notebook did so), but time and experience have taught me that it is a bad variable to work with. Countries like Spain, the UK or Sweden do not report covid-19 recovered figures, which are a component of active cases, so any work on the latter carries over the errors of the former.

So I decided to stick to the basics and build a predictive model of the cases growth rate instead, that is, the number of new confirmed cases per day. This is actually the very first variable reported and spoken about. When someone says on the TV news "there were 2500 new cases registered today", that figure is the value of the cases growth rate for today. 
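
In other words, the growth rate used here is the day-to-day difference of the cumulative series. A minimal sketch with made-up numbers, assuming that `df_cases_diff` was obtained earlier in the notebook by differencing the cumulative cases (that cell is not shown in this section):

```python
import numpy as np

# Made-up cumulative confirmed cases for one country over five days
cumulative = np.array([10000, 12500, 14000, 14000, 17500])

# Daily new cases = today's total minus yesterday's total
daily_new = np.diff(cumulative)
print(daily_new.tolist())  # [2500, 1500, 0, 3500]
```

A cumulative series that never decreases gives daily values that are never negative; negative values in real data typically point to retrospective corrections by the reporting source.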

Unlike the confirmed cases and deaths, which I normalized per million people before training the RNNs, here I have worked with the absolute (not per million) cases growth rate, and the results are acceptable.
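
One possible reason the absolute figures work just as well: `MinMaxScaler` rescales every column to the 0 - 1 range independently, so multiplying a country's series by a positive constant (such as 1,000,000 / population) does not change what the network sees. A minimal sketch with plain NumPy, where `min_max` is a hypothetical stand-in for the scaler and all figures are made up:

```python
import numpy as np

def min_max(a):
    """Hypothetical stand-in for MinMaxScaler(feature_range=(0, 1)):
    scales each column of `a` to the 0 - 1 range independently."""
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo)

# Made-up daily new cases: 4 days x 2 countries
new_cases = np.array([[100.0, 2000.0],
                      [300.0, 2500.0],
                      [200.0, 4000.0],
                      [500.0, 3000.0]])
population = np.array([10e6, 60e6])          # made-up populations
per_million = new_cases * 1e6 / population

# Both versions scale to the same values
print(np.allclose(min_max(new_cases), min_max(per_million)))  # True
```

The per million figures still matter after the inverse transform, when predictions for different countries are compared on a common scale.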

In [None]:
#########################################################################
# Prediction model parameters (Confirmed cases growth rate)
#########################################################################

# Number of features Xi (Countries)
NBR_FEATURES = len(country_list)

# Number of predictions (days)
NBR_PREDICTIONS = 90

# Size of TRAIN and TEST samples
NBR_SAMPLES = len(df_cases_diff)
NBR_TRAIN_SAMPLES = NBR_SAMPLES - NBR_PREDICTIONS
NBR_TEST_SAMPLES = NBR_SAMPLES - NBR_TRAIN_SAMPLES

# Number of input steps [x(t-1), x(t-2), x(t-3)...] to predict an output y(t)
TIME_STEPS = 60

# Number of training sequences (each TIME_STEPS long) processed per gradient update
BATCH_SIZE = 8

# Number of training cycles
EPOCHS = 200

print('Prediction model parameters for confirmed cases growth rates')
print('............................................................')
print('NBR_SAMPLES: ', NBR_SAMPLES)
print('NBR_TRAIN_SAMPLES: ', NBR_TRAIN_SAMPLES)
print('NBR_TEST_SAMPLES: ', NBR_TEST_SAMPLES)
print('NBR_PREDICTIONS: ', NBR_PREDICTIONS)
print()
print('NBR_FEATURES: ', NBR_FEATURES)
print('TIME_STEPS:', TIME_STEPS)
print('BATCH_SIZE: ', BATCH_SIZE)
print('EPOCHS: ', EPOCHS)
print('............................................................')

### Build, train and evaluate RNN for confirmed cases growth rate

In [None]:
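# Process CONFIRMED CASES GROWTH RATE data

# Split dataset into test and train subsets
# (.copy() avoids a SettingWithCopyWarning when prediction columns are added to df_test_3 later)
df_train_3 = df_cases_diff.iloc[0:NBR_TRAIN_SAMPLES, 0:NBR_FEATURES].copy()
df_test_3 = df_cases_diff.iloc[NBR_TRAIN_SAMPLES:, 0:NBR_FEATURES].copy()

# Normalize train data (range: 0 - 1); the scaler is fitted on the train subset only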
sc3 = MinMaxScaler(feature_range = (0, 1))
sc3.fit(df_train_3)
sc_df_train_3 = sc3.transform(df_train_3)

# Prepare training sequences
X_train_3 = []
y_train_3 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_3.append(sc_df_train_3[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_3.append(sc_df_train_3[i, 0:NBR_FEATURES])
   
X_train_3, y_train_3 = np.array(X_train_3), np.array(y_train_3)
X_train_3 = np.reshape(X_train_3, (X_train_3.shape[0], X_train_3.shape[1], NBR_FEATURES))

In [None]:
# Build the RNN, dropout helps prevent overfitting

# Initialize structure
RNN3 = Sequential()

# Build layers: 2 LSTM layers with dropout
RNN3.add(LSTM(units = 512, return_sequences = True, input_shape = (X_train_3.shape[1], NBR_FEATURES)))
RNN3.add(Dropout(0.3))
RNN3.add(LSTM(units = 512))
RNN3.add(Dropout(0.3))
RNN3.add(Dense(units = NBR_FEATURES, activation='relu'))

RNN3.summary()
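
The parameter counts reported by `RNN3.summary()` can be checked by hand. A minimal sketch, assuming the standard Keras LSTM layout (four gates, each with input weights, recurrent weights and a bias) and 18 features, one per country:

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer with biases:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with biases."""
    return input_dim * units + units

NBR_FEATURES = 18  # assumed: one feature per country in this notebook

layer_1 = lstm_params(NBR_FEATURES, 512)   # first LSTM layer
layer_2 = lstm_params(512, 512)            # second LSTM layer
output = dense_params(512, NBR_FEATURES)   # Dense output layer

print(layer_1, layer_2, output, layer_1 + layer_2 + output)
# 1087488 2099200 9234 3195922
```

The Dropout layers add no trainable parameters. With roughly 3.2 million parameters and a number of daily samples that is far smaller, dropout is a sensible guard against overfitting here.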

In [None]:
%%time
# Compile the RNN
RNN3.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Train the RNN on the training subset
history_RNN3 = RNN3.fit(X_train_3, y_train_3, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# convert the training history to a dataframe
history_RNN3_df = pd.DataFrame(history_RNN3.history)

# use Pandas native plot method
history_RNN3_df['loss'].plot(figsize=(8,4), title='MSE for "Cases Growth Rate" neural network training (evaluation model)', xlabel='EPOCH', color='brown');

In [None]:
# Use now the full dataframe to predict / evaluate the model
df_full_3 = df_cases_diff.copy()

# Scale full dataset (use same scaler fitted with train data earlier)
df_full_3 = sc3.transform(df_full_3)

X_test_3 = []
for i in range(NBR_TRAIN_SAMPLES, NBR_SAMPLES):
    X_test_3.append(df_full_3[i-TIME_STEPS:i, 0:NBR_FEATURES])

X_test_3 = np.array(X_test_3)
X_test_3 = np.reshape(X_test_3, (X_test_3.shape[0], X_test_3.shape[1], NBR_FEATURES))

# Make predictions
predicted_values_3 = RNN3.predict(X_test_3)
predicted_values_3 = sc3.inverse_transform(predicted_values_3)

i = 0
for x in country_list:
  df_test_3[x + '_Predicted'] = predicted_values_3[:,i]
  i = i + 1
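
The plots in the next cell give a visual check only. A numeric score per country can complement them; the sketch here computes the root mean squared error (RMSE) on toy arrays standing in for the actual test values and `predicted_values_3`. The helper `rmse_per_column` is hypothetical and not part of this notebook:

```python
import numpy as np

def rmse_per_column(actual, predicted):
    """Hypothetical helper: RMSE between two (days, countries) arrays,
    computed separately for each column."""
    return np.sqrt(np.mean((actual - predicted) ** 2, axis=0))

# Toy data: 3 days x 2 countries
actual = np.array([[100.0, 10.0],
                   [200.0, 20.0],
                   [300.0, 30.0]])
predicted = np.array([[110.0, 10.0],
                      [190.0, 20.0],
                      [310.0, 30.0]])

print(rmse_per_column(actual, predicted).tolist())  # [10.0, 0.0]
```

Since the growth rate is not normalized per million here, RMSE values are in absolute daily cases and are not directly comparable between large and small countries.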

In [None]:
# Plot test-period predictions of the cases growth rate against train and test data
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_train_3.index, df_train_3[x], color='indigo', linewidth=1.5, label='Train')
  ax[i,j].plot(df_test_3.index, df_test_3[x], color='grey', linewidth=1.5, alpha=0.5, label='Test')
  ax[i,j].plot(df_test_3.index, df_test_3[x + '_Predicted'], color='red', linestyle=':', linewidth=1.5, label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='medium')
  if j<2: 
    j = j + 1
  else:
    i = i + 1
    j = 0


fig.suptitle(str(NBR_PREDICTIONS) + '-day prediction of the covid-19 cases growth rate vs. training and validation data', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

## Future predictions of confirmed cases growth rate

In [None]:
#########################################################################
# Future prediction model parameters for confirmed cases growth rate
#########################################################################

# Number of features Xi (Countries)
NBR_FEATURES = len(country_list)

# Number of predictions (days)
NBR_PREDICTIONS = 90

# Size of TRAIN samples (the full dataset is used for training)
NBR_SAMPLES = len(df_cases_diff)
NBR_TRAIN_SAMPLES = NBR_SAMPLES

# Number of input steps [x(t-1), x(t-2), x(t-3)...] to predict an output y(t)
TIME_STEPS = 60

# Number of training sequences (each TIME_STEPS long) processed per gradient update
BATCH_SIZE = 8

# Number of training cycles
EPOCHS = 200

print('Future prediction model parameters for confirmed cases growth rate')
print('..................................................................')
print('NBR_SAMPLES: ', NBR_SAMPLES)
print('NBR_TRAIN_SAMPLES: ', NBR_TRAIN_SAMPLES)
print('NBR_PREDICTIONS: ', NBR_PREDICTIONS)
print()
print('TIME_STEPS:', TIME_STEPS)
print('NBR_FEATURES: ', NBR_FEATURES)
print('BATCH_SIZE: ', BATCH_SIZE)
print('EPOCHS: ', EPOCHS)
print('..................................................................')

### Retrain cases growth rate RNN and make future predictions

In [None]:
# Use full dataset as train data - CONFIRMED CASES GROWTH RATE
df_train_3 = df_cases_diff.copy()

# Create empty dataframe with NBR_PREDICTIONS samples
start_date = df_train_3.index[-1] + timedelta(days=1)
ind = pd.date_range(start_date, periods=NBR_PREDICTIONS, freq='D')
df_pred_3 = pd.DataFrame(index=ind, columns=df_train_3.columns)
df_pred_3.fillna(value=0, inplace=True)

# Normalize train data (range: 0 - 1)
sc3 = MinMaxScaler(feature_range = (0, 1))
sc3.fit(df_train_3)
sc_df_train_3 = sc3.transform(df_train_3)

# Prepare training sequences
X_train_3 = []
y_train_3 = []
for i in range(TIME_STEPS, NBR_TRAIN_SAMPLES):
    X_train_3.append(sc_df_train_3[i-TIME_STEPS:i, 0:NBR_FEATURES])
    y_train_3.append(sc_df_train_3[i, 0:NBR_FEATURES])

X_train_3, y_train_3 = np.array(X_train_3), np.array(y_train_3)
X_train_3 = np.reshape(X_train_3, (X_train_3.shape[0], X_train_3.shape[1], NBR_FEATURES))

In [None]:
%%time
# Will reuse RNN3 already defined and validated earlier
RNN3.summary()

# Retrain the RNN with all available data (fit() continues from the weights learned during evaluation)
history_RNN3 = RNN3.fit(X_train_3, y_train_3, epochs = EPOCHS, batch_size = BATCH_SIZE, verbose=0)

In [None]:
# Convert the training history to a dataframe
history_RNN3_df = pd.DataFrame(history_RNN3.history)

# use Pandas native plot method
history_RNN3_df['loss'].plot(figsize=(8,4), title='MSE for "Cases Growth Rate" neural network training (future predictive model)', xlabel='EPOCH', color='brown');

In [None]:
# Make predictions 
LSTM_predictions_scaled_3 = list()
batch = sc_df_train_3[-TIME_STEPS:]
current_batch = batch.reshape((1, TIME_STEPS, NBR_FEATURES))

for i in range(len(df_pred_3)):   
    LSTM_pred_3 = RNN3.predict(current_batch)[0]
    LSTM_predictions_scaled_3.append(LSTM_pred_3) 
    current_batch = np.append(current_batch[:,1:,:],[[LSTM_pred_3]],axis=1)
    
# Reverse downscaling
LSTM_predictions_3 = sc3.inverse_transform(LSTM_predictions_scaled_3)
df_pred_3 = pd.DataFrame(data=LSTM_predictions_3, index=df_pred_3.index, columns=df_pred_3.columns)

In [None]:
fig, ax = plt.subplots(6,3, figsize=(36,30))
fig.subplots_adjust(top=0.95)
i = 0
j = 0

for x in country_list:
  ax[i,j].set_title(x, fontsize='x-large')
  ax[i,j].xaxis.set_major_formatter(my_date_fmt)
  ax[i,j].xaxis.set_major_locator(plt.MultipleLocator(28))
  ax[i,j].plot(df_train_3.index, df_train_3[x], color='indigo', linewidth=1.5, label='Actual data')
  ax[i,j].plot(df_pred_3.index, df_pred_3[x], color='red', linewidth=1.5, linestyle=':', label='Prediction')
  ax[i,j].legend(loc='upper left', fontsize='medium')
  if j<2:
    j = j + 1
  else:
    i = i + 1
    j = 0

fig.suptitle(str(NBR_PREDICTIONS) + '-day future prediction of the confirmed cases growth rate by country', fontsize='xx-large')  
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()

In [None]:
#np.random.get_state()



(End of notebook)