We live in a world that is deeply intertwined and connected. This reminds us of a poem by the great Persian poet Saa'di Shirazi:
> "Human beings are members of one another,
since in their creation they are of one essence.
When the conditions of the time brings a member (limb) to pain,
the other members (limbs) will suffer from discomfort.
You, who are indifferent to the misery of others,
it is not fitting that they should call you a human being."

This is a quantum world, and we need the rules of quantum physics to address its events because it can no longer be described by a simple cause-and-effect relationship. This is in fact a difference between our view and how Ray Dalio (CEO of Bridgewater, the world's wealthiest hedge fund) interprets the world. We start with a simple Newtonian AI approach here and hope to publish some quantum solutions in the future.

# Part 1: Simple LSTM with PyTorch

The novel coronavirus (COVID-19) has changed many aspects of daily life, especially in Southern California, where we live. In this part we try to address this with the simplest possible time-series model: an LSTM built with the PyTorch framework. We are also thankful to Venelin Valkov for taking the initiative and publishing the first LSTM attempt in this regard for global patterns. See https://github.com/curiousily/Getting-Things-Done-with-Pytorch/blob/master/05.time-series-forecasting-covid-19.ipynb for more information.

## Novel Coronavirus (COVID-19)

The novel coronavirus (COVID-19) has spread rapidly around the world. Currently, [Worldometers.info](https://www.worldometers.info/coronavirus/) reports more than *335,403* confirmed cases in more than *192* countries.

The six worst-affected countries (by far) are China (the source of the virus), Italy, USA, Spain, Germany, and Iran. However, many cases currently go unreported because:

- A person can get infected without even knowing (asymptomatic)
- Incorrect data reporting
- Not enough test kits
- The symptoms look a lot like the common flu


### How dangerous is this virus?

> Except for the common statistics you might see cited on the news, there are some good and some bad news:
> 
> - More than 80% of the confirmed cases recover without any need of medical attention
> - [3.4% Mortality Rate estimate by the World Health Organization (WHO) as of March 3](https://www.worldometers.info/coronavirus/coronavirus-death-rate/#who-03-03-20)
> - The reproductive number which represents the average number of people to which a single infected person will transmit the virus is between 1.4 and 2.5 [(WHO's estimated on Jan. 23)](https://www.worldometers.info/coronavirus/#repro)
> 
> The last one is really scary. It sounds like we can witness some crazy exponential growth if appropriate measures are not put in place.
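
To get a feel for why a reproductive number above 1 implies exponential growth, here is a back-of-the-envelope sketch. It assumes a deliberately simplified picture in which every infected person infects exactly `r0` others per generation and nothing slows the spread. The function name `new_cases_in_generation` and the choice of generations are illustrative assumptions, not part of the WHO estimate.

```python
def new_cases_in_generation(r0: float, generation: int, initial_cases: int = 1) -> float:
    """New infections in a given generation under unchecked spread: initial * r0 ** generation."""
    return initial_cases * r0 ** generation


# Compare the two ends of the WHO range quoted above.
for r0 in (1.4, 2.5):
    growth = [round(new_cases_in_generation(r0, g), 1) for g in (0, 5, 10)]
    print(f"R0={r0}: new cases at generations 0, 5, 10 -> {growth}")
```

Even the lower end of the range keeps growing; the difference between the two ends is how quickly the curve takes off.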

Other factors also matter, and we will elaborate on them later when discussing the data challenges. For now, let's focus on the simplest model.


In [None]:
# Import libraries
import torch

import os
import numpy as np
import pandas as pd
from tqdm import tqdm
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.preprocessing import MinMaxScaler
from pandas.plotting import register_matplotlib_converters
from torch import nn, optim
from sklearn.model_selection import train_test_split
%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#93D30C", "#8F00FF"]

sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

rcParams['figure.figsize'] = 14, 10
register_matplotlib_converters()

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)


## Daily Cases Dataset

The data is provided by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and contains the number of reported daily cases by country. [The dataset is available on GitHub](https://github.com/CSSEGISandData/COVID-19) and is updated regularly.

We're going to take the time series data for confirmed cases (the numbers of deaths and recovered cases are also available).

Note that Kaggle instructs us to use only the data up to March 18, 2020 to predict the future.

## Data Exploration and EDA


In [None]:
df_train = pd.read_csv("../input/covid19-local-us-ca-forecasting-week-1/ca_train.csv")
df_test = pd.read_csv("../input/covid19-local-us-ca-forecasting-week-1/ca_test.csv")

In [None]:
df_train.head()

In [None]:
df_test.head()

Two things to note here:

- The data contains province, country, latitude, and longitude columns. For simplicity we ignore them in this notebook and leave their exploration for future work.
- The number of cases is cumulative. We'll undo the accumulation.

Let's start by dropping the columns we don't need, keeping only the last three columns of the training set and the first and last columns of the test set:

In [None]:
train_drop_cols = df_train.columns[:-3]
test_drop_cols = df_test.columns[1:-1]

train = df_train.copy().drop(train_drop_cols, axis=1)
test = df_test.copy().drop(test_drop_cols, axis=1)

In [None]:
train.head()

In [None]:
test.head()

In [None]:
df = train
df.head()

### Check for missing values:

In [None]:
df.isnull().sum().sum()

### Reindexing the data:

In [None]:
train.index = pd.to_datetime(train['Date'])
train.drop(['Date'], axis=1, inplace=True)

test.index = pd.to_datetime(test['Date'])
test.drop(['Date'], axis=1, inplace=True)

In [None]:
train.head()

In [None]:
test.head()

Note that we only have 63 days of data, which is not enough for a deep learning approach, but we will try to compensate by tuning the model parameters.

In [None]:
daily_cases = train

In [None]:
plt.plot(daily_cases['ConfirmedCases'])
plt.title("Cumulative confirmed daily cases");

In [None]:
plt.plot(daily_cases['Fatalities'])
plt.title("Cumulative fatalities daily cases");

We'll undo the accumulation by subtracting the previous value from the current one. We'll preserve the first value of the sequence:


In [None]:
# .iloc[0] selects the first row by position (the index is a DatetimeIndex, not integers)
daily_cases_infected = daily_cases['ConfirmedCases'].diff().fillna(daily_cases['ConfirmedCases'].iloc[0]).astype(np.int64)
daily_cases_infected.head()

In [None]:
# .iloc[0] selects the first row by position (the index is a DatetimeIndex, not integers)
daily_cases_fatality = daily_cases['Fatalities'].diff().fillna(daily_cases['Fatalities'].iloc[0]).astype(np.int64)
daily_cases_fatality.head()
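
To make the de-accumulation step concrete, here is a toy sketch on a hypothetical four-day cumulative series (the values and dates are made up for illustration). It uses the same `diff()` / `fillna()` pattern as the cells above and checks that a cumulative sum recovers the original series.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative counts for four consecutive days
toy_cumulative = pd.Series(
    [1, 3, 3, 7],
    index=pd.date_range("2020-03-01", periods=4),
)

# diff() leaves NaN in the first row; fill it with the first cumulative value
toy_daily = toy_cumulative.diff().fillna(toy_cumulative.iloc[0]).astype(np.int64)

# Sanity check: accumulating the daily values gives back the cumulative series
toy_roundtrip = toy_daily.cumsum()
print(toy_daily.tolist(), toy_roundtrip.tolist())
```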

In [None]:
plt.plot(daily_cases_infected)
plt.title("Daily infected cases");

In [None]:
plt.plot(daily_cases_fatality)
plt.title("Daily fatality cases");

In [None]:
daily_cases_infected.shape

In [None]:
daily_cases_fatality.shape

## Extract rows with confirmed cases greater than 0


In [None]:
# train = train[train['ConfirmedCases'] > 0]
# train_data, test_data = train_test_split(train, test_size=0.33, random_state=42)
# infection_train = train_data['ConfirmedCases']
# infection_test = test_data['ConfirmedCases']
# fatality_train = train_data['Fatalities']
# fatality_test = test_data['Fatalities']

# Preprocessing

In [None]:
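# shuffle=False keeps the chronological order: the first 67% of days are used for training
# and the remaining 33% for testing, which is what the sequence windows and plots below assume
train_data_infected, test_data_infected = train_test_split(daily_cases_infected, test_size=0.33, shuffle=False)
train_data_fatality, test_data_fatality = train_test_split(daily_cases_fatality, test_size=0.33, shuffle=False)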
infection_train = train_data_infected
fatality_train = train_data_fatality
infection_test = test_data_infected
fatality_test = test_data_fatality

In [None]:
train_data_infected.shape

In [None]:
train_data_fatality.shape

The data is scaled to the range 0 to 1 to improve the training speed and performance of the model. The scalers are fit on the training portion only and then applied to the test portion.

In [None]:
scaler_infection = MinMaxScaler()

scaler_infection = scaler_infection.fit(np.expand_dims(infection_train, axis=1))

infection_train = scaler_infection.transform(np.expand_dims(infection_train, axis=1))

infection_test = scaler_infection.transform(np.expand_dims(infection_test, axis=1))

scaler_fatality = MinMaxScaler()

scaler_fatality = scaler_fatality.fit(np.expand_dims(fatality_train, axis=1))

fatality_train = scaler_fatality.transform(np.expand_dims(fatality_train, axis=1))

fatality_test = scaler_fatality.transform(np.expand_dims(fatality_test, axis=1))
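
As a small illustration of what fitting the scaler on the training portion implies, here is a sketch on made-up numbers (the `toy_*` names and values are hypothetical). Because the minimum and maximum come from the training data alone, a later value that exceeds the training maximum is mapped above 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical daily counts: three training days and one later, larger test day
toy_train = np.array([0.0, 5.0, 10.0]).reshape(-1, 1)
toy_test = np.array([20.0]).reshape(-1, 1)

# Fit on the training values only, then apply the same mapping to both sets
toy_scaler = MinMaxScaler().fit(toy_train)
toy_train_scaled = toy_scaler.transform(toy_train)
toy_test_scaled = toy_scaler.transform(toy_test)

# inverse_transform maps scaled values back to the original units
toy_test_restored = toy_scaler.inverse_transform(toy_test_scaled)
print(toy_train_scaled.ravel(), toy_test_scaled.ravel(), toy_test_restored.ravel())
```

With rapidly growing case counts, scaled test values above 1 are therefore to be expected rather than a sign of a bug.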




In [None]:
fatality_test

### Break the large sequence into chunks of smaller sequences

In [None]:
def create_sequences(data, seq_length):
    xs = []
    ys = []

    for i in range(len(data)-seq_length-1):
        x = data[i:(i+seq_length)]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)

    return np.array(xs), np.array(ys)

In [None]:
seq_length = 2

# confirmed cases
X_train_infection, y_train_infection = create_sequences(infection_train, seq_length)
X_test_infection, y_test_infection = create_sequences(infection_test, seq_length)

X_train_infection = torch.from_numpy(X_train_infection).float()
y_train_infection = torch.from_numpy(y_train_infection).float()

X_test_infection = torch.from_numpy(X_test_infection).float()
y_test_infection = torch.from_numpy(y_test_infection).float()

# fatalities
X_train_fatality, y_train_fatality = create_sequences(fatality_train, seq_length)
X_test_fatality, y_test_fatality = create_sequences(fatality_test, seq_length)

X_train_fatality = torch.from_numpy(X_train_fatality).float()
y_train_fatality = torch.from_numpy(y_train_fatality).float()

X_test_fatality = torch.from_numpy(X_test_fatality).float()
y_test_fatality = torch.from_numpy(y_test_fatality).float()

In [None]:
y_test_infection

Each training example contains a sequence of `seq_length` (here 2) data points of history and a label for the real value that our model needs to predict.
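
The windowing can be checked on a tiny array. The sketch below repeats `create_sequences` as defined earlier in this notebook so that it runs on its own; the six-element input is a made-up example. Note that with the loop bound used here, the last possible window of the input is not generated.

```python
import numpy as np

def create_sequences(data, seq_length):
    # Same logic as the notebook's helper, repeated so this snippet is self-contained
    xs = []
    ys = []
    for i in range(len(data) - seq_length - 1):
        xs.append(data[i:(i + seq_length)])
        ys.append(data[i + seq_length])
    return np.array(xs), np.array(ys)

# Hypothetical input: the values 0..5 as a single-feature column
toy_data = np.arange(6, dtype=float).reshape(-1, 1)
toy_X, toy_y = create_sequences(toy_data, seq_length=2)

print(toy_X.shape, toy_y.shape)   # (windows, seq_length, features) and (windows, features)
print(toy_X[0].ravel(), "->", toy_y[0].ravel())
```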

In [None]:
X_train_infection.shape

In [None]:
X_train_fatality.shape

In [None]:
X_train_infection[:2]

In [None]:
X_train_fatality[:2]

In [None]:
y_train_infection.shape

In [None]:
y_train_fatality.shape

In [None]:
y_train_infection[:2]

In [None]:
y_train_fatality[:2]

In [None]:
X_test_infection.shape

In [None]:
infection_train[:10]

In [None]:
fatality_train[:10]

# Constructing LSTM RNN model

In [None]:
class CoronaVirusForecast(nn.Module):

  def __init__(self, n_features, n_hidden, seq_len, n_layers=2):
    super(CoronaVirusForecast, self).__init__()

    self.n_hidden = n_hidden
    self.seq_len = seq_len
    self.n_layers = n_layers

    self.lstm = nn.LSTM(
      input_size=n_features,
      hidden_size=n_hidden,
      num_layers=n_layers,
      dropout=0.5
    )

    self.linear = nn.Linear(in_features=n_hidden, out_features=1)

  def reset_hidden_state(self):
    self.hidden = (
        torch.zeros(self.n_layers, self.seq_len, self.n_hidden),
        torch.zeros(self.n_layers, self.seq_len, self.n_hidden)
    )

  def forward(self, sequences):
    lstm_out, self.hidden = self.lstm(
      sequences.view(len(sequences), self.seq_len, -1),
      self.hidden
    )
    last_time_step = \
      lstm_out.view(self.seq_len, len(sequences), self.n_hidden)[-1]
    y_pred = self.linear(last_time_step)
    return y_pred

The `CoronaVirusForecast` class contains 3 methods:

- constructor - initializes all helper data and creates the layers
- `reset_hidden_state` - we use a stateless LSTM, so the hidden state is reset to zeros at the start of each epoch
- `forward` - takes the sequences and passes all of them through the LSTM layer at once. The output of the last time step is taken and passed through the linear layer to get the prediction.
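
To make the tensor shapes in an LSTM-plus-linear stack easier to follow, here is a standalone sketch. It is not the class above: it assumes `batch_first=True`, so inputs are laid out as `(batch, seq_len, features)`, whereas `nn.LSTM` by default expects `(seq_len, batch, features)`. All sizes are arbitrary illustrative choices.

```python
import torch
from torch import nn

batch_size, seq_len, n_features, n_hidden, n_layers = 3, 2, 1, 4, 2

# batch_first=True is an assumption of this sketch, chosen for readability
lstm = nn.LSTM(
    input_size=n_features,
    hidden_size=n_hidden,
    num_layers=n_layers,
    batch_first=True,
)
linear = nn.Linear(in_features=n_hidden, out_features=1)

x = torch.randn(batch_size, seq_len, n_features)

# When no hidden state is passed, PyTorch starts from zeros
lstm_out, (h_n, c_n) = lstm(x)

# Keep only the output of the last time step for each sequence in the batch
last_time_step = lstm_out[:, -1, :]
y_pred = linear(last_time_step)

print(lstm_out.shape, h_n.shape, last_time_step.shape, y_pred.shape)
```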

### Training the model

In [None]:
def train_model_infection(
  model, 
  infection_train, 
  train_labels, 
  infection_test=None, 
  test_labels=None
):
  loss_fn = torch.nn.MSELoss(reduction='sum')

  optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
  num_epochs = 400

  infection_train_hist = np.zeros(num_epochs)
  infection_test_hist = np.zeros(num_epochs)

  for t in range(num_epochs):
    model.reset_hidden_state()

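    # Use the tensors passed in as arguments rather than notebook-level globals,
    # so the function also works when called with other data (e.g. the full dataset)
    y_pred_infection = model(infection_train)

    loss = loss_fn(y_pred_infection.float(), train_labels)

    if infection_test is not None:
      with torch.no_grad():
        y_test_pred_infection = model(infection_test)
        test_loss = loss_fn(y_test_pred_infection.float(), test_labels)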
      infection_test_hist[t] = test_loss.item()

      if t % 10 == 0:  
        print(f'Epoch {t} train loss: {loss.item()} test loss: {test_loss.item()}')
    elif t % 10 == 0:
      print(f'Epoch {t} train loss: {loss.item()}')

    infection_train_hist[t] = loss.item()
    
    optimiser.zero_grad()

    loss.backward()

    optimiser.step()
  
  return model.eval(), infection_train_hist, infection_test_hist

In [None]:
def train_model_fatality(
  model, 
  fatality_train, 
  train_labels, 
  fatality_test=None, 
  test_labels=None
):
  loss_fn = torch.nn.MSELoss(reduction='sum')

  optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
  num_epochs = 400

  fatality_train_hist = np.zeros(num_epochs)
  fatality_test_hist = np.zeros(num_epochs)

  for t in range(num_epochs):
    model.reset_hidden_state()

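    # Use the tensors passed in as arguments rather than notebook-level globals,
    # so the function also works when called with other data (e.g. the full dataset)
    y_pred_fatality = model(fatality_train)

    loss = loss_fn(y_pred_fatality.float(), train_labels)

    if fatality_test is not None:
      with torch.no_grad():
        y_test_pred_fatality = model(fatality_test)
        test_loss = loss_fn(y_test_pred_fatality.float(), test_labels)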
      fatality_test_hist[t] = test_loss.item()

      if t % 10 == 0:  
        print(f'Epoch {t} train loss: {loss.item()} test loss: {test_loss.item()}')
    elif t % 10 == 0:
      print(f'Epoch {t} train loss: {loss.item()}')

    fatality_train_hist[t] = loss.item()
    
    optimiser.zero_grad()

    loss.backward()

    optimiser.step()
  
  return model.eval(), fatality_train_hist, fatality_test_hist

Note that the hidden state is reset at the start of each epoch. We don't use batches of data; our model sees every example at once. The loss is the squared error summed over all examples (`MSELoss` with `reduction='sum'`), and we record it for both the training and test sets.
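
As a small aside on the loss used in the training functions above, the sketch below compares `MSELoss` with `reduction='sum'` and with the default `reduction='mean'` on made-up tensors. The summed variant grows with the number of examples, which is worth keeping in mind when comparing training and test losses computed on sets of different sizes.

```python
import torch

# Hypothetical predictions and targets: squared errors are 0, 1 and 4
toy_pred = torch.tensor([[1.0], [2.0], [3.0]])
toy_target = torch.tensor([[1.0], [1.0], [1.0]])

sum_loss = torch.nn.MSELoss(reduction='sum')(toy_pred, toy_target)
mean_loss = torch.nn.MSELoss(reduction='mean')(toy_pred, toy_target)

print(sum_loss.item(), mean_loss.item())
```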
Let's create an instance of our model and train it:

In [None]:
model = CoronaVirusForecast(
  n_features=1, 
  n_hidden=512, 
  seq_len=seq_length, 
  n_layers=2
)
model, infection_train_hist, infection_test_hist = train_model_infection(
  model, 
  X_train_infection, 
  y_train_infection, 
  X_test_infection, 
  y_test_infection
)

In [None]:
plt.plot(infection_train_hist, label="Training loss")
plt.plot(infection_test_hist, label="Test loss")
plt.ylim((0, 5))
plt.legend();

In [None]:
# Use a separate name so the infection model trained above is not overwritten
model_fatality = CoronaVirusForecast(
  n_features=1,
  n_hidden=512,
  seq_len=seq_length,
  n_layers=2
)
model_fatality, fatality_train_hist, fatality_test_hist = train_model_fatality(
  model_fatality,
  X_train_fatality,
  y_train_fatality,
  X_test_fatality,
  y_test_fatality
)

In [None]:
plt.plot(fatality_train_hist, label="Training loss")
plt.plot(fatality_test_hist, label="Test loss")
plt.ylim((0, 5))
plt.legend();

## Daily Cases Prediction

The model can (due to the way we've trained it) predict only a single day into the future. We'll employ a simple strategy to overcome this limitation: use the predicted values as input for predicting the following days:

In [None]:
with torch.no_grad():
  test_seq_infection = X_test_infection[:1]
  preds_infection = []
  for _ in range(len(X_test_infection)):
    # `model` is the infection model trained above
    y_test_pred_infection = model(test_seq_infection)
    pred_infection = torch.flatten(y_test_pred_infection).item()
    preds_infection.append(pred_infection)
    new_seq_infection = test_seq_infection.numpy().flatten()
    new_seq_infection = np.append(new_seq_infection, [pred_infection])
    new_seq_infection = new_seq_infection[1:]
    test_seq_infection = torch.as_tensor(new_seq_infection).view(1, seq_length, 1).float()

In [None]:
with torch.no_grad():
  test_seq_fatality = X_test_fatality[:1]
  preds_fatality = []
  for _ in range(len(X_test_fatality)):
    y_test_pred_fatality = model_fatality(test_seq_fatality)
    pred_fatality = torch.flatten(y_test_pred_fatality).item()
    preds_fatality.append(pred_fatality)
    new_seq_fatality = test_seq_fatality.numpy().flatten()
    new_seq_fatality = np.append(new_seq_fatality, [pred_fatality])
    new_seq_fatality = new_seq_fatality[1:]
    test_seq_fatality = torch.as_tensor(new_seq_fatality).view(1, seq_length, 1).float()
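
The recursive strategy in the two cells above can be illustrated without a neural network. In the sketch below, `rolling_forecast` is a hypothetical helper and the "model" is simply the mean of the current window, an assumption made so the arithmetic is easy to follow. Each prediction is appended to the window and the oldest value is dropped.

```python
import numpy as np

def rolling_forecast(predict_one_step, seed_window, n_steps):
    """Feed each one-step prediction back in as input for the next step."""
    window = np.asarray(seed_window, dtype=float)
    preds = []
    for _ in range(n_steps):
        next_value = float(predict_one_step(window))
        preds.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction
        window = np.append(window[1:], next_value)
    return preds

# Stand-in one-step predictor: the mean of the window (not the LSTM)
toy_preds = rolling_forecast(lambda w: w.mean(), seed_window=[1.0, 3.0], n_steps=3)
print(toy_preds)
```

Because every step consumes earlier predictions, errors compound: the further ahead the forecast goes, the less it is anchored to observed data.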

We have to reverse the scaling of the test data and the model predictions:


In [None]:
true_cases_infection = scaler_infection.inverse_transform(
    np.expand_dims(y_test_infection.flatten().numpy(), axis=0)
).flatten()

predicted_cases_infection = scaler_infection.inverse_transform(
  np.expand_dims(preds_infection, axis=0)
).flatten()

In [None]:
true_cases_fatality = scaler_fatality.inverse_transform(
    np.expand_dims(y_test_fatality.flatten().numpy(), axis=0)
).flatten()

predicted_cases_fatality = scaler_fatality.inverse_transform(
  np.expand_dims(preds_fatality, axis=0)
).flatten()

In [None]:
plt.plot(
  daily_cases_infected.index[:len(infection_train)], 
  scaler_infection.inverse_transform(infection_train).flatten(),
  label='Historical Infected Daily Cases'
)

plt.plot(
  daily_cases_infected.index[len(infection_train):len(infection_train) + len(true_cases_infection)], 
  true_cases_infection,
  label='Real Infected Daily Cases'
)

plt.plot(
  daily_cases_infected.index[len(infection_train):len(infection_train) + len(true_cases_infection)], 
  predicted_cases_infection, 
  label='Predicted Infected Daily Cases'
)

plt.legend();

# All data for Training

In [None]:
scaler_infection = MinMaxScaler()

scaler_infection = scaler_infection.fit(np.expand_dims(daily_cases_infected, axis=1))

all_data_infection = scaler_infection.transform(np.expand_dims(daily_cases_infected, axis=1))

all_data_infection.shape

In [None]:
scaler_fatality = MinMaxScaler()

scaler_fatality = scaler_fatality.fit(np.expand_dims(daily_cases_fatality, axis=1))

all_data_fatality = scaler_fatality.transform(np.expand_dims(daily_cases_fatality, axis=1))

all_data_fatality.shape

In [None]:
X_all_infection, y_all_infection = create_sequences(all_data_infection, seq_length)

X_all_infection = torch.from_numpy(X_all_infection).float()
y_all_infection = torch.from_numpy(y_all_infection).float()

model = CoronaVirusForecast(
  n_features=1, 
  n_hidden=512, 
  seq_len=seq_length, 
  n_layers=2
)
model, train_hist_infection, _ = train_model_infection(model, X_all_infection, y_all_infection)

In [None]:
X_all_fatality, y_all_fatality = create_sequences(all_data_fatality, seq_length)

X_all_fatality = torch.from_numpy(X_all_fatality).float()
y_all_fatality = torch.from_numpy(y_all_fatality).float()

# Use a separate name so the infection model trained on all data is not overwritten
model_fatality = CoronaVirusForecast(
  n_features=1,
  n_hidden=512,
  seq_len=seq_length,
  n_layers=2
)
model_fatality, train_hist_fatality, _ = train_model_fatality(model_fatality, X_all_fatality, y_all_fatality)

In [None]:
DAYS_TO_PREDICT_INFECTION = 43

with torch.no_grad():
  # Seed window for the recursive forecast; the loop reads and updates this same variable
  test_seq_infection = X_all_infection[:1]
  preds_infection = []
  for _ in range(DAYS_TO_PREDICT_INFECTION):
    # `model` is the infection model trained on all data above
    y_test_pred_infection = model(test_seq_infection)
    pred_infection = torch.flatten(y_test_pred_infection).item()
    preds_infection.append(pred_infection)
    new_seq_infection = test_seq_infection.numpy().flatten()
    new_seq_infection = np.append(new_seq_infection, [pred_infection])
    new_seq_infection = new_seq_infection[1:]
    test_seq_infection = torch.as_tensor(new_seq_infection).view(1, seq_length, 1).float()

In [None]:
DAYS_TO_PREDICT_FATALITY = 43

with torch.no_grad():
  # Seed window for the recursive forecast; the loop reads and updates this same variable
  test_seq_fatality = X_all_fatality[:1]
  preds_fatality = []
  for _ in range(DAYS_TO_PREDICT_FATALITY):
    y_test_pred_fatality = model_fatality(test_seq_fatality)
    pred_fatality = torch.flatten(y_test_pred_fatality).item()
    preds_fatality.append(pred_fatality)
    new_seq_fatality = test_seq_fatality.numpy().flatten()
    new_seq_fatality = np.append(new_seq_fatality, [pred_fatality])
    new_seq_fatality = new_seq_fatality[1:]
    test_seq_fatality = torch.as_tensor(new_seq_fatality).view(1, seq_length, 1).float()

In [None]:
predicted_cases_infection = scaler_infection.inverse_transform(
  np.expand_dims(preds_infection, axis=0)
).flatten()

In [None]:
predicted_cases_fatality = scaler_fatality.inverse_transform(
  np.expand_dims(preds_fatality, axis=0)
).flatten()

To create a cool chart with the historical and predicted cases, we need to extend the date index of our data frame:

In [None]:
daily_cases_infected.index[-1]

In [None]:
daily_cases_fatality.index[-1]

In [None]:
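# Generate periods + 1 dates and drop the first, so the forecast starts the day after `start`.
# This avoids the `closed` argument, which newer pandas versions no longer accept.
predicted_index_infection = pd.date_range(
  start=daily_cases_infected.index[-14],
  periods=DAYS_TO_PREDICT_INFECTION + 1
)[1:]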

predicted_cases_infection = pd.Series(
  data=predicted_cases_infection,
  index=predicted_index_infection
)

plt.plot(predicted_cases_infection, label='Predicted Infected Daily Cases')
plt.legend();
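
The way the forecast index is built can be checked on a toy example. The start date and the three-day horizon below are hypothetical values chosen for illustration; the point is only that generating one extra period and dropping the first entry yields dates that begin the day after the start date.

```python
import pandas as pd

# Hypothetical last "known" day and forecast horizon
toy_start = pd.Timestamp("2020-03-11")
toy_days_to_predict = 3

# One extra period, then drop the start date itself
toy_forecast_index = pd.date_range(start=toy_start, periods=toy_days_to_predict + 1)[1:]

# Attach made-up predictions to the new index
toy_forecast = pd.Series(data=[10.0, 12.0, 15.0], index=toy_forecast_index)
print(toy_forecast)
```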

In [None]:
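# Generate periods + 1 dates and drop the first, so the forecast starts the day after `start`.
# This avoids the `closed` argument, which newer pandas versions no longer accept.
predicted_index_fatality = pd.date_range(
  start=daily_cases_fatality.index[-14],
  periods=DAYS_TO_PREDICT_FATALITY + 1
)[1:]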

predicted_cases_fatality = pd.Series(
  data=predicted_cases_fatality,
  index=predicted_index_fatality
)

plt.plot(predicted_cases_fatality, label='Predicted Fatality Daily Cases')
plt.legend();

In [None]:
predicted_index_infection

In [None]:
plt.plot(daily_cases_infected, label='Historical Infected Daily Cases')
plt.plot(predicted_cases_infection, label='Predicted Infected Daily Cases')
plt.legend();


In [None]:
plt.plot(daily_cases_fatality, label='Historical Fatality Daily Cases')
plt.plot(predicted_cases_fatality, label='Predicted Fatality Daily Cases')
plt.legend();



# Conclusion

To improve the model, it would be worth focusing on more data, the dropout rate, the number of hidden layers, and possibly a bidirectional LSTM.
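
As a rough sketch of the bidirectional idea mentioned above (not something this notebook has trained or evaluated), the snippet below shows the main practical change: with `bidirectional=True` the LSTM output has `2 * hidden_size` features, so the linear head must be sized accordingly. The layer sizes, the dropout value, and `batch_first=True` are illustrative assumptions.

```python
import torch
from torch import nn

n_hidden = 8

# Hypothetical configuration; all hyperparameters here are placeholders
bi_lstm = nn.LSTM(
    input_size=1,
    hidden_size=n_hidden,
    num_layers=2,
    dropout=0.2,
    bidirectional=True,
    batch_first=True,
)
# Forward and backward outputs are concatenated, hence 2 * n_hidden input features
head = nn.Linear(in_features=2 * n_hidden, out_features=1)

x = torch.randn(4, 2, 1)          # (batch, seq_len, features)
bi_out, _ = bi_lstm(x)
y_hat = head(bi_out[:, -1, :])    # one prediction per sequence

print(bi_out.shape, y_hat.shape)
```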

In [None]:
sample_submission = pd.read_csv("../input/covid19-local-us-ca-forecasting-week-1/ca_submission.csv")
sample_submission
submission = pd.DataFrame({
                           'ConfirmedCases': predicted_cases_infection,
                           'Fatalities': predicted_cases_fatality})
submission.index = sample_submission.index
submission['ForecastId'] = sample_submission['ForecastId']
submission = submission[['ForecastId','ConfirmedCases','Fatalities']]
submission.tail()

In [None]:
submission.to_csv("submission.csv", index=False)