Let's now try to use deep learning to predict future trading volume.

In this notebook, we will usually use normalized features and volume. The volume will be denormalized only for the purpose of model evaluation. We will use StandardScaler for normalization.
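The normalize/denormalize round trip can be sketched with plain NumPy. This is only an illustration of the z-score transform that StandardScaler applies, not the code inside `DataReader`; the helper names and the sample volumes are made up for the example.

```python
import numpy as np

def fit_standard_scaler(values):
    """Return the (mean, std) pair used for z-score normalization."""
    values = np.asarray(values, dtype=np.float64)
    # population standard deviation (ddof=0), which is what StandardScaler uses
    return values.mean(), values.std()

def normalize(values, mean, std):
    """Map raw values to zero mean and unit variance."""
    return (np.asarray(values, dtype=np.float64) - mean) / std

def denormalize(values, mean, std):
    """Invert normalize(), e.g. to report errors in real volume units."""
    return np.asarray(values, dtype=np.float64) * std + mean

# hypothetical daily volumes, for illustration only
volume = np.array([3.1e9, 3.4e9, 2.9e9, 3.8e9, 3.3e9])
mean, std = fit_standard_scaler(volume)
volume_norm = normalize(volume, mean, std)
volume_back = denormalize(volume_norm, mean, std)
```

In practice the mean and standard deviation are usually estimated on the training split only, so that no information from the test period leaks into the features.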

From the exploratory data analysis we can conclude that not all features in the original dataset are useful for prediction, and that a few additional features can be added. The models in this notebook will usually work with the following columns:

| Feature | Type | Description |
| ------------- | ------------- | ------------- |
| Volume X | float64 | Historical trading volume shifted by X |
| AdjCloseDiff X | float64 | Historical difference between the AdjClose prices of two consecutive days, shifted by X |
| HighLowDiff X | float64 | Historical difference between the High and Low prices, shifted by X |
| DayOfWeek X | one-hot | One-hot encoded value for each day of the week |
| Month X | one-hot | One-hot encoded value for each month |
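As a rough sketch, columns like these could be derived with pandas as shown below. The input column names (`Volume`, `AdjClose`, `High`, `Low`), the reading of X as a lag in trading days, and the `build_features` helper are all assumptions made for illustration; the actual preprocessing lives in `util.read_data` and may differ.

```python
import numpy as np
import pandas as pd

def build_features(prices, lag=1):
    """Build lagged and calendar features from a daily price table.

    `prices` is assumed to have a DatetimeIndex and the columns
    Volume, AdjClose, High and Low (hypothetical names).
    """
    out = pd.DataFrame(index=prices.index)
    out["Volume %d" % lag] = prices["Volume"].shift(lag)
    out["AdjCloseDiff %d" % lag] = prices["AdjClose"].diff().shift(lag)
    out["HighLowDiff %d" % lag] = (prices["High"] - prices["Low"]).shift(lag)
    # calendar features, one-hot encoded
    day = pd.Series(prices.index.dayofweek, index=prices.index)
    month = pd.Series(prices.index.month, index=prices.index)
    out = out.join(pd.get_dummies(day, prefix="DayOfWeek"))
    out = out.join(pd.get_dummies(month, prefix="Month"))
    # the first rows have no history to shift from, so drop them
    return out.dropna()

# synthetic prices, for illustration only
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=10, freq="B")
low = rng.uniform(90.0, 100.0, size=10)
prices = pd.DataFrame({
    "Volume": rng.uniform(1e9, 2e9, size=10),
    "AdjClose": rng.uniform(95.0, 105.0, size=10),
    "High": low + rng.uniform(1.0, 5.0, size=10),
    "Low": low,
}, index=dates)
features = build_features(prices, lag=1)
```

Shifting every feature by at least one day ensures that a row only contains information that was available before the day whose volume is being predicted.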

Let's now read the normalized data.

In [1]:
import tensorflow as tf
import numpy as np
import sys
import pandas as pd
import datetime

sys.path.append("../") # go to parent dir
from util.read_data import DataReader
from util.evaluator import ModelEvaluator

tf.enable_eager_execution()

In [2]:
# read test and train data
reader = DataReader()
df = reader.read_normalized_data_for_rnn('../data/S&P500.csv')
train_features, train_volume = reader.get_train_data(df)
test_features, test_volume = reader.get_test_data(df)
evaluator = ModelEvaluator(reader.label_scaler)

In [3]:
class DataLoader():
    def __init__(self, features, volume):
        self.features = features
        self.volume = volume

    def get_batch(self, seq_length, batch_size):
        seq = []
        next_volume = []
        for i in range(batch_size):
            # the training example in batch has to be random 
            index = np.random.randint(0, len(self.volume) - seq_length)
            seq.append(self.features[index:index+seq_length].values)
            next_volume.append(self.volume[index+seq_length])
        return np.array(seq), np.array(next_volume).reshape(-1, 1)

# TODO should be refactored to predict in batches, not one by one
# NOTE: relies on the global seq_length defined in the hyper-parameters cell below
def predict_volume(rnn_model, features):
    volume_pred = []
    for i in range(len(features) - seq_length):
        x = features[i:i + seq_length]
        x = x.values.reshape(1, seq_length, x.shape[1])
        volume_pred.append(rnn_model(x).numpy()[0, 0])
    return pd.Series(volume_pred)
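One possible way to address the TODO above is to build all sliding windows at once and feed them to the model in chunks. The sketch below uses only NumPy; `make_windows`, `predict_in_batches` and the stand-in model `mean_of_last_step` are hypothetical names, and it assumes the model output can be converted with `np.asarray` into an array of shape `(batch, 1)`.

```python
import numpy as np

def make_windows(features, seq_length):
    """Stack every window of seq_length consecutive rows into one array."""
    features = np.asarray(features, dtype=np.float32)
    # same number of windows as the loop in predict_volume
    n_windows = len(features) - seq_length
    # idx[i, t] = i + t, so features[idx][i] equals features[i:i + seq_length]
    idx = np.arange(n_windows)[:, None] + np.arange(seq_length)[None, :]
    return features[idx]  # shape: (n_windows, seq_length, n_features)

def predict_in_batches(model_fn, features, seq_length, batch_size=256):
    """Call model_fn on chunks of windows and return a flat prediction array."""
    windows = make_windows(features, seq_length)
    preds = []
    for start in range(0, len(windows), batch_size):
        batch = windows[start:start + batch_size]
        preds.append(np.asarray(model_fn(batch)).reshape(-1))
    if not preds:  # fewer rows than seq_length + 1
        return np.empty(0, dtype=np.float32)
    return np.concatenate(preds)

# stand-in for the trained network: averages the features of the last time step
def mean_of_last_step(batch):
    return batch[:, -1, :].mean(axis=1, keepdims=True)

features = np.arange(40, dtype=np.float32).reshape(20, 2)
preds = predict_in_batches(mean_of_last_step, features, seq_length=5, batch_size=4)
```

With the notebook's model, `model_fn` would be the `RNN` instance itself, and the result could be wrapped in `pd.Series` to match what `predict_volume` returns.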

In [4]:
class RNN(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # neither deep nor shallow
        self.cell1 = tf.nn.rnn_cell.BasicLSTMCell(num_units=512)
        self.cell2 = tf.nn.rnn_cell.BasicLSTMCell(num_units=512)
        self.dense1 = tf.keras.layers.Dense(units=1024)
        self.dense2 = tf.keras.layers.Dense(units=1)

    def call(self, inputs):
        batch_size, seq_length, _ = tf.shape(inputs)
        state1 = self.cell1.zero_state(batch_size=batch_size, dtype=tf.float32)
        state2 = self.cell2.zero_state(batch_size=batch_size, dtype=tf.float32)
        for t in range(seq_length.numpy()):
            output, state1 = self.cell1(inputs[:, t, :], state1)
            output, state2 = self.cell2(output, state2)
        # only the last time step feeds the dense head, so apply it once after the loop
        output = self.dense1(output)
        output = self.dense2(output)
        return output

    def predict(self, inputs):
        # inputs: (batch_size, seq_length, num_features)
        # returns a NumPy array of shape (batch_size, 1)
        output = self(inputs)
        return output.numpy()
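To make the unrolled loop in `call` more concrete, here is a minimal NumPy sketch of what a single LSTM cell computes at each time step. It is a simplified textbook formulation, not the exact parameterization of `BasicLSTMCell` (the gate ordering and the absence of a forget-gate bias offset are assumptions), and the weights are random values chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM time step.

    x: (batch, n_in), h and c: (batch, n_units),
    W: (n_in + n_units, 4 * n_units), b: (4 * n_units,)
    """
    z = np.concatenate([x, h], axis=1) @ W + b
    # input gate, candidate values, forget gate, output gate
    i, g, f, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_lstm(inputs, W, b, n_units):
    """Unroll the cell over time, starting from a zero state, and return the last output."""
    batch, seq_length, _ = inputs.shape
    h = np.zeros((batch, n_units))
    c = np.zeros((batch, n_units))
    for t in range(seq_length):
        h, c = lstm_step(inputs[:, t, :], h, c, W, b)
    return h

rng = np.random.default_rng(0)
n_in, n_units = 3, 4
W = rng.normal(scale=0.1, size=(n_in + n_units, 4 * n_units))
b = np.zeros(4 * n_units)
inputs = rng.normal(size=(2, 5, n_in))
last_h = run_lstm(inputs, W, b, n_units)  # shape: (2, 4)
```

The model above stacks two such cells: the output of the first cell at each step is the input of the second, and only the second cell's final output is passed to the dense layers.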


Let's use the following hyper-parameters.

In [14]:
learning_rate = 1e-3
batch_size = 64
seq_length = 50
num_batches = 500

In [15]:
data_loader = DataLoader(train_features, train_volume)
model = RNN()
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
for batch_index in range(num_batches):
    # sample random sequences and the volume that follows each of them
    X, y = data_loader.get_batch(seq_length, batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.losses.mean_squared_error(labels=y, predictions=y_pred)
    if batch_index % 100 == 0:
        print("batch %d: loss %f" % (batch_index, loss.numpy()))
    # back-propagate through the unrolled network and update the weights
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))

batch 0: loss 1.220601
batch 100: loss 0.599872
batch 200: loss 0.372579
batch 300: loss 0.274882
batch 400: loss 0.178969


In [16]:
print(evaluator.evaluate("lstm {} on train".format(seq_length), train_volume[seq_length:], 
                         predict_volume(model, train_features)))

print(evaluator.evaluate("lstm {} on test".format(seq_length), test_volume[seq_length:], 
                         predict_volume(model, test_features)))

lstm 50 on train: MSE = 9.158586e+17, R2 = 0.624, conf. int. 95% of error = (208,790,216 - 266,491,325)
lstm 50 on test: MSE = 9.636852e+17, R2 = -1.336, conf. int. 95% of error = (677,053,005 - 858,256,010)


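The performance of the model is not great, probably for reasons similar to those mentioned in the machine_learning notebook: the features used are not the best predictors of future volume.
The gap between the train score (R2 = 0.624) and the test score (R2 = -1.336) also shows that the model does not generalize; a negative R2 means that it does worse on the test set than always predicting the mean test volume would. Still, some hyper-parameter tuning would probably help a bit.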
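The MSE and R2 values reported above can be read with the help of the small sketch below. It uses the standard definitions of both metrics; how `ModelEvaluator` computes them internally is not shown in this notebook, so the functions here are an assumption, and the numbers are toy values.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r2(y_true, y_true)                        # 1.0: no residual error
r2_mean = r2(y_true, np.full(4, y_true.mean()))        # 0.0: no better than the mean
r2_reversed = r2(y_true, y_true[::-1])                 # -3.0: worse than the mean
```

Because the volume is denormalized before evaluation, the MSE is expressed in squared shares traded, which explains its very large magnitude; R2 is scale-free and therefore easier to compare across models.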