---
<img src="../assets/grt_logo.png" style="float: left; margin: 20px; height: 55px">

# Garton Research & Trading

_August 23, 2019_

---

## Forecasting Futures Prices with LSTMs (NumPy)

---

**Context:** The purpose of this study is to experiment with implementing Long Short-Term Memory (LSTM) neural networks to predict futures prices. The data consists of continuous futures contracts (generic first nearby, naive rolling method) from the Wiki Continuous Futures dataset on [Quandl](https://www.quandl.com/data/CHRIS-Wiki-Continuous-Futures). The markets I chose are based on those traded in the original ['Turtle Traders'](https://bigpicture.typepad.com/comments/files/turtlerules.pdf) strategy.
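A first-nearby series with naive rolling follows the front-month contract and switches to the next contract when it rolls, without adjusting any prices. The sketch below illustrates this with made-up prices for two hypothetical contracts and a hypothetical `naive_roll` helper. It is not how the Quandl dataset is built internally. It only shows why a naively rolled series can jump on the roll date by an amount that neither contract actually moved.

```python
import numpy as np

def naive_roll(front, nxt, roll_idx):
    """Hypothetical helper: splice two contract price series without adjustment.

    Days before roll_idx come from the expiring front contract,
    days from roll_idx onward come from the next contract.
    """
    front = np.asarray(front, dtype=float)
    nxt = np.asarray(nxt, dtype=float)
    return np.concatenate([front[:roll_idx], nxt[roll_idx:]])

# made-up prices for two contracts observed on the same 6 days
front = [12.0, 12.1, 12.2, 12.1, 12.3, 12.4]
nxt = [12.5, 12.6, 12.7, 12.6, 12.8, 12.9]

cont = naive_roll(front, nxt, roll_idx=3)
changes = np.diff(cont)
# on the roll day both contracts fell by 0.1, yet the spliced series
# shows a rise of about 0.4: that step is the spread between the contracts
print(cont)
print(changes)
```

Because this study differences the continuous series, such roll gaps can show up as outsized one-day changes in the training data.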

This notebook builds on my initial work [Forecasting Futures Prices with LSTMs](https://nbviewer.jupyter.org/github/mattg12/Financial-Time-Series/blob/master/code/Forecasting%20Futures%20Prices%20with%20LSTMs.ipynb): here I try to replicate the data wrangling and modeling without using `pandas`. Since `keras` works directly with `numpy` arrays, it is more straightforward to do all of the data preprocessing in `numpy` and cut out the 'middle man', so to speak. In my initial approach, moving data back and forth between `numpy` arrays and `pandas` DataFrames felt inefficient and inelegant.

_Author: Matthew Garton_

In [1]:
# standard data science imports - no pandas!
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# necessary for plotting
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

# keras imports
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.preprocessing.sequence import TimeseriesGenerator
from keras import optimizers

Using TensorFlow backend.


### Import and Preprocess Data

In [2]:
# get data: a single price column (usecols=4, i.e. the fifth column) from the CSV, skipping the header row
sugar = np.genfromtxt('../data/ICE_SB1.csv', delimiter=',', skip_header=1, usecols=4, dtype=float)
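`np.genfromtxt` returns the column in file order, and `np.diff` in the next step assumes the oldest observation comes first. The sketch below uses a tiny made-up CSV (a hypothetical `Date,Open,High,Low,Settle` layout; the real file's column order should be checked against its header row) to show two cheap sanity checks: reversing the series if the file is sorted newest-first, and confirming that no missing values were parsed as `NaN`.

```python
import io
import numpy as np

# made-up excerpt; the layout is an assumption, not the real file's header
csv_text = """Date,Open,High,Low,Settle
2019-08-22,11.80,11.95,11.70,11.90
2019-08-21,11.60,11.85,11.55,11.75
2019-08-20,11.70,11.72,11.50,11.58
"""

settle = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1,
                       usecols=4, dtype=float)
dates = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1,
                      usecols=0, dtype='U10', encoding='utf-8')

# ISO dates sort correctly as strings; flip both arrays if the file is newest-first
if dates[0] > dates[-1]:
    settle, dates = settle[::-1], dates[::-1]

# missing fields come back as NaN and would propagate through np.diff
assert not np.isnan(settle).any()
print(dates[0], settle[0])
```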

In [3]:
# difference data: s_diff[t] = sugar[t+1] - sugar[t], one element shorter than sugar
s_diff = np.diff(sugar)
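Differencing turns price levels into day-over-day changes and drops one observation. As long as the first price is kept, the levels can be rebuilt with a cumulative sum, which is what the first of the next steps below (comparing predictions against actual data) will eventually need. A minimal sketch with made-up prices:

```python
import numpy as np

prices = np.array([11.58, 11.75, 11.90, 11.82])  # made-up prices, oldest first

diffs = np.diff(prices)  # diffs[t] = prices[t+1] - prices[t]

# undo the differencing: first price, then first price plus the running sum of changes
rebuilt = np.concatenate([prices[:1], prices[0] + np.cumsum(diffs)])
print(np.allclose(rebuilt, prices))  # True
```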

In [4]:
# train test split: chronological, so the test set is the most recent 30% of the data
train_split = 0.7
split_idx = int(train_split * s_diff.shape[0])
train, test = s_diff[:split_idx], s_diff[split_idx:]
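The split above slices rather than samples, which matters for time series: a shuffled split (for example `sklearn.model_selection.train_test_split`, whose default is `shuffle=True`) would let the model train on observations that come after some of its test points. A minimal sketch of the same chronological split as a hypothetical reusable helper, with a check that every training observation precedes every test observation:

```python
import numpy as np

def chrono_split(series, train_frac=0.7):
    """Hypothetical helper: split a 1-D series into leading (train) and trailing (test) parts."""
    split_idx = int(train_frac * len(series))
    return series[:split_idx], series[split_idx:]

x = np.arange(10, dtype=float)  # stand-in for s_diff, increasing so order is easy to check
train, test = chrono_split(x, 0.7)
print(train.max() < test.min())  # True: no test point precedes a training point
```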

In [5]:
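# minmax scale to [-1, 1]; sklearn expects 2-D input, hence the reshape.
# The scaler is fit on the training split only, then applied unchanged to the test split.
scaler = MinMaxScaler(feature_range=(-1, 1))
train = train.reshape(-1, 1)
train_sc = scaler.fit_transform(train)
test = test.reshape(-1, 1)
test_sc = scaler.transform(test)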
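In keeping with the no-`pandas` theme, the same transform can be written in plain `numpy`. `MinMaxScaler` learns a scale and an offset from the training data so that `x * scale + offset` maps `[min, max]` onto the feature range. The sketch below (hypothetical helper `fit_minmax`, made-up values) reproduces that and its inverse. Note that test values outside the training range land outside `[-1, 1]`.

```python
import numpy as np

def fit_minmax(x, lo=-1.0, hi=1.0):
    """Hypothetical helper: return (scale, offset) mapping [x.min(), x.max()] onto [lo, hi]."""
    x_min, x_max = x.min(), x.max()
    scale = (hi - lo) / (x_max - x_min)
    offset = lo - x_min * scale
    return scale, offset

train = np.array([-0.3, 0.1, 0.5, -0.1])  # made-up training changes
test = np.array([0.7, -0.2])              # made-up test changes

scale, offset = fit_minmax(train)         # statistics from the training split only
train_sc = train * scale + offset         # spans exactly [-1, 1]
test_sc = test * scale + offset           # 0.7 exceeds the training max, so it maps above 1
test_back = (test_sc - offset) / scale    # inverse transform recovers the original values
```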

In [19]:
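# generate samples: each input is a window of `lags` consecutive values and
# its target is the value that immediately follows the window
# (these generators are built from the unscaled `train`/`test`, not `train_sc`/`test_sc`)
lags = 7
batch_size = 1
neurons = 100
train_samples = TimeseriesGenerator(train, train, length=lags, batch_size=batch_size)
test_samples = TimeseriesGenerator(test, test, length=lags, batch_size=batch_size)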
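Passing the same array as both data and targets makes `TimeseriesGenerator` pair each window with the next value. With the default `sampling_rate=1` and `stride=1`, sample `i` is `data[i:i+lags]` with target `targets[i+lags]`, so a series of length `n` yields `n - lags` samples and consecutive windows overlap by `lags - 1` values. The sketch below rebuilds those pairs in `numpy` (hypothetical helper `make_windows`) to show the shapes the LSTM receives.

```python
import numpy as np

def make_windows(series, lags):
    """Hypothetical helper: inputs[i] = series[i:i+lags], targets[i] = series[i+lags]."""
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    n = len(series) - lags
    X = np.stack([series[i:i + lags] for i in range(n)])  # (samples, timesteps, features)
    y = series[lags:]                                      # (samples, 1)
    return X, y

X, y = make_windows(np.arange(10), lags=7)
print(X.shape, y.shape)  # (3, 7, 1) (3, 1)
```

One consequence: the test generator starts from the first test value, so the first `lags` test observations are only ever used as inputs, never as targets.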

### Build and Train the Model

In [20]:
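# define model architecture and build
# stateful=True carries each batch's final LSTM state into the next batch,
# which is why a fixed batch_input_shape of (batch_size, timesteps, features) is required
model = Sequential()
model.add(LSTM(neurons, 
               activation='tanh',
               batch_input_shape=(batch_size, lags, 1),
               stateful=True))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse')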
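As a quick check on model size: an LSTM layer has four gates, and each gate has input weights, recurrent weights and a bias. The arithmetic below (hypothetical helper names) gives the trainable-parameter count for this architecture with the default `use_bias=True`, which should agree with what `model.summary()` reports.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights and a bias vector."""
    return 4 * (units * input_dim + units * units + units)

def dense_params(input_dim, units):
    """Weight matrix plus bias."""
    return input_dim * units + units

lstm = lstm_params(units=100, input_dim=1)    # 4 * (100 + 10000 + 100) = 40800
dense = dense_params(input_dim=100, units=1)  # 100 + 1 = 101
print(lstm + dense)                           # 40901
```

That is roughly 41,000 weights trained for a single epoch on one univariate series, which is worth keeping in mind when reading the test loss.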

In [21]:
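# a single epoch (the default); fit_generator shuffles batch order by default
# (shuffle=True) for Sequence inputs such as TimeseriesGenerator
model.fit_generator(train_samples)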

Epoch 1/1


<keras.callbacks.History at 0x7f8ba9e7aef0>

### Analyze Results of Model

In [22]:
# test-set loss: mean squared error, the loss the model was compiled with
model.evaluate_generator(test_samples)

0.1405682139916179
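On its own, a test MSE is hard to interpret. One cheap reference point, not part of the original notebook, is the error of a forecast that always predicts zero change, computed over the same targets the test generator yields (`test[lags:]`) and on the same array that was passed to the generator, so the units match. A sketch with a hypothetical helper and made-up changes:

```python
import numpy as np

def zero_change_mse(diffs, lags):
    """Hypothetical helper: MSE of predicting 0 for every target a length-`lags` window generator yields."""
    targets = np.asarray(diffs, dtype=float).ravel()[lags:]
    return float(np.mean(targets ** 2))

diffs = np.array([0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.1, 0.4, -0.3])  # made-up changes
baseline = zero_change_mse(diffs, lags=7)  # targets are 0.4 and -0.3, so about 0.125
```

Applied to the real `test` array with `lags = 7`, a model whose loss is not clearly below this number has not learned more than "tomorrow looks like today".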

### Notes

Implementing the data pipeline fully in `numpy` was easier than anticipated, and `TimeseriesGenerator` appears to be a very powerful tool: it turns a series into (input window, next value) samples in a single call. I will first clean up this code and test my understanding to be sure I am setting everything up correctly; then I will take next steps to improve the model.

**Next Steps:**
1. Make predictions with the model and inverse-transform them so they can be evaluated against the actual data.
2. Explore the parameters of both `TimeseriesGenerator` and the LSTM model to understand my options for tuning the model.
3. Tweak and optimize the model on simple datasets to get a better understanding of how changes to the model architecture affect its ability to learn from sequential data.
4. Experiment with longer lookbacks and prediction windows.
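For step 1, predictions would come from something like `model.predict_generator(test_samples)`, be mapped back to price-change units with `scaler.inverse_transform` (if the model was trained on scaled data), and then be added to the actual price each change starts from. The sketch below does the last two steps in plain `numpy` with hypothetical values for the scaler's parameters, the model's outputs and the previous prices. It assumes the scaler maps `x` to `x * scale + offset`, as `MinMaxScaler` does.

```python
import numpy as np

def diffs_to_prices(prev_prices, pred_diffs):
    """One-step-ahead price forecast: the previous actual price plus the predicted change."""
    return np.asarray(prev_prices, dtype=float) + np.asarray(pred_diffs, dtype=float)

# hypothetical values: scaler parameters and two scaled model outputs
scale, offset = 2.5, -0.25
pred_sc = np.array([0.25, -0.5])
pred_diff = (pred_sc - offset) / scale  # inverse of x * scale + offset -> about [0.2, -0.1]

prev_price = np.array([11.90, 11.82])   # actual price on the day each predicted change starts from
pred_price = diffs_to_prices(prev_price, pred_diff)  # about [12.10, 11.72]
```

Comparing `pred_price` with the actual prices (rather than comparing changes) makes the error directly readable in the contract's own units.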