# Sami Abdelazim - JC Foster

This notebook takes the multidimensional time series built in the Apply-SA_Model notebook and fits a simple LSTM to it. We consider three lag lengths for each data point: 6, 12, and 18 hours. Because predicting the actual price is much harder than predicting whether it goes up or down, we label each hour with 1 if the price rose relative to the previous hour and 0 otherwise, and treat the task as binary classification.
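As a rough illustration of the labeling and lag-window construction described above, here is a minimal sketch on toy data. The helper name `make_lagged_windows` and the toy arrays are hypothetical and not part of the project; the indexing is meant to follow the same convention as the notebook's code, where the window for hour `i` includes the feature row at hour `i` itself.

```python
import numpy as np

def make_lagged_windows(features, labels, lag):
    """Pair each label from index `lag` onward with the `lag` feature rows ending at that index.

    Hypothetical helper for illustration only.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    # Window for target index i covers rows i-lag+1 .. i (inclusive).
    windows = np.stack([features[i - lag + 1 : i + 1] for i in range(lag, len(labels))])
    targets = labels[lag:]
    return windows, targets

# Toy hourly prices (made up for the example).
prices = np.array([10.0, 11.0, 10.5, 10.5, 12.0, 11.0])

# 1 if the price rose relative to the previous hour, else 0.
# The first hour has no predecessor, so it is labeled 0.
labels = np.zeros(len(prices), dtype=int)
labels[1:] = (prices[1:] > prices[:-1]).astype(int)

# Two toy feature columns per hour.
features = np.arange(12).reshape(6, 2)

windows, targets = make_lagged_windows(features, labels, lag=3)
print(windows.shape)  # (samples, lag, n_features), the layout an LSTM layer expects
print(targets)
```

Note that an unchanged price counts as 0 under this labeling, so class 0 really means "did not go up".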

To evaluate performance, we compute the accuracy (no AUC, since the dataset is so small). We also compute how much money we would end up with under the following very simple trading strategy, starting with $10,000. In every case, the number printed is the final balance after trading over the test period:
- If we predict that the price will go up (label 1), we spend all of our money on oil at the current price.
- If we predict that the price will go down (label 0), we sell all of our oil at the current price.
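The strategy above can be sketched as a small standalone function. This is a simplified illustration rather than the exact evaluation code: the function name `simulate_strategy` and the toy prices are hypothetical, and it assumes no transaction costs and that a predicted rise means holding oil for exactly one step.

```python
def simulate_strategy(prices, predictions, initial_amount=10000.0):
    """Return the final balance after following the up/down predictions.

    `prices` must have one more entry than `predictions`: prices[i] is the
    price just before step i and prices[i + 1] is the price after it.
    Hypothetical helper for illustration only.
    """
    if len(prices) != len(predictions) + 1:
        raise ValueError("prices must have exactly one more entry than predictions")
    amount = initial_amount
    for i, pred in enumerate(predictions):
        if pred == 1:
            # Fully invested over this step: balance scales with the price ratio.
            amount *= prices[i + 1] / prices[i]
        # pred == 0: stay in cash, balance unchanged.
    return amount

# Toy example: hold during the first rise, sit out the drop, hold through a flat hour.
final_amount = simulate_strategy([100.0, 110.0, 99.0, 99.0], [1, 0, 1])
print(final_amount)
```

In this toy run the balance grows by about 10% on the first step and is untouched afterwards, because the second step is spent in cash and the third step has a price ratio of 1.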

In [1]:
import numpy as np
import pandas as pd
import tensorflow
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
data = pd.read_csv('drive/MyDrive/DS-301_PROJECT/TwitterData/final_data.csv')
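# Feature columns from final_data.csv
X = data[['news','oil','think','gov']].values
# Label each hour 1 if the price rose relative to the previous hour, else 0.
# The first row has no previous price, so the comparison is False and it gets 0.
data['price_change'] = (data[' price'] > data[' price'].shift()).astype(int)
y = data['price_change'].values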

In [16]:
lags = [6,12,18]

for lag in lags:
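  # Build one window per target: the `lag` feature rows ending at hour i
  # (inclusive) are paired with the label for hour i.
  X_lag = []
  for i in range(lag,len(y)):
    X_lag.append(X[i-lag+1:i+1])

  # define time series with lag: shape (samples, lag, n_features)
  X_lag = np.asarray(X_lag)
  y_lag = y[lag:]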

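  # Fix the TensorFlow seed for more repeatable runs
  tensorflow.random.set_seed(42)

  # Chronological split: first 80% of the windows for training, the rest for testing
  trainsize = int(len(X_lag)*0.8)

  # define model: one LSTM layer followed by a sigmoid output for binary classification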
  model = Sequential()
  model.add(LSTM(50,activation='relu',input_shape=X_lag.shape[1:]))
  model.add(Dense(1,activation='sigmoid'))
  model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])
  X_train = X_lag[:trainsize]
  y_train = y_lag[:trainsize]

  model.fit(X_train,y_train,epochs=10)
  
  # define test set
  X_test = X_lag[trainsize:]
  y_test = y_lag[trainsize:]

  # get predictions: threshold the sigmoid outputs at 0.5
  predictions = model.predict(X_test)
  preds = (predictions >= 0.5).astype(int).ravel()

  print(f"Accuracy for lag={lag}:",accuracy_score(y_test,preds))

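  # get prices from data, aligned with y_lag
  prices = data[' price'].values
  prices_ = prices[lag:]
  # Start one step before the first test point (trainsize-1 rather than
  # trainsize) so that prices_frame[i] is the price just before test hour i
  # and prices_frame[i+1] is the price at test hour i.
  prices_frame = prices_[trainsize-1:]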

  # we start with $10000
  initial_amount = 10000
  amount = initial_amount

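  # When we predict a rise, we hold oil over that hour and the balance scales
  # with the price ratio; otherwise we stay in cash and the balance is unchanged.
  for i,val in enumerate(preds):
    if val == 1:
      change = prices_frame[i+1]/prices_frame[i]
      amount = amount*change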

  print(f"Final Amount for lag={lag}:", amount)
  print()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy for lag=6: 0.7272727272727273
Final Amount for lag=6: 10145.663324354902

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy for lag=12: 0.8
Final Amount for lag=12: 10257.831171861082

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy for lag=18: 0.4444444444444444
Final Amount for lag=18: 10073.278434128326

