# Toy Models

In this notebook, we explore Transformer architectures for classifying price movements in stock data. We preprocess the data so that the timestamps are encoded as sines and cosines, which gives the model additional information about when each bar occurred. We follow the approach outlined in this [paper](https://arxiv.org/pdf/2010.02803.pdf), where the features are projected into a high-dimensional space and a time/sequence representation is learned by the model.

### Library Import

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import pandas_ta as ta
import tensorflow as tf
import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20, 10)
%matplotlib inline

### Local Imports

In [2]:
from window_generator import WindowGenerator

In [3]:
# for python scripts use: "os.path.dirname(__file__)" instead of "os.path.abspath('')"
sys.path.append(
    os.path.abspath(os.path.join(os.path.abspath(''), os.path.pardir)))

from data_clean import get_trading_times

#### Ensure that GPU is available

In [4]:
tf.config.list_physical_devices('GPU')  

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

### Get the Data

In [170]:
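# build the path with os.path.join so it also works outside Windows
data_path = os.path.join('..', 'data', 'raw', 'AAPL_15min.csv')
# infer_datetime_format is deprecated in pandas 2.x, so it is omitted here
df = pd.read_csv(data_path, index_col=0, parse_dates=True)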

# df = get_trading_times(df)
df = df.dropna()

# add days, hours, and minutes to the dataset
dayofweek = df.index.dayofweek
hour = df.index.hour
minute = df.index.minute

# encode the days, hours, and minutes with sin and cos functions
days_in_week = 7
hours_in_day = 24
minutes_in_hour = 60

df['sin_day'] = np.sin(2*np.pi*dayofweek/days_in_week)
df['cos_day'] = np.cos(2*np.pi*dayofweek/days_in_week)
df['sin_hour'] = np.sin(2*np.pi*hour/hours_in_day)
df['cos_hour'] = np.cos(2*np.pi*hour/hours_in_day)
df['sin_minute'] = np.sin(2*np.pi*minute/minutes_in_hour)
df['cos_minute'] = np.cos(2*np.pi*minute/minutes_in_hour)
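
To see why each time field is encoded as a sine/cosine pair, it helps to compare distances before and after encoding. The sketch below uses only the standard library; the helper names `encode_cyclic` and `dist` are illustrative and are not part of the notebook.

```python
import math

def encode_cyclic(value, period):
    """Map a periodic value (e.g. hour 0-23) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Raw hours put 23:00 and 00:00 far apart (|23 - 0| = 23) ...
raw_gap = abs(23 - 0)

# ... but on the circle they are as close as any other pair of adjacent hours.
wrap_gap = dist(encode_cyclic(23, 24), encode_cyclic(0, 24))
adjacent_gap = dist(encode_cyclic(11, 24), encode_cyclic(12, 24))

print(raw_gap, round(wrap_gap, 4), round(adjacent_gap, 4))
```

Using both sine and cosine matters: a single sine value would map two different hours (for example 01:00 and 11:00) to the same number.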


### Add target columns
We will add a column for the price change at each interval; this will be our regression target variable. We will also add a second column that classifies each price change as a downward move, no significant change, or an upward move, based on a fixed threshold; this will be our target variable for classification.
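
The threshold-based labelling described above can be sketched without pandas. This is only an illustration: `label_price_moves` is a hypothetical helper and the prices are made up.

```python
def label_price_moves(closes, thresh=0.1):
    """Return one class per interval: 0 = down, 1 = flat, 2 = up."""
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        diff = curr - prev
        if diff < -thresh:
            labels.append(0)   # fell by more than thresh
        elif diff > thresh:
            labels.append(2)   # rose by more than thresh
        else:
            labels.append(1)   # stayed within +/- thresh
    return labels

closes = [115.60, 115.40, 115.31, 115.32, 115.60]  # made-up prices
print(label_price_moves(closes))
```

The threshold controls the class balance: a larger `thresh` pushes more intervals into the "flat" class.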

In [171]:
df['price_diff'] = df['close'].diff()

thresh = 0.1 # dollars
df['price_change'] = 1 # price stays within +/- thresh
# assign through .loc so the write hits df itself rather than a temporary copy
df.loc[df['price_diff'] < -thresh, 'price_change'] = 0 # downward price movement
df.loc[df['price_diff'] > thresh, 'price_change'] = 2 # upward price movement


In [172]:
df = df.dropna()
df.head()

time,open,high,low,close,volume,sin_day,cos_day,sin_hour,cos_hour,sin_minute,cos_minute,price_diff,price_change
2020-10-01 04:30:00,115.634512,115.792604,115.407254,115.407254,13550.0,0.433884,-0.900969,0.866025,0.5,5.665539e-16,-1.0,-0.207496,0
2020-10-01 04:45:00,115.367731,115.367731,115.120712,115.308447,12857.0,0.433884,-0.900969,0.866025,0.5,-1.0,-1.83697e-16,-0.098808,1
2020-10-01 05:00:00,115.308447,115.397374,115.298566,115.318327,10079.0,0.433884,-0.900969,0.965926,0.258819,0.0,1.0,0.009881,1
2020-10-01 05:15:00,115.417135,115.604869,115.377612,115.604869,3534.0,0.433884,-0.900969,0.965926,0.258819,1.0,2.832769e-16,0.286542,2
2020-10-01 05:30:00,115.604869,115.703677,115.555466,115.703677,7688.0,0.433884,-0.900969,0.965926,0.258819,5.665539e-16,-1.0,0.098808,1


## Compute Technical Indicators

In this portion we will compute several technical indicators to give the model more useful information.

We will compute the:
- [Awesome Oscillator](https://www.ifcm.co.uk/ntx-indicators/awesome-oscillator)
- RSI
- TSI
- EMA
- Accumulation/Distribution


In [173]:
# momentum indicators
awsome_oscillator = ta.momentum.ao(df.high, df.low, fast=5, slow=34)

rsi_14 = ta.momentum.rsi(df.close, length=14)
rsi_24 = ta.momentum.rsi(df.close, length=24)

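# NOTE: these two calls use rsi(), not stochrsi(), so they reproduce rsi_14 and
# rsi_24 (hence the repeated RSI_14/RSI_24 columns in the output further down).
# ta.momentum.stochrsi would give the stochastic RSI but changes the feature count.
stoch_rsi_14 = ta.momentum.rsi(df.close, rsi_length=14, length=14)
stoch_rsi_24 = ta.momentum.rsi(df.close, rsi_length=24, length=24)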

tsi = ta.momentum.tsi(df.close)

ema_10 = ta.ema(df.close, length=10)
ema_20 = ta.ema(df.close, length=20)
ema_30 = ta.ema(df.close, length=30)

# volume indicators
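# NOTE: the fourth positional argument of ad() is volume, so df.open is used
# as the volume here; pass df.volume instead for the standard A/D line
acc_dist = ta.volume.ad(df.high, df.low, df.close, df.open)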

In [None]:
help(ta.ema)

In [None]:
dir(ta.trend)

Place everything in the data frame

In [174]:
indicators = [
    awsome_oscillator,
    rsi_14, 
    rsi_24, 
    stoch_rsi_14, 
    stoch_rsi_24,
    tsi, 
    ema_10, 
    ema_20, 
    ema_30, 
    acc_dist 
]

df = pd.concat([df, pd.concat(indicators, axis=1)], axis=1)

df = df.dropna()
df.head()

time,open,high,low,close,volume,sin_day,cos_day,sin_hour,cos_hour,sin_minute,...,RSI_14,RSI_24,RSI_14,RSI_24,TSI_13_25_13,TSIs_13_25_13,EMA_10,EMA_20,EMA_30,AD
2020-10-01 12:45:00,114.863813,115.061428,114.626674,114.794548,2690445.0,0.433884,-0.900969,1.224647e-16,-1.0,-1.0,...,37.401455,40.240973,37.401455,40.240973,-8.961924,-4.83003,115.086042,115.317195,115.54358,420.912656
2020-10-01 13:00:00,114.789707,114.972501,114.685959,114.942859,2110737.0,0.433884,-0.900969,-0.258819,-0.965926,0.0,...,40.976625,42.61357,40.976625,42.61357,-9.05096,-5.43302,115.060009,115.281544,115.504823,511.952769
2020-10-01 13:15:00,114.942859,115.160235,114.893455,115.051547,1936852.0,0.433884,-0.900969,-0.258819,-0.965926,1.0,...,43.522318,44.304539,43.522318,44.304539,-8.465868,-5.866284,115.05847,115.259639,115.47558,533.238483
2020-10-01 13:30:00,115.051547,115.160235,114.83417,114.993745,2435041.0,0.433884,-0.900969,-0.258819,-0.965926,5.665539e-16,...,42.473151,43.591724,42.473151,43.591724,-8.286458,-6.212023,115.046702,115.234316,115.444494,530.797996
2020-10-01 13:45:00,115.003724,115.377612,114.954222,115.272481,2259626.0,0.433884,-0.900969,-0.258819,-0.965926,-1.0,...,48.873587,47.816401,48.873587,47.816401,-6.415967,-6.241158,115.087753,115.237951,115.433396,588.689019


### Get Standardized train, valid, and test sets

Split the data into train, valid, and test sets by date, then standardize all three using the mean and standard deviation of the training set.

In [185]:
train_df = df.loc['2020-10-01':'2021-10-01']
valid_df = df.loc['2021-10-02':'2022-05-01']
test_df = df.loc['2022-05-02':]

train_mean = train_df.mean()
train_std = train_df.std()

# ensure that target column is not standardized
train_mean['price_change'] = 0
train_std['price_change'] = 1

train_df = (train_df - train_mean) / train_std
valid_df = (valid_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

# (train_df * train_std + train_mean)

print(train_df.shape)
print(valid_df.shape)
print(test_df.shape)

(16079, 24)
(9243, 24)
(6267, 24)
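
The reason for reusing the training statistics on the validation and test sets is that the later periods must be scaled exactly as the model saw during training, without peeking at their own distribution. A minimal standard-library sketch with made-up numbers (the helpers `fit_scaler` and `standardize` are hypothetical):

```python
import statistics

def fit_scaler(train_values):
    """Compute mean and sample standard deviation from training data only."""
    return statistics.mean(train_values), statistics.stdev(train_values)

def standardize(values, mean, std):
    """Apply a previously fitted mean/std to any split."""
    return [(v - mean) / std for v in values]

train = [10.0, 12.0, 14.0, 16.0, 18.0]   # made-up feature values
valid = [20.0, 22.0]                      # a later period at a higher level

mean, std = fit_scaler(train)
train_z = standardize(train, mean, std)
valid_z = standardize(valid, mean, std)

print([round(v, 3) for v in train_z])
print([round(v, 3) for v in valid_z])
```

The training split ends up with zero mean and unit standard deviation, while the later split can sit well outside that range. For price-level features such as `close` or the EMAs, this kind of drift between periods is worth keeping in mind when reading the validation metrics.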


### Get Data Generator for each time step

In [194]:
data_gen = WindowGenerator(
                input_width=64, label_width=1, shift=1, 
                train_df=train_df, valid_df=valid_df, test_df=test_df,
                label_columns=['price_change'])

In [195]:
for inputs, targets in data_gen.train.take(1):
    print(f'Inputs shape (batch, time, features): {inputs.shape}')
    print(f'Targets shape (batch, time, features): {targets.shape}')

Inputs shape (batch, time, features): (32, 64, 24)
Targets shape (batch, time, features): (32, 1, 1)
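
`WindowGenerator` is a local module whose source is not shown here. Assuming it follows the usual sliding-window convention (an input span of `input_width` steps, with the label taken `shift` steps after the end of the input), the slicing can be sketched in plain Python. The function `make_windows` and the data are illustrative only.

```python
def make_windows(rows, labels, input_width=64, label_width=1, shift=1):
    """Slice a series into (input window, label window) pairs.

    Each input covers `input_width` consecutive steps; its labels are the
    last `label_width` steps of the span that ends `shift` steps later.
    """
    total = input_width + shift
    pairs = []
    for start in range(len(rows) - total + 1):
        inputs = rows[start:start + input_width]
        targets = labels[start + total - label_width:start + total]
        pairs.append((inputs, targets))
    return pairs

rows = list(range(100))          # stand-in for 100 time steps of features
labels = [r % 3 for r in rows]   # stand-in class labels
pairs = make_windows(rows, labels)
first_inputs, first_targets = pairs[0]
print(len(pairs), len(first_inputs), first_targets)
```

With `input_width=64`, `label_width=1`, and `shift=1`, each sample is 64 consecutive bars and the target is the class of the bar immediately after them, which matches the `(32, 64, 24)` and `(32, 1, 1)` batch shapes printed above.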


## **Start Training Models**

First, we will define a helper function to streamline this process.

In [178]:
def compile_and_fit(model, window, lr=1e-4, max_epochs=100, patience=2):
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                      patience=patience,
                                                      mode='min')

    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  metrics=['accuracy'])

    history = model.fit(window.train, epochs=max_epochs,
                        validation_data=window.valid,
                        callbacks=[early_stopping])
    return history
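
The effect of the `patience` argument can be illustrated with a small simulation of patience-based early stopping. This is a simplified sketch rather than the Keras implementation; `epochs_run` is a hypothetical helper and the loss values are made up.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

losses = [1.10, 1.05, 1.06, 1.07, 1.04, 1.08]  # made-up validation losses
print(epochs_run(losses, patience=2))    # stops after two epochs without improvement
print(epochs_run(losses, patience=15))   # patience longer than the run: never triggers
```

A small `patience` can stop training just before a later improvement (the fifth epoch in this example), while a `patience` larger than `max_epochs` disables early stopping altogether.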

Track the performance of several models

In [None]:
val_performance = {}
performance = {}

We will also define a learning rate scheduler: a linear warmup from `initial_lr` to `base_lr`, followed by a linear decay to `min_lr`.

In [14]:
def lr_scheduler(epoch, lr, warmup_epochs=15, decay_epochs=100,
                 initial_lr=1e-6, base_lr=1e-3, min_lr=5e-5):
    # `lr` (the current rate) is unused; it is kept so the signature matches
    # what a Keras LearningRateScheduler callback passes in

    # linear warmup from initial_lr to base_lr
    if epoch <= warmup_epochs:
        pct = epoch / warmup_epochs
        return ((base_lr - initial_lr) * pct) + initial_lr

    # linear decay from base_lr to min_lr
    if epoch < warmup_epochs + decay_epochs:
        pct = 1 - ((epoch - warmup_epochs) / decay_epochs)
        return ((base_lr - min_lr) * pct) + min_lr

    return min_lr
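
To check the shape of the schedule, it can be evaluated at a few epochs. The function is repeated here (with `lr` made optional) so the snippet runs on its own; the sampled epochs are arbitrary.

```python
def lr_scheduler(epoch, lr=None, warmup_epochs=15, decay_epochs=100,
                 initial_lr=1e-6, base_lr=1e-3, min_lr=5e-5):
    # Same logic as the scheduler cell, repeated so the snippet is self-contained.
    if epoch <= warmup_epochs:
        pct = epoch / warmup_epochs
        return ((base_lr - initial_lr) * pct) + initial_lr
    if epoch < warmup_epochs + decay_epochs:
        pct = 1 - ((epoch - warmup_epochs) / decay_epochs)
        return ((base_lr - min_lr) * pct) + min_lr
    return min_lr

# start of warmup, end of warmup, mid-decay, end of decay, long after
for epoch in (0, 15, 65, 115, 200):
    print(epoch, f"{lr_scheduler(epoch):.6f}")
```

Note that `compile_and_fit` as written does not use this function. In Keras, a function of `(epoch, lr)` like this one is normally wrapped in `tf.keras.callbacks.LearningRateScheduler` and passed to `model.fit` through `callbacks`. With the default 15 warmup epochs, a 10-epoch run would stay entirely inside the warmup phase.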

Set up the Transformer Encoder

Build the model with Keras classes

Things to try:
- Try to replace Layer Normalization with Batch Normalization and observe the results
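
Before running that experiment, it may help to recall what differs between the two normalizations: the axis over which the statistics are computed. The sketch below is a simplified illustration in plain Python. It leaves out the learned scale and offset, as well as the moving averages that batch normalization uses at inference time, and the function names are hypothetical.

```python
import statistics

def normalize(values, eps=1e-6):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

def layer_norm(batch):
    """Normalize each sample across its own features (row-wise)."""
    return [normalize(row) for row in batch]

def batch_norm(batch):
    """Normalize each feature across the samples in the batch (column-wise)."""
    columns = [normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]

batch = [[1.0, 2.0, 3.0],
         [10.0, 20.0, 60.0]]   # two samples, three features (made up)

ln = layer_norm(batch)
bn = batch_norm(batch)
print([[round(v, 3) for v in row] for row in ln])
print([[round(v, 3) for v in row] for row in bn])
```

Layer normalization gives every sample the same treatment regardless of what else is in the batch, whereas batch normalization makes each sample's output depend on the other samples it is batched with.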

In [196]:
from tensorflow_addons.layers import MultiHeadAttention

class TransformerEncoder(tf.keras.layers.Layer):

    def __init__(self, n_heads, head_size, ff_dim, dropout=0, **kwargs):
        # forward name/dtype/etc. so the layer can be rebuilt from its config
        super().__init__(**kwargs)
        
        self.n_heads = n_heads
        self.head_size = head_size
        self.ff_dim = ff_dim
        self.dropout = dropout

        self.attn_heads = list()


    def build(self, input_shape):
        
        # attention portion
        self.attn_multi = MultiHeadAttention(num_heads=self.n_heads, 
                                             head_size=self.head_size, 
                                             dropout=self.dropout)
        self.attn_dropout = layers.Dropout(self.dropout)
        self.attn_norm = layers.LayerNormalization(epsilon=1e-6)

        # feedforward portion
        self.ff_conv1 = layers.Conv1D(filters=self.ff_dim, 
                                      kernel_size=1, 
                                      activation='relu')
        self.ff_dropout = layers.Dropout(self.dropout)
        self.ff_conv2 = layers.Conv1D(filters=input_shape[-1],
                                      kernel_size=1)
        self.ff_norm = layers.LayerNormalization(epsilon=1e-6)


    def call(self, inputs):
        # attention portion
        x = self.attn_multi([inputs, inputs])
        x = self.attn_dropout(x)
        x = self.attn_norm(x)

        # get first residual
        res = x + inputs
        
        # feedforward portion
        x = self.ff_conv1(res)
        x = self.ff_dropout(x)
        x = self.ff_conv2(x)
        x = self.ff_norm(x)
        
        # return residual
        return res + x
    
    # Needed for saving and loading model with custom layer
    def get_config(self): 
        config = super().get_config().copy()
        # only report attributes that __init__ actually sets
        config.update({'n_heads': self.n_heads,
                       'head_size': self.head_size,
                       'ff_dim': self.ff_dim,
                       'dropout': self.dropout})
        return config
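
The core operation inside the attention portion can be sketched in plain Python. This is the textbook scaled dot-product self-attention with a single head and identity projections, followed by a skip connection. The actual `MultiHeadAttention` layer additionally learns query/key/value projections and uses several heads, and the encoder above also applies dropout and layer normalization, all of which are omitted here. The function names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Single-head self-attention with identity projections.

    `seq` is a list of time steps, each a list of features. Every output
    step is a weighted average of all input steps; the weights come from
    scaled dot products between the steps.
    """
    d = len(seq[0])
    outputs, weights = [], []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in seq]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * step[j] for wi, step in zip(w, seq))
                        for j in range(d)])
    return outputs, weights

def add_residual(x, sublayer_out):
    """Skip connection: add the sublayer output back onto its input."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, sublayer_out)]

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
attended, weights = self_attention(seq)
res = add_residual(seq, attended)
print([[round(w, 3) for w in row] for row in weights])
```

Because the output has the same shape as the input, the block can be stacked, which is what `num_transformer_blocks` controls in the model.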


In [197]:
class TransformerModel(keras.Model):

    def __init__(self, 
            n_heads,
            head_size,
            ff_dim,
            num_transformer_blocks,
            mlp_units,
            n_outputs=3,
            dropout=0.1,
            mlp_dropout=0.1):
            
        super().__init__()
        
        self.n_heads = n_heads
        self.head_size = head_size
        self.ff_dim = ff_dim
        self.num_transformer_blocks = num_transformer_blocks
        self.mlp_units = mlp_units
        self.n_outputs = n_outputs
        self.dropout = dropout
        self.mlp_dropout = mlp_dropout

        
         
    def build(self, input_shape):

        # get embedding layer that projects inputs into high dimensional space
        self.embed = layers.Dense(self.head_size)

        # get learnable time layer
        # self.time_layer = layers.Layer(tf.random.uniform((input_shape[1], self.head_size), -0.2, 0.2))
        # self.time_layer = tf.Variable(
        #     initial_value=tf.random.uniform((input_shape[1], self.head_size), -0.2, 0.2)
        #     )
        
        # get transformer encoders
        self.encoders = [TransformerEncoder(self.n_heads, self.head_size, self.ff_dim, self.dropout) 
                         for _ in range(self.num_transformer_blocks)]

        self.avg_pool = layers.GlobalAveragePooling1D(data_format="channels_first")

        # get MLP portion of network
        self.mlp_layers = []
        for dim in self.mlp_units:
            self.mlp_layers.append(layers.Dense(dim, activation="relu"))
            self.mlp_layers.append(layers.Dropout(self.mlp_dropout))

        # output layer 
        self.mlp_output = layers.Dense(self.n_outputs, activation='softmax')


    def call(self, x):

        # project input data into high dimensional space
        x = self.embed(x)

        # inject time information ??
        # x = x + self.time_layer(x)
        
        # Encoder Portion
        for encoder in self.encoders:
            x = encoder(x)

        # Average Pooling
        x = self.avg_pool(x)

        # MLP portion for classification
        for mlp_layer in self.mlp_layers:
            x = mlp_layer(x)

        x = self.mlp_output(x)

        return x
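
One detail of the pooling step is easy to miss: the `data_format` argument decides which axis is averaged. The sketch below mimics the two settings on a single sample using nested lists. It is an illustration of the axis semantics, not the Keras implementation, and `global_average_pool` is a hypothetical helper.

```python
def global_average_pool(x, data_format="channels_last"):
    """Average a (steps, features)-shaped nested list over one axis.

    "channels_last" averages over the first axis (time steps) and returns
    one value per feature; "channels_first" treats the input as
    (channels, steps), averages over the last axis, and returns one value
    per row.
    """
    if data_format == "channels_last":
        return [sum(col) / len(col) for col in zip(*x)]
    return [sum(row) / len(row) for row in x]

# One sample with 4 time steps and 2 embedding features (made-up values).
sample = [[1.0, 3.0],
          [2.0, 4.0],
          [3.0, 5.0],
          [4.0, 6.0]]

print(global_average_pool(sample, "channels_last"))   # one value per feature
print(global_average_pool(sample, "channels_first"))  # one value per time step
```

Since the encoder output here has shape `(batch, time, features)`, pooling with `data_format="channels_first"` averages over the embedding axis and hands the MLP one value per time step (64 values for this window size). Switching to `"channels_last"` would instead average over time and yield one value per embedding feature; which of the two works better for this task is an empirical question.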

In [198]:
transformer_model = TransformerModel(
            n_heads=2,
            head_size=512,
            ff_dim=256,
            num_transformer_blocks=2,
            mlp_units=[256],
            n_outputs=3,
            dropout=0.1,
            mlp_dropout=0.1)

In [199]:
# patience (15) exceeds max_epochs (10), so early stopping cannot trigger here
compile_and_fit(transformer_model, data_gen, patience=15, max_epochs=10)

val_performance['transformer_3'] = transformer_model.evaluate(data_gen.valid)
performance['transformer_3'] = transformer_model.evaluate(data_gen.test, verbose=0)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [193]:
val_performance

{'transformer_1': [0.9981821179389954, 0.4725024402141571],
 'transformer_2': [1.033190131187439, 0.4488556385040283]}
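
Each entry above is `[loss, accuracy]`. To put these numbers in context, two simple reference points are useful: the cross-entropy of a classifier that always predicts equal probabilities for all classes, and the accuracy of always predicting the most frequent class. The sketch below computes both. The label list is made up, since the class balance of the actual validation set is not shown in this notebook, and the helper names are hypothetical.

```python
import math
from collections import Counter

def uniform_cross_entropy(n_classes):
    """Loss of a classifier that always predicts equal class probabilities."""
    return -math.log(1.0 / n_classes)

def majority_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

print(round(uniform_cross_entropy(3), 4))

labels = [1, 1, 0, 2, 1, 2, 0, 1]  # made-up labels, not the AAPL data
print(majority_accuracy(labels))
```

For three classes the uniform-prediction loss is ln(3), roughly 1.0986, so the recorded validation losses of about 1.00 and 1.03 are only modestly better than that reference. Comparing the accuracies against the majority-class rate of the real validation labels would show whether the models have learned more than the class frequencies.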