# Toy Models

In this notebook, we explore Transformer architectures for classifying price movements in stock data. We preprocess the data so that the timestamps are encoded as sines and cosines, which gives the model additional input features. We follow the approach outlined in this [paper](https://arxiv.org/pdf/2010.02803.pdf), in which the features are projected into a high-dimensional space and the model learns a time/sequence representation.

### Library Import

In [29]:
import os
import sys
import numpy as np
import pandas as pd
import pandas_ta as ta
import tensorflow as tf
from tensorflow import keras  # use the Keras bundled with TensorFlow, matching tensorflow.keras.layers
from tensorflow.keras import layers
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20, 10)
%matplotlib inline

### Local Imports

In [2]:
from window_generator import WindowGenerator

In [3]:
# for python scripts use: "os.path.dirname(__file__)" instead of "os.path.abspath('')"
sys.path.append(
    os.path.abspath(os.path.join(os.path.abspath(''), os.path.pardir)))

from data_clean import get_trading_times

#### Ensure that GPU is available

In [4]:
tf.config.list_physical_devices('GPU')  

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

### Get the Data

In [5]:
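data_path = os.path.join('..', 'data', 'raw', 'AAPL_15min.csv')  # portable across operating systems
# infer_datetime_format is deprecated in pandas >= 2.0 (strict format inference is now the default)
df = pd.read_csv(data_path, index_col=0, parse_dates=True)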

# df = get_trading_times(df)
df = df.dropna()

# extract the day of week, hour, and minute from the DatetimeIndex
dayofweek = df.index.dayofweek
hour = df.index.hour
minute = df.index.minute

# encode the days, hours, and minutes with sin and cos functions
days_in_week = 7
hours_in_day = 24
minutes_in_hour = 60

df['sin_day'] = np.sin(2*np.pi*dayofweek/days_in_week)
df['cos_day'] = np.cos(2*np.pi*dayofweek/days_in_week)
df['sin_hour'] = np.sin(2*np.pi*hour/hours_in_day)
df['cos_hour'] = np.cos(2*np.pi*hour/hours_in_day)
df['sin_minute'] = np.sin(2*np.pi*minute/minutes_in_hour)
df['cos_minute'] = np.cos(2*np.pi*minute/minutes_in_hour)
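
Why sines and cosines rather than raw integers? On a raw scale, hour 23 and hour 0 are 23 units apart even though they are consecutive. The standalone NumPy check below (an illustration, not part of the pipeline) encodes hours the same way as the `sin_hour`/`cos_hour` columns above and confirms that 23:00 to 00:00 is exactly as far apart as 00:00 to 01:00 on the unit circle. Using both components matters: sine alone maps, for example, hours 1 and 11 to the same value.

```python
import numpy as np

hours_in_day = 24
hours = np.arange(hours_in_day)

# same encoding as the sin_hour / cos_hour columns above, one (sin, cos) row per hour
enc = np.stack([np.sin(2 * np.pi * hours / hours_in_day),
                np.cos(2 * np.pi * hours / hours_in_day)], axis=1)

def dist(a, b):
    """Euclidean distance between two encoded hours."""
    return np.linalg.norm(enc[a] - enc[b])

wrap_gap = dist(23, 0)    # step across midnight
normal_gap = dist(0, 1)   # ordinary one-hour step
far_gap = dist(0, 12)     # opposite side of the clock

# sine alone is ambiguous: hours 1 and 11 share the same sine value
same_sine = np.isclose(enc[1, 0], enc[11, 0])

print(f"23->0: {wrap_gap:.4f}, 0->1: {normal_gap:.4f}, 0->12: {far_gap:.4f}")
```
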


### Add target columns
We add a column with the price change over each interval; this is our regression target. We also add a column that buckets each change into a downward, flat, or upward move using a fixed dollar threshold; this is our classification target.

In [6]:
df['price_diff'] = df['close'].diff()

thresh = 0.1  # dollars
df['price_change'] = 1  # price stays (roughly) the same
# use .loc to avoid pandas' chained-assignment (SettingWithCopy) warning
df.loc[df['price_diff'] < -thresh, 'price_change'] = 0  # downward price movement
df.loc[df['price_diff'] > thresh, 'price_change'] = 2  # upward price movement


In [7]:
df = df.dropna()
df.head()

| time | open | high | low | close | volume | sin_day | cos_day | sin_hour | cos_hour | sin_minute | cos_minute | price_diff | price_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-10-01 04:30:00 | 115.634512 | 115.792604 | 115.407254 | 115.407254 | 13550.0 | 0.433884 | -0.900969 | 0.866025 | 0.5 | 5.665539e-16 | -1.0 | -0.207496 | 0 |
| 2020-10-01 04:45:00 | 115.367731 | 115.367731 | 115.120712 | 115.308447 | 12857.0 | 0.433884 | -0.900969 | 0.866025 | 0.5 | -1.0 | -1.83697e-16 | -0.098808 | 1 |
| 2020-10-01 05:00:00 | 115.308447 | 115.397374 | 115.298566 | 115.318327 | 10079.0 | 0.433884 | -0.900969 | 0.965926 | 0.258819 | 0.0 | 1.0 | 0.009881 | 1 |
| 2020-10-01 05:15:00 | 115.417135 | 115.604869 | 115.377612 | 115.604869 | 3534.0 | 0.433884 | -0.900969 | 0.965926 | 0.258819 | 1.0 | 2.832769e-16 | 0.286542 | 2 |
| 2020-10-01 05:30:00 | 115.604869 | 115.703677 | 115.555466 | 115.703677 | 7688.0 | 0.433884 | -0.900969 | 0.965926 | 0.258819 | 5.665539e-16 | -1.0 | 0.098808 | 1 |
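
Because the classes are defined by a fixed 0.10-dollar threshold, they are not guaranteed to be balanced, so it helps to know the accuracy of always predicting the most common class before reading the model's accuracy. The standalone NumPy sketch below reapplies the labeling rule to the `close` values from the `df.head()` output above and computes that majority-class baseline; the helper name `label_price_moves` is hypothetical. Its labels, `[1, 1, 2, 1]`, match the `price_change` column for the rows from 04:45 to 05:30. On the real data you would compute the same counts from `train_df['price_change']`.

```python
import numpy as np

def label_price_moves(close, thresh=0.1):
    """Hypothetical helper: 0 = down, 1 = flat, 2 = up, per bar-to-bar close change."""
    diff = np.diff(close)                     # like df['close'].diff() without the leading NaN
    labels = np.ones(len(diff), dtype=int)    # 1: |change| <= thresh
    labels[diff < -thresh] = 0                # 0: drop of more than thresh dollars
    labels[diff > thresh] = 2                 # 2: rise of more than thresh dollars
    return labels

# close prices of the five rows shown by df.head()
close = np.array([115.407254, 115.308447, 115.318327, 115.604869, 115.703677])

labels = label_price_moves(close)
counts = np.bincount(labels, minlength=3)     # samples per class: [down, flat, up]
majority_baseline = counts.max() / counts.sum()

print(labels, counts, majority_baseline)
```
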


### Get Standardized train, valid, and test sets

Split the data chronologically into train, validation, and test sets, then standardize all three using the training mean and standard deviation.

In [8]:
train_df = df.loc['2020-10-01':'2021-10-01']
valid_df = df.loc['2021-10-02':'2022-05-01']
test_df = df.loc['2022-05-02':]

train_mean = train_df.mean()
train_std = train_df.std()

# ensure that the target column is not standardized (mean 0, std 1 leaves the labels 0/1/2 as-is)
train_mean['price_change'] = 0
train_std['price_change'] = 1

train_df = (train_df - train_mean) / train_std
valid_df = (valid_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std


print(train_df.shape)
print(valid_df.shape)
print(test_df.shape)

(16112, 13)
(9243, 13)
(6267, 13)
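
The statistics come only from the training period so that no information from the validation or test periods leaks into the inputs, and the label column is excluded so the class ids stay 0, 1, and 2. A minimal pandas sketch of the same idea as a reusable function (the name `standardize_splits` and the toy frames are illustrative assumptions):

```python
import pandas as pd

def standardize_splits(train, others, label_cols):
    """Z-score feature columns with training statistics; leave label columns unchanged."""
    feature_cols = [c for c in train.columns if c not in label_cols]
    mean = train[feature_cols].mean()
    std = train[feature_cols].std()       # pandas default: sample std (ddof=1)

    def apply(frame):
        out = frame.copy()
        out[feature_cols] = (frame[feature_cols] - mean) / std
        return out

    return apply(train), [apply(frame) for frame in others]

# toy data: one feature column and one label column
train = pd.DataFrame({'close': [1.0, 2.0, 3.0], 'price_change': [0, 1, 2]})
valid = pd.DataFrame({'close': [4.0], 'price_change': [2]})

train_scaled, (valid_scaled,) = standardize_splits(train, [valid], label_cols=['price_change'])
print(train_scaled)
print(valid_scaled)
```
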


### Create the windowed data generator

In [9]:
data_gen = WindowGenerator(
                input_width=24, label_width=1, shift=1, 
                train_df=train_df, valid_df=valid_df, test_df=test_df,
                label_columns=['price_change'])

In [10]:
for inputs, targets in data_gen.train.take(1):
    print(f'Inputs shape (batch, time, features): {inputs.shape}')
    print(f'Targets shape (batch, time, features): {targets.shape}')

Inputs shape (batch, time, features): (32, 24, 13)
Targets shape (batch, time, features): (32, 1, 1)
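
`WindowGenerator` lives in a local module that is not shown here. Assuming it follows the indexing convention of the `WindowGenerator` in the TensorFlow time-series tutorial, `input_width=24, label_width=1, shift=1` pairs 24 consecutive 15-minute bars (six hours) with the label of the bar immediately after them. The NumPy sketch below (a hypothetical `make_windows` helper, with no batching or shuffling) reproduces that pairing so the shapes printed above can be checked by hand.

```python
import numpy as np

def make_windows(features, labels, input_width=24, shift=1):
    """Hypothetical helper: slide a window of `input_width` rows over the data and pair it
    with the label `shift` rows after the window's last row (label_width = 1)."""
    n_windows = len(features) - input_width - shift + 1
    starts = np.arange(n_windows)
    # row indices of every window: shape (n_windows, input_width)
    idx = starts[:, None] + np.arange(input_width)[None, :]
    x = features[idx]                              # (n_windows, input_width, n_features)
    y = labels[starts + input_width + shift - 1]   # one label per window
    return x, y[:, None, None]                     # (n_windows, 1, 1), like the targets above

# toy data: 30 time steps x 13 features; label = row number so indices are easy to read
features = np.arange(30 * 13, dtype=float).reshape(30, 13)
labels = np.arange(30)

x, y = make_windows(features, labels)
print(x.shape, y.shape)
```
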


## **Start Training Models**

First, we define a helper function that compiles a model and trains it with early stopping on the validation loss.

In [14]:
def compile_and_fit(model, window, lr=1e-4, max_epochs=100, patience=2):
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                      patience=patience,
                                                      mode='min')

    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  metrics=['accuracy'])

    history = model.fit(window.train, epochs=max_epochs,
                        validation_data=window.valid,
                        callbacks=[early_stopping])
    return history

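We also define a learning-rate scheduler: a linear warmup from `initial_lr` to `base_lr` over `warmup_epochs`, followed by a linear decay to `min_lr` over `decay_epochs`.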

In [28]:
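def lr_scheduler(epoch, lr, warmup_epochs=15, decay_epochs=100,
                 initial_lr=1e-6, base_lr=1e-3, min_lr=5e-5):
    # `lr` is unused; it is kept so the function matches the (epoch, lr) signature
    # that tf.keras.callbacks.LearningRateScheduler calls it with

    # linear warmup: initial_lr at epoch 0 up to base_lr at epoch `warmup_epochs`
    if epoch <= warmup_epochs:
        pct = epoch / warmup_epochs
        return ((base_lr - initial_lr) * pct) + initial_lr

    # linear decay from base_lr toward min_lr over the next `decay_epochs` epochs
    if epoch < warmup_epochs + decay_epochs:
        pct = 1 - ((epoch - warmup_epochs) / decay_epochs)
        return ((base_lr - min_lr) * pct) + min_lr

    # after warmup and decay, hold at min_lr
    return min_lr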
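
Written out, with $e$ the zero-based epoch index, $E_w$ = `warmup_epochs`, $E_d$ = `decay_epochs`, $\eta_0$ = `initial_lr`, $\eta_{\text{base}}$ = `base_lr`, and $\eta_{\min}$ = `min_lr`, the schedule is:

$$
\eta(e) =
\begin{cases}
\eta_0 + (\eta_{\text{base}} - \eta_0)\,\dfrac{e}{E_w}, & e \le E_w \\[4pt]
\eta_{\min} + (\eta_{\text{base}} - \eta_{\min})\left(1 - \dfrac{e - E_w}{E_d}\right), & E_w < e < E_w + E_d \\[4pt]
\eta_{\min}, & e \ge E_w + E_d
\end{cases}
$$

With the defaults, epoch 65 sits halfway through the decay, giving $5\times10^{-5} + 0.5\,(10^{-3} - 5\times10^{-5}) = 5.25\times10^{-4}$. Keras can apply a function with this `(epoch, lr)` signature through `tf.keras.callbacks.LearningRateScheduler`; `compile_and_fit` as written only registers the early-stopping callback, so the schedule is not applied during training.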

Set up the Transformer encoder with the Keras functional API.

In [21]:
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder(inputs, n_heads, d_k, d_v, ff_dim, dropout=0):
    # Normalization and Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        num_heads=n_heads, key_dim=d_k, value_dim=d_v, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs

    # Feed Forward Part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res


def build_model(
            input_shape,
            n_heads,
            d_k,
            d_v,
            ff_dim,
            num_transformer_blocks,
            mlp_units,
            n_outputs=3,  # three classes (down, flat, up); a softmax over 1 unit always outputs 1.0
            dropout=0.1,
            mlp_dropout=0.1,
        ):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, n_heads, d_k, d_v, ff_dim, dropout)

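    # channels_first averages over the last (feature) axis, leaving one value per time step:
    # (batch, time, d) -> (batch, time)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)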
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(n_outputs, activation='softmax')(x)
    return keras.Model(inputs, outputs)
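
For intuition about what each `MultiHeadAttention` head computes, here is single-head scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$, in plain NumPy. It is a simplified sketch: the learned query/key/value projections, the multiple heads, and the attention dropout of the Keras layer are left out, and the `self_attention` name is illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x: (time, d) array -> returns (output of shape (time, d), weights of shape (time, time))."""
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)      # similarity of every time step to every other
    weights = softmax(scores, axis=-1)   # each row is a distribution over time steps
    return weights @ x, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(24, 13))            # one window: 24 time steps x 13 features
out, weights = self_attention(x)
print(out.shape, weights.shape)
```
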

Build the model with Keras subclassing (a custom encoder layer and model class).

Things to try:
- Replace layer normalization with batch normalization and observe the results.

In [198]:
# tf.keras.layers.MultiHeadAttention replaces the deprecated tensorflow_addons layer;
# its key_dim plays the role of tfa's head_size (per-head size of the query/key/value projections)
class TransformerEncoder(tf.keras.layers.Layer):

    def __init__(self, n_heads, head_size, ff_dim, dropout=0, **kwargs):
        super().__init__(**kwargs)

        self.n_heads = n_heads
        self.head_size = head_size
        self.ff_dim = ff_dim
        self.dropout = dropout

    def build(self, input_shape):
        # attention portion; the output is projected back to the input width input_shape[-1]
        self.attn_multi = layers.MultiHeadAttention(num_heads=self.n_heads,
                                                    key_dim=self.head_size,
                                                    dropout=self.dropout)
        self.attn_dropout = layers.Dropout(self.dropout)
        self.attn_norm = layers.LayerNormalization(epsilon=1e-6)

        # position-wise feedforward portion (kernel_size=1 convs act like per-step Dense layers)
        self.ff_conv1 = layers.Conv1D(filters=self.ff_dim,
                                      kernel_size=1,
                                      activation='relu')
        self.ff_dropout = layers.Dropout(self.dropout)
        self.ff_conv2 = layers.Conv1D(filters=input_shape[-1],
                                      kernel_size=1)
        self.ff_norm = layers.LayerNormalization(epsilon=1e-6)
        super().build(input_shape)

    def call(self, inputs):
        # attention portion: self-attention (query = value = inputs),
        # normalized before the residual add
        x = self.attn_multi(inputs, inputs)
        x = self.attn_dropout(x)
        x = self.attn_norm(x)

        # first residual connection
        res = x + inputs

        # feedforward portion
        x = self.ff_conv1(res)
        x = self.ff_dropout(x)
        x = self.ff_conv2(x)
        x = self.ff_norm(x)

        # second residual connection
        return res + x

    # Needed for saving and loading a model with this custom layer;
    # the keys must match the __init__ arguments so from_config() can rebuild the layer
    def get_config(self):
        config = super().get_config()
        config.update({'n_heads': self.n_heads,
                       'head_size': self.head_size,
                       'ff_dim': self.ff_dim,
                       'dropout': self.dropout})
        return config


In [211]:
class TransformerModel(keras.Model):

    def __init__(self,
                 n_heads,
                 head_size,
                 ff_dim,
                 num_transformer_blocks,
                 mlp_units,
                 n_outputs=3,
                 dropout=0.1,
                 mlp_dropout=0.1,
                 **kwargs):
        super().__init__(**kwargs)

        self.n_heads = n_heads
        self.head_size = head_size
        self.ff_dim = ff_dim
        self.num_transformer_blocks = num_transformer_blocks
        self.mlp_units = mlp_units
        self.n_outputs = n_outputs
        self.dropout = dropout
        self.mlp_dropout = mlp_dropout

        # Create every layer once here; creating layers inside call() would
        # make fresh, untrained weights on every forward pass
        self.projection = layers.Dense(head_size)  # project inputs into high-dimensional space
        self.encoders = [TransformerEncoder(n_heads, head_size, ff_dim, dropout)
                         for _ in range(num_transformer_blocks)]
        # channels_first averages over the feature axis, leaving one value per time step
        self.pooling = layers.GlobalAveragePooling1D(data_format="channels_first")
        mlp_layers = []
        for dim in mlp_units:
            mlp_layers += [layers.Dense(dim, activation="relu"), layers.Dropout(mlp_dropout)]
        self.mlp_layers = mlp_layers
        self.classifier = layers.Dense(n_outputs, activation='softmax')

    def call(self, x):
        # project input data into high dimensional space
        x = self.projection(x)

        # Encoder portion
        for encoder in self.encoders:
            x = encoder(x)

        # average pooling, then the MLP portion for classification
        x = self.pooling(x)
        for layer in self.mlp_layers:
            x = layer(x)
        return self.classifier(x)

In [212]:
def build_transformer(input_shape,
            n_heads,
            head_size,
            ff_dim,
            num_transformer_blocks,
            mlp_units,
            n_outputs=3,
            dropout=0.1,
            mlp_dropout=0.1,
        ):
    inputs = keras.Input(shape=input_shape)
    x = inputs

    # project each time step's features into a higher-dimensional space (the model width);
    # tying this width to head_size is a design choice, not a requirement.
    # This is just a linear layer with a bias and no activation
    x = layers.Dense(units=head_size)(x)

    # TODO: encode time/positions into the high-dimensional data (not implemented yet)

    # encoder portion
    for _ in range(num_transformer_blocks):
        x = TransformerEncoder(n_heads, head_size, ff_dim, dropout)(x)

    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)

    # MLP portion for classification
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(n_outputs, activation='softmax')(x)
    return keras.Model(inputs, outputs)
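
The positional-encoding step in `build_transformer` has no code yet, so the model has no explicit positional encoding: self-attention by itself is permutation-equivariant over time steps, and only the sin/cos time features carry timing information. One way to fill the gap is a learnable positional embedding, a trainable `(time, d_model)` table added to every window after the projection. The layer below is a minimal sketch under that assumption; the name `LearnablePositionalEmbedding` and the small uniform initializer are our choices, not taken from the linked paper, so check the paper for its exact formulation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class LearnablePositionalEmbedding(layers.Layer):
    """Hypothetical layer: adds a trainable (time, d_model) table to every window."""

    def build(self, input_shape):
        seq_len, d_model = int(input_shape[-2]), int(input_shape[-1])
        self.pos = self.add_weight(
            name="pos_embedding",
            shape=(seq_len, d_model),
            initializer=tf.keras.initializers.RandomUniform(-0.02, 0.02),
            trainable=True,
        )
        super().build(input_shape)

    def call(self, x):
        # (batch, time, d_model) + (time, d_model) broadcasts over the batch axis
        return x + self.pos

# usage sketch: a batch of 2 projected windows, 24 time steps, d_model = 8
layer = LearnablePositionalEmbedding()
y = layer(tf.zeros((2, 24, 8)))
print(y.shape)
```

In `build_transformer` it would sit right after the `Dense` projection, e.g. `x = LearnablePositionalEmbedding()(x)`. The table fixes the window length, which holds here because every window has 24 steps.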
    

In [216]:
input_shape = inputs.shape[1:]  # (time, features) of the sample batch above: (24, 13)

transformer_model = build_transformer(
    input_shape,
    n_heads=2,
    head_size=512,
    ff_dim=256,
    num_transformer_blocks=2,
    mlp_units=[256],
    n_outputs=3,
    dropout=0.1,
    mlp_dropout=0.1,
)

In [214]:
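# the same architecture built by subclassing; stored under its own name so it does not
# overwrite the functional `transformer_model` that is trained below
subclassed_transformer = TransformerModel(
    n_heads=2,
    head_size=512,
    ff_dim=256,
    num_transformer_blocks=2,
    mlp_units=[256],
    n_outputs=3,
    dropout=0.1,
    mlp_dropout=0.1)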

In [None]:
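# the metric dictionaries must exist before results are stored in them
val_performance = {}
performance = {}

# note: with max_epochs=5, patience=15 can never trigger early stopping
history = compile_and_fit(transformer_model, data_gen, patience=15, max_epochs=5)

val_performance['transformer'] = transformer_model.evaluate(data_gen.valid)
performance['transformer'] = transformer_model.evaluate(data_gen.test, verbose=0)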
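
`evaluate` reports only the loss and overall accuracy, and accuracy can look respectable for a model that mostly predicts the flat class, so a per-class breakdown is worth checking. Below is a minimal NumPy sketch of a confusion matrix and per-class recall; the helper names and the toy arrays are illustrative. With the real model, take the `argmax` of the predicted probabilities and collect predictions and labels batch by batch from the same pass over `data_gen.test`: if `WindowGenerator` follows the TensorFlow tutorial, every split is shuffled, so separate passes would not line up.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows = true class, columns = predicted class (0 = down, 1 = flat, 2 = up)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair
    return cm

def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

# toy example: softmax outputs for 5 samples and their true labels
probs = np.array([[0.1, 0.8, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.6, 0.3],
                  [0.1, 0.2, 0.7]])
y_true = np.array([1, 1, 0, 2, 2])
y_pred = probs.argmax(axis=1)   # predicted class per sample: [1, 1, 0, 1, 2]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(per_class_recall(cm))
```
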