# Technical Indicators - NN Models

We've trained most of the basic models provided by scikit-learn on technical data from individual assets, with a good amount of success. The most effective was the Random Forest model, which does not recognize any temporal relationship between any two samples (i.e., if you provide the samples to the model in reverse order, you get the same result).

RNNs (and, to some degree, CNNs) are unusual among machine learning models in that temporal relationships can be encoded and learned fairly easily. This suits our use case, where the input is purely temporal market data.
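One common way to expose that ordering to a convolutional or recurrent model is to slice the per-day feature matrix into overlapping windows, so that time becomes an explicit axis. The following is a minimal sketch, not code from this notebook: it assumes the indicators sit in a matrix shaped `(n_days, n_features)` with one label per day, and `make_windows`, the window length, and the toy data are illustrative names and values.

```python
import numpy as np


def make_windows(X, y, window):
    """Slice a (n_days, n_features) matrix into overlapping windows.

    Returns windows shaped (n_windows, n_features, window), which matches the
    (batch, channels, length) layout used by 1D convolutions, together with
    the label of the last day in each window (an assumed alignment).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    # The window axis is appended last: (n_days - window + 1, n_features, window).
    windows = np.lib.stride_tricks.sliding_window_view(X, window, axis=0)
    labels = y[window - 1:]
    return windows, labels


# Toy data standing in for the indicator matrix: 10 days, 3 features.
X_demo = np.arange(30, dtype=float).reshape(10, 3)
y_demo = np.arange(10, dtype=float)

windows, labels = make_windows(X_demo, y_demo, window=4)
print(windows.shape, labels.shape)
```

Reversing the days inside a window changes what the model sees, which is the property the Random Forest setup lacks.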

Given the wide array of neural network architectures in use today, it makes sense to explore the results we can get from a variety of NN architectures. For now we will experiment without changing the format or content of the input data.

Input will be the historical technical indicator data for a single asset. **One model should be trained for each asset.**

In [1]:
# Auto reload local files
%load_ext autoreload
%autoreload 2
# Make files in src/ available to notebook
import sys
if 'src' not in sys.path:
    sys.path.insert(0, 'src')


import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

import torch
from torch.utils.data import Dataset, DataLoader

from technical_signals import TechnicalSignals, percent_change
import datastore as ds

class TechnicalIndicatorsDataset(Dataset):
    def __init__(self, ticker, predict_window, transform=None, target_transform=None):
        data = ds.get_daily_candlesticks([ticker], "2000-01-01", "2040-06-06")[ticker]
        indicators = TechnicalSignals(data, predict_window=predict_window)
        self.X, self.y, self.date = indicators.toXy()
        print(self.X.shape)
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        sample = self.X[:, idx]
        label = self.y[idx]
        if self.transform:
            sample = self.transform(sample)
        if self.target_transform:
            label = self.target_transform(label)
        return sample, label


train_dataset = TechnicalIndicatorsDataset('AAPL', predict_window=7)
n_features = train_dataset.X.shape[1]

batch_size = 16
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
#test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, num_workers=4)

(5630, 38)




In [2]:
import torch.nn as nn

#n_features = 20
n_outputs = 1

net = nn.Sequential(
    # Pass the input through a 1D convolutional layer with a kernel size of 3, then apply the activation function.
    nn.Conv1d(n_features, 32, 3),
    nn.ReLU(),

    # Pass the previous layer's output through a 1D convolutional layer with a kernel size of 2, apply the
    # activation function, then max-pool: keep the max value from each window of 2 positions.
    nn.Conv1d(32, 32, 2),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2),

    # Pass the previous layer's output through a 1D convolutional layer with a kernel size of 2, apply the
    # activation function, then max-pool over windows of 2 positions. (same as the previous block)
    nn.Conv1d(32, 32, 2),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2),

    # Flatten the convolutions. Input shape: (a, b, c), Output shape: (a, b*c)
    nn.Flatten(),
    
    #nn.Dropout(0.5),
    # XXX: With nn.Linear, the first argument needs to be updated each time the input shape changes.
    #      We could instead create a class-based Module and do a single pass through the conv portion
    #      of the network in order to determine the actual size.
    #      (This technique is shown in https://www.youtube.com/watch?v=1gQR24B3ISE&list=PLQVvvaa0QuDdeMyHEYc0gxFpYwHY2Qfdh&index=7).
    #      The value can also be found by commenting out all layers after Flatten(), then running the code
    #      below and inspecting the output shape; its second dimension is the first arg of nn.Linear.
    #      nn.LazyLinear avoids this by inferring the input size on the first forward pass.
    #nn.Linear(1696, 512),  # ~= nn.LazyLinear(512)
    nn.LazyLinear(512),

    # Project the hidden linear layer down to the required number of outputs
    nn.Linear(512, n_outputs),
    #nn.Softmax()
)

# Test, print basic info about model
features, labels = next(iter(train_dataloader))
print("Model input shape:", features.shape)
out = net(features.float())
print("Model output shape:", out.shape)



IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jared/.local/share/virtualenvs/notebooks-AmY4tn1L/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jared/.local/share/virtualenvs/notebooks-AmY4tn1L/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jared/.local/share/virtualenvs/notebooks-AmY4tn1L/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/ipykernel_579884/4074662948.py", line 35, in __getitem__
    sample = self.X[:, idx]
IndexError: index 1856 is out of bounds for axis 1 with size 38
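
The traceback points at `sample = self.X[:, idx]`. `X` has shape `(5630, 38)`, i.e. samples on axis 0 and features on axis 1, so `self.X[:, idx]` selects a feature column and fails for any `idx >= 38` (here 1856). Switching to `self.X[idx]` would remove the `IndexError`, but each sample would then be a flat vector of 38 values, whereas `nn.Conv1d(n_features, ...)` expects batches shaped `(batch, n_features, length)`. A rough sketch of one possible fix follows, assuming each sample should be a window of consecutive days. `WindowedDataset`, the window length of 30, and the random stand-in data are hypothetical and not part of the original notebook. The sketch also does the single pass through the conv portion that the XXX comment mentions, to read off the flattened size.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class WindowedDataset(Dataset):
    """Hypothetical variant of TechnicalIndicatorsDataset that returns windows."""

    def __init__(self, X, y, window):
        self.X = np.asarray(X, dtype=np.float32)  # (n_samples, n_features)
        self.y = np.asarray(y, dtype=np.float32)  # (n_samples,)
        self.window = window

    def __len__(self):
        # Number of complete windows that fit in the series.
        return len(self.y) - self.window + 1

    def __getitem__(self, idx):
        # Rows idx .. idx + window - 1, transposed to (n_features, window).
        sample = self.X[idx:idx + self.window].T.copy()
        # Assumed alignment: the label of the last day in the window.
        label = self.y[idx + self.window - 1]
        return sample, label


# Random stand-in data with the same number of features as the notebook (38).
rng = np.random.default_rng(0)
X_fake = rng.normal(size=(200, 38))
y_fake = rng.normal(size=200)

dataset = WindowedDataset(X_fake, y_fake, window=30)
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # num_workers=0 for a quick check

# Conv portion of the network above, up to and including Flatten().
conv_stack = nn.Sequential(
    nn.Conv1d(38, 32, 3), nn.ReLU(),
    nn.Conv1d(32, 32, 2), nn.ReLU(), nn.MaxPool1d(kernel_size=2),
    nn.Conv1d(32, 32, 2), nn.ReLU(), nn.MaxPool1d(kernel_size=2),
    nn.Flatten(),
)

features, labels = next(iter(loader))
with torch.no_grad():
    flat = conv_stack(features)
print("Batch shape:", tuple(features.shape))
print("Flattened shape:", tuple(flat.shape))
```

The second dimension of the flattened shape is the value that would go into `nn.Linear` as its first argument if `nn.LazyLinear` were not used. Whether the label should come from the last day of the window depends on how `TechnicalSignals.toXy()` aligns `X` and `y`, which is not shown here.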
