## Investment Strategy Demo

Our neural network is performing better than the XGBoost model at the moment, so we'll demo with just the NN.

I'm a novice when it comes to developing trading strategies, and I'm not afraid to admit it. Luckily, our model is here to save us. If we use its predictions to inform the simplest of trading strategies, and that strategy competes with the SPY fund, then we'll know that our model is useful.

#### Here's a *really* simple strategy:
- Knowing that our algorithm is conservative, we'll only put faith in high predictions: if an insider trade is filed for a ticker that our algorithm predicts will increase by >15% in valuation, purchase \\$1 of that ticker at the next day's open.
- If, at any market close in the next 90 days, the value of our purchase is at least 15% higher, sell immediately at that close.

This strategy, of course, assumes that we successfully make each purchase right at the opening and each sell right at the closing.
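
To make the two rules concrete, here is a minimal sketch of how a single position could be simulated. It is not the code behind `my_sims.runTradeSimulation`; the `Bar` class and `simulate_position` function are hypothetical names. It also makes two assumptions that the rules above don't spell out: the 90-day window is counted in trading days, and a position that never hits the threshold is liquidated at the last close in the window.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One trading day for a ticker (hypothetical structure for this sketch)."""
    date: str
    open_price: float
    close_price: float


def simulate_position(bars, sell_thresh_pct=15.0, max_hold_days=90, stake=1.0):
    """Buy `stake` dollars at the first bar's open and sell at the first close
    that is at least `sell_thresh_pct` percent above the purchase price.

    Assumption: if the threshold is never reached, the position is sold at the
    last close inside the holding window. Returns (sell_date, final_value).
    """
    entry = bars[0].open_price
    shares = stake / entry
    target = entry * (1 + sell_thresh_pct / 100)
    window = bars[:max_hold_days]  # assumption: window counted in trading days
    for bar in window:
        if bar.close_price >= target:
            return bar.date, shares * bar.close_price
    last = window[-1]
    return last.date, shares * last.close_price


# Made-up prices: bought at 10.00, first close at or above 11.50 is on day 3.
bars = [
    Bar('2021-08-30', 10.0, 10.5),
    Bar('2021-08-31', 10.6, 11.0),
    Bar('2021-09-01', 11.2, 11.6),
]
sell_date, value = simulate_position(bars)
print(sell_date, round(value, 2))
```

Summing `final_value` over all positions and dividing by the total amount staked would give a portfolio return to compare against SPY.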

In [53]:
%load_ext autoreload
%autoreload 1
%aimport my_functions

import datetime as dt
import numpy as np
import pandas as pd
import xgboost as xgb
import tensorflow as tf

from my_functions import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [42]:
'''
Load the cross-validation insider trade data and the historic ticker data.
'''
cv_XY = my_misc.load_obj('data/cv_XY')
historicDat = my_misc.load_obj('data/historicDat')

In [43]:
'''
Run our simulation on data between 2021-08-27 and 2021-11-26.
'''
my_sims.runTradeSimulation(
    cv_XY, 'NN_Prediction', historicDat, '2021-08-27', '2021-11-26', buyThresh=15, sellThresh=15
)

Buying FFBC on 2021-08-30, currently $24.07
Buying NR on 2021-08-30, currently $2.6
Buying VYNT on 2021-08-30, currently $2.73
Buying EFOI on 2021-08-30, currently $3.38
Buying RBLX on 2021-08-30, currently $84.6
Buying DXCM on 2021-08-30, currently $130.4
Buying CHH on 2021-08-30, currently $121.47
Buying EMMA on 2021-08-30, currently $1.5
Buying KLAC on 2021-08-30, currently $345.34
Buying NEU on 2021-08-30, currently $344.53
Buying SOFI on 2021-08-30, currently $14.18
Buying OKTA on 2021-08-30, currently $263.0
Buying SPRO on 2021-08-30, currently $17.26
Buying ANET on 2021-08-30, currently $93.39
Buying ANET on 2021-08-30, currently $93.39
Buying ANET on 2021-08-30, currently $93.39
Buying OSCR on 2021-08-30, currently $14.52
Buying OSCR on 2021-08-30, currently $14.52
Buying SWAV on 2021-08-30, currently $211.0
Buying VAL on 2021-08-30, currently $28.72
Buying LOV on 2021-08-30, currently $3.33
Buying LUMO on 2021-08-30, currently $10.62
Buying DASH on 2021-08-30, currently $187.9

## Success!
From 2021-08-27 to 2021-11-27, the SPY ETF rose from \\$450.19 to \\$456.94, a 1.5% gain.

Over this time period, with just 4 days of buying, our algorithm outperforms SPY by ***4.4 percentage points***.
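
As a quick sanity check, the SPY gain quoted above can be reproduced from the two prices. The sketch below also backs out the strategy return implied by the 4.4-point margin, assuming the margin is a simple difference between the two percentage returns; `pct_change` is a hypothetical helper, not part of `my_functions`.

```python
def pct_change(start, end):
    """Percentage change from `start` to `end`."""
    return (end - start) / start * 100


spy_gain = pct_change(450.19, 456.94)
print(f"SPY gain: {spy_gain:.1f}%")  # prints: SPY gain: 1.5%

# Assumption: "outperforms by 4.4 percentage points" means
# strategy_gain - spy_gain = 4.4.
implied_strategy_gain = spy_gain + 4.4
print(f"Implied strategy gain: {implied_strategy_gain:.1f}%")  # prints: Implied strategy gain: 5.9%
```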

It almost seems too good to be true...

A major caveat is that our cross-validation period falls in a time when the market was at an all-time peak. The market suffered a bit in early September 2021, but otherwise, mid-2021 had some of the most profitable months in recent memory, at least in terms of the S&P 500.

In a market like that, we could get away with assuming that our algorithm's estimates would indeed be conservative.

A good way to test our algorithm is to see if it still generates a profit during a bearish market period. For example, let's gather data from 2022-03-29 to 2022-06-29, a period over which the value of SPY tumbled by ***17.38%***. Knowing that we're in a bearish market period, we'll dial back our confidence in the algorithm a bit: we'll keep our buy threshold at 15% but lower our sell threshold to 5%, just hoping to make *some* profit.

In [15]:
DAYS_TO_LOOK_FORWARD = 90  # for computing median price increase
DAYS_TO_LOOK_BACK = 6  # for computing volume volatility and related insider buys
WINDOW_LEN = 3  # number of days over which to compute median price increase
MIN_OUTPUT = -10 + 1e-6  # change all lower price changes to this value
MAX_OUTPUT = 70. - 1e-6  # change all higher price changes to this value
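
Going by the comments on `MIN_OUTPUT` and `MAX_OUTPUT`, the target price changes are clamped to a fixed range before modeling. How `my_features.createAllFeatures` does this internally isn't shown here, so the following is only a sketch of the idea, with a hypothetical helper name:

```python
MIN_OUTPUT = -10 + 1e-6  # floor for the target price change (percent)
MAX_OUTPUT = 70.0 - 1e-6  # ceiling for the target price change (percent)


def clip_price_change(pct_change):
    """Clamp a percentage price change to [MIN_OUTPUT, MAX_OUTPUT]."""
    return max(MIN_OUTPUT, min(MAX_OUTPUT, pct_change))


# Made-up price changes: the two extremes are pulled in to the bounds.
raw = [-35.0, -2.5, 12.0, 140.0]
clipped = [clip_price_change(x) for x in raw]
print(clipped)
```

On a whole array, `np.clip(values, MIN_OUTPUT, MAX_OUTPUT)` does the same thing in one call.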

In [44]:
'''
Gather data, create features, and prepare the data for modeling.
'''

test_XY_incomplete, historicDat = my_cleaning.cleanAndFormatDF(
    'data/sec4_Mar2022', 
    'data/insiderDat_Mar2022_clean', 
    'data/historicDat',                                      
    newORload='load', 
    startDate='2021-06-01',
    endDate='2022-08-01'
)

test_XY_unprepped = my_features.createAllFeatures(
    test_XY_incomplete, historicDat, DAYS_TO_LOOK_FORWARD, WINDOW_LEN, DAYS_TO_LOOK_BACK, MIN_OUTPUT, MAX_OUTPUT
)

test_XY, test_X, test_Y = my_model_prep.returnXandY(
    my_model_prep.prepareForModel(test_XY_unprepped), '2022-03-23', '2022-04-04'
)

There are 685 unique tickers.
13 tickers to download.
[*********************100%***********************]  13 of 13 completed

Example ticker data for CMBM:
                 Open       High        Low      Close  Adj Close   Volume
Date                                                                      
2021-06-01  58.939999  59.250000  55.770000  57.500000  57.500000   219100
2021-06-02  54.075001  54.619999  48.009998  48.910000  48.910000   754300
2021-06-03  45.680000  46.610001  45.040001  45.619999  45.619999  1881600
2021-06-04  45.400002  48.932999  45.310001  48.150002  48.150002  1331000
2021-06-07  48.000000  50.418999  47.810001  49.689999  49.689999   479100
...               ...        ...        ...        ...        ...      ...
2022-05-24  13.350000  13.375000  12.640000  13.180000  13.180000   111600
2022-05-25  13.130000  13.560000  13.000000  13.340000  13.340000    96000
2022-05-26  13.260000  13.900000  13.140000  13.770000  13.770000   106600
2022-05-27  13.9700

In [57]:
'''
Use our trained neural network to generate price-change predictions for the test set.
'''
import nbimporter
from D_neural_net import asymm_rmse

nn_model = tf.keras.models.load_model('models/nn_model', compile=False)

# The model was loaded with compile=False, so re-compile it with the custom loss.
nn_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4, clipnorm=1.0),
    loss=asymm_rmse,
    metrics=[asymm_rmse]
)

test_Y_preds = nn_model.predict(test_X)

test_XY['NN_Prediction'] = test_Y_preds
my_misc.save_obj(test_XY, 'data/test_XY')



In [60]:
'''
Run the trade simulation on our data between 2022-03-29 and 2022-06-29.
'''
my_sims.runTradeSimulation(
    test_XY, 'NN_Prediction', historicDat, '2022-03-29', '2022-06-29', buyThresh=15, sellThresh=5
)

Buying FDP on 2022-03-30, currently $26.42
Buying FDP on 2022-03-30, currently $26.42
Buying ETR on 2022-03-30, currently $116.03
Buying FTCV on 2022-03-30, currently $9.87
Buying PING on 2022-03-30, currently $26.99
Buying INSP on 2022-03-30, currently $259.72
Buying ISEE on 2022-03-30, currently $16.41
Buying ADSK on 2022-03-30, currently $218.62
Buying MIXT on 2022-03-30, currently $11.56
Buying AIR on 2022-03-30, currently $48.2
Buying USM on 2022-03-30, currently $30.42
Buying LTBR on 2022-03-30, currently $8.94
Buying KN on 2022-03-30, currently $22.64
Buying MRO on 2022-03-30, currently $25.37
Buying BSX on 2022-03-30, currently $44.65
Buying COUP on 2022-03-30, currently $107.95
Buying BAH on 2022-03-30, currently $86.82
Buying SAIC on 2022-03-30, currently $91.57
Buying AMP on 2022-03-30, currently $312.22
Buying SRE on 2022-03-30, currently $165.21
Buying AMP on 2022-03-30, currently $312.22
Buying AIZ on 2022-03-31, currently $181.89
Buying ON on 2022-03-31, currently $64.39

## We were successful in a bull market, but not so much in a bear market.
I mean, we did outperform the S&P 500 by 6 percentage points during this time period, but we can't exactly brag when our portfolio lost money.

However...

(1) Remember that this is the most simplistic investment strategy I could imagine, solely to demonstrate that the algorithm learned something substantive.

(2) Strategy aside -- the fact that this didn't generalize perfectly gives me more ideas on how to make the model more robust! 

### Here are a few thoughts I have for model improvement...
- **We need more data!** Perhaps 23,000 training examples isn't enough.
- **Should we ONLY train on purchases?** Common wisdom says that investors may sell for any of a number of reasons, but they only buy for one reason. All of the sale data (particularly sales occurring during the bullish market period) may have disguised the importance of purchases.
- **We need to be aware of current market trends.** The stock market is dynamic. Insider trades might signal different outcomes depending on how the economy is doing as a whole. This should inform our choice of training set.
- **For a neural model:** Perhaps classification is the move, instead of regression. We could also have more outputs. We can use the Keras Functional API to have different activations in the final layer, such as
    - a softmax output that places the expected X-day price increase into a *category*, and
    - a sigmoid output predicting *where* in the X-day window the max price will occur
- **For XGBoost:** Again, we might be better served by predicting price categories (e.g. 0-5%, 5-10%, etc.) instead of the actual price increase. This may help remove some of the extreme noise. (For example, a 20% run and a 40% run are both great things to identify, but our model would think it performed poorly if it respectively predicted 40% and 20%!)
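
To illustrate the price-category idea from the list above, here is a minimal sketch that maps a percentage price change to a category label. The bin edges, the labels, and the function name are all hypothetical choices for illustration; the right bins would need to be tuned against the data.

```python
from bisect import bisect_right

# Hypothetical bin edges (percent) and labels; every run of 15% or more
# lands in the same top category.
BIN_EDGES = [0, 5, 10, 15]
LABELS = ["<0%", "0-5%", "5-10%", "10-15%", ">=15%"]


def price_change_category(pct_change):
    """Return the label of the bin containing `pct_change` (left-inclusive bins)."""
    return LABELS[bisect_right(BIN_EDGES, pct_change)]


for change in (-4.0, 3.2, 20.0, 40.0):
    print(change, price_change_category(change))
```

With these bins, a 20% run and a 40% run share one label, so a classifier would no longer be penalized for confusing the two.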

Also, we should keep in mind that individuals vary *a lot*. Some insiders do routine buys/sells and don't try to be opportunistic, while others are the opposite. If we really wanted to go deep with our analysis, we might want to work with an LSTM framework that remembers particular insiders' decisions and their effectiveness.