## Investment Strategy Demo

Our XGBoost model is performing better than the neural network at the moment, so we'll demo with just XGBoost.

I'm a novice when it comes to developing trading strategies, and I'm not afraid to admit it. Luckily, our model is here to save us. If we use its predictions to inform the simplest of trading strategies, and that strategy competes with the SPY fund, then we'll know that our model is useful.

#### Here's a *really* simple strategy:
- Our algorithm is conservative and tends to be more accurate about high predictions, so we'll only put faith in those: if an insider trade is filed for a ticker that our algorithm predicts will increase by >10% in valuation, purchase \$1 of it at the next day's opening.
- Hold each position until its value at a market close is more than 10% above the purchase price, then sell immediately at that close.

This strategy, of course, assumes that we successfully make each purchase right at the opening and each sale right at the closing.
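
To make the two rules concrete, here's a minimal sketch of how they could be simulated. It is not the `my_sims.runTradeSimulation` code used later in this notebook. The function name `simulate_threshold_strategy`, the `(open, close)` bar layout, and the toy prices are all made up for illustration. The sketch also assumes that the purchase day's own close counts as a possible sell point, and that a position still open when the data runs out is valued at the last close.

```python
from typing import Dict, List, Tuple

Bar = Tuple[float, float]  # (open, close) for one trading day


def simulate_threshold_strategy(
    signals: List[Tuple[str, int, float]],   # (ticker, filing day index, predicted % increase)
    bars_by_ticker: Dict[str, List[Bar]],    # hypothetical price history per ticker
    buy_thresh: float = 10.0,
    sell_thresh: float = 10.0,
) -> List[dict]:
    """Apply the buy-high-predictions / sell-at-+10% rules to each signal."""
    trades = []
    for ticker, filing_day, predicted_pct in signals:
        if predicted_pct <= buy_thresh:
            continue  # only act on high predictions
        bars = bars_by_ticker[ticker]
        buy_day = filing_day + 1  # buy at the next trading day's open
        if buy_day >= len(bars):
            continue  # no price data after the filing
        buy_price = bars[buy_day][0]

        # Default outcome: never sold, so value the position at the last close.
        exit_day, exit_price, sold = len(bars) - 1, bars[-1][1], False
        for day in range(buy_day, len(bars)):
            close = bars[day][1]
            if (close / buy_price - 1.0) * 100.0 > sell_thresh:
                exit_day, exit_price, sold = day, close, True
                break  # sell at the first close that clears the threshold

        trades.append({
            "ticker": ticker,
            "buy_day": buy_day,
            "exit_day": exit_day,
            "sold": sold,
            "value_per_dollar": exit_price / buy_price,  # what each $1 is worth at exit
        })
    return trades


# Toy data, invented for illustration only.
toy_bars = {
    "AAA": [(10.0, 10.0), (10.0, 10.4), (10.5, 11.5), (11.5, 12.0)],
    "BBB": [(20.0, 20.0), (20.0, 19.0), (19.0, 21.0), (21.0, 20.5)],
}
toy_signals = [("AAA", 0, 14.0), ("BBB", 0, 12.0), ("BBB", 1, 4.0)]

trades = simulate_threshold_strategy(toy_signals, toy_bars)
# Every purchase is $1, so the portfolio return is the mean value per dollar.
portfolio_return_pct = (sum(t["value_per_dollar"] for t in trades) / len(trades) - 1.0) * 100.0
for t in trades:
    print(t)
print(f"Portfolio return: {portfolio_return_pct:.2f}%")
```

Because every purchase is the same \$1, the overall return is just the average of the per-trade values, which keeps the comparison against SPY simple.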

In [1]:
%load_ext autoreload
%autoreload 1
%aimport my_functions

import datetime as dt
import numpy as np
import pandas as pd
import xgboost as xgb
import tensorflow as tf

strptime = dt.datetime.strptime
strftime = dt.date.strftime

from my_functions import *

In [2]:
DAYS_TO_LOOK_FORWARD = 5  # for computing best price increase

In [3]:
historicDat = my_misc.load_obj('data/historicDat')

In [4]:
test_XY, test_X, test_Y = my_model_prep.returnXandY(pd.read_csv('data/test.csv'))

In [5]:
# Use our trained model to generate price-change predictions for the test set.
# nn_model = tf.keras.models.load_model('models/nn_model')  # neural network, not used in this demo
xgb_model = xgb.XGBRegressor()
xgb_model.load_model('models/xgb_model.json')

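# 'Unnamed: 0' is the leftover index column pandas creates when a CSV saved with its index is read back.
test_Y_preds = xgb_model.predict(test_X.drop('Unnamed: 0', axis=1))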

test_XY['XGB_Prediction'] = test_Y_preds
my_misc.save_obj(test_XY, 'data/test_XY')

In [9]:
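# FilingDate values are 'YYYY-MM-DD' strings, so min/max give the earliest and latest dates.
SIM_START_DATE = min(test_XY.FilingDate)
# Extend the end date by the look-forward window so the last purchases have time to sell.
last_filing = strptime(max(test_XY.FilingDate), '%Y-%m-%d')
SIM_END_DATE = strftime(last_filing + dt.timedelta(days=DAYS_TO_LOOK_FORWARD), '%Y-%m-%d')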

my_sims.runTradeSimulation(
    test_XY, 'XGB_Prediction', historicDat, SIM_START_DATE, SIM_END_DATE, buyThresh=10, sellThresh=10
)

Buying BSET on 2022-10-10, currently $15.76
Buying SRRK on 2022-10-10, currently $7.94
Buying RKT on 2022-10-10, currently $6.73
Buying HSON on 2022-10-10, currently $32.86
Buying DFCO on 2022-10-10, currently $0.14
Buying FARM on 2022-10-10, currently $4.79
Buying RKT on 2022-10-10, currently $6.73
Buying WAVD on 2022-10-10, currently $1.11
Buying TPL on 2022-10-10, currently $1999.79
Selling BSET on 2022-10-11, currently $17.83, for 13.13% profit
Buying OTLK on 2022-10-12, currently $1.24
Buying ANGO on 2022-10-12, currently $13.67
Buying CRDF on 2022-10-12, currently $1.56
Buying RCKT on 2022-10-12, currently $17.33
Buying CAMP on 2022-10-12, currently $3.62
Buying NILE on 2022-10-12, currently $0.18
Buying ESTE on 2022-10-12, currently $13.77
Buying ESTE on 2022-10-12, currently $13.77
Buying OTLK on 2022-10-12, currently $1.24
Buying RWLK on 2022-10-12, currently $0.8
Buying NREF on 2022-10-12, currently $15.59
Buying NREF on 2022-10-12, currently $15.59
Buying WAVD on 2022-10-12,

## Success!
From 2022-10-07 to 2022-10-25, the SPY ETF's price per share rose from \$362.79 to \$384.92, a 6.1% gain.

Over this time period, with just 4 days of buying, our algorithm outperforms SPY by ***3.2 percentage points***.
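
As a quick sanity check on those numbers, here's the arithmetic spelled out. The SPY prices and the 3.2-point margin come from the text above. The strategy's own return is not reported directly, so the last line only shows what it would be if the 3.2 points are measured against that same SPY gain (an assumption on my part).

```python
spy_start, spy_end = 362.79, 384.92  # SPY price per share on 2022-10-07 and 2022-10-25

# Percentage gain from buying SPY at the start and holding to the end.
spy_gain_pct = (spy_end / spy_start - 1.0) * 100.0

# Reported margin over SPY, in percentage points.
outperformance_pp = 3.2

# Implied strategy return, assuming the margin is relative to the SPY gain above.
implied_strategy_return_pct = spy_gain_pct + outperformance_pp

print(f"SPY gain: {spy_gain_pct:.1f}%")
print(f"Implied strategy return: {implied_strategy_return_pct:.1f}%")
```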

Looking forward, a good way to test our algorithm is to see if it still generates a profit during a period in which SPY declines.

### Here are a few thoughts I have for model improvement...
- **We need more data!** Perhaps 20,000 training examples isn't enough.
- **We need to be aware of current market trends.** The stock market is dynamic. Insider buys might signal different outcomes depending on how the economy is doing as a whole. We can do more to take this into account than just using the current SPY value.
- **For a neural model:** Perhaps classification is the move, instead of regression. We could also have more outputs. We can use the Keras Functional API to give the final layer several output heads with different activations, such as
    - a softmax output that places the expected X-day price increase into a *category*, and
    - a sigmoid output predicting *where* in the X-day window the max price will occur
- **For XGBoost:** Again, we might be better served by predicting price categories (e.g. 0-1%, 1-5%, etc.) instead of the actual price increase. This may help remove some of the extreme noise. (For example, a 20% run and a 40% run are both great things to identify, but a regression model gets penalized heavily if it predicts 40% for the first and 20% for the second!)
- **Other features:** I think that collecting Google Trends data regarding ticker search popularity could be helpful. Unfortunately, the Google Trends API rate limit of 10 requests/second is somewhat limiting.
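
The price-category idea from the list above could look something like this. It's only a sketch: the 0-1% and 1-5% buckets come from the example above, but the remaining bin edges, the labels, and the helper name `price_category` are hypothetical choices, not something tuned on our data.

```python
from bisect import bisect_right

# Lower edges of each bucket, in percent. Only 0-1% and 1-5% come from the text;
# the rest are placeholder choices for illustration.
BIN_EDGES = [0.0, 1.0, 5.0, 10.0, 20.0]
BIN_LABELS = ["<0%", "0-1%", "1-5%", "5-10%", "10-20%", "20%+"]


def price_category(pct_increase: float) -> str:
    """Map a percentage price increase to a coarse category label.

    Buckets include their lower edge and exclude their upper edge,
    so exactly 1.0 falls in "1-5%" and exactly 20.0 falls in "20%+".
    """
    return BIN_LABELS[bisect_right(BIN_EDGES, pct_increase)]


# A 20% run and a 40% run land in the same category, so mixing them up
# no longer counts as an error.
for pct in (-3.0, 0.5, 3.0, 20.0, 40.0):
    print(pct, "->", price_category(pct))
```

Labels like these could serve as targets for a softmax head in the neural model or for a multi-class XGBoost objective, at the cost of throwing away the exact size of the move.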

Also, we should keep in mind that individuals vary *a lot*. Some insiders do routine buys and don't try to be opportunistic, while others are the opposite. If we really wanted to go deep with our analysis, we might want to work with an LSTM framework that remembers particular insiders' decisions and their effectiveness.