## XGBoost Investment Strategy Demo

I'm a total novice when it comes to investing, and I'm not afraid to admit it! Luckily, designing a foolproof investment strategy wasn't the point of this project. The point was to see whether the model has learned to identify which insider trades are worth paying attention to!


#### However, to clearly demonstrate what has been learned, let's define a REALLY simple strategy:
- Knowing that our algorithm is conservative, we'll only put faith in high predictions: if an insider trade is filed for a stock that our algorithm predicts will rise by >10\%, purchase \\$1 of that stock at the next day's opening price.
- If, in the next 90 days, the value of our purchase is at least 15% higher at any closing, immediately sell. (Our algorithm is conservative enough that we've decided to be greedy, hoping to squeeze out a bit more gain than predicted.)

This strategy, of course, assumes that we successfully make each purchase right at the opening and each sell right at the closing.
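The two rules above can be sketched for a single trade as follows. This is only a minimal illustration, not the `runTradeSimulation` function from `my_functions`: the name `simulate_trade`, its parameters, and the choice to value an unsold position at the last close in the window are all assumptions made for this example. The caller is assumed to pass only the closing prices that fall inside the 90-day window.

```python
from typing import Optional, Sequence, Tuple


def simulate_trade(
    predicted_gain_pct: float,
    open_price: float,
    closes: Sequence[float],
    buy_thresh: float = 10.0,
    sell_thresh: float = 15.0,
) -> Optional[Tuple[Optional[int], float]]:
    """Apply the buy/sell rule to one flagged insider trade (hypothetical helper).

    closes: closing prices inside the 90-day window, starting on the purchase day.
    Returns None when no purchase is made, otherwise (sell_index, pct_return),
    where sell_index is None if the sell threshold was never reached.
    """
    # Rule 1: buy only when the prediction is strictly above the buy threshold.
    if predicted_gain_pct <= buy_thresh:
        return None
    if not closes:
        raise ValueError("need at least one closing price after the purchase")

    # Rule 2: sell at the first close that is at least sell_thresh percent
    # above the purchase (opening) price.
    for i, close in enumerate(closes):
        gain_pct = (close / open_price - 1.0) * 100.0
        if gain_pct >= sell_thresh:
            return i, gain_pct

    # Assumption (not stated in the strategy): a position that never hits the
    # threshold is valued at the last close in the window.
    return None, (closes[-1] / open_price - 1.0) * 100.0


# Purchase price and final close are taken from the SSNT lines in the log
# below; the two intermediate closes are made up for illustration.
sell_index, pct = simulate_trade(12.0, 12.18, [12.30, 12.90, 14.09])
print(f"sold at close #{sell_index} for {pct:.2f}% profit")
```

Because each purchase is a fixed \\$1, the percentage return of a trade is also its return in cents, which keeps the bookkeeping for a whole portfolio simple.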

In [1]:
%load_ext autoreload
%autoreload 1
%aimport my_functions

import datetime as dt
import numpy as np
import pandas as pd
import xgboost as xgb

from sklearn.metrics import mean_squared_error

from my_functions import *

In [2]:
cv_XY = load_obj('data/cv_XY')
historicDat = load_obj('data/historicDat')

In [3]:
runTradeSimulation(cv_XY, 'XGB_Prediction', historicDat, '2021-06-27', '2021-09-27', buyThresh=10, sellThresh=15)

Buying BILL on 2021-06-29, currently $187.0
Buying ASAN on 2021-06-29, currently $63.15
Buying TRVN on 2021-06-29, currently $1.86
Buying CVNA on 2021-06-29, currently $301.63
Buying CVNA on 2021-06-29, currently $301.63
Buying CVNA on 2021-06-29, currently $301.63
Buying CVNA on 2021-06-30, currently $301.67
Buying DCTH on 2021-06-30, currently $12.3
Buying TLRS on 2021-06-30, currently $0.22
Buying COUP on 2021-06-30, currently $269.51
Buying SSNT on 2021-07-01, currently $12.18
Buying CVNA on 2021-07-01, currently $300.0
Buying ZI on 2021-07-01, currently $52.0
Buying ASAN on 2021-07-01, currently $63.18
Buying ZI on 2021-07-01, currently $52.0
Buying SNRG on 2021-07-01, currently $0.3
Buying ETSY on 2021-07-01, currently $206.41
Buying UTHR on 2021-07-01, currently $179.97
Buying MRNA on 2021-07-01, currently $236.3
Selling SSNT on 2021-07-06, currently $14.09, for 15.68% profit
Selling MRNA on 2021-07-16, currently $286.43, for 21.21% profit
Selling ASAN on 2021-07-22, currently $

## Success!
From 2021-06-27 to 2021-09-27, the SPY ETF rose by 3.56%.

Over this time period, with just 4 days of buying, our algorithm outperforms SPY by nearly ***10 percentage points***.

This is incredible! It almost seems too good to be true...

A major caveat is that our cross-validation is being performed on trades from a time period when the market was red-hot. The market suffered a bit in early September 2021, but the months of June, July, and August were some of the most profitable months in recent memory, at least in terms of the S&P500.

In particular, we could get away with assuming that our algorithm's estimates would indeed be conservative!

A more meaningful test of our algorithm is to see whether it still generates a profit during a bearish market period. For example, let's gather data from 2022-03-29 to 2022-06-29, a period over which the value of SPY tumbled by 17.38%:

In [5]:
test_XY_incomplete, historicDat = cleanAndFormatDF(
    'data/sec4_Mar2022',
    'data/insiderDat_Mar2022_clean',
    'data/historicDat',
    newORload='load',
    startDate='2021-06-01',
    endDate='2022-08-01'
)

test_XY_unprepped = createAllFeatures(test_XY_incomplete, historicDat)
test_XY, test_X, test_Y = returnXandY(prepareForModel(test_XY_unprepped), '2022-03-29', '2022-04-04')

There are 537 unique tickers.
Getting historic data for these tickers...

Example ticker data for CMBM:
                 Open       High        Low      Close  Adj Close   Volume
Date                                                                      
2021-06-01  58.939999  59.250000  55.770000  57.500000  57.500000   219100
2021-06-02  54.075001  54.619999  48.009998  48.910000  48.910000   754300
2021-06-03  45.680000  46.610001  45.040001  45.619999  45.619999  1881600
2021-06-04  45.400002  48.932999  45.310001  48.150002  48.150002  1331000
2021-06-07  48.000000  50.418999  47.810001  49.689999  49.689999   479100
...               ...        ...        ...        ...        ...      ...
2022-05-24  13.350000  13.375000  12.640000  13.180000  13.180000   111600
2022-05-25  13.130000  13.560000  13.000000  13.340000  13.340000    96000
2022-05-26  13.260000  13.900000  13.140000  13.770000  13.770000   106600
2022-05-27  13.970000  14.750000  13.880000  14.660000  14.660000   112

In [8]:
xgb_model = xgb.XGBRegressor()
xgb_model.load_model('models/xgb_model.json')
test_Y_preds = xgb_model.predict(test_X)

test_XY['XGB_Prediction'] = test_Y_preds
save_obj(test_XY, 'data/test_XY')

In [9]:
# Note: sellThresh is 5 here, whereas the bull-market run above used 15.
runTradeSimulation(test_XY, 'XGB_Prediction', historicDat, '2022-03-29', '2022-06-29', buyThresh=10, sellThresh=5)

Buying CRM on 2022-03-30, currently $219.71
Buying STIM on 2022-03-30, currently $3.06
Buying WKHS on 2022-03-30, currently $4.95
Buying HLBZ on 2022-03-30, currently $3.24
Buying PQEFF on 2022-03-30, currently $0.35
Buying LCTX on 2022-03-30, currently $1.53
Buying BNED on 2022-03-31, currently $3.55
Buying TLRS on 2022-03-31, currently $0.28
Buying OGEN on 2022-03-31, currently $0.35
Buying LPTH on 2022-03-31, currently $2.04
Buying CLNN on 2022-03-31, currently $3.6
Selling PQEFF on 2022-03-30, currently $0.37, for 5.37% profit
Buying PMD on 2022-04-01, currently $6.9
Buying SPLP on 2022-04-01, currently $42.1
Buying CRM on 2022-04-01, currently $212.48
Buying IGXT on 2022-04-01, currently $0.26
Buying CRM on 2022-04-01, currently $212.48
Buying CREX on 2022-04-01, currently $0.85
Selling CLNN on 2022-03-31, currently $3.94, for 9.44% profit
Buying BRTX on 2022-04-04, currently $5.59
Buying SXT on 2022-04-04, currently $85.8
Buying MRNA on 2022-04-04, currently $177.24
Buying OGEN o

## We were successful in a bull market, but not so much in a bear market.
I mean, we did still outperform the S&P500 by 8 percentage points, but we can't exactly brag that we created a portfolio that lost money.

However...

(1) Remember that this is one of the most simplistic investment strategies imaginable.

(2) Luckily, I am not actually managing my money with this strategy.

(3) Strategy aside -- the fact that this didn't generalize perfectly gives me more ideas on how to make the model more robust! 

### Here are a few thoughts I have for model improvement...
- **We need more data!** 5000 training examples probably isn't enough.
- **Purchases are underrepresented.** Common wisdom says that investors may sell for any of a number of reasons, but they only buy for one reason. The data set has a lopsided majority of trades that are sales, which may cause our algorithms to miss out on important information to be gained from purchases.
- **We need to be aware of current market trends.** The stock market is dynamic. Insider trades might signal different outcomes depending on how the economy is doing as a whole. This should inform our choice of training set.
- **For a neural model:** We need to have more outputs! We could use the Keras Functional API to have different activations in the final layer, such as
    - a softmax output categorizing the max price increase in the next X days,
    - a linear output predicting the actual max price increase in the next X days,
    - a sigmoid output predicting *where* in the X-day window the max price will occur
- **For XGBoost:** We might be better served by predicting price categories (e.g. 0-5%, 5-10%, etc.) instead of the actual price increase. This may help remove some of the extreme noise. (For example, a 20% run and a 40% run are both great things to identify, but our model would think it performed poorly if it predicted 40% and 20%, respectively!)
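
To make the last idea concrete, here is a rough sketch of how continuous price increases could be turned into category labels. The 0-5% and 5-10% bins come from the bullet above; the remaining bin edges, the label strings, and the helper name `gain_to_category` are assumptions chosen purely for illustration.

```python
from bisect import bisect_right

# Bin edges in percent. Only 0-5% and 5-10% come from the text above;
# the other edges are hypothetical.
BIN_EDGES = [0.0, 5.0, 10.0, 20.0]
BIN_LABELS = ["<0%", "0-5%", "5-10%", "10-20%", ">=20%"]


def gain_to_category(gain_pct: float) -> int:
    """Map a percentage price increase to an integer class index.

    bisect_right counts how many edges are <= gain_pct, so a value sitting
    exactly on an edge falls into the higher bin (5.0 -> "5-10%").
    """
    return bisect_right(BIN_EDGES, gain_pct)


# The example from the bullet: actual runs of 20% and 40%, with the
# predictions swapped.
actual = [20.0, 40.0]
predicted = [40.0, 20.0]

# As a regression problem, the swapped predictions look terrible...
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# ...but as a classification problem, every prediction lands in the right bin.
hits = sum(gain_to_category(a) == gain_to_category(p) for a, p in zip(actual, predicted))

print(f"mean squared error: {mse}")
print(f"correct categories: {hits} of {len(actual)}")
```

Integer labels like these are the kind of target a multi-class classifier would be trained on, in place of the regressor used in this notebook. Where to put the bin edges is itself a modeling choice that would need to be validated.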