## XGBoost Investment Strategy Demo

I'm new to investing, and I'm not afraid to admit it! Luckily, designing a foolproof investment strategy wasn't the point of this project. The point was to see whether the model has learned to identify which insider trades are worth paying attention to.


#### However, to clearly demonstrate what has been learned, let's define a REALLY simple strategy:
- Knowing that our algorithm is conservative, we'll only put faith in high predictions: if an insider trade is filed and our algorithm predicts the stock will rise by >10\%, purchase \\$1 of it at the next day's opening.
- If, at any closing in the next 90 days, the value of our purchase is at least 15\% higher, immediately sell. (Our algorithm is conservative enough that we've decided to be greedy, hoping to squeeze out a bit more gain than predicted.)
- Anything that hasn't been sold by the last day of the simulation is sold at that day's closing.

This strategy, of course, assumes that we successfully make each purchase right at the opening and each sale right at the closing.
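To make the rule concrete, here is a minimal sketch of it applied to a single filing. The function name, its arguments, and the prices are all hypothetical and only for illustration; this is not the simulation code used in this notebook. It uses "at least" for the sell target, as in the description above.

```python
import pandas as pd

def simulate_single_trade(prediction, opens, closes, buy_thresh=10, sell_thresh=15):
    """Apply the buy/sell rule to one filing (illustrative helper, not project code).

    `opens` and `closes` are Series of prices, one entry per trading day,
    starting on the day after the filing. Returns the fractional gain on
    one dollar invested, or None if no purchase is made.
    """
    if prediction < buy_thresh:
        return None  # prediction too low: skip this filing

    buy_price = opens.iloc[0]  # buy at the next day's opening
    target = buy_price * (1 + sell_thresh / 100)

    hits = closes[closes >= target]  # closings at or above the sell target
    # Sell at the first such closing; otherwise sell at the final closing.
    sell_price = hits.iloc[0] if not hits.empty else closes.iloc[-1]
    return (sell_price - buy_price) / buy_price

# Made-up prices: bought at 100, first closing at or above 115 is 116.
opens = pd.Series([100.0, 102.0, 110.0, 118.0])
closes = pd.Series([101.0, 108.0, 116.0, 120.0])
gain = simulate_single_trade(12.0, opens, closes)
print(f'gain on one dollar: {gain:.2f}')
```

With a prediction below the buy threshold the function returns `None`, and if the target is never reached the position is closed at the last available closing price, which can be a loss.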

In [121]:
import datetime as dt
import numpy as np
import pandas as pd
import nbimporter
import sys
import xgboost as xgb

from sklearn.metrics import mean_squared_error

# Drop any cached copies of these modules so the imports below load them fresh.
mods = ['exploratory_analysis', 'create_all_features', 'prep_and_split_data']
for mod in mods:
    sys.modules.pop(mod, None)

from exploratory_analysis import save_obj, load_obj, returnDataOnDate, cleanAndFormatDF
from create_all_features import createAllFeatures
from prep_and_split_data import prepareForModel, returnXandY

In [6]:
cv_XY = load_obj('data/cv_XY')

In [7]:
historicDat = load_obj('data/historicDat')

In [38]:
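def runTradeSimulation(data_XY, historicDat, startDate, endDate, buyThresh, sellThresh):
    """Simulate the simple strategy between startDate and endDate ('YYYY-MM-DD' strings).

    Buys $1 of a stock at the next opening after any filing whose 'Prediction' is at
    least buyThresh (percent), and sells at the first closing more than sellThresh
    (percent) above the purchase price. Prints each trade and a final summary.
    """
    purchasesDict = {}  # ticker -> {'BuyPrice': [...], 'SellPrice': [...]}
    totalInvested = 0   # dollars invested ($1 per purchase)
    totalProfit = 0     # sum of fractional gains on the $1 purchases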

    for d in pd.date_range(start=startDate, end=endDate):
        currentDate = d.strftime('%Y-%m-%d')

        for tradeNum, trade in data_XY[data_XY['FilingDate'] == d.date()].iterrows():
            # Check prediction. If high enough, purchase at next day's opening.
            if trade['Prediction'] < buyThresh:
                continue

            tick = trade['Ticker']

            buyPrice, buyDate = returnDataOnDate(historicDat, tick, currentDate, delta=1, dataName='Open')
            buyDate = dt.date.strftime(buyDate, '%Y-%m-%d')

            totalInvested += 1

            print(f'''Buying {tick} on {buyDate}, currently ${round(buyPrice, 2)}''')

            if tick in purchasesDict.keys():
                purchasesDict[tick]['BuyPrice'].append(buyPrice)
                purchasesDict[tick]['SellPrice'].append(None)
            else:
                purchasesDict[tick] = {'BuyPrice': [buyPrice], 'SellPrice': [None]}


        # Check current already-purchased stocks. If value has risen enough, sell at closing.
        isLastDay = d.date() == dt.datetime.strptime(endDate, '%Y-%m-%d').date()
        for tick, elem in purchasesDict.items():
            try:
                currentPrice = historicDat[tick].loc[currentDate]['Close']
            except KeyError:
                continue  # no closing price for this ticker today; move on

            for buyNum, buyPrice in enumerate(elem['BuyPrice']):
                if elem['SellPrice'][buyNum] is not None:
                    continue  # already sold

                hitTarget = currentPrice > (1. + sellThresh/100)*buyPrice
                if hitTarget or isLastDay:
                    # Sell on reaching the target, or liquidate on the last day,
                    # and book the gain or loss either way.
                    elem['SellPrice'][buyNum] = currentPrice
                    profit = (currentPrice-buyPrice) / buyPrice
                    totalProfit += profit
                    print(f'Selling {tick} on {currentDate}, currently ${round(currentPrice, 2)}, ' +
                          f'a {round(100*profit, 2)}% profit')


    print('\n-----------------------------------------\n')

    # Determine total profit in the given time period.
    if totalInvested == 0:
        print('No predictions met the buy threshold; nothing was invested.')
        return

    print(f'We invested ${totalInvested} and earned ${round(totalProfit, 2)} for a return of ' +
          f'{round(100*totalProfit/totalInvested, 2)}%.')
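
A note on reading the summary line: every purchase is exactly one dollar, so `totalInvested` is just the number of purchases, and each sale that gets booked adds its fractional gain `(sell - buy) / buy` to `totalProfit`. A small worked example of that arithmetic, using made-up prices rather than anything from the data:

```python
# Hypothetical (buy, sell) price pairs for three one-dollar purchases.
trades = [(50.0, 60.0), (200.0, 230.0), (10.0, 12.0)]

total_invested = len(trades)  # one dollar per purchase
total_profit = sum((sell - buy) / buy for buy, sell in trades)

return_pct = 100 * total_profit / total_invested
print(f'Invested ${total_invested}, earned ${total_profit:.2f}, return {return_pct:.2f}%')
```

Because each position has the same size, a cheap stock and an expensive one carry equal weight in the final return.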

In [39]:
runTradeSimulation(cv_XY, historicDat, '2021-06-27', '2021-09-27', 10, 15)

Buying BILL on 2021-06-29, currently $187.0
Buying ASAN on 2021-06-29, currently $63.15
Buying TRVN on 2021-06-29, currently $1.86
Buying GMS on 2021-06-29, currently $49.4
Buying CVNA on 2021-06-29, currently $301.63
Buying CVNA on 2021-06-29, currently $301.63
Buying CVNA on 2021-06-29, currently $301.63
Buying ZI on 2021-06-30, currently $53.0
Buying ZI on 2021-06-30, currently $53.0
Buying CVNA on 2021-06-30, currently $301.67
Buying UHAL on 2021-06-30, currently $587.37
Buying TLRS on 2021-06-30, currently $0.22
Buying ZI on 2021-06-30, currently $53.0
Buying ZI on 2021-06-30, currently $53.0
Buying CVNA on 2021-07-01, currently $300.0
Buying ZI on 2021-07-01, currently $52.0
Buying ZI on 2021-07-01, currently $52.0
Buying ZI on 2021-07-01, currently $52.0
Buying ZI on 2021-07-01, currently $52.0
Buying KOD on 2021-07-01, currently $95.05
Buying ZI on 2021-07-01, currently $52.0
Buying ASAN on 2021-07-01, currently $63.18
Buying ZI on 2021-07-01, currently $52.0
Buying ETSY on 202

In [None]:
test_XY_incomplete, historicDat_temp = cleanAndFormatDF(
    'data/sec4_Mar2022', 
    'data/insiderDat_Mar2022_clean', 
    'data/historicDat',                                      
    newORload='load', 
    startDate='2021-06-01',
    endDate='2022-08-01'
)

historicDat.update(historicDat_temp)
save_obj(historicDat, 'data/historicDat')

test_XY_unprepped = createAllFeatures(test_XY_incomplete, historicDat, '2022-03-28', '2022-04-04')
test_XY, test_X, test_Y = returnXandY(prepareForModel(test_XY_unprepped), '2022-03-29', '2022-04-04')

In [118]:
xgb_model = xgb.XGBRegressor()
xgb_model.load_model('models/xgb_model.json')
test_Y_preds = xgb_model.predict(test_X)

test_XY['Prediction'] = test_Y_preds
save_obj(test_XY, 'data/test_XY')

print('test MSE: ', mean_squared_error(test_Y, test_Y_preds))

test MSE:  126.51099537834722
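
MSE is in squared units of the target, so its square root is easier to compare against the 10 and 15 thresholds used in the simulation. Assuming the target is a percent price change (which the thresholds suggest, but which is an assumption here), a quick conversion looks like this:

```python
import math

test_mse = 126.51099537834722  # the test MSE printed above
test_rmse = math.sqrt(test_mse)
print(f'test RMSE: {test_rmse:.2f}')
```

Under that assumption, a typical prediction error is on the order of 11 percentage points, which is comparable in size to the buy threshold itself.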


In [120]:
runTradeSimulation(test_XY, historicDat, '2022-03-29', '2022-06-29', buyThresh=10, sellThresh=15)

Buying CRM on 2022-03-30, currently $219.71
Buying SXT on 2022-03-30, currently $86.66
Buying STIM on 2022-03-30, currently $3.06
Buying MORN on 2022-03-30, currently $280.49
Buying LLY on 2022-03-30, currently $290.58
Buying BRTX on 2022-03-30, currently $5.4
Buying HLBZ on 2022-03-30, currently $3.24
Buying PQEFF on 2022-03-30, currently $0.35
Buying LCTX on 2022-03-30, currently $1.53
Buying CRM on 2022-03-31, currently $214.5
Buying NSYS on 2022-03-31, currently $10.15
Buying PQEFF on 2022-03-31, currently $0.38
Buying LPTH on 2022-03-31, currently $2.04
Buying LLY on 2022-03-31, currently $289.37
Buying JHG on 2022-04-01, currently $35.67
Buying JHG on 2022-04-01, currently $35.67
Buying SPLP on 2022-04-01, currently $42.1
Buying SPLP on 2022-04-01, currently $42.1
Buying CRM on 2022-04-01, currently $212.48
Buying HGBL on 2022-04-01, currently $1.36
Buying LLY on 2022-04-01, currently $286.15
Buying BTTX on 2022-04-01, currently $2.04
Buying SEEL on 2022-04-01, currently $0.86
Bu