# Predicting the movement of video game stocks using supervised learning

Here is a quick primer on the case for owning video game stocks:

- Highly profitable businesses
- Produce a product that is extremely sticky (read: addictive)
- Companies have a considerable moat due to the increasing cost of producing AAA ("triple-A") games
- The overall market is expected to grow at about 5-6% per year until 2021.

Further to that, the games produced by these companies are increasingly played in public, thanks in part to modern streaming platforms like YouTube and Twitch. The games also receive significant coverage and discussion on public forums like Twitter.

These stocks are among the first where usage of the company's product can be estimated, if not accurately tracked, in real time. I suspect that this offers an opportunity to make informed decisions about owning and trading these companies, and that it probably also affects the price behavior of these stocks.

This capstone will focus on the second part of that thesis and aim to predict increases in stock price from available market indicators. Because of the large amount of data in this domain, we will initially build a model on one indicator company and see whether it can be applied to other companies: first without retraining and, if required, with retraining.

<h1><center> Companies of interest: </center></h1>

| Company     | Market Cap (in $, rounded) | Country of Origin         |
|-------------|----------------------------|---------------------------|
| EA          | 28 Billion                 | US                        |
| Take Two    | 11 Billion                 | US                        |
| Activision  | 36 Billion                 | US                        |
| Ubisoft     | 9 Billion                  | France/Operates worldwide |
| Square Enix | 4 Billion                  | Japan                     |
| Nintendo    | 45 Billion                 | Japan                     |
| Konami      | 7 Billion                  | Japan                     |
| Capcom      | 3 Billion                  | Japan                     |


From this list, Activision is a fairly representative company, so let's start by generating features for it and modelling it. The same features will be used for all companies.

In [3]:
import time
import random
import warnings
from collections import Counter

import pandas as pd
import numpy as np
current_state = np.random.get_state()
np.random.set_state(current_state)

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
%matplotlib inline

from scipy import stats
from scipy.stats import ttest_ind
from IPython.display import display

from sklearn import ensemble, linear_model, metrics, model_selection, svm
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import (roc_curve, precision_recall_curve, auc, make_scorer, recall_score,
                             accuracy_score, precision_score, confusion_matrix, classification_report)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.neighbors import KNeighborsClassifier

warnings.filterwarnings("ignore")

In [83]:
atvi = pd.read_csv('ATVI.csv')

In [84]:
atvi.head()

| | date | open | high | low | close | adj_close | volume | change |
|---|---|---|---|---|---|---|---|---|
| 0 | 4/24/2014 | 19.82 | 19.99 | 19.43 | 19.57 | 18.849047 | 2326500 | -0.25 |
| 1 | 4/25/2014 | 19.440001 | 19.530001 | 19.25 | 19.459999 | 18.743099 | 2934700 | 0.019998 |
| 2 | 4/28/2014 | 19.639999 | 19.639999 | 18.889999 | 19.33 | 18.617887 | 6142300 | -0.309999 |
| 3 | 4/29/2014 | 19.389999 | 19.700001 | 19.290001 | 19.690001 | 18.964626 | 4191900 | 0.300002 |
| 4 | 4/30/2014 | 19.59 | 20.040001 | 19.450001 | 20.01 | 19.272837 | 5563400 | 0.42 |
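The `change` column is not documented in the source data. A quick check on the five rows printed above suggests it is the intraday move, `close - open`, rather than the close-to-close change. This matters for the RSI below, whose textbook definition uses close-to-close changes. The sketch simply re-types the values shown above; it is not a check on the full file.

```python
import numpy as np
import pandas as pd

# The five rows printed by atvi.head() above, copied by hand
head = pd.DataFrame({
    "open":   [19.82, 19.440001, 19.639999, 19.389999, 19.59],
    "close":  [19.57, 19.459999, 19.33, 19.690001, 20.01],
    "change": [-0.25, 0.019998, -0.309999, 0.300002, 0.42],
})

# Intraday move versus the close-to-close difference
intraday = head["close"] - head["open"]
close_to_close = head["close"].diff()

print(np.allclose(intraday, head["change"], atol=1e-6))                          # True
print(np.allclose(close_to_close.iloc[1:], head["change"].iloc[1:], atol=1e-6))  # False
```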


In [85]:
print(atvi.describe())

| | open | high | low | close | adj_close | volume | change |
|---|---|---|---|---|---|---|---|
| count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1.258000e+03 | 1258.000000 |
| mean | 43.923013 | 44.451208 | 43.336161 | 43.914841 | 43.188019 | 7.889920e+06 | -0.008172 |
| std | 18.245345 | 18.474054 | 17.973257 | 18.227464 | 18.253969 | 6.193421e+06 | 0.852801 |
| min | 18.000000 | 18.740000 | 17.730000 | 18.209999 | 17.539148 | 1.369100e+06 | -5.119995 |
| 25% | 25.889999 | 26.092500 | 25.662500 | 25.942501 | 25.242547 | 5.025075e+06 | -0.329999 |
| 50% | 40.465000 | 40.760000 | 40.039999 | 40.489999 | 39.762289 | 6.528100e+06 | 0.010002 |
| 75% | 61.442500 | 62.017499 | 60.709999 | 61.377501 | 60.576089 | 8.904475e+06 | 0.367499 |
| max | 84.180000 | 84.680000 | 82.739998 | 83.389999 | 82.725464 | 1.330824e+08 | 4.620003 |


The dataset is simple and deep: there are many samples (1,258 trading days) but few variables. We're going to engineer additional variables, technical indicators that traders commonly use as signals of price movement. All of them can be calculated from the variables we already have.

# Feature engineering

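We will start with the following variables. Where a look-back window is used, it is 14 days unless stated otherwise. Below, $C_t$ is the closing price on day $t$, $V_t$ the volume, and $L_{14}$ and $H_{14}$ the lowest low and highest high over the last 14 days.

Relative Strength Index:

$$RSI = 100 - \frac{100}{1 + RS}, \qquad RS = \frac{\text{average gain over the last 14 days}}{\text{average loss over the last 14 days}}$$

Stochastic Oscillator:

$$SO = 100 \times \frac{C_t - L_{14}}{H_{14} - L_{14}}$$

Price Rate of Change:

$$ROC_t = \frac{C_t - C_{t-14}}{C_{t-14}}$$

Moving Average Convergence Divergence (12- and 26-day EMAs of the close, with a 9-day signal line):

$$MACD_t = EMA_{12}(C)_t - EMA_{26}(C)_t, \qquad \text{signal}_t = EMA_{9}(MACD)_t$$

On Balance Volume:

$$OBV_t = OBV_{t-1} + \begin{cases} V_t & \text{if } C_t > C_{t-1} \\ -V_t & \text{if } C_t < C_{t-1} \\ 0 & \text{otherwise} \end{cases}$$

Williams %R:

$$\%R = -100 \times \frac{H_{14} - C_t}{H_{14} - L_{14}}$$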


In [96]:
#MAKE RSI VARIABLE
#note: 'change' in this dataset appears to be the intraday move (close - open);
#the textbook RSI uses close-to-close changes, i.e. atvi['close'].diff()

#split the daily change into gains and losses (0 on days without one)
atvi['gain'] = atvi['change'][atvi['change'] > 0]
atvi['loss'] = atvi['change'][atvi['change'] < 0]
atvi['gain'] = atvi['gain'].fillna(0)
atvi['loss'] = atvi['loss'].fillna(0)

#store losses as positive magnitudes
atvi['loss'] = -atvi['loss']

#setup window for calculating RSI
window = 14

#average gain and average loss over the last 14 days
atvi['rsmeanloss_14 days'] = atvi['loss'].rolling(window, min_periods=1).mean()
atvi['rsmeangain_14 days'] = atvi['gain'].rolling(window, min_periods=1).mean()

#relative strength: average gain / average loss
atvi['rs'] = atvi['rsmeangain_14 days'] / atvi['rsmeanloss_14 days']

#create RSI
atvi['rsi'] = 100 - 100/(1+atvi['rs'])
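A reusable version of the cell above, written as a function so the same feature can be computed for every company. This is a sketch under stated assumptions, and the function name `rsi` is hypothetical. It uses close-to-close changes (`close.diff()`), as in the usual definition of RSI, rather than the dataset's `change` column, which (judging by the rows shown earlier) equals close minus open. Like the cell above, it averages with a simple rolling mean rather than Wilder's exponential smoothing, but it returns NaN until a full 14-day window is available instead of using `min_periods=1`.

```python
import numpy as np
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index from close-to-close changes (simple rolling means)."""
    delta = close.diff()
    gain = delta.clip(lower=0)          # positive moves, 0 otherwise
    loss = (-delta).clip(lower=0)       # size of negative moves, 0 otherwise
    avg_gain = gain.rolling(window, min_periods=window).mean()
    avg_loss = loss.rolling(window, min_periods=window).mean()
    rs = avg_gain / avg_loss            # inf when there were no losses, giving an RSI of 100
    return 100 - 100 / (1 + rs)

# Alternating +1 / -1 moves: equal average gain and loss, so RSI settles at 50
zigzag = pd.Series([10.0, 11.0] * 15)
print(rsi(zigzag).iloc[14:].unique())   # [50.]

# A steadily rising price has no losses, so RSI is pinned at 100
rising = pd.Series(np.arange(1.0, 31.0))
print(rsi(rising).iloc[-1])             # 100.0
```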

In [107]:
#MAKE Stochastic Oscillator variable

atvi['14daylowestlow'] = atvi['low'].rolling(window).min()
atvi['14dayhighesthigh'] = atvi['high'].rolling(window).max()

atvi['so'] = 100 * (atvi['close'] - atvi['14daylowestlow'])/(atvi['14dayhighesthigh'] - atvi['14daylowestlow'])
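The same calculation as a function, offered as a sketch for reuse across the other companies (the name `stochastic_oscillator` is hypothetical). The small hand-made frame shows the arithmetic with a 3-day window: on the third day the window's lowest low is 7 and its highest high is 12, so a close of 8 gives 100 × (8 - 7) / (12 - 7) = 20.

```python
import pandas as pd

def stochastic_oscillator(df: pd.DataFrame, window: int = 14) -> pd.Series:
    """Where today's close sits within the window's low-high range, in percent."""
    lowest_low = df["low"].rolling(window).min()
    highest_high = df["high"].rolling(window).max()
    return 100 * (df["close"] - lowest_low) / (highest_high - lowest_low)

# Hand-made prices, not ATVI data
toy = pd.DataFrame({
    "high":  [10.0, 12.0, 11.0, 13.0],
    "low":   [8.0, 9.0, 7.0, 10.0],
    "close": [9.0, 11.0, 8.0, 12.0],
})
so = stochastic_oscillator(toy, window=3)
print(so)  # first two rows are NaN: no full 3-day window yet
```

If the high and low over a window are equal (a completely flat stretch), the denominator is zero, so the result is undefined there.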

In [109]:
#make Williams %R

atvi['williams_R']= (atvi['14dayhighesthigh']-atvi['close'])/(atvi['14dayhighesthigh'] - atvi['14daylowestlow']) * -100
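Williams %R and the stochastic oscillator are built from the same three quantities, and one line of algebra shows that %R = SO - 100:

$$SO - 100 = 100 \times \frac{(C_t - L_{14}) - (H_{14} - L_{14})}{H_{14} - L_{14}} = -100 \times \frac{H_{14} - C_t}{H_{14} - L_{14}} = \%R$$

The two columns are therefore perfectly correlated, which is worth keeping in mind when both are given to the same model (for example, when interpreting linear-model coefficients). A quick numerical check on random synthetic prices, not ATVI data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
close = 50 + rng.normal(0, 1, 60).cumsum()   # synthetic random-walk closes
df = pd.DataFrame({
    "close": close,
    "high": close + rng.uniform(0, 1, 60),   # high at or above the close
    "low": close - rng.uniform(0, 1, 60),    # low at or below the close
})

window = 14
lowest_low = df["low"].rolling(window).min()
highest_high = df["high"].rolling(window).max()
so = 100 * (df["close"] - lowest_low) / (highest_high - lowest_low)
williams_r = -100 * (highest_high - df["close"]) / (highest_high - lowest_low)

# Identical up to floating-point rounding wherever both are defined
print(np.allclose(williams_r.dropna(), (so - 100).dropna()))  # True
```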

In [139]:
#calculate the 14-day price rate of change
#the first 14 rows have no close from 14 days earlier, so they are set to 0
roc = [0.0] * 14
for i in range(14, len(atvi['close'])):
    roc.append((atvi['close'].iloc[i] - atvi['close'].iloc[i-14]) / atvi['close'].iloc[i-14])

atvi['roc'] = roc
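A vectorised sketch of the same 14-day rate of change using `Series.shift`, which avoids the Python loop. Rows without a price 14 days earlier come out as NaN; the notebook's convention of filling them with 0 is kept here via `fillna(0)`, although leaving them as NaN and dropping them before modelling is arguably safer, since 0 is also a genuine ROC value. The helper name `rate_of_change` is hypothetical, and the comparison uses synthetic prices.

```python
import numpy as np
import pandas as pd

def rate_of_change(close: pd.Series, periods: int = 14) -> pd.Series:
    """(C_t - C_{t-periods}) / C_{t-periods}, with the first `periods` rows filled with 0."""
    past = close.shift(periods)
    return ((close - past) / past).fillna(0)

# A loop equivalent to the cell above, for comparison on synthetic prices
close = pd.Series(np.linspace(20.0, 40.0, 50))
loop = [0.0] * 14 + [(close.iloc[i] - close.iloc[i - 14]) / close.iloc[i - 14]
                     for i in range(14, len(close))]
print(np.allclose(rate_of_change(close), loop))  # True
```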

In [158]:
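#exponential moving average
def ema(data, period=0, column='<CLOSE>'):
    # adds an 'ema<period>' column in place; note that com=period means
    # alpha = 1/(1 + period), whereas the conventional N-day EMA uses span=N (alpha = 2/(N + 1))
    data['ema' + str(period)] = data[column].ewm(ignore_na=False, min_periods=period, com=period, adjust=True).mean()

    return data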

ema(atvi, period=10, column='close')
ema(atvi, period=14, column='close')
ema(atvi, period=26, column='close')
ema(atvi, period=12, column='close')

| | date | open | high | low | close | adj_close | volume | change | gain | loss | ... | so | williams_R | roc | obv | obv_ema21 | obv_ema14 | ema10 | ema14 | ema26 | ema12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/24/2014 | 19.820000 | 19.990000 | 19.430000 | 19.570000 | 18.849047 | 2326500 | -0.250000 | 0.000000 | 0.250000 | ... | NaN | NaN | 0.000000 | 2326500.0 | 2.326500e+06 | 2.326500e+06 | NaN | NaN | NaN | NaN |
| 1 | 4/25/2014 | 19.440001 | 19.530001 | 19.250000 | 19.459999 | 18.743099 | 2934700 | 0.019998 | 0.019998 | -0.000000 | ... | NaN | NaN | 0.000000 | -608200.0 | 8.250256e+05 | 8.085517e+05 | NaN | NaN | NaN | NaN |
| 2 | 4/28/2014 | 19.639999 | 19.639999 | 18.889999 | 19.330000 | 18.617887 | 6142300 | -0.309999 | 0.000000 | 0.309999 | ... | NaN | NaN | 0.000000 | -6750500.0 | -1.818489e+06 | -1.886831e+06 | NaN | NaN | NaN | NaN |
| 3 | 4/29/2014 | 19.389999 | 19.700001 | 19.290001 | 19.690001 | 18.964626 | 4191900 | 0.300002 | 0.300002 | -0.000000 | ... | NaN | NaN | 0.000000 | -2558600.0 | -2.016621e+06 | -2.072532e+06 | NaN | NaN | NaN | NaN |
| 4 | 4/30/2014 | 19.590000 | 20.040001 | 19.450001 | 20.010000 | 19.272837 | 5563400 | 0.420000 | 0.420000 | -0.000000 | ... | NaN | NaN | 0.000000 | 3004800.0 | -9.167949e+05 | -9.123479e+05 | NaN | NaN | NaN | NaN |
| 5 | 5/1/2014 | 19.969999 | 20.170000 | 19.850000 | 19.969999 | 19.234310 | 3004000 | 0.000000 | 0.000000 | -0.000000 | ... | NaN | NaN | 0.000000 | 800.0 | -7.455418e+05 | -7.327556e+05 | NaN | NaN | NaN | NaN |
| 6 | 5/2/2014 | 20.270000 | 20.309999 | 19.870001 | 19.940001 | 19.205418 | 5981000 | -0.329999 | 0.000000 | 0.329999 | ... | NaN | NaN | 0.000000 | -5980200.0 | -1.601638e+06 | -1.646055e+06 | NaN | NaN | NaN | NaN |
| 7 | 5/5/2014 | 19.889999 | 19.900000 | 19.330000 | 19.420000 | 18.704573 | 8977300 | -0.469999 | 0.000000 | 0.469999 | ... | NaN | NaN | 0.000000 | -14957500.0 | -3.555210e+06 | -3.738210e+06 | NaN | NaN | NaN | NaN |
| 8 | 5/6/2014 | 19.440001 | 19.490000 | 19.120001 | 19.309999 | 18.598623 | 8440300 | -0.130002 | 0.000000 | 0.130002 | ... | NaN | NaN | 0.000000 | -23397800.0 | -6.191791e+06 | -6.571665e+06 | NaN | NaN | NaN | NaN |
| 9 | 5/7/2014 | 20.170000 | 21.059999 | 20.059999 | 21.010000 | 20.235998 | 20043000 | 0.840000 | 0.840000 | -0.000000 | ... | NaN | NaN | 0.000000 | -3354800.0 | -5.845132e+06 | -6.141362e+06 | 19.835915 | NaN | NaN | NaN |
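A note on the smoothing parameter used in `ema` above: pandas' `ewm(com=c)` uses a decay factor α = 1/(1 + c), while the conventional N-day EMA (the one usually meant by MACD's 12/26/9 settings) corresponds to `span=N`, i.e. α = 2/(N + 1). So `ema(..., period=10)` smooths like a 21-day EMA in the conventional sense. The check below, on synthetic data, shows the equivalences:

```python
import numpy as np
import pandas as pd

prices = pd.Series(np.random.default_rng(1).normal(30, 2, 100))  # synthetic prices

com10 = prices.ewm(com=10, adjust=True).mean()
alpha = prices.ewm(alpha=1 / 11, adjust=True).mean()    # alpha = 1 / (1 + com)
span21 = prices.ewm(span=21, adjust=True).mean()        # alpha = 2 / (21 + 1) = 1/11
span10 = prices.ewm(span=10, adjust=True).mean()        # alpha = 2 / 11: the usual "10-day EMA"

print(np.allclose(com10, alpha), np.allclose(com10, span21))  # True True
print(np.allclose(com10, span10))                             # False
```

Whether to switch to `span` is a modelling choice; the `com` version is still a valid smoother, just a slower one than its label suggests.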


In [159]:
#moving average convergence divergence

def macd(data, period_long=26, period_short=12, period_signal=9, column='<CLOSE>'):
    # compute any EMA that is not already present (from `column`), and drop it again afterwards
    remove_cols = []
    if 'ema' + str(period_long) not in data.columns:
        data = ema(data, period_long, column)
        remove_cols.append('ema' + str(period_long))

    if 'ema' + str(period_short) not in data.columns:
        data = ema(data, period_short, column)
        remove_cols.append('ema' + str(period_short))

    # MACD line: short-period EMA minus long-period EMA
    data['macd_val'] = data['ema' + str(period_short)] - data['ema' + str(period_long)]
    # signal line: EMA of the MACD line
    data['macd_signal_line'] = data['macd_val'].ewm(ignore_na=False, min_periods=0, com=period_signal, adjust=True).mean()

    data = data.drop(remove_cols, axis=1)

    return data

macd(atvi, period_long=26, period_short=12, period_signal=9, column='close')

| | date | open | high | low | close | adj_close | volume | change | gain | loss | ... | roc | obv | obv_ema21 | obv_ema14 | ema10 | ema14 | ema26 | ema12 | macd_val | macd_signal_line |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/24/2014 | 19.820000 | 19.990000 | 19.430000 | 19.570000 | 18.849047 | 2326500 | -0.250000 | 0.000000 | 0.250000 | ... | 0.000000 | 2326500.0 | 2.326500e+06 | 2.326500e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 4/25/2014 | 19.440001 | 19.530001 | 19.250000 | 19.459999 | 18.743099 | 2934700 | 0.019998 | 0.019998 | -0.000000 | ... | 0.000000 | -608200.0 | 8.250256e+05 | 8.085517e+05 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 4/28/2014 | 19.639999 | 19.639999 | 18.889999 | 19.330000 | 18.617887 | 6142300 | -0.309999 | 0.000000 | 0.309999 | ... | 0.000000 | -6750500.0 | -1.818489e+06 | -1.886831e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 4/29/2014 | 19.389999 | 19.700001 | 19.290001 | 19.690001 | 18.964626 | 4191900 | 0.300002 | 0.300002 | -0.000000 | ... | 0.000000 | -2558600.0 | -2.016621e+06 | -2.072532e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 4/30/2014 | 19.590000 | 20.040001 | 19.450001 | 20.010000 | 19.272837 | 5563400 | 0.420000 | 0.420000 | -0.000000 | ... | 0.000000 | 3004800.0 | -9.167949e+05 | -9.123479e+05 | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 5/1/2014 | 19.969999 | 20.170000 | 19.850000 | 19.969999 | 19.234310 | 3004000 | 0.000000 | 0.000000 | -0.000000 | ... | 0.000000 | 800.0 | -7.455418e+05 | -7.327556e+05 | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | 5/2/2014 | 20.270000 | 20.309999 | 19.870001 | 19.940001 | 19.205418 | 5981000 | -0.329999 | 0.000000 | 0.329999 | ... | 0.000000 | -5980200.0 | -1.601638e+06 | -1.646055e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | 5/5/2014 | 19.889999 | 19.900000 | 19.330000 | 19.420000 | 18.704573 | 8977300 | -0.469999 | 0.000000 | 0.469999 | ... | 0.000000 | -14957500.0 | -3.555210e+06 | -3.738210e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 5/6/2014 | 19.440001 | 19.490000 | 19.120001 | 19.309999 | 18.598623 | 8440300 | -0.130002 | 0.000000 | 0.130002 | ... | 0.000000 | -23397800.0 | -6.191791e+06 | -6.571665e+06 | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | 5/7/2014 | 20.170000 | 21.059999 | 20.059999 | 21.010000 | 20.235998 | 20043000 | 0.840000 | 0.840000 | -0.000000 | ... | 0.000000 | -3354800.0 | -5.845132e+06 | -6.141362e+06 | 19.835915 | NaN | NaN | NaN | NaN | NaN |
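For comparison, a minimal sketch of MACD with conventional span-based EMAs (12- and 26-day for the MACD line, 9-day for the signal line). This is an alternative parameterisation, not what the cell above computes, since that cell uses `com`; the function name `macd_span` is hypothetical. Two sanity checks: for a constant price both EMAs equal the price, so MACD and its signal line are zero (up to rounding); for a steadily rising price the short EMA sits above the long one, so MACD is positive.

```python
import numpy as np
import pandas as pd

def macd_span(close: pd.Series, short: int = 12, long: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line (short EMA - long EMA) and its signal line, using span-based EMAs."""
    ema_short = close.ewm(span=short, adjust=True).mean()
    ema_long = close.ewm(span=long, adjust=True).mean()
    macd_line = ema_short - ema_long
    signal_line = macd_line.ewm(span=signal, adjust=True).mean()
    return pd.DataFrame({"macd_val": macd_line, "macd_signal_line": signal_line})

# Synthetic prices, not ATVI data
flat = macd_span(pd.Series([25.0] * 60))
rising = macd_span(pd.Series(np.linspace(20.0, 40.0, 60)))
print(np.allclose(flat.to_numpy(), 0.0))        # True
print((rising["macd_val"].iloc[1:] > 0).all())  # True
```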


In [152]:
#on balance volume: add the day's volume on an up-close, subtract it on a down-close

def on_balance_volume(data, trend_periods=14, close_col='<CLOSE>', vol_col='<VOL>'):
    data['obv'] = 0.0
    for index, row in data.iterrows():
        if index > 0:
            last_obv = data.at[index - 1, 'obv']
            if row[close_col] > data.at[index - 1, close_col]:
                current_obv = last_obv + row[vol_col]
            elif row[close_col] < data.at[index - 1, close_col]:
                current_obv = last_obv - row[vol_col]
            else:
                current_obv = last_obv
        else:
            # the first day has no previous close, so OBV starts at that day's volume
            current_obv = row[vol_col]

        # DataFrame.set_value was removed in pandas 1.0; .at is the replacement
        data.at[index, 'obv'] = current_obv

    # smoothed OBV trend
    data['obv_ema' + str(trend_periods)] = data['obv'].ewm(ignore_na=False, min_periods=0, com=trend_periods, adjust=True).mean()

    return data

on_balance_volume(atvi, close_col='close', vol_col='volume')

| | date | open | high | low | close | adj_close | volume | change | gain | loss | ... | 14daymax | 14daymin | 14daylowestlow | 14dayhighesthigh | so | williams_R | roc | obv | obv_ema21 | obv_ema14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/24/2014 | 19.820000 | 19.990000 | 19.430000 | 19.570000 | 18.849047 | 2326500 | -0.250000 | 0.000000 | 0.250000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 2326500.0 | 2.326500e+06 | 2.326500e+06 |
| 1 | 4/25/2014 | 19.440001 | 19.530001 | 19.250000 | 19.459999 | 18.743099 | 2934700 | 0.019998 | 0.019998 | -0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -608200.0 | 8.250256e+05 | 8.085517e+05 |
| 2 | 4/28/2014 | 19.639999 | 19.639999 | 18.889999 | 19.330000 | 18.617887 | 6142300 | -0.309999 | 0.000000 | 0.309999 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -6750500.0 | -1.818489e+06 | -1.886831e+06 |
| 3 | 4/29/2014 | 19.389999 | 19.700001 | 19.290001 | 19.690001 | 18.964626 | 4191900 | 0.300002 | 0.300002 | -0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -2558600.0 | -2.016621e+06 | -2.072532e+06 |
| 4 | 4/30/2014 | 19.590000 | 20.040001 | 19.450001 | 20.010000 | 19.272837 | 5563400 | 0.420000 | 0.420000 | -0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 3004800.0 | -9.167949e+05 | -9.123479e+05 |
| 5 | 5/1/2014 | 19.969999 | 20.170000 | 19.850000 | 19.969999 | 19.234310 | 3004000 | 0.000000 | 0.000000 | -0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 800.0 | -7.455418e+05 | -7.327556e+05 |
| 6 | 5/2/2014 | 20.270000 | 20.309999 | 19.870001 | 19.940001 | 19.205418 | 5981000 | -0.329999 | 0.000000 | 0.329999 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -5980200.0 | -1.601638e+06 | -1.646055e+06 |
| 7 | 5/5/2014 | 19.889999 | 19.900000 | 19.330000 | 19.420000 | 18.704573 | 8977300 | -0.469999 | 0.000000 | 0.469999 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -14957500.0 | -3.555210e+06 | -3.738210e+06 |
| 8 | 5/6/2014 | 19.440001 | 19.490000 | 19.120001 | 19.309999 | 18.598623 | 8440300 | -0.130002 | 0.000000 | 0.130002 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -23397800.0 | -6.191791e+06 | -6.571665e+06 |
| 9 | 5/7/2014 | 20.170000 | 21.059999 | 20.059999 | 21.010000 | 20.235998 | 20043000 | 0.840000 | 0.840000 | -0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | -3354800.0 | -5.845132e+06 | -6.141362e+06 |
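The row-by-row loop in `on_balance_volume` can also be written without `iterrows`, which is much faster on long price histories. A minimal sketch, assuming the same conventions as the loop: the first row's OBV is that day's volume, and each later day adds the volume on an up-close, subtracts it on a down-close, and leaves OBV unchanged on a flat close. The helper `obv_vectorised` is hypothetical; the check compares it against a plain-Python version of the same rule on synthetic data.

```python
import numpy as np
import pandas as pd

def obv_vectorised(close: pd.Series, volume: pd.Series) -> pd.Series:
    """On balance volume: cumulative volume signed by the direction of the close-to-close move."""
    signed = np.sign(close.diff()) * volume   # +V on up days, -V on down days, 0 on flat days, NaN on day 0
    signed.iloc[0] = volume.iloc[0]           # the first day starts at its own volume, as in the loop
    return signed.cumsum()

rng = np.random.default_rng(2)
close = pd.Series(np.round(20 + rng.normal(0, 0.5, 40).cumsum(), 1))  # rounded so flat days can occur
volume = pd.Series(rng.integers(1_000_000, 9_000_000, 40).astype(float))

# Plain-Python reference implementation of the same rule
expected = [volume.iloc[0]]
for i in range(1, len(close)):
    step = np.sign(close.iloc[i] - close.iloc[i - 1])
    expected.append(expected[-1] + step * volume.iloc[i])

print(np.allclose(obv_vectorised(close, volume), expected))  # True
```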


In [151]:
print(atvi.obv.iloc[1202:1220])

1202    541222900.0
1203    551857900.0
1204    560531200.0
1205    525911600.0
1206    543368600.0
1207    524093400.0
1208    479161000.0
1209    512676100.0
1210    564385000.0
1211    586925100.0
1212    569772600.0
1213    552511000.0
1214    531317800.0
1215    514672500.0
1216    501597800.0
1217    513914900.0
1218    523109100.0
1219    514614900.0
Name: obv, dtype: float64
