# Traditional Machine Learning Analysis

**Author**: Maleakhi Agung Wijaya  
**Email**: *maw219@cam.ac.uk*  
**Description**: This file contains code implementation of standard machine learning algorithms, including naive Bayes, logistic regression, k-NN, decision tree, SVC.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from os.path import join
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report, mean_absolute_error as mae
import os
from pathlib2 import Path
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.model_selection import cross_val_score
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import layers, models, backend as K, callbacks

In [2]:
%run Utilities.ipynb

## Load data and preprocessing

In this step, we will load all dataframes,fill missing values, and scale so that each column all features are on the same scale. Afterwards, we will generate sequential datasets using **full features, PCA features**, and only **technical indicator features**.

In [3]:
market_orders, n_markets, aggregated_datasets = load_aggregated_datasets([DATASET_DJI, 
                                                                          DATASET_NASDAQ, 
                                                                          DATASET_NYSE,
                                                                          DATASET_RUSSELL, 
                                                                          DATASET_SP])

# Load datasets
## DJI
dji_df = aggregated_datasets["DJI"]

## NASDAQ
nasdaq_df = aggregated_datasets["NASDAQ"]

## NYSE
nyse_df = aggregated_datasets["NYA"]

## Russell
russell_df = aggregated_datasets["RUT"]

## SP
sp_df = aggregated_datasets["S&P"]

In [4]:
# Fill missing values, do some scaling (run prev cell first)
list_df = []

for df in [dji_df, nasdaq_df, nyse_df, russell_df, sp_df]:
    columns = df.columns
    df.fillna(0, inplace=True) # fill na with 0
    y = df["MOVEMENT"].copy()
    X = df.drop(columns=["MOVEMENT"]).copy()
    scaler = StandardScaler()
    X = pd.DataFrame(scaler.fit_transform(X))
    X["MOVEMENT"] = np.array(y)
    X.columns = columns
    list_df.append(X)
    
### Clean dataframe (full features)
dji_df_full = list_df[0]
nasdaq_df_full = list_df[1]
nyse_df_full = list_df[2]
russell_df_full = list_df[3]
sp_df_full = list_df[4]

In [5]:
# PCA dataframe (30 features - explained 90% variance) - pca.explained_variance_ratio_.cumsum()
list_df_pca = []

for df in [dji_df_full, nasdaq_df_full, nyse_df_full, russell_df_full, sp_df_full]:
    pca = PCA(n_components=30)
    y = df["MOVEMENT"].copy()
    X = df.drop(columns=["MOVEMENT"]).copy()
    reduced_X = pd.DataFrame(pca.fit_transform(X))
    reduced_X["MOVEMENT"] = y
    list_df_pca.append(reduced_X)

### Clean dataframe (pca features)
dji_df_pca = list_df_pca[0]
nasdaq_df_pca = list_df_pca[1]
nyse_df_pca = list_df_pca[2]
russell_df_pca = list_df_pca[3]
sp_df_pca = list_df_pca[4]

In [6]:
# Technical indicator dataframe
ti_columns = ["Volume", "mom", "mom1", "mom2", "mom3", 
              "ROC_5", "ROC_10", "ROC_15", "ROC_20",
              "EMA_10", "EMA_20", "EMA_50", "EMA_200"]
list_df_ti = []

for df in [dji_df_full, nasdaq_df_full, nyse_df_full, russell_df_full, sp_df_full]:
    y = df["MOVEMENT"].copy()
    X = df[ti_columns].copy()
    X["MOVEMENT"] = np.array(y)
    list_df_ti.append(X)

### Clean dataframe (ti features)
dji_df_ti = list_df_ti[0]
nasdaq_df_ti = list_df_ti[1]
nyse_df_ti = list_df_ti[2]
russell_df_ti = list_df_ti[3]
sp_df_ti = list_df_ti[4]

In [7]:
# Build sequential dataset
sequence_length = 60

### Sequential dataset (full features)
dji_X_seq, dji_y_seq = generate_sequential_data(dji_df_full, sequence_length)
nasdaq_X_seq, nasdaq_y_seq = generate_sequential_data(nasdaq_df_full, sequence_length)
nyse_X_seq, nyse_y_seq = generate_sequential_data(nyse_df_full, sequence_length)
russell_X_seq, russell_y_seq = generate_sequential_data(russell_df_full, sequence_length)
sp_X_seq, sp_y_seq = generate_sequential_data(sp_df_full, sequence_length)

In [8]:
### Sequential dataset (PCA features)
dji_X_pca_seq, dji_y_pca_seq = generate_sequential_data(dji_df_pca, sequence_length)
nasdaq_X_pca_seq, nasdaq_y_pca_seq = generate_sequential_data(nasdaq_df_pca, sequence_length)
nyse_X_pca_seq, nyse_y_pca_seq = generate_sequential_data(nyse_df_pca, sequence_length)
russell_X_pca_seq, russell_y_pca_seq = generate_sequential_data(russell_df_pca, sequence_length)
sp_X_pca_seq, sp_y_pca_seq = generate_sequential_data(sp_df_pca, sequence_length)

In [9]:
### Sequential dataset (TI features)
dji_X_ti_seq, dji_y_ti_seq = generate_sequential_data(dji_df_ti, sequence_length)
nasdaq_X_ti_seq, nasdaq_y_ti_seq = generate_sequential_data(nasdaq_df_ti, sequence_length)
nyse_X_ti_seq, nyse_y_ti_seq = generate_sequential_data(nyse_df_ti, sequence_length)
russell_X_ti_seq, russell_y_ti_seq = generate_sequential_data(russell_df_ti, sequence_length)
sp_X_ti_seq, sp_y_ti_seq = generate_sequential_data(sp_df_ti, sequence_length)

## Experiments on standard machine learning classifiers

In this step, we will build naive Bayes, logistic regression, ZeroR (baseline), k-NN, decision tree, SVC trained on the datasets that we have created in step 1. We will perform grid search and report the mean cross validation performance.

### Load data

In [10]:
# Sequential flatten (full features)
dji_X_seq_flatten = sequential_reshape(dji_X_seq, (len(dji_X_seq), -1))
nasdaq_X_seq_flatten = sequential_reshape(nasdaq_X_seq, (len(nasdaq_X_seq), -1))
nyse_X_seq_flatten = sequential_reshape(nyse_X_seq, (len(nyse_X_seq), -1))
russell_X_seq_flatten = sequential_reshape(russell_X_seq, (len(russell_X_seq), -1))
sp_X_seq_flatten = sequential_reshape(sp_X_seq, (len(sp_X_seq), -1))

In [11]:
# Sequential flatten (pca)
dji_X_pca_seq_flatten = sequential_reshape(dji_X_pca_seq, (len(dji_X_pca_seq), -1))
nasdaq_X_pca_seq_flatten = sequential_reshape(nasdaq_X_pca_seq, (len(nasdaq_X_pca_seq), -1))
nyse_X_pca_seq_flatten = sequential_reshape(nyse_X_pca_seq, (len(nyse_X_pca_seq), -1))
russell_X_pca_seq_flatten = sequential_reshape(russell_X_pca_seq, (len(russell_X_pca_seq), -1))
sp_X_pca_seq_flatten = sequential_reshape(sp_X_pca_seq, (len(sp_X_pca_seq), -1))

In [12]:
# Sequential flatten (technical indicator)
dji_X_ti_seq_flatten = sequential_reshape(dji_X_ti_seq, (len(dji_X_ti_seq), -1))
nasdaq_X_ti_seq_flatten = sequential_reshape(nasdaq_X_ti_seq, (len(nasdaq_X_ti_seq), -1))
nyse_X_ti_seq_flatten = sequential_reshape(nyse_X_ti_seq, (len(nyse_X_ti_seq), -1))
russell_X_ti_seq_flatten = sequential_reshape(russell_X_ti_seq, (len(russell_X_ti_seq), -1))
sp_X_ti_seq_flatten = sequential_reshape(sp_X_ti_seq, (len(sp_X_ti_seq), -1))

### Split into training and test (80/20)

In [13]:
## Full features
dji_X_train_full, dji_X_test_full, dji_y_train_full, dji_y_test_full = train_test_split(dji_X_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nasdaq_X_train_full, nasdaq_X_test_full, nasdaq_y_train_full, nasdaq_y_test_full = train_test_split(nasdaq_X_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nyse_X_train_full, nyse_X_test_full, nyse_y_train_full, nyse_y_test_full = train_test_split(nyse_X_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
russell_X_train_full, russell_X_test_full, russell_y_train_full, russell_y_test_full = train_test_split(russell_X_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
sp_X_train_full, sp_X_test_full, sp_y_train_full, sp_y_test_full = train_test_split(sp_X_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.1, shuffle=False)

In [298]:
## pca features
dji_X_train_pca, dji_X_test_pca, dji_y_train_pca, dji_y_test_pca = train_test_split(dji_X_pca_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nasdaq_X_train_pca, nasdaq_X_test_pca, nasdaq_y_train_pca, nasdaq_y_test_pca = train_test_split(nasdaq_X_pca_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nyse_X_train_pca, nyse_X_test_pca, nyse_y_train_pca, nyse_y_test_pca = train_test_split(nyse_X_pca_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
russell_X_train_pca, russell_X_test_pca, russell_y_train_pca, russell_y_test_pca = train_test_split(russell_X_pca_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
sp_X_train_pca, sp_X_test_pca, sp_y_train_pca, sp_y_test_pca = train_test_split(sp_X_pca_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)

In [299]:
## ti features
dji_X_train_ti, dji_X_test_ti, dji_y_train_ti, dji_y_test_ti = train_test_split(dji_X_ti_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nasdaq_X_train_ti, nasdaq_X_test_ti, nasdaq_y_train_ti, nasdaq_y_test_ti = train_test_split(nasdaq_X_ti_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
nyse_X_train_ti, nyse_X_test_ti, nyse_y_train_ti, nyse_y_test_ti = train_test_split(nyse_X_ti_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
russell_X_train_ti, russell_X_test_ti, russell_y_train_ti, russell_y_test_ti = train_test_split(russell_X_ti_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)
sp_X_train_ti, sp_X_test_ti, sp_y_train_ti, sp_y_test_ti = train_test_split(sp_X_ti_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=None,
                                                                                        test_size=0.2, shuffle=False)

### Dummy Classifier

#### Full features | PCA | technical indicator

In [300]:
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(dummy, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(dummy, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.34937238 0.35010482 0.35010482 0.35010482 0.35010482]
Mean Scores: 0.34995833442979574
Standard deviation: 0.000292974746278607


Accuracy DJI
Scores: [0.53697749 0.53870968 0.53870968 0.53870968 0.53870968]
Mean Scores: 0.5383632403277667
Standard deviation: 0.0006928741831760288


In [301]:
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(dummy, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(dummy, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.35743802 0.35817805 0.35684647 0.35684647 0.35684647]
Mean Scores: 0.3572310978892581
Standard deviation: 0.0005259942446186913


Accuracy NASDAQ
Scores: [0.5562701  0.55806452 0.55483871 0.55483871 0.55483871]
Mean Scores: 0.5557701483248625
Standard deviation: 0.0012741118964606582


In [302]:
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(dummy, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(dummy, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.34800839 0.34736842 0.3487395  0.3487395  0.3487395 ]
Mean Scores: 0.3483190588383649
Standard deviation: 0.0005532687742801843


Accuracy NYSE
Scores: [0.53376206 0.53225806 0.53548387 0.53548387 0.53548387]
Mean Scores: 0.5344943470594338
Standard deviation: 0.0013018970584764624


In [303]:
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(dummy, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(dummy, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.34800839 0.3487395  0.34736842 0.34736842 0.34736842]
Mean Scores: 0.34777062894008975
Standard deviation: 0.0005441589549191782


Accuracy RUSSELL
Scores: [0.53376206 0.53548387 0.53225806 0.53225806 0.53225806]
Mean Scores: 0.5332040244787886
Standard deviation: 0.0012801267156406925


In [304]:
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(dummy, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(dummy, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.35208333 0.35281837 0.35281837 0.35146444 0.35146444]
Mean Scores: 0.35212978936825035
Standard deviation: 0.0006059441632024813


Accuracy S&P 500
Scores: [0.54340836 0.54516129 0.54516129 0.54193548 0.54193548]
Mean Scores: 0.5435203817031429
Standard deviation: 0.0014437114188587748


### Naive Bayes

#### Full features

In [305]:
# DJI
gnb = GaussianNB()
gnb.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.48212236 0.46201058 0.50836776 0.49911916 0.48989494]
Mean Scores: 0.48830296042053867
Standard deviation: 0.015821567122588115


Accuracy DJI
Scores: [0.48231511 0.47096774 0.50967742 0.5        0.49677419]
Mean Scores: 0.49194689347578047
Standard deviation: 0.01367788654706015


In [306]:
# NASDAQ
gnb = GaussianNB()
gnb.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.44141984 0.47017383 0.50209429 0.46128472 0.47087965]
Mean Scores: 0.46917046390215156
Standard deviation: 0.019596881202272154


Accuracy NASDAQ
Scores: [0.4437299  0.47096774 0.50645161 0.46129032 0.47096774]
Mean Scores: 0.47068146457836324
Standard deviation: 0.02046581350067431


In [307]:
# NYSE
gnb = GaussianNB()
gnb.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.48094127 0.46649283 0.45623121 0.50320513 0.45159008]
Mean Scores: 0.47169210283695834
Standard deviation: 0.01870177086650713


Accuracy NYSE
Scores: [0.48231511 0.46774194 0.45806452 0.50322581 0.4516129 ]
Mean Scores: 0.47259205476610305
Standard deviation: 0.018480597555609282


In [308]:
# Russell
gnb = GaussianNB()
gnb.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.45482386 0.48998292 0.47942634 0.50801871 0.4934166 ]
Mean Scores: 0.4851336857559433
Standard deviation: 0.01770370401503455


Accuracy RUSSELL
Scores: [0.46623794 0.49032258 0.48064516 0.50967742 0.49354839]
Mean Scores: 0.48808629810185666
Standard deviation: 0.014378686093139498


In [309]:
# S&P 500
gnb = GaussianNB()
gnb.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.44487339 0.46479581 0.50314308 0.49677419 0.46596639]
Mean Scores: 0.4751105721893107
Standard deviation: 0.02172294541076408


Accuracy S&P 500
Scores: [0.44694534 0.46774194 0.50322581 0.49677419 0.47096774]
Mean Scores: 0.4771310030079867
Standard deviation: 0.020513939028225424


#### PCA features

In [310]:
# DJI
gnb = GaussianNB()
gnb.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.48180246 0.50965701 0.52401843 0.4861932  0.4999948 ]
Mean Scores: 0.5003331791625909
Standard deviation: 0.015436098071391064


Accuracy DJI
Scores: [0.48874598 0.50967742 0.52580645 0.48709677 0.5       ]
Mean Scores: 0.5022653251737371
Standard deviation: 0.014332571897292375


In [311]:
# NASDAQ
gnb = GaussianNB()
gnb.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.51962888 0.49625    0.51743868 0.472293   0.54467246]
Mean Scores: 0.5100566046474915
Standard deviation: 0.024335540691461564


Accuracy NASDAQ
Scores: [0.52090032 0.49677419 0.52258065 0.48064516 0.5483871 ]
Mean Scores: 0.5138574836635204
Standard deviation: 0.023293288527257615


In [312]:
# NYSE
gnb = GaussianNB()
gnb.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.51084437 0.4871877  0.52318172 0.46342176 0.48038021]
Mean Scores: 0.4930031511408698
Standard deviation: 0.0214348877485399


Accuracy NYSE
Scores: [0.51125402 0.5        0.52580645 0.46451613 0.48064516]
Mean Scores: 0.4964443522456176
Standard deviation: 0.02173995225686658


In [313]:
# Russell
gnb = GaussianNB()
gnb.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.51684067 0.48554968 0.48645016 0.48683512 0.46479581]
Mean Scores: 0.48809428563381
Standard deviation: 0.016612869833675083


Accuracy RUSSELL
Scores: [0.51768489 0.48709677 0.48709677 0.48709677 0.46774194]
Mean Scores: 0.48934342910486467
Standard deviation: 0.016031251749095542


In [314]:
# S&P 500
gnb = GaussianNB()
gnb.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.53096885 0.50189892 0.49625    0.50935066 0.52619237]
Mean Scores: 0.5129321597767609
Standard deviation: 0.01352042415087303


Accuracy S&P 500
Scores: [0.53376206 0.50322581 0.49677419 0.50967742 0.52903226]
Mean Scores: 0.5144943470594336
Standard deviation: 0.014469154619181227


#### Technical indicator features

In [315]:
# DJI
gnb = GaussianNB()
gnb.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.54203999 0.5447776  0.48384948 0.49849182 0.50732021]
Mean Scores: 0.5152958188295289
Standard deviation: 0.02416312232030432


Accuracy DJI
Scores: [0.54340836 0.54516129 0.48387097 0.5        0.51290323]
Mean Scores: 0.5170687687999169
Standard deviation: 0.024057233626188255


In [316]:
# NASDAQ
gnb = GaussianNB()
gnb.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.49651308 0.44642857 0.45683761 0.52885576 0.51314888]
Mean Scores: 0.4883567794322989
Standard deviation: 0.03185207670529294


Accuracy NASDAQ
Scores: [0.49839228 0.4516129  0.47096774 0.52903226 0.51935484]
Mean Scores: 0.4938720049787366
Standard deviation: 0.02903958702029174


In [317]:
# NYSE
gnb = GaussianNB()
gnb.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.50638505 0.515625   0.51620843 0.51930982 0.49277428]
Mean Scores: 0.5100605165076467
Standard deviation: 0.009660570218498636


Accuracy NYSE
Scores: [0.50803859 0.51612903 0.51935484 0.51935484 0.5       ]
Mean Scores: 0.5125754589772845
Standard deviation: 0.007526928380853887


In [318]:
# Russell
gnb = GaussianNB()
gnb.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.49509333 0.44254386 0.47326724 0.5221818  0.45830848]
Mean Scores: 0.4782789413756661
Standard deviation: 0.02796825513513449


Accuracy RUSSELL
Scores: [0.49517685 0.47096774 0.47419355 0.52580645 0.46129032]
Mean Scores: 0.4854869826781454
Standard deviation: 0.022992961756979503


In [319]:
# S&P 500
gnb = GaussianNB()
gnb.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.48197234 0.44833543 0.48613035 0.4907451  0.42494477]
Mean Scores: 0.46642559588657095
Standard deviation: 0.025570802906326948


Accuracy S&P 500
Scores: [0.48231511 0.4483871  0.49032258 0.49354839 0.42580645]
Mean Scores: 0.46807592573384504
Standard deviation: 0.026536287652188892


### Logistic Regression

In [320]:
# Grid search parameter
param_grid_lr = {
    "class_weight": [None, "balanced"],
    "solver": ["liblinear", "lbfgs"],
#     "C": [0.5, 1, 2]
}

#### Full features

In [321]:
## DJI
lr = LogisticRegression()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.5238466268791454 0.0195418932326495 {'class_weight': None, 'solver': 'liblinear'}
0.526626542874667 0.014256825608477665 {'class_weight': None, 'solver': 'lbfgs'}
0.5245812301739048 0.02147557861130297 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5254129444943587 0.015414415451803595 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy DJI
0.5254558655741106 0.019123092903500914 {'class_weight': None, 'solver': 'liblinear'}
0.5293289077896484 0.013604184677851476 {'class_weight': None, 'solver': 'lbfgs'}
0.5260968779172285 0.02102792190508948 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5280385852090033 0.014887425372188736 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [322]:
## nasdaq
lr = LogisticRegression()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.5151035434330453 0.016163774685905973 {'class_weight': None, 'solver': 'liblinear'}
0.5083735012533639 0.011813529671990388 {'class_weight': None, 'solver': 'lbfgs'}
0.5154202516945446 0.015384048698663826 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5101071344800646 0.011788018801605842 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NASDAQ
0.5190146250388964 0.015339276161219462 {'class_weight': None, 'solver': 'liblinear'}
0.5145047194274454 0.012406980045887691 {'class_weight': None, 'solver': 'lbfgs'}
0.5190166995124987 0.014578596404572973 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5157908930608859 0.01145729648385652 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [323]:
## nyse
lr = LogisticRegression()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4509998964819701 0.02175636388074724 {'class_weight': None, 'solver': 'liblinear'}
0.4498266599388107 0.02265935468304082 {'class_weight': None, 'solver': 'lbfgs'}
0.45105039479005515 0.021365475732158176 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4506953101627684 0.02378579033173474 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NYSE
0.45197801057981535 0.022136893756008982 {'class_weight': None, 'solver': 'liblinear'}
0.45133492376309514 0.023051267835433483 {'class_weight': None, 'solver': 'lbfgs'}
0.45197801057981535 0.021757590435147037 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.45198008505341764 0.02406595529620798 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [324]:
## russell
lr = LogisticRegression()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.49289557620017516 0.034030741180151786 {'class_weight': None, 'solver': 'liblinear'}
0.4871804232157011 0.030252221009079365 {'class_weight': None, 'solver': 'lbfgs'}
0.493544341925569 0.03329311195077635 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.48869282053333984 0.0321317121217041 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy RUSSELL
0.49583653148013695 0.034819913413826925 {'class_weight': None, 'solver': 'liblinear'}
0.49132247692148123 0.03135928890875535 {'class_weight': None, 'solver': 'lbfgs'}
0.49647961829685716 0.03405098032933791 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.492610725028524 0.033012598687462036 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [325]:
## S&P 500
lr = LogisticRegression()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.5088655376933364 0.02424613019141935 {'class_weight': None, 'solver': 'liblinear'}
0.5099021816897975 0.024433693230630917 {'class_weight': None, 'solver': 'lbfgs'}
0.5084121550872547 0.02244407025013868 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5108736584332009 0.0229182302260517 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy S&P 500
0.5112747640286277 0.025677400681251076 {'class_weight': None, 'solver': 'liblinear'}
0.5132102478995955 0.026088637756992016 {'class_weight': None, 'solver': 'lbfgs'}
0.5106296027383052 0.02390572417000887 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5138554091899181 0.024552016366019728 {'class_weight': 'balanced', 'solver': 'lbfgs'}


#### PCA features

In [326]:
## DJI
lr = LogisticRegression()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.4881487244074402 0.016127298695894403 {'class_weight': None, 'solver': 'liblinear'}
0.4894189190773126 0.0215753503289529 {'class_weight': None, 'solver': 'lbfgs'}
0.4902742376490683 0.01666913349327519 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4940146074951864 0.017383211025422562 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy DJI
0.4900217819728244 0.016618460294789972 {'class_weight': None, 'solver': 'liblinear'}
0.4919510424229852 0.021675849967166115 {'class_weight': None, 'solver': 'lbfgs'}
0.4919572658437922 0.016975266268331193 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4964609480344363 0.017615489306133367 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [327]:
## nasdaq
lr = LogisticRegression()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.5042210568239864 0.0070730790501194315 {'class_weight': None, 'solver': 'liblinear'}
0.5057709027193601 0.012089860338975892 {'class_weight': None, 'solver': 'lbfgs'}
0.501985544760998 0.006019977831805256 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5043167744567929 0.01002287495195815 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NASDAQ
0.5080614044186287 0.004564183257549121 {'class_weight': None, 'solver': 'liblinear'}
0.5112872108702416 0.009462045460525276 {'class_weight': None, 'solver': 'lbfgs'}
0.5054828337309407 0.0028136210199527748 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.509351726999274 0.007243860428612697 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [328]:
## nyse
lr = LogisticRegression()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.49838082306130743 0.02395294369498334 {'class_weight': None, 'solver': 'liblinear'}
0.4963382896671392 0.01865040345146707 {'class_weight': None, 'solver': 'lbfgs'}
0.49858851431841894 0.021992549561027024 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49700150551621813 0.020524931702461592 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NYSE
0.5003173944611555 0.0234470407082239 {'class_weight': None, 'solver': 'liblinear'}
0.4983922829581994 0.018074185047999135 {'class_weight': None, 'solver': 'lbfgs'}
0.5003215434083601 0.021792642883754 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4990374442485219 0.020132995986918745 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [329]:
## russell
lr = LogisticRegression()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.49633309290637123 0.023253811899612016 {'class_weight': None, 'solver': 'liblinear'}
0.4925071045390023 0.02766774280421835 {'class_weight': None, 'solver': 'lbfgs'}
0.4957157493568243 0.022698728398788115 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49860794223760135 0.02503131355537156 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy RUSSELL
0.49774297272067214 0.02426015112656787 {'class_weight': None, 'solver': 'liblinear'}
0.49451509179545683 0.028445428634921117 {'class_weight': None, 'solver': 'lbfgs'}
0.49709781143034953 0.02371357388397084 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5003194689347578 0.025934621787432846 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [330]:
## S&P 500
lr = LogisticRegression()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4999688919008686 0.038492689334485325 {'class_weight': None, 'solver': 'liblinear'}
0.4956179011211206 0.042210590942542994 {'class_weight': None, 'solver': 'lbfgs'}
0.5001326449692699 0.03913089917194323 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49752158490643084 0.04005428770567304 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy S&P 500
0.5022155378072813 0.03751359619187546 {'class_weight': None, 'solver': 'liblinear'}
0.4989855824084639 0.041323201423955014 {'class_weight': None, 'solver': 'lbfgs'}
0.5022155378072815 0.038173523155595385 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.500275904989109 0.03926915658884822 {'class_weight': 'balanced', 'solver': 'lbfgs'}


#### Technical indicator features

In [331]:
## DJI
lr = LogisticRegression()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.48624542270040044 0.03487647880791409 {'class_weight': None, 'solver': 'liblinear'}
0.4865832735435035 0.03961671081486528 {'class_weight': None, 'solver': 'lbfgs'}
0.49113228967209227 0.039026644219467437 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4928380486584777 0.03617113635137474 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy DJI
0.49389897313556685 0.03453856471538073 {'class_weight': None, 'solver': 'liblinear'}
0.4945462088994918 0.03883335537765217 {'class_weight': None, 'solver': 'lbfgs'}
0.4919717871590084 0.039145780798827766 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49389897313556685 0.03624389671175609 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [332]:
## nasdaq
lr = LogisticRegression()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.49443031635387086 0.031860558918096984 {'class_weight': None, 'solver': 'liblinear'}
0.49554785729163625 0.03062437027354042 {'class_weight': None, 'solver': 'lbfgs'}
0.49522252896913904 0.025294549010315456 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4913655621375238 0.029403882246371027 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NASDAQ
0.5132185457940048 0.031973376221215266 {'class_weight': None, 'solver': 'liblinear'}
0.5132123223731979 0.032709741932727125 {'class_weight': None, 'solver': 'lbfgs'}
0.49838813401099474 0.024693753905206742 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49450886837464997 0.029270814148555646 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [333]:
## nyse
lr = LogisticRegression()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.49297563750829976 0.019976207524655898 {'class_weight': None, 'solver': 'liblinear'}
0.48750373224258964 0.025755865731548963 {'class_weight': None, 'solver': 'lbfgs'}
0.4986000841109524 0.023007632877243823 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4919957008727227 0.029343606697682077 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy NYSE
0.49966393527642355 0.021635734941543524 {'class_weight': None, 'solver': 'liblinear'}
0.493861632610725 0.02628646156049105 {'class_weight': None, 'solver': 'lbfgs'}
0.4990374442485219 0.023206294757653775 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.49259205476610307 0.029341970484149858 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [334]:
## russell
lr = LogisticRegression()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.506035806650696 0.025970861902138705 {'class_weight': None, 'solver': 'liblinear'}
0.5101276850821498 0.018293253217626742 {'class_weight': None, 'solver': 'lbfgs'}
0.5192036303994468 0.014206270302222719 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5146895526852595 0.01433941420676678 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy RUSSELL
0.5132289181620164 0.022725159647272387 {'class_weight': None, 'solver': 'liblinear'}
0.5170957369567473 0.016688081458036484 {'class_weight': None, 'solver': 'lbfgs'}
0.520968779172285 0.013531286752731124 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.5170915880095426 0.012728254621395702 {'class_weight': 'balanced', 'solver': 'lbfgs'}


In [335]:
## S&P 500
lr = LogisticRegression()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.48574052505144466 0.01912134932524751 {'class_weight': None, 'solver': 'liblinear'}
0.49217022726881476 0.011734900411363687 {'class_weight': None, 'solver': 'lbfgs'}
0.4952623858237263 0.011733115434404962 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4866279109983621 0.01375342499538119 {'class_weight': 'balanced', 'solver': 'lbfgs'}


Accuracy S&P 500
0.49840887874701795 0.02151038133529416 {'class_weight': None, 'solver': 'liblinear'}
0.5048418213878227 0.014279146558462028 {'class_weight': None, 'solver': 'lbfgs'}
0.4971102582719634 0.01290050502423307 {'class_weight': 'balanced', 'solver': 'liblinear'}
0.4887335338657815 0.01445574926378472 {'class_weight': 'balanced', 'solver': 'lbfgs'}


### k-NN

In [336]:
# Grid search parameter
param_grid_lr = {
    "n_neighbors": [5, 11],
    "weights": ["uniform", "distance"],
    "metric": ["minkowski", "euclidean"]
}

#### Full features

In [337]:
## DJI
lr = KNeighborsClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.5024095279898158 0.02844191102038775 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.5024095279898158 0.02844191102038775 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.4910028306478882 0.01898487508941455 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.4910028306478882 0.01898487508941455 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.5024095279898158 0.02844191102038775 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.5024095279898158 0.02844191102038775 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.4910028306478882 0.01898487508941455 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.4910028306478882 0.01898487508941455 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy DJI
0.5035266051239498 0.028363853578624202 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.5035266051239498 0.0283638

In [338]:
## nasdaq
lr = KNeighborsClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.47511882785525794 0.03564953775654556 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.47511882785525794 0.03564953775654556 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.48317219466025507 0.01345370044291804 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.48317219466025507 0.01345370044291804 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.47511882785525794 0.03564953775654556 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.47511882785525794 0.03564953775654556 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.48317219466025507 0.01345370044291804 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.48317219466025507 0.01345370044291804 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NASDAQ
0.4803319157763717 0.03592819957859206 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4803319157763

In [339]:
## nyse
lr = KNeighborsClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4708563260532765 0.024461654331790858 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4708563260532765 0.024461654331790858 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.4788961235328359 0.03826712140444377 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.4788961235328359 0.03826712140444377 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4708563260532765 0.024461654331790858 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4708563260532765 0.024461654331790858 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.4788961235328359 0.03826712140444377 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.4788961235328359 0.03826712140444377 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NYSE
0.47129343429104864 0.024956628402935707 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.47129343429104864 0

In [340]:
## russell
lr = KNeighborsClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.4808415944715276 0.025160189366725844 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4808415944715276 0.025160189366725844 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.467838240086465 0.010774585011108392 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.467838240086465 0.010774585011108392 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4808415944715276 0.025160189366725844 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4808415944715276 0.025160189366725844 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.467838240086465 0.010774585011108392 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.467838240086465 0.010774585011108392 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy RUSSELL
0.48487501296546 0.024835023546741896 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.48487501296546 0

In [341]:
## S&P 500
lr = KNeighborsClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.49719250203333915 0.014728432644771238 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.49719250203333915 0.014728432644771238 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.46371210188953327 0.026576217849128208 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.46371210188953327 0.026576217849128208 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.49719250203333915 0.014728432644771238 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.49719250203333915 0.014728432644771238 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.46371210188953327 0.026576217849128208 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.46371210188953327 0.026576217849128208 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy S&P 500
0.5099906648687896 0.017500388164127485 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.50

#### PCA features

In [342]:
## DJI
lr = KNeighborsClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.4964641895654502 0.03773302036094321 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4964641895654502 0.03773302036094321 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.4914951367822913 0.03842646620548514 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.4914951367822913 0.03842646620548514 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4964641895654502 0.03773302036094321 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4964641895654502 0.03773302036094321 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.4914951367822913 0.03842646620548514 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.4914951367822913 0.03842646620548514 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy DJI
0.5028586246240017 0.039200001907478084 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.5028586246240017 0.0392000

In [343]:
## nasdaq
lr = KNeighborsClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4915345238729477 0.03580629873918181 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4915345238729477 0.03580629873918181 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.47181937419121994 0.013355218871449394 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.47181937419121994 0.013355218871449394 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4915345238729477 0.03580629873918181 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4915345238729477 0.03580629873918181 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.47181937419121994 0.013355218871449394 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.47181937419121994 0.013355218871449394 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NASDAQ
0.5022881443833628 0.03365938139102421 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.5022881443833

In [344]:
## nyse
lr = KNeighborsClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.49243823778440515 0.02990303306949988 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.49243823778440515 0.02990303306949988 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.48542167027286043 0.03590126971932132 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.48542167027286043 0.03590126971932132 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.49243823778440515 0.02990303306949988 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.49243823778440515 0.02990303306949988 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.48542167027286043 0.03590126971932132 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.48542167027286043 0.03590126971932132 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NYSE
0.49774712166787677 0.025917949167509245 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.497747121667876

In [345]:
## russell
lr = KNeighborsClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.47768402695600204 0.027895034993252826 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.47768402695600204 0.027895034993252826 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.47231160150574797 0.02535388933469441 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.47231160150574797 0.02535388933469441 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.47768402695600204 0.027895034993252826 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.47768402695600204 0.027895034993252826 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.47231160150574797 0.02535388933469441 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.47231160150574797 0.02535388933469441 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy RUSSELL
0.4822839954361581 0.028175109915603225 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.482283

In [346]:
## S&P 500
lr = KNeighborsClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4759734160896879 0.02996045435036555 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4759734160896879 0.02996045435036555 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.48634112452635014 0.03621543738452994 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.48634112452635014 0.03621543738452994 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4759734160896879 0.02996045435036555 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4759734160896879 0.02996045435036555 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.48634112452635014 0.03621543738452994 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.48634112452635014 0.03621543738452994 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy S&P 500
0.4848459703350274 0.029839219209901062 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.48484597033502

#### Technical indicator features

In [347]:
## DJI
lr = KNeighborsClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.5111952313926115 0.008736035706590962 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.5111952313926115 0.008736035706590962 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.521549535820987 0.027981663151547268 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.521549535820987 0.027981663151547268 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.5111952313926115 0.008736035706590962 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.5111952313926115 0.008736035706590962 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.521549535820987 0.027981663151547268 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.521549535820987 0.027981663151547268 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy DJI
0.513861632610725 0.008261387903684152 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.513861632610725 0.00826

In [348]:
## nasdaq
lr = KNeighborsClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4707266398837917 0.029467537997379972 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.4707266398837917 0.029467537997379972 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.46832124625221017 0.021670383602170662 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.46832124625221017 0.021670383602170662 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.4707266398837917 0.029467537997379972 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.4707266398837917 0.029467537997379972 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.46832124625221017 0.021670383602170662 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.46832124625221017 0.021670383602170662 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NASDAQ
0.4815869723057774 0.03044376162690558 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.481586972

In [349]:
## nyse
lr = KNeighborsClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.48545263106625036 0.02033339593138549 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.48545263106625036 0.02033339593138549 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.47837061514159107 0.026283388240966925 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.47837061514159107 0.026283388240966925 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.48545263106625036 0.02033339593138549 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.48545263106625036 0.02033339593138549 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.47837061514159107 0.026283388240966925 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.47837061514159107 0.026283388240966925 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy NYSE
0.4919302976869619 0.019025130750916608 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.491930297686

In [350]:
## russell
lr = KNeighborsClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.49943067257885865 0.023194986414455537 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.49943067257885865 0.023194986414455537 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.4967043818081118 0.014494368006405734 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.4967043818081118 0.014494368006405734 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.49943067257885865 0.023194986414455537 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.49943067257885865 0.023194986414455537 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.4967043818081118 0.014494368006405734 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.4967043818081118 0.014494368006405734 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy RUSSELL
0.5009770770666944 0.023864919056660664 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.500977

In [351]:
## S&P 500
lr = KNeighborsClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.48031744894235723 0.012948336327912768 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.48031744894235723 0.012948336327912768 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'distance'}
0.4815987782216512 0.037641653612937975 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'uniform'}
0.4815987782216512 0.037641653612937975 {'metric': 'minkowski', 'n_neighbors': 11, 'weights': 'distance'}
0.48031744894235723 0.012948336327912768 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'uniform'}
0.48031744894235723 0.012948336327912768 {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
0.4815987782216512 0.037641653612937975 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'uniform'}
0.4815987782216512 0.037641653612937975 {'metric': 'euclidean', 'n_neighbors': 11, 'weights': 'distance'}


Accuracy S&P 500
0.48292293330567365 0.012785279877846122 {'metric': 'minkowski', 'n_neighbors': 5, 'weights': 'uniform'}
0.48292

### Decision Tree

In [352]:
# Grid search parameter
param_grid_lr = {
    "criterion": ["gini", "entropy"],
}

#### Full features

In [353]:
## DJI
lr = DecisionTreeClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.5119651745060441 0.02849405336753179 {'criterion': 'gini'}
0.5177666088876158 0.023954577197354904 {'criterion': 'entropy'}


Accuracy DJI
0.5151561041385748 0.029943944211522463 {'criterion': 'gini'}
0.5054890571517477 0.01840307893861987 {'criterion': 'entropy'}


In [354]:
## nasdaq
lr = DecisionTreeClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.49779063240242677 0.026957217186923198 {'criterion': 'gini'}
0.48162842977367826 0.020400397727961452 {'criterion': 'entropy'}


Accuracy NASDAQ
0.4919510424229851 0.023781843549673682 {'criterion': 'gini'}
0.49192407426615503 0.01828877458345673 {'criterion': 'entropy'}


In [355]:
## nyse
lr = DecisionTreeClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.5102795066588495 0.022420570265741253 {'criterion': 'gini'}
0.5070763303911581 0.020392796014804097 {'criterion': 'entropy'}


Accuracy NYSE
0.5061342184420703 0.012913649994543221 {'criterion': 'gini'}
0.5132206202676071 0.02581555384508155 {'criterion': 'entropy'}


In [356]:
## russell
lr = DecisionTreeClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.484908566646872 0.03207509234961189 {'criterion': 'gini'}
0.49875876112894024 0.020405522754550418 {'criterion': 'entropy'}


Accuracy RUSSELL
0.4848584171766414 0.028029548066361803 {'criterion': 'gini'}
0.5074224665491132 0.016242399177266463 {'criterion': 'entropy'}


In [357]:
## S&P 500
lr = DecisionTreeClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4870031387315462 0.0417707229390754 {'criterion': 'gini'}
0.4859248320055912 0.02047407513172751 {'criterion': 'entropy'}


Accuracy S&P 500
0.49195519137018984 0.04341252627470855 {'criterion': 'gini'}
0.49711233274556577 0.02182525135799504 {'criterion': 'entropy'}


#### PCA features

In [358]:
## DJI
lr = DecisionTreeClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.49092859242612635 0.012828150391292193 {'criterion': 'gini'}
0.4919149942919896 0.016495220466734645 {'criterion': 'entropy'}


Accuracy DJI
0.4809895239083083 0.02909993191394055 {'criterion': 'gini'}
0.5061362929156725 0.030707188768571404 {'criterion': 'entropy'}


In [None]:
## nasdaq
lr = DecisionTreeClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ


In [360]:
## nyse
lr = DecisionTreeClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

0.514515091795457 0.0072149780333067084 {'criterion': 'gini'}
0.4874245410227155 0.02960800295859173 {'criterion': 'entropy'}


In [361]:
## russell
lr = DecisionTreeClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.48943346237860014 0.024726854417723345 {'criterion': 'gini'}
0.4856995667906088 0.02361994132931496 {'criterion': 'entropy'}


Accuracy RUSSELL
0.4957846696400788 0.03180338031809331 {'criterion': 'gini'}
0.48289181620163885 0.024962750882116955 {'criterion': 'entropy'}


In [362]:
## S&P 500
lr = DecisionTreeClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.5221539716970455 0.04390274023219981 {'criterion': 'gini'}
0.5222305882505955 0.03488364255745387 {'criterion': 'entropy'}


Accuracy S&P 500
0.50810704283788 0.04645545059497568 {'criterion': 'gini'}
0.5273498599730319 0.042173602622073265 {'criterion': 'entropy'}


#### Technical indicator features

In [363]:
## DJI
lr = DecisionTreeClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.49584437248464697 0.02222473737898607 {'criterion': 'gini'}
0.49477500395648494 0.018913109236330387 {'criterion': 'entropy'}


Accuracy DJI
0.5080759257338451 0.025265182205001034 {'criterion': 'gini'}
0.4971330774815891 0.035258355689156654 {'criterion': 'entropy'}


In [364]:
## nasdaq
lr = DecisionTreeClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.5100371191098757 0.018345795314447234 {'criterion': 'gini'}
0.5314143627003529 0.0279063657313658 {'criterion': 'entropy'}


Accuracy NASDAQ
0.524823151125402 0.013441212933194576 {'criterion': 'gini'}
0.5390063271444872 0.03005957103302203 {'criterion': 'entropy'}


In [365]:
## nyse
lr = DecisionTreeClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.50619669301018 0.030903754304910325 {'criterion': 'gini'}
0.5080878155455847 0.03295988487469803 {'criterion': 'entropy'}


Accuracy NYSE
0.5022300591224977 0.03668609292311111 {'criterion': 'gini'}
0.5151249870345399 0.043955374650254404 {'criterion': 'entropy'}


In [366]:
## russell
lr = DecisionTreeClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.4977519792761743 0.02840942500231682 {'criterion': 'gini'}
0.48143936501904416 0.022185782814122965 {'criterion': 'entropy'}


Accuracy RUSSELL
0.5015994191473914 0.015720605962287255 {'criterion': 'gini'}
0.5016014936209937 0.022618558095089336 {'criterion': 'entropy'}


In [367]:
## S&P 500
lr = DecisionTreeClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(lr, param_grid_lr, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.5212565101942929 0.024764375041138546 {'criterion': 'gini'}
0.5160494767373722 0.017995597971488336 {'criterion': 'entropy'}


Accuracy S&P 500
0.5235307540711545 0.019766324948152645 {'criterion': 'gini'}
0.5177201535110465 0.016042561193953246 {'criterion': 'entropy'}


### SVM

#### Full features

In [369]:
# DJI
gnb = SVC()
gnb.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.44788789 0.51505811 0.50747404 0.46203274 0.47673881]
Mean Scores: 0.481838317503124
Standard deviation: 0.025813340895137395


Accuracy DJI
Scores: [0.51446945 0.56451613 0.54193548 0.52258065 0.52903226]
Mean Scores: 0.5345067939010477
Standard deviation: 0.017489678163895694


In [370]:
# NASDAQ
gnb = SVC()
gnb.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.48180303 0.46048321 0.42059237 0.44685396 0.42081286]
Mean Scores: 0.4461090857585502
Standard deviation: 0.023546678121591332


Accuracy NASDAQ
Scores: [0.57234727 0.56129032 0.55806452 0.55806452 0.52903226]
Mean Scores: 0.5557597759568511
Standard deviation: 0.014357685526053704


In [371]:
# NYSE
gnb = SVC()
gnb.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.44644013 0.40638614 0.46203274 0.43745287 0.43795253]
Mean Scores: 0.4380528804294248
Standard deviation: 0.018157820762591606


Accuracy NYSE
Scores: [0.50482315 0.48387097 0.52258065 0.50322581 0.5       ]
Mean Scores: 0.5029001140960482
Standard deviation: 0.012342175842800467


In [372]:
# Russell
gnb = SVC()
gnb.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.42475584 0.44321487 0.48146589 0.46346937 0.48073702]
Mean Scores: 0.4587285967426212
Standard deviation: 0.021993878305940847


Accuracy RUSSELL
Scores: [0.51768489 0.5        0.53870968 0.53225806 0.52258065]
Mean Scores: 0.5222466549113162
Standard deviation: 0.013319867558556573


In [373]:
# S&P 500
gnb = SVC()
gnb.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.45464455 0.5080811  0.47412099 0.4319933  0.46203274]
Mean Scores: 0.4661745373315158
Standard deviation: 0.02505098213883681


Accuracy S&P 500
Scores: [0.52411576 0.56451613 0.5516129  0.51290323 0.52258065]
Mean Scores: 0.5351457317705632
Standard deviation: 0.01953488580890906


#### PCA features

In [374]:
# DJI
gnb = SVC()
gnb.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.49192361 0.4839288  0.44339579 0.44581958 0.45970921]
Mean Scores: 0.46495539824461785
Standard deviation: 0.019727102642374074


Accuracy DJI
Scores: [0.54340836 0.53870968 0.49677419 0.49677419 0.51935484]
Mean Scores: 0.5190042526708848
Standard deviation: 0.019861209394577452


In [375]:
# NASDAQ
gnb = SVC()
gnb.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.43946526 0.45625095 0.4816876  0.47958388 0.44888889]
Mean Scores: 0.461175314706691
Standard deviation: 0.01676998027797834


Accuracy NASDAQ
Scores: [0.55305466 0.55483871 0.55806452 0.59032258 0.56129032]
Mean Scores: 0.5635141582823359
Standard deviation: 0.01369671506201726


In [376]:
# NYSE
gnb = SVC()
gnb.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.45520063 0.46203274 0.443487   0.46480365 0.45538121]
Mean Scores: 0.45618104517984115
Standard deviation: 0.007366562553326154


Accuracy NYSE
Scores: [0.52090032 0.52258065 0.51612903 0.51612903 0.52903226]
Mean Scores: 0.5209542578570687
Standard deviation: 0.004784720186246138


In [377]:
# Russell
gnb = SVC()
gnb.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.52369369 0.47403182 0.45952507 0.45206448 0.44579191]
Mean Scores: 0.4710213944607058
Standard deviation: 0.02796959263769261


Accuracy RUSSELL
Scores: [0.56270096 0.51935484 0.49032258 0.51612903 0.49354839]
Mean Scores: 0.5164111606679805
Standard deviation: 0.025902938870680993


In [378]:
# S&P 500
gnb = SVC()
gnb.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.47248055 0.5200177  0.48754148 0.47827021 0.49107473]
Mean Scores: 0.4898769336330176
Standard deviation: 0.01644812865891541


Accuracy S&P 500
Scores: [0.53376206 0.57096774 0.55806452 0.54516129 0.54193548]
Mean Scores: 0.5499782180271756
Standard deviation: 0.013090458709289139


#### Technical indicator features

In [379]:
# DJI
gnb = SVC()
gnb.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(gnb, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(gnb, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.4806973  0.45519643 0.47433217 0.42198794 0.50053706]
Mean Scores: 0.46655017935352755
Standard deviation: 0.02657257738674441


Accuracy DJI
Scores: [0.51768489 0.48064516 0.52903226 0.47741935 0.53548387]
Mean Scores: 0.5080531065242195
Standard deviation: 0.024392478535694834


In [380]:
# NASDAQ
gnb = SVC()
gnb.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(gnb, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(gnb, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.48645117 0.4327612  0.42271881 0.42037296 0.44001204]
Mean Scores: 0.4404632345834433
Standard deviation: 0.024053685664135856


Accuracy NASDAQ
Scores: [0.55305466 0.54193548 0.52580645 0.51612903 0.53548387]
Mean Scores: 0.5344819002178197
Standard deviation: 0.012756163371596897


In [381]:
# NYSE
gnb = SVC()
gnb.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(gnb, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(gnb, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.45142469 0.44552846 0.49171196 0.49315487 0.440899  ]
Mean Scores: 0.4645437958882431
Standard deviation: 0.0230194392396398


Accuracy NYSE
Scores: [0.49839228 0.50322581 0.52903226 0.51612903 0.49677419]
Mean Scores: 0.508710714656156
Standard deviation: 0.01222318575015961


In [382]:
# Russell
gnb = SVC()
gnb.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(gnb, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(gnb, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.46932471 0.44324587 0.52410811 0.49462688 0.49226825]
Mean Scores: 0.4847147612098244
Standard deviation: 0.027069036463127735


Accuracy RUSSELL
Scores: [0.50803859 0.48709677 0.54193548 0.52580645 0.53225806]
Mean Scores: 0.5190270718805103
Standard deviation: 0.01942217930775125


In [383]:
# S&P 500
gnb = SVC()
gnb.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(gnb, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(gnb, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.43925104 0.45443947 0.49477115 0.45272938 0.52758975]
Mean Scores: 0.4737561607066555
Standard deviation: 0.032697242516604635


Accuracy S&P 500
Scores: [0.49839228 0.49354839 0.53548387 0.50967742 0.58387097]
Mean Scores: 0.524194585623898
Standard deviation: 0.03317955686796456
