# Ensemble Machine Learning Analysis

**Author**: Maleakhi Agung Wijaya  
**Email**: *maw219@cam.ac.uk*  
**Description**: This notebook implements ensemble machine learning algorithms: voting classifiers, bagging and pasting, boosting, and stacking.

In [53]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from os.path import join
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report, mean_absolute_error as mae
import os
from pathlib import Path  # standard-library module on Python 3 (pathlib2 is the Python 2 backport)
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.model_selection import cross_val_score
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import StackingClassifier

import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import layers, models, backend as K, callbacks

In [54]:
%run Utilities.ipynb

## Load data and preprocessing

In this step, we will load all dataframes, fill missing values, and standardise the features so that all columns are on the same scale. Afterwards, we will generate sequential datasets using **full features**, **PCA features**, and **technical indicator features** only.

In [4]:
market_orders, n_markets, aggregated_datasets = load_aggregated_datasets([DATASET_DJI, 
                                                                          DATASET_NASDAQ, 
                                                                          DATASET_NYSE,
                                                                          DATASET_RUSSELL, 
                                                                          DATASET_SP])

# Load datasets
## DJI
dji_df = aggregated_datasets["DJI"]

## NASDAQ
nasdaq_df = aggregated_datasets["NASDAQ"]

## NYSE
nyse_df = aggregated_datasets["NYA"]

## Russell
russell_df = aggregated_datasets["RUT"]

## SP
sp_df = aggregated_datasets["S&P"]

In [5]:
# Fill missing values, do some scaling (run prev cell first)
list_df = []

for df in [dji_df, nasdaq_df, nyse_df, russell_df, sp_df]:
    columns = df.columns
    df.fillna(0, inplace=True) # fill na with 0
    y = df["MOVEMENT"].copy()
    X = df.drop(columns=["MOVEMENT"]).copy()
    scaler = StandardScaler()
    X = pd.DataFrame(scaler.fit_transform(X))
    X["MOVEMENT"] = np.array(y)
    X.columns = columns
    list_df.append(X)
    
### Clean dataframe (full features)
dji_df_full = list_df[0]
nasdaq_df_full = list_df[1]
nyse_df_full = list_df[2]
russell_df_full = list_df[3]
sp_df_full = list_df[4]

In [6]:
# PCA dataframe (30 components, explaining 90% of the variance; check with pca.explained_variance_ratio_.cumsum())
list_df_pca = []

for df in [dji_df_full, nasdaq_df_full, nyse_df_full, russell_df_full, sp_df_full]:
    pca = PCA(n_components=30)
    y = df["MOVEMENT"].copy()
    X = df.drop(columns=["MOVEMENT"]).copy()
    reduced_X = pd.DataFrame(pca.fit_transform(X))
    reduced_X["MOVEMENT"] = np.array(y)  # assign by position, as in the other cells, to avoid index misalignment
    list_df_pca.append(reduced_X)

### Clean dataframe (pca features)
dji_df_pca = list_df_pca[0]
nasdaq_df_pca = list_df_pca[1]
nyse_df_pca = list_df_pca[2]
russell_df_pca = list_df_pca[3]
sp_df_pca = list_df_pca[4]
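
The comment in the PCA cell refers to `pca.explained_variance_ratio_.cumsum()` as the way the component count was chosen. A minimal sketch of that check follows, run on synthetic data rather than the market dataframes. `X_demo`, `cumulative`, and `n_components_90` are illustrative names that do not appear in the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a scaled feature matrix (hypothetical data, not the market data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 40))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(500, 40))

# Fit PCA with all components and accumulate the explained variance ratios.
pca_full = PCA().fit(X_demo)
cumulative = pca_full.explained_variance_ratio_.cumsum()

# Smallest number of components whose cumulative ratio reaches 90%.
n_components_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print(n_components_90, cumulative[n_components_90 - 1])
```

On the real data, the same check would be run on each market's scaled feature matrix to confirm that 30 components reach the stated threshold.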

In [7]:
# Technical indicator dataframe
ti_columns = ["Volume", "mom", "mom1", "mom2", "mom3", 
              "ROC_5", "ROC_10", "ROC_15", "ROC_20",
              "EMA_10", "EMA_20", "EMA_50", "EMA_200"]
list_df_ti = []

for df in [dji_df_full, nasdaq_df_full, nyse_df_full, russell_df_full, sp_df_full]:
    y = df["MOVEMENT"].copy()
    X = df[ti_columns].copy()
    X["MOVEMENT"] = np.array(y)
    list_df_ti.append(X)

### Clean dataframe (ti features)
dji_df_ti = list_df_ti[0]
nasdaq_df_ti = list_df_ti[1]
nyse_df_ti = list_df_ti[2]
russell_df_ti = list_df_ti[3]
sp_df_ti = list_df_ti[4]

In [8]:
# Build sequential dataset
sequence_length = 60

### Sequential dataset (full features)
dji_X_seq, dji_y_seq = generate_sequential_data(dji_df_full, sequence_length)
nasdaq_X_seq, nasdaq_y_seq = generate_sequential_data(nasdaq_df_full, sequence_length)
nyse_X_seq, nyse_y_seq = generate_sequential_data(nyse_df_full, sequence_length)
russell_X_seq, russell_y_seq = generate_sequential_data(russell_df_full, sequence_length)
sp_X_seq, sp_y_seq = generate_sequential_data(sp_df_full, sequence_length)

In [9]:
### Sequential dataset (PCA features)
dji_X_pca_seq, dji_y_pca_seq = generate_sequential_data(dji_df_pca, sequence_length)
nasdaq_X_pca_seq, nasdaq_y_pca_seq = generate_sequential_data(nasdaq_df_pca, sequence_length)
nyse_X_pca_seq, nyse_y_pca_seq = generate_sequential_data(nyse_df_pca, sequence_length)
russell_X_pca_seq, russell_y_pca_seq = generate_sequential_data(russell_df_pca, sequence_length)
sp_X_pca_seq, sp_y_pca_seq = generate_sequential_data(sp_df_pca, sequence_length)

In [10]:
### Sequential dataset (TI features)
dji_X_ti_seq, dji_y_ti_seq = generate_sequential_data(dji_df_ti, sequence_length)
nasdaq_X_ti_seq, nasdaq_y_ti_seq = generate_sequential_data(nasdaq_df_ti, sequence_length)
nyse_X_ti_seq, nyse_y_ti_seq = generate_sequential_data(nyse_df_ti, sequence_length)
russell_X_ti_seq, russell_y_ti_seq = generate_sequential_data(russell_df_ti, sequence_length)
sp_X_ti_seq, sp_y_ti_seq = generate_sequential_data(sp_df_ti, sequence_length)
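
`generate_sequential_data` is defined in `Utilities.ipynb`, which is not shown here. The sketch below illustrates one plausible sliding-window construction, under the assumption that each sample holds `sequence_length` consecutive rows and takes the label of the last row in the window. The actual helper may align labels differently. `make_windows` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def make_windows(df, sequence_length, target="MOVEMENT"):
    """Hypothetical stand-in for generate_sequential_data (assumed behaviour)."""
    features = df.drop(columns=[target]).to_numpy()
    labels = df[target].to_numpy()
    X, y = [], []
    # Slide a fixed-length window over the rows, one step at a time.
    for end in range(sequence_length, len(df) + 1):
        X.append(features[end - sequence_length:end])
        y.append(labels[end - 1])  # assumption: label of the window's last row
    return np.asarray(X), np.asarray(y)

# Toy dataframe: 10 rows, 3 features, binary label.
demo_df = pd.DataFrame(np.arange(30).reshape(10, 3), columns=["a", "b", "c"])
demo_df["MOVEMENT"] = [0, 1] * 5

X_demo, y_demo = make_windows(demo_df, sequence_length=4)
X_flat = X_demo.reshape(len(X_demo), -1)  # flatten each window into one feature row
print(X_demo.shape, X_flat.shape, y_demo.shape)
```

The final reshape turns each window into a single row, which is the form that classical scikit-learn classifiers expect.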

## Experiments on ensemble machine learning classifiers

In this step, we will build voting ensembles, bagging and pasting ensembles, boosting ensembles, and stacking ensembles.

### Load data

In [11]:
# Sequential flatten (full features)
dji_X_seq_flatten = sequential_reshape(dji_X_seq, (len(dji_X_seq), -1))
nasdaq_X_seq_flatten = sequential_reshape(nasdaq_X_seq, (len(nasdaq_X_seq), -1))
nyse_X_seq_flatten = sequential_reshape(nyse_X_seq, (len(nyse_X_seq), -1))
russell_X_seq_flatten = sequential_reshape(russell_X_seq, (len(russell_X_seq), -1))
sp_X_seq_flatten = sequential_reshape(sp_X_seq, (len(sp_X_seq), -1))

In [12]:
# Sequential flatten (pca)
dji_X_pca_seq_flatten = sequential_reshape(dji_X_pca_seq, (len(dji_X_pca_seq), -1))
nasdaq_X_pca_seq_flatten = sequential_reshape(nasdaq_X_pca_seq, (len(nasdaq_X_pca_seq), -1))
nyse_X_pca_seq_flatten = sequential_reshape(nyse_X_pca_seq, (len(nyse_X_pca_seq), -1))
russell_X_pca_seq_flatten = sequential_reshape(russell_X_pca_seq, (len(russell_X_pca_seq), -1))
sp_X_pca_seq_flatten = sequential_reshape(sp_X_pca_seq, (len(sp_X_pca_seq), -1))

In [13]:
# Sequential flatten (technical indicator)
dji_X_ti_seq_flatten = sequential_reshape(dji_X_ti_seq, (len(dji_X_ti_seq), -1))
nasdaq_X_ti_seq_flatten = sequential_reshape(nasdaq_X_ti_seq, (len(nasdaq_X_ti_seq), -1))
nyse_X_ti_seq_flatten = sequential_reshape(nyse_X_ti_seq, (len(nyse_X_ti_seq), -1))
russell_X_ti_seq_flatten = sequential_reshape(russell_X_ti_seq, (len(russell_X_ti_seq), -1))
sp_X_ti_seq_flatten = sequential_reshape(sp_X_ti_seq, (len(sp_X_ti_seq), -1))

### Split into training and test (80/20)

In [14]:
## Full features
dji_X_train_full, dji_X_test_full, dji_y_train_full, dji_y_test_full = train_test_split(dji_X_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=dji_y_seq,
                                                                                        test_size=0.2)
nasdaq_X_train_full, nasdaq_X_test_full, nasdaq_y_train_full, nasdaq_y_test_full = train_test_split(nasdaq_X_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=nasdaq_y_seq,
                                                                                        test_size=0.2)
nyse_X_train_full, nyse_X_test_full, nyse_y_train_full, nyse_y_test_full = train_test_split(nyse_X_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=nyse_y_seq,
                                                                                        test_size=0.2)
russell_X_train_full, russell_X_test_full, russell_y_train_full, russell_y_test_full = train_test_split(russell_X_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=russell_y_seq,
                                                                                        test_size=0.2)
sp_X_train_full, sp_X_test_full, sp_y_train_full, sp_y_test_full = train_test_split(sp_X_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=sp_y_seq,
                                                                                        test_size=0.2)

In [15]:
## pca features
dji_X_train_pca, dji_X_test_pca, dji_y_train_pca, dji_y_test_pca = train_test_split(dji_X_pca_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=dji_y_seq,
                                                                                        test_size=0.2)
nasdaq_X_train_pca, nasdaq_X_test_pca, nasdaq_y_train_pca, nasdaq_y_test_pca = train_test_split(nasdaq_X_pca_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=nasdaq_y_seq,
                                                                                        test_size=0.2)
nyse_X_train_pca, nyse_X_test_pca, nyse_y_train_pca, nyse_y_test_pca = train_test_split(nyse_X_pca_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=nyse_y_seq,
                                                                                        test_size=0.2)
russell_X_train_pca, russell_X_test_pca, russell_y_train_pca, russell_y_test_pca = train_test_split(russell_X_pca_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=russell_y_seq,
                                                                                        test_size=0.2)
sp_X_train_pca, sp_X_test_pca, sp_y_train_pca, sp_y_test_pca = train_test_split(sp_X_pca_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=sp_y_seq,
                                                                                        test_size=0.2)

In [16]:
## ti features
dji_X_train_ti, dji_X_test_ti, dji_y_train_ti, dji_y_test_ti = train_test_split(dji_X_ti_seq_flatten,
                                                                                        dji_y_seq,
                                                                                        stratify=dji_y_seq,
                                                                                        test_size=0.2)
nasdaq_X_train_ti, nasdaq_X_test_ti, nasdaq_y_train_ti, nasdaq_y_test_ti = train_test_split(nasdaq_X_ti_seq_flatten,
                                                                                        nasdaq_y_seq,
                                                                                        stratify=nasdaq_y_seq,
                                                                                        test_size=0.2)
nyse_X_train_ti, nyse_X_test_ti, nyse_y_train_ti, nyse_y_test_ti = train_test_split(nyse_X_ti_seq_flatten,
                                                                                        nyse_y_seq,
                                                                                        stratify=nyse_y_seq,
                                                                                        test_size=0.2)
russell_X_train_ti, russell_X_test_ti, russell_y_train_ti, russell_y_test_ti = train_test_split(russell_X_ti_seq_flatten,
                                                                                        russell_y_seq,
                                                                                        stratify=russell_y_seq,
                                                                                        test_size=0.2)
sp_X_train_ti, sp_X_test_ti, sp_y_train_ti, sp_y_test_ti = train_test_split(sp_X_ti_seq_flatten,
                                                                                        sp_y_seq,
                                                                                        stratify=sp_y_seq,
                                                                                        test_size=0.2)

### Voting Classifier

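In this section, we will implement soft voting classifiers using Gaussian naive Bayes, logistic regression, k-NN, decision tree, and SVC. Soft voting averages the class probabilities predicted by each component, which is why `SVC` is created with `probability=True`.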
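
To make the soft-voting rule concrete, the sketch below fits a small `VotingClassifier` on synthetic data and reproduces its prediction by hand: average the components' `predict_proba` outputs, then take the most probable class. The data and the names `soft_clf`, `avg_proba`, and `manual_pred` are illustrative only, and three components are used instead of five to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (not the market data).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

soft_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("nb", GaussianNB())],
    voting="soft",
)
soft_clf.fit(X_demo, y_demo)

# Soft voting by hand: average the fitted components' class probabilities...
probas = np.asarray([est.predict_proba(X_demo) for est in soft_clf.estimators_])
avg_proba = np.average(probas, axis=0)
# ...and predict the class with the highest averaged probability.
manual_pred = soft_clf.classes_[np.argmax(avg_proba, axis=1)]

print(np.array_equal(manual_pred, soft_clf.predict(X_demo)))
```

With `voting='hard'`, the ensemble would instead take a majority vote over the components' predicted labels.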

#### Full features

In [21]:
## DJI

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.54338235 0.49976359 0.48615567 0.52083333 0.50158092]
Mean Scores: 0.5103431743809022
Standard deviation: 0.019876471437634283


Accuracy DJI
Scores: [0.5326087  0.50724638 0.50724638 0.51811594 0.50545455]
Mean Scores: 0.5141343873517786
Standard deviation: 0.010270169215645763
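
`analyse_cv` comes from `Utilities.ipynb` and is not shown in this notebook. Judging only from the printed output above, it may be a thin wrapper around `cross_val_score`; the sketch below is written under that assumption and is not the author's implementation. `analyse_cv_sketch` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def analyse_cv_sketch(model, X, y, folds, scoring):
    """Hypothetical stand-in for analyse_cv (assumed to wrap cross_val_score)."""
    scores = cross_val_score(model, X, y, cv=folds, scoring=scoring)
    print("Scores:", scores)
    print("Mean Scores:", scores.mean())
    print("Standard deviation:", scores.std())
    return scores

# Synthetic data, used only to exercise the helper.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
demo_scores = analyse_cv_sketch(LogisticRegression(max_iter=1000),
                                X_demo, y_demo, 5, "f1_macro")
```

If the real helper does rely on `cross_val_score`, the estimator is cloned and refitted on each fold, so the preceding `fit` call on the full training set would not influence the reported scores.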


In [22]:
## NASDAQ

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.46306325 0.44711538 0.52597368 0.47341597 0.53184274]
Mean Scores: 0.48828220435238096
Standard deviation: 0.03426331908715754


Accuracy NASDAQ
Scores: [0.47826087 0.46014493 0.53985507 0.47101449 0.54909091]
Mean Scores: 0.4996732542819499
Standard deviation: 0.037145623069549515


In [23]:
## NYSE

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.49397876 0.46366677 0.47019263 0.43475293 0.46061281]
Mean Scores: 0.46464077950763566
Standard deviation: 0.018984189935662792


Accuracy NYSE
Scores: [0.48550725 0.45652174 0.47463768 0.44927536 0.45090909]
Mean Scores: 0.4633702239789196
Standard deviation: 0.014267796320944996


In [24]:
## Russell

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.51055473 0.46331722 0.47603227 0.49318714 0.4826482 ]
Mean Scores: 0.48514791250450867
Standard deviation: 0.015975345537801007


Accuracy RUSSELL
Scores: [0.49637681 0.47826087 0.5        0.50362319 0.52727273]
Mean Scores: 0.5011067193675889
Standard deviation: 0.015725916791130962


In [25]:
## S&P 500

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.46704848 0.47872212 0.48799453 0.5332664  0.51127321]
Mean Scores: 0.49566094765795815
Standard deviation: 0.0237582965907359


Accuracy S&P 500
Scores: [0.49275362 0.50362319 0.51086957 0.50724638 0.49090909]
Mean Scores: 0.5010803689064558
Standard deviation: 0.007913324702793854


#### PCA features

In [26]:
## DJI

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.48221741 0.46875494 0.47550676 0.49882699 0.50255373]
Mean Scores: 0.48557196679052794
Standard deviation: 0.0131106818976163


Accuracy DJI
Scores: [0.46376812 0.46376812 0.5        0.47101449 0.51272727]
Mean Scores: 0.4822555994729908
Standard deviation: 0.020264862179990385


In [27]:
## NASDAQ

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.50287521 0.4817212  0.50944397 0.54204357 0.50235252]
Mean Scores: 0.5076872944892414
Standard deviation: 0.019541204735485342


Accuracy NASDAQ
Scores: [0.51811594 0.44202899 0.51811594 0.55434783 0.50545455]
Mean Scores: 0.5076126482213439
Standard deviation: 0.03663520010672331


In [28]:
## NYSE

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.47463078 0.43371212 0.50802139 0.46923077 0.51266928]
Mean Scores: 0.479652868307786
Standard deviation: 0.02877238246550159


Accuracy NYSE
Scores: [0.47826087 0.44202899 0.51086957 0.4673913  0.53454545]
Mean Scores: 0.48661923583662714
Standard deviation: 0.032597137835410704


In [29]:
## RUSSELL

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.47550676 0.50462009 0.5205054  0.51628079 0.43993335]
Mean Scores: 0.49136927506733424
Standard deviation: 0.03014085312127596


Accuracy RUSSELL
Scores: [0.46014493 0.48550725 0.52536232 0.5        0.46545455]
Mean Scores: 0.48729380764163366
Standard deviation: 0.023781781826258994


In [30]:
## S&P 500

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.53140917 0.45029348 0.4998322  0.53230785 0.50724754]
Mean Scores: 0.5042180485381227
Standard deviation: 0.029881991102915638


Accuracy S&P 500
Scores: [0.5326087  0.44927536 0.49275362 0.54347826 0.52      ]
Mean Scores: 0.5076231884057971
Standard deviation: 0.03373016465363661


#### Technical indicator features

In [29]:
## DJI

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.50298339 0.5072421  0.46185897 0.51530996 0.51298701]
Mean Scores: 0.500076287255127
Standard deviation: 0.019591400014698158


Accuracy DJI
Scores: [0.48913043 0.50362319 0.49275362 0.52536232 0.50909091]
Mean Scores: 0.5039920948616601
Standard deviation: 0.012884041736059134


In [28]:
## NASDAQ

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.50243307 0.47757931 0.46946069 0.58325828 0.50735043]
Mean Scores: 0.508016355994574
Standard deviation: 0.040263920487037225


Accuracy NASDAQ
Scores: [0.50362319 0.51811594 0.5        0.56884058 0.52      ]
Mean Scores: 0.5221159420289856
Standard deviation: 0.024636419405031414


In [27]:
## NYSE

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.50944397 0.49351015 0.4998322  0.53725782 0.485184  ]
Mean Scores: 0.5050456287466973
Standard deviation: 0.01795384384930819


Accuracy NYSE
Scores: [0.53985507 0.51811594 0.5        0.54710145 0.50909091]
Mean Scores: 0.522832674571805
Standard deviation: 0.01795076693692344


In [34]:
## RUSSELL

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.4664107  0.4525464  0.50614339 0.46701826 0.5035574 ]
Mean Scores: 0.47913522796496116
Standard deviation: 0.021640624466269055


Accuracy RUSSELL
Scores: [0.49275362 0.46014493 0.50724638 0.4673913  0.53090909]
Mean Scores: 0.49168906455862976
Standard deviation: 0.025949132734227786


In [35]:
## S&P 500

# Component classifiers
gnb = GaussianNB()
lr = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svc = SVC(probability=True)

# Voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', lr), 
                ('dt', dt), 
                ('svc', svc),
                ('knn', knn),
                ('nb', gnb)],
    voting='soft')

voting_clf.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.48716029 0.43097267 0.47691498 0.48880161 0.49989303]
Mean Scores: 0.4767485151826625
Standard deviation: 0.024020016480244755


Accuracy S&P 500
Scores: [0.50724638 0.44927536 0.47826087 0.48550725 0.50909091]
Mean Scores: 0.48587615283267455
Standard deviation: 0.02187856429701355


### Bagging Ensembles

In this section, we will implement a bagging ensemble using a random forest (`RandomForestClassifier` with `bootstrap=True`).
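
The notebook description also mentions pasting. As a hedged illustration of how the two differ, the sketch below uses scikit-learn's generic `BaggingClassifier` on synthetic data: `bootstrap=True` samples training rows with replacement (bagging), while `bootstrap=False` samples without replacement (pasting). The names `bagging`, `pasting`, and `has_repeats` are illustrative and do not appear in the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (not the market data).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# Bagging: each tree sees 80% of the rows, drawn with replacement.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                            max_samples=0.8, bootstrap=True,
                            random_state=0).fit(X_demo, y_demo)
# Pasting: each tree sees 80% of the rows, drawn without replacement.
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20,
                            max_samples=0.8, bootstrap=False,
                            random_state=0).fit(X_demo, y_demo)

def has_repeats(indices):
    """True if any row index was drawn more than once for one estimator."""
    return len(np.unique(indices)) < len(indices)

print("bagging draws repeats:", any(has_repeats(s) for s in bagging.estimators_samples_))
print("pasting draws repeats:", any(has_repeats(s) for s in pasting.estimators_samples_))
```

A random forest adds a random choice of candidate features at each split on top of the bootstrap sampling shown here.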

#### Full features

In [38]:
## DJI

# Random forest classifier (the variable name voting_clf is reused from the previous section)
voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.5051116  0.51608563 0.49745226 0.48608981 0.4967497 ]
Mean Scores: 0.5002977990628451
Standard deviation: 0.00994968486956317


Accuracy DJI
Scores: [0.53985507 0.50362319 0.5326087  0.52536232 0.45818182]
Mean Scores: 0.5119262187088274
Standard deviation: 0.029481232713534984


In [41]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.47274424 0.48854237 0.48623252 0.43495103 0.48065015]
Mean Scores: 0.47262406227552833
Standard deviation: 0.019607513456355367


Accuracy NASDAQ
Scores: [0.56521739 0.51811594 0.48913043 0.53623188 0.54545455]
Mean Scores: 0.5308300395256917
Standard deviation: 0.025791670637041474


In [42]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.4705314  0.46759259 0.49662456 0.47097481 0.47712004]
Mean Scores: 0.4765686813811443
Standard deviation: 0.010496216304393808


Accuracy NYSE
Scores: [0.5326087  0.47101449 0.51086957 0.49275362 0.50909091]
Mean Scores: 0.5032674571805007
Standard deviation: 0.020510520877558187


In [43]:
## Russell

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.49664733 0.4619883  0.52238642 0.47998645 0.48785333]
Mean Scores: 0.4897723661843427
Standard deviation: 0.01991151781990376


Accuracy RUSSELL
Scores: [0.55072464 0.48550725 0.5326087  0.53985507 0.51272727]
Mean Scores: 0.5242845849802371
Standard deviation: 0.023013303723969026


In [44]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.49564077 0.44098255 0.49064264 0.49948689 0.47989409]
Mean Scores: 0.48132938809963993
Standard deviation: 0.021218741828972293


Accuracy S&P 500
Scores: [0.51811594 0.55434783 0.54710145 0.55797101 0.56727273]
Mean Scores: 0.548961791831357
Standard deviation: 0.01673554982161777


#### PCA features

In [45]:
## DJI

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.48690344 0.49928008 0.4754025  0.51003796 0.53340565]
Mean Scores: 0.501005926692042
Standard deviation: 0.01994348858488579


Accuracy DJI
Scores: [0.51086957 0.51449275 0.53623188 0.55434783 0.51636364]
Mean Scores: 0.5264611330698287
Standard deviation: 0.016501113639456565


In [46]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.50816751 0.4195758  0.47826087 0.50517514 0.51825899]
Mean Scores: 0.4858876622705891
Standard deviation: 0.03569799126891495


Accuracy NASDAQ
Scores: [0.52898551 0.52173913 0.55434783 0.52536232 0.57090909]
Mean Scores: 0.5402687747035573
Standard deviation: 0.019130660901620647


In [47]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.48448058 0.49870168 0.53688167 0.48072891 0.46456386]
Mean Scores: 0.49307134215149195
Standard deviation: 0.02445346043849067


Accuracy NYSE
Scores: [0.51449275 0.51811594 0.47463768 0.47101449 0.49818182]
Mean Scores: 0.49528853754940705
Standard deviation: 0.01956502237917566


In [48]:
## RUSSELL

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.49801587 0.42911449 0.50876232 0.52996759 0.49683965]
Mean Scores: 0.49253998539555016
Standard deviation: 0.03386814304174349


Accuracy RUSSELL
Scores: [0.52898551 0.47463768 0.50724638 0.47101449 0.50181818]
Mean Scores: 0.49674044795783934
Standard deviation: 0.02156969560319551


In [49]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.50144928 0.54921603 0.44360488 0.52758292 0.47222849]
Mean Scores: 0.49881631863830656
Standard deviation: 0.03776187454044956


Accuracy S&P 500
Scores: [0.52536232 0.48550725 0.52173913 0.55072464 0.48      ]
Mean Scores: 0.5126666666666666
Standard deviation: 0.02644482392558829


#### Technical indicator features

In [26]:
## DJI

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.48461843 0.48608981 0.47331076 0.51465199 0.50374381]
Mean Scores: 0.49248295811773096
Standard deviation: 0.014755128931377802


Accuracy DJI
Scores: [0.49275362 0.50362319 0.46014493 0.48188406 0.54181818]
Mean Scores: 0.49604479578392624
Standard deviation: 0.02702885035081427


In [25]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.46068743 0.4488609  0.47836789 0.45386992 0.48531944]
Mean Scores: 0.46542111376875706
Standard deviation: 0.014097097353980029


Accuracy NASDAQ
Scores: [0.51449275 0.52898551 0.55434783 0.50724638 0.56363636]
Mean Scores: 0.5337417654808959
Standard deviation: 0.02197020165226752


In [24]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.50090416 0.48907812 0.47256347 0.50929726 0.43110031]
Mean Scores: 0.480588664316017
Standard deviation: 0.02764949988785081


Accuracy NYSE
Scores: [0.55797101 0.5        0.48913043 0.48550725 0.52363636]
Mean Scores: 0.5112490118577074
Standard deviation: 0.02688792377255191


In [53]:
## RUSSELL

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.50062112 0.47209052 0.51670259 0.5279874  0.54495726]
Mean Scores: 0.512471778145746
Standard deviation: 0.02483958785224264


Accuracy RUSSELL
Scores: [0.47826087 0.49275362 0.50724638 0.51449275 0.54545455]
Mean Scores: 0.5076416337285903
Standard deviation: 0.022635337530966245


In [54]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=True)

voting_clf.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.48448058 0.51040155 0.50098682 0.54217745 0.51509892]
Mean Scores: 0.5106290640741865
Standard deviation: 0.01893170927303156


Accuracy S&P 500
Scores: [0.52898551 0.48550725 0.52173913 0.5615942  0.56727273]
Mean Scores: 0.5330197628458498
Standard deviation: 0.029633532706751186


### Pasting Ensembles

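In this section, we implement pasting-style ensembles using a random forest with `bootstrap=False`. Note that in scikit-learn this setting trains every tree on the whole training set, so the trees differ only through the random feature selection at each split.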
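
Pasting in the usual sense draws a random subset of the training rows for each base learner, without replacement. A minimal sketch of that variant, assuming `BaggingClassifier` over decision trees, is shown here. The synthetic data, the helper name `make_pasting_clf`, and the value `max_samples=0.8` are illustrative assumptions and are not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def make_pasting_clf(max_samples=0.8, n_estimators=100, random_state=0):
    """Tree ensemble where each tree sees a row subset drawn WITHOUT replacement.

    Hypothetical helper for illustration; max_samples=0.8 is an assumed value.
    """
    return BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=n_estimators,
        max_samples=max_samples,  # fraction of training rows given to each tree
        bootstrap=False,          # sample without replacement -> pasting
        random_state=random_state,
    )


# Synthetic stand-in data (assumption); swap in e.g. the DJI training arrays.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

pasting_clf = make_pasting_clf(n_estimators=20)
scores = cross_val_score(pasting_clf, X_demo, y_demo, cv=5, scoring="f1_macro")
print("Scores:", scores)
print("Mean Scores:", scores.mean())
print("Standard deviation:", scores.std())
```

Lowering `max_samples` makes the trees more diverse at the cost of giving each one less data, so it is a natural hyperparameter to tune if this variant is tried on the market datasets.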

#### Full features

In [55]:
## DJI

# Random forest without bootstrap sampling
voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.50221976 0.47689941 0.47826087 0.49187784 0.5042609 ]
Mean Scores: 0.4907037556796249
Standard deviation: 0.011516754436219646


Accuracy DJI
Scores: [0.53985507 0.46376812 0.49275362 0.47463768 0.47272727]
Mean Scores: 0.48874835309617914
Standard deviation: 0.027231368970364128


In [56]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.46870784 0.47274424 0.43545455 0.49133214 0.49086047]
Mean Scores: 0.47181984717734426
Standard deviation: 0.020377679722002606


Accuracy NASDAQ
Scores: [0.52898551 0.54710145 0.46376812 0.56521739 0.54909091]
Mean Scores: 0.530832674571805
Standard deviation: 0.03544407412875024


In [57]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.47256347 0.46289421 0.50263429 0.45753982 0.47565491]
Mean Scores: 0.4742573413945886
Standard deviation: 0.015611983910275207


Accuracy NYSE
Scores: [0.50724638 0.50362319 0.50362319 0.48550725 0.53090909]
Mean Scores: 0.5061818181818182
Standard deviation: 0.014512769505418407


In [58]:
## Russell

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.47998645 0.45415591 0.45905923 0.48927269 0.49744152]
Mean Scores: 0.4759831591642353
Standard deviation: 0.016828233966638833


Accuracy RUSSELL
Scores: [0.51086957 0.50362319 0.51086957 0.50724638 0.57454545]
Mean Scores: 0.5214308300395257
Standard deviation: 0.02669290085256005


In [59]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.52530713 0.50811881 0.47393969 0.54420088 0.53521127]
Mean Scores: 0.5173555563011893
Standard deviation: 0.024791047161244868


Accuracy S&P 500
Scores: [0.54347826 0.4673913  0.53985507 0.47463768 0.55272727]
Mean Scores: 0.5156179183135705
Standard deviation: 0.03673126364679249


#### PCA features

In [60]:
## DJI

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.46392897 0.49927292 0.47465909 0.48690344 0.49845253]
Mean Scores: 0.48464338974420995
Standard deviation: 0.013701071695696666


Accuracy DJI
Scores: [0.42028986 0.52898551 0.52898551 0.54710145 0.44727273]
Mean Scores: 0.49452700922266135
Standard deviation: 0.050760161815896376


In [61]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.46747183 0.4751126  0.46996102 0.4327853  0.5255848 ]
Mean Scores: 0.4741831098618466
Standard deviation: 0.029730937495421227


Accuracy NASDAQ
Scores: [0.55434783 0.54347826 0.55072464 0.53623188 0.53818182]
Mean Scores: 0.5445928853754941
Standard deviation: 0.007000110970618192


In [62]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.48105956 0.47256347 0.54009662 0.51465199 0.52173913]
Mean Scores: 0.5060221540247467
Standard deviation: 0.025397563141138398


Accuracy NYSE
Scores: [0.51449275 0.51086957 0.54710145 0.49637681 0.48363636]
Mean Scores: 0.5104953886693016
Standard deviation: 0.021339466029419683


In [63]:
## RUSSELL

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.46931824 0.45753982 0.49966606 0.49504092 0.49696736]
Mean Scores: 0.48370648222308726
Standard deviation: 0.017033757658192768


Accuracy RUSSELL
Scores: [0.46376812 0.54710145 0.49637681 0.53623188 0.47636364]
Mean Scores: 0.5039683794466404
Standard deviation: 0.032671425961019115


In [17]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.51014199 0.51791751 0.42732932 0.51071716 0.51470588]
Mean Scores: 0.49616237217553716
Standard deviation: 0.03453233035482092


Accuracy S&P 500
Scores: [0.51811594 0.52536232 0.56521739 0.55797101 0.49454545]
Mean Scores: 0.5322424242424243
Standard deviation: 0.026142928999657363


#### Technical indicator features

In [19]:
## DJI

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.49870168 0.47393969 0.46380669 0.46380669 0.51933982]
Mean Scores: 0.4839189128399745
Standard deviation: 0.021825485263393396


Accuracy DJI
Scores: [0.52898551 0.48913043 0.47826087 0.52173913 0.55636364]
Mean Scores: 0.5148959156785244
Standard deviation: 0.02818121589672653


In [20]:
## NASDAQ

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.45627081 0.46348884 0.51690821 0.52238642 0.47013487]
Mean Scores: 0.4858378322329423
Standard deviation: 0.02800513073142719


Accuracy NASDAQ
Scores: [0.50362319 0.46376812 0.5326087  0.54710145 0.52      ]
Mean Scores: 0.5134202898550725
Standard deviation: 0.028660877773086388


In [21]:
## NYSE

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.49935023 0.46923077 0.48927269 0.48397436 0.47933571]
Mean Scores: 0.48423275148055617
Standard deviation: 0.010030416347717114


Accuracy NYSE
Scores: [0.52536232 0.5        0.5326087  0.45289855 0.50909091]
Mean Scores: 0.5039920948616601
Standard deviation: 0.028028846905014018


In [22]:
## RUSSELL

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.55911135 0.54942996 0.53557692 0.47554156 0.49421184]
Mean Scores: 0.5227743264458506
Standard deviation: 0.03237764278663983


Accuracy RUSSELL
Scores: [0.52898551 0.53985507 0.53623188 0.48188406 0.54909091]
Mean Scores: 0.5272094861660079
Standard deviation: 0.023569725826537786


In [23]:
## S&P 500

voting_clf = RandomForestClassifier(n_jobs=-1, bootstrap=False)

voting_clf.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.53671215 0.53265602 0.55046419 0.49664733 0.49544069]
Mean Scores: 0.5223840776734693
Standard deviation: 0.022305272221735888


Accuracy S&P 500
Scores: [0.49275362 0.5326087  0.52173913 0.54347826 0.53818182]
Mean Scores: 0.5257523056653491
Standard deviation: 0.018004060506428683


### Adaptive Boosting

In [33]:
param_grid_clf = {
    "n_estimators": [50],
    "learning_rate": [0.1, 0.5, 1]
}
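
The cells in this section run the same grid search twice per dataset, once for each metric. As a hedged alternative, `GridSearchCV` accepts a list of scorers, so both metrics can be collected in a single search. The sketch below assumes synthetic data and a hypothetical helper name `grid_search_report`. It omits the `algorithm` argument so that it does not depend on a particular scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def grid_search_report(clf, param_grid, X, y, scorers=("f1_macro", "accuracy"), cv=5):
    """Run one grid search for several metrics.

    Returns {metric: [(mean_score, std_score, params), ...]}.
    Hypothetical helper for illustration only.
    """
    # refit=False: we only want the CV statistics, not a final refitted model.
    search = GridSearchCV(clf, param_grid, cv=cv, scoring=list(scorers), refit=False)
    search.fit(X, y)
    results = search.cv_results_
    report = {}
    for name in scorers:
        report[name] = list(zip(
            results[f"mean_test_{name}"],
            results[f"std_test_{name}"],
            results["params"],
        ))
    return report


# Synthetic stand-in data (assumption); swap in e.g. the DJI training arrays.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
stump_boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), random_state=0)
demo_grid = {"n_estimators": [10], "learning_rate": [0.1, 0.5, 1.0]}

report = grid_search_report(stump_boost, demo_grid, X_demo, y_demo)
for metric, rows in report.items():
    print(metric)
    for mean_score, std_score, params in rows:
        print(mean_score, std_score, params)
```

Because both metrics are computed on the same fitted models and folds, this roughly halves the training cost compared with two separate searches.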

#### Full features

In [34]:
## DJI
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.4525734953470188 0.015634611839286546 {'learning_rate': 0.1, 'n_estimators': 50}
0.5029472396455452 0.03426024037810419 {'learning_rate': 0.5, 'n_estimators': 50}
0.48876032697044797 0.014457222983612569 {'learning_rate': 1, 'n_estimators': 50}


Accuracy DJI
0.501836627140975 0.01962034456496461 {'learning_rate': 0.1, 'n_estimators': 50}
0.510566534914361 0.0374229106145649 {'learning_rate': 0.5, 'n_estimators': 50}
0.49311462450592886 0.012038030445193902 {'learning_rate': 1, 'n_estimators': 50}


In [35]:
## NASDAQ
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.45050425118109516 0.02683934338252419 {'learning_rate': 0.1, 'n_estimators': 50}
0.4969492079423368 0.021205308762264376 {'learning_rate': 0.5, 'n_estimators': 50}
0.5208865393124809 0.019540334429361858 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NASDAQ
0.5452990777338603 0.018026666887296393 {'learning_rate': 0.1, 'n_estimators': 50}
0.5141291172595521 0.01834750573854238 {'learning_rate': 0.5, 'n_estimators': 50}
0.5308010540184454 0.01930337809325062 {'learning_rate': 1, 'n_estimators': 50}


In [36]:
## NYSE
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.46333929125067075 0.025003528355934224 {'learning_rate': 0.1, 'n_estimators': 50}
0.47883455598761815 0.02007962763750959 {'learning_rate': 0.5, 'n_estimators': 50}
0.4732572786665676 0.021939830532237043 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NYSE
0.5018471673254282 0.028861978241132732 {'learning_rate': 0.1, 'n_estimators': 50}
0.4873306982872201 0.01784301897026557 {'learning_rate': 0.5, 'n_estimators': 50}
0.4757101449275362 0.02160297458261375 {'learning_rate': 1, 'n_estimators': 50}


In [37]:
## RUSSELL
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.46411004613492235 0.025884652610953292 {'learning_rate': 0.1, 'n_estimators': 50}
0.4944139570249403 0.016738636706588515 {'learning_rate': 0.5, 'n_estimators': 50}
0.4787660282181959 0.01666708751367443 {'learning_rate': 1, 'n_estimators': 50}


Accuracy RUSSELL
0.5025375494071146 0.01949795707796024 {'learning_rate': 0.1, 'n_estimators': 50}
0.4996363636363636 0.014868450779465817 {'learning_rate': 0.5, 'n_estimators': 50}
0.4814888010540185 0.01603025489683728 {'learning_rate': 1, 'n_estimators': 50}


In [38]:
## S&P 500
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.46054065267987304 0.013366787063789359 {'learning_rate': 0.1, 'n_estimators': 50}
0.48418268690316396 0.013309336496289183 {'learning_rate': 0.5, 'n_estimators': 50}
0.49557105507554206 0.01639668597560913 {'learning_rate': 1, 'n_estimators': 50}


Accuracy S&P 500
0.5300974967061924 0.011671376288766855 {'learning_rate': 0.1, 'n_estimators': 50}
0.4952753623188405 0.012651966397687283 {'learning_rate': 0.5, 'n_estimators': 50}
0.5017971014492754 0.015306872870310337 {'learning_rate': 1, 'n_estimators': 50}


#### PCA features

In [39]:
## DJI
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.47042309173739316 0.029648651816772273 {'learning_rate': 0.1, 'n_estimators': 50}
0.49383639047911404 0.018430792963686358 {'learning_rate': 0.5, 'n_estimators': 50}
0.48910003531605584 0.0204456964037993 {'learning_rate': 1, 'n_estimators': 50}


Accuracy DJI
0.5083214756258234 0.024252247250855057 {'learning_rate': 0.1, 'n_estimators': 50}
0.4989328063241107 0.018132315034945456 {'learning_rate': 0.5, 'n_estimators': 50}
0.49455599472990774 0.021681750280016922 {'learning_rate': 1, 'n_estimators': 50}


In [40]:
## NASDAQ
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.45313376177982995 0.026172534199017834 {'learning_rate': 0.1, 'n_estimators': 50}
0.531478318207582 0.027134411824721903 {'learning_rate': 0.5, 'n_estimators': 50}
0.5285402751514464 0.029533223541838908 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NASDAQ
0.542429512516469 0.014859269189753622 {'learning_rate': 0.1, 'n_estimators': 50}
0.5475230566534914 0.0278181382626351 {'learning_rate': 0.5, 'n_estimators': 50}
0.535931488801054 0.028368903763543764 {'learning_rate': 1, 'n_estimators': 50}


In [41]:
## NYSE
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4594557349620791 0.020376460124105704 {'learning_rate': 0.1, 'n_estimators': 50}
0.5079171088734994 0.016133227203335612 {'learning_rate': 0.5, 'n_estimators': 50}
0.4948832798873523 0.03166002018280091 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NYSE
0.5192332015810277 0.02432236119074618 {'learning_rate': 0.1, 'n_estimators': 50}
0.516300395256917 0.01644987216780363 {'learning_rate': 0.5, 'n_estimators': 50}
0.4989328063241107 0.03147072540332041 {'learning_rate': 1, 'n_estimators': 50}


In [42]:
## RUSSELL
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.4687495051680909 0.0189579358870856 {'learning_rate': 0.1, 'n_estimators': 50}
0.5029524001962671 0.016635186055803657 {'learning_rate': 0.5, 'n_estimators': 50}
0.5097216105293654 0.02446853629561892 {'learning_rate': 1, 'n_estimators': 50}


Accuracy RUSSELL
0.5148906455862978 0.022224345711024125 {'learning_rate': 0.1, 'n_estimators': 50}
0.5134097496706193 0.010089916040107623 {'learning_rate': 0.5, 'n_estimators': 50}
0.515570487483531 0.0203027052373092 {'learning_rate': 1, 'n_estimators': 50}


In [43]:
## S&P 500
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4561908136725415 0.02076493240189046 {'learning_rate': 0.1, 'n_estimators': 50}
0.49324404208628625 0.0302773115155978 {'learning_rate': 0.5, 'n_estimators': 50}
0.47950479431892196 0.011242360331885857 {'learning_rate': 1, 'n_estimators': 50}


Accuracy S&P 500
0.5293702239789196 0.01338379427898788 {'learning_rate': 0.1, 'n_estimators': 50}
0.5068537549407115 0.027513440524140276 {'learning_rate': 0.5, 'n_estimators': 50}
0.4865823451910408 0.009965598782160722 {'learning_rate': 1, 'n_estimators': 50}


#### Technical indicator features

In [44]:
## DJI
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.4768711047306596 0.009179383928448482 {'learning_rate': 0.1, 'n_estimators': 50}
0.5020036990344552 0.023927402432336015 {'learning_rate': 0.5, 'n_estimators': 50}
0.4896754194454747 0.022206548520241967 {'learning_rate': 1, 'n_estimators': 50}


Accuracy DJI
0.5235836627140975 0.018155544209525806 {'learning_rate': 0.1, 'n_estimators': 50}
0.5119683794466404 0.024242324855155147 {'learning_rate': 0.5, 'n_estimators': 50}
0.4953043478260869 0.022221243414696445 {'learning_rate': 1, 'n_estimators': 50}


In [45]:
## NASDAQ
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4428518687603467 0.016750181224440875 {'learning_rate': 0.1, 'n_estimators': 50}
0.5019263844391861 0.023754866808525038 {'learning_rate': 0.5, 'n_estimators': 50}
0.5037566137667435 0.0153632331980624 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NASDAQ
0.5373491436100133 0.015191641658413396 {'learning_rate': 0.1, 'n_estimators': 50}
0.5250171277997365 0.021870469066043497 {'learning_rate': 0.5, 'n_estimators': 50}
0.5170540184453227 0.01848412744854565 {'learning_rate': 1, 'n_estimators': 50}


In [46]:
## NYSE
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4526473093628516 0.019523353043514886 {'learning_rate': 0.1, 'n_estimators': 50}
0.47492831283831477 0.023008561581899782 {'learning_rate': 0.5, 'n_estimators': 50}
0.49192677186306827 0.01924896723459381 {'learning_rate': 1, 'n_estimators': 50}


Accuracy NYSE
0.50832674571805 0.01651532888317859 {'learning_rate': 0.1, 'n_estimators': 50}
0.48656653491436097 0.021393412800154846 {'learning_rate': 0.5, 'n_estimators': 50}
0.498893280632411 0.014867732526305756 {'learning_rate': 1, 'n_estimators': 50}


In [47]:
## RUSSELL
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.490332604098015 0.04419340395520677 {'learning_rate': 0.1, 'n_estimators': 50}
0.5113378744525638 0.03986675002336953 {'learning_rate': 0.5, 'n_estimators': 50}
0.5176534706831314 0.02889540811469937 {'learning_rate': 1, 'n_estimators': 50}


Accuracy RUSSELL
0.5344084321475625 0.039539541359247546 {'learning_rate': 0.1, 'n_estimators': 50}
0.517760210803689 0.037519036327385756 {'learning_rate': 0.5, 'n_estimators': 50}
0.5213675889328063 0.02669576440567771 {'learning_rate': 1, 'n_estimators': 50}


In [48]:
## S&P 500
clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), algorithm="SAMME.R"
)
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4337769148557323 0.016727256860572214 {'learning_rate': 0.1, 'n_estimators': 50}
0.4917759103134884 0.0228506007282694 {'learning_rate': 0.5, 'n_estimators': 50}
0.49080706476965047 0.02407117271801259 {'learning_rate': 1, 'n_estimators': 50}


Accuracy S&P 500
0.515604743083004 0.02050610865372306 {'learning_rate': 0.1, 'n_estimators': 50}
0.5054202898550725 0.020126172063568537 {'learning_rate': 0.5, 'n_estimators': 50}
0.49961528326745724 0.022610555770794596 {'learning_rate': 1, 'n_estimators': 50}


### Gradient Boosting

In [70]:
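# Grid shared by every gradient boosting search below.
# The DJI and NASDAQ full-feature outputs (In [50], In [51]) also list
# loss="exponential": they presumably predate this cell's last run (In [70]),
# which narrowed the grid. scikit-learn >= 1.1 names "deviance" as "log_loss".
param_grid_clf = {
    "loss": ["deviance"],
    "learning_rate": [0.1, 0.5]
}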
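
Each cell in this section fits `GridSearchCV` twice, once per metric. As a possible shortcut, a single search can score both metrics at once. The sketch below is not the notebook's code: `report_grid_search` is a hypothetical helper, the data are synthetic, and `n_estimators=10` is chosen only to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


def report_grid_search(clf, param_grid, X, y, cv=5, metrics=("f1_macro", "accuracy")):
    """Run one grid search scored on several metrics; print mean/std per setting."""
    # refit=False: only the cross-validation statistics are needed here,
    # not a final model refitted on the whole training set.
    search = GridSearchCV(clf, param_grid, cv=cv, scoring=list(metrics), refit=False)
    search.fit(X, y)
    results = search.cv_results_
    for metric in metrics:
        print(metric)
        for mean_score, std_score, params in zip(
            results[f"mean_test_{metric}"],
            results[f"std_test_{metric}"],
            results["params"],
        ):
            print(mean_score, std_score, params)
        print("\n")
    return results


# Synthetic stand-in for one of the (X_train, y_train) pairs used in the notebook.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
demo_results = report_grid_search(
    GradientBoostingClassifier(n_estimators=10, random_state=0),
    {"learning_rate": [0.1, 0.5]},
    X_demo,
    y_demo,
    cv=3,
)
```

With this shape, the two metrics are computed on identical folds and identical fitted models, which makes the F1 and accuracy rows directly comparable.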

#### Full features

In [50]:
## DJI
clf = GradientBoostingClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_full, dji_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.49505578694717756 0.03625403014782734 {'learning_rate': 0.1, 'loss': 'deviance'}
0.48680835213665546 0.02526723077949783 {'learning_rate': 0.1, 'loss': 'exponential'}
0.4944434764305966 0.03885566314045275 {'learning_rate': 0.5, 'loss': 'deviance'}
0.4823799790835753 0.04164366726360134 {'learning_rate': 0.5, 'loss': 'exponential'}


Accuracy DJI
0.5112569169960474 0.04088405336429025 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5032885375494072 0.029654165058722877 {'learning_rate': 0.1, 'loss': 'exponential'}
0.5069117259552043 0.045480984920914964 {'learning_rate': 0.5, 'loss': 'deviance'}
0.48807641633728593 0.04366620058388847 {'learning_rate': 0.5, 'loss': 'exponential'}


In [51]:
## NASDAQ
clf = GradientBoostingClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_full, nasdaq_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4793343993108488 0.01600000495801843 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4988765736644285 0.021571838833003953 {'learning_rate': 0.1, 'loss': 'exponential'}
0.515755973908832 0.02486537969575692 {'learning_rate': 0.5, 'loss': 'deviance'}
0.5023791520932057 0.024859412393335965 {'learning_rate': 0.5, 'loss': 'exponential'}


Accuracy NASDAQ
0.5105296442687747 0.023954013328815787 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5271725955204216 0.021680421543581452 {'learning_rate': 0.1, 'loss': 'exponential'}
0.5279472990777339 0.027747885631482807 {'learning_rate': 0.5, 'loss': 'deviance'}
0.5250118577075099 0.027119349507837325 {'learning_rate': 0.5, 'loss': 'exponential'}


In [71]:
## NYSE
clf = GradientBoostingClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_full, nyse_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4851903430099333 0.02974151543370733 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4846979648749768 0.029035269954887394 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy NYSE
0.4909644268774704 0.023851521887487553 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49168906455862976 0.033304508672132305 {'learning_rate': 0.5, 'loss': 'deviance'}


In [72]:
## RUSSELL
clf = GradientBoostingClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_full, russell_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.49603918366733807 0.018832941030789793 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49319364376712327 0.017298397073202746 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy RUSSELL
0.511973649538867 0.01275359433352935 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49528853754940716 0.01602392449271341 {'learning_rate': 0.5, 'loss': 'deviance'}


In [73]:
## S&P 500
clf = GradientBoostingClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_full, sp_y_train_full)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.49265686384880814 0.02571762830009158 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5121984247970095 0.027841197679650556 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy S&P 500
0.5061528326745718 0.029295284410635548 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5148774703557313 0.021646709599565567 {'learning_rate': 0.5, 'loss': 'deviance'}


#### PCA features

In [74]:
## DJI
clf = GradientBoostingClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_pca, dji_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.49798382519896733 0.02697577010009606 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4806440120316061 0.012903266667187708 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy DJI
0.5025612648221344 0.03168618419479978 {'learning_rate': 0.1, 'loss': 'deviance'}
0.488034255599473 0.013044003662512784 {'learning_rate': 0.5, 'loss': 'deviance'}


In [75]:
## NASDAQ
clf = GradientBoostingClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4934361797337111 0.014011243127508836 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5207189548112141 0.04121635861557056 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy NASDAQ
0.5214018445322793 0.01671776052809872 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5308379446640317 0.038165492135585184 {'learning_rate': 0.5, 'loss': 'deviance'}


In [76]:
## NYSE
clf = GradientBoostingClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_pca, nyse_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.5060074893313765 0.03742399697323566 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4977559785309721 0.027564909737655578 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy NYSE
0.527201581027668 0.024155026051069305 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49961528326745724 0.025035158905718058 {'learning_rate': 0.5, 'loss': 'deviance'}


In [77]:
## RUSSELL
clf = GradientBoostingClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_pca, russell_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.48387129112349514 0.020888157997414555 {'learning_rate': 0.1, 'loss': 'deviance'}
0.46954610075770714 0.04285295388596876 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy RUSSELL
0.4938445322793149 0.01586236128699128 {'learning_rate': 0.1, 'loss': 'deviance'}
0.48368115942028983 0.040992988678223326 {'learning_rate': 0.5, 'loss': 'deviance'}


In [78]:
## S&P 500
clf = GradientBoostingClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_pca, sp_y_train_pca)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.4938572324483713 0.02877957733905235 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4884405768597582 0.021554989849595585 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy S&P 500
0.5243083003952569 0.022177282922144687 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4996337285902504 0.021691693202783646 {'learning_rate': 0.5, 'loss': 'deviance'}


#### Technical indicator features

In [79]:
## DJI
clf = GradientBoostingClassifier()
print("Macro Average F1 DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy DJI")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(dji_X_train_ti, dji_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 DJI
0.48845694685895696 0.02200289061676878 {'learning_rate': 0.1, 'loss': 'deviance'}
0.48987854959222304 0.04176920262353819 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy DJI
0.5105296442687747 0.029548293061733063 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49379710144927536 0.031734616845366215 {'learning_rate': 0.5, 'loss': 'deviance'}


In [80]:
## NASDAQ
clf = GradientBoostingClassifier()
print("Macro Average F1 NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NASDAQ")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NASDAQ
0.4833833297157743 0.018729137558753932 {'learning_rate': 0.1, 'loss': 'deviance'}
0.48566111741617035 0.024008881892979 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy NASDAQ
0.5098050065876152 0.022402869167209103 {'learning_rate': 0.1, 'loss': 'deviance'}
0.4974650856389987 0.020864303777751787 {'learning_rate': 0.5, 'loss': 'deviance'}


In [81]:
## NYSE
clf = GradientBoostingClassifier()
print("Macro Average F1 NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy NYSE")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(nyse_X_train_ti, nyse_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 NYSE
0.4743064516948866 0.029167646812750525 {'learning_rate': 0.1, 'loss': 'deviance'}
0.49886983144631436 0.024802192760899767 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy NYSE
0.48148089591567855 0.027658714301301224 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5075915678524374 0.020082205583245312 {'learning_rate': 0.5, 'loss': 'deviance'}


In [82]:
## RUSSELL
clf = GradientBoostingClassifier()
print("Macro Average F1 RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy RUSSELL")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(russell_X_train_ti, russell_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 RUSSELL
0.49432753468821045 0.016773479923367956 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5107132265880319 0.024598967591995348 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy RUSSELL
0.49819235836627146 0.019912500387049017 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5097944664031621 0.021670127401974205 {'learning_rate': 0.5, 'loss': 'deviance'}


In [83]:
## S&P 500
clf = GradientBoostingClassifier()
print("Macro Average F1 S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="f1_macro")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)
    

print("\n")
print("Accuracy S&P 500")
grid_search = GridSearchCV(clf, param_grid_clf, cv=5, scoring="accuracy")
grid_search.fit(sp_X_train_ti, sp_y_train_ti)
cv_results = grid_search.cv_results_
# Print statistics from grid search cv
for mean_score, std_score, params in zip(cv_results["mean_test_score"], cv_results["std_test_score"], cv_results["params"]):
    print(mean_score, std_score, params)

Macro Average F1 S&P 500
0.49982383465247315 0.014615147492751602 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5067352941607655 0.014035627734349927 {'learning_rate': 0.5, 'loss': 'deviance'}


Accuracy S&P 500
0.5177575757575757 0.016616412333086552 {'learning_rate': 0.1, 'loss': 'deviance'}
0.5097918313570488 0.011664246443440374 {'learning_rate': 0.5, 'loss': 'deviance'}


### Stacking Ensembles

In this section, we implement stacking ensembles. The level 0 (base) classifiers are logistic regression, k-nearest neighbours, decision tree, SVM, and Gaussian naive Bayes; the level 1 (meta) classifier is logistic regression.
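
`stacking_classifier()` is defined in `Utilities.ipynb` and is not shown here. A minimal sketch of what such a helper could look like, assuming scikit-learn's `StackingClassifier` with mostly default hyperparameters (the name `stacking_classifier_sketch`, the estimator labels, `max_iter=1000`, `cv=5`, and the synthetic data are illustrative choices, not taken from the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def stacking_classifier_sketch():
    """Build a two-level stack: five base learners feeding a logistic regression."""
    # Level 0: the five base classifiers named in the text above.
    level0 = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC()),
        ("nb", GaussianNB()),
    ]
    # Level 1: the meta-classifier is trained on out-of-fold predictions
    # of the base learners (cv=5), which limits leakage from level 0.
    return StackingClassifier(
        estimators=level0,
        final_estimator=LogisticRegression(),
        cv=5,
    )


# Synthetic stand-in for a training set such as (dji_X_train_full, dji_y_train_full).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
stack = stacking_classifier_sketch().fit(X_demo, y_demo)
demo_predictions = stack.predict(X_demo)
```

The actual helper may use different hyperparameters or a different number of internal folds; only the choice of level 0 and level 1 estimators is taken from the description above.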

#### Full features

In [55]:
## DJI

# Stacking classifier
voting_clf = stacking_classifier()

voting_clf.fit(dji_X_train_full, dji_y_train_full)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_full, dji_y_train_full, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.41304348 0.35058824 0.46195652 0.46228555 0.41018994]
Mean Scores: 0.41961274523468733
Standard deviation: 0.041256007957169175


Accuracy DJI
Scores: [0.4673913  0.53623188 0.54710145 0.51449275 0.50909091]
Mean Scores: 0.5148616600790513
Standard deviation: 0.0275062277039988


In [56]:
## NASDAQ

voting_clf = stacking_classifier()

voting_clf.fit(nasdaq_X_train_full, nasdaq_y_train_full)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_full, nasdaq_y_train_full, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.38715694 0.38898807 0.3623966  0.37735145 0.38730229]
Mean Scores: 0.38063907157156524
Standard deviation: 0.010001830546082734


Accuracy NASDAQ
Scores: [0.50362319 0.55072464 0.55434783 0.54710145 0.55272727]
Mean Scores: 0.5417048748353096
Standard deviation: 0.019193912839431112
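
Macro F1 here sits well below accuracy. One possible explanation, not verified against this notebook's predictions, is that the ensemble mostly predicts the majority class: accuracy then tracks the class prior, while macro F1 is pulled down by the neglected class. The toy labels below (a made-up 55/45 split) illustrate the effect:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 55 samples of class 1 and 45 of class 0.
y_true = np.array([1] * 55 + [0] * 45)
# A degenerate classifier that always predicts the majority class.
y_pred = np.ones_like(y_true)

toy_accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted class as F1 = 0 without a warning.
toy_macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(toy_accuracy, toy_macro_f1)
```

Accuracy comes out at 0.55 while macro F1 is roughly 0.35. A confusion matrix on held-out predictions would confirm or rule out this reading for the stacking models.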


In [57]:
## NYSE

voting_clf = stacking_classifier()

voting_clf.fit(nyse_X_train_full, nyse_y_train_full)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_full, nyse_y_train_full, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.51361678 0.49749135 0.46908062 0.49315985 0.42475698]
Mean Scores: 0.47962111585557776
Standard deviation: 0.03091770528312782


Accuracy NYSE
Scores: [0.53623188 0.5        0.52173913 0.52173913 0.50545455]
Mean Scores: 0.5170329380764164
Standard deviation: 0.012938903933886928


In [58]:
## RUSSELL

voting_clf = stacking_classifier()

voting_clf.fit(russell_X_train_full, russell_y_train_full)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_full, russell_y_train_full, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.42671131 0.41815476 0.41462488 0.50271739 0.36029009]
Mean Scores: 0.4244996874146362
Standard deviation: 0.045570916950316294


Accuracy RUSSELL
Scores: [0.55797101 0.50724638 0.51449275 0.53623188 0.51272727]
Mean Scores: 0.525733860342556
Standard deviation: 0.018903774536376903


In [59]:
## S&P 500

voting_clf = stacking_classifier()

voting_clf.fit(sp_X_train_full, sp_y_train_full)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_full, sp_y_train_full, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.41540286 0.45698293 0.3834843  0.42802632 0.43040597]
Mean Scores: 0.4228604760662515
Standard deviation: 0.023891678299806943


Accuracy S&P 500
Scores: [0.53623188 0.51449275 0.52898551 0.55072464 0.56363636]
Mean Scores: 0.5388142292490118
Standard deviation: 0.017045836113682555


#### PCA features

In [60]:
## DJI

voting_clf = stacking_classifier()

voting_clf.fit(dji_X_train_pca, dji_y_train_pca)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_pca, dji_y_train_pca, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.3724744  0.39902799 0.43673469 0.49117505 0.49065639]
Mean Scores: 0.4380137047250708
Standard deviation: 0.04777928905447698


Accuracy DJI
Scores: [0.50724638 0.54710145 0.53985507 0.5615942  0.54545455]
Mean Scores: 0.5402503293807641
Standard deviation: 0.017994442245951862


In [61]:
## NASDAQ

voting_clf = stacking_classifier()

voting_clf.fit(nasdaq_X_train_pca, nasdaq_y_train_pca)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_pca, nasdaq_y_train_pca, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.35664336 0.43905132 0.42545006 0.41412935 0.50089209]
Mean Scores: 0.42723323695402726
Standard deviation: 0.04631007704302886


Accuracy NASDAQ
Scores: [0.55434783 0.55797101 0.55072464 0.54347826 0.54909091]
Mean Scores: 0.5511225296442687
Standard deviation: 0.004899488280426646


In [62]:
## NYSE

voting_clf = stacking_classifier()

voting_clf.fit(nyse_X_train_pca, nyse_y_train_pca)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_pca, nyse_y_train_pca, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.44159204 0.40466946 0.36293535 0.50212224 0.46360239]
Mean Scores: 0.43498429689312185
Standard deviation: 0.047917012578907604


Accuracy NYSE
Scores: [0.52898551 0.55072464 0.5326087  0.50362319 0.51272727]
Mean Scores: 0.5257338603425559
Standard deviation: 0.016373323761522396


In [63]:
## RUSSELL

voting_clf = stacking_classifier()

voting_clf.fit(russell_X_train_pca, russell_y_train_pca)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_pca, russell_y_train_pca, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.48888889 0.45023453 0.51911364 0.44861265 0.48486641]
Mean Scores: 0.478343224371306
Standard deviation: 0.026419671835610087


Accuracy RUSSELL
Scores: [0.53985507 0.53985507 0.55797101 0.48913043 0.55636364]
Mean Scores: 0.536635046113307
Standard deviation: 0.02498746108363223


In [64]:
## S&P 500

voting_clf = stacking_classifier()

voting_clf.fit(sp_X_train_pca, sp_y_train_pca)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_pca, sp_y_train_pca, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.4595404  0.45526316 0.40267949 0.35922759 0.38514372]
Mean Scores: 0.412370873393893
Standard deviation: 0.0393043542333991


Accuracy S&P 500
Scores: [0.51811594 0.51449275 0.53985507 0.5326087  0.53454545]
Mean Scores: 0.5279235836627141
Standard deviation: 0.009846186097341615


#### Technical indicator features

In [65]:
## DJI

# Despite its name, voting_clf holds the stacking ensemble returned by stacking_classifier().
voting_clf = stacking_classifier()

voting_clf.fit(dji_X_train_ti, dji_y_train_ti)
print("Macro Average F1 DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy DJI")
analyse_cv(voting_clf, dji_X_train_ti, dji_y_train_ti, 5, "accuracy")

Macro Average F1 DJI
Scores: [0.42687382 0.41412935 0.45690673 0.48223954 0.39706104]
Mean Scores: 0.435442095723654
Standard deviation: 0.03050077168725467


Accuracy DJI
Scores: [0.52536232 0.5326087  0.54347826 0.53623188 0.53818182]
Mean Scores: 0.5351725955204216
Standard deviation: 0.006032855065274832


In [66]:
## NASDAQ

voting_clf = stacking_classifier()

voting_clf.fit(nasdaq_X_train_ti, nasdaq_y_train_ti)
print("Macro Average F1 NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NASDAQ")
analyse_cv(voting_clf, nasdaq_X_train_ti, nasdaq_y_train_ti, 5, "accuracy")

Macro Average F1 NASDAQ
Scores: [0.40483597 0.36081519 0.38621947 0.43202885 0.39083412]
Mean Scores: 0.39494672051428825
Standard deviation: 0.023370583656313258


Accuracy NASDAQ
Scores: [0.55797101 0.53623188 0.54710145 0.57246377 0.56727273]
Mean Scores: 0.5562081686429512
Standard deviation: 0.013204248413661043


In [67]:
## NYSE

voting_clf = stacking_classifier()

voting_clf.fit(nyse_X_train_ti, nyse_y_train_ti)
print("Macro Average F1 NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy NYSE")
analyse_cv(voting_clf, nyse_X_train_ti, nyse_y_train_ti, 5, "accuracy")

Macro Average F1 NYSE
Scores: [0.48375029 0.45197423 0.45104039 0.47046093 0.34834123]
Mean Scores: 0.4411134156110251
Standard deviation: 0.047963235955164094


Accuracy NYSE
Scores: [0.48913043 0.53623188 0.52173913 0.54347826 0.53454545]
Mean Scores: 0.5250250329380765
Standard deviation: 0.019266410629387366


In [68]:
## RUSSELL

voting_clf = stacking_classifier()

voting_clf.fit(russell_X_train_ti, russell_y_train_ti)
print("Macro Average F1 RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy RUSSELL")
analyse_cv(voting_clf, russell_X_train_ti, russell_y_train_ti, 5, "accuracy")

Macro Average F1 RUSSELL
Scores: [0.42444391 0.52127334 0.47031114 0.41457998 0.47989409]
Mean Scores: 0.4621004906313635
Standard deviation: 0.03888873835685103


Accuracy RUSSELL
Scores: [0.51811594 0.55072464 0.53985507 0.4673913  0.50545455]
Mean Scores: 0.5163083003952569
Standard deviation: 0.029164632052939697


In [69]:
## S&P 500

voting_clf = stacking_classifier()

voting_clf.fit(sp_X_train_ti, sp_y_train_ti)
print("Macro Average F1 S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "f1_macro")

print("\n")
print("Accuracy S&P 500")
analyse_cv(voting_clf, sp_X_train_ti, sp_y_train_ti, 5, "accuracy")

Macro Average F1 S&P 500
Scores: [0.37786574 0.40588794 0.36081519 0.42475642 0.39634146]
Mean Scores: 0.3931333511491791
Standard deviation: 0.022138442147906234


Accuracy S&P 500
Scores: [0.54710145 0.52173913 0.56884058 0.54710145 0.53454545]
Mean Scores: 0.5438656126482214
Standard deviation: 0.01562777539703877
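
To compare the PCA and technical indicator runs side by side, the mean scores printed above can be collected into a small table. The sketch below copies those means by hand (rounded to four decimals). The dictionary names `pca` and `ti` are illustrative and not part of the notebook.

```python
from statistics import fmean

# Mean cross-validated stacking scores copied from the outputs above: (macro-F1, accuracy).
pca = {
    "DJI": (0.4380, 0.5403),
    "NASDAQ": (0.4272, 0.5511),
    "NYSE": (0.4350, 0.5257),
    "RUSSELL": (0.4783, 0.5366),
    "S&P 500": (0.4124, 0.5279),
}
ti = {
    "DJI": (0.4354, 0.5352),
    "NASDAQ": (0.3949, 0.5562),
    "NYSE": (0.4411, 0.5250),
    "RUSSELL": (0.4621, 0.5163),
    "S&P 500": (0.3931, 0.5439),
}

print(f"{'Market':<10}{'F1 PCA':>8}{'F1 TI':>8}{'Acc PCA':>9}{'Acc TI':>8}")
for market in pca:
    f1_p, acc_p = pca[market]
    f1_t, acc_t = ti[market]
    print(f"{market:<10}{f1_p:>8.4f}{f1_t:>8.4f}{acc_p:>9.4f}{acc_t:>8.4f}")

# Average macro-F1 across the five markets for each feature set.
avg_f1_pca = fmean([f1 for f1, _ in pca.values()])
avg_f1_ti = fmean([f1 for f1, _ in ti.values()])
print(f"Average macro-F1: PCA={avg_f1_pca:.4f}, TI={avg_f1_ti:.4f}")
```

The gaps between the two feature sets are small relative to the fold-to-fold standard deviations reported above, so this table is best read as a descriptive summary rather than evidence that one feature set is better.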
