**Questions:**

**Classification**
- Can any of the models predict, with a precision > 60%, the reaction of the market **30 min** after the release, using only features from the news release?
- Can any of the models predict, with a precision > 60%, the reaction of the market **60 min** after the release, using only features from the news release?

- Which model has the highest weighted precision?
- Which model has the highest precision by classification type?
- Which model has the highest f1 value (microavg)?

**Regression**

- Which is the best-performing regression model?
- Do the errors follow a normal distribution?
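
The classification metrics named in the questions above can be reproduced with scikit-learn. This is a minimal sketch on toy labels (invented for illustration, not taken from the sweeps), assuming the usual encoding 0 = EUR down, 1 = same, 2 = EUR up. It shows how per-class precision, weighted precision and micro-averaged F1 relate to each other.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Toy labels, chosen only for illustration: 0 = EUR down, 1 = same, 2 = EUR up.
y_true = [0, 0, 1, 1, 1, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 2, 1]

# Precision per class: of the samples predicted as class k, the share that really are k.
per_class = precision_score(y_true, y_pred, average=None)

# Weighted precision: per-class precision averaged with class support as weights.
weighted = precision_score(y_true, y_pred, average="weighted")

# Micro-averaged F1 pools all decisions; for single-label problems it equals accuracy.
f1_micro = f1_score(y_true, y_pred, average="micro")
accuracy = accuracy_score(y_true, y_pred)

print(per_class, weighted, f1_micro, accuracy)
```

Because micro-averaged F1 coincides with accuracy here, the per-class precision columns are the more informative ones when only one direction (up or down) matters for a trade.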

In [18]:
import pandas as pd
import numpy as np
import glob, os

pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 1000)


In [19]:
list_path = ['./models_out/']

# Collect every sweeps_*.csv under each path and stack them into one dataframe
all_files = [glob.glob(os.path.join(path, 'sweeps_*.csv')) for path in list_path]

df_from_each_file = (pd.read_csv(file) for list_files in all_files for file in list_files)
df = pd.concat(df_from_each_file, ignore_index=True)


In [20]:
df.head(3)

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
0,0,clf,kn,basic,ALL_NO_1_1_1,0_30,included,basic,0.55,{'n_neighbors': 51},0.55,0.5,0.41,0.57,0.44,303.0,686.0,303.0,"{'0': {'precision': 0.4112903225806452, 'recal...",1.81
1,1,clf,svc-rbf,basic,ALL_NO_1_1_1,0_30,included,basic,0.55,"{'C': 1, 'gamma': 1}",0.53,0.28,0.0,0.53,0.0,303.0,686.0,303.0,"{'0': {'precision': 0.0, 'recall': 0.0, 'f1-sc...",6.84
2,2,clf,dtree,basic,ALL_NO_1_1_1,0_30,included,basic,0.55,"{'max_depth': 5, 'min_samples_leaf': 200}",0.53,0.46,0.33,0.58,0.32,303.0,686.0,303.0,"{'0': {'precision': 0.32926829268292684, 'reca...",0.34


We split the dataframe into classification and regression dataframes

In [21]:
df_clf = df[df['model_type'] == 'clf']
df_reg = df[df['model_type'] == 'reg']

**Which is the best precision achieved 30 min after publication, using only features obtained from the news release?**

In [22]:
df_clf[df_clf['sweep_buy_sell'] == '0_30'].sort_values(by='precision_EUR_down', ascending=False)[0:3]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
112,0,clf,kn,basic,High_NO_1_1_1,0_30,included,basic,0.48,{'n_neighbors': 131},0.46,0.43,0.44,0.5,0.3,157.0,220.0,127.0,"{'0': {'precision': 0.4375, 'recall': 0.178343...",0.64
126,0,clf,kn,all,High_NO_1_1_1,0_30,included,basic,0.48,{'n_neighbors': 131},0.46,0.43,0.44,0.5,0.3,157.0,220.0,127.0,"{'0': {'precision': 0.4375, 'recall': 0.178343...",0.65
116,4,clf,xgb,basic,High_NO_1_1_1,0_30,included,basic,0.49,{'n_estimators': 10},0.47,0.44,0.43,0.51,0.34,157.0,220.0,127.0,"{'0': {'precision': 0.43037974683544306, 'reca...",0.26


OK, not good: only 44% precision when predicting that the EUR will depreciate against the USD. Let's look at the opposite direction.

In [26]:
df_clf[df_clf['sweep_buy_sell'] == '0_30'].sort_values(by='precision_EUR_up', ascending=False)[0:3]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
84,0,clf,kn,basic,ALL_YES_3_2_1,0_30,included,basic,0.58,{'n_neighbors': 191},0.61,0.61,0.37,0.61,0.86,194.0,550.0,173.0,"{'0': {'precision': 0.36666666666666664, 'reca...",2.9
98,0,clf,kn,all,ALL_YES_3_2_1,0_30,included,basic,0.58,{'n_neighbors': 191},0.61,0.61,0.37,0.61,0.86,194.0,550.0,173.0,"{'0': {'precision': 0.36666666666666664, 'reca...",2.89
73,3,clf,rforest,all,ALL_YES_1_1_1,0_30,included,basic,0.58,"{'max_depth': 7, 'min_samples_leaf': 10, 'n_es...",0.61,0.54,0.29,0.62,0.58,194.0,550.0,173.0,"{'0': {'precision': 0.29411764705882354, 'reca...",16.06


In this case it looks quite good: 86%!

Let's get the same metrics after 60 min.
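
A high precision can come from a model that predicts a class only a handful of times, so it is worth checking the recall stored in the `report` column. A minimal sketch, assuming that column holds the `classification_report` dict as text and that class `'2'` is "EUR up" (this matches the support figures in the tables); the numbers in the toy row are invented for illustration.

```python
import ast

import pandas as pd

# Hypothetical row: the dict string mimics the shape of the `report` column,
# but the numbers are invented for illustration.
toy = pd.DataFrame({
    "model": ["kn"],
    "report": ["{'2': {'precision': 0.8, 'recall': 0.05, 'f1-score': 0.09, 'support': 100}}"],
})

# The CSV stores each report as text, so turn it back into a dict first.
parsed = toy["report"].apply(ast.literal_eval)

# Class '2' is assumed to be "EUR up", following the column order down/same/up.
toy["recall_EUR_up"] = parsed.apply(lambda r: r["2"]["recall"])
print(toy[["model", "recall_EUR_up"]])
```

Applied to `df_clf`, the same two steps would show how many of the real upward moves the 86% model actually catches.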

In [27]:
df_clf[df_clf['sweep_buy_sell'] == '0_60'].sort_values(by='precision_EUR_down', ascending=False)[0:3]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
299,5,clf,gboosting,all,High_NO_1_1_1,0_60,included,basic,0.48,{'n_estimators': 50},0.46,0.46,0.51,0.49,0.34,169.0,206.0,129.0,"{'0': {'precision': 0.514018691588785, 'recall...",1.06
285,5,clf,gboosting,basic,High_NO_1_1_1,0_60,included,basic,0.48,{'n_estimators': 50},0.46,0.46,0.51,0.49,0.34,169.0,206.0,129.0,"{'0': {'precision': 0.514018691588785, 'recall...",1.04
297,3,clf,rforest,all,High_NO_1_1_1,0_60,included,basic,0.5,"{'max_depth': 7, 'min_samples_leaf': 20, 'n_es...",0.49,0.48,0.5,0.52,0.4,169.0,206.0,129.0,"{'0': {'precision': 0.5042016806722689, 'recal...",12.76


In [28]:
df_clf[df_clf['sweep_buy_sell'] == '0_60'].sort_values(by='precision_EUR_up', ascending=False)[0:3]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
322,0,clf,kn,all,High_YES_1_1_1,0_60,included,basic,0.51,{'n_neighbors': 91},0.47,0.5,0.42,0.47,0.62,116.0,162.0,111.0,"{'0': {'precision': 0.42028985507246375, 'reca...",0.55
308,0,clf,kn,basic,High_YES_1_1_1,0_60,included,basic,0.51,{'n_neighbors': 91},0.47,0.5,0.42,0.47,0.62,116.0,162.0,111.0,"{'0': {'precision': 0.42028985507246375, 'reca...",0.55
327,5,clf,gboosting,all,High_YES_1_1_1,0_60,included,basic,0.5,{'n_estimators': 10},0.46,0.44,0.4,0.47,0.45,116.0,162.0,111.0,"{'0': {'precision': 0.4, 'recall': 0.206896551...",0.82


In this case both indicators are poor: at best 51% for down and 62% for up, the latter barely above the 60% target. That is not enough to base investment decisions on.
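
The 60% threshold from the opening questions can also be checked directly instead of reading it off sorted tables. This is a sketch only: the helper name `above_threshold` is hypothetical, and the small dataframe holds a few rows copied from the tables above rather than the full `df_clf`.

```python
import pandas as pd

# A few rows copied from the tables above (only the columns needed here).
rows = pd.DataFrame({
    "model": ["kn", "kn", "gboosting", "kn"],
    "sweep_news_agg": ["High_NO_1_1_1", "ALL_YES_3_2_1", "High_NO_1_1_1", "High_YES_1_1_1"],
    "sweep_buy_sell": ["0_30", "0_30", "0_60", "0_60"],
    "precision_EUR_down": [0.44, 0.37, 0.51, 0.42],
    "precision_EUR_up": [0.30, 0.86, 0.34, 0.62],
})


def above_threshold(df, window, threshold=0.60):
    """Rows for one window whose up or down precision exceeds the threshold."""
    in_window = df["sweep_buy_sell"] == window
    good = (df["precision_EUR_down"] > threshold) | (df["precision_EUR_up"] > threshold)
    return df[in_window & good]


print(above_threshold(rows, "0_30"))
print(above_threshold(rows, "0_60"))
```

On the real data the call would be `above_threshold(df_clf, '0_30')`, which answers the first two questions in one step per window.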

**Which model has the highest weighted precision?**

In [29]:
df_clf.sort_values(by='precision_weighted', ascending=False)[0:5]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
704,4,clf,xgb,basic,ALL_NO_3_2_1,60_120,included,basic,0.68,{'n_estimators': 50},0.7,0.71,0.75,0.66,0.74,332.0,618.0,342.0,"{'0': {'precision': 0.7549407114624506, 'recal...",1.2
397,5,clf,gboosting,basic,ALL_YES_1_1_1,30_60,included,basic,0.71,{'n_estimators': 50},0.71,0.71,0.72,0.69,0.74,225.0,471.0,221.0,"{'0': {'precision': 0.7165775401069518, 'recal...",2.29
352,2,clf,dtree,all,ALL_NO_1_1_1,30_60,included,basic,0.7,"{'max_depth': 5, 'min_samples_leaf': 100}",0.71,0.71,0.74,0.7,0.69,311.0,658.0,323.0,"{'0': {'precision': 0.7445255474452555, 'recal...",0.63
353,3,clf,rforest,all,ALL_NO_1_1_1,30_60,included,basic,0.7,"{'max_depth': 7, 'min_samples_leaf': 30, 'n_es...",0.7,0.71,0.75,0.69,0.7,311.0,658.0,323.0,"{'0': {'precision': 0.7461538461538462, 'recal...",27.44
719,5,clf,gboosting,all,ALL_NO_3_2_1,60_120,included,basic,0.67,{'n_estimators': 10},0.69,0.71,0.75,0.65,0.76,332.0,618.0,342.0,"{'0': {'precision': 0.7469879518072289, 'recal...",4.0


**Which model has the highest precision by classification type?**

In [47]:
df_clf.sort_values(by='precision_EUR_down', ascending=False)[0:5]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
465,3,clf,rforest,all,High_NO_1_1_1,30_60,included,basic,0.7,"{'max_depth': 7, 'min_samples_leaf': 20, 'n_es...",0.69,0.69,0.81,0.63,0.65,169.0,206.0,129.0,"{'0': {'precision': 0.8092105263157895, 'recal...",11.16
451,3,clf,rforest,basic,High_NO_1_1_1,30_60,included,basic,0.7,"{'max_depth': 6, 'min_samples_leaf': 30, 'n_es...",0.69,0.7,0.81,0.62,0.66,169.0,206.0,129.0,"{'0': {'precision': 0.8120805369127517, 'recal...",10.79
495,5,clf,gboosting,all,High_YES_1_1_1,30_60,included,basic,0.7,{'n_estimators': 10},0.69,0.7,0.81,0.61,0.72,116.0,162.0,111.0,"{'0': {'precision': 0.8080808080808081, 'recal...",0.95
453,5,clf,gboosting,basic,High_NO_1_1_1,30_60,included,basic,0.7,{'n_estimators': 10},0.7,0.7,0.8,0.65,0.67,169.0,206.0,129.0,"{'0': {'precision': 0.7973856209150327, 'recal...",1.05
467,5,clf,gboosting,all,High_NO_1_1_1,30_60,included,basic,0.7,{'n_estimators': 10},0.69,0.69,0.77,0.64,0.68,169.0,206.0,129.0,"{'0': {'precision': 0.7716049382716049, 'recal...",1.2


In [32]:
df_clf.sort_values(by='precision_EUR_same', ascending=False)[0:5]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
366,2,clf,dtree,basic,ALL_NO_3_2_1,30_60,included,basic,0.7,"{'max_depth': 2, 'min_samples_leaf': 10}",0.7,0.7,0.73,0.71,0.65,311.0,658.0,323.0,"{'0': {'precision': 0.7312925170068028, 'recal...",0.67
380,2,clf,dtree,all,ALL_NO_3_2_1,30_60,included,basic,0.7,"{'max_depth': 2, 'min_samples_leaf': 10}",0.7,0.7,0.73,0.71,0.65,311.0,658.0,323.0,"{'0': {'precision': 0.7312925170068028, 'recal...",0.8
762,6,clf,ada,basic,ALL_YES_3_2_1,60_120,included,basic,0.7,{'n_estimators': 10},0.68,0.69,0.61,0.71,0.71,205.0,492.0,220.0,"{'0': {'precision': 0.6121495327102804, 'recal...",0.64
338,2,clf,dtree,basic,ALL_NO_1_1_1,30_60,included,basic,0.7,"{'max_depth': 2, 'min_samples_leaf': 10}",0.7,0.7,0.73,0.71,0.65,311.0,658.0,323.0,"{'0': {'precision': 0.7312925170068028, 'recal...",0.48
734,6,clf,ada,basic,ALL_YES_1_1_1,60_120,included,basic,0.7,{'n_estimators': 10},0.68,0.69,0.61,0.71,0.71,205.0,492.0,220.0,"{'0': {'precision': 0.6121495327102804, 'recal...",0.78


In [33]:
df_clf.sort_values(by='precision_EUR_up', ascending=False)[0:5]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
98,0,clf,kn,all,ALL_YES_3_2_1,0_30,included,basic,0.58,{'n_neighbors': 191},0.61,0.61,0.37,0.61,0.86,194.0,550.0,173.0,"{'0': {'precision': 0.36666666666666664, 'reca...",2.89
84,0,clf,kn,basic,ALL_YES_3_2_1,0_30,included,basic,0.58,{'n_neighbors': 191},0.61,0.61,0.37,0.61,0.86,194.0,550.0,173.0,"{'0': {'precision': 0.36666666666666664, 'reca...",2.9
702,2,clf,dtree,basic,ALL_NO_3_2_1,60_120,included,basic,0.68,"{'max_depth': 3, 'min_samples_leaf': 10}",0.69,0.71,0.73,0.64,0.81,332.0,618.0,342.0,"{'0': {'precision': 0.7328519855595668, 'recal...",0.72
716,2,clf,dtree,all,ALL_NO_3_2_1,60_120,included,basic,0.68,"{'max_depth': 3, 'min_samples_leaf': 10}",0.69,0.71,0.73,0.64,0.81,332.0,618.0,342.0,"{'0': {'precision': 0.7328519855595668, 'recal...",0.86
840,0,clf,kn,basic,ALL_NO_1_1_1,60_180,included,basic,0.6,{'n_neighbors': 51},0.61,0.64,0.59,0.58,0.78,316.0,624.0,352.0,"{'0': {'precision': 0.5891089108910891, 'recal...",6.23


The decision tree model for the 'ALL_NO_3_2_1' grouping is quite good at predicting UP movements in the 60_120 window (81%).
Let's see the highest precision obtained for the same window in the opposite direction.

In [50]:
df_clf[df_clf['sweep_buy_sell'] == '60_120'].sort_values(by='precision_EUR_down', ascending=False)[0:5]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
717,3,clf,rforest,all,ALL_NO_3_2_1,60_120,included,basic,0.67,"{'max_depth': 7, 'min_samples_leaf': 10, 'n_es...",0.69,0.7,0.76,0.65,0.74,332.0,618.0,342.0,"{'0': {'precision': 0.7629310344827587, 'recal...",28.54
719,5,clf,gboosting,all,ALL_NO_3_2_1,60_120,included,basic,0.67,{'n_estimators': 10},0.69,0.71,0.75,0.65,0.76,332.0,618.0,342.0,"{'0': {'precision': 0.7469879518072289, 'recal...",4.0
704,4,clf,xgb,basic,ALL_NO_3_2_1,60_120,included,basic,0.68,{'n_estimators': 50},0.7,0.71,0.75,0.66,0.74,332.0,618.0,342.0,"{'0': {'precision': 0.7549407114624506, 'recal...",1.2
703,3,clf,rforest,basic,ALL_NO_3_2_1,60_120,included,basic,0.67,"{'max_depth': 6, 'min_samples_leaf': 50, 'n_es...",0.68,0.7,0.75,0.65,0.73,332.0,618.0,342.0,"{'0': {'precision': 0.7543103448275862, 'recal...",27.79
690,4,clf,xgb,all,ALL_NO_1_1_1,60_120,included,basic,0.68,{'n_estimators': 50},0.7,0.7,0.75,0.67,0.72,332.0,618.0,342.0,"{'0': {'precision': 0.7509727626459144, 'recal...",1.4


76%... Quite good as well.
Would we obtain even better results by ensembling both models? Let's see.

In [63]:
list(df_clf[df_clf['sweep_buy_sell'] == '60_120'].sort_values(by='precision_EUR_down', ascending=False)[0:1]['best_params'])

["{'max_depth': 7, 'min_samples_leaf': 10, 'n_estimators': 200}"]

In [52]:
df_agg = pd.read_csv('./models_out/ALL_YES_3_2_1-basic-included-60_120-basic.csv')

In [53]:
df_agg.columns

Index(['Unnamed: 0', 'datetime', 'new_id', 'forecast_error_diff_deviation', 'forecast_error_diff_outlier_class', 'previous_error_diff_deviation', 'previous_error_diff_outlier_class', 'fe_accurate', 'fe_better', 'fe_worse', 'pe_accurate', 'pe_better', 'pe_worse', 'High', 'Low', 'Medium', 'year', 'week', 'weekday', 'num_news', 'volatility_0_5_after', 'pips_agg_0_5_after', 'volatility_5_10_after', 'pips_agg_5_10_after', 'volatility_10_15_after', 'pips_agg_10_15_after', 'volatility_15_20_after', 'pips_agg_15_20_after', 'volatility_20_25_after', 'pips_agg_20_25_after', 'volatility_25_30_after', 'pips_agg_25_30_after', 'volatility_0_30_after', 'pips_agg_0_30_after', 'volatility_0_60_after', 'pips_agg_0_60_after', 'volatility_60_0_before', 'pips_agg_60_0_before', 'pips_agg_0_120_after', 'direction_agg_0_120_after'], dtype='object')

In [90]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report


columns_of_model = ['new_id', 'forecast_error_diff_deviation', 'forecast_error_diff_outlier_class', 
                    'previous_error_diff_deviation', 'previous_error_diff_outlier_class', 'fe_accurate', 
                    'fe_better', 'fe_worse', 'pe_accurate', 'pe_better', 'pe_worse', 'High', 'Low', 'Medium',
                    'year', 'week', 'weekday', 'num_news', 'volatility_0_5_after', 'pips_agg_0_5_after', 
                    'volatility_5_10_after', 'pips_agg_5_10_after', 'volatility_10_15_after', 'pips_agg_10_15_after', 
                    'volatility_15_20_after', 'pips_agg_15_20_after', 'volatility_20_25_after', 'pips_agg_20_25_after',
                    'volatility_25_30_after', 'pips_agg_25_30_after', 'volatility_0_30_after', 'pips_agg_0_30_after', 
                    'volatility_0_60_after', 'pips_agg_0_60_after', 'volatility_60_0_before', 'pips_agg_60_0_before']

X = df_agg[columns_of_model].values
y = df_agg['direction_agg_0_120_after'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [91]:
clf_dt = DecisionTreeClassifier(min_samples_leaf=10, max_depth=3)

In [92]:
clf_rf = RandomForestClassifier(max_depth=7, min_samples_leaf=10, n_estimators=200)

In [93]:
clf_vot = VotingClassifier(estimators=[('dt', clf_dt),('rf', clf_rf)])

In [94]:
clf_vot.fit(X_train, y_train)

VotingClassifier(estimators=[('dt', DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=10, min_samples_split=2,
            min_weight_fraction_leaf=0...obs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False))],
         flatten_transform=None, n_jobs=None, voting='hard', weights=None)
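
The output shows `voting='hard'`. With only two estimators, every disagreement is a tie, and scikit-learn resolves ties by picking the lowest class label. One alternative is soft voting, which averages the predicted probabilities instead. The sketch below illustrates it on synthetic data from `make_classification` (an assumption, standing in for the real features); the fixed `random_state` values are also additions, so that the run is repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the real features (an assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_dt = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=42)
clf_rf = RandomForestClassifier(max_depth=7, min_samples_leaf=10, n_estimators=200,
                                random_state=42)

# voting='soft' averages predict_proba outputs instead of counting votes.
clf_soft = VotingClassifier(estimators=[("dt", clf_dt), ("rf", clf_rf)], voting="soft")
clf_soft.fit(X_train, y_train)

proba = clf_soft.predict_proba(X_test)
y_predict = clf_soft.predict(X_test)
```

Whether soft voting helps on the news dataset is an open question; the sketch only shows the mechanics.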

In [95]:
y_predict = clf_vot.predict(X_test)
classification_report(y_test, y_predict, output_dict=True)

{'0': {'precision': 0.6386138613861386,
  'recall': 0.6292682926829268,
  'f1-score': 0.6339066339066339,
  'support': 205},
 '1': {'precision': 0.6951219512195121,
  'recall': 0.8109756097560976,
  'f1-score': 0.74859287054409,
  'support': 492},
 '2': {'precision': 0.7446808510638298,
  'recall': 0.4772727272727273,
  'f1-score': 0.5817174515235458,
  'support': 220},
 'micro avg': {'precision': 0.6902944383860414,
  'recall': 0.6902944383860414,
  'f1-score': 0.6902944383860414,
  'support': 917},
 'macro avg': {'precision': 0.6928055545564935,
  'recall': 0.6391722099039173,
  'f1-score': 0.6547389853247566,
  'support': 917},
 'weighted avg': {'precision': 0.6943790935858244,
  'recall': 0.6902944383860414,
  'f1-score': 0.682918638597309,
  'support': 917}}

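OK, this ensemble is no better than the two individual models: 74% precision for up and 64% for down, against 81% and 76%. Note that it was evaluated on the 'ALL_YES_3_2_1' dataset, whereas the individual figures come from 'ALL_NO_3_2_1', so the comparison is only indicative.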
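
The report above also tells us how often the ensemble actually predicted an upward move. A short worked example, using only the class `'2'` figures from that report and the definitions of precision and recall:

```python
# Figures for class '2' (EUR up) copied from the classification report above.
precision = 0.7446808510638298
recall = 0.4772727272727273
support = 220

# recall = TP / support, so the true positives follow directly.
true_positives = round(recall * support)

# precision = TP / predicted, so this is how often the ensemble said "up".
predicted_up = round(true_positives / precision)

print(true_positives, predicted_up)
```

So the ensemble called "up" 141 times out of 917 test samples and was right 105 times, while missing more than half of the 220 real upward moves.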

## Regression

In [40]:
df_reg.head(2)

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
7,0,reg,kn,basic,ALL_NO_1_1_1,0_30,included,basic,19.98,{'n_neighbors': 181},,,,,,,,,20.36,1.67
8,1,reg,svr-rbf,basic,ALL_NO_1_1_1,0_30,included,basic,20.0,"{'C': 1, 'gamma': 1}",,,,,,,,,20.34,4.62


**Which is the best-performing regression model?**

In [44]:
df_reg.sort_values(by='report', ascending=True)[0:10]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
418,5,reg,gboosting,all,ALL_YES_1_1_1,30_60,included,basic,15.31,{'n_estimators': 50},,,,,,,,,14.34,0.51
446,5,reg,gboosting,all,ALL_YES_3_2_1,30_60,included,basic,15.35,{'n_estimators': 50},,,,,,,,,14.39,0.55
416,3,reg,rforest,all,ALL_YES_1_1_1,30_60,included,basic,15.42,"{'max_depth': 4, 'min_samples_leaf': 10, 'n_es...",,,,,,,,,14.55,38.93
444,3,reg,rforest,all,ALL_YES_3_2_1,30_60,included,basic,15.42,"{'max_depth': 4, 'min_samples_leaf': 10, 'n_es...",,,,,,,,,14.56,39.48
402,3,reg,rforest,basic,ALL_YES_1_1_1,30_60,included,basic,15.41,"{'max_depth': 4, 'min_samples_leaf': 10, 'n_es...",,,,,,,,,14.56,43.27
430,3,reg,rforest,basic,ALL_YES_3_2_1,30_60,included,basic,15.41,"{'max_depth': 4, 'min_samples_leaf': 10, 'n_es...",,,,,,,,,14.56,33.44
417,4,reg,xgb,all,ALL_YES_1_1_1,30_60,included,basic,15.3,{'n_estimators': 50},,,,,,,,,14.6,0.24
404,5,reg,gboosting,basic,ALL_YES_1_1_1,30_60,included,basic,15.37,{'n_estimators': 50},,,,,,,,,14.6,0.58
432,5,reg,gboosting,basic,ALL_YES_3_2_1,30_60,included,basic,15.31,{'n_estimators': 50},,,,,,,,,14.6,0.45
445,4,reg,xgb,all,ALL_YES_3_2_1,30_60,included,basic,15.4,{'n_estimators': 50},,,,,,,,,14.65,0.25


Hmm, all in the 30_60 window.
As the clf model for 60_120 is quite good (81% precision in the up direction), it would be ideal for the corresponding regression model to be reasonably accurate as well.
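
One caveat on sorting by `report`: the column holds dict strings for classifiers and a single number for regressors, so pandas may well have read it as text. If so, the sort is alphabetical rather than numeric. The toy values below are invented to show the difference; converting with `pd.to_numeric` first is one way to guard against it.

```python
import pandas as pd

# Toy values, invented for illustration and stored as text on purpose.
toy = pd.DataFrame({"model": ["a", "b", "c"], "report": ["9.5", "14.34", "20.66"]})

# Sorting text compares character by character, so "9.5" lands last.
as_text = toy.sort_values(by="report", ascending=True)["report"].tolist()

# Converting to numbers first restores the intended order (lowest value first).
toy["report_num"] = pd.to_numeric(toy["report"], errors="coerce")
as_number = toy.sort_values(by="report_num", ascending=True)["report_num"].tolist()

print(as_text)
print(as_number)
```

For the tables shown here the two orders happen to agree, since the displayed values all have two digits before the decimal point.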

In [45]:
df_reg[df_reg['sweep_buy_sell'] == '60_120'].sort_values(by='report', ascending=True)[0:10]

Unnamed: 0.1,Unnamed: 0,model_type,model,sweeps_market_variables,sweep_news_agg,sweep_buy_sell,before_data,sweep_grid,best_score,best_params,f1_microavg,precision_weighted,precision_EUR_down,precision_EUR_same,precision_EUR_up,support_EUR_down,support_EUR_same,support_EUR_up,report,elapsed_time
768,5,reg,gboosting,basic,ALL_YES_3_2_1,60_120,included,basic,20.83,{'n_estimators': 50},,,,,,,,,20.66,0.48
740,5,reg,gboosting,basic,ALL_YES_1_1_1,60_120,included,basic,20.91,{'n_estimators': 50},,,,,,,,,20.66,0.5
767,4,reg,xgb,basic,ALL_YES_3_2_1,60_120,included,basic,21.01,{'n_estimators': 50},,,,,,,,,20.7,0.23
739,4,reg,xgb,basic,ALL_YES_1_1_1,60_120,included,basic,21.04,{'n_estimators': 50},,,,,,,,,20.7,0.24
783,6,reg,ada,all,ALL_YES_3_2_1,60_120,included,basic,21.91,{'n_estimators': 10},,,,,,,,,20.75,0.78
769,6,reg,ada,basic,ALL_YES_3_2_1,60_120,included,basic,21.96,{'n_estimators': 10},,,,,,,,,20.76,0.79
781,4,reg,xgb,all,ALL_YES_3_2_1,60_120,included,basic,21.04,{'n_estimators': 50},,,,,,,,,20.77,0.26
753,4,reg,xgb,all,ALL_YES_1_1_1,60_120,included,basic,21.09,{'n_estimators': 50},,,,,,,,,20.87,0.26
754,5,reg,gboosting,all,ALL_YES_1_1_1,60_120,included,basic,20.86,{'n_estimators': 50},,,,,,,,,20.88,0.55
752,3,reg,rforest,all,ALL_YES_1_1_1,60_120,included,basic,21.32,"{'max_depth': 4, 'min_samples_leaf': 10, 'n_es...",,,,,,,,,20.9,40.92
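
The remaining regression question, whether the errors follow a normal distribution, could be approached with a Shapiro-Wilk test on the residuals. This is a sketch under stated assumptions: the residuals below are synthetic (drawn from a normal distribution on purpose), and the 0.05 cut-off is a common convention rather than something taken from the sweeps.

```python
import numpy as np
from scipy import stats

# Synthetic residuals (y_true - y_pred), drawn from a normal distribution purely
# for illustration; real residuals would come from a fitted regressor.
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=20.0, size=500)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
statistic, p_value = stats.shapiro(residuals)

# A small p-value (e.g. below 0.05) is evidence against normality.
looks_normal = p_value >= 0.05
print(statistic, p_value, looks_normal)
```

With a real model, `residuals` would be `y_test - reg.predict(X_test)` for the chosen regressor; a histogram or Q-Q plot alongside the test makes the result easier to interpret.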
