## Appendix: Simple models
*Tim Braams (8460701), Vinh Phan (8462380), Maximilian Pintilie (8462780), Rahul Singh (8464147), Kartik Vijay (8463465), Diego Zucchino (8345420)*   
  
A set of simple scikit-learn models, mostly with default parameters, fitted to the pooled data to get a first impression of model performance. A more in-depth analysis can be found in the submission notebook.

In [1]:
import os
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error

import scripts
import scripts.ResultStore as rs
import scripts.FitPredict as fp

### Load data

In [2]:
train = pd.read_json("../02_Pool_Data/pooled_train_300.json")
test = pd.read_json("../02_Pool_Data/pooled_test_300.json")

In [3]:
# Keep only the training rows that contain no zero in any column
train = train.loc[~(train==0).any(axis=1)]

In [4]:
y_train = train["returns"]
X_train = train.drop(["index", "asset", "returns"], axis=1)

y_test = test["returns"]
X_test = test.drop(["index", "asset", "returns"], axis=1)

### Helper functions

In [5]:
results = rs.ResultStore(load_if_exists=False)
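The `scripts` package is not shown in this appendix. The block below is a rough sketch of what the two helpers used in the following cells could look like, assuming `fit_predict` fits the model, predicts on the validation data, and records MSE and MAE in the store. The names `SketchResultStore`, `sketch_fit_predict`, and `MeanModel` are hypothetical stand-ins, not the project's implementation.

```python
from statistics import fmean


class SketchResultStore:
    """Hypothetical stand-in for rs.ResultStore: one metrics dict per model."""

    def __init__(self):
        self.rows = {}

    def add(self, name, metrics):
        self.rows[name] = metrics

    def sorted_by(self, metric):
        # Return (name, metrics) pairs, lowest metric value first.
        return sorted(self.rows.items(), key=lambda item: item[1][metric])


def sketch_fit_predict(model, X_fit, y_fit, X_validate, y_validate, store=None):
    """Hypothetical stand-in for fp.fit_predict with the same call shape."""
    model.fit(X_fit, y_fit)
    predictions = model.predict(X_validate)
    errors = [p - y for p, y in zip(predictions, y_validate)]
    metrics = {
        "mse": fmean(e * e for e in errors),
        "mae": fmean(abs(e) for e in errors),
    }
    if store is not None:
        store.add(type(model).__name__, metrics)
    return model, predictions, metrics


class MeanModel:
    """Toy model with a scikit-learn-like fit/predict interface."""

    def fit(self, X, y):
        self.mean_ = fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# Toy data for illustration only.
store = SketchResultStore()
model, predictions, metrics = sketch_fit_predict(
    MeanModel(),
    X_fit=[[0.0], [1.0], [2.0]], y_fit=[1.0, 2.0, 3.0],
    X_validate=[[3.0], [4.0]], y_validate=[2.0, 4.0],
    store=store,
)
```

Returning the fitted model, the predictions, and the metrics together mirrors the three-value unpacking used in every model cell of this notebook.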

### Baseline

In [6]:
from sklearn.dummy import DummyRegressor

In [7]:
dummy_model, dummy_predictions, dummy_results = fp.fit_predict(DummyRegressor(strategy="mean"), 
                                                            X_fit=X_train, y_fit=y_train,
                                                            X_validate=X_test, y_validate=y_test,
                                                            store=results)
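`DummyRegressor(strategy="mean")` ignores the features and always predicts the mean of the training targets, so its error is the level that every real model should beat. The arithmetic can be sketched in plain Python on toy numbers (not the pooled returns); the variable names are illustrative only.

```python
from statistics import fmean

# Toy targets for illustration only; these are not the project's data.
y_train_toy = [0.02, -0.01, 0.03, 0.00]
y_test_toy = [0.01, -0.02, 0.04]


def mse(y_true, y_pred):
    """Mean squared error of paired true and predicted values."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# The baseline prediction is a single constant: the training mean.
baseline = fmean(y_train_toy)
baseline_mse = mse(y_test_toy, [baseline] * len(y_test_toy))

# On the training data, the mean gives a lower MSE than a shifted constant.
train_mse_at_mean = mse(y_train_toy, [baseline] * len(y_train_toy))
train_mse_shifted = mse(y_train_toy, [baseline + 0.01] * len(y_train_toy))
```

Because the constant is chosen on the training data only, the baseline can still do worse on the test set than on the training set.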

### Linear Regression
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression

In [8]:
from sklearn.linear_model import LinearRegression

In [9]:
linear_model, linear_predictions, linear_results = fp.fit_predict(LinearRegression(), 
                                                               X_fit=X_train, y_fit=y_train,
                                                               X_validate=X_test, y_validate=y_test, 
                                                               store=results)

### Support Vector Machine
https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR

In [10]:
from sklearn.svm import SVR

In [11]:
svr_model, svr_predictions, svr_results = fp.fit_predict(SVR(kernel="rbf", C=1, gamma=0.1, epsilon=0.1),
                                                         X_fit=X_train, y_fit=y_train,
                                                         X_validate=X_test, y_validate=y_test,
                                                         store=results)

### ElasticNet
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet

In [12]:
from sklearn.linear_model import ElasticNet

In [13]:
elastic_model, elastic_predictions, elastic_results = fp.fit_predict(ElasticNet(),
                                                                     X_fit=X_train, y_fit=y_train,
                                                                     X_validate=X_test, y_validate=y_test,
                                                                     store=results)

### PLS Regression
https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression

In [14]:
from sklearn.cross_decomposition import PLSRegression

In [15]:
pls_model, pls_predictions, pls_results = fp.fit_predict(PLSRegression(),
                                                         X_fit=X_train, y_fit=y_train,
                                                         X_validate=X_test, y_validate=y_test,
                                                         store=results)

### Gradient Boosting
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html#sklearn.ensemble.GradientBoostingRegressor

In [16]:
from sklearn.ensemble import GradientBoostingRegressor

In [None]:
gbr_model, gbr_predictions, gbr_results = fp.fit_predict(GradientBoostingRegressor(),
                                                         X_fit=X_train, y_fit=y_train,
                                                         X_validate=X_test, y_validate=y_test,
                                                         store=results)

### XGBoost
https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.sklearn

In [None]:
from xgboost import XGBRegressor

In [None]:
xgb_model, xgb_predictions, xgb_results = fp.fit_predict(XGBRegressor(),
                                                         X_fit=X_train, y_fit=y_train,
                                                         X_validate=X_test, y_validate=y_test,
                                                         store=results)

### MLP
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor

In [None]:
from sklearn.neural_network import MLPRegressor

In [None]:
mlp_model, mlp_predictions, mlp_results = fp.fit_predict(MLPRegressor(),
                                                         X_fit=X_train, y_fit=y_train,
                                                         X_validate=X_test, y_validate=y_test,
                                                         store=results)

### Results
Sorted by MSE, the ensemble methods (gradient boosting and XGBoost) outperform the other models. It makes sense to investigate these methods further and not to invest more time in the other models.

In [None]:
results.get_df().sort_values("mse")
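Sorting by raw MSE ranks the models but does not show how far each one is from the baseline. One way to read such a table, sketched here with placeholder numbers rather than this notebook's results, is the relative MSE reduction against the dummy model. The function name `relative_improvement` and the model labels are hypothetical.

```python
def relative_improvement(model_mse, baseline_mse):
    """Fraction of the baseline MSE removed by the model.

    0 means no gain over the baseline; negative means worse than the baseline.
    """
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return 1.0 - model_mse / baseline_mse


# Placeholder values for illustration; not taken from the results table.
placeholder_mse = {"baseline": 4.0, "model_a": 3.0, "model_b": 5.0}

improvements = {
    name: relative_improvement(value, placeholder_mse["baseline"])
    for name, value in placeholder_mse.items()
}

# Best model first: the largest relative reduction in MSE.
ranking = sorted(improvements, key=improvements.get, reverse=True)
```

A model with a negative value performs worse than always predicting the training mean, which is a useful sanity check before any tuning.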

In [None]:
results.save(path="results", name="simple.json")