<a href="https://colab.research.google.com/github/microprediction/precise/blob/main/examples_colab_notebooks/lazypredict_model_portfolio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install lazypredict
!pip install precise
!pip install --upgrade pandas 

## Using LazyPredict and Precise to construct a portfolio of models 


*   [LazyPredict](https://github.com/shankarpandala/lazypredict) is a package that generates a slew of sklearn models 
*   [Precise](https://github.com/microprediction/precise) is a package that builds portfolios. 

Let's see if a convex combination (long portfolio) of models performs better than just picking the best out of sample model. I use the data example pulled straight from the LazyPredict README, in turn borrowed from sklearn. 

In [1]:
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np
from pprint import pprint
from lazypredict.Supervised import LazyRegressor



Here's what we do: 

1.  Train on X_train, y_train
2.  Select best based on X_test, y_test out of sample performance
3.  Retrain on X_train+X_test
4.  Estimate portfolio using X_test,y_test covariance
5.  Compare the val performance of:
    - The best model from step 2, retrained in step 3.
    - A weighted combination of models from step 4.







In [23]:
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target)
X = X.astype(np.float32)
n_train = 100
n_test = 50
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:(n_train+n_test)], y[n_train:(n_train+n_test)]
X_val, y_val = X[(n_train+n_test):], y[(n_train+n_test):]
X_train_and_test = X[:(n_train+n_test)]
y_train_and_test = y[:(n_train+n_test)]

# Train on some, predict test
reg1 = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None, predictions=True)
models1, predictions1 = reg1.fit(np.copy(X_train), np.copy(X_test), np.copy(y_train), np.copy(y_test))
print(models1[:5])

# Train on some, predict validation
reg2 = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None, predictions=True)
X_train_and_test_copy = np.copy(X_train_and_test)
X_val_copy = np.copy(X_val)
models2, predictions2 = reg2.fit(X_train_and_test_copy, X_val_copy, np.copy(y_train_and_test), np.copy(y_val))
yhat_val = predictions2.values
print(models2[:5])

# In-sample performance on train
reg3 = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None, predictions=True)
models3, predictions3 = reg3.fit(np.copy(X_train), np.copy(X_train), np.copy(y_train), np.copy(y_train))

# In-sample performance on train + test
reg4 = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None, predictions=True)
models4, predictions4 = reg4.fit(np.copy(X_train_and_test), np.copy(X_train_and_test), np.copy(y_train_and_test), np.copy(y_train_and_test))

best_model_1 = models1.index[0]  # <-- Best out of sample on test
best_model_2 = models3.index[0]  # <-- Best in sample on train
best_model_3 = models4.index[0]  # <-- Best in sample on train+test

# Train cov on out of sample prediction errors
print('Creating portfolio ...')
from precise.skaters.managers.ppomanagers import ppo_sk_glcv_pcov_d0_n100_t0_vol_long_manager as mgr
s = {}
yhat_test = np.copy(predictions1.values)
n_test = len(yhat_test)
es = [-1]*(n_test-1)+[1]
for y, y_target,e in zip(yhat_test, y_test,es):
    y_error = np.copy(y-y_target)
    w, s = mgr(s=s, y=y_error, e=e)

w_dict = sorted([(wi,mi) for (wi,mi) in zip(w, models1.index) if wi>0], reverse=True)
pprint(w_dict)

# Refit models using all the train+test data, and combine
sum_w = sum(w)
yhat_weighted = np.dot( yhat_val, w )
predictions2['>> weighted portfolio of models '] = yhat_weighted
predictions2['>> best out of sample model  (' + best_model_1 + ')'] = predictions2[best_model_1]
predictions2['>> best in sample i (' + best_model_2 + ')'] = predictions2[best_model_2]
predictions2['>> best in sample ii (' + best_model_3 + ')'] = predictions2[best_model_3]

val_errors = predictions2.copy()
for col in predictions2.columns:
    val_errors[col] = predictions2[col] - y_val

sq_errors = val_errors**2
print(sq_errors.mean().sort_values())
print('done')


 71%|███████▏  | 30/42 [00:01<00:00, 15.52it/s]

MLPRegressor model failed to execute
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.


100%|██████████| 42/42 [00:02<00:00, 18.34it/s]
  2%|▏         | 1/42 [00:00<00:04,  9.58it/s]

                             Adjusted R-Squared  R-Squared  RMSE  Time Taken
Model                                                                       
OrthogonalMatchingPursuitCV                0.77       0.83  3.53        0.04
RandomForestRegressor                      0.76       0.83  3.56        0.23
PoissonRegressor                           0.75       0.82  3.66        0.02
BaggingRegressor                           0.73       0.80  3.84        0.04
ExtraTreesRegressor                        0.72       0.80  3.86        0.15


 67%|██████▋   | 28/42 [00:01<00:01, 11.76it/s]

MLPRegressor model failed to execute
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.


100%|██████████| 42/42 [00:02<00:00, 16.87it/s]
  5%|▍         | 2/42 [00:00<00:02, 15.16it/s]

                           Adjusted R-Squared  R-Squared  RMSE  Time Taken
Model                                                                     
GradientBoostingRegressor                0.88       0.88  3.24        0.12
RandomForestRegressor                    0.84       0.85  3.66        0.28
ExtraTreesRegressor                      0.83       0.84  3.85        0.17
BaggingRegressor                         0.83       0.83  3.86        0.04
AdaBoostRegressor                        0.83       0.83  3.88        0.10


 71%|███████▏  | 30/42 [00:01<00:00, 15.90it/s]

MLPRegressor model failed to execute
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.


100%|██████████| 42/42 [00:02<00:00, 18.48it/s]
 74%|███████▍  | 31/42 [00:01<00:00, 13.81it/s]

MLPRegressor model failed to execute
Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.


100%|██████████| 42/42 [00:02<00:00, 16.92it/s]


Creating portfolio ...
[(0.07594000000000001, 'DecisionTreeRegressor'),
 (0.059660000000000005, 'GeneralizedLinearRegressor'),
 (0.055560000000000005, 'KNeighborsRegressor'),
 (0.05206000000000001, 'NuSVR'),
 (0.04846000000000001, 'TransformedTargetRegressor'),
 (0.04558000000000001, 'PoissonRegressor'),
 (0.044770000000000004, 'LarsCV'),
 (0.04420000000000001, 'PassiveAggressiveRegressor'),
 (0.043030000000000006, 'LinearSVR'),
 (0.04299000000000001, 'XGBRegressor'),
 (0.04263000000000001, 'SVR'),
 (0.04241000000000001, 'LinearRegression'),
 (0.04241000000000001, 'GradientBoostingRegressor'),
 (0.04241000000000001, 'DummyRegressor'),
 (0.03783000000000001, 'BayesianRidge'),
 (0.029790000000000004, 'RandomForestRegressor'),
 (0.029210000000000003, 'Ridge'),
 (0.026990000000000004, 'TweedieRegressor'),
 (0.025740000000000002, 'KernelRidge'),
 (0.024570000000000005, 'OrthogonalMatchingPursuitCV'),
 (0.022060000000000003, 'Lars'),
 (0.019960000000000002, 'ElasticNet'),
 (0.019730000000000