<a href="https://colab.research.google.com/github/sugatoray/CodeSnippets/blob/master/src/Notebooks/ml_prediction_with_lazypredict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using `lazypredict` Library for Machine Learning

The [`lazypredict`][#gh-lazypredict] library fits many common machine-learning models to a dataset in a single call and returns a table that compares their scores. This notebook gives a quick, hands-on test of it.

[#gh-lazypredict]: https://github.com/shankarpandala/lazypredict
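The underlying idea is to fit a batch of models on the same split, score each one, and sort the results into a leaderboard. The sketch below illustrates that idea in plain Python with two hypothetical toy "models" (a majority-class baseline and a one-feature threshold rule). It is not `lazypredict`'s implementation; the real library works with scikit-learn estimators, which is why the model names in the tables below are scikit-learn classes.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical "model" type: takes training data and returns a predict function.
Model = Callable[[List[List[float]], List[int]], Callable[[List[float]], int]]


def majority_class(X_train, y_train):
    # Always predict the most common training label (a dummy baseline).
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


def first_feature_threshold(X_train, y_train):
    # Predict 1 when feature 0 exceeds its mean over the training set.
    mean = sum(row[0] for row in X_train) / len(X_train)
    return lambda x: int(x[0] > mean)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leaderboard(models: Dict[str, Model], X_train, X_test, y_train, y_test) -> List[Tuple[str, float]]:
    rows = []
    for name, fit in models.items():
        predict = fit(X_train, y_train)
        y_pred = [predict(x) for x in X_test]
        rows.append((name, accuracy(y_test, y_pred)))
    # Best score first, like the tables shown in this notebook.
    return sorted(rows, key=lambda row: row[1], reverse=True)


board = leaderboard(
    {"majority_class": majority_class, "first_feature_threshold": first_feature_threshold},
    X_train=[[1.0], [2.0], [3.0], [4.0], [5.0]],
    X_test=[[0.5], [3.5], [5.0]],
    y_train=[0, 0, 1, 1, 1],
    y_test=[0, 1, 1],
)
for name, score in board:
    print(f"{name:25s} {score:.3f}")
```

The threshold rule scores 1.0 and the constant baseline 2/3, so the rule ranks first. Keeping a trivial baseline in the comparison, as `lazypredict` does with `DummyClassifier`, gives a floor that every useful model should beat.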

## Examples

How to install?

```bash
pip install lazypredict
```

How to import?

```python
import lazypredict
# For LazyClassifier
from lazypredict.Supervised import LazyClassifier
# For LazyRegressor
from lazypredict.Supervised import LazyRegressor
```



### Example - Classification

```python
# import lazypredict
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=123)
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
clf_models, clf_predictions = clf.fit(X_train, X_test, y_train, y_test)
clf_models
```
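With `test_size=.5`, the 569 samples in `load_breast_cancer()` cannot be split evenly. The arithmetic below assumes scikit-learn's rule for a float `test_size` with no `train_size`: the test set gets `ceil(test_size * n_samples)` rows and the training set gets the remainder. The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple:
    # Assumed rule: round the test share up, give the rest to training.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(569, 0.5)
print(n_train, n_test)  # 284 285
```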


<details>
    <summary>
    Sample Output - LazyClassifier
    </summary>

| Model                          |   Accuracy |   Balanced Accuracy |   ROC AUC |   F1 Score |   Time Taken |
|:-------------------------------|-----------:|--------------------:|----------:|-----------:|-------------:|
| LinearSVC                      |   0.989474 |            0.987544 |  0.987544 |   0.989462 |    0.0150008 |
| SGDClassifier                  |   0.989474 |            0.987544 |  0.987544 |   0.989462 |    0.0109992 |
| MLPClassifier                  |   0.985965 |            0.986904 |  0.986904 |   0.985994 |    0.426     |
| Perceptron                     |   0.985965 |            0.984797 |  0.984797 |   0.985965 |    0.0120046 |
| LogisticRegression             |   0.985965 |            0.98269  |  0.98269  |   0.985934 |    0.0200036 |
| LogisticRegressionCV           |   0.985965 |            0.98269  |  0.98269  |   0.985934 |    0.262997  |
| SVC                            |   0.982456 |            0.979942 |  0.979942 |   0.982437 |    0.0140011 |
| CalibratedClassifierCV         |   0.982456 |            0.975728 |  0.975728 |   0.982357 |    0.0350015 |
| PassiveAggressiveClassifier    |   0.975439 |            0.974448 |  0.974448 |   0.975464 |    0.0130005 |
| LabelPropagation               |   0.975439 |            0.974448 |  0.974448 |   0.975464 |    0.0429988 |
| LabelSpreading                 |   0.975439 |            0.974448 |  0.974448 |   0.975464 |    0.0310006 |
| RandomForestClassifier         |   0.97193  |            0.969594 |  0.969594 |   0.97193  |    0.033     |
| GradientBoostingClassifier     |   0.97193  |            0.967486 |  0.967486 |   0.971869 |    0.166998  |
| QuadraticDiscriminantAnalysis  |   0.964912 |            0.966206 |  0.966206 |   0.965052 |    0.0119994 |
| HistGradientBoostingClassifier |   0.968421 |            0.964739 |  0.964739 |   0.968387 |    0.682003  |
| RidgeClassifierCV              |   0.97193  |            0.963272 |  0.963272 |   0.971736 |    0.0130029 |
| RidgeClassifier                |   0.968421 |            0.960525 |  0.960525 |   0.968242 |    0.0119977 |
| AdaBoostClassifier             |   0.961404 |            0.959245 |  0.959245 |   0.961444 |    0.204998  |
| ExtraTreesClassifier           |   0.961404 |            0.957138 |  0.957138 |   0.961362 |    0.0270066 |
| KNeighborsClassifier           |   0.961404 |            0.95503  |  0.95503  |   0.961276 |    0.0560005 |
| BaggingClassifier              |   0.947368 |            0.954577 |  0.954577 |   0.947882 |    0.0559971 |
| BernoulliNB                    |   0.950877 |            0.951003 |  0.951003 |   0.951072 |    0.0169988 |
| LinearDiscriminantAnalysis     |   0.961404 |            0.950816 |  0.950816 |   0.961089 |    0.0199995 |
| GaussianNB                     |   0.954386 |            0.949536 |  0.949536 |   0.954337 |    0.0139935 |
| NuSVC                          |   0.954386 |            0.943215 |  0.943215 |   0.954014 |    0.019989  |
| DecisionTreeClassifier         |   0.936842 |            0.933693 |  0.933693 |   0.936971 |    0.0170023 |
| NearestCentroid                |   0.947368 |            0.933506 |  0.933506 |   0.946801 |    0.0160074 |
| ExtraTreeClassifier            |   0.922807 |            0.912168 |  0.912168 |   0.922462 |    0.0109999 |
| CheckingClassifier             |   0.361404 |            0.5      |  0.5      |   0.191879 |    0.0170043 |
| DummyClassifier                |   0.512281 |            0.489598 |  0.489598 |   0.518924 |    0.0119965 |

</details>
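The `CheckingClassifier` row shows how the columns relate. If the test split holds 285 samples (the size scikit-learn produces for `test_size=.5` on 569 rows), an accuracy of 0.361404 is exactly 103/285. That is what a model scores when it predicts a single class for every sample and 103 test samples belong to that class. The sketch below recomputes the row under that assumption, using plain-Python versions of accuracy, balanced accuracy (mean per-class recall, as in scikit-learn's `balanced_accuracy_score`) and support-weighted F1. It assumes, rather than confirms, that `lazypredict` reports weighted F1. The class labels 0/1 are hypothetical.

For hard 0/1 predictions the ROC "curve" has a single interior point, so its area is (1 + TPR - FPR)/2, which equals the balanced accuracy. The identical ROC AUC and Balanced Accuracy columns therefore suggest that the AUC is computed from predicted labels rather than from probabilities.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred) -> float:
    # Mean of per-class recall.
    labels = sorted(set(y_true))
    recalls = []
    for c in labels:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / sum(1 for t in y_true if t == c))
    return sum(recalls) / len(recalls)


def weighted_f1(y_true, y_pred) -> float:
    # Per-class F1 averaged with weights equal to each class's support.
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        precision = tp / predicted if predicted else 0.0  # undefined precision counted as 0
        recall = tp / support
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * support
    return total / len(y_true)


def roc_auc_hard(y_true, y_pred, positive=1) -> float:
    # Trapezoid area under (0,0) -> (FPR, TPR) -> (1,1) for 0/1 predictions.
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    tpr = sum(1 for p in pos if p == positive) / len(pos)
    fpr = sum(1 for p in neg if p == positive) / len(neg)
    return (1 + tpr - fpr) / 2


# Hypothetical labels: 103 samples of class 0 and 182 of class 1,
# scored against a constant predictor that always answers 0.
y_true = [0] * 103 + [1] * 182
y_pred = [0] * 285

print(round(accuracy(y_true, y_pred), 6))           # 0.361404
print(round(balanced_accuracy(y_true, y_pred), 6))  # 0.5
print(round(roc_auc_hard(y_true, y_pred), 6))       # 0.5
print(round(weighted_f1(y_true, y_pred), 6))        # 0.191879
```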

### Example - Regression

```python
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np
# NOTE: load_boston was removed in scikit-learn 1.2; this example needs scikit-learn < 1.2.
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
reg_models, reg_predictions = reg.fit(X_train, X_test, y_train, y_test)
reg_models
```
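The regression example does not call `train_test_split`. It shuffles the rows once with a fixed seed and uses the first 90% for training. The plain-Python sketch below follows the same shuffle-then-slice pattern. It uses `random.Random` instead of `sklearn.utils.shuffle`, so the row order will not match the notebook's, but the sizes do: the Boston dataset's 506 rows give `int(506 * 0.9) = 455` training rows and 51 test rows. The function name and the stand-in data are hypothetical.

```python
import random


def shuffle_and_slice(X, y, train_fraction=0.9, seed=13):
    # Shuffle X and y with the same permutation so rows stay paired.
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    X = [X[i] for i in order]
    y = [y[i] for i in order]
    # Same cut-off rule as the notebook: offset = int(n_samples * 0.9).
    offset = int(len(X) * train_fraction)
    return X[:offset], X[offset:], y[:offset], y[offset:]


# Hypothetical stand-in data with 506 rows, where each target is 2 * feature.
X = [[float(i)] for i in range(506)]
y = [2.0 * i for i in range(506)]
X_train, X_test, y_train, y_test = shuffle_and_slice(X, y)
print(len(X_train), len(X_test))  # 455 51
```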


<details>
    <summary>
    Sample Output - LazyRegressor
    </summary>

| Model                         |   R-Squared |     RMSE |   Time Taken |
|:------------------------------|------------:|---------:|-------------:|
| SVR                           |   0.877199  |  2.62054 |    0.0330021 |
| RandomForestRegressor         |   0.874429  |  2.64993 |    0.0659981 |
| ExtraTreesRegressor           |   0.867566  |  2.72138 |    0.0570002 |
| AdaBoostRegressor             |   0.865851  |  2.73895 |    0.144999  |
| NuSVR                         |   0.863712  |  2.7607  |    0.0340044 |
| GradientBoostingRegressor     |   0.858693  |  2.81107 |    0.13      |
| KNeighborsRegressor           |   0.826307  |  3.1166  |    0.0179954 |
| HistGradientBoostingRegressor |   0.810479  |  3.25551 |    0.820995  |
| BaggingRegressor              |   0.800056  |  3.34383 |    0.0579946 |
| MLPRegressor                  |   0.750536  |  3.73503 |    0.725997  |
| HuberRegressor                |   0.736973  |  3.83522 |    0.0370018 |
| LinearSVR                     |   0.71914   |  3.9631  |    0.0179989 |
| RidgeCV                       |   0.718402  |  3.9683  |    0.018003  |
| BayesianRidge                 |   0.718102  |  3.97041 |    0.0159984 |
| Ridge                         |   0.71765   |  3.9736  |    0.0149941 |
| LinearRegression              |   0.71753   |  3.97444 |    0.0190051 |
| TransformedTargetRegressor    |   0.71753   |  3.97444 |    0.012001  |
| LassoCV                       |   0.717337  |  3.9758  |    0.0960066 |
| ElasticNetCV                  |   0.717104  |  3.97744 |    0.0860076 |
| LassoLarsCV                   |   0.717045  |  3.97786 |    0.0490005 |
| LassoLarsIC                   |   0.716636  |  3.98073 |    0.0210001 |
| LarsCV                        |   0.715031  |  3.99199 |    0.0450008 |
| Lars                          |   0.715031  |  3.99199 |    0.0269964 |
| SGDRegressor                  |   0.714362  |  3.99667 |    0.0210009 |
| RANSACRegressor               |   0.707849  |  4.04198 |    0.111998  |
| ElasticNet                    |   0.690408  |  4.16088 |    0.0190012 |
| Lasso                         |   0.662141  |  4.34668 |    0.0180018 |
| OrthogonalMatchingPursuitCV   |   0.591632  |  4.77877 |    0.0180008 |
| ExtraTreeRegressor            |   0.583314  |  4.82719 |    0.0129974 |
| PassiveAggressiveRegressor    |   0.556668  |  4.97914 |    0.0150032 |
| GaussianProcessRegressor      |   0.428298  |  5.65425 |    0.0580051 |
| OrthogonalMatchingPursuit     |   0.379295  |  5.89159 |    0.0180039 |
| DecisionTreeRegressor         |   0.318767  |  6.17217 |    0.0230272 |
| DummyRegressor                |  -0.0215752 |  7.55832 |    0.0140116 |
| LassoLars                     |  -0.0215752 |  7.55832 |    0.0180008 |
| KernelRidge                   |  -8.24669   | 22.7396  |    0.0309792 |

</details>
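The regression table reports R-Squared and RMSE. The helpers below are plain-Python versions of the standard definitions, R² = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared error). scikit-learn's `r2_score` and the square root of `mean_squared_error` compute the same quantities. They also explain why `DummyRegressor` scores slightly below zero. It predicts the training mean, while SS_tot is measured around the test mean, so any gap between the two means pushes R² negative.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_test = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_test, y_test)         # 1.0
test_mean = r_squared(y_test, [2.5] * 4)    # 0.0: constant equal to the test mean
train_mean = r_squared(y_test, [3.0] * 4)   # -0.2: constant that misses the test mean
print(perfect, test_mean, train_mean, rmse(y_test, [2.5] * 4))
```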

## Install

Install `lazypredict` (in a Jupyter or Colab cell, prefix the command with `!`):

```bash
pip install lazypredict
```

```
Collecting lazypredict
  Downloading https://files.pythonhosted.org/packages/fe/7f/ee936a25b600eec90a112ac1b2de7b56ea9b58ae5a8bbc2c7d870a35037a/lazypredict-0.2.6-py2.py3-none-any.whl
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.6
```


Install `catboost`:

```bash
pip install catboost
```

```
Collecting catboost
  Downloading https://files.pythonhosted.org/packages/b2/aa/e61819d04ef2bbee778bf4b3a748db1f3ad23512377e43ecfdc3211437a0/catboost-0.23.2-cp36-none-manylinux1_x86_64.whl (64.8MB)
     |████████████████████████████████| 64.8MB 58kB/s
Installing collected packages: catboost
Successfully installed catboost-0.23.2
```


## Test `lazypredict`

As of 2020-07-08 ([1][#1], [2][#2]), you need to **import** the following **four** libraries before using `lazypredict`:

[#1]: https://github.com/shankarpandala/lazypredict/issues/97#issuecomment-655424594
[#2]: https://github.com/shankarpandala/lazypredict/issues/97#issuecomment-655077863

- `sys`
- `catboost`
- `xgboost`
- `lightgbm`

👉 *`sys` is part of the Python standard library; install the other three if necessary.*

<!--- 👉 :point_right: --->

```python
import sys
import catboost
import xgboost
import lightgbm
```
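To see which of these modules are missing without importing them, you can ask the import system for a module spec. A minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["sys", "catboost", "xgboost", "lightgbm"])
print("install before using lazypredict:", missing or "nothing")
```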

### Test `LazyClassifier`

#### Run

```python
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=123)
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
clf_models, clf_predictions = clf.fit(X_train, X_test, y_train, y_test)
```

```
 97%|█████████▋| 30/31 [00:01<00:00, 20.28it/s]

Learning rate set to 0.006019
0:	learn: 0.6842374	total: 8.83ms	remaining: 8.82s
1:	learn: 0.6761436	total: 16.7ms	remaining: 8.33s
2:	learn: 0.6663337	total: 24.5ms	remaining: 8.13s
3:	learn: 0.6578786	total: 33.5ms	remaining: 8.34s
4:	learn: 0.6479119	total: 42.2ms	remaining: 8.39s
5:	learn: 0.6403447	total: 50.2ms	remaining: 8.32s
6:	learn: 0.6318749	total: 58.2ms	remaining: 8.26s
7:	learn: 0.6234635	total: 65.9ms	remaining: 8.17s
8:	learn: 0.6145548	total: 73.6ms	remaining: 8.1s
9:	learn: 0.6080760	total: 81.5ms	remaining: 8.07s
10:	learn: 0.6004730	total: 89.3ms	remaining: 8.03s
11:	learn: 0.5916307	total: 97ms	remaining: 7.99s
12:	learn: 0.5840032	total: 105ms	remaining: 7.97s
13:	learn: 0.5763518	total: 113ms	remaining: 7.94s
14:	learn: 0.5700144	total: 120ms	remaining: 7.9s
15:	learn: 0.5628284	total: 128ms	remaining: 7.88s
16:	learn: 0.5570130	total: 143ms	remaining: 8.28s
17:	learn: 0.5509589	total: 152ms	remaining: 8.28s
18:	learn: 0.5448114	total: 160ms	remaining: 8.24s
19:

100%|██████████| 31/31 [00:09<00:00,  3.26it/s]

989:	learn: 0.0132635	total: 8.19s	remaining: 82.7ms
990:	learn: 0.0132418	total: 8.19s	remaining: 74.4ms
991:	learn: 0.0132211	total: 8.21s	remaining: 66.2ms
992:	learn: 0.0131960	total: 8.21s	remaining: 57.9ms
993:	learn: 0.0131616	total: 8.22s	remaining: 49.6ms
994:	learn: 0.0131413	total: 8.23s	remaining: 41.4ms
995:	learn: 0.0131157	total: 8.24s	remaining: 33.1ms
996:	learn: 0.0130911	total: 8.24s	remaining: 24.8ms
997:	learn: 0.0130714	total: 8.25s	remaining: 16.5ms
998:	learn: 0.0130481	total: 8.26s	remaining: 8.27ms
999:	learn: 0.0130245	total: 8.27s	remaining: 0us
```





#### Show Results

```python
# show results
clf_models
```

| Model                  |   Accuracy |   Balanced Accuracy |   ROC AUC |   F1 Score |   Time Taken |
|:-----------------------|-----------:|--------------------:|----------:|-----------:|-------------:|
| LinearSVC              |       0.99 |                0.99 |      0.99 |       0.99 |         0.02 |
| Perceptron             |       0.99 |                0.98 |      0.98 |       0.99 |         0.01 |
| LogisticRegression     |       0.99 |                0.98 |      0.98 |       0.99 |         0.02 |
| XGBClassifier          |       0.98 |                0.98 |      0.98 |       0.98 |         0.1  |
| SVC                    |       0.98 |                0.98 |      0.98 |       0.98 |         0.02 |
| RandomForestClassifier |       0.98 |                0.98 |      0.98 |       0.98 |         0.18 |
| LabelPropagation       |       0.98 |                0.97 |      0.97 |       0.98 |         0.02 |
| LabelSpreading         |       0.98 |                0.97 |      0.97 |       0.98 |         0.02 |
| CatBoostClassifier     |       0.98 |                0.97 |      0.97 |       0.98 |         8.5  |
| SGDClassifier          |       0.97 |                0.97 |      0.97 |       0.97 |         0.01 |
SGDClassifier,0.97,0.97,0.97,0.97,0.01


#### Best Model

```python
# best model (highest Balanced Accuracy)
clf_models.iloc[clf_models['Balanced Accuracy'].argmax()]
```

```
Accuracy            0.99
Balanced Accuracy   0.99
ROC AUC             0.99
F1 Score            0.99
Time Taken          0.02
Name: LinearSVC, dtype: float64
```
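Taking the single highest score is not the only reasonable rule. When training time matters, one option is to pick the fastest model whose score is within a tolerance of the best. The sketch below applies that hypothetical rule (the `fastest_within` helper is not part of `lazypredict`) to the rounded top-10 rows from the results table, so ties from two-decimal rounding affect the outcome. It only illustrates the rule and does not re-rank the actual models.

```python
from typing import List, Tuple

# (model, balanced accuracy, time taken in seconds), copied from the rounded table.
ROWS: List[Tuple[str, float, float]] = [
    ("LinearSVC", 0.99, 0.02),
    ("Perceptron", 0.98, 0.01),
    ("LogisticRegression", 0.98, 0.02),
    ("XGBClassifier", 0.98, 0.1),
    ("SVC", 0.98, 0.02),
    ("RandomForestClassifier", 0.98, 0.18),
    ("LabelPropagation", 0.97, 0.02),
    ("LabelSpreading", 0.97, 0.02),
    ("CatBoostClassifier", 0.97, 8.5),
    ("SGDClassifier", 0.97, 0.01),
]


def fastest_within(rows, tolerance):
    best = max(score for _, score, _ in rows)
    # Small slack absorbs floating-point error (0.99 - 0.98 is not exactly 0.01).
    close = [row for row in rows if best - row[1] <= tolerance + 1e-9]
    return min(close, key=lambda row: row[2])


print(max(ROWS, key=lambda row: row[1])[0])  # LinearSVC: highest balanced accuracy
print(fastest_within(ROWS, 0.01)[0])         # Perceptron: fastest within 0.01 of the best
```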

### Test `LazyRegressor`

#### Run

```python
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np
# NOTE: load_boston was removed in scikit-learn 1.2; this cell needs scikit-learn < 1.2.
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
reg_models, reg_predictions = reg.fit(X_train, X_test, y_train, y_test)
```

```
 95%|█████████▌| 38/40 [00:03<00:00,  9.98it/s]

StackingRegressor model failed to execute
__init__() missing 1 required positional argument: 'estimators'
Learning rate set to 0.033926
0:	learn: 9.1712310	total: 2.44ms	remaining: 2.44s
1:	learn: 8.9920394	total: 4.68ms	remaining: 2.33s
2:	learn: 8.8073226	total: 8.41ms	remaining: 2.79s
3:	learn: 8.6529582	total: 10.5ms	remaining: 2.62s
4:	learn: 8.5016504	total: 12.7ms	remaining: 2.52s
5:	learn: 8.3672201	total: 14.8ms	remaining: 2.45s
6:	learn: 8.2005790	total: 16.9ms	remaining: 2.4s
7:	learn: 8.0422771	total: 19.1ms	remaining: 2.37s
8:	learn: 7.8933034	total: 21.1ms	remaining: 2.33s
9:	learn: 7.7304436	total: 23.2ms	remaining: 2.3s
10:	learn: 7.5795232	total: 25.3ms	remaining: 2.27s
11:	learn: 7.4344012	total: 27.4ms	remaining: 2.26s
12:	learn: 7.2892673	total: 29.6ms	remaining: 2.25s
13:	learn: 7.1426430	total: 31.7ms	remaining: 2.23s
14:	learn: 7.0170940	total: 33.8ms	remaining: 2.22s
15:	learn: 6.8807202	total: 36.8ms	remaining: 2.26s
16:	learn: 6.7703803	total: 39.5ms	remaining

100%|██████████| 40/40 [00:05<00:00,  7.07it/s]

966:	learn: 0.6098279	total: 2.21s	remaining: 75.5ms
967:	learn: 0.6088615	total: 2.22s	remaining: 73.3ms
968:	learn: 0.6079635	total: 2.22s	remaining: 71ms
969:	learn: 0.6076427	total: 2.22s	remaining: 68.7ms
970:	learn: 0.6072832	total: 2.22s	remaining: 66.4ms
971:	learn: 0.6063911	total: 2.23s	remaining: 64.1ms
972:	learn: 0.6052529	total: 2.23s	remaining: 61.8ms
973:	learn: 0.6047029	total: 2.23s	remaining: 59.5ms
974:	learn: 0.6037375	total: 2.23s	remaining: 57.2ms
975:	learn: 0.6024339	total: 2.23s	remaining: 54.9ms
976:	learn: 0.6015916	total: 2.24s	remaining: 52.6ms
977:	learn: 0.6006800	total: 2.24s	remaining: 50.4ms
978:	learn: 0.6004717	total: 2.24s	remaining: 48.1ms
979:	learn: 0.5996676	total: 2.24s	remaining: 45.8ms
980:	learn: 0.5991516	total: 2.24s	remaining: 43.5ms
981:	learn: 0.5982463	total: 2.25s	remaining: 41.2ms
982:	learn: 0.5976489	total: 2.25s	remaining: 38.9ms
983:	learn: 0.5969437	total: 2.25s	remaining: 36.6ms
984:	learn: 0.5963700	total: 2.25s	remaining: 34
```
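The `StackingRegressor` failure in the log is expected. That estimator requires an `estimators` list, so constructing it with no arguments raises a `TypeError`. The run reports the error and continues with the remaining models. A minimal sketch of that skip-and-continue pattern follows, using hypothetical stand-in classes rather than `lazypredict`'s own code.

```python
from typing import Dict, List, Tuple


class MeanRegressor:
    # Hypothetical stand-in for an estimator that can be built with no arguments.
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NeedsEstimators:
    # Hypothetical stand-in for an estimator that, like StackingRegressor, needs an argument.
    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        return self


def run_all(candidates: Dict[str, type], X, y) -> Tuple[List[str], Dict[str, str]]:
    fitted, failures = [], {}
    for name, cls in candidates.items():
        try:
            cls().fit(X, y)  # build with defaults, then fit
            fitted.append(name)
        except Exception as exc:  # record the failure and move on to the next model
            print(f"{name} model failed to execute")
            print(exc)
            failures[name] = str(exc)
    return fitted, failures


fitted, failures = run_all(
    {"MeanRegressor": MeanRegressor, "NeedsEstimators": NeedsEstimators},
    X=[[0.0], [1.0]],
    y=[1.0, 3.0],
)
```

Catching broad exceptions keeps one incompatible estimator from aborting a sweep over dozens of models. The trade-off is that real bugs can be hidden, so the errors should still be reported, as the log above does.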




#### Show Results

```python
# show results
reg_models
```

| Model                         |   R-Squared |   RMSE |   Time Taken |
|:------------------------------|------------:|-------:|-------------:|
| SVR                           |        0.88 |   2.62 |         0.03 |
| CatBoostRegressor             |        0.87 |   2.66 |         2.46 |
| XGBRegressor                  |        0.87 |   2.67 |         0.05 |
| BaggingRegressor              |        0.86 |   2.75 |         0.05 |
| NuSVR                         |        0.86 |   2.76 |         0.04 |
| RandomForestRegressor         |        0.86 |   2.79 |         0.38 |
| GradientBoostingRegressor     |        0.86 |   2.8  |         0.17 |
| AdaBoostRegressor             |        0.85 |   2.93 |         0.11 |
| HistGradientBoostingRegressor |        0.83 |   3.06 |         0.29 |
| LGBMRegressor                 |        0.83 |   3.11 |         0.07 |


#### Best Model

```python
# best model (highest R-Squared)
reg_models.iloc[reg_models['R-Squared'].argmax()]
```

```
R-Squared    0.88
RMSE         2.62
Time Taken   0.03
Name: SVR, dtype: float64
```

## End Notes

Make notes about the library and leave them here.

## Some Extra Stuff

This is an example of a collapsible (click-to-expand) section in Markdown, taken from [cml][#gh-cml].

[#gh-cml]: https://github.com/iterative/cml

<details>    
    <summary>
    Summary-title
    </summary>

Summary body goes here.

```python
import numpy as np

print(np.arange(5))
```

Some other [text](https://www.google.com).

</details>