# How to Compare Machine Learning Models Using lazypredict






<center><img src="https://jlaw.netlify.app/2022/05/03/ml-for-the-lazy-can-automl-beat-my-model/featured_hu30da1cff1e42efaf9229d794144bf478_91906_720x0_resize_lanczos_2.png" 
        alt="ML for the lazy" 
        width="400" 
        height="300" 
        class="centerImage"
        style="width:50%;" /></center>

lazypredict is a Python package that fits a wide range of basic machine learning models with very little code, so we can see which models perform best on a dataset before doing any parameter tuning.


## Contents

1. Learning Outcomes

2. Machine Learning Models: Recap

3. Lazy Classification

4. Lazy Regression

5. Limitations of lazypredict

6. Glossary of terms

7. References


## 1. Learning Outcomes

By the end of this session, you will be able to:

- Recall the most popular machine learning methods and when they are useful
- Compare machine learning models and identify the best-performing ones using the `lazypredict` package

## 2. Machine Learning Models: Recap

First, we will recap some of the most commonly used methods for **supervised** machine learning. The slides can be found [here](https://docs.google.com/presentation/d/1yGoHqwrHyvr-nzIEKMKPtUWctO398b4upXvX8FYFWcw/edit?usp=sharing).

### Install the `lazypredict` library

In [1]:
!pip install lazypredict

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lazypredict
  Downloading lazypredict-0.2.9-py2.py3-none-any.whl (12 kB)
Collecting xgboost==1.1.1
  Downloading xgboost-1.1.1-py3-none-manylinux2010_x86_64.whl (127.6 MB)
Collecting scipy==1.5.4
  Downloading scipy-1.5.4-cp37-cp37m-manylinux1_x86_64.whl (25.9 MB)
Collecting pytest==5.4.3
  Downloading pytest-5.4.3-py3-none-any.whl (248 kB)
Collecting scikit-learn==0.23.1
  Downloading scikit_learn-0.23.1-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
Collecting lightgbm==2.3.1
  Downloading lightgbm-2.3.1-py2.py3-none-manylinux1_x86_64.whl (1.2 MB)
[?25hCollecting

## 3. Lazy Classification

In [2]:
#import required packages
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split



In [3]:
#load in the dataset
data = load_breast_cancer()
X = data.data
y = data.target

In [4]:
#split the dataset into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=123)

In [5]:
#define the lazy classifier, then fit every model on the training data and score it on the test data
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

100%|██████████| 29/29 [00:01<00:00, 16.18it/s]


In [6]:
models

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| LogisticRegression | 0.99 | 0.99 | 0.99 | 0.99 | 0.09 |
| SGDClassifier | 0.99 | 0.99 | 0.99 | 0.99 | 0.05 |
| LinearSVC | 0.99 | 0.99 | 0.99 | 0.99 | 0.05 |
| Perceptron | 0.99 | 0.99 | 0.99 | 0.99 | 0.02 |
| SVC | 0.98 | 0.98 | 0.98 | 0.98 | 0.03 |
| RandomForestClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.23 |
| ExtraTreesClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.14 |
| RidgeClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.02 |
| QuadraticDiscriminantAnalysis | 0.98 | 0.98 | 0.98 | 0.98 | 0.02 |
| AdaBoostClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.17 |
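
To make the table above less of a black box, here is a minimal sketch of the kind of loop that lazypredict automates: fit several models without tuning, score each one on the same test split, and rank the results. This is not lazypredict's actual implementation. The three candidate models, the `StandardScaler` preprocessing step, and the `compare_models` helper name are assumptions chosen for illustration.

```python
import time

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(candidates, X_train, X_test, y_train, y_test):
    """Fit each candidate without tuning and collect test-set metrics.

    Hypothetical helper written for this example only.
    """
    rows = []
    for name, estimator in candidates.items():
        start = time.time()
        # Scale the features, then fit the untuned model.
        pipeline = make_pipeline(StandardScaler(), estimator)
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_test, y_pred),
            "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
            "F1 Score": f1_score(y_test, y_pred, average="weighted"),
            "Time Taken": time.time() - start,
        })
    # Rank the models from best to worst balanced accuracy.
    table = pd.DataFrame(rows).set_index("Model")
    return table.sort_values("Balanced Accuracy", ascending=False)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=123
)

candidates = {
    "LogisticRegression": LogisticRegression(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
}
results = compare_models(candidates, X_train, X_test, y_train, y_test)
print(results.round(2))
```

Writing the loop by hand makes it clear that every score in the table comes from a single train/test split, so small differences between models should not be over-interpreted.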


## 4. Lazy Regression

In [7]:
#import the required packages
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np

In [8]:
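#load in the dataset
#note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn (such as the 0.23.1 pinned in the install log above)
boston = datasets.load_boston()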

In [None]:
#shuffle the dataset and compute the row index that separates the first 90% (train) from the last 10% (test)
X, y = shuffle(boston.data, boston.target, random_state=13)
offset = int(X.shape[0] * 0.9)

In [9]:
#split data into train and test
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

In [10]:
#define and build our lazy regressor
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

100%|██████████| 42/42 [00:06<00:00,  6.71it/s]


In [12]:
models

| Model | Adjusted R-Squared | R-Squared | RMSE | Time Taken |
|---|---|---|---|---|
| SVR | 0.83 | 0.88 | 2.62 | 0.05 |
| BaggingRegressor | 0.83 | 0.88 | 2.63 | 0.05 |
| NuSVR | 0.82 | 0.86 | 2.76 | 0.08 |
| RandomForestRegressor | 0.81 | 0.86 | 2.79 | 0.68 |
| XGBRegressor | 0.81 | 0.86 | 2.79 | 0.5 |
| GradientBoostingRegressor | 0.81 | 0.86 | 2.84 | 0.2 |
| ExtraTreesRegressor | 0.79 | 0.84 | 2.98 | 0.23 |
| HistGradientBoostingRegressor | 0.77 | 0.83 | 3.06 | 0.54 |
| AdaBoostRegressor | 0.77 | 0.83 | 3.06 | 0.14 |
| PoissonRegressor | 0.77 | 0.83 | 3.11 | 0.02 |


## 5. Limitations of lazypredict


*   lazypredict only covers supervised learning (classification and regression); we can't use it for unsupervised learning or reinforcement learning problems.
*   Although the package does a great job of generating baseline models, there is no feature for hyperparameter tuning within lazypredict itself.
*   There is no option for data visualisation. This is something we need to do ourselves afterwards, whereas other libraries such as PyCaret do provide it.
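
Because lazypredict stops at untuned baselines, a common next step is to take one of the top-ranked models and tune it with scikit-learn directly. The following is a minimal sketch, assuming logistic regression is the model picked for follow-up. The parameter grid, the 5-fold cross-validation, and the choice of balanced accuracy as the scoring metric are illustrative assumptions, not recommendations from lazypredict.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=123
)

# Scaling lives inside the pipeline so it is re-fitted on each CV fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Illustrative grid: C is the inverse regularisation strength.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="balanced_accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated balanced accuracy:", round(search.best_score_, 3))
# score() uses the same metric that was passed as `scoring`.
print("Test-set balanced accuracy:", round(search.score(X_test, y_test), 3))
```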





## 6. Glossary of Terms

To be able to fully interpret the tables that lazypredict produces, we need to ensure we understand what each column is telling us. 

**Accuracy Score** 

Informally, accuracy is the fraction of predictions our model got right. Formally, it is the number of correct predictions divided by the total number of predictions.



**Balanced Accuracy Score**

A variant of the standard accuracy metric that is more informative on imbalanced datasets. It is calculated by computing the accuracy within each class separately (that is, the recall for that class) and then averaging across classes, so every class counts equally regardless of how many examples it has.
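
The difference between the two accuracy metrics is easiest to see on an imbalanced toy example. The sketch below uses made-up labels (90 negatives, 10 positives) and a model that always predicts the majority class; the function names are hypothetical helpers written in plain Python for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of the per-class recall values."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the examples that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Made-up data: 90 negatives, 10 positives, model always predicts 0.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Standard accuracy rewards the model for the 90 easy negatives, while balanced accuracy reveals that it never finds a single positive.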


**ROC AUC Score**

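ROC AUC is the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate as the classification threshold varies. The score ranges from 0 to 1, where 1 is the perfect score and 0.5 means the model is no better than random guessing. A score below 0.5 means the model ranks examples worse than random.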
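
One way to build intuition for ROC AUC is its ranking interpretation: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below implements that definition in plain Python on made-up scores. The `roc_auc` helper is hypothetical and quadratic in the number of examples, so it is meant for illustration rather than real workloads.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]

print(roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.75, mostly correct ranking
print(roc_auc(y_true, [0.5, 0.5, 0.5, 0.5]))   # 0.5, uninformative scores
print(roc_auc(y_true, [0.9, 0.6, 0.65, 0.2]))  # 0.25, worse than random
```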

**F1 Score**

Also sometimes called the **F Score**, this is a way of combining the precision (the fraction of true positive examples among the examples that the model classified as positive) and the recall (the fraction of actual positive examples that the model correctly classified as positive) of a model. The F1 score is the harmonic mean of the two. It's possible to adjust the score to give more weight to precision over recall, or vice versa; this generalisation is known as the F-beta score.

<img src="https://images.deepai.org/user-content/9954225913-thumb-4901.svg" 
        alt="Mathematical formula for F1 score" 
        width="400" 
        height="300" 
        style="display: block; margin: 0 auto" />


The mathematical formula for the F1 Score.        
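
The same calculation can be written out in a few lines of plain Python. This is a minimal sketch on made-up labels; `precision_recall` and `f_beta` are hypothetical helper names, and the code assumes the model predicts at least one positive so that no division by zero occurs. Setting `beta` above 1 weights recall more heavily, and setting it below 1 weights precision more heavily.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fp), tp / (tp + fn)


def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the F1 score (harmonic mean)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up data: 4 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 3), round(recall, 3))         # 0.667 0.5
print(round(f_beta(precision, recall), 3))           # 0.571 (F1)
print(round(f_beta(precision, recall, beta=2), 3))   # 0.526, leans towards recall
print(round(f_beta(precision, recall, beta=0.5), 3)) # 0.625, leans towards precision
```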




**R Squared Score**

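This metric gives an indication of how well a model fits a given dataset. It indicates how close the regression line (i.e. the predicted values plotted) is to the actual data values.

The R squared value is at most 1, where 1 indicates that the model fits the data perfectly and 0 indicates that the model does no better than always predicting the mean of the target. For a least-squares linear fit evaluated on its own training data the value lies between 0 and 1, but on held-out test data it can be negative when the model does worse than predicting the mean.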
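
As a worked illustration, R squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers and a hypothetical `r_squared` helper; it also shows that predictions worse than the mean push the score below 0.

```python
def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]

print(r_squared(y_true, [3.0, 5.0, 7.0, 9.0]))  # 1.0, perfect predictions
print(r_squared(y_true, [6.0, 6.0, 6.0, 6.0]))  # 0.0, always predicting the mean
print(r_squared(y_true, [9.0, 7.0, 5.0, 3.0]))  # -3.0, worse than the mean
```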


**Adjusted R Squared Score**

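R squared tends to give an optimistic estimate of the fit of a linear regression, because it never decreases when more features are added to the model. Adjusted R squared attempts to correct for this overestimation by penalising the number of features.

It is calculated by dividing the residual mean square (the residual sum of squares divided by its degrees of freedom) by the total mean square (which is the sample variance of the target field). The result is then subtracted from 1.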

A value of 1 indicates a model that perfectly predicts values in the target field. A value that is less than or equal to 0 indicates a model that has no predictive value.
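
The calculation described above is equivalent to rescaling `1 - R squared` by the ratio of the degrees of freedom. The sketch below applies it to the SVR row of the regression table, assuming that lazypredict uses the size of the test split: the Boston dataset has 506 rows and 13 features, so the 10% test split holds 51 rows. The `adjusted_r_squared` helper is a hypothetical name, and the R squared input is the rounded value from the table, so the result only approximately matches the 0.83 reported there.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R squared from plain R squared and the data dimensions."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Dimensions of the test split used in the Lazy Regression section
# (assumption: the adjustment is based on the test set, not the training set).
n_rows = 506
offset = int(n_rows * 0.9)   # 455 training rows
n_test = n_rows - offset     # 51 test rows
n_features = 13

print(round(adjusted_r_squared(0.88, n_test, n_features), 2))  # 0.84
```

With only 51 test rows and 13 features the penalty is noticeable, which is why the adjusted values in the table sit several points below the plain R squared values.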

**Root Mean Squared Error (RMSE)**

Root Mean Squared Error (RMSE) is the square root of the average squared residual (prediction error), and can be read as the standard deviation of the residuals. Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. RMSE is expressed in the same units as the target, and lower values are better.
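
A short worked example makes the definition concrete. The values below are made up, and `rmse` is a hypothetical helper written in plain Python.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


y_true = [20.0, 25.0, 30.0, 35.0]
y_pred = [22.0, 24.0, 27.0, 35.0]

# Errors are -2, 1, 3 and 0, so the mean squared error is 14 / 4 = 3.5.
print(round(rmse(y_true, y_pred), 2))  # 1.87
```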

## 7. References

[3 Amazing Low-Code Machine Learning Libraries That You Should Know About (Towards Data Science)](https://towardsdatascience.com/3-amazing-low-code-machine-learning-libraries-that-you-should-know-about-a66895c0cc08)

[lazypredict Documentation](https://lazypredict.readthedocs.io/)

[lazypredict GitHub](https://github.com/shankarpandala/lazypredict)