# AutoML tools: LazyPredict

In this notebook, we will explore a simple AutoML library:
[**LazyPredict**](https://lazypredict.readthedocs.io/en/latest/).

We will use it for a regression problem (Boston dataset) and a classification problem (Titanic dataset), exploring its features and limitations along the way.

First, we install the
[**LazyPredict**](https://lazypredict.readthedocs.io/en/latest/)
library.

As of June 2024 there is an issue with the LazyPredict library.
It still relies on an older version (< 1.2) of scikit-learn.
One easy fix is to downgrade scikit-learn to version 1.1.3.

In [0]:
!pip install -q lazypredict

In [0]:
!pip install --force-reinstall -v scikit-learn==1.1.3

In [0]:
# You only need to run this cell on Databricks, after installing the packages above
dbutils.library.restartPython()
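
After the restart, it can help to confirm that the downgrade took effect. The following is a minimal sketch, assuming the `< 1.2` bound mentioned above; the helpers `parse_version` and `is_compatible` are hypothetical names introduced here and are not part of LazyPredict or scikit-learn.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return (major, minor) as integers, e.g. '1.1.3' -> (1, 1)."""
    parts = []
    for piece in text.split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_compatible(installed, upper_bound="1.2"):
    """True if the installed version is strictly below the assumed upper bound."""
    return parse_version(installed) < parse_version(upper_bound)


try:
    installed = version("scikit-learn")
    status = "compatible" if is_compatible(installed) else "too new for LazyPredict"
    print(f"scikit-learn {installed}: {status}")
except PackageNotFoundError:
    print("scikit-learn is not installed")
```

If the message reports a version that is too new, re-run the downgrade cell and restart Python again.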

Then we load the Boston dataset using Pandas.

In [0]:
import pandas as pd

boston_df = pd.read_csv('../../../Data/Boston.csv')

Before using AutoML tools, let's take a quick look at our dataset and its structure:

In [0]:
boston_df.head()

In [0]:
boston_df.describe()

In [0]:
from sklearn.model_selection import train_test_split

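# Features: columns at positions 1 to 13 (position 0 is skipped); target: the last column
X = boston_df.iloc[:, 1:14]
y = boston_df.iloc[:, -1]
# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)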

## Regression with LazyPredict

[LazyPredict](https://lazypredict.readthedocs.io/en/latest/)
is an open-source Python library which applies various machine learning models on a dataset and compares their performances.
It supports regression as well as classification problems. 

This library is a simple tool **without hyperparameter tuning**, but it can provide valuable insight into which types of algorithm work well for a given problem and are worth exploring further.
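
To make the idea concrete, here is a rough sketch of the "fit several models with default settings and rank them" loop. It is not LazyPredict's actual implementation: the synthetic data, the three candidate models, and the variable names are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for illustration only)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# A small, hand-picked set of candidates, all with default hyperparameters
candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit every candidate on the training set and score it on the test set
scores = {}
for name, model in candidates.items():
    model.fit(Xd_train, yd_train)
    scores[name] = r2_score(yd_test, model.predict(Xd_test))

# Rank the candidates from best to worst R^2
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
```

LazyPredict automates this pattern over a much larger set of estimators and reports several metrics at once.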

Let's try it out!

In [0]:
from lazypredict.Supervised import LazyRegressor

reg = LazyRegressor(predictions=True)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

In [0]:
models

In [0]:
predictions

You can also pass additional optional parameters to `LazyRegressor()`, such as the **verbose** flag, which controls how much output is produced during training, and the **custom_metric** parameter, which lets you specify an additional metric for evaluating each model. See the example below:

In [0]:
from sklearn.metrics import mean_absolute_error

reg = LazyRegressor(verbose=0, predictions=True, custom_metric = mean_absolute_error)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

In [0]:
models

Now we can see our custom metric in the last column.
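
A custom metric does not have to come from scikit-learn. The sketch below assumes that LazyPredict calls the metric as `custom_metric(y_true, y_pred)` and expects a single number back, which matches how `mean_absolute_error` is used above; the function `rmse` is a hypothetical name introduced for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences of numbers."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Quick check on toy values: the errors are 2 and 0, so the result is sqrt(2)
print(rmse([3.0, 5.0], [1.0, 5.0]))

# Under the assumption above, it could then be passed in the same way:
# reg = LazyRegressor(verbose=0, predictions=True, custom_metric=rmse)
```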

The top five models are:
* [Gradient Boosting Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
* [Bagging Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html)
* [Random Forest Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)
* [XGB Regressor](https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.sklearn)
* [Extra Trees Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html)

LazyPredict provides an easy way to see which models perform better, so we can focus on them, tune their hyperparameters, and so on.

One disadvantage is that LazyPredict does not offer a way to export the best model.
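
One possible workaround is to refit the top-ranked estimator yourself with scikit-learn and persist it with `pickle`. The sketch below uses synthetic data and assumes Gradient Boosting is the model you want to keep; in the notebook you would fit on `X_train` and `y_train` instead.

```python
import pickle

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data stands in for the real training set (assumption for illustration only)
X_demo, y_demo = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# Refit the chosen model type by hand, outside LazyPredict
best_model = GradientBoostingRegressor(random_state=0)
best_model.fit(X_demo, y_demo)

# Serialize to bytes and restore; writing the bytes to a file works the same way
serialized = pickle.dumps(best_model)
restored_model = pickle.loads(serialized)

# The restored model should reproduce the original predictions
same_predictions = np.allclose(best_model.predict(X_demo), restored_model.predict(X_demo))
print(same_predictions)
```

Keep in mind that a pickled model should be loaded with the same scikit-learn version that created it.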

## Your turn!

Now it's time to take your newly acquired knowledge and skills to the next level by trying the LazyPredict library on a classification problem.

In [0]:
# Task: Import titanic.csv dataset

titanic_df = pd.read_csv('../../../Data/titanic.csv')

In [0]:
# Features only: 'Survived' is the target and must not appear in X (target leakage)
X = titanic_df[['Sex', 'Embarked', 'Pclass', 'Age']]
y = titanic_df['Survived']

In [0]:
X

In [0]:
y

In [0]:
# Task: split the dataset into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

## Classification with LazyPredict

*Previously, we used LazyPredict for a regression problem. Since this is a classification task, we recommend reading the documentation before tackling the following task: https://lazypredict.readthedocs.io/en/latest/usage.html#classification.*

In [0]:
# Task: compare different classification models on titanic dataset with LazyClassifier

# Your code here...
from lazypredict.Supervised import LazyClassifier

clf = LazyClassifier()
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
models


# Think about how you would interpret the results

Congratulations! You've completed the study notebook on automating machine learning workflows with LazyPredict.
By automating repetitive tasks, this library enables us to iterate faster, experiment with various algorithms, and gain valuable insights from our data more efficiently.

As you continue your journey in machine learning and keep using this library, we encourage you to dive deeper into the documentation and the GitHub page.

**Resources:**
- Documentation: https://lazypredict.readthedocs.io/en/latest/
- GitHub: https://github.com/shankarpandala/lazypredict