# **UGain ML Tree-based Models auto-sklearn**

This notebook illustrates auto-sklearn, a toolkit that automates algorithm selection and hyperparameter tuning.

**1. Import Libraries**

First, we import several libraries. Auto-sklearn is not preinstalled on Google Colab, so we install it with pip.

In [None]:
!pip install auto-sklearn

In [None]:
# Auto-Sklearn

# Basic libraries
import pandas as pd
import numpy as np
import warnings
warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", RuntimeWarning)

# Sklearn
## Data
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

## Models
from sklearn import tree
from sklearn import ensemble
from sklearn.model_selection import GridSearchCV

## Model Explanation
from sklearn.inspection import permutation_importance

## Metrics
from sklearn.metrics import mean_squared_error

# XGBoost
import xgboost

# Plotting
import graphviz
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display

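# Auto-Sklearn
# Retry the import once if the first attempt fails (a workaround sometimes needed on Colab)
try:
  import autosklearn.regression
  import autosklearn.metrics
except Exception:
  import autosklearn.regression
  import autosklearn.metrics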

Next, we load the dataset and split it into training and test sets. Fixing the random seed ensures we always get the same split.

In [None]:
# Load dataset
diabetes_data = load_diabetes()
predictors = diabetes_data['data']
labels = diabetes_data['target']

# Parameters
seed = 0

# Train - Test Split
X_train, X_test, y_train, y_test = train_test_split(predictors, 
                                                    labels, 
                                                    random_state=seed)
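
The effect of the seed can be checked directly. The sketch below is a minimal illustration, assuming scikit-learn's default `test_size` of 0.25 (the cell above does not set it); the variable names `split_a`, `split_b` and `split_c` are ours. It splits the diabetes data twice with the same `random_state` and once with a different one, then compares the results.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Same dataset as in the notebook: 442 samples, 10 features
X, y = load_diabetes(return_X_y=True)

# Two splits with the same seed and one with a different seed
split_a = train_test_split(X, y, random_state=0)
split_b = train_test_split(X, y, random_state=0)
split_c = train_test_split(X, y, random_state=1)

# Compare the four returned arrays (X_train, X_test, y_train, y_test) pairwise
same_seed_identical = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
different_seed_identical = all(np.array_equal(a, c) for a, c in zip(split_a, split_c))

X_tr, X_te, y_tr, y_te = split_a
print("Train shape:", X_tr.shape, "Test shape:", X_te.shape)
print("Same seed gives identical split:", same_seed_identical)
print("Different seed gives identical split:", different_seed_identical)
```

With the default settings, roughly three quarters of the rows end up in the training set and the rest in the test set.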

**2. Use AutoML to fit the best model**

Now we are ready to fit the model. This happens in 4 steps:
- Define the model
- Fit to training data
- Calculate the RMSE on the training and test sets and print both
- Print out the final models ('leaderboard')

First, we define and fit the model.

In [None]:
# Define the AutoML regression model from Auto-Sklearn:
# a total search budget of 120 seconds, optimizing the root mean squared error
automl = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=120,
                                                     metric=autosklearn.metrics.root_mean_squared_error)

# Fit the AutoML regression model on the training data
automl.fit(X_train, y_train)
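
To get a feel for what "algorithm selection and hyperparameter tuning" means, the sketch below does a miniature version by hand with plain scikit-learn. It is not how auto-sklearn works internally (auto-sklearn searches a much larger space with Bayesian optimization and combines models into an ensemble). The candidate pool, the grids and the names `candidates`, `results` and `best_name` are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical candidate pool: two algorithms, each with a small hyperparameter grid
candidates = {
    "decision_tree": (DecisionTreeRegressor(random_state=0),
                      {"max_depth": [2, 4, 8]}),
    "random_forest": (RandomForestRegressor(n_estimators=50, random_state=0),
                      {"max_depth": [2, 4, None]}),
}

# Hyperparameter tuning: cross-validated grid search per algorithm
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid,
                          scoring="neg_root_mean_squared_error", cv=3)
    search.fit(X_train, y_train)
    # The scorer returns negative RMSE, so flip the sign
    results[name] = (-search.best_score_, search.best_params_)

# Algorithm selection: keep the candidate with the lowest cross-validated RMSE
best_name = min(results, key=lambda n: results[n][0])
for name, (cv_rmse, params) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(f"{name}: CV RMSE = {cv_rmse:.2f}, best params = {params}")
print("Selected:", best_name)
```

AutoML tools take over exactly this loop: choosing the candidates, the search ranges and the search strategy within the given time budget.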

Next, we define a simple function that calculates the RMSE and use it to compute the training and test RMSE.

In [None]:
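# Simple function to calculate the RMSE of a fitted model on a dataset
def get_rmse(model, predictors, labels):
  predictions = model.predict(predictors)
  # Take the square root of the MSE explicitly; the `squared` argument
  # of mean_squared_error is not available in all scikit-learn versions
  rmse = np.sqrt(mean_squared_error(labels, predictions))
  return rmse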

# Calculate root mean square error of the train and test sets
train_rmse = get_rmse(automl, X_train, y_train)
test_rmse = get_rmse(automl, X_test, y_test)

# Report both errors with 4 decimals
print(f"Train set root mean squared error is: {train_rmse:.4f} "
      f"and test set root mean squared error is: {test_rmse:.4f}")
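
As a reminder of what the metric measures: the RMSE is the square root of the mean of the squared prediction errors, so it is expressed in the same units as the target. The sketch below computes it by hand in pure Python on made-up toy values; the helper name `rmse` and the numbers are illustrative and not taken from the diabetes data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values (hypothetical): the errors are -1, -2, 2 and 0,
# so the squared errors 1, 4, 4 and 0 average to 9 / 4
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 1.0, 4.0]
toy_rmse = rmse(y_true, y_pred)
print(toy_rmse)  # 1.5
```

A training RMSE far below the test RMSE is a typical sign of overfitting, which is why both values are reported.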

Finally, we take a look at the leaderboard of the final models.

In [None]:
# Print the leaderboard of the final models found by AutoML
print(automl.leaderboard())