### AutoML — Let Machine Learning Give Your Model Selection a Jump-Start
##### Leveraging AutoML to increase productivity

### What Is AutoML?
Automated machine learning, or AutoML, is the process of automating the ML workflow: data cleaning, model selection, training, hyperparameter optimization, and sometimes even model deployment. AutoML was initially developed to make ML more accessible to non-technical users, and over time it has evolved into a reliable productivity tool even for experienced ML practitioners.
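To make the idea concrete, here is a minimal sketch of the model-selection step that AutoML tools automate: fit several candidate models, score each one on held-out data, and keep the best. Everything in it (the toy data, the two candidate models, and the use of mean absolute error as the score) is an assumption made for illustration; it is not how AutoGluon or any other library is implemented.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_constant(x, y):
    """Baseline candidate: always predict the mean of the training targets."""
    avg = mean(y)
    return lambda xs: [avg for _ in xs]


def fit_line(x, y):
    """Candidate: ordinary least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return lambda xs: [slope * v + intercept for v in xs]


# Toy data (hypothetical): y is roughly 2 * x.
x_train, y_train = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_val, y_val = [7, 8], [14.1, 15.9]

candidates = {"constant": fit_constant, "line": fit_line}

# Fit every candidate on the training data and score it on the validation data.
scores = {}
for name, fit in candidates.items():
    predict = fit(x_train, y_train)
    scores[name] = mean_absolute_error(y_val, predict(x_val))

# Lower error is better, so the selected model has the smallest score.
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

A real AutoML library layers much more on top of this loop (preprocessing, hyperparameter search, ensembling), but the select-by-validation-score core is the same.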

Now that we understand what AutoML is, let’s see it in action.

#### Implementation
We will first walk through a quick implementation of AutoML using AutoGluon, and then compare the results to a manually developed model.

If this is your first time using AutoGluon, you may need to install it in your environment.

pip3 install -U pip

pip3 install -U setuptools wheel

pip3 install torch==1.12.1+cpu torchvision==0.13.1+cpu torchtext==0.13.1 -f https://download.pytorch.org/whl/cpu/torch_stable.html

pip3 install autogluon
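If you are unsure whether AutoGluon is already present, a quick check along the following lines may help. This is only a sketch: `installed_version` is a hypothetical helper written for this article, built on the standard library's `importlib.metadata` (Python 3.8+), and it assumes the package was installed under the distribution name `autogluon`, as in the pip command above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("autogluon")
if version is None:
    print("autogluon is not installed; run the pip commands above first.")
else:
    print(f"autogluon {version} is installed.")
```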


##### Now that AutoGluon is ready to use, let’s import the libraries that we will be using.

Follow along with the comments in the code to better understand what AutoML is all about.

In [1]:
# Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from autogluon.tabular import TabularDataset, TabularPredictor

# Show all columns/rows of the dataframe
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

In [2]:
# Load the data into a dataframe
df = pd.read_csv('auto-cleaned.csv')

In [4]:
df.head(2)

Unnamed: 0,symboling,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,width,height,curb-weight,engine-type,num-of-cylinders,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price
0,3,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495
1,3,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500


In [8]:
# Split the data into train and test set
df_train, df_test = train_test_split(df, test_size=0.3, random_state=1234)

print(f"Data includes {df.shape[0]} rows (and {df.shape[1]} columns), broken down into {df_train.shape[0]} rows for training and the balance {df_test.shape[0]} rows for testing.")

Data includes 193 rows (and 25 columns), broken down into 135 rows for training and the balance 58 rows for testing.
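The 135/58 split is not exactly 70/30 because 30% of 193 rows is not a whole number. The short sketch below reproduces the arithmetic under the assumption that, for a fractional `test_size`, scikit-learn rounds the test count up and gives the remaining rows to the training set. `split_sizes` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever rows remain.
    """
    n_test = math.ceil(test_size * n_rows)
    n_train = n_rows - n_test
    return n_train, n_test


n_train, n_test = split_sizes(n_rows=193, test_size=0.3)
print(n_train, n_test)  # 135 58
```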


In [11]:
df.head()

Unnamed: 0,symboling,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,width,height,curb-weight,engine-type,num-of-cylinders,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price
0,3,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495
1,3,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500
2,1,alfa-romero,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500
3,2,audi,gas,std,four,sedan,fwd,front,99.8,176.6,66.2,54.3,2337,ohc,four,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950
4,2,audi,gas,std,four,sedan,4wd,front,99.4,176.6,66.4,54.3,2824,ohc,five,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450


In [10]:
# Run AutoGluon

# Create a dictionary of hyperparameters for the models to be included
hyperparameters_dict = {
    'GBM':{}, 
    'CAT':{},
    'XGB':{},
    'RF':{}, 
    'XT':{}, 
    'KNN':{},
    'LR':{},
    }

# 1. Fit/train the models
autogluon_predictor = TabularPredictor(label="price").fit(train_data=df_train, presets='best_quality', hyperparameters=hyperparameters_dict)

# 2. Create predictions
predictions = autogluon_predictor.predict(df_test)

# 3. Create the leaderboard
autogluon_predictor.leaderboard(silent=True)

No path specified. Models will be saved in: "AutogluonModels/ag-20230214_103557/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=0, num_bag_folds=5, num_bag_sets=1
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230214_103557/"
AutoGluon Version:  0.6.3b20230214
Python Version:     3.10.9
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Debian 6.0.12-1kali1 (2022-12-19)
Train Data Rows:    135
Train Data Columns: 24
Label Column: price
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
	Label info (max, min, mean, stddev): (41315, 5118, 12979.57037, 8081.9961)
	If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regres

| | model | score_val | pred_time_val | fit_time | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|
| 0 | WeightedEnsemble_L2 | -2142.505043 | 0.450482 | 28.385936 | 0.000599 | 0.313455 | 2 | True | 8 |
| 1 | CatBoost_BAG_L1 | -2321.737536 | 0.053872 | 22.810377 | 0.053872 | 22.810377 | 1 | True | 4 |
| 2 | RandomForest_BAG_L1 | -2331.651089 | 0.134075 | 1.151259 | 0.134075 | 1.151259 | 1 | True | 3 |
| 3 | ExtraTrees_BAG_L1 | -2391.51825 | 0.125949 | 0.711177 | 0.125949 | 0.711177 | 1 | True | 5 |
| 4 | XGBoost_BAG_L1 | -2623.307136 | 0.049394 | 4.555348 | 0.049394 | 4.555348 | 1 | True | 6 |
| 5 | LinearModel_BAG_L1 | -2660.735162 | 0.135987 | 3.399668 | 0.135987 | 3.399668 | 1 | True | 7 |
| 6 | LightGBM_BAG_L1 | -3347.538296 | 0.055285 | 5.970834 | 0.055285 | 5.970834 | 1 | True | 2 |
| 7 | KNeighbors_BAG_L1 | -3535.041665 | 0.053565 | 0.01312 | 0.053565 | 0.01312 | 1 | True | 1 |


#### Leaderboard

In the final results, the “model” column lists the models that were trained. There are eight of them (row numbers range from 0 to 7): the seven models we included in our hyperparameter dictionary, plus “WeightedEnsemble_L2”, a weighted ensemble that AutoGluon builds on top of them. The “score_val” column is the validation Root Mean Squared Error (RMSE) multiplied by -1; AutoGluon flips the sign so that a higher score is always better. Models are ranked from best at the top of the table to worst at the bottom. In other words, “WeightedEnsemble_L2” is the best model in this exercise, with a validation RMSE of about 2,142.5.
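The sign convention is easier to see with numbers. The sketch below computes a negated RMSE by hand for two hypothetical sets of predictions and then picks the winner with `max`, which is what "higher is better" amounts to. The `negated_rmse` helper and the toy prices are assumptions for illustration; they are not AutoGluon code or values from the dataset.

```python
import math


def negated_rmse(y_true, y_pred):
    """Return -RMSE, so that a larger (less negative) value means a better fit."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)


# Hypothetical prices and two sets of predictions, for illustration only.
actual = [10000, 15000, 20000]
close_preds = [11000, 14000, 21000]  # off by 1,000 each time
far_preds = [13000, 12000, 23000]    # off by 3,000 each time

scores = {
    "close": negated_rmse(actual, close_preds),
    "far": negated_rmse(actual, far_preds),
}
print(scores)  # {'close': -1000.0, 'far': -3000.0}

# Because the sign is flipped, the best model is simply the one with the maximum score.
best = max(scores, key=scores.get)
print(best)  # close
```

With the objects from the walkthrough, the same function could be applied to `df_test["price"]` and `predictions` to obtain a score on the held-out test set, which the leaderboard above (built from validation data) does not report.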

