In [1]:
from autogluon.tabular import TabularPredictor, TabularDataset
import pandas as pd

In [2]:
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
test_data_no_label = test_data.drop(columns=['class'])

subsample_size = 500
train_data = train_data.sample(n=subsample_size, random_state=0)
train_data.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,class
6118,51,Private,39264,Some-college,10,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,>50K
23204,58,Private,51662,10th,6,Married-civ-spouse,Other-service,Wife,White,Female,0,0,8,United-States,<=50K
29590,40,Private,326310,Some-college,10,Married-civ-spouse,Craft-repair,Husband,White,Male,0,0,44,United-States,<=50K
18116,37,Private,222450,HS-grad,9,Never-married,Sales,Not-in-family,White,Male,0,2339,40,El-Salvador,<=50K
33964,62,Private,109190,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,15024,0,40,United-States,>50K


In [3]:
train_data["class"].value_counts()

class,count
<=50K,365
>50K,135


In [4]:
test_data["class"].value_counts()

class,count
<=50K,7451
>50K,2318


# Why Is AutoGluon Different?

AutoGluon emphasizes advanced ensembling and stacking rather than hyperparameter optimization alone, an approach that often yields better results.

> Unlike existing AutoML frameworks that primarily focus
on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models
and stacking them in multiple layers. Experiments
reveal that our multi-layer combination of many
models offers better use of allocated training time
than seeking out the best.

> Do not specify the hyperparameter_tune_kwargs argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budgets, as model ensembling is often superior). We recommend you only use hyperparameter_tune_kwargs if your goal is to deploy a single model rather than an ensemble. Do not specify the hyperparameters argument (allow AutoGluon to adaptively select which models/hyperparameters to use).

* For the same reason, hyperparameter optimization is not recommended in most cases.
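
To make the ensembling idea concrete, here is a minimal sketch of a weighted ensemble: the final probability is a weighted average of the base models' probabilities. The probabilities and weights below are hypothetical and set by hand; AutoGluon learns its ensemble weights from validation data, and this sketch does not reproduce that procedure.

```python
import numpy as np

# Hypothetical positive-class probabilities from three base models
# for four rows (illustrative numbers, not taken from this notebook).
base_probs = np.array([
    [0.10, 0.80, 0.40, 0.65],  # model A
    [0.20, 0.70, 0.55, 0.45],  # model B
    [0.05, 0.90, 0.35, 0.60],  # model C
])

# Hypothetical ensemble weights: non-negative and summing to 1.
weights = np.array([0.5, 0.3, 0.2])

# The ensemble prediction is the weighted average of the base predictions.
ensemble_probs = weights @ base_probs
print(ensemble_probs)
```

Stacking goes one step further: instead of a fixed weighted average, another model is trained on the base models' predictions.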

# Model Training and Prediction

* AutoGluon automatically holds out a validation set from the training data.

* It infers the problem type (here, binary classification) from the target column.

* By default, test-set predictions come from the model that scores best on the validation set. A specific model can also be selected through the `model` argument of `predict`.

* Feature importances are permutation importances. A negative importance suggests that the column may be hurting the model, so it is worth dropping that column and retraining to check.

* Feature engineering is automatic, though manual preprocessing is still possible:

## Automatic Feature Engineering

* **Numeric:** No preprocessing
* **Category:** Ordinal encoding
* **Text Features:** n-gram features (transformer models are used only when multimodal text handling is enabled)
* **Datetime:** min, max, day, month, year, etc.
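
The two most common transformations in the list above can be imitated with plain pandas. This is only a rough sketch of the idea, not AutoGluon's internal implementation; the `signup` datetime column is hypothetical (the income dataset has no datetime column).

```python
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp"],
    # Hypothetical datetime column, added only for illustration.
    "signup": pd.to_datetime(["2021-01-15", "2022-06-30", "2023-03-01", "2021-12-25"]),
})

# Ordinal encoding: each category is mapped to an integer code
# (pandas orders string categories alphabetically).
df["workclass_code"] = df["workclass"].astype("category").cat.codes

# Datetime expansion: split a timestamp into numeric parts.
df["signup_year"] = df["signup"].dt.year
df["signup_month"] = df["signup"].dt.month
df["signup_day"] = df["signup"].dt.day
df["signup_dayofweek"] = df["signup"].dt.dayofweek  # Monday = 0

print(df)
```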

**Note:** For custom preprocessing, plain pandas operations are more convenient than AutoGluon's built-in preprocessing options (in my opinion).

In [5]:
predictor = TabularPredictor(
    label="class",
    eval_metric="balanced_accuracy",
).fit(
    train_data,
    presets="medium",  # adjust depending on the task
    ag_args_fit={"num_gpus": 1},  # GPU support
    verbosity=1,
)

No path specified. Models will be saved in: "AutogluonModels/ag-20251112_103624"
Preset alias specified: 'medium' maps to 'medium_quality'.
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Default metric period is 5 because BalancedAccuracy is/are not implemented for GPU
Metric BalancedAccuracy is not implemented on GPU. Will use CPU for metric computation, this could significantly affect learning time
Metric balanced_accuracy is not supported by this model - using log_loss instead
Potential solutions:
- Use a data structure that matches the device ordinal in the booster.
- Set the device for booster before call to inplace_predict.


  return func(**kwargs)


In [6]:
y_pred = predictor.predict(test_data_no_label)
y_pred.head(2)

Unnamed: 0,class
0,<=50K
1,<=50K


In [7]:
# Predict with a specific model instead of the best one
y_pred_cat = predictor.predict(test_data_no_label, model="CatBoost")
y_pred_cat.head(2)

Unnamed: 0,class
0,>50K
1,>50K


In [8]:
y_pred_proba = predictor.predict_proba(test_data_no_label)
y_pred_proba.head(2)

Unnamed: 0,<=50K,>50K
0,0.951992,0.048008
1,0.981833,0.018167


In [9]:
results = predictor.fit_summary(show_plot=True)

*** Summary of fit() ***
Estimated performance of each model:
                  model  score_val        eval_metric  pred_time_val  fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0   WeightedEnsemble_L2   0.794267  balanced_accuracy       0.075737  2.098807                0.000649           0.058857            2       True         12
1               XGBoost   0.787418  balanced_accuracy       0.008210  1.460747                0.008210           1.460747            1       True          9
2            LightGBMXT   0.780568  balanced_accuracy       0.004757  5.348600                0.004757           5.348600            1       True          1
3              LightGBM   0.773719  balanced_accuracy       0.004077  1.029810                0.004077           1.029810            1       True          2
4         LightGBMLarge   0.755200  balanced_accuracy       0.003973  2.504804                0.003973           2.504804            1       True        

In [10]:
# The labeled test set (including the target column) is required for evaluation
predictor.leaderboard(test_data)

Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L2,0.804582,0.794267,balanced_accuracy,0.307452,0.075737,2.098807,0.003764,0.000649,0.058857,2,True,12
1,LightGBMXT,0.802185,0.780568,balanced_accuracy,0.259004,0.004757,5.3486,0.259004,0.004757,5.3486,1,True,1
2,RandomForestEntr,0.799674,0.731862,balanced_accuracy,0.296722,0.06671,0.565462,0.296722,0.06671,0.565462,1,True,4
3,XGBoost,0.799544,0.787418,balanced_accuracy,0.062553,0.00821,1.460747,0.062553,0.00821,1.460747,1,True,9
4,ExtraTreesEntr,0.798462,0.694825,balanced_accuracy,0.19117,0.067116,0.80147,0.19117,0.067116,0.80147,1,True,7
5,RandomForestGini,0.798256,0.738711,balanced_accuracy,0.241135,0.066878,0.579203,0.241135,0.066878,0.579203,1,True,3
6,ExtraTreesGini,0.797288,0.713343,balanced_accuracy,0.22596,0.056496,0.752115,0.22596,0.056496,0.752115,1,True,6
7,NeuralNetFastAI,0.789065,0.664637,balanced_accuracy,0.133759,0.010321,3.525885,0.133759,0.010321,3.525885,1,True,8
8,NeuralNetTorch,0.786928,0.720193,balanced_accuracy,0.078265,0.011044,2.780131,0.078265,0.011044,2.780131,1,True,10
9,LightGBM,0.786371,0.773719,balanced_accuracy,0.118353,0.004077,1.02981,0.118353,0.004077,1.02981,1,True,2


In [11]:
# The labeled test set (including the target column) is required for evaluation
predictor.feature_importance(test_data)

Unnamed: 0,importance,stddev,p_value,n,p99_high,p99_low
marital-status,0.083779,0.005557,2.308964e-06,5,0.095221,0.07233775
education-num,0.047794,0.004493,9.264955e-06,5,0.057045,0.03854209
capital-gain,0.040851,0.001202,8.971941e-08,5,0.043326,0.03837738
age,0.039221,0.005078,3.297172e-05,5,0.049676,0.02876596
hours-per-week,0.027026,0.008842,0.001198474,5,0.045232,0.008820893
relationship,0.010984,0.002198,0.0001826186,5,0.01551,0.006457763
sex,0.003768,0.002381,0.01201756,5,0.00867,-0.001134028
native-country,0.002485,0.001197,0.004863494,5,0.004949,1.973779e-05
occupation,0.00213,0.002183,0.04726559,5,0.006625,-0.002364715
education,0.001989,0.00096,0.00490126,5,0.003966,1.139858e-05
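
Permutation importance, the method behind the table above, can be sketched in a few lines: shuffle one column, re-score the model, and record how much the score drops. The data and the `predict` stand-in below are synthetic assumptions for illustration; AutoGluon's own implementation differs in its details. The 5 repeats mirror the `n` column of the table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label depends on column 0 only; column 1 is pure noise.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)


def predict(features):
    """Stand-in for a fitted model: thresholds column 0 and ignores column 1."""
    return (features[:, 0] > 0).astype(int)


def accuracy(y_true, y_hat):
    return float(np.mean(y_true == y_hat))


def permutation_importance(X, y, n_repeats=5):
    """Mean drop in accuracy after shuffling each column n_repeats times."""
    baseline = accuracy(y, predict(X))
    importances = []
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(float(np.mean(drops)))
    return importances


importances = permutation_importance(X, y)
print(importances)
```

Shuffling the informative column destroys most of the accuracy, while shuffling the noise column changes nothing. With a real model, a noise column can come out slightly negative by chance, which is why the table also reports a standard deviation and a p-value.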


In [12]:
predictor.transform_features(test_data).head(3)

Unnamed: 0,age,fnlwgt,education-num,sex,capital-gain,capital-loss,hours-per-week,workclass,education,marital-status,occupation,relationship,race,native-country
0,31,169085,7,0,0,0,20,3,1,1,10,5,4,14
1,17,226203,8,1,0,0,45,5,2,3,10,3,4,14
2,47,54260,11,1,0,1887,60,3,7,1,3,0,4,14


In [13]:
test_results = predictor.evaluate(test_data)
dec_thres = predictor.decision_threshold
dec_thres

0.203
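
A threshold of 0.203 instead of the usual 0.5 means that a row is labeled `>50K` at a much lower predicted probability. A minimal sketch, assuming a row becomes positive once its positive-class probability reaches the threshold (the exact comparison rule is an assumption here). The first two probabilities are copied from the `predict_proba` output shown earlier; the third is hypothetical.

```python
import pandas as pd

# Positive-class (">50K") probabilities: the first two come from the
# predict_proba output earlier in this notebook, the third is hypothetical.
proba_pos = pd.Series([0.048008, 0.018167, 0.30], name=">50K")

decision_threshold = 0.203  # value reported by predictor.decision_threshold

# Assumption: label a row positive when its probability reaches the threshold.
labels = proba_pos.ge(decision_threshold).map({True: ">50K", False: "<=50K"})
print(labels.tolist())
```

The hypothetical third row would be labeled `<=50K` under a 0.5 threshold but is labeled `>50K` here, which trades precision for recall.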

In [14]:
test_results

{'balanced_accuracy': np.float64(0.8045815635983102),
 'accuracy': 0.8014126317944519,
 'mcc': np.float64(0.5447820766162407),
 'roc_auc': np.float64(0.8902184811924534),
 'f1': 0.6595296595296596,
 'precision': 0.5559171597633136,
 'recall': 0.8106125970664366}
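
As a sanity check, the reported metrics can be tied back to a confusion matrix. The sketch below reconstructs the four counts from the test-set class counts (`test_data["class"].value_counts()`) together with the reported recall and precision, then recomputes balanced accuracy, accuracy, and F1 from those counts. Treating `>50K` as the positive class is an assumption.

```python
# Class counts from test_data["class"].value_counts()
n_neg, n_pos = 7451, 2318  # "<=50K", ">50K" (assumed positive class)

# Metrics reported by predictor.evaluate(test_data)
recall = 0.8106125970664366
precision = 0.5559171597633136

# Reconstruct the confusion matrix.
tp = round(recall * n_pos)        # true positives
fp = round(tp / precision) - tp   # false positives
fn = n_pos - tp                   # false negatives
tn = n_neg - fp                   # true negatives

# Recompute the remaining metrics from the counts.
tpr = tp / (tp + fn)              # recall on the positive class
tnr = tn / (tn + fp)              # recall on the negative class
balanced_accuracy = (tpr + tnr) / 2
accuracy = (tp + tn) / (n_pos + n_neg)
f1 = 2 * tp / (2 * tp + fp + fn)

print(tp, fp, fn, tn)
print(balanced_accuracy, accuracy, f1)
```

Balanced accuracy averages the recall of both classes, so it is not inflated by the majority class the way plain accuracy can be.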

# Manual Feature Preprocessing

If you use target encoding, split off a validation set yourself and pass it to `fit` (for example via `tuning_data`), so that the encoding is computed from the training rows only.
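
A minimal sketch of why the split has to come first: the per-category label means are computed on the training part only and then applied to the validation part. The toy data and the column names (`occupation`, `label`, `occupation_te`) are hypothetical, and this is a naive target encoding without smoothing or out-of-fold estimation.

```python
import pandas as pd

df = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Tech", "Tech", "Sales", "Tech", "Other", "Sales"],
    "label":      [1,       0,       1,      1,      0,       0,      1,       1],
})

# Split first, so validation labels never influence the encoding.
train_part = df.iloc[:6].copy()
val_part = df.iloc[6:].copy()

# Fit: mean label per category, computed on the training part only.
global_mean = train_part["label"].mean()
category_means = train_part.groupby("occupation")["label"].mean()

# Apply: categories unseen in training fall back to the global training mean.
for part in (train_part, val_part):
    part["occupation_te"] = part["occupation"].map(category_means).fillna(global_mean)

print(val_part)

# The two frames could then be passed to AutoGluon, e.g.:
# TabularPredictor(label="label").fit(train_part, tuning_data=val_part)
```

If AutoGluon were left to choose the validation rows itself from data that was already target-encoded, those rows' labels would have leaked into the encoded feature and the validation scores would look too optimistic.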

In [15]:
# Convert object (string) columns, including the label, to the pandas category dtype
cat_cols = train_data.select_dtypes(include=['object']).columns.tolist()
for cat_col in cat_cols:
    train_data[cat_col] = train_data[cat_col].astype('category')
    test_data[cat_col] = test_data[cat_col].astype('category')
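
One caveat with converting train and test separately, as above, is that each frame derives its own category set, so the same string can map to different integer codes. Whether this affects AutoGluon's internal encoder is not shown in this notebook, but it matters for any manual step that relies on the codes. A sketch with toy data and a shared `CategoricalDtype` built from the training values:

```python
import pandas as pd

train = pd.DataFrame({"workclass": ["Private", "State-gov", "Private"]})
test = pd.DataFrame({"workclass": ["State-gov", "State-gov"]})

# Independent conversion: the test frame only knows about "State-gov".
independent_codes = test["workclass"].astype("category").cat.codes.tolist()

# Shared dtype: categories are defined once, from the training values.
workclass_dtype = pd.CategoricalDtype(categories=sorted(train["workclass"].unique()))
train_codes = train["workclass"].astype(workclass_dtype).cat.codes.tolist()
shared_codes = test["workclass"].astype(workclass_dtype).cat.codes.tolist()

print(independent_codes)  # "State-gov" encoded without reference to train
print(train_codes)
print(shared_codes)       # "State-gov" encoded consistently with train
```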

In [16]:
predictor2 = TabularPredictor(
    label="class",
    eval_metric="balanced_accuracy",
).fit(
    train_data,
    presets="medium",  # adjust depending on the task
    ag_args_fit={"num_gpus": 1},  # GPU support
    verbosity=1,
)


No path specified. Models will be saved in: "AutogluonModels/ag-20251112_103703"
Preset alias specified: 'medium' maps to 'medium_quality'.
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])
Default metric period is 5 because BalancedAccuracy is/are not implemented for GPU
Metric BalancedAccuracy is not implemented on GPU. Will use CPU for metric computation, this could significantly affect learning time
Metric balanced_accuracy is not supported by this model - using log_loss instead


In [17]:
predictor2.leaderboard(test_data)

Unnamed: 0,model,score_test,score_val,eval_metric,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,WeightedEnsemble_L2,0.804582,0.794267,balanced_accuracy,0.219177,0.07882,1.671678,0.002018,0.000637,0.061933,2,True,12
1,LightGBMXT,0.802185,0.780568,balanced_accuracy,0.201315,0.005404,1.220655,0.201315,0.005404,1.220655,1,True,1
2,RandomForestEntr,0.799674,0.731862,balanced_accuracy,0.163028,0.066706,0.56677,0.163028,0.066706,0.56677,1,True,4
3,XGBoost,0.799544,0.787418,balanced_accuracy,0.046088,0.011429,1.039688,0.046088,0.011429,1.039688,1,True,9
4,ExtraTreesEntr,0.798462,0.694825,balanced_accuracy,0.192945,0.066635,0.607113,0.192945,0.066635,0.607113,1,True,7
5,RandomForestGini,0.798256,0.738711,balanced_accuracy,0.171071,0.066754,0.570057,0.171071,0.066754,0.570057,1,True,3
6,ExtraTreesGini,0.797288,0.713343,balanced_accuracy,0.192307,0.056154,0.747992,0.192307,0.056154,0.747992,1,True,6
7,NeuralNetFastAI,0.789065,0.664637,balanced_accuracy,0.130173,0.008479,0.605405,0.130173,0.008479,0.605405,1,True,8
8,NeuralNetTorch,0.786928,0.720193,balanced_accuracy,0.048687,0.014762,1.715881,0.048687,0.014762,1.715881,1,True,10
9,LightGBM,0.78594,0.773719,balanced_accuracy,0.09219,0.004505,1.150728,0.09219,0.004505,1.150728,1,True,2


In [18]:
predictor2.evaluate(test_data)

{'balanced_accuracy': np.float64(0.8045815635983102),
 'accuracy': 0.8014126317944519,
 'mcc': np.float64(0.5447820766162407),
 'roc_auc': np.float64(0.8902184811924534),
 'f1': 0.6595296595296596,
 'precision': 0.5559171597633136,
 'recall': 0.8106125970664366}