### <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing: 2px; color:#9E3F00; font-size:180%; text-align:left;padding: 0px; border-bottom: 3px solid #9E3F00">AutoGluon released v1.0.0 on Nov 30, 2023! See the site below for what's new in this version: https://auto.gluon.ai/stable/index.html</p>

### This notebook uses the latest pre-release, 1.0.1b20240101, and walks through some advanced examples of what you can do with the framework.

In [None]:
!pip install autogluon==1.0.1b20240101 -q

### We can also install the TabPFN library for later use when running on a GPU.

In [None]:
#!pip install autogluon.tabular[tabpfn]==1.0.1b20231208 -q

### We encode the categorical features and the label before the AG feature engineering step. This can also be done inside the AG engine, but for the later distillation it worked better this way here. Some extra features are derived from the Age column. Feature engineering is as important as model engineering, but we limit it to the FE below.
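### One thing to watch when one-hot encoding train and test separately: a category that is missing from one split produces different columns. Below is a minimal sketch of aligning them, assuming a hypothetical toy frame (`train_toy`, `test_toy` and their values are made up for illustration and are not the competition data).

```python
import pandas as pd

# Hypothetical toy data: the test split lacks one Drug category.
train_toy = pd.DataFrame({
    "Age": [10000, 16000, 21000],
    "Drug": ["Placebo", "D-penicillamine", "Placebo"],
})
test_toy = pd.DataFrame({
    "Age": [12000, 26000],
    "Drug": ["Placebo", "Placebo"],
})

train_enc = pd.get_dummies(train_toy, columns=["Drug"])
test_enc = pd.get_dummies(test_toy, columns=["Drug"])

# Reindex the test frame to the training columns; dummy columns that
# are absent in the test split are created and filled with 0.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(train_enc.columns))
print(list(test_enc.columns))
```

### If the real train and test sets share all categories, this step changes nothing, so it is cheap insurance rather than a required fix.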

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import pickle
from autogluon.tabular import TabularPredictor
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

train_df = pd.read_csv('/kaggle/input/playground-series-s3e26/train.csv')

# Advanced feature engineering for training data
train_df['Age_Group'] = pd.cut(train_df['Age'], bins=[9000, 15000, 20000, 25000, 30000], labels=['A', 'B', 'C', 'D'],).astype('category')
train_df['Log_Age'] = np.log1p(train_df['Age'])
scaler = MinMaxScaler()
train_df['Scaled_Age'] = scaler.fit_transform(train_df['Age'].values.reshape(-1, 1))
train_df = train_df.drop(columns=['id'])

test_df = pd.read_csv('/kaggle/input/playground-series-s3e26/test.csv')

# Advanced feature engineering for test data
test_df['Age_Group'] = pd.cut(test_df['Age'], bins=[9000, 15000, 20000, 25000, 30000], labels=['A', 'B', 'C', 'D']).astype('category')
test_df['Log_Age'] = np.log1p(test_df['Age'])
test_df['Scaled_Age'] = scaler.transform(test_df['Age'].values.reshape(-1, 1))

test_df = test_df.drop(columns=['id'])

fts_continuous = list(train_df.select_dtypes(include=['float64', 'int64']).columns)
fts_categorical = list(train_df.select_dtypes(include=['object','category']).columns)
fts_categorical.remove('Status')

train_df = pd.get_dummies(train_df,
                       columns=fts_categorical)
test_df = pd.get_dummies(test_df, 
                      columns=fts_categorical)

label_encoder = LabelEncoder()
train_df['Status'] = label_encoder.fit_transform(train_df['Status'])

In [None]:
train_df.head()

In [None]:
train_df.info()

## Preview the AG automatic feature engineering before the engine starts!

In [None]:
from autogluon.features.generators import AutoMLPipelineFeatureGenerator
auto_ml_pipeline_feature_generator = AutoMLPipelineFeatureGenerator()
auto_ml_pipeline_feature_generator.fit_transform(X=train_df.drop('Status',axis=1))

### AG has many training options. I recommend using presets='best_quality' in the fit step, which in v1.0.0 includes Zero-Shot HPO and many other SOTA components.
### Below we instead custom-tune the models: we first get the default parameters from get_hyperparameter_config, then add some of the more advanced architectures, such as FTTransformer and TabPFN, to show what other models AutoGluon Tabular includes.

In [None]:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
custom_hyperparameters = get_hyperparameter_config('default')
print(custom_hyperparameters.keys())
custom_hyperparameters['XGB']


### We can also use unlabeled data for the TabTransformer training. It is the only model in AG that uses that option, and the model needs to be added to the training below. The cell also shows how to add a custom search space for training when hyperparameter_tune_kwargs is used.
### As we are using CPU here, we skip this training for now; transformers are better run on a GPU.

In [None]:
# from autogluon.common import space
# custom_hyperparameters['TRANSF'] = {
#         "lr": space.Real(5e-5, 5e-3, default=1e-3, log=True),
#         "weight_decay": space.Real(1e-6, 5e-2, default=1e-6, log=True),
#         "p_dropout": space.Categorical(0.1, 0, 0.5),
#         "n_heads": space.Categorical(8, 4),
#         "hidden_dim": space.Categorical(128, 32, 64, 256),
#         "n_layers": space.Categorical(2, 1, 3, 4, 5),
#         "feature_dim": space.Int(8, 128, default=64),
#         "num_output_layers": space.Categorical(1, 2),
#     }
# custom_hyperparameters['TABPFN'] = {"N_ensemble_configurations": space.Categorical(2, 4, 8)}
# custom_hyperparameters['FT_TRANSFORMER'] = {}

### Let's start the training!
### I leave some options unused so you can try other settings.
### The time limit covers the training of the different solutions, and the optional HPO tuning also has to fit within that limit.
### We are excluding some default models to save tuning and training time. For the same reason we decrease the number of folds from the default 8 to 6.

### The dynamic_stacking option can also be used; it is described below. It takes some extra time, so we set it to False here.
### "Dynamic stacking is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitting and enable or disable stacking as a consequence.
### Detecting stacked overfitting by sub-fitting AutoGluon on the input data. That is, copies of AutoGluon will be sub-fit on subset(s) of the data. Then, the holdout validation data is used to detect stacked overfitting."
### Two-level stacking is the AG default at the highest-quality training setting.

### There are many options to experiment with, such as excluding some models or skipping the automatic feature engineering. I include them commented out, only as examples. We use the keep_only_best option, which removes models that are not used in the final ensemble after training.

In [None]:
label = 'Status'
eval_metric = 'log_loss'
predictor = TabularPredictor(label = label,
                             sample_weight='auto_weight',
                             eval_metric = eval_metric,
                             verbosity = 3).fit(train_df,unlabeled_data=test_df,time_limit=60*60*1,
                                                presets='best_quality',dynamic_stacking=False,
                                                ag_args_ensemble = {'fold_fitting_strategy':'sequential_local'},
                                              #  excluded_model_types=['KNN','XT','RF'],
                                              #  feature_generator=None,
                                                num_bag_folds=6,
                                                hyperparameters = custom_hyperparameters,
                                                hyperparameter_tune_kwargs='auto',
                                                num_gpus=0,num_cpus=4,
                                                keep_only_best=True)

### Now we retrain on pseudo-labels of the test data and see if the score improves; otherwise it falls back to the originally trained model.
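### To make the idea concrete, here is a conceptual sketch of pseudo-labeling: keep only the test rows where the model is confident and treat the predicted class as a label. This is not AutoGluon's internal implementation; the toy probabilities and the 0.95 threshold are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical predicted class probabilities for three test rows.
proba = pd.DataFrame({
    "C":  [0.97, 0.40, 0.02],
    "CL": [0.01, 0.35, 0.02],
    "D":  [0.02, 0.25, 0.96],
})

threshold = 0.95  # assumed confidence cut-off, not an AutoGluon default

# A row is "confident" when its highest class probability clears the threshold.
confident = proba.max(axis=1) >= threshold

# The pseudo-label is the most probable class of each confident row.
pseudo_labels = proba.idxmax(axis=1)[confident]
print(pseudo_labels)
```

### The confident rows, together with their pseudo-labels, would then be appended to the training data before refitting.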

In [None]:
predictor.fit_pseudolabel(pseudo_data=test_df,use_ensemble=True,return_pred_prob=True,
                          fit_ensemble=True,fit_ensemble_every_iter=True,keep_only_best=True,max_iter=10,
                            presets='best_quality',dynamic_stacking=True,time_limit=60*60*1,
                            ag_args_ensemble = {'fold_fitting_strategy':'sequential_local'},
                           # excluded_model_types=['KNN','XT','RF','TABPFN','GBM','NN_TORCH'],
                            hyperparameters = custom_hyperparameters,
                            num_gpus=0,num_cpus=4)

In [None]:
predictor.leaderboard()

### Now we try distillation 
### As described on the AG site:
### "Distill AutoGluon’s most accurate ensemble-predictor into single models which are simpler/faster and require less memory/compute. Distillation can produce a model that is more accurate than the same model fit directly on the original training data."
### We also try using the test set as extra augmentation data.
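### As a rough illustration of the idea, the sketch below distills a "teacher" into a smaller "student" by training the student on the teacher's predicted probabilities over augmented data. Everything here is an assumption for demonstration: the scikit-learn models, the synthetic data, and the Gaussian jitter used for augmentation are stand-ins and not what AutoGluon does internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

# Synthetic 3-class problem standing in for the real data.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Teacher: a larger model fit on the hard labels.
teacher = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Augmentation (assumed scheme): original rows plus jittered copies.
rng = np.random.default_rng(0)
X_aug = np.vstack([X, X + rng.normal(scale=0.1, size=X.shape)])

# Soft targets: the teacher's class probabilities on the augmented rows.
soft_targets = teacher.predict_proba(X_aug)

# Student: a single small tree regressing the probability vectors.
student = DecisionTreeRegressor(max_depth=5, random_state=0)
student.fit(X_aug, soft_targets)

student_proba = student.predict(X)
print(student_proba[:3])
```

### Each leaf of the student averages probability vectors that sum to 1, so its outputs remain valid class probabilities.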

In [None]:
## First save the names of the best trained models for later use.
predictor_best_model = predictor.leaderboard()['model'][1]
predictor_best_ensemble = predictor.leaderboard()['model'][0]

In [None]:
predictor.distill(
                hyperparameters = {'RF':{},'CAT':{}},#'GBM':{},'NN_TORCH':{},'RF':{},'CAT':{}
                augment_args = {'size_factor':5, 'max_size': int(1e4)})

In [None]:
predictor.leaderboard()

### Set the distilled model ensemble as the primary model and load it into memory for faster handling ("persist" mode).

In [None]:
#model_to_deploy = distillmodel.leaderboard()['model'][1]
predictor.set_model_best('WeightedEnsemble_L2_DSTL')
predictor.model_best
predictor.persist()

### Now let us do an extra training run with only a few SOTA models, showing how AutoGluon can also be used as a fast HPO tuning engine. The cells below also show how to pick a single Zero-Shot HPO model from the framework. Let's start with that and pick the CatBoost ZS-HPO configuration.

In [None]:
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config
custom_hyperparameters = get_hyperparameter_config('zeroshot')
print(custom_hyperparameters.keys())

### We turn off dynamic_stacking and the stacking levels to save some time. Set the time limit based on your

In [None]:
label = 'Status'
eval_metric = 'log_loss'
predictor2 = TabularPredictor(label = label,
                             eval_metric = eval_metric,
                             verbosity = 3).fit(train_df,
                                                presets='best_quality',
                                                dynamic_stacking=False,
                                                num_stack_levels=0,
                                                num_bag_folds=6,
                                                hyperparameters = {'CAT':custom_hyperparameters['CAT']},
                                                time_limit=60*60*1,
                                                num_gpus=0,num_cpus=4,
                                                keep_only_best=True)

In [None]:
predictor2.leaderboard()

### Now we train another solution, custom-tuning the parameters of selected models on the dataset. We select three SOTA architectures for it.

In [None]:
label = 'Status'
eval_metric = 'log_loss'
predictor3 = TabularPredictor(label = label,
                             eval_metric = eval_metric,
                             verbosity = 3).fit(train_df,
                                                presets='best_quality',
                                                dynamic_stacking=False,
                                                hyperparameter_tune_kwargs='auto',
                                                time_limit=60*60*1,
                                                num_bag_folds=6,
                                                hyperparameters = {'CAT':{},'XGB':{},'GBM':{}},
                                                num_gpus=0,num_cpus=4,
                                                keep_only_best=True)

In [None]:
predictor3.leaderboard()

### Now to the extra stacking part, which you often see in top solutions in many benchmarks. AG does this in the background, but here we also build our own custom stacking.
### We take the test predictions and the training out-of-fold (OOF) predictions and add them to the original dataset as extra features.
### For this we use the best single model from each solution rather than the ensembles, which can reduce the chance of overfitting. Please try the ensembles as well; experimenting and studying the results is the way.

### Because distillation has no OOF predictions (it uses the data for augmentation), we do not use the distilled model for stacking but keep it for the later ensemble.
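### For readers new to OOF stacking, here is a minimal sketch using scikit-learn instead of AutoGluon. The synthetic data and the logistic regression base model are assumptions for illustration; the point is that every training row gets a prediction from a model that never saw it, so the new features do not leak the label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic 3-class problem standing in for the real data.
X, y = make_classification(n_samples=120, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# Out-of-fold probabilities: with cv=6, each row is predicted by a model
# trained on the other five folds.
oof_proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=6, method="predict_proba")

# Append the OOF probabilities to the original features for a meta model.
X_stacked = np.hstack([X, oof_proba])
print(X_stacked.shape)
```

### For the test set, the matching features come from ordinary predictions of the fitted base model, as done with predict_proba in the cells below.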

In [None]:
## first predict using the distillation model.
bmodel = predictor.model_best
print(bmodel)
pred_distill = predictor.predict_proba(model=bmodel,data=test_df,as_pandas=False)
pred_distill

In [None]:
train_df_stacking = train_df.copy()
test_df_stacking = test_df.copy()

### Pick the top 5 features from training for the new dataset. You can try all features or only some; this is a demo of what you can do.
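### The selection itself is just a slice of a table indexed by feature name. The sketch below uses a hypothetical importance table (the numbers are made up) shaped like a frame with one row per feature and an importance column.

```python
import pandas as pd

# Hypothetical importance scores, one row per feature.
fi = pd.DataFrame(
    {"importance": [0.05, 0.12, 0.004, 0.08, -0.001, 0.01]},
    index=["Age", "Bilirubin", "Copper", "N_Days", "Sex_M", "Prothrombin"],
)

# Sort explicitly so the slice does not rely on the incoming row order,
# then keep the names of the five most important features.
top_features = list(fi.sort_values("importance", ascending=False).index[:5])
print(top_features)
```

### Features with importance near zero or negative, like the last one here, are the natural candidates to drop.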

In [None]:
feature_importance = predictor3.feature_importance(train_df)

In [None]:
coltouse = list(feature_importance.index[:5])  # names of the five most important features
train_df_stacking = train_df_stacking[coltouse+[label]]
test_df_stacking = test_df_stacking[coltouse]

### Take the best model from each trained AG solution and create the new features from its test-set predictions and training-set OOF predictions.
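### The new column names rely on the order of the encoded classes. A small sketch of how LabelEncoder orders them is shown below; it assumes the probability columns come back in encoded-label order (0, 1, 2), which is worth verifying on your own predictor output.

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder sorts the class names, so the codes follow alphabetical order.
le = LabelEncoder()
encoded = le.fit_transform(["D", "C", "CL", "C"])

# Build the stacking column names in the same order as the encoded classes.
proba_columns = [f"ag_1_Status_{cls}" for cls in le.classes_]

print(list(le.classes_))
print(list(encoded))
print(proba_columns)
```

### Deriving the names from classes_ instead of typing them by hand keeps the columns correct if the label set ever changes.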

In [None]:
#model_to_deploy = predictor.leaderboard()['model'][1]
predictor.set_model_best(predictor_best_model)
bmodel = predictor.model_best
print(bmodel)
train_df_stacking[['ag_1_Status_C','ag_1_Status_CL','ag_1_Status_D']] = predictor.get_oof_pred_proba().values
test_df_stacking[['ag_1_Status_C','ag_1_Status_CL','ag_1_Status_D']] = predictor.predict_proba(model=bmodel,data=test_df,as_pandas=False)

In [None]:
model_to_deploy = predictor2.leaderboard()['model'][1]
predictor2.set_model_best(model_to_deploy)
bmodel = predictor2.model_best
print(bmodel)
train_df_stacking[['ag_2_Status_C','ag_2_Status_CL','ag_2_Status_D']] = predictor2.get_oof_pred_proba().values
test_df_stacking[['ag_2_Status_C','ag_2_Status_CL','ag_2_Status_D']] = predictor2.predict_proba(model=bmodel,data=test_df,as_pandas=False)

In [None]:
model_to_deploy = predictor3.leaderboard()['model'][1]
predictor3.set_model_best(model_to_deploy)
bmodel = predictor3.model_best
print(bmodel)
train_df_stacking[['ag_3_Status_C','ag_3_Status_CL','ag_3_Status_D']] = predictor3.get_oof_pred_proba().values
test_df_stacking[['ag_3_Status_C','ag_3_Status_CL','ag_3_Status_D']] = predictor3.predict_proba(model=bmodel,data=test_df,as_pandas=False)

In [None]:
test_df_stacking

### Now we will train a final stacked solution with a tuned XGB as meta model.

In [None]:
custom_hyperparameters = get_hyperparameter_config('default')
custom_hyperparameters['TABPFN'] = {}
label = 'Status'
eval_metric = 'log_loss'
predictor_final = TabularPredictor(label = label,
                             eval_metric = eval_metric,
                             verbosity = 3).fit(train_df_stacking,
                                                presets='best_quality',
                                                dynamic_stacking=False,
                                                num_stack_levels=0,
                                                hyperparameter_tune_kwargs='auto',
                                                hyperparameters = {'XGB':custom_hyperparameters['XGB']},
                                                time_limit=60*60*1,
                                                num_bag_folds=6,
                                                num_gpus=0,num_cpus=4,
                                                keep_only_best=True)

In [None]:
predictor_final.leaderboard()

In [None]:
final_pred = predictor_final.predict_proba(test_df_stacking)
final_pred

In [None]:
sub = pd.read_csv("/kaggle/input/playground-series-s3e26/sample_submission.csv")

### Finally, we ensemble all the AG solutions trained in this notebook.
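### A quick sanity check for a weighted blend is that the effective weights sum to 1, so the blended rows are still valid probabilities. The sketch below uses random toy probabilities (an assumption, in place of the real predictions) with the same nested weights as the blend in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_proba(n_rows=4, n_classes=3):
    """Random rows normalized to sum to 1, standing in for real predictions."""
    p = rng.random((n_rows, n_classes))
    return p / p.sum(axis=1, keepdims=True)

final_pred, pred_distill, pred1, pred2, pred3 = (toy_proba() for _ in range(5))

# Effective weights after expanding (final*.6 + distill*.4)*.5 + ...
weights = {
    "final": 0.6 * 0.5,
    "distill": 0.4 * 0.5,
    "pred1": 0.2,
    "pred2": 0.2,
    "pred3": 0.1,
}

blend = (weights["final"] * final_pred
         + weights["distill"] * pred_distill
         + weights["pred1"] * pred1
         + weights["pred2"] * pred2
         + weights["pred3"] * pred3)

print(sum(weights.values()))
print(blend.sum(axis=1))
```

### If you change any single weight, adjust the others so the total stays at 1; otherwise the submitted probabilities no longer sum to 1 per row.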

In [None]:
## First predict using the best ensembles for every solution.
predictor.set_model_best(predictor_best_ensemble)
bmodel = predictor.model_best
print(bmodel)
pred1 = predictor.predict_proba(model=bmodel,data=test_df)

model_to_deploy = predictor2.leaderboard()['model'][0]
predictor2.set_model_best(model_to_deploy)
bmodel = predictor2.model_best
print(bmodel)
pred2 = predictor2.predict_proba(model=bmodel,data=test_df)

model_to_deploy = predictor3.leaderboard()['model'][0]
predictor3.set_model_best(model_to_deploy)
bmodel = predictor3.model_best
print(bmodel)
pred3 = predictor3.predict_proba(model=bmodel,data=test_df)

In [None]:
## remove all the trained models, as we only need the predictions for now.
!rm -r *

In [None]:
## Ensemble all the predictions.
sub.iloc[:,1:] = (final_pred.values*.6 +  pred_distill*.4)*.5 + pred1.values*.2 + pred2.values*.2 + pred3.values*.1
sub.to_csv('submission.csv', index=False)
sub

## That's it!