<a href="https://colab.research.google.com/github/jcl347/JModel_Kaggle/blob/main/Loans_%7C_Autogluon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AutoGluon Tabular v1.4 Extreme Preset and Ensemble Models

This notebook uses AutoGluon Tabular v1.4 with the Extreme preset to tackle the loan default prediction task in Kaggle Playground Series S5E11.

- Competition overview:  
  https://www.kaggle.com/competitions/playground-series-s5e11/overview

- Related discussion threads for this approach:  
  - Discussion 617692 (AutoGluon / ensemble strategy, results, and tips):  
    https://www.kaggle.com/competitions/playground-series-s5e11/discussion/617692  
  - Discussion 614986 (additional approaches and ideas):  
    https://www.kaggle.com/competitions/playground-series-s5e11/discussion/614986  

---

## What AutoGluon Tabular Does

AutoGluon Tabular is an AutoML framework for structured tabular data. It:

- Automatically handles preprocessing  
  - Infers feature types (numeric, categorical, text, datetime)  
  - Deals with missing values and categorical encoding  

- Trains a portfolio of models  
  - Gradient boosted trees (LightGBM, XGBoost, CatBoost)  
  - Linear models, k-nearest neighbors, random forests  
  - Neural networks and tabular foundation models  

- Uses bagging and stacking  
  - Bagging: multiple folds or resamples per model to get robust out-of-fold predictions
  - Stacking: higher-level models learn how to combine predictions from lower-level models

- Builds a final weighted ensemble that often outperforms any single model

The goal is to replace manual model selection and blending with a strong automatically tuned ensemble.
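
To make the bagging idea concrete, here is a minimal sketch of how out-of-fold (OOF) predictions are formed. It is not AutoGluon's implementation: `MeanModel`, `k_fold_indices`, and `oof_predictions` are hypothetical names, and the "model" simply predicts the mean training label so the example stays self-contained.

```python
from statistics import mean


class MeanModel:
    """Toy stand-in for a real learner: predicts the mean training label."""

    def fit(self, y_train):
        self.value = mean(y_train)
        return self

    def predict(self, n_rows):
        return [self.value] * n_rows


def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k interleaved folds."""
    return [list(range(fold, n_rows, k)) for fold in range(k)]


def oof_predictions(y, k):
    """Return one prediction per row, made by a model that never saw that row."""
    oof = [None] * len(y)
    for held_out in k_fold_indices(len(y), k):
        held_out_set = set(held_out)
        y_train = [y[i] for i in range(len(y)) if i not in held_out_set]
        model = MeanModel().fit(y_train)
        for i, pred in zip(held_out, model.predict(len(held_out))):
            oof[i] = pred
    return oof


labels = [1, 0, 1, 1, 0, 1, 1, 1]
oof = oof_predictions(labels, k=4)
print(oof)
```

In a stacked setup, OOF predictions like these become input features for the next layer, so the higher-level model is never trained on predictions that leaked the label.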

---

## The Extreme Preset (v1.4)

The Extreme preset is the highest-accuracy mode of AutoGluon Tabular v1.4 for small and medium-sized tabular datasets. It:

- Uses meta-learned hyperparameters from large meta-benchmarks
- Trains more models and uses deeper bagging and stacking than the best_quality preset
- Adds several state-of-the-art tabular models to the ensemble:
  - Mitra
  - TabPFNv2
  - TabICL
  - RealMLP
  - TabM

For Kaggle-style problems, Extreme tries to squeeze out the best ROC AUC by combining many diverse, strong base learners.
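
ROC AUC, the metric referenced above, can be read as the probability that a randomly chosen positive row is scored above a randomly chosen negative row. The sketch below computes it directly from that definition. `roc_auc_pairwise` is a hypothetical helper for illustration only; it is quadratic in the number of rows, and the notebook itself relies on `sklearn.metrics.roc_auc_score`.

```python
def roc_auc_pairwise(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


auc = roc_auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

Because only the ordering of the scores matters, any monotonic rescaling of the predictions leaves the metric unchanged.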

---

## Key New Models in the Ensemble (v1.4)

These models are important additions in AutoGluon Tabular v1.4 and are used inside the Extreme preset.

### Mitra

- Tabular foundation model with a transformer-style architecture
- Pretrained on large amounts of synthetic tabular data
- Designed to generalize well after fine-tuning and capture rich feature interactions

### TabPFNv2

- Based on prior-data fitted networks (TabPFN)
- Acts like an in-context learning model for tabular data
- Very strong on small datasets, where traditional deep models can overfit

### TabICL

- Tabular foundation model specialized for in-context learning on larger datasets
- Treats tabular prediction as "examples plus query row produce prediction"
- Focused on classification tasks and can scale better than TabPFNv2 in row count

### RealMLP

- High-performance MLP-based tabular model
- Uses architectural and training tricks tuned on many datasets
- Often competitive with gradient boosted trees, especially when the signal is friendly to neural networks

### TabM

- Parameter-efficient ensemble of MLPs inside a single network
- Produces multiple predictions per sample internally and combines them
- Captures ensemble diversity without training many separate models

---

## Why Use This Stack For This Competition?

For this loan default prediction task:

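- Classic models in AutoGluon (gradient boosted trees, linear models, random forests) provide solid baselines
- On small and medium-sized datasets, the foundation models and advanced MLPs (Mitra, TabPFNv2, TabICL, RealMLP, TabM) can capture more complex nonlinear structure
- The Extreme preset bags and stacks its model portfolio, then weights the models based on validation ROC AUC

One caveat applies to this dataset: with 593,994 training rows, the training log below reports the data size as large (>30000 samples), so the `extreme` preset falls back to the `zeroshot` portfolio, which the log describes as identical to the `best_quality` preset. The ensemble in this run is therefore built from gradient boosted trees, random forests, and neural networks rather than the foundation models listed above.

The result is still an automatically tuned stacked ensemble that can outperform a manually tuned blend of logistic regression, LightGBM, and XGBoost, while keeping the training code relatively simple.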


In [1]:
from google.colab import files

# Upload kaggle.json from your local machine.
# Assign the result so the file contents (including the API key) are not echoed.
uploaded = files.upload()


Saving kaggle.json to kaggle.json

In [2]:
import os

# Make the .kaggle directory and move kaggle.json into it
os.makedirs('/root/.kaggle', exist_ok=True)
!mv kaggle.json /root/.kaggle/

# Set correct permissions
!chmod 600 /root/.kaggle/kaggle.json

# Install Kaggle CLI
!pip install -q kaggle
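
Echoing the return value of `files.upload()` can leave the API key in saved notebook output. As an alternative, here is a sketch that writes `kaggle.json` from values you supply and sets the permissions in Python. `write_kaggle_credentials` is a hypothetical helper, not part of the Kaggle package; where the username and key come from (environment variables, a secrets manager) is left to you.

```python
import json
import stat
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir):
    """Write kaggle.json with owner-only permissions and return its path."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # Owner read/write only, the same effect as `chmod 600`.
    path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return path
```

A call such as `write_kaggle_credentials(os.environ["KAGGLE_USERNAME"], os.environ["KAGGLE_KEY"], "/root/.kaggle")` would replace the upload, `mv`, and `chmod` steps. The Kaggle CLI can also read `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment directly, in which case no file is needed.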

In [3]:
# List competitions just to confirm it works (optional)
!kaggle competitions list | head

# Download the data for the competition
!kaggle competitions download -c playground-series-s5e11

ref                                                                                 deadline             category              reward  teamCount  userHasEntered  
----------------------------------------------------------------------------------  -------------------  ---------------  -----------  ---------  --------------  
https://www.kaggle.com/competitions/hull-tactical-market-prediction                 2025-12-15 23:59:00  Featured         100,000 Usd       2311            True  
https://www.kaggle.com/competitions/vesuvius-challenge-surface-detection            2026-02-13 23:59:00  Research         100,000 Usd        103           False  
https://www.kaggle.com/competitions/google-tunix-hackathon                          2026-01-12 23:59:00  Featured         100,000 Usd         47           False  
https://www.kaggle.com/competitions/csiro-biomass                                   2026-01-28 23:59:00  Research          75,000 Usd       1217           False  
https://www.kaggle.com

In [4]:
!unzip -o playground-series-s5e11.zip

Archive:  playground-series-s5e11.zip
  inflating: sample_submission.csv   
  inflating: test.csv                
  inflating: train.csv               


In [5]:
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
sample_submission = pd.read_csv("sample_submission.csv")

print(train.shape, test.shape)
train.head()

(593994, 13) (254569, 12)


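|   | id | annual_income | debt_to_income_ratio | credit_score | loan_amount | interest_rate | gender | marital_status | education_level | employment_status | loan_purpose | grade_subgrade | loan_paid_back |
|---|----|---------------|----------------------|--------------|-------------|---------------|--------|----------------|-----------------|-------------------|--------------|----------------|----------------|
| 0 | 0 | 29367.99 | 0.084 | 736 | 2528.42 | 13.67 | Female | Single | High School | Self-employed | Other | C3 | 1.0 |
| 1 | 1 | 22108.02 | 0.166 | 636 | 4593.1 | 12.92 | Male | Married | Master's | Employed | Debt consolidation | D3 | 0.0 |
| 2 | 2 | 49566.2 | 0.097 | 694 | 17005.15 | 9.76 | Male | Single | High School | Employed | Debt consolidation | C5 | 1.0 |
| 3 | 3 | 46858.25 | 0.065 | 533 | 4682.48 | 16.1 | Female | Single | High School | Employed | Debt consolidation | F1 | 1.0 |
| 4 | 4 | 25496.7 | 0.053 | 665 | 12184.43 | 10.21 | Male | Married | High School | Employed | Other | D1 | 1.0 |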


In [6]:
# Upgrade scikit-learn to a version that has `get_tags` (needed by AutoGluon 1.4)
!pip install -U "scikit-learn>=1.6.1,<1.7.0"

import sklearn
print("sklearn version:", sklearn.__version__)

sklearn version: 1.6.1
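
A version pin such as `scikit-learn>=1.6.1,<1.7.0` can also be checked programmatically before a long training run. A minimal sketch, assuming plain `major.minor.patch` version strings: `version_tuple`, `in_range`, and `check_installed` are hypothetical helpers, and the `packaging` library is the more robust choice for real version parsing (pre-release tags such as `rc1` are not handled here).

```python
from importlib import metadata


def version_tuple(version):
    """Convert '1.6.1' to (1, 6, 1); only handles plain numeric versions."""
    return tuple(int(part) for part in version.split(".")[:3])


def in_range(version, minimum, below):
    """True if minimum <= version < below."""
    return version_tuple(minimum) <= version_tuple(version) < version_tuple(below)


def check_installed(package, minimum, below):
    """Look up an installed package and test it against the range."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return in_range(installed, minimum, below)


print(in_range("1.6.1", "1.6.1", "1.7.0"))  # True
print(in_range("1.7.0", "1.6.1", "1.7.0"))  # False
```

In the notebook this could be used as `check_installed("scikit-learn", "1.6.1", "1.7.0")` right after the install step.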


In [7]:
# Make sure xgboost is on a version AutoGluon 1.4 plays nicely with
!pip install -q "xgboost<3.0.0"

import xgboost
print("XGBoost version:", xgboost.__version__)  # should now be 2.x

# Install AutoGluon (the full package, unpinned; it resolved to 1.4.0 in this run).
# To pin the version explicitly, use e.g. "autogluon==1.4.0".
!uv pip install autogluon

XGBoost version: 2.1.4
Using Python 3.12.12 environment at: /usr
Resolved 216 packages in 2.23s
Prepared 62 packages in 19.55s
Uninstalled 10 packages in 939ms
Installed 62 packages in 223ms
 + adagio==0.2.6
 + aiohttp-cors==0.8.1
 + autogluon==1.4.0
 + autogluon-common==1.4.0
 + autogluon-core==1.4.0
 + autogluon-features==1.4.0
 + autogluon-multimodal==1.4.0
 + autogluon-tabular==1.4.0
 + autogluon-timeseries==1.4.0
 + boto3==1.40.74
 + botocore==1.40.74

In [8]:
# ============================================================
# 0. IMPORTS
# ============================================================
import numpy as np
import pandas as pd

from sklearn.metrics import roc_auc_score
from autogluon.tabular import TabularDataset, TabularPredictor

# ============================================================
# 1. BASIC CONFIG
# ============================================================
TARGET = "loan_paid_back"
ID_COL = "id"
AUTOML_PATH = "AutogluonModels_extreme"

# If your files are in a subdirectory (e.g. "./ps5e11"), set it here.
# Otherwise leave as "." for current directory.
DATA_DIR = "."

# ============================================================
# 2. LOAD DATA (LOCAL FILES FROM KAGGLE API DOWNLOAD)
# ============================================================
train_path = f"{DATA_DIR}/train.csv"
test_path  = f"{DATA_DIR}/test.csv"

train = pd.read_csv(train_path)
test  = pd.read_csv(test_path)

# Ensure target is integer 0/1
train[TARGET] = train[TARGET].astype(int)

# ============================================================
# 3. FEATURE ENGINEERING
#    Using only:
#    annual_income, debt_to_income_ratio, credit_score,
#    loan_amount, interest_rate, gender, marital_status,
#    education_level, employment_status, loan_purpose,
#    grade_subgrade
# ============================================================

def add_features(df):
    """
    EDA-driven feature engineering using the allowed columns.
    AutoGluon will still see the original categorical columns
    and handle encoding automatically.
    """
    # 1) Ability-to-pay features
    eps = 1.0  # to avoid division by zero
    df["loan_to_income"] = df["loan_amount"] / (df["annual_income"] + eps)

    # 2) Log transforms to stabilize scale and capture diminishing returns
    df["log_annual_income"] = np.log1p(df["annual_income"])
    df["log_loan_amount"] = np.log1p(df["loan_amount"])

    # 3) Interaction between debt load and interest rate
    df["dti_x_interest"] = df["debt_to_income_ratio"] * df["interest_rate"]

    # 4) Credit score band (categorical)
    credit_bins = [0, 580, 640, 700, 760, 900]
    credit_labels = ["very_low", "low", "fair", "good", "excellent"]
    df["credit_score_band"] = pd.cut(
        df["credit_score"],
        bins=credit_bins,
        labels=credit_labels,
        include_lowest=True
    )

    return df

# Apply feature engineering to train and test
train_fe = add_features(train.copy())
test_fe  = add_features(test.copy())

# ============================================================
# 4. PREPARE DATA FOR AUTOGLUON
# ============================================================
# Drop ID column so it is not used as a feature
train_ag = train_fe.drop(columns=[ID_COL])
test_ag  = test_fe.drop(columns=[ID_COL])

# Wrap in TabularDataset (adds metadata for AutoGluon)
train_ag = TabularDataset(train_ag)
test_ag  = TabularDataset(test_ag)

# ============================================================
# 5. DEFINE PREDICTOR (EXTREME PRESET)
# ============================================================
predictor = TabularPredictor(
    label=TARGET,
    eval_metric="roc_auc",
    path=AUTOML_PATH
)

# ============================================================
# 6. FIT WITH EXTREME PRESET
# ============================================================
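# presets="extreme" is the heavy, high-accuracy preset in AutoGluon v1.4.
# It trains many model families (GBMs, neural nets and, on small or medium
# datasets, tabular foundation models) with bagging and stacking.
# Note: for this dataset (>30000 rows) the log below shows that it falls back
# to the `zeroshot` portfolio, identical to the 'best_quality' preset.
#
# time_limit is total training time in seconds.
# ag_args_fit can control GPU usage. Here we assume 1 GPU in Colab.
# Set num_gpus=0 to force CPU-only.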

predictor = predictor.fit(
    train_data=train_ag,
    presets="extreme",
    time_limit=25000,       # adjust as needed
    ag_args_fit={
        "num_gpus": 1      # set to 0 if you are on CPU-only
    }
)

# ============================================================
# 7. INSPECT MODELS AND VALIDATION PERFORMANCE
# ============================================================
lb = predictor.leaderboard(silent=True)
print(lb)

# ============================================================
# 8. OPTIONAL: OOF ROC AUC (AutoGluon out-of-fold predictions)
# ============================================================
try:
    oof_pred = predictor.predict_proba_oof(as_multiclass=False)
    oof_auc = roc_auc_score(train[TARGET], oof_pred)
    print("AutoGluon EXTREME OOF ROC AUC:", round(oof_auc, 6))
except Exception as e:
    print("Could not compute OOF predictions (not all models are bagged).")
    print("Error:", e)

# ============================================================
# 9. PREDICT ON TEST AND BUILD SUBMISSION
# ============================================================
test_pred_proba = predictor.predict_proba(test_ag, as_multiclass=False)

submission = pd.DataFrame({
    ID_COL: test[ID_COL],
    TARGET: test_pred_proba.clip(0.0, 1.0)
})

submission.to_csv("submission.csv", index=False)
submission.head()


Preset alias specified: 'extreme' maps to 'extreme_quality'.
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          12
Memory Avail:       50.70 GB / 52.96 GB (95.7%)
Disk Space Avail:   192.71 GB / 235.68 GB (81.8%)
Presets specified: ['extreme']
`extreme` preset uses a dynamic portfolio based on dataset size...
	Detected data size: large (>30000 samples), using `zeroshot` portfolio (identical to 'best_quality' preset).
Using hyperparameters preset: hyperparameters='zeroshot'
Setting dynamic_stacking from 'auto' to True. Reason: Enable dynamic_stacking when use_bag_holdout is disabled. (use_bag_holdout=False)
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=1
DyStack is enabled (dynamic_stacking=True). AutoGluon will try to determine whether the input data is affected by stacked overfitti

(_ray_fit pid=2376) [1000]	valid_set's binary_logloss: 0.250308
(_ray_fit pid=2376) [2000]	valid_set's binary_logloss: 0.250115
(_ray_fit pid=2880) 	Training S1F2 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=2880) [1000]	valid_set's binary_logloss: 0.253826
(_ray_fit pid=3250) 	Training S1F3 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=3250) [1000]	valid_set's binary_logloss: 0.254251
(_ray_fit pid=3616) 	Training S1F4 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=3616) [1000]	valid_set's binary_logloss: 0.258725
(_ray_fit pid=3967) 	Training S1F5 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=3967) [1000]	valid_set's binary_logloss: 0.253784
(_ray_fit pid=4229) 	Training S1F6 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=4229) [1000]	valid_set's binary_logloss: 0.252323
(_ray_fit pid=4524) 	Training S1F7 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=4524) [1000]	valid_set's binary_logloss: 0.25415
(_ray_fit pid=4524) [2000]	valid_set's binary_logloss: 0.25404
(_ray_fit pid=4971) 	Training S1F8 with GPU, note that this may negatively impact model quality compared to CPU training.
(_dystack pid=1522) 	0.9149	 = Validation score   (roc_auc)
(_dystack pid=1522) 	525.04s	 = Training   runtime
(_dystack pid=1522) 	73.87s	 = Validation runtime
(_dystack pid=1522) Fitting model: LightGBM_BAG_L1 ... Training model for up to 3626.16s of the 5708.42s of remaining time.
(_dystack pid=1522) 	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (1.0 workers, per: cpus=1, gpus=1, memory=0.59%)
(_ray_fit pid=5250) 	Training S1F1 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=5250) [1000]	valid_set's binary_logloss: 0.242809
(_ray_fit pid=5572) 	Training S1F2 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=5572) [1000]	valid_set's binary_logloss: 0.24572
(_ray_fit pid=5885) 	Training S1F3 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=5885) [1000]	valid_set's binary_logloss: 0.246263
(_ray_fit pid=6139) 	Training S1F4 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=6139) [1000]	valid_set's binary_logloss: 0.250601
(_ray_fit pid=6438) 	Training S1F5 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=6438) [1000]	valid_set's binary_logloss: 0.245691
(_ray_fit pid=6759) 	Training S1F6 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=6759) [1000]	valid_set's binary_logloss: 0.244085
(_ray_fit pid=7067) 	Training S1F7 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=7067) [1000]	valid_set's binary_logloss: 0.245068
(_ray_fit pid=7364) 	Training S1F8 with GPU, note that this may negatively impact model quality compared to CPU training.
(_ray_fit pid=7364) [1000]	valid_set's binary_logloss: 0.24423
(_dystack pid=1522) 	0.9215	 = Validation score   (roc_auc)
(_dystack pid=1522) 	431.09s	 = Training   runtime
(_dystack pid=1522) 	56.63s	 = Validation runtime
(_dystack pid=1522) Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 3186.21s of the 5268.47s of remaining time.
(_dystack pid=1522) 	0.9123	 = Validation score   (roc_auc)
(_dystack pid=1522) 	78.5s	 = Training   runtime
(_dystack pid=1522) 	18.18s	 = Validation runtime
(_dystack pid=1522) Fitting model: RandomForestEntr_BAG_L1 ... Training model for up to 3088.25s of the 5170.51s of remaining time.
(_dystack pid=1522) 	0.9114	 = Validation score   (roc_auc)
(_dystack pid=1522) 	111.77s	 = Training   runtime
(_dystack pid=1522) 	21.72s	 = Validation runtime
(_dystack pid=1522) Fitting model: CatBoost_BAG_L1 ... Training model for up to 2953.51s of the 5035.77s of remaining time.
(_dystack pid=1522

                          model  score_val eval_metric  pred_time_val  \
0           WeightedEnsemble_L3   0.922946     roc_auc     728.725211   
1        NeuralNetFastAI_BAG_L2   0.922809     roc_auc     624.504262   
2                XGBoost_BAG_L2   0.922742     roc_auc     621.668251   
3               LightGBM_BAG_L2   0.922686     roc_auc     624.736338   
4               CatBoost_BAG_L2   0.922616     roc_auc     620.384613   
5             LightGBMXT_BAG_L2   0.922542     roc_auc     631.300204   
6          LightGBMLarge_BAG_L2   0.922442     roc_auc     628.110886   
7           WeightedEnsemble_L2   0.922417     roc_auc     346.511343   
8          LightGBM_r131_BAG_L2   0.922403     roc_auc     622.275187   
9          LightGBM_r131_BAG_L1   0.922285     roc_auc     275.805714   
10    NeuralNetTorch_r79_BAG_L2   0.922204     roc_auc     625.928045   
11        NeuralNetTorch_BAG_L2   0.922068     roc_auc     625.646583   
12         CatBoost_r177_BAG_L2   0.921944     roc_

|   | id | loan_paid_back |
|---|--------|----------|
| 0 | 593994 | 0.934494 |
| 1 | 593995 | 0.978753 |
| 2 | 593996 | 0.506093 |
| 3 | 593997 | 0.905129 |
| 4 | 593998 | 0.962052 |
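
As a sanity check on the feature engineering in `add_features`, the sketch below reproduces two of the derived features in plain Python for single rows. `credit_band` and `loan_to_income` are hypothetical helper names; the bin edges, labels, and `eps` are copied from the cell above, and the band logic is meant to mirror `pd.cut` with right-closed bins and `include_lowest=True`.

```python
from bisect import bisect_left

CREDIT_EDGES = [580, 640, 700, 760, 900]  # upper edges of the five bins
CREDIT_LABELS = ["very_low", "low", "fair", "good", "excellent"]


def credit_band(score):
    """Return the band label, or None when the score falls outside [0, 900]."""
    if score < 0 or score > CREDIT_EDGES[-1]:
        return None  # pd.cut yields NaN for out-of-range values
    return CREDIT_LABELS[bisect_left(CREDIT_EDGES, score)]


def loan_to_income(loan_amount, annual_income, eps=1.0):
    """Loan size relative to income; eps guards against division by zero."""
    return loan_amount / (annual_income + eps)


# Values from the first training row shown earlier in the notebook.
print(credit_band(736))  # good
print(credit_band(580))  # very_low (bins are closed on the right)
print(loan_to_income(2528.42, 29367.99))
```

One consequence worth knowing: any credit score above 900 would get a missing band, which AutoGluon would then treat as a missing categorical value.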


In [9]:
# ============================================================
# 10. SUBMIT TO KAGGLE COMPETITION
# ============================================================
# Prereqs (only need to do once per runtime):
# 1) You have kaggle.json in ~/.kaggle/kaggle.json
# 2) You ran:  !chmod 600 ~/.kaggle/kaggle.json
# 3) The file "submission.csv" exists in the current working directory.

!pip install -q kaggle

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()

COMPETITION = "playground-series-s5e11"
SUBMISSION_FILE = "submission.csv"
MESSAGE = "AutoGluon extreme in Colab"

print(f"Submitting {SUBMISSION_FILE} to {COMPETITION} with message: '{MESSAGE}'")
api.competition_submit(
    file_name=SUBMISSION_FILE,
    message=MESSAGE,
    competition=COMPETITION
)
print("Submission sent.")


Submitting submission.csv to playground-series-s5e11 with message: 'AutoGluon extreme in Colab'


100%|██████████| 4.25M/4.25M [00:00<00:00, 5.82MB/s]


Submission sent.
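
Before calling `competition_submit`, it can help to validate the file locally, since a malformed upload still counts against the daily submission limit. The sketch below is one way to do that with the standard library. `validate_submission` is a hypothetical helper, and the expected header (`id`, `loan_paid_back`) is taken from the submission preview earlier in the notebook.

```python
import csv
import io
import math


def validate_submission(csv_text, expected_rows=None):
    """Return a list of problems found; an empty list means the file looks sane."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["id", "loan_paid_back"]:
        return [f"unexpected header: {reader.fieldnames}"]
    seen_ids = set()
    n_rows = 0
    for row in reader:
        n_rows += 1
        if row["id"] in seen_ids:
            problems.append(f"duplicate id {row['id']}")
        seen_ids.add(row["id"])
        try:
            p = float(row["loan_paid_back"])
        except (TypeError, ValueError):
            problems.append(f"non-numeric prediction for id {row['id']}")
            continue
        if math.isnan(p) or not 0.0 <= p <= 1.0:
            problems.append(f"prediction out of [0, 1] for id {row['id']}")
    if expected_rows is not None and n_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {n_rows}")
    return problems


sample = "id,loan_paid_back\n593994,0.934494\n593995,0.978753\n"
print(validate_submission(sample, expected_rows=2))  # []
```

For the real file, pass the contents of `submission.csv` and `expected_rows=len(test)`, which is 254,569 for this competition.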


In [16]:
fi = predictor.feature_importance(train_fe)
print(fi)

These features in provided data are not utilized by the predictor and will be ignored: ['id']
Computing feature importance via permutation shuffling for 16 features using 5000 rows with 5 shuffle sets...
	1514.66s	= Expected runtime (302.93s per shuffle set)
	819.45s	= Actual runtime (Completed 5 of 5 shuffle sets)


                      importance    stddev       p_value  n  p99_high  \
employment_status       0.191806  0.011328  1.453327e-06  5  0.215131   
debt_to_income_ratio    0.056826  0.004876  6.441856e-06  5  0.066866   
credit_score            0.049524  0.004018  5.152240e-06  5  0.057797   
dti_x_interest          0.008374  0.000524  1.828962e-06  5  0.009453   
interest_rate           0.007229  0.001160  7.678873e-05  5  0.009617   
grade_subgrade          0.006841  0.000474  2.752356e-06  5  0.007817   
loan_amount             0.005387  0.000283  9.164347e-07  5  0.005971   
loan_to_income          0.004676  0.000826  1.123628e-04  5  0.006378   
annual_income           0.004530  0.000572  2.988819e-05  5  0.005709   
log_loan_amount         0.003875  0.000240  1.750091e-06  5  0.004369   
log_annual_income       0.003308  0.000311  9.244102e-06  5  0.003948   
loan_purpose            0.002132  0.000282  3.599223e-05  5  0.002713   
education_level         0.001321  0.000354  5.62417
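
The log above says the importances are computed via permutation shuffling. The sketch below shows the basic idea on a toy problem: shuffle one feature column, re-score the model, and report the average drop in score. It is a simplified illustration, not AutoGluon's implementation; `permutation_importance`, `toy_model`, and the use of accuracy as the score are all assumptions made for the example.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, n_shuffles=5, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_shuffles):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value for this one feature.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return sum(drops) / n_shuffles


def toy_model(row):
    """Toy classifier that looks only at the 'signal' feature."""
    return row["signal"]


# Toy data: the label depends only on "signal"; "noise" is ignored by the model.
rows = [{"signal": i % 2, "noise": i % 7} for i in range(200)]
labels = [r["signal"] for r in rows]

signal_importance = permutation_importance(toy_model, rows, labels, "signal")
noise_importance = permutation_importance(toy_model, rows, labels, "noise")
print(signal_importance)
print(noise_importance)  # 0.0
```

A feature the model never uses scores exactly zero, which is why correlated engineered features (for example a raw column and its log transform) can each show a small importance even when the underlying signal matters.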

In [10]:
!zip -r AutogluonModels_extreme.zip AutogluonModels_extreme

from google.colab import files
files.download("AutogluonModels_extreme.zip")


  adding: AutogluonModels_extreme/ (stored 0%)
  adding: AutogluonModels_extreme/version.txt (stored 0%)
  adding: AutogluonModels_extreme/metadata.json (deflated 68%)
  adding: AutogluonModels_extreme/models/ (stored 0%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/ (stored 0%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F4/ (stored 0%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F4/model.pkl (deflated 52%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F4/model-internals.pkl (deflated 54%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/model.pkl (deflated 54%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F1/ (stored 0%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F1/model.pkl (deflated 52%)
  adding: AutogluonModels_extreme/models/NeuralNetFastAI_BAG_L2/S1F1/model-internals.pkl (deflated 54%)
  adding: AutogluonModels_extreme/models/NeuralNetFa


In [11]:
files.download("submission.csv")
