# Introduction to AutoGluon

AutoGluon is an open-source AutoML library that automates model selection, training, and ensembling. It is particularly strong on tabular data, where it can train high-quality models with only a few lines of code.

**Key features of AutoGluon:**
- **AutoML for Tabular Data**: AutoGluon automatically trains a range of model families (for example gradient-boosted trees such as LightGBM, CatBoost, and XGBoost, random forests, and neural networks) and compares them on validation data to find the best-performing model for your dataset.
- **Ensemble Methods**: AutoGluon combines models through bagging, multi-layer stacking, and a final weighted ensemble to boost prediction accuracy.
- **Easy-to-Use API**: With only a few lines of code, you can build powerful machine learning models.
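- **Hyperparameter Optimization**: AutoGluon ships pre-tuned default hyperparameters for each model family and can optionally run a hyperparameter search (the `hyperparameter_tune_kwargs` argument of `fit()`).
- **Supports Multiple Task Types**: `TabularPredictor` handles binary classification, multiclass classification, and regression (inferred from the label column or set via `problem_type`); the separate `autogluon.timeseries` and `autogluon.multimodal` packages cover forecasting and text/image data.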
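As far as we understand, AutoGluon's final `WeightedEnsemble` models are fit with greedy ensemble selection in the style of Caruana et al. (2004): in each round, add (with replacement) whichever base model most improves the validation loss of the averaged ensemble; a model's weight is the fraction of rounds in which it was picked. The weights printed in the training log further down (e.g. 0.36/0.32/0.28/0.04, i.e. 9/25, 8/25, 7/25, 1/25) fit that pattern. The sketch below is a library-free illustration under that assumption, not AutoGluon's code; it scores ensembles with mean squared error for simplicity, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (lower is better); stands in for the real validation metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble_selection(
    preds: Dict[str, List[float]],
    y_true: Sequence[float],
    n_rounds: int = 25,
) -> Dict[str, float]:
    """Hypothetical sketch of greedy ensemble selection with replacement.

    `preds` maps a model name to its validation predictions. Each round adds the
    model whose inclusion gives the lowest loss for the averaged ensemble.
    A model's weight is (times picked) / n_rounds.
    """
    counts = {name: 0 for name in preds}
    running_sum = [0.0] * len(y_true)  # element-wise sum of the picks so far
    for step in range(1, n_rounds + 1):
        best_name, best_loss = None, float("inf")
        for name, p in preds.items():
            # Loss of the ensemble average if `name` were picked in this round.
            candidate = [(s + q) / step for s, q in zip(running_sum, p)]
            loss = mse(y_true, candidate)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + q for s, q in zip(running_sum, preds[best_name])]
    return {name: c / n_rounds for name, c in counts.items() if c > 0}


# Toy validation labels and predictions from three hypothetical models.
y = [0, 0, 1, 1]
preds = {
    "model_a": [0.1, 0.2, 0.7, 0.8],
    "model_b": [0.3, 0.1, 0.9, 0.6],
    "model_c": [0.5, 0.5, 0.5, 0.5],  # uninformative constant model
}
weights = greedy_ensemble_selection(preds, y)
print(weights)
```

Because models are drawn with replacement, a strong model can be picked many times, which is how uneven weights such as 9/25 versus 1/25 arise, while a useless model (here `model_c`) never enters the ensemble.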

AutoGluon is an excellent choice for users who want to quickly build predictive models without needing to fine-tune machine learning algorithms manually.


In [2]:
# Install AutoGluon library for tabular data prediction
!pip install autogluon

Collecting autogluon
  Downloading autogluon-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.core==1.1.1 (from autogluon.core[all]==1.1.1->autogluon)
  Downloading autogluon.core-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.features==1.1.1 (from autogluon)
  Downloading autogluon.features-1.1.1-py3-none-any.whl.metadata (11 kB)
Collecting autogluon.tabular==1.1.1 (from autogluon.tabular[all]==1.1.1->autogluon)
  Downloading autogluon.tabular-1.1.1-py3-none-any.whl.metadata (13 kB)
Collecting autogluon.multimodal==1.1.1 (from autogluon)
  Downloading autogluon.multimodal-1.1.1-py3-none-any.whl.metadata (12 kB)
Collecting autogluon.timeseries==1.1.1 (from autogluon.timeseries[all]==1.1.1->autogluon)
  Downloading autogluon.timeseries-1.1.1-py3-none-any.whl.metadata (12 kB)
Collecting scipy<1.13,>=1.5.4 (from autogluon.core==1.1.1->autogluon.core[all]==1.1.1->autogluon)
  Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metad

In [1]:
# Install the Kaggle package to enable Kaggle API functionality
!pip install kaggle



In [2]:
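# Upload the kaggle.json file to Colab.
# Assign the result so the notebook does not echo the file contents (which include the API key).
from google.colab import files
uploaded = files.upload()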

Saving kaggle.json to kaggle.json



In [3]:
# Create the Kaggle directory if it doesn't exist
!mkdir -p ~/.kaggle

# Move the kaggle.json file to this directory
!mv kaggle.json ~/.kaggle/

# Set the required permissions for the file
!chmod 600 ~/.kaggle/kaggle.json
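The three shell commands above can also be written in plain Python, which is handy outside Colab. This is a minimal sketch assuming `kaggle.json` has already been uploaded to the working directory; the helper name `install_kaggle_credentials`, its `config_dir` parameter, and the placeholder credentials in the demo are hypothetical.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def install_kaggle_credentials(src: str = "kaggle.json", config_dir: str = "~/.kaggle") -> Path:
    """Move an uploaded kaggle.json into the config directory with owner-only permissions.

    Mirrors: mkdir -p ~/.kaggle && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
    """
    target_dir = Path(config_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)      # mkdir -p
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))                 # mv
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)      # chmod 600 (read/write for owner only)
    return target


# Demo in a throwaway directory so the real home directory is not touched.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "kaggle.json"
    fake.write_text('{"username": "example", "key": "example"}')  # placeholder credentials
    installed = install_kaggle_credentials(str(fake), str(Path(tmp) / ".kaggle"))
    mode = stat.S_IMODE(installed.stat().st_mode)
    moved = not fake.exists()

print(oct(mode))
```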

In [4]:
# Download the IEEE-CIS Fraud Detection dataset from Kaggle
!kaggle competitions download -c ieee-fraud-detection

# Unzip the dataset
!unzip ieee-fraud-detection.zip


Downloading ieee-fraud-detection.zip to /content
 95% 112M/118M [00:00<00:00, 153MB/s] 
100% 118M/118M [00:00<00:00, 154MB/s]
Archive:  ieee-fraud-detection.zip
  inflating: sample_submission.csv   
  inflating: test_identity.csv       
  inflating: test_transaction.csv    
  inflating: train_identity.csv      
  inflating: train_transaction.csv   
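If the `unzip` binary is not available, the standard-library `zipfile` module does the same job as the `!unzip` step above. A minimal sketch; the helper name `extract_archive` is hypothetical, and the demo builds a tiny throwaway archive that only mimics the competition file names instead of using the real download.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_archive(archive: str, dest: str) -> List[str]:
    """Extract every member of a .zip archive into dest and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(dest)  # creates dest (and any sub-directories) as needed
    return names


# Demo: build a small archive with competition-like file names, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("train_transaction.csv", "TransactionID,isFraud\n2987000,0\n")
        zf.writestr("train_identity.csv", "TransactionID,id_01\n")
    out_dir = Path(tmp) / "out"
    extracted = extract_archive(str(archive_path), str(out_dir))
    extracted_files = sorted(p.name for p in out_dir.iterdir())

print(extracted_files)
```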


In [5]:
import pandas as pd

# Directory where the unzipped CSV files are located
directory = '/content/'

# Load the transaction and identity datasets
train_identity = pd.read_csv(directory+'train_identity.csv')
train_transaction = pd.read_csv(directory+'train_transaction.csv')

# Merge the two datasets on 'TransactionID'
train_data = pd.merge(train_transaction, train_identity, on='TransactionID', how='left')

# Check the first few rows to ensure data is loaded correctly
train_data.head()

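| | TransactionID | isFraud | TransactionDT | TransactionAmt | ProductCD | card1 | card2 | card3 | card4 | card5 | ... | id_31 | id_32 | id_33 | id_34 | id_35 | id_36 | id_37 | id_38 | DeviceType | DeviceInfo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2987000 | 0 | 86400 | 68.5 | W | 13926 | NaN | 150.0 | discover | 142.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 2987001 | 0 | 86401 | 29.0 | W | 2755 | 404.0 | 150.0 | mastercard | 102.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 2987002 | 0 | 86469 | 59.0 | W | 4663 | 490.0 | 150.0 | visa | 166.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 2987003 | 0 | 86499 | 50.0 | W | 18132 | 567.0 | 150.0 | mastercard | 117.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 2987004 | 0 | 86506 | 50.0 | H | 4497 | 514.0 | 150.0 | mastercard | 102.0 | ... | samsung browser 6.2 | 32.0 | 2220x1080 | match_status:2 | T | F | T | T | mobile | SAMSUNG SM-G892A Build/NRD90M |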
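`how='left'` keeps every transaction even though not every transaction has an identity record (rows 0 to 3 above have no identity data); the identity columns of unmatched rows become NaN. The sketch below shows this on made-up rows. It also uses two real `pd.merge` options that the notebook does not: `validate='one_to_one'` raises if a `TransactionID` is duplicated on either side, and `indicator=True` adds a `_merge` column recording whether each row found a match.

```python
import pandas as pd

# Toy stand-ins for train_transaction / train_identity (values are made up).
transactions = pd.DataFrame({
    "TransactionID": [1, 2, 3],
    "TransactionAmt": [68.5, 29.0, 50.0],
})
identity = pd.DataFrame({
    "TransactionID": [3],
    "DeviceType": ["mobile"],
})

merged = pd.merge(
    transactions, identity,
    on="TransactionID",
    how="left",              # keep every transaction, matched or not
    validate="one_to_one",   # raise if an ID is duplicated on either side
    indicator=True,          # adds `_merge`: 'both' or 'left_only'
)
print(merged)
print(merged["_merge"].value_counts())
```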


In [6]:
from autogluon.tabular import TabularPredictor

# Define the target label and evaluation metric
label = 'isFraud'
eval_metric = 'roc_auc'

# Define the save path for AutoGluon models
save_path = '/content/AutoGluonModels/'

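# Train the model with AutoGluon.
# Note: in AutoGluon 1.x the neural-network model keys are 'NN_TORCH' and 'FASTAI', and 'STACKER'
# does not switch off stacking (the log below still trains stacked L2/L3 models); to disable
# stacking, pass num_stack_levels=0 instead.
predictor = TabularPredictor(label=label, eval_metric=eval_metric, path=save_path, verbosity=3).fit(
    train_data, presets='good_quality', time_limit=3600, excluded_model_types=['NN', 'STACKER'], keep_only_best=True
)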

# Print the summary of the fit process
results = predictor.fit_summary()

Verbosity: 3 (Detailed Logging)
AutoGluon Version:  1.1.1
Python Version:     3.10.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
CPU Count:          96
GPU Count:          0
Memory Avail:       324.49 GB / 334.56 GB (97.0%)
Disk Space Avail:   197.25 GB / 225.33 GB (87.5%)
Presets specified: ['good_quality']
User Specified kwargs:
{'auto_stack': True,
 'excluded_model_types': ['NN', 'STACKER'],
 'keep_only_best': True,
 'num_bag_sets': 1,
 'refit_full': True,
 'save_bag_folds': False,
 'set_best_to_refit_full': True}
Full kwargs:
{'_feature_generator_kwargs': None,
 '_save_bag_folds': None,
 'ag_args': None,
 'ag_args_ensemble': None,
 'ag_args_fit': None,
 'auto_stack': True,
 'calibrate': 'auto',
 'ds_args': {'clean_up_fits': True,
             'detection_time_frac': 0.25,
             'enable_ray_logging': True,
             'holdout_data': None,
             'holdout_frac': 0.1111111111111111,
       

[36m(_ray_fit pid=16505)[0m [50]	valid_set's binary_logloss: 0.0964283
[36m(_ray_fit pid=16508)[0m [100]	valid_set's binary_logloss: 0.0852749[32m [repeated 10x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)[0m
[36m(_ray_fit pid=16504)[0m [150]	valid_set's binary_logloss: 0.0826913[32m [repeated 12x across cluster][0m
[36m(_ray_fit pid=16507)[0m [250]	valid_set's binary_logloss: 0.077055[32m [repeated 12x across cluster][0m
[36m(_ray_fit pid=16509)[0m [300]	valid_set's binary_logloss: 0.0745371[32m [repeated 11x across cluster][0m
[36m(_ray_fit pid=16505)[0m [400]	valid_set's binary_logloss: 0.072224[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=16506)[0m [500]	valid_set's binary_logloss: 0.0668289[32m [repeated 14x across cluster][0m
[36m(_ray_fit pid=16508)[0m [600]	valid_set's binary_l

[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1/utils/oof.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1/model.pkl
[36m(_dystack pid=8788)[0m 	0.9698	 = Validation score   (roc_auc)
[36m(_dystack pid=8788)[0m 	461.84s	 = Training   runtime
[36m(_dystack pid=8788)[0m 	230.94s	 = Validation runtime
[36m(_dystack pid=8788)[0m 	284.1	 = Inference  throughput (rows/s | 65616 batch size)
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Fitting model: LightGBM_BAG_L1 ... Training model for up to 73.25s of the 360.69s of remaining time.
[36m(_dystack pid=8788)[0m 	Fitting LightGBM_BAG_L1 with 'num_gpus': 0, 'num_cpus': 96
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/utils/model_template.pkl
[36m(_dystack pid=8788)[0m L

[36m(_ray_fit pid=19462)[0m [50]	valid_set's binary_logloss: 0.0870287[32m [repeated 4x across cluster][0m
[36m(_ray_fit pid=19462)[0m [100]	valid_set's binary_logloss: 0.078415[32m [repeated 10x across cluster][0m
[36m(_ray_fit pid=19464)[0m [150]	valid_set's binary_logloss: 0.0745762[32m [repeated 9x across cluster][0m
[36m(_ray_fit pid=19463)[0m [250]	valid_set's binary_logloss: 0.0718996[32m [repeated 11x across cluster][0m
[36m(_ray_fit pid=19465)[0m [250]	valid_set's binary_logloss: 0.0711994[32m [repeated 15x across cluster][0m
[36m(_ray_fit pid=19459)[0m [450]	valid_set's binary_logloss: 0.0638379[32m [repeated 15x across cluster][0m
[36m(_ray_fit pid=19466)[0m [500]	valid_set's binary_logloss: 0.0624308[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=19463)[0m [600]	valid_set's binary_logloss: 0.062946[32m [repeated 11x across cluster][0m


[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/utils/oof.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1/model.pkl
[36m(_dystack pid=8788)[0m 	0.9523	 = Validation score   (roc_auc)
[36m(_dystack pid=8788)[0m 	60.66s	 = Training   runtime
[36m(_dystack pid=8788)[0m 	9.71s	 = Validation runtime
[36m(_dystack pid=8788)[0m 	6759.4	 = Inference  throughput (rows/s | 65616 batch size)
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Fitting model: RandomForestGini_BAG_L1 ... Training model for up to 3.82s of the 291.26s of remaining time.
[36m(_dystack pid=8788)[0m 	Fitting RandomForestGini_BAG_L1 with 'num_gpus': 0, 'num_cpus': 96
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/RandomForestGini_BAG_L1/utils/model_template.pkl
[36m(_dystac

[36m(_ray_fit pid=21089)[0m [50]	valid_set's binary_logloss: 0.056042[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=21093)[0m [100]	valid_set's binary_logloss: 0.0462755[32m [repeated 10x across cluster][0m
[36m(_ray_fit pid=21092)[0m [150]	valid_set's binary_logloss: 0.0472325[32m [repeated 9x across cluster][0m
[36m(_ray_fit pid=21086)[0m [200]	valid_set's binary_logloss: 0.0445287[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=21090)[0m [300]	valid_set's binary_logloss: 0.0438801[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=21092)[0m [400]	valid_set's binary_logloss: 0.0429908[32m [repeated 12x across cluster][0m
[36m(_ray_fit pid=21088)[0m [550]	valid_set's binary_logloss: 0.041922[32m [repeated 12x across cluster][0m
[36m(_ray_fit pid=21087)[0m [500]	valid_set's binary_logloss: 0.0430395[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=21091)[0m [600]	valid_set's binary_logloss: 0.0441219[32m [repeated 15x across 

[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2/utils/oof.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2/model.pkl
[36m(_dystack pid=8788)[0m 	0.972	 = Validation score   (roc_auc)
[36m(_dystack pid=8788)[0m 	93.49s	 = Training   runtime
[36m(_dystack pid=8788)[0m 	8.2s	 = Validation runtime
[36m(_dystack pid=8788)[0m 	263.7	 = Inference  throughput (rows/s | 65616 batch size)
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Fitting model: LightGBM_BAG_L2 ... Training model for up to 147.85s of the 147.16s of remaining time.
[36m(_dystack pid=8788)[0m 	Fitting LightGBM_BAG_L2 with 'num_gpus': 0, 'num_cpus': 96
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L2/utils/model_template.pkl
[36m(_dystack pid=8788)[0m Loadi

[36m(_ray_fit pid=22476)[0m [50]	valid_set's binary_logloss: 0.0436994[32m [repeated 3x across cluster][0m
[36m(_ray_fit pid=22479)[0m [100]	valid_set's binary_logloss: 0.0433817[32m [repeated 13x across cluster][0m
[36m(_ray_fit pid=22477)[0m [200]	valid_set's binary_logloss: 0.0426208[32m [repeated 11x across cluster][0m


[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L2/utils/oof.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L2/model.pkl
[36m(_dystack pid=8788)[0m 	0.9727	 = Validation score   (roc_auc)
[36m(_dystack pid=8788)[0m 	26.97s	 = Training   runtime
[36m(_dystack pid=8788)[0m 	2.6s	 = Validation runtime
[36m(_dystack pid=8788)[0m 	269.7	 = Inference  throughput (rows/s | 65616 batch size)
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Fitting model: RandomForestGini_BAG_L2 ... Training model for up to 112.99s of the 112.3s of remaining time.
[36m(_dystack pid=8788)[0m 	Fitting RandomForestGini_BAG_L2 with 'num_gpus': 0, 'num_cpus': 96
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/RandomForestGini_BAG_L2/utils/model_template.pkl
[36m(_dystack

[36m(_ray_fit pid=24273)[0m 0:	learn: 0.5878536	test: 0.5879774	best: 0.5879774 (0)	total: 842ms	remaining: 2h 20m 18s
[36m(_ray_fit pid=22478)[0m [200]	valid_set's binary_logloss: 0.0414538[32m [repeated 3x across cluster][0m
[36m(_ray_fit pid=24274)[0m 20:	learn: 0.0650284	test: 0.0653728	best: 0.0653728 (20)	total: 19.6s	remaining: 2h 35m 6s[32m [repeated 8x across cluster][0m
[36m(_ray_fit pid=24271)[0m 
[36m(_ray_fit pid=24271)[0m bestTest = 0.05550287062
[36m(_ray_fit pid=24271)[0m bestIteration = 24
[36m(_ray_fit pid=24271)[0m 
[36m(_ray_fit pid=24271)[0m Shrink model to first 25 iterations.


[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/CatBoost_BAG_L2/utils/oof.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/CatBoost_BAG_L2/model.pkl
[36m(_dystack pid=8788)[0m 	0.9622	 = Validation score   (roc_auc)
[36m(_dystack pid=8788)[0m 	34.83s	 = Training   runtime
[36m(_dystack pid=8788)[0m 	1.37s	 = Validation runtime
[36m(_dystack pid=8788)[0m 	271.1	 = Inference  throughput (rows/s | 65616 batch size)
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Skipping ExtraTreesGini_BAG_L2 due to lack of time remaining.
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Skipping ExtraTreesEntr_BAG_L2 due to lack of time remaining.
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl


[36m(_dystack pid=8788)[0m 0:	learn: 0.5852337	total: 205ms	remaining: 4.92s
[36m(_ray_fit pid=24273)[0m 20:	learn: 0.0646370	test: 0.0654382	best: 0.0654382 (20)	total: 21.5s	remaining: 2h 50m 31s[32m [repeated 7x across cluster][0m
[36m(_ray_fit pid=24272)[0m [32m [repeated 14x across cluster][0m
[36m(_ray_fit pid=24272)[0m bestTest = 0.05497604913[32m [repeated 7x across cluster][0m
[36m(_ray_fit pid=24272)[0m bestIteration = 24[32m [repeated 7x across cluster][0m
[36m(_ray_fit pid=24272)[0m Shrink model to first 25 iterations.[32m [repeated 7x across cluster][0m
[36m(_dystack pid=8788)[0m 20:	learn: 0.0637626	total: 1.92s	remaining: 366ms


[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/CatBoost_BAG_L2_FULL/model.pkl
[36m(_dystack pid=8788)[0m 	6.51s	 = Training   runtime
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/WeightedEnsemble_L3/model.pkl
[36m(_dystack pid=8788)[0m Fitting model: WeightedEnsemble_L3_FULL | Skipping fit via cloning parent ...
[36m(_dystack pid=8788)[0m 	Ensemble Weights: {'LightGBM_BAG_L2': 0.636, 'LightGBMXT_BAG_L2': 0.318, 'CatBoost_BAG_L2': 0.045}
[36m(_dystack pid=8788)[0m 	13.08s	 = Training   runtime
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/WeightedEnsemble_L3_FULL/model.pkl
[36m(_dystack pid=8788)[0m Saving /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/trainer.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/s

[36m(_dystack pid=8788)[0m 24:	learn: 0.0556805	total: 2.26s	remaining: 0us


[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L1_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L1_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/WeightedEnsemble_L2_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBMXT_BAG_L2_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/LightGBM_BAG_L2_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/CatBoost_BAG_L2_FULL/model.pkl
[36m(_dystack pid=8788)[0m Loading: /content/AutoGluonModels/ds_sub_fit/sub_fit_ho/models/WeightedEnsemble_L3_FULL/model.pkl
[36m(_dystack pid=8788)[0m Deleting DyStack predictor artifacts (clean_up_fits=True) ...
Leaderboard on holdout data (DyStack

0:	learn: 0.6121004	total: 167ms	remaining: 1s


Saving /content/AutoGluonModels/models/CatBoost_BAG_L1_FULL/model.pkl
	5.46s	 = Training   runtime
Saving /content/AutoGluonModels/models/trainer.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/utils/model_template.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/utils/model_template.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2/utils/model_template.pkl
Fitting 1 L2 models ...
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1/utils/oof.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1/utils/oof.pkl
Loading: /content/AutoGluonModels/models/RandomForestGini_BAG_L1/utils/oof.pkl
Loading: /content/AutoGluonModels/models/RandomForestEntr_BAG_L1/utils/oof.pkl
Loading: /content/AutoGluonModels/models/CatBoost_BAG_L1/utils/oof.pkl


6:	learn: 0.3164583	total: 753ms	remaining: 0us


Fitting model: LightGBMXT_BAG_L2_FULL ...
	Fitting LightGBMXT_BAG_L2_FULL with 'num_gpus': 0, 'num_cpus': 48
Saving /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/utils/model_template.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/utils/model_template.pkl
	Fitting 500 rounds... Hyperparameters: {'learning_rate': 0.05, 'extra_trees': True}
Saving /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/model.pkl
	24.33s	 = Training   runtime
Saving /content/AutoGluonModels/models/trainer.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/utils/model_template.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/utils/model_template.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2/utils/model_template.pkl
Fitting 1 L2 models ...
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1/utils/oof.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1/utils/oof.

0:	learn: 0.5719813	total: 258ms	remaining: 55.9s
20:	learn: 0.0575983	total: 4.94s	remaining: 46.4s
40:	learn: 0.0397486	total: 9.43s	remaining: 40.7s
60:	learn: 0.0374595	total: 14.1s	remaining: 36.4s
80:	learn: 0.0368754	total: 18.8s	remaining: 31.9s
100:	learn: 0.0366908	total: 23.5s	remaining: 27.2s
120:	learn: 0.0365256	total: 27.9s	remaining: 22.4s
140:	learn: 0.0363970	total: 32.6s	remaining: 17.8s
160:	learn: 0.0363196	total: 37s	remaining: 13.1s
180:	learn: 0.0362517	total: 41.4s	remaining: 8.46s
200:	learn: 0.0361939	total: 45.8s	remaining: 3.87s


Saving /content/AutoGluonModels/models/CatBoost_BAG_L2_FULL/model.pkl
	54.3s	 = Training   runtime
Saving /content/AutoGluonModels/models/trainer.pkl
Loading: /content/AutoGluonModels/models/WeightedEnsemble_L3/model.pkl
Fitting model: WeightedEnsemble_L3_FULL | Skipping fit via cloning parent ...
	Ensemble Weights: {'LightGBMXT_BAG_L2': 0.36, 'LightGBM_BAG_L2': 0.32, 'CatBoost_BAG_L2': 0.28, 'RandomForestEntr_BAG_L2': 0.04}
	25.9s	 = Training   runtime
Saving /content/AutoGluonModels/models/WeightedEnsemble_L3_FULL/model.pkl
Saving /content/AutoGluonModels/models/trainer.pkl


217:	learn: 0.0361477	total: 49.7s	remaining: 0us


Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1_FULL/model.pkl
Saving /content/AutoGluonModels/models/LightGBMXT_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1_FULL/model.pkl
Saving /content/AutoGluonModels/models/LightGBM_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/RandomForestGini_BAG_L1_FULL/model.pkl
Saving /content/AutoGluonModels/models/RandomForestGini_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/RandomForestEntr_BAG_L1_FULL/model.pkl
Saving /content/AutoGluonModels/models/RandomForestEntr_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/CatBoost_BAG_L1_FULL/model.pkl
Saving /content/AutoGluonModels/models/CatBoost_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/model.pkl
Saving /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2_FULL/model.pkl
Saving /content/AutoGluonModels/mode

*** Summary of fit() ***
Estimated performance of each model:
                          model score_val eval_metric  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0  RandomForestGini_BAG_L1_FULL      None     roc_auc     125.710804   37.550590              125.710804          37.550590            1       True          3
1  RandomForestEntr_BAG_L1_FULL      None     roc_auc     127.032806   33.069827              127.032806          33.069827            1       True          4
2      WeightedEnsemble_L3_FULL      None     roc_auc            NaN  591.306964                     NaN          25.897851            3       True         10
3  RandomForestEntr_BAG_L2_FULL      None     roc_auc            NaN  472.592563              128.499431          33.567673            2       True          8
4          LightGBM_BAG_L2_FULL      None     roc_auc            NaN  453.212598                     NaN          14.187708            2       
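`eval_metric='roc_auc'` scores how well the predicted probabilities rank fraudulent transactions above legitimate ones: ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. Because only the ranking matters, a constant prediction such as the 0.5 placeholder in `sample_submission.csv` scores exactly 0.5. The sketch below implements that pairwise definition directly; it is O(n_pos × n_neg), so it is for illustration only (library implementations such as `sklearn.metrics.roc_auc_score` are much faster), and the toy labels and scores are made up.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 1, 0, 1]
print(roc_auc(y, [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]))  # 8 of 9 pairs ranked correctly
print(roc_auc(y, [0.5] * 6))                         # constant scores -> 0.5
```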



In [9]:
# Display the column names of the training data
print("Training Data Columns:")
print(train_data.columns)



Training Data Columns:
Index(['TransactionID', 'isFraud', 'TransactionDT', 'TransactionAmt',
       'ProductCD', 'card1', 'card2', 'card3', 'card4', 'card5',
       ...
       'id_31', 'id_32', 'id_33', 'id_34', 'id_35', 'id_36', 'id_37', 'id_38',
       'DeviceType', 'DeviceInfo'],
      dtype='object', length=434)


In [10]:
# train_identity.csv already uses underscores (id_01 to id_38), so this rename is a no-op;
# the hyphenated names (id-01 to id-38) appear in test_identity.csv
train_identity.columns = train_identity.columns.str.replace('-', '_')
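A quick way to catch this kind of hyphen/underscore mismatch is to normalize the column names of the frame you will predict on and then compare them with the training features. A minimal sketch on toy column lists; the helper name `align_column_names` is hypothetical, and the column names only mimic the competition files.

```python
from typing import Set, Tuple

import pandas as pd


def align_column_names(
    train: pd.DataFrame, test: pd.DataFrame, label: str
) -> Tuple[pd.DataFrame, Set[str], Set[str]]:
    """Replace '-' with '_' in test column names, then report columns that still differ.

    Returns the renamed test frame, training feature columns missing from test,
    and test columns that the training data never had.
    """
    renamed = test.rename(columns=lambda c: c.replace("-", "_"))
    train_features = set(train.columns) - {label}
    missing_in_test = train_features - set(renamed.columns)
    extra_in_test = set(renamed.columns) - train_features
    return renamed, missing_in_test, extra_in_test


# Empty toy frames: only the column names matter here.
train = pd.DataFrame(columns=["TransactionID", "isFraud", "id_01", "id_31", "DeviceType"])
test = pd.DataFrame(columns=["TransactionID", "id-01", "id-31", "DeviceType"])

test_fixed, missing, extra = align_column_names(train, test, label="isFraud")
print(list(test_fixed.columns))
print(missing, extra)
```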



In [11]:
# Merge the training datasets on 'TransactionID'
train_data = pd.merge(train_transaction, train_identity, on='TransactionID', how='left')


In [12]:
# Display the first few rows of the training data
print("First few rows of the training data:")
print(train_data.head())

First few rows of the training data:
   TransactionID  isFraud  TransactionDT  TransactionAmt ProductCD  card1  \
0        2987000        0          86400            68.5         W  13926   
1        2987001        0          86401            29.0         W   2755   
2        2987002        0          86469            59.0         W   4663   
3        2987003        0          86499            50.0         W  18132   
4        2987004        0          86506            50.0         H   4497   

   card2  card3       card4  card5  ...                id_31  id_32  \
0    NaN  150.0    discover  142.0  ...                  NaN    NaN   
1  404.0  150.0  mastercard  102.0  ...                  NaN    NaN   
2  490.0  150.0        visa  166.0  ...                  NaN    NaN   
3  567.0  150.0  mastercard  117.0  ...                  NaN    NaN   
4  514.0  150.0  mastercard  102.0  ...  samsung browser 6.2   32.0   

       id_33           id_34  id_35 id_36 id_37  id_38  DeviceType  \
0  

In [14]:
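# Build the test set the same way as the training set: load both test tables and merge them.
# test_identity.csv uses hyphenated names (id-01 to id-38), so rename them to the
# underscore names (id_01 to id_38) that the predictor was trained on.
test_transaction = pd.read_csv(directory + 'test_transaction.csv')
test_identity = pd.read_csv(directory + 'test_identity.csv')
test_identity.columns = test_identity.columns.str.replace('-', '_')
test_data = pd.merge(test_transaction, test_identity, on='TransactionID', how='left')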

In [15]:
# Predict the probability of the positive class (fraudulent transactions)
y_pred_proba = predictor.predict_proba(test_data, as_multiclass=False)

Loading: /content/AutoGluonModels/models/CatBoost_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/RandomForestEntr_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/RandomForestGini_BAG_L1_FULL/model.pkl
Loading: /content/AutoGluonModels/models/CatBoost_BAG_L2_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBMXT_BAG_L2_FULL/model.pkl
Loading: /content/AutoGluonModels/models/LightGBM_BAG_L2_FULL/model.pkl
Loading: /content/AutoGluonModels/models/RandomForestEntr_BAG_L2_FULL/model.pkl
Loading: /content/AutoGluonModels/models/WeightedEnsemble_L3_FULL/model.pkl


In [16]:
# Load the sample submission file
submission = pd.read_csv(directory + 'sample_submission.csv')

In [18]:
# Display the first few rows of the submission file
print("First few rows of the submission file:")
print(submission.head())


First few rows of the submission file:
   TransactionID  isFraud
0        3663549      0.5
1        3663550      0.5
2        3663551      0.5
3        3663552      0.5
4        3663553      0.5


In [19]:
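# Write the predicted fraud probabilities into the submission, matched by TransactionID
pred_by_id = pd.Series(y_pred_proba.values, index=test_data['TransactionID'].values)
submission['isFraud'] = submission['TransactionID'].map(pred_by_id)

# Save the submission file
submission.to_csv(directory + 'my_submission.csv', index=False)

print("\nSubmission file saved to:", directory + 'my_submission.csv')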



Submission file saved to: /content/my_submission.csv
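Before uploading, it is worth checking that the file really contains model predictions rather than the sample file's placeholder 0.5 values: the same IDs in the same order as `sample_submission.csv`, no missing values, and probabilities in [0, 1]. A minimal sketch; the helper name `check_submission` is hypothetical, and the toy frames stand in for `submission` and the sample file (the probabilities are made up).

```python
from typing import List

import pandas as pd


def check_submission(sub: pd.DataFrame, sample: pd.DataFrame, target: str = "isFraud") -> List[str]:
    """Return a list of problems found in a Kaggle-style submission (empty list = looks OK)."""
    problems = []
    if list(sub.columns) != list(sample.columns):
        problems.append(f"columns {list(sub.columns)} != {list(sample.columns)}")
        return problems
    if not sub["TransactionID"].equals(sample["TransactionID"]):
        problems.append("TransactionID values or order differ from the sample file")
    if sub[target].isna().any():
        problems.append("missing predictions")
    elif not sub[target].between(0, 1).all():
        problems.append("predictions outside [0, 1]")
    if sub[target].nunique() <= 1:
        problems.append("all predictions identical (still the placeholder values?)")
    return problems


sample = pd.DataFrame({"TransactionID": [3663549, 3663550, 3663551], "isFraud": [0.5, 0.5, 0.5]})
good = sample.assign(isFraud=[0.02, 0.91, 0.10])  # made-up probabilities

print(check_submission(sample, sample))  # the untouched sample file is flagged
print(check_submission(good, sample))    # no problems found
```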
