<a href="https://colab.research.google.com/github/subhashpolisetti/AutoGluon_ML_End-to-End_Implementations/blob/main/1e_autogluon_tabular_gpu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Training Models with GPU Support


Training with a GPU can significantly speed up base algorithms and is essential for text and vision models, where training without a GPU is infeasibly slow. The CUDA toolkit is required for GPU training. Please refer to the official documentation for installation instructions.
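
Before requesting GPUs, it can help to confirm that one is actually visible. The following is a minimal sketch, not part of AutoGluon: it assumes an NVIDIA setup where `nvidia-smi -L` prints one `GPU <index>: ...` line per device, and the helper names `count_gpus_from_listing` and `detect_num_gpus` are hypothetical.

```python
import shutil
import subprocess


def count_gpus_from_listing(listing: str) -> int:
    """Count lines of the form 'GPU <index>: ...' as printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if line.strip().startswith("GPU "))


def detect_num_gpus() -> int:
    """Return the number of GPUs reported by nvidia-smi, or 0 if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0  # NVIDIA driver tools are not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return 0
    if result.returncode != 0:
        return 0
    return count_gpus_from_listing(result.stdout)


# Request at most one GPU, and fall back to CPU-only training when none is found
num_gpus = min(1, detect_num_gpus())
print(f"num_gpus to pass to fit(): {num_gpus}")
```

A visible device does not guarantee that the CUDA toolkit is installed correctly, so this check only rules out the most obvious failure.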

```python
from autogluon.tabular import TabularPredictor

# `train_data` and `label` are assumed to be defined earlier (training table and target column name)
# Fit a TabularPredictor with GPU support
predictor = TabularPredictor(label=label).fit(
    train_data,
    num_gpus=1,  # Allocate 1 GPU to the entire TabularPredictor
)
```

### Define Hyperparameters for Specific Models

You can specify hyperparameters to control the use of GPUs for training specific models. Here’s how to do it:

```python
# Per-model hyperparameters: train two LightGBM variants, one on CPU and one on GPU
hyperparameters = {
    'GBM': [
        {'ag_args_fit': {'num_gpus': 0}},  # Train on CPU
        {'ag_args_fit': {'num_gpus': 1}},  # Train on GPU (must be <= total num_gpus allocated to TabularPredictor)
    ]
}

# Fit the TabularPredictor with 1 GPU in total and the per-model settings above
predictor = TabularPredictor(label=label).fit(
    train_data,
    num_gpus=1,
    hyperparameters=hyperparameters,
)
```

### Multi-modal Training

In the **Multimodal Data Tables: Tabular, Text, and Image** tutorial, we discussed how to train an ensemble model that can utilize tabular data, text, and images. If the available GPUs do not have enough VRAM to fit the default model, or if you need to speed up testing, you can use different backends.

The regular multimodal configuration is retrieved with `get_hyperparameter_config`:

```python
# Install AutoGluon with all its dependencies for tabular data
!pip install autogluon.tabular[all]

# Import the helper that returns AutoGluon's preset hyperparameter configurations
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config

# Retrieve the hyperparameter configuration for the 'multimodal' preset
hyperparameters = get_hyperparameter_config('multimodal')

# Display the configuration
hyperparameters
```

With AutoGluon 1.1.1, this returns:

```
{'NN_TORCH': {},
 'GBM': [{},
  {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}},
  'GBMLarge'],
 'CAT': {},
 'XGB': {},
 'AG_AUTOMM': {},
 'VW': {}}
```
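
The configuration is a plain dictionary, so it can be adapted before being passed to `fit` via `hyperparameters`. The sketch below works on a literal copy of the configuration shown above so that it runs without AutoGluon installed. The helper `limit_gpus` is hypothetical, and dropping the `AG_AUTOMM` entry for a quick test is an assumption about one possible workflow, not the tutorial's prescribed approach.

```python
import copy

# Literal copy of the configuration shown above (assumed to match your installed version)
multimodal_config = {
    'NN_TORCH': {},
    'GBM': [{}, {'extra_trees': True, 'ag_args': {'name_suffix': 'XT'}}, 'GBMLarge'],
    'CAT': {},
    'XGB': {},
    'AG_AUTOMM': {},
    'VW': {},
}


def limit_gpus(config, model_key, num_gpus):
    """Return a copy of `config` in which every dict entry for `model_key`
    sets ag_args_fit['num_gpus'] to `num_gpus`."""
    new_config = copy.deepcopy(config)
    entries = new_config[model_key]
    as_list = entries if isinstance(entries, list) else [entries]
    for entry in as_list:
        if isinstance(entry, dict):  # String presets such as 'GBMLarge' are left untouched
            entry.setdefault('ag_args_fit', {})['num_gpus'] = num_gpus
    return new_config


# Keep the LightGBM variants on CPU
cpu_only_gbm = limit_gpus(multimodal_config, 'GBM', 0)

# Drop the text/image model entirely, for example during a quick smoke test
tabular_only = {k: v for k, v in multimodal_config.items() if k != 'AG_AUTOMM'}

print(cpu_only_gbm['GBM'])
print(sorted(tabular_only))
```

Working on a deep copy keeps the original configuration intact, so several variants can be derived from the same starting point.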

### Enabling GPU for LightGBM

The default installation of LightGBM does not support GPU training; GPU support requires a special installation. If `num_gpus` is set while the installed LightGBM lacks GPU support, a warning along the following lines is displayed:

```python
# Illustrative wording of the warning shown when GPU support is not enabled
print("Warning: LightGBM is not installed with GPU support. Please install it with GPU support to use GPU training.")

# If the commands suggested in the warning do not work, uninstall the existing LightGBM package
!pip uninstall -y lightgbm
```

Then install LightGBM from source, following the instructions in the official guide. The optional "Install Python Interface" section of that guide is also required for compatibility with AutoGluon.

### Advanced Resource Allocation

Most of the time, setting `num_cpus` and `num_gpus` at the predictor `fit` level is enough to control the total resources granted to the TabularPredictor. For finer-grained control, the following options are available:

- `ag_args_ensemble: ag_args_fit: { RESOURCES }` controls the total resources granted to a bagged model. With the parallel folding strategy, the resources of each individual base model are calculated from this value. It must be <= the total resources granted to the TabularPredictor, and it is ignored if bagging is not enabled.
- `ag_args_fit: { RESOURCES }` controls the total resources granted to a single base model. It must be <= the total resources granted to the TabularPredictor and, if applicable, <= the total resources granted to a bagged model.
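
The two constraints above form a simple hierarchy: base model <= bagged model <= TabularPredictor. As a minimal sketch, assuming resources are given as plain dictionaries with `num_cpus` and `num_gpus` keys, a hypothetical helper (`check_resource_hierarchy`, not an AutoGluon function) can validate a configuration before it is passed to `fit`:

```python
def check_resource_hierarchy(total, bagged=None, base=None):
    """Raise ValueError if a lower level requests more of a resource than a level above it.

    Each argument is a dict such as {'num_cpus': 8, 'num_gpus': 1}; levels that
    are not configured can be passed as None.
    """
    levels = [("TabularPredictor", total), ("bagged model", bagged), ("base model", base)]
    levels = [(name, res) for name, res in levels if res is not None]
    for i, (outer_name, outer) in enumerate(levels):
        for inner_name, inner in levels[i + 1:]:
            for key, value in inner.items():
                if key in outer and value > outer[key]:
                    raise ValueError(
                        f"{inner_name} {key}={value} exceeds {outer_name} {key}={outer[key]}"
                    )
    return True


# A consistent configuration passes the check
ok = check_resource_hierarchy(
    total={'num_cpus': 32, 'num_gpus': 4},
    bagged={'num_cpus': 10, 'num_gpus': 2},
    base={'num_cpus': 4, 'num_gpus': 0.5},
)

# A base model asking for more GPUs than its bagged model is rejected
try:
    check_resource_hierarchy(
        total={'num_cpus': 32, 'num_gpus': 4},
        bagged={'num_cpus': 10, 'num_gpus': 2},
        base={'num_cpus': 4, 'num_gpus': 3},
    )
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

AutoGluon performs its own validation at fit time; a check like this only surfaces inconsistent values earlier.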

As an example, consider the following scenario:

### Fit the AutoGluon predictor with specified parameters

```python
# `predictor` is assumed to be an unfitted TabularPredictor and `train_data` the training table
predictor.fit(
    train_data,
    num_cpus=32,  # Total CPUs granted to the TabularPredictor
    num_gpus=4,   # Total GPUs granted to the TabularPredictor
    hyperparameters={
        'NN_TORCH': {},  # Train the PyTorch neural network with default hyperparameters
    },
    num_bag_folds=2,  # Number of bagging folds
    ag_args_ensemble={  # Resources granted to each bagged model
        'ag_args_fit': {
            'num_cpus': 10,  # CPUs per bagged model
            'num_gpus': 2,   # GPUs per bagged model
        }
    },
    ag_args_fit={  # Resources granted to each individual base model
        'num_cpus': 4,    # CPUs per base model
        'num_gpus': 0.5,  # Fraction of a GPU per base model
    },
    hyperparameter_tune_kwargs={  # Hyperparameter tuning arguments
        'searcher': 'random',  # Search strategy for hyperparameter tuning
        'scheduler': 'local',  # Scheduler for running trials
        'num_trials': 2,       # Number of HPO trials
    },
)
```

### Resource Allocation for TabularPredictor and HPO Trials

1. **HPO Trials**: We train 2 hyperparameter optimization (HPO) trials, each of which trains 2 folds in parallel.
   - **Total resources for TabularPredictor**: 32 CPUs and 4 GPUs.

2. **Bagged Model Resources**:
   - Each bagged model is granted 10 CPUs and 2 GPUs.
   - Two HPO trials can therefore run in parallel, each granted 10 CPUs and 2 GPUs, for a total grant of:
     - **20 CPUs and 4 GPUs**.

3. **Individual Base Model Resources**:
   - Each individual base model is set to use 4 CPUs and 0.5 GPUs.
   - The bagged-level resources allow two folds to train in parallel:
     - This amounts to **8 CPUs and 1 GPU** per bagged model.
     - With two trials running in parallel, that is **16 CPUs and 2 GPUs**.

4. **Summary**:
   - In total, **16 CPUs and 2 GPUs** are used, with two trials of the bagged model running in parallel, each training two folds concurrently.
   - This results in **4 models training in parallel**.
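
The arithmetic in this walkthrough can be reproduced with a few lines of Python. This is a simplified model of the reasoning above, not AutoGluon's actual scheduler; the function names `how_many_fit` and `parallel_plan` are hypothetical.

```python
import math


def how_many_fit(available, needed):
    """How many units of `needed` fit into `available` (unbounded if nothing is needed)."""
    return math.inf if needed == 0 else math.floor(available / needed)


def parallel_plan(total, bagged, base, num_folds, num_trials):
    """Estimate parallelism and resource usage for bagged HPO trials."""
    # Folds that can train at once inside one bagged model
    folds = min(
        num_folds,
        how_many_fit(bagged['num_cpus'], base['num_cpus']),
        how_many_fit(bagged['num_gpus'], base['num_gpus']),
    )
    # Trials (bagged models) that can run at once within the predictor's total grant
    trials = min(
        num_trials,
        how_many_fit(total['num_cpus'], bagged['num_cpus']),
        how_many_fit(total['num_gpus'], bagged['num_gpus']),
    )
    models = folds * trials
    return {
        'parallel_folds_per_bag': folds,
        'parallel_trials': trials,
        'models_in_parallel': models,
        'cpus_used': models * base['num_cpus'],
        'gpus_used': models * base['num_gpus'],
    }


plan = parallel_plan(
    total={'num_cpus': 32, 'num_gpus': 4},
    bagged={'num_cpus': 10, 'num_gpus': 2},
    base={'num_cpus': 4, 'num_gpus': 0.5},
    num_folds=2,
    num_trials=2,
)
print(plan)
```

For the scenario above, this yields 2 folds per bagged model, 2 parallel trials, 4 models in parallel, and a usage of 16 CPUs and 2 GPUs, matching the summary.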
