# Searching for the Best Architecture and Hyperparameters

We often do not know in advance which architecture will work best for our data. In artificial intelligence, it is common for an architecture to be the best for one dataset and perform poorly on another. To help find a good solution, this notebook uses two main features of PyTorch Tabular. The first is Sweep, which trains every architecture available in PyTorch Tabular with default hyperparameters to find the most promising architectures for our data. Afterward, we use Tuner to search for the best hyperparameters of the top architectures found by the Sweep.

In [1]:
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split

from pytorch_tabular.utils import make_mixed_dataset
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig

## Data
First, let's create a synthetic dataset that mixes numerical and categorical features and has a single `target` column for classification.

In [2]:
data, cat_col_names, num_col_names = make_mixed_dataset(
    task="classification", n_samples=3000, n_features=7, n_categories=4
)

train, test = train_test_split(data, random_state=42)
train, valid = train_test_split(train, random_state=42)
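
Neither call passes `test_size`, so both splits fall back to scikit-learn's default of holding out 25% of the rows. A minimal sketch of the resulting split sizes, assuming the held-out count is rounded up as scikit-learn does for a fractional `test_size`; `split_sizes` is a hypothetical helper written for this illustration, not a library function:

```python
import math


def split_sizes(n_samples, test_fraction=0.25):
    """Return (n_train, n_test) for a fractional hold-out, rounding the hold-out up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


# First split: full data -> train + test
n_train_full, n_test = split_sizes(3000)
# Second split: remaining train rows -> train + valid
n_train, n_valid = split_sizes(n_train_full)

print(n_train, n_valid, n_test)  # 1687 563 750
```

So roughly 56% of the rows are used for training, 19% for validation during the sweep and tuning, and 25% are kept aside for the final evaluation.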

## Common Configs

In [3]:
data_config = DataConfig(
    target=[
        "target"
    ],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    batch_size=32,
    max_epochs=50,
    early_stopping="valid_accuracy",
    early_stopping_mode="max",
    early_stopping_patience=3,
    checkpoints="valid_accuracy",
    load_best=True,
    progress_bar="none"
)
optimizer_config = OptimizerConfig()
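
The `early_stopping*` settings in `trainer_config` tell the trainer to watch `valid_accuracy`, treat higher values as better, and stop once the metric has failed to improve for 3 consecutive validation checks. The following is a simplified sketch of that patience rule in plain Python. The function `stopping_epoch` and the accuracy values are hypothetical and only illustrate the idea; they are not the trainer's actual implementation or real results.

```python
def stopping_epoch(scores, patience=3, mode="max"):
    """Return the index of the epoch at which training would stop, or None."""
    best = None
    waited = 0
    for epoch, score in enumerate(scores):
        improved = best is None or (score > best if mode == "max" else score < best)
        if improved:
            best, waited = score, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Made-up validation accuracies, one per epoch (illustrative only)
val_acc = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.74]

stop = stopping_epoch(val_acc, patience=3, mode="max")
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__)

print(stop, best_epoch)  # 5 2
```

In this toy run training stops at epoch 5, but the best score was reached at epoch 2. That is the situation `checkpoints="valid_accuracy"` together with `load_best=True` is meant to handle: the weights from the best checkpoint are restored rather than the last ones.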

## Model Sweep
https://pytorch-tabular.readthedocs.io/en/latest/apidocs_coreclasses/#pytorch_tabular.model_sweep

Let's train all available models ("high_memory"). If a model is reported as "OOM", you do not have enough memory to run it with the current batch_size. You can ignore that model or reduce the batch_size in TrainerConfig.

In [4]:
from pytorch_tabular import model_sweep

In [5]:
sweep_df, best_model = model_sweep(
                            task="classification",
                            train=train,
                            test=valid,
                            data_config=data_config,
                            optimizer_config=optimizer_config,
                            trainer_config=trainer_config,
                            model_list="high_memory",
                            verbose=False # Make True if you want to log metrics and params each trial
                        )

In [6]:
best_model.evaluate(test)

[{'test_loss': 0.44678735733032227, 'test_accuracy': 0.8053333163261414}]

The following table shows the best models (with default hyperparameters) for our dataset. We are not satisfied yet, so we will take the top two models and use Tuner to look for better hyperparameters and a better result.

**PS: Results may change slightly each time the notebook runs, so you might see different top models from the ones we use in the next section.**

In [7]:
sweep_df.drop(columns=["params", "time_taken", "epochs"]).sort_values("test_accuracy", ascending=False).style.background_gradient(
    subset=["test_accuracy"], cmap="RdYlGn"
).background_gradient(subset=["time_taken_per_epoch", "test_loss"], cmap="RdYlGn_r")

| | model | # Params | test_loss | test_accuracy | time_taken_per_epoch |
|---|---|---|---|---|---|
| 1 | CategoryEmbeddingModel | 12 T | 0.458506 | 0.797513 | 0.190966 |
| 3 | FTTransformerModel | 272 T | 0.486184 | 0.77087 | 0.529126 |
| 4 | GANDALFModel | 8 T | 0.562945 | 0.705151 | 0.341467 |
| 8 | TabTransformerModel | 272 T | 0.547346 | 0.69627 | 0.47092 |
| 0 | AutoIntModel | 14 T | 0.580009 | 0.689165 | 0.360073 |
| 5 | GatedAdditiveTreeEnsembleModel | 79 T | 0.673274 | 0.660746 | 3.624957 |
| 2 | DANetModel | 431 T | 0.692986 | 0.64476 | 2.104359 |
| 6 | NODEModel | 864 T | 0.676671 | 0.626998 | 1.497243 |
| 7 | TabNetModel | 6 T | 0.708919 | 0.538188 | 0.484836 |


## Model Tuner
https://pytorch-tabular.readthedocs.io/en/latest/apidocs_coreclasses/#pytorch_tabular.TabularModelTuner

Perfect! Now that we know the best models, let's take the top two and experiment with their hyperparameters to try to find better results.

In [8]:
from pytorch_tabular.models import (
    CategoryEmbeddingModelConfig,
    FTTransformerConfig
)   

We can use two main strategies:
- grid_search: tries every combination of the hyperparameters that were defined. Remember that each new field you add considerably increases the total training time. If you configure 4 optimizers, 4 layers, 2 activations and 2 dropout values, that means 64 (4 * 4 * 2 * 2) trainings.
- random_search: randomly draws "n_trials" hyperparameter settings for each model that has been defined. It is useful for faster training, but remember that it will not test every combination.
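
The difference between the two strategies can be sketched with the standard library alone. The search space below is hypothetical (simplified key names and an arbitrary choice of four optimizers), and the sampling shown is just one way to draw random settings, not necessarily how the tuner does it internally:

```python
import itertools
import random

# Hypothetical search space: 4 optimizers, 4 layer layouts, 2 activations, 2 dropout values
space = {
    "optimizer": ["Adam", "SGD", "AdamW", "RMSprop"],
    "layers": ["128-64-32", "1024-512-256", "32-64-128", "256-512-1024"],
    "activation": ["ReLU", "LeakyReLU"],
    "dropout": [0.0, 0.2],
}

# Grid search: every combination of every value
keys = list(space)
grid = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
print(len(grid))  # 64

# Random search: draw n_trials distinct settings from the same set of combinations
rng = random.Random(42)
n_trials = 5
sampled = rng.sample(grid, n_trials)
print(len(sampled))  # 5
```

Adding a fifth field with only two values would double the grid to 128 trainings, while the random search cost stays fixed at `n_trials`.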


For all hyperparameters options: https://pytorch-tabular.readthedocs.io/en/latest/apidocs_model/

More information about how the hyperparameter spaces work: https://pytorch-tabular.readthedocs.io/en/latest/tutorials/10-Hyperparameter%20Tuning/#define-the-hyperparameter-space

Let's define some hyperparameters.

PS: This notebook only demonstrates the functions; these are not necessarily the best hyperparameters to try.

In [9]:
search_space_category_embedding = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__layers": ["128-64-32", "1024-512-256", "32-64-128", "256-512-1024"],
    "model_config__activation": ["ReLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}
model_config_category_embedding = CategoryEmbeddingModelConfig(task="classification")

In [10]:
search_space_ft_transformer = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__input_embed_dim": [32, 64],
    "model_config__num_attn_blocks": [3, 6, 8],
    "model_config__ff_hidden_multiplier": [4, 8],
    "model_config__transformer_activation": ["GEGLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}
model_config_ft_transformer = FTTransformerConfig(task="classification")
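
Each key in the search spaces above joins a config name and one of its fields with a double underscore, so `model_config__layers` refers to the `layers` field of the model config. A small sketch of that naming convention; `split_key` is a hypothetical helper for illustration and not a PyTorch Tabular function:

```python
def split_key(key):
    """Split 'config_name__field' into (config_name, field)."""
    config_name, field = key.split("__", 1)
    return config_name, field


search_space = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__layers": ["128-64-32", "1024-512-256"],
}

# Group the candidate values by the config they belong to
by_config = {}
for key, values in search_space.items():
    config_name, field = split_key(key)
    by_config.setdefault(config_name, {})[field] = values

print(by_config)
```

Grouping the keys this way makes it easy to check that every field you tune actually exists on the config class you are passing to the tuner.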

Let's add all search spaces and model configs to lists.

**Important:** They must be in the same order and have the same length.

In [11]:
search_spaces = [search_space_category_embedding, search_space_ft_transformer]
model_configs = [model_config_category_embedding, model_config_ft_transformer]
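
Before launching a grid search it helps to know how many trainings it implies. The sketch below repeats the two search spaces defined in this notebook and counts their combinations; plain strings stand in for the model config objects so that it runs without PyTorch Tabular, and the length check is an extra safeguard added for this illustration:

```python
import math

search_space_category_embedding = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__layers": ["128-64-32", "1024-512-256", "32-64-128", "256-512-1024"],
    "model_config__activation": ["ReLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}
search_space_ft_transformer = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__input_embed_dim": [32, 64],
    "model_config__num_attn_blocks": [3, 6, 8],
    "model_config__ff_hidden_multiplier": [4, 8],
    "model_config__transformer_activation": ["GEGLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}

search_spaces = [search_space_category_embedding, search_space_ft_transformer]
# Plain names stand in for the two model config objects (illustration only)
model_names = ["CategoryEmbeddingModelConfig", "FTTransformerConfig"]

# Fail early if the two lists are misaligned
assert len(search_spaces) == len(model_names)

# Number of grid-search trials per model = product of the number of values per key
trials = {
    name: math.prod(len(values) for values in space.values())
    for name, space in zip(model_names, search_spaces)
}
print(trials)  # {'CategoryEmbeddingModelConfig': 32, 'FTTransformerConfig': 96}
print(sum(trials.values()))  # 128
```

With these spaces a full grid search means 128 trainings, three quarters of them for the slower FT-Transformer, which is a good reason to consider `random_search` with `n_trials` when time is limited.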

In [12]:
from pytorch_tabular.tabular_model_tuner import TabularModelTuner

In [13]:
tuner = TabularModelTuner(
    data_config=data_config,
    model_config=model_configs,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    tuner_df = tuner.tune(
        train=train,
        validation=valid,
        search_space=search_spaces,
        strategy="grid_search",  # random_search
        # n_trials=5,
        metric="accuracy",
        mode="max",
        progress_bar=True,
        verbose=False # Make True if you want to log metrics and params each trial
    )

Nice! We now know the best architecture and promising hyperparameters for our dataset. The result may not be good enough yet, but it at least narrows down the options. With these results, we have a better idea of which hyperparameters are worth exploring further and which are not worth keeping.

It is also a good idea to read the paper behind the architecture, since it may guide you further towards the best hyperparameters.

In [15]:
tuner_df.trials_df.sort_values("accuracy", ascending=False).style.background_gradient(
    subset=["accuracy"], cmap="RdYlGn"
).background_gradient(subset=["loss"], cmap="RdYlGn_r")

| | trial_id | model | model_config__activation | model_config__embedding_dropout | model_config__layers | optimizer_config__optimizer | loss | accuracy | model_config__ff_hidden_multiplier | model_config__input_embed_dim | model_config__num_attn_blocks | model_config__transformer_activation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | 22 | 0-CategoryEmbeddingModelConfig | LeakyReLU | 0.0 | 256-512-1024 | Adam | 0.339012 | 0.857904 | | | | |
| 26 | 26 | 0-CategoryEmbeddingModelConfig | LeakyReLU | 0.2 | 1024-512-256 | Adam | 0.375515 | 0.817052 | | | | |
| 20 | 20 | 0-CategoryEmbeddingModelConfig | LeakyReLU | 0.0 | 32-64-128 | Adam | 0.368664 | 0.815275 | | | | |
| 2 | 2 | 0-CategoryEmbeddingModelConfig | ReLU | 0.0 | 1024-512-256 | Adam | 0.407023 | 0.813499 | | | | |
| 6 | 6 | 0-CategoryEmbeddingModelConfig | ReLU | 0.0 | 256-512-1024 | Adam | 0.445294 | 0.811723 | | | | |
| 10 | 10 | 0-CategoryEmbeddingModelConfig | ReLU | 0.2 | 1024-512-256 | Adam | 0.446737 | 0.811723 | | | | |
| 18 | 18 | 0-CategoryEmbeddingModelConfig | LeakyReLU | 0.0 | 1024-512-256 | Adam | 0.44442 | 0.80817 | | | | |
| 30 | 30 | 0-CategoryEmbeddingModelConfig | LeakyReLU | 0.2 | 256-512-1024 | Adam | 0.39853 | 0.797513 | | | | |
| 14 | 14 | 0-CategoryEmbeddingModelConfig | ReLU | 0.2 | 256-512-1024 | Adam | 0.455243 | 0.781528 | | | | |
| 72 | 40 | 1-FTTransformerConfig | | 0.0 | | Adam | 0.445089 | 0.779751 | 8.0 | 64.0 | 6.0 | GEGLU |


In [16]:
tuner_df.best_model.evaluate(test)

[{'test_loss': 0.38250666856765747, 'test_accuracy': 0.8173333406448364}]
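
To put the two test accuracies in perspective, they can be converted back to counts of correctly classified rows. A short sketch, assuming the test split holds 750 rows (the default 25% hold-out of the 3000 generated samples); the accuracy values are the ones reported by the two `evaluate(test)` calls in this notebook:

```python
n_test = 750  # assumed: default 25% hold-out of 3000 samples

sweep_acc = 0.8053333163261414  # best sweep model, default hyperparameters
tuned_acc = 0.8173333406448364  # best model after tuning

# Convert accuracies back to whole numbers of correctly classified rows
sweep_correct = round(sweep_acc * n_test)
tuned_correct = round(tuned_acc * n_test)

print(sweep_correct, tuned_correct, tuned_correct - sweep_correct)  # 604 613 9
```

A gain of 9 rows out of 750 is modest, and since results vary between runs it may be worth repeating the comparison with a few different seeds before drawing firm conclusions.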

After training, the best model is available on the tuner output as "best_model". If you like the result and want to use the model in the future, you can save it by calling "save_model".


In [17]:
tuner_df.best_model.save_model("best_model", inference_only=True)

In [18]:
# Load saved model
#from pytorch_tabular import TabularModel
#loaded_model = TabularModel.load_model("best_model")
#loaded_model.evaluate(test)