#    Hyperparameter Tuning for Intent Classification using LSTM with Attention Mechanism
This notebook is 2nd part of the Ultimate AI Challenge. In the previous notebook, I trained a LSTM Model with Attention Mechanism for Intent Classification on the ATIS dataset. In this notebook, I perform a hyper parameter optimization to learn the best parameters for the model and also log our experiments using an ML server, e.g. MLFlow. Propreitary clouds already use experiment logging, model registry and deployment services, e.g. Vertex AI in GCP, Azure ML and Amazon Sagemaker. I have worked in Vertex AI and Azure ML. MLflow + Optuna is a good open source and scalable alternative to them. I use Optuna for hyperparameter tuning and MLflow for experiment tracking and model registry. Model Serving is done through pytorch save and load methods. To avoid pickling and version issues, one can also use model state dictionaries. The notebook is divided into the following sections:
1. Data Preparation
2. Hyperparameter Tuning
3. Model Training and Evaluation
4. Visualize the results
5. Model Export, Registry, Model Loading and Inference

Let's start

### Important Note; carefully set the project Root, so relative class imports work.

In [1]:
# if your notebook starts in the current directory, you need to set the directory to project root
import os
marker = '.git'  # Replace with your unique marker file or directory
while not os.path.exists(marker):
    os.chdir('..')
# Verify the current working directory
print("Current Working Directory:", os.getcwd())

Current Working Directory: /Users/ahmedjawad/Documents/JobSearch/UltimateAIChallenge/ultimate_aiv2


In [2]:
import optuna
import mlflow
from machine_learning.pipelines.data_loaders import vocab_size, output_dim, batch_size, device
import warnings
import torch.nn as nn
import torch.optim as optim
warnings.filterwarnings("ignore")


inside IntentTokenizer
Actual Vocabulary Size: 890
Encoding labels for the first time and adding unknown class.
Label Encoding: {'abbreviation': 0, 'aircraft': 1, 'aircraft+flight+flight_no': 2, 'airfare': 3, 'airfare+flight_time': 4, 'airline': 5, 'airline+flight_no': 6, 'airport': 7, 'capacity': 8, 'cheapest': 9, 'city': 10, 'distance': 11, 'flight': 12, 'flight+airfare': 13, 'flight_no': 14, 'flight_time': 15, 'ground_fare': 16, 'ground_service': 17, 'ground_service+ground_fare': 18, 'meal': 19, 'quantity': 20, 'restriction': 21, '<unknown>': 22}
Using device: mps
Number of training samples: torch.Size([4634, 46])
Number of training batches: 145
Number of test samples: torch.Size([850, 30])
Number of test batches: 27


## Data Preparation
Steps:
1. Load the data
2. Tokenize the data
3. Create a PyTorch Dataset
4. Create a PyTorch DataLoader

In [3]:
optuna.logging.set_verbosity(optuna.logging.ERROR)

## Hyperparameter Tuning
The Model I am using for the hyperparameter tuning is the IntentClassifierLSTMWithAttention. The hyperparameters are:
1. Learning rate : 1e-3 to 1e-1
2. Hidden dimension : 32, 64, 128, 256
3. Embedding dimension : 64, 128, 256, 512
4. Dropout rate : 0.1 to 0.5
5. Weight decay : 1e-6 to 1e-3

The objective function is the average validation accuracy over 5 folds. The best model is the one with the highest average validation accuracy. Note our test data is completely hidden to the accuracy optimizer of optuna. One can choose any metric to optimize, for example minimization of loss, etc..

## Optuna Study for Hyperparameter Optimization
Create an experiment in MLflow and run the hyperparameter tuning experiment. The best model is the one with the highest average validation accuracy. You can install the mlflow and optuna using pip install. To start mlflow server, I used the following command
`mlflow server --backend-store-uri=sqlite:///mlrunsdb15.db --default-artifact-root=file:mlruns --host 127.0.0.1 --port 1234`
Once started, it creates mlruns directory in project root and also a db in the root to log the progress, store models and experiments. In practise, it is installed on a server, and http requests are sent to the server. Anyhow pay close attention to the directory you give in the command to run the server. Here I keep it simple. Optuna also creates a db to store it's studies. Here I choose 20 tries. For 5 folds cross validation, and 5 epochs each, the study takes less than an hour my M1 machine. Optuna uses Bayesian Parameter Optimization.

Check the MLflow Server

pip install mlflow

`mlflow server --backend-store-uri=mlruns --default-artifact-root=file:mlruns --host 127.0.0.1 --port 1234`

In [4]:
### How to define the objective function

In [5]:
def log_hyperparameters(trial):
    # Log hyperparameters

    mlflow.log_param("lr", trial.params["lr"])
    mlflow.log_param("hidden_dim", trial.params["hidden_dim"])
    mlflow.log_param("embedding_dim", trial.params["embedding_dim"])
    mlflow.log_param("dropout_rate", trial.params["dropout_rate"])
    mlflow.log_param("weight_decay", trial.params["weight_decay"])
    print(
        f'lr: {trial.params["lr"]}, hidden_dim: {trial.params["hidden_dim"]}, embedding_dim: {trial.params["embedding_dim"]}, dropout_rate: {trial.params["dropout_rate"]}, weight_decay: {trial.params["weight_decay"]}')

    return True


In [6]:
def objective_ELSTMA(trial):
    from machine_learning.learners.IntentClassifierLSTMWithAttention import IntentClassifierLSTMWithAttention
    from sklearn.model_selection import KFold
    from machine_learning.pipelines.data_loaders import train_df, test_loader, tokenizer
    from machine_learning.learners.model_utils import train, evaluate
    epochs = 3
    with mlflow.start_run():
        # Suggest hyperparameters
        lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
        hidden_dim = trial.suggest_categorical("hidden_dim", [32, 64, 256])
        embedding_dim = trial.suggest_categorical("embedding_dim", [64, 128, 256])
        dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)
        weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)
        criterion = nn.CrossEntropyLoss()
        log_hyperparameters(trial)
        # Model, loss, and optimizer
        # model = IntentClassifierLSTM(cfg.vocab_size, embedding_dim, hidden_dim, cfg.output_dim,dropout_rate).to(device)
        model = IntentClassifierLSTMWithAttention(vocab_size, embedding_dim, hidden_dim, output_dim, dropout_rate).to(device)
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
        kfold = KFold(n_splits=5, shuffle=True, random_state=42)
        fold_val_acc = []

        for fold, (train_idx, val_idx) in enumerate(kfold.split(train_df)):
            # Prepare fold data
            train_data_subset = tokenizer.process_data(train_df.loc[train_idx,:], device=device)
            val_data_subset = tokenizer.process_data(train_df.loc[val_idx,:], device=device)
            train_subset_loader = DataLoader(train_data_subset, batch_size=batch_size, shuffle=True)
            val_subset_loader = DataLoader(val_data_subset, batch_size=batch_size, shuffle=False)
            fold_loss = train(model, optimizer, criterion, train_subset_loader, epochs)
            val_accuracy = evaluate(model,  criterion, val_subset_loader, data_type="Validation")
            print(f'Fold: {fold + 1}, Training Loss: {fold_loss:.4f}, Validation Accuracy: {val_accuracy:.4f}')
            fold_val_acc.append(val_accuracy)
        average_val_acc = sum(fold_val_acc) / len(fold_val_acc)

        # Log metrics
        mlflow.log_metric("train_loss", fold_loss)
        print(f'Foldloss: {fold_loss:.4f}')
        mlflow.log_metric("accuracy", average_val_acc)
        print(f'Average validation accuracy: {average_val_acc:.4f}')
        test_accuracy = evaluate(model, nn.CrossEntropyLoss(), test_loader, data_type="Test")
        print(f'Test Accuracy: {test_accuracy:.4f}')
        mlflow.log_metric("test_accuracy", test_accuracy)
        mlflow.pytorch.log_model(model, f"best_model_{study.study_name}")
        if test_accuracy>0.97:
            mlflow.pytorch.log_model(model, f"best_model_{study.study_name}_test_accuracy_{test_accuracy}")
    return average_val_acc

In [8]:
from pathlib import Path
from torch.utils.data import DataLoader
mlflow.set_tracking_uri('http://127.0.0.1:1234')

model_class_name = "IntentClassifierLSTMWithAttention"
#optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
storage_name = "sqlite:///data/db/{}.db".format(model_class_name)
study = optuna.create_study(study_name=model_class_name, load_if_exists=True, storage=storage_name,direction="maximize")
study.optimize(objective_ELSTMA, n_trials=2)
best_trial = study.best_trial

lr: 0.002953811327086381, hidden_dim: 64, embedding_dim: 128, dropout_rate: 0.3284367520175706, weight_decay: 2.4930518646037698e-05
Epoch [1/3], Loss: 1.0852, Accuracy: 0.7896
Test Loss: 0.5672
Test Accuracy: 0.8835
Epoch [2/3], Loss: 0.3616, Accuracy: 0.9274
Test Loss: 0.4296
Test Accuracy: 0.9071
Epoch [3/3], Loss: 0.1755, Accuracy: 0.9601
Test Loss: 0.3695
Test Accuracy: 0.9318
Validation Loss: 0.1034
Validation Accuracy: 0.9720
Fold: 1, Training Loss: 0.1755, Validation Accuracy: 0.9720
Epoch [1/3], Loss: 0.1325, Accuracy: 0.9690
Test Loss: 0.2747
Test Accuracy: 0.9341
Epoch [2/3], Loss: 0.0914, Accuracy: 0.9752
Test Loss: 0.2964
Test Accuracy: 0.9306
Epoch [3/3], Loss: 0.0744, Accuracy: 0.9808
Test Loss: 0.3573
Test Accuracy: 0.9459
Validation Loss: 0.1369
Validation Accuracy: 0.9784
Fold: 2, Training Loss: 0.0744, Validation Accuracy: 0.9784
Epoch [1/3], Loss: 0.0855, Accuracy: 0.9798
Test Loss: 0.2142
Test Accuracy: 0.9541
Epoch [2/3], Loss: 0.0546, Accuracy: 0.9876
Test Loss: 

## Best Model Training and Evaluation (Saving, Reloading the best model, through optuna or mlflow)
Once I find the best set of hyper parameters, you can open mlflow on tracking uri, e.g. in my case , I set http://127.0.0.1:1234. Also if you install optuna-dashboard, you can see very nice visualizations and get very detailed insights on hyperparameter space, their importance, and why some combinations work more than others. I invite to try it. Tune the best model with a larger number of epochs and evaluate it on the test set. You can reload an optuna study from its' db later to get the best parameters and train a model.

### Train the best model
Let's say that you have saved an optuna study and you want to reload it later. You feel that the number of epochs were not enough and you want to train the best model with more epochs. Or you want save the model initialization parameters so that you can load the model initial state later and use it along with the saved state dictionary. You can do the following:

In [9]:
import optuna
import logging
import sys
# Reload Optuna Study to get the best parameters
model_class_name = "IntentClassifierLSTMWithAttention"
#experiment_id = get_or_create_experiment(model_class_name)
if experiment := mlflow.get_experiment_by_name(model_class_name):
    experiment_id= experiment.experiment_id
else:
    experiment_id=mlflow.create_experiment(model_class_name)

In [10]:

storage_name = "sqlite:///data/db/{}.db".format(model_class_name)
study = optuna.create_study(study_name=model_class_name, load_if_exists=True, storage=storage_name,direction="maximize")
mlflow.set_experiment(experiment_id=experiment_id)
storage = optuna.storages.RDBStorage(url=f"sqlite:///data/db/{model_class_name}.db")
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout));
study = optuna.create_study(study_name=model_class_name, storage=storage,load_if_exists=True,direction="maximize");

In [11]:
best_trial = study.best_trial
print(f'Best trial: score {best_trial.value:.4f},\nparams {best_trial.params}')

Best trial: score 0.9903,
params {'lr': 0.0013487934809448435, 'hidden_dim': 64, 'embedding_dim': 128, 'dropout_rate': 0.17918187700846566, 'weight_decay': 4.75333827188117e-05}


In [12]:
from machine_learning.learners.IntentClassifierLSTMWithAttention import IntentClassifierLSTMWithAttention
from machine_learning.pipelines.data_loaders import train_loader, test_loader, tokenizer
from machine_learning.learners.model_utils import train, evaluate


with mlflow.start_run():
    # define constants and hyperparameters
    num_epochs = 15
    model = IntentClassifierLSTMWithAttention(
        vocab_size=len(tokenizer.word2idx)+1,
        embedding_dim=256,
        hidden_dim=128,
        output_dim=len(tokenizer.le.classes_),
        dropout_rate=0.4
    ).to(device)
    
    optimizer = optim.Adam(model.parameters(), lr=0.0013,
                           weight_decay=4.5e-06)
    train_loss = train(model, optimizer, nn.CrossEntropyLoss(), train_loader, num_epochs)
    test_accuracy = evaluate(model, nn.CrossEntropyLoss(), test_loader, data_type="Test")
    mlflow.log_metric("test_accuracy", test_accuracy)
    print(f'Test Accuracy: {test_accuracy:.4f}')

Epoch [1/15], Loss: 1.1125, Accuracy: 0.7943
Test Loss: 0.6413
Test Accuracy: 0.8835
Epoch [2/15], Loss: 0.3232, Accuracy: 0.9344
Test Loss: 0.6764
Test Accuracy: 0.9106
Epoch [3/15], Loss: 0.1815, Accuracy: 0.9603
Test Loss: 0.3442
Test Accuracy: 0.9400
Epoch [4/15], Loss: 0.1244, Accuracy: 0.9722
Test Loss: 0.3034
Test Accuracy: 0.9482
Epoch [5/15], Loss: 0.0868, Accuracy: 0.9791
Test Loss: 0.3000
Test Accuracy: 0.9518
Epoch [6/15], Loss: 0.0682, Accuracy: 0.9827
Test Loss: 0.3063
Test Accuracy: 0.9565
Epoch [7/15], Loss: 0.0700, Accuracy: 0.9842
Test Loss: 0.3326
Test Accuracy: 0.9576
Epoch [8/15], Loss: 0.0535, Accuracy: 0.9862
Test Loss: 0.4206
Test Accuracy: 0.9600
Epoch [9/15], Loss: 0.0600, Accuracy: 0.9864
Test Loss: 0.3738
Test Accuracy: 0.9576
Epoch [10/15], Loss: 0.0520, Accuracy: 0.9875
Test Loss: 0.3775
Test Accuracy: 0.9553
Epoch [11/15], Loss: 0.0249, Accuracy: 0.9937
Test Loss: 0.2841
Test Accuracy: 0.9647
Early stopping
Test Loss: 0.2852
Test Accuracy: 0.9647
Test Acc

### Save the model
One can save the model in a number of ways. torch.save and torch.load are the most common ways, however this creates a pickle file and can cause python version issues. Saving the model state dictionary and loading it later is a better way. Because it can be mapped to any model class and device. For quick notebook saving and loading I use torch.save and torch.load. For production, I use model state dictionary and provide a clever construct to save the model. The model is saved with its class name. The parameters are saved in order in a json file. Later on this model can be recreated by providing the class name in the constructor and loading the parameters from the json file. The model state dictionary is saved in a separate file. The tokenizer and label encoder are also saved in separate files.



In [13]:
import torch
class_name=model.__class__.__name__
print(f"class_name={class_name}")
model.save_config_file(f"config/model_initialization/{class_name}.json")
torch.save(model.state_dict(),f"data/models/{class_name}_state_dict.pth")

class_name=IntentClassifierLSTMWithAttention


### Load the model
class_name is a string that can be used to create the model. The model state dictionary is loaded from the saved file. The tokenizer and label encoder are also loaded from the saved files.

In [14]:
# Load the trained model and tokenizer from saved files
from machine_learning.learners.intent_classifier import IntentClassifier
class_name="IntentClassifierLSTMWithAttention"
model_serve=IntentClassifier(class_name)
model_serve.load(class_name)

True

In [15]:
import pandas as pd

query_str = "how much do you charge for a flight to New York"
response = model_serve.predict(query_str)
# Creating a DataFrame with the query string
print(response)

[{'label': 'airfare', 'confidence': 0.9917524456977844}, {'label': 'abbreviation', 'confidence': 0.0018157875165343285}, {'label': 'ground_fare', 'confidence': 0.0017492893384769559}]


In [16]:
query_str = "I want to book a hotel in New York"
response = model_serve.predict(query_str)
# Creating a DataFrame with the query string
print(response)

[{'label': 'flight', 'confidence': 0.9957910776138306}, {'label': 'flight_time', 'confidence': 0.0013819440500810742}, {'label': 'airport', 'confidence': 0.0005671249236911535}]


## Visualize the results
Optuna provides a number of visualizations to analyze the results of the hyperparameter tuning experiment. I use these visualizations to analyze the results of the hyperparameter tuning experiment.

In [17]:
fig = optuna.visualization.plot_contour(study)
fig.update_layout(
    width=1600,   # Width of the figure in pixels
    height=800   # Height of the figure in pixels
)
fig.write_image("machine_learning/notebooks/plotly_contours.png") 

![Parallel Coordinates](plotly_contours.png)

You can make very nice conclusions about embedding dimensions and hidden dimmesions along with other dimensions and even restart the study from here. For 
1. Embedding dimension is ideal somewhere between 100 and 300
2. Hidden dimension is ideal between 50 50 100
3. learning rate between 0.001 and 0.0001.
4. dropout rate between 0.3 to 0.4

In [18]:
fig = optuna.visualization.plot_parallel_coordinate(study)
fig.update_layout(
    width=1600,   # Width of the figure in pixels
    height=800   # Height of the figure in pixels
)
fig.write_image("machine_learning/notebooks/plot_parallel_coordinate.png") 

![Parallel Coordinates](plot_parallel_coordinate.png)

To check Optuna's complete set of visualizations do the following:

!pip install optuna-dashboard

!optuna-dashboard sqlite:///IntentClassifierLSTMWithAttention.db 

## Save any model and tokenizer combination, and server from any model and tokenizer combination
While building different models, there can be situations when a user is interested in saving any model_token combination and serving from it later. The following functions can be used to save and serve from.

In [19]:
from machine_learning.learners.model_utils import save_modelname_tokenizer, serve_modelname_query
model_name="best_ICELSTMAmodel"
save_modelname_tokenizer(model, "Best_ELSTMA", tokenizer)
serve_modelname_query(model_name, "how much do you charge for a flight to New York")


Predicted label: ['airfare']


array(['airfare'], dtype=object)

### Next Step for model improvements

Visualize the impact of Attentions and Embeddings
One can visualize how attention and embeddings are playing but in the interest of time, I will not much further into improvment of training mechanism. I will shift to model evaulation and performance monitoring in production mode in the next notebook.