# Hyperparameter Tuning for Intent Classification Using an LSTM with Attention Mechanism
This notebook is the second part of the Ultimate AI Challenge. In the previous notebook, I trained an LSTM model with an attention mechanism for intent classification on the ATIS dataset. In this notebook, I run a hyperparameter optimization to find the best parameters for the model and log the experiments to an ML tracking server, MLflow. Proprietary clouds already offer experiment logging, model registry, and deployment services, e.g. Vertex AI in GCP, Azure ML, and Amazon SageMaker; I have worked with Vertex AI and Azure ML. MLflow + Optuna is a good open-source and scalable alternative to them. I use Optuna for hyperparameter tuning and MLflow for experiment tracking and model registry. Model serving is done through PyTorch's save and load methods. To avoid pickling and version issues, one can also save and load model state dictionaries. The notebook is divided into the following sections:
1. Data Preparation
2. Hyperparameter Tuning
3. Model Training and Evaluation
4. Visualize the results
5. Model Export, Registry, Model Loading and Inference

Let's start

In [1]:
import sys
from sklearn.model_selection import KFold
import torch.nn as nn
from machine_learning.IntentClassifierLSTMWithAttention import IntentClassifierLSTMWithAttention
from machine_learning.IntentTokenizer import IntentTokenizer
import torch.optim as optim
import pandas as pd
import torch
from torch.utils.data import DataLoader
from machine_learning.model_utils import train, evaluate, predict, get_or_create_experiment
import optuna
import logging
import mlflow
from optuna.visualization import plot_optimization_history
import matplotlib.pyplot as plt


import warnings
warnings.filterwarnings("ignore")

device=torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

Using device: mps


## Data Preparation
Steps:
1. Load the data
2. Tokenize the data
3. Create a PyTorch Dataset
4. Create a PyTorch DataLoader

In [2]:
optuna.logging.set_verbosity(optuna.logging.ERROR)

train_df = pd.read_csv('data/atis/train.tsv', sep='\t', header=None, names=["text", "label"])
test_df = pd.read_csv('data/atis/test.tsv', sep='\t', header=None, names=["text", "label"])
tokenizer = IntentTokenizer(train_df)

inside IntentTokenizer
Actual Vocabulary Size: 890
Encoding labels for the first time and adding unknown class.
Label Encoding: {'abbreviation': 0, 'aircraft': 1, 'aircraft+flight+flight_no': 2, 'airfare': 3, 'airfare+flight_time': 4, 'airline': 5, 'airline+flight_no': 6, 'airport': 7, 'capacity': 8, 'cheapest': 9, 'city': 10, 'distance': 11, 'flight': 12, 'flight+airfare': 13, 'flight_no': 14, 'flight_time': 15, 'ground_fare': 16, 'ground_service': 17, 'ground_service+ground_fare': 18, 'meal': 19, 'quantity': 20, 'restriction': 21, '<unknown>': 22}


In [3]:

# define constants and hyperparameters
vocab_size=tokenizer.max_vocab_size
output_dim=len(tokenizer.le.classes_)
batch_size = 32
num_epochs = 3

train_data = tokenizer.process_data(train_df, device=device)
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
print("Number of training samples:", train_data.tensors[0].size())
print("Number of training batches:", len(train_loader))

test_data = tokenizer.process_data(test_df, device=device)
print("Number of test samples:", test_data.tensors[0].size())
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)
print("Number of test batches:", len(test_loader))

Number of training samples: torch.Size([4634, 46])
Number of training batches: 145
Number of test samples: torch.Size([850, 30])
Number of test batches: 27


## Hyperparameter Tuning
The model I am tuning is IntentClassifierLSTMWithAttention. The search space, as configured in the objective function below, is:
1. Learning rate: 1e-4 to 1e-2 (log scale)
2. Hidden dimension: 32, 64, 256
3. Embedding dimension: 64, 128, 256
4. Dropout rate: 0.1 to 0.5
5. Weight decay: 1e-5 to 1e-3 (log scale)

The objective is the average validation accuracy over the cross-validation folds (3 in the code below), and the best trial is the one with the highest average. Note that the test data is completely hidden from Optuna's accuracy optimizer. One can choose any other metric to optimize, for example minimizing the loss.
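
The learning rate and weight decay are sampled with log=True in the objective below because their ranges span orders of magnitude. As a rough illustration of what log-uniform sampling means, here is a minimal standard-library sketch. It is not Optuna's sampler, and `sample_log_uniform` is a hypothetical helper written for this note.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]

# Each decade receives roughly half of the samples; a plain uniform draw
# would put about 91% of them in the upper decade [1e-3, 1e-2].
share_lower_decade = sum(s < 1e-3 for s in samples) / len(samples)
print(f"share of samples below 1e-3: {share_lower_decade:.2f}")
```

Sampling on a log scale gives small learning rates the same chance of being explored as large ones, which is usually what you want for this kind of parameter.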

In [4]:
def log_hyperparameters(trial):
    """Log the sampled hyperparameters of a trial to MLflow and print them."""
    mlflow.log_param("lr", trial.params["lr"])
    mlflow.log_param("hidden_dim", trial.params["hidden_dim"])
    mlflow.log_param("embedding_dim", trial.params["embedding_dim"])
    mlflow.log_param("dropout_rate", trial.params["dropout_rate"])
    mlflow.log_param("weight_decay", trial.params["weight_decay"])
    print(f'lr: {trial.params["lr"]}, hidden_dim: {trial.params["hidden_dim"]}, embedding_dim: {trial.params["embedding_dim"]}, dropout_rate: {trial.params["dropout_rate"]}, weight_decay: {trial.params["weight_decay"]}')


def log_metrics(trial, accuracy):
    """Log the trial's average validation accuracy to MLflow."""
    mlflow.log_metric("accuracy", accuracy)


def objective(trial):
    with mlflow.start_run():
        # Suggest hyperparameters
        lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
        hidden_dim = trial.suggest_categorical("hidden_dim", [32, 64, 256])
        embedding_dim = trial.suggest_categorical("embedding_dim", [64, 128, 256])
        dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)
        weight_decay = trial.suggest_float("weight_decay", 1e-5, 1e-3, log=True)
        criterion = nn.CrossEntropyLoss()
        log_hyperparameters(trial)
        # Model and optimizer. Both are created once per trial, so the weights
        # carry over from one fold to the next.
        model = IntentClassifierLSTMWithAttention(vocab_size, embedding_dim, hidden_dim, output_dim, dropout_rate).to(device)
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
        kfold = KFold(n_splits=3, shuffle=True, random_state=42)
        fold_val_acc = []

        for fold, (train_idx, val_idx) in enumerate(kfold.split(train_df)):
            # Prepare fold data
            train_data_subset = tokenizer.process_data(train_df.loc[train_idx,:], device=device)
            val_data_subset = tokenizer.process_data(train_df.loc[val_idx,:], device=device)
            train_subset_loader = DataLoader(train_data_subset, batch_size=batch_size, shuffle=True)
            val_subset_loader = DataLoader(val_data_subset, batch_size=batch_size, shuffle=False)
            fold_loss = train(model, optimizer, criterion, train_subset_loader, num_epochs)
            val_accuracy = evaluate(model,  criterion, val_subset_loader, data_type="Validation")
            print(f'Fold: {fold + 1}, Training Loss: {fold_loss:.4f}, Validation Accuracy: {val_accuracy:.4f}')
            fold_val_acc.append(val_accuracy)
        average_val_acc = sum(fold_val_acc) / len(fold_val_acc)
        print(f'Average validation accuracy: {average_val_acc:.4f}')
        log_metrics(trial, average_val_acc)
    return average_val_acc
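
One thing to note about the objective above: the model and optimizer are created once per trial, so each fold continues training from the previous fold's weights, and later validation folds contain samples the model has already been trained on. A more conventional pattern builds a fresh model inside the fold loop. The sketch below shows that pattern with the standard library only; `kfold_indices`, `cross_validate`, `make_model`, `fit`, and `unseen` are hypothetical stand-ins, not the notebook's `train` and `evaluate` helpers.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits, seed=42):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx


def cross_validate(make_model, fit, score, n_samples, n_splits=3):
    """Average validation score, calling make_model() once per fold."""
    scores = []
    for train_idx, val_idx in kfold_indices(n_samples, n_splits):
        model = make_model()
        fit(model, train_idx)
        scores.append(score(model, val_idx))
    return mean(scores)


# Toy stand-ins: the "model" only remembers which samples it was trained on,
# and the score is the fraction of validation samples it has never seen.
def make_model():
    return {"seen": set()}


def fit(model, idx):
    model["seen"].update(idx)


def unseen(model, idx):
    return sum(i not in model["seen"] for i in idx) / len(idx)


# Fresh model per fold: every validation sample is unseen.
leak_free = cross_validate(make_model, fit, unseen, n_samples=30)

# One model shared across folds: later folds validate on samples that were
# already used for training, so the unseen fraction drops below 1.
shared = make_model()
leaky = cross_validate(lambda: shared, fit, unseen, n_samples=30)
print(leak_free, leaky)
```

In the notebook's setting this would mean moving the model and optimizer construction into the `for fold, ...` loop, at the cost of changing the reported fold accuracies.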

## Optuna Study for Hyperparameter Optimization
Create an experiment in MLflow and run the hyperparameter tuning study. The best model is the one with the highest average validation accuracy. You can install MLflow and Optuna with pip. To start the MLflow server, I used the following command:
`mlflow server --backend-store-uri=sqlite:///mlrunsdb15.db --default-artifact-root=file:mlruns --host 127.0.0.1 --port 1234`
Once started, the server creates an mlruns directory and a database in the project root to log progress and to store models and experiments. In practice, MLflow is installed on a server and clients send HTTP requests to it; here I keep it simple and run it locally. Pay close attention to the directories you pass in the command. Optuna also creates a database to store its studies. I chose 20 trials for the study; with 5-fold cross-validation and 5 epochs per fold, it takes less than an hour on my M1 machine. The cells shown here use smaller settings (2 trials, 3 folds, 3 epochs). By default, Optuna uses the TPE (Tree-structured Parzen Estimator) sampler, a form of Bayesian optimization.

Check the MLflow Server

pip install mlflow

`mlflow server --backend-store-uri=mlruns --default-artifact-root=file:mlruns --host 127.0.0.1 --port 1234`

In [5]:
mlflow.set_tracking_uri('http://127.0.0.1:1234')

model_class_name = "IntentClassifierLSTMWithAttention"
#optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
# Optuna stores the study in a local SQLite database named after the model class
storage_name = f"sqlite:///{model_class_name}.db"
study = optuna.create_study(study_name=model_class_name, load_if_exists=True, storage=storage_name, direction="maximize")
study.optimize(objective, n_trials=2)
best_trial = study.best_trial

lr: 0.0008968117895606247, hidden_dim: 32, embedding_dim: 128, dropout_rate: 0.34900153507625337, weight_decay: 3.268795236062601e-05
Epoch [1/3], Loss: 2.7140, Accuracy: 0.3538
Epoch [2/3], Loss: 1.4381, Accuracy: 0.7789
Epoch [3/3], Loss: 0.7764, Accuracy: 0.8608
Validation Loss: 0.4229
Validation Accuracy: 0.9068
Fold: 1, Training Loss: 0.7764, Validation Accuracy: 0.9068
Epoch [1/3], Loss: 0.5246, Accuracy: 0.8945
Epoch [2/3], Loss: 0.4129, Accuracy: 0.9107
Epoch [3/3], Loss: 0.3470, Accuracy: 0.9262
Validation Loss: 0.2754
Validation Accuracy: 0.9288
Fold: 2, Training Loss: 0.3470, Validation Accuracy: 0.9288
Epoch [1/3], Loss: 0.3387, Accuracy: 0.9233
Epoch [2/3], Loss: 0.2684, Accuracy: 0.9375
Epoch [3/3], Loss: 0.2361, Accuracy: 0.9427
Validation Loss: 0.1582
Validation Accuracy: 0.9618
Fold: 3, Training Loss: 0.2361, Validation Accuracy: 0.9618
Average validation accuracy: 0.9325
lr: 0.0019924748555873264, hidden_dim: 64, embedding_dim: 128, dropout_rate: 0.26840519079042957, 

## Best Model Training and Evaluation (Saving, Reloading the best model, through optuna or mlflow)
Once the search has found the best set of hyperparameters, you can inspect the runs in the MLflow UI at the tracking URI (in my case http://127.0.0.1:1234). If you install optuna-dashboard, you also get very nice visualizations and detailed insights into the hyperparameter space, the importance of each hyperparameter, and why some combinations work better than others; I invite you to try it. The next step is to train the best model for a larger number of epochs and evaluate it on the test set. You can reload an Optuna study from its database later to get the best parameters and train a model.

### Train the best model
Let's say that you have saved an Optuna study and want to reload it later. Perhaps the number of epochs was not enough and you want to train the best model for more epochs, or you want to save the model's initialization parameters so that you can later rebuild the model and use it with the saved state dictionary. You can do the following:

In [6]:
import optuna
import logging
import sys

# Reload the Optuna study to get the best parameters
model_class_name = "IntentClassifierLSTMWithAttention"
experiment_id = get_or_create_experiment(model_class_name)
mlflow.set_experiment(experiment_id=experiment_id)

# Send Optuna's log messages to stdout
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
storage = optuna.storages.RDBStorage(url=f"sqlite:///{model_class_name}.db")
# load_if_exists=True reuses the stored study instead of raising an error
study = optuna.create_study(study_name=model_class_name, storage=storage, load_if_exists=True, direction="maximize")

In [7]:
best_trial = study.best_trial
print(f'Best trial: score {best_trial.value:.4f},\nparams {best_trial.params}')

Best trial: score 0.9903,
params {'lr': 0.0013487934809448435, 'hidden_dim': 64, 'embedding_dim': 128, 'dropout_rate': 0.17918187700846566, 'weight_decay': 4.75333827188117e-05}


In [8]:
# Train the best model further on the full training set and evaluate it on the test set
from machine_learning.IntentTokenizer import IntentTokenizer
from machine_learning.IntentClassifierLSTMWithAttention import IntentClassifierLSTMWithAttention
from machine_learning.model_utils import train, evaluate, predict
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import torch
from torch.utils.data import DataLoader

device=torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")

train_df = pd.read_csv('data/atis/train.tsv', sep='\t', header=None, names=["text", "label"])
#ood_df=pd.read_csv('data/atis/ood.tsv', sep='\t', header=None, names=["text", "label"])
#train_df=pd.concat([train_df,ood_df])
test_df = pd.read_csv('data/atis/test.tsv', sep='\t', header=None, names=["text", "label"])

tokenizer = IntentTokenizer(train_df,5000)

train_data = tokenizer.process_data(train_df, device=device)
test_data = tokenizer.process_data(test_df, device=device)
batch_size = 32
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

with mlflow.start_run():
    # define constants and hyperparameters
    num_epochs = 30
    model = IntentClassifierLSTMWithAttention(
        vocab_size=len(tokenizer.word2idx)+1,
        embedding_dim=best_trial.params['embedding_dim'],
        hidden_dim=best_trial.params['hidden_dim'],
        output_dim=len(tokenizer.le.classes_),
        dropout_rate=best_trial.params['dropout_rate']
    ).to(device)
    
    optimizer = optim.Adam(model.parameters(), lr=best_trial.params['lr'],
                           weight_decay=best_trial.params['weight_decay'])
    train_loss = train(model, optimizer, nn.CrossEntropyLoss(), train_loader, num_epochs)
    test_accuracy = evaluate(model, nn.CrossEntropyLoss(), test_loader, data_type="Test")
    mlflow.log_metric("test_accuracy", test_accuracy)
    print(f'Test Accuracy: {test_accuracy:.4f}')

Epoch [1/30], Loss: 1.3819, Accuracy: 0.7328
Epoch [2/30], Loss: 0.3311, Accuracy: 0.9338
Epoch [3/30], Loss: 0.2089, Accuracy: 0.9545
Epoch [4/30], Loss: 0.1195, Accuracy: 0.9724
Epoch [5/30], Loss: 0.0988, Accuracy: 0.9778
Epoch [6/30], Loss: 0.0829, Accuracy: 0.9814
Epoch [7/30], Loss: 0.0495, Accuracy: 0.9892
Epoch [8/30], Loss: 0.0462, Accuracy: 0.9892
Epoch [9/30], Loss: 0.0546, Accuracy: 0.9894
Epoch [10/30], Loss: 0.0304, Accuracy: 0.9931
Epoch [11/30], Loss: 0.0317, Accuracy: 0.9922
Epoch [12/30], Loss: 0.0336, Accuracy: 0.9914
Epoch [13/30], Loss: 0.0291, Accuracy: 0.9927
Epoch [14/30], Loss: 0.0345, Accuracy: 0.9907
Epoch [15/30], Loss: 0.0232, Accuracy: 0.9944
Epoch [16/30], Loss: 0.0174, Accuracy: 0.9963
Epoch [17/30], Loss: 0.0129, Accuracy: 0.9974
Epoch [18/30], Loss: 0.0156, Accuracy: 0.9953
Epoch [19/30], Loss: 0.0167, Accuracy: 0.9955
Epoch [20/30], Loss: 0.0133, Accuracy: 0.9972
Epoch [21/30], Loss: 0.0113, Accuracy: 0.9965
Epoch [22/30], Loss: 0.0179, Accuracy: 0.99

### Save the model
Let's use a clever construct to save the model. The model's constructor parameters are saved, in order, to a JSON file named after the model class. Later, the model can be recreated by providing the class name and loading the parameters from the JSON file. The model state dictionary is saved in a separate file, and the tokenizer and label encoder are also saved in separate files.

In [17]:
class_name=model.__class__.__name__
print(f"class_name={class_name}")
model.save_config_file(f"config/{class_name}.json")
torch.save(model.state_dict(),f"models/{class_name}_state_dict.pth")

class_name=IntentClassifierLSTMWithAttention
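
To make the idea concrete, here is a minimal standard-library sketch of the save-config-then-rebuild pattern. It only illustrates the mechanism: `TinyIntentModel`, `MODEL_REGISTRY`, and `load_model_from_config` are hypothetical names, the parameter values are toy values, and the real `save_config_file` in this project may be implemented differently.

```python
import json
import os
import tempfile


class TinyIntentModel:
    """Hypothetical stand-in for a model class; it only stores its config."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, dropout_rate):
        self.config = {
            "vocab_size": vocab_size,
            "embedding_dim": embedding_dim,
            "hidden_dim": hidden_dim,
            "output_dim": output_dim,
            "dropout_rate": dropout_rate,
        }

    def save_config_file(self, path):
        """Write the constructor arguments to a JSON file."""
        with open(path, "w") as f:
            json.dump(self.config, f, indent=2)


# Registry that maps a class name (a plain string) back to the class itself.
MODEL_REGISTRY = {cls.__name__: cls for cls in (TinyIntentModel,)}


def load_model_from_config(class_name, path):
    """Rebuild a model from its class name and its saved JSON config."""
    with open(path) as f:
        config = json.load(f)
    return MODEL_REGISTRY[class_name](**config)


# Toy values, for illustration only.
original = TinyIntentModel(vocab_size=100, embedding_dim=16, hidden_dim=8,
                           output_dim=5, dropout_rate=0.2)
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, f"{type(original).__name__}.json")
    original.save_config_file(config_path)
    rebuilt = load_model_from_config("TinyIntentModel", config_path)

print(rebuilt.config == original.config)
```

With a PyTorch model, the rebuilt instance has freshly initialized weights, so the saved state dictionary would still need to be loaded into it with `load_state_dict` before inference.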


### Load the model
class_name is a string that can be used to create the model. The model state dictionary is loaded from the saved file. The tokenizer and label encoder are also loaded from the saved files.

In [18]:
# Load the trained model and tokenizer from saved files
from intent_classifier import IntentClassifier
class_name="IntentClassifierLSTMWithAttention"
model_serve=IntentClassifier(class_name)
model_serve.load(class_name)

True

In [19]:
query_str = "how much do you charge for a flight to New York"
response = model_serve.predict(query_str)
# Print the top predictions with their confidence scores
print(response)

[{'label': 'airfare', 'confidence': 0.9998663663864136}, {'label': 'abbreviation', 'confidence': 4.765590711031109e-05}, {'label': 'flight+airfare', 'confidence': 2.4344944904441945e-05}]
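
A response in this shape (the top three labels with confidences) can be derived from raw model logits with a softmax followed by a sort. The sketch below is an assumption about how such a response could be built, not the actual implementation of `model_serve.predict`; `top_k_predictions` is a hypothetical helper and the logits are made-up values.

```python
import math


def top_k_predictions(logits, labels, k=3):
    """Convert raw logits to softmax confidences and return the k best labels."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "confidence": prob} for label, prob in ranked[:k]]


# Made-up logits for four of the intent labels, for illustration only.
labels = ["airfare", "flight", "ground_service", "airline"]
logits = [4.0, 1.0, 0.5, -1.0]
top = top_k_predictions(logits, labels)
print(top)
```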


In [20]:
query_str = "I want to book a hotel in New York"
response = model_serve.predict(query_str)
# Print the top predictions with their confidence scores
print(response)

[{'label': 'flight', 'confidence': 0.9997010827064514}, {'label': 'aircraft', 'confidence': 8.617574349045753e-05}, {'label': 'ground_service', 'confidence': 5.792764568468556e-05}]


## Visualize the results
Optuna provides a number of visualizations for analyzing the results of a hyperparameter tuning study. I use two of them here: a contour plot and a parallel coordinate plot.

In [21]:
fig = optuna.visualization.plot_contour(study)
fig.update_layout(
    width=1600,   # Width of the figure in pixels
    height=800   # Height of the figure in pixels
)
fig.write_image("plotly_contours.png") 

![Plotly Contours](plotly_contours.png)

You can draw useful conclusions about the embedding and hidden dimensions, as well as the other hyperparameters, and even restart the study from here. For example:
1. The ideal embedding dimension is somewhere between 100 and 300.
2. The ideal hidden dimension is between 50 and 100.
3. The ideal learning rate is between 0.0001 and 0.001.
4. The ideal dropout rate is between 0.3 and 0.4.
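
One way to turn observations like these into a narrower search space for a follow-up study is to look at the parameter ranges covered by the best trials. The sketch below does this on plain dictionaries; `summarize_top_trials` is a hypothetical helper and the trial records are made up for illustration, not results from this study.

```python
def summarize_top_trials(trials, top_n=3):
    """Return {param_name: (min, max)} over the top_n trials by score."""
    best = sorted(trials, key=lambda t: t["value"], reverse=True)[:top_n]
    names = best[0]["params"].keys()
    return {
        name: (
            min(t["params"][name] for t in best),
            max(t["params"][name] for t in best),
        )
        for name in names
    }


# Made-up trial records, for illustration only.
toy_trials = [
    {"value": 0.95, "params": {"lr": 1e-3, "hidden_dim": 64}},
    {"value": 0.90, "params": {"lr": 5e-3, "hidden_dim": 32}},
    {"value": 0.97, "params": {"lr": 2e-3, "hidden_dim": 64}},
    {"value": 0.80, "params": {"lr": 1e-2, "hidden_dim": 256}},
]
ranges = summarize_top_trials(toy_trials, top_n=2)
print(ranges)
```

With a real study, the records could be built from `study.trials`, since each trial exposes `.value` and `.params`; trials without a value (failed or pruned) would need to be filtered out first.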

In [22]:
fig = optuna.visualization.plot_parallel_coordinate(study)
fig.update_layout(
    width=1600,   # Width of the figure in pixels
    height=800   # Height of the figure in pixels
)
fig.write_image("plot_parallel_coordinate.png") 

![Parallel Coordinates](plot_parallel_coordinate.png)

To check Optuna's complete set of visualizations do the following:

!pip install optuna-dashboard

!optuna-dashboard sqlite:///IntentClassifierLSTMWithAttention.db 

## Save the model and check model Serving

In [23]:
model_name = "best_ICELSTMAmodel"
torch.save(model, f"models/{model_name}.pth")
tokenizer.save_state(f"models/{model_name}_tokenizer.pickle", f"models/{model_name}_le.pickle")

In [24]:
# Load the trained model and tokenizer from saved files
model_serve = torch.load(f"models/{model_name}.pth").to(device)
tokenizer = IntentTokenizer.load_state(IntentTokenizer,f"models/{model_name}_tokenizer.pickle", f"models/{model_name}_le.pickle")
max_query_length = 50
query_text = "what airlines off from love field between 6 and 10 am on june sixth"
query = pd.DataFrame({"text": [query_text]})
prediction = predict(model_serve, query,tokenizer,device)
print(f"Predicted label: {prediction}")

Predicted label: ['airline']


### Next Steps for Model Improvement

Visualize the impact of attention and embeddings
One can visualize how the attention weights and embeddings behave, but in the interest of time I will not go further into improving the training mechanism. In the next notebook, I shift to model evaluation and performance monitoring in production.