## Intent Classification With PyTorch
In the previous notebooks, I focused on obtaining labeled data for my chatbot. This notebook uses PyTorch to classify intents in new, unseen user-generated data. The approach is now supervised: the model trains on the labels derived from the unsupervised learning in the preceding notebook.

### RASA Comparison

Rasa trains its intent classification step with an SVM and `GridSearchCV`, which lets it try different hyperparameter configurations ([source](https://medium.com/bhavaniravi/intent-classification-demystifying-rasanlu-part-4-685fc02f5c1d)). When deploying, the preprocessing pipeline should remain the same between training and testing.
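One simple way to keep preprocessing identical between training and deployment is to route both through a single function. The sketch below is only an illustration: `preprocess` and its lowercasing and regex rules are hypothetical stand-ins, not Rasa's pipeline and not the exact preprocessing used in this project.

```python
import re

def preprocess(text):
    """Hypothetical shared preprocessing: lowercase, drop punctuation, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()

# The function is applied to the training utterances ...
train_tokens = [preprocess(t) for t in ["My order hasn't arrived!", "Cancel my Prime subscription."]]

# ... and the very same function is applied to new user input at inference time.
new_tokens = preprocess("Where is my ORDER?")

print(train_tokens)
print(new_tokens)
```

Because both code paths call the same function, a change to the preprocessing rules cannot silently apply to training but not to inference.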

In [2]:
# Standard 
import collections
import yaml
import re
import os

# Data science
import pandas as pd
print(f"Pandas: {pd.__version__}")
import numpy as np
print(f"Numpy: {np.__version__}")

# Machine Learning
import sklearn
print(f"Sklearn: {sklearn.__version__}")

# Deep Learning
import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F

# Visualization 
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)

# Preprocessing and Torch
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# These imports are needed by the tokenization, padding and DataLoader cells below
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Reading in training data
train = pd.read_pickle('objects/train.pkl')
print(f'Training data: {train.head()}')

Pandas: 2.2.2
Numpy: 1.26.4
Sklearn: 1.4.2


FileNotFoundError: [Errno 2] No such file or directory: 'objects/train.pkl'

In [None]:
# Assuming 'train' is a DataFrame containing 'Utterance' and 'Intent' columns

# Tokenize the text data using PyTorch's tokenizer
tokenizer = get_tokenizer('basic_english')

# Tokenize and encode the text data
X_train = [tokenizer(text) for text in train['Utterance']]
y_train = train['Intent']

# Split the data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3, 
                                                  shuffle=True, stratify=y_train, random_state=7)

# Convert labels to PyTorch tensors
y_train = torch.tensor(y_train.values)
y_val = torch.tensor(y_val.values)

print(f'\nShape checks:\nX_train: {len(X_train)} X_val: {len(X_val)}\ny_train: {len(y_train)} y_val: {len(y_val)}')

## Torchtext Preprocessing

### Torchtext tokenizer 
- Add description later 

### Plan of Action
- Prepare the dataset 

In [2]:
%pwd

'c:\\Sagar Study\\ML and Learning\\Projects\\customer-support-bot\\amazon_customer_support\\notebooks'

- Steps taken
    - Create a vocabulary dictionary that maps words to indices 
    - Convert the words of each sequence into their corresponding indices using that dictionary 
    - Ensure a consistent sequence length when feeding sentences into the model 
    - To achieve this, pad sequences with zeros until they reach the length of the longest sequence 
    - Padding ensures uniformity; shorter maximum lengths are typically preferred because longer sequences make training harder 
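The steps above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual pipeline: the three tokenized utterances and the cap `max_cap` are made up, and index 0 is assumed to be reserved for padding and unknown words.

```python
from collections import Counter

# Hypothetical tokenized utterances (stand-ins for the real training data)
docs = [
    ["where", "is", "my", "order"],
    ["cancel", "my", "order", "please", "now"],
    ["hello"],
]

# 1. Build the vocabulary: frequent words get small indices, 0 is reserved for padding
counts = Counter(word for doc in docs for word in doc)
word_to_idx = {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

# 2. Convert each document to a sequence of indices (unknown words map to 0)
indexed = [[word_to_idx.get(word, 0) for word in doc] for doc in docs]

# 3. Truncate to max_length, then pad with zeros so every sequence has the same length
max_cap = 4
max_length = min(max(len(seq) for seq in indexed), max_cap)
padded = []
for seq in indexed:
    seq = seq[:max_length]
    padded.append(seq + [0] * (max_length - len(seq)))

print(word_to_idx)
print(padded)
```

Capping the length trades a little information from long utterances for smaller, more uniform inputs.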


In [3]:
# Tokenize the text data using torchtext's tokenizer
tokenizer = get_tokenizer("basic_english")

# Tokenize and encode the text data  
X_train = [tokenizer(text) for text in train_df["Utterance"]]
y_train = train_df["Intent"]

# Split the data into train and validation sets 
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=0.3, shuffle=True, stratify = y_train, random_state = SEED_VALUE)

# Convert labels to PyTorch Tensors 
y_train = torch.tensor(y_train.values)
y_val = torch.tensor(y_val.values)

print(f"\nShape checks:\nX_train: {len(X_train)} X_val: {len(X_val)} y_train: {len(y_train)} y_val: {len(y_val)}")

In [None]:
y_train 

In [None]:
#  Write a function to average all of the word vectors in a given paragraph
def get_sentence_vector(sentence, word_vectors):
    
    # Initialize the vector as all zeros
    vector = np.zeros(word_vectors.vector_size)
    
    num_words = 0
    # Loop over each word in the sentence
    for word in sentence:
        if word in word_vectors:
            vector += word_vectors[word]
            num_words += 1
    # Average the vector
    if num_words:
        vector /= num_words
    return vector

In [None]:
# Create a vocab mapping for words from the tokenized training set
# (most frequent words first; index 0 is reserved for padding and unknown words)
word_counts = collections.Counter(word for doc in X_train for word in doc)
word_to_idx = {word: index + 1 for index, (word, _) in enumerate(word_counts.most_common())}
index_to_word = {index: word for word, index in word_to_idx.items()}

# Convert documents to sequences of indices (words not seen in training map to 0)
indexed_X_train = [[word_to_idx.get(word, 0) for word in doc] for doc in X_train]
indexed_X_val = [[word_to_idx.get(word, 0) for word in doc] for doc in X_val]

# Pad (or truncate) sequences to a common length
max_length = min(max(len(doc) for doc in indexed_X_train), 100)

def pad_to_length(doc, length):
    doc = doc[:length]
    return doc + [0] * (length - len(doc))

padded_X_train = torch.tensor([pad_to_length(doc, max_length) for doc in indexed_X_train])
padded_X_val = torch.tensor([pad_to_length(doc, max_length) for doc in indexed_X_val])

# Define vocabulary size (+1 for the reserved index 0)
vocab_size = len(word_to_idx) + 1

print(f"Vocab size:\n{vocab_size}")
print(f"Max length:\n{max_length}")

print(f"padded_X_train\n{padded_X_train}")
print(f"padded_X_val\n{padded_X_val}")

Running the example fits the Tokenizer on 5 small documents and prints the details of the fitted Tokenizer. The 5 documents are then encoded using a word count.

Each document is encoded as a 9-element vector with one position per word, holding the value of the chosen encoding scheme for that word. In this case, a simple word count is used.
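A pure-Python sketch of the word-count encoding described above. The five documents are hypothetical, chosen so that the vocabulary has 8 words; with index 0 left unused (as a reserved slot), each document becomes a 9-element vector.

```python
# Hypothetical documents, not taken from the project's data
docs = ["well done", "good work", "great effort", "nice work", "excellent"]

# Assign each new word the next free index, starting at 1 (0 stays reserved)
word_index = {}
for doc in docs:
    for word in doc.split():
        if word not in word_index:
            word_index[word] = len(word_index) + 1

# Encode each document as a vector of word counts
vector_size = len(word_index) + 1
encoded = []
for doc in docs:
    vec = [0] * vector_size
    for word in doc.split():
        vec[word_index[word]] += 1
    encoded.append(vec)

print(word_index)
for vec in encoded:
    print(vec)
```

Note that a count encoding discards word order, which is why the rest of this notebook works with index sequences and embeddings instead.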

## Embedding Matrix 

PyTorch models for multiclass classification involve encoding the class labels (for example, one-hot or integer encoding) and using embeddings to represent documents. The notes below are specific to the PyTorch setup.

If we use Doc2Vec embeddings, how do we pass in our tweets? We may have to pass them in as full tweets, so check how the tweets are fed in; tokenization may need to happen at the tweet level. If we pass in, say, Tweet 57, it activates the corresponding input node so that the input is multiplied by the embedding of the 57th document.

In [None]:
# We can see that there are 4 different dimensionality options 
!ls models/glove.twitter.27B 

- Here we compute an index mapping of words to known embeddings by parsing the data dump of pre-trained embeddings 
- I use 50D because my X_train has a max_length of 32 

- Just include `weights and biases tracking` as a part of training mode 

In [None]:
# Using Glove Embeddings 
embeddings_index = {} 

with open("models/glove.twitter.27B/glove.twitter.27B.50d.txt", encoding="utf8") as f: 
    for line in f: 
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype="float32")
        embeddings_index[word] = coefs
# No explicit f.close() is needed: the with block closes the file

print(f"Found {len(embeddings_index)} word vectors.")

Now we can use the `embeddings_index` dictionary together with the `word_to_idx` dictionary to create an embedding matrix for initializing the embedding layer. The matrix uses the same dimensionality as the GloVe embeddings (50).

In [None]:
# Initializing required objects 
word_index = word_to_idx  # word -> index mapping built from the training vocabulary
EMBEDDING_DIM = 50 # Because we are using the 50D glove embeddings 

# Getting my embedding matrix (row 0 stays all zeros for padding)
embedding_matrix = np.zeros((len(word_index)+1, EMBEDDING_DIM))
for word, idx in word_index.items():
    embedding_vector = embeddings_index.get(word)
    # Words not found in the embedding index keep their all-zero row
    if embedding_vector is not None:
        embedding_matrix[idx] = embedding_vector

In [None]:
# embeddings_index is a dict, so it has a length rather than a shape
embedding_matrix.shape, len(embeddings_index)

Great, now we can start the modeling.

With a regular word embedding, the rows of the embedding matrix have to be ordered so that they match how the words are indexed in the vocabulary.

I made sure the embeddings are in the same order as the words in the model's vocabulary.

I also checked that domain-specific customer support words are present in my Twitter embeddings. One example is 'customer', which is indeed in my embeddings file.

<img src="visualizations/amazon.png" alt="Drawing" style="width: 400px; "/>
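A check like this can also be done programmatically instead of by inspecting the file. The sketch below is an assumption about how one might do it: `embedding_coverage` is a hypothetical helper, and the toy dictionary stands in for the real `embeddings_index` built from GloVe.

```python
def embedding_coverage(vocab_words, embeddings_index):
    """Return the fraction of vocabulary words with a pretrained vector, plus the missing words."""
    missing = [word for word in vocab_words if word not in embeddings_index]
    covered = 1 - len(missing) / len(vocab_words) if vocab_words else 0.0
    return covered, missing

# Toy stand-ins for the real vocabulary and the GloVe lookup
toy_embeddings = {"customer": [0.1, 0.2], "order": [0.3, 0.4], "refund": [0.5, 0.6]}
toy_vocab = ["customer", "order", "refund", "amazonhelp"]

coverage, missing = embedding_coverage(toy_vocab, toy_embeddings)
print(f"Coverage: {coverage:.0%}, missing: {missing}")
```

Words reported as missing keep an all-zero row in the embedding matrix, so a low coverage is worth knowing about before training.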

In [None]:
# Encoding the target variable 
le = LabelEncoder()
le.fit(y_train)

y_train = le.transform(y_train)
y_valid = le.transform(y_valid)

In [None]:
y_train 

In [None]:
# Tokenize and encode the text data
X_train = [tokenizer(text) for text in train["Utterance"]]
y_train = train["Intent"]

In [None]:
padded_X_train[1]

In [None]:
le.classes_

In [None]:
X_train, X_valid, y_train, y_valid = train_test_split(
    train_df["Utterance"], train_df["Intent"], test_size=0.3,
    shuffle=True, stratify=train_df["Intent"], random_state=SEED_VALUE)

**Build train and test dataloaders**

In [None]:
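# Make torch datasets from the padded index sequences and the integer-encoded labels
# (raw strings cannot be turned into tensors; labels must already be encoded, e.g. by the LabelEncoder)
train_ds = torch.utils.data.TensorDataset(padded_X_train, torch.as_tensor(np.asarray(y_train), dtype=torch.long))
valid_ds = torch.utils.data.TensorDataset(padded_X_val, torch.as_tensor(np.asarray(y_valid), dtype=torch.long))

# Create data loaders (only the training data needs shuffling)
train_loader = torch.utils.data.DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_ds, batch_size=BATCH_SIZE, shuffle=False)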

## Architect the Neural Network  
- I will create a neural network with PyTorch whose output layer has as many nodes as there are intents. The architecture is as follows.

In [None]:
class IntentClassificationModel(nn.Module): 
    def __init__(self, vocab_size, embedding_matrix, hidden_size, dense_size, num_intents, dropout_rate): 
        super().__init__()
        self.hidden_size = hidden_size
        
        # Embedding layer (pretrained and frozen; vocab_size equals embedding_matrix.shape[0])
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)

        # Bidirectional LSTM layer (batch_first: inputs are [batch, seq_len, emb_dim])
        self.lstm = nn.LSTM(embedding_matrix.shape[1], hidden_size, bidirectional=True, batch_first=True)

        # Dense layers
        self.dense1 = nn.Linear(hidden_size * 2, dense_size)
        self.dense2 = nn.Linear(dense_size, dense_size)

        # Dropout layer
        self.dropout = nn.Dropout(dropout_rate)

        # Output layer
        self.output_layer = nn.Linear(dense_size, num_intents)

    def forward(self, input_data):
        embedded = self.embedding(input_data)
        lstm_out, _ = self.lstm(embedded)
        # Last step of the forward direction + first step of the backward direction
        lstm_out = torch.cat((lstm_out[:, -1, :self.hidden_size], lstm_out[:, 0, self.hidden_size:]), dim=1)
        dense1_out = nn.functional.relu(self.dense1(lstm_out))
        dense2_out = nn.functional.relu(self.dense2(dense1_out))
        dropped_out = self.dropout(dense2_out)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself
        output = self.output_layer(dropped_out)

        return output

In [None]:
# Create the model with 32 as the max token length 
model = IntentClassificationModel(vocab_size, torch.Tensor(embedding_matrix), hidden_size=128, dense_size=600, num_intents=10, dropout_rate=0.5)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
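`nn.CrossEntropyLoss` applies log-softmax internally, so it expects raw logits rather than probabilities. The pure-Python sketch below (hypothetical scores, no PyTorch required) illustrates why this matters: if a softmax is applied before the loss, the inputs are squeezed into [0, 1] and the loss can no longer approach zero, even for a confidently correct prediction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Log-softmax of the scores followed by negative log-likelihood of the target class."""
    return -math.log(softmax(scores)[target])

logits = [4.0, 1.0, 0.5]  # hypothetical raw scores; class 0 is the true class
target = 0

loss_from_logits = cross_entropy(logits, target)          # intended usage
loss_from_probs = cross_entropy(softmax(logits), target)  # softmax effectively applied twice

# Even a "perfect" probability vector cannot push the loss towards zero
loss_floor = cross_entropy([1.0, 0.0, 0.0], target)

print(f"loss on logits:        {loss_from_logits:.4f}")
print(f"loss on probabilities: {loss_from_probs:.4f}")
print(f"best case on probabilities: {loss_floor:.4f}")
```

This is why a model trained with `nn.CrossEntropyLoss` usually returns logits from `forward` and applies softmax only when probabilities are needed at inference time.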

In [None]:
epochs = 100 

# Initialize tracker for minimum validation loss
valid_loss_min = np.inf # Set initial "min" to infinity

# Some lists to keep track of loss and accuracy during each epoch
epoch_lst = []
train_loss_lst = []
valid_loss_lst = []
train_acc_lst = []
valid_acc_lst = []

# Start looping through epochs
for epoch in range(epochs): 
    
    # Running loss totals for this epoch
    train_loss = 0.0
    valid_loss = 0.0
    
    # Set the model to training mode
    model.train()
    
    # Counters for training accuracy
    correct = 0
    total = 0
    
    # Load batches of padded utterances with their labels (targets)
    for data, target in train_loader: 
        
        # Clear the gradients of all optimized variables
        optimizer.zero_grad()
        
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        pred_labels = torch.argmax(output, 1)
        
        # Count labels and correct predictions
        total += len(target)
        correct += (pred_labels == target).sum().item()
        
        # Calculate the loss 
        loss = criterion(output, target)
        
        # Backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        
        # Perform a single optimization step (parameter update)
        optimizer.step()
        
        # Update running training loss (weighted by batch size)
        train_loss += loss.item()*data.size(0)
        
    # Average training loss and accuracy over the epoch (once, after all batches)
    train_loss = train_loss/len(train_loader.dataset)
    accuracy = 100 * correct/float(total)
    
    # Append them to a list for plotting and printing purposes
    train_loss_lst.append(train_loss)
    train_acc_lst.append(accuracy)
    
    # Set the model to evaluation mode 
    model.eval()
    
    # Counters for validation accuracy
    correct = 0 
    total = 0
    
    with torch.no_grad(): 
        for data, target in valid_loader:
            
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            pred_labels = torch.argmax(output, 1)
            
            # Count labels and correct predictions (once per batch)
            total += len(target)
            correct += (pred_labels == target).sum().item()
            
            # Calculate the loss and update the running validation loss 
            loss = criterion(output, target)
            valid_loss += loss.item()*data.size(0)
            
    # Calculate average validation loss and accuracy over the epoch
    valid_loss = valid_loss / len(valid_loader.dataset)
    accuracy = 100 * correct/float(total)

    # Put them in their list 
    valid_acc_lst.append(accuracy)
    valid_loss_lst.append(valid_loss)      

    # Print the epoch and training loss details with validation accuracy 
    print(f"Epoch: {epoch+1}/{epochs}.. Training loss: {train_loss:.3f}.. Validation Loss: {valid_loss:.3f}.. Validation Accuracy: {accuracy:.3f}")

    # Save the model if validation loss has decreased  
    if valid_loss <= valid_loss_min: 
        print(f"Validation loss decreased ({valid_loss_min:.3f} --> {valid_loss:.3f}). Saving model...")
        torch.save(model.state_dict(), "models/intent_classification_model.pt")
        valid_loss_min = valid_loss

    # Move to next epoch 
    epoch_lst.append(epoch + 1)

**Load the model with the lowest validation loss**

In [None]:
model.load_state_dict(torch.load("models/intent_classification_model.pt"))

In [None]:
# Visualize training loss vs validation loss (the loss is how bad the model is during training)
plt.figure(figsize=(10, 7))
plt.plot(train_loss_lst, label="Training Loss", color="cyan")
plt.plot(valid_loss_lst, label="Validation Loss", color="magenta")
plt.title("Training Loss vs Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

# Visualize training accuracy vs validation accuracy 
plt.figure(figsize=(10, 7)) 
plt.plot(train_acc_lst, label="Training Accuracy", color="magenta")   
plt.plot(valid_acc_lst, label="Validation Accuracy", color="cyan")
plt.title("Training accuracy vs validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

After about 20 epochs, the loss curve flattens out and barely changes. This is a floor effect: the loss cannot drop below 0. The model learns what it needs from the training data very quickly. If you keep training past that point, you are overfitting: the model starts fitting the unimportant signal in the training data.

For example, with images, a model that has learned to recognize a cat may go on to learn an overly specific rule, such as that cats must also be black.

### Model improvements
The model overfits after only a few epochs, and the overfitting is significant. Plotting the accuracies makes this visible.

The model doesn't need 100 training epochs.

Look at learning rate scheduling: after a certain number of epochs, decrease the learning rate.
* Learning rate scheduling
* Early stopping or reducing epochs
* Dropout layers
* Regularization
* Improve distinctiveness between intent data

After applying these improvements, my accuracy went up.
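Two of the items above, learning rate scheduling and early stopping, can be sketched in plain Python. The step size, decay factor, patience, and validation losses below are illustrative assumptions, not values tuned for this model. In PyTorch, step decay is available as `torch.optim.lr_scheduler.StepLR`; the early stopping class here is a hypothetical helper.

```python
def step_decay(initial_lr, epoch, step_size=10, gamma=0.5):
    """Multiply the learning rate by gamma once every step_size epochs (epoch is 0-based)."""
    return initial_lr * (gamma ** (epoch // step_size))

class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, valid_loss):
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: they improve for a while, then plateau
fake_valid_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.63, 0.64]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, valid_loss in enumerate(fake_valid_losses):
    lr = step_decay(0.001, epoch)
    if stopper.step(valid_loss):
        stopped_at = epoch
        break

print(f"Stopped at epoch index {stopped_at}, best validation loss {stopper.best_loss}")
```

In a real training loop, `stopper.step` would be called with the averaged validation loss at the end of each epoch, alongside the existing checkpoint-saving logic.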

In [None]:
def infer_intent(user_input): 
    # Tokenize the input 
    tokens = tokenizer(user_input)
    
    # Convert the input to indices (unknown words map to the padding index 0)
    indexed_input = [word_to_idx.get(word, 0) for word in tokens][:max_length]
    
    # Add a batch dimension: shape [1, seq_len]
    input_tensor = torch.tensor([indexed_input])
    
    # Make predictions in evaluation mode, without tracking gradients
    model.eval()
    with torch.no_grad():
        output = model(input_tensor)
    
    # Get the predicted label 
    pred_label = torch.argmax(output, 1)
    
    # Return the predicted intent name
    return le.inverse_transform(pred_label.numpy())

In [None]:
# Initializing checkpoint settings to view progress and save model 
filename = 'models/intent_classification_model.h5'

# Learning Rate Scheduling 
# This function keeps the initial learning rate for the first ten epochs


In [None]:
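import io # for encoding
def yield_tokens(file_path):
    with io.open(file_path, encoding='utf-8') as file:
        for line in file:
            yield line.strip().split()

# file_path must point to the text file used to build the vocabulary
vocab = build_vocab_from_iterator(yield_tokens(file_path), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])  # map out-of-vocabulary tokens to <unk>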

In [None]:
# I use Torch's tokenizer API 
# Train-test split of 95% train and 5% test


In [None]:
# Configuration for training
# Change all of the following configurations as per the specifications in the original repo 
# Set a seed value 
seed_value = 12321 

# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

# 4. Set `pytorch` pseudo-random generator at a fixed value
torch.manual_seed(seed_value)

In [None]:
class MODEL_EVAL_METRIC:
    accuracy = "accuracy"
    f1_score = "f1_score"
    
class Config: 
 
    VOCAB_SIZE = 0
    BATCH_SIZE = 512 
    EMB_SIZE = 300 
    OUT_SIZE = 2
    NUM_FOLDS = 5
    NUM_EPOCHS = 10 
    NUM_WORKERS = 8
    
    # Use a pretrained embedding and update its weights during training
    EMB_WT_UPDATE = True
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    MODEL_EVAL_METRIC = MODEL_EVAL_METRIC.accuracy
    FAST_DEV_RUN = False 
    PATIENCE = 6 
    IS_BIDIRECTIONAL = True 
    
    # Model hyperparameters
    MODEL_PARAMS = {
        "hidden_size": 128,
        "num_layers": 2,
        "drop_out": 0.4258,
        "lr": 0.000366,
        "weight_decay": 0.00001
    }

In [None]:
# The dataset class for CSV/TSV files 
class CustomDataset(Dataset):
    def __init__(self, data, tokenizer, vocab, max_length):
        self.data = data
        self.tokenizer = tokenizer
        self.vocab = vocab
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        label = self.data[idx][0]
        text = self.data[idx][1]
        tokens = self.tokenizer(text)[:self.max_length]
        tokens = [self.vocab[token] for token in tokens]
        return (torch.tensor(tokens), torch.tensor(label))

In [None]:
# Create embedding matrix 
def create_embedding_matrix(word_index, embedding_dict, dim=100): 
    num_words = len(word_index) + 1 # word_index starts at 1, not 0, since 0 is reserved for padding
    embedding_matrix = np.zeros((num_words, dim))
    for word, idx in word_index.items(): 
        embedding_vector = embedding_dict.get(word)
        if embedding_vector is not None: 
            embedding_matrix[idx] = embedding_vector
    return embedding_matrix

In [None]:
# Get the training and validation data
def create_data(train_df, valid_df): 
    X_train = train_df["text"].values
    y_train = train_df["label"].values
    X_valid = valid_df["text"].values
    y_valid = valid_df["label"].values
    
    # CustomDataset expects (label, text) pairs
    ds_train = CustomDataset(list(zip(y_train, X_train)), tokenizer, vocab, max_length=100)
    ds_valid = CustomDataset(list(zip(y_valid, X_valid)), tokenizer, vocab, max_length=100)
    
    torch_train = DataLoader(ds_train, batch_size=Config.BATCH_SIZE, collate_fn=pad_collate, num_workers=Config.NUM_WORKERS, shuffle=True)
    
    # Validation data does not need shuffling
    torch_valid = DataLoader(ds_valid, batch_size=Config.BATCH_SIZE, collate_fn=pad_collate, num_workers=Config.NUM_WORKERS, shuffle=False)
    
    return torch_train, torch_valid

In [None]:
# Pad the input sequence. To train with mini-batches, one needs to pad the sequences in each batch. 
# In other words, given a mini-batch of size N, if the length of the longest sequence is L, 
# one needs to pad every sequence shorter than L with zeros so that all 
# lengths equal L. Moreover, the sequences in the batch must be in descending order 
# of length, because pack_padded_sequence expects that by default.

def pad_collate(batch):
    # Each element in the batch is a tuple (token_tensor, label) 
    # Sort the batch (based on word count) in descending order 
    
    sorted_batch = sorted(batch, key=lambda x: x[0].shape[0], reverse=True)
    sequences = [x[0] for x in sorted_batch]
    sequences_padded = pad_sequence(sequences, batch_first=True, padding_value=0)
    
    # Also store the length of each sequence. This is needed later to pack (unpad) the sequences
    seq_len = torch.LongTensor([x[0].shape[0] for x in sorted_batch])
    labels = torch.LongTensor([x[1] for x in sorted_batch]) 
    
    return sequences_padded, seq_len, labels
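The collate logic can be checked without PyTorch. The sketch below mirrors the same steps on plain lists: sort by length in descending order, pad to the longest sequence in the batch, and keep the true lengths. `pad_batch` is a hypothetical stand-in for `pad_collate`, and the batch contents are made up.

```python
def pad_batch(batch, padding_value=0):
    """batch: list of (token_ids, label) pairs. Returns padded sequences, true lengths, labels."""
    # Sort by sequence length, longest first
    sorted_batch = sorted(batch, key=lambda item: len(item[0]), reverse=True)
    lengths = [len(tokens) for tokens, _ in sorted_batch]
    longest = lengths[0]
    # Pad every sequence up to the longest one in this batch
    padded = [tokens + [padding_value] * (longest - len(tokens)) for tokens, _ in sorted_batch]
    labels = [label for _, label in sorted_batch]
    return padded, lengths, labels

# Made-up batch of (token_ids, label) pairs
batch = [([5, 2], 1), ([7, 3, 9, 4], 0), ([8], 2)]
padded, lengths, labels = pad_batch(batch)

print(padded)
print(lengths)
print(labels)
```

Padding per batch rather than to a global maximum keeps short batches small, and the stored lengths let the LSTM skip the padded positions.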

In [None]:
# Combine the input data into a TensorDataset (see what other types of datasets are available as well)
dataset = TensorDataset()

## Model Architecture 
- Create a neural network in Torch for intent classification 

In [None]:
from torch.nn.utils.rnn import pack_padded_sequence
import torch.nn.functional as F

# Enhance the architecture later 
class IntentClassifier(nn.Module):
    
    def __init__(self, vocab_size, embedding_dim, embedding_matrix, hidden_dim, output_dim, n_layers, dropout): 
        super().__init__() # In Python 3, super() no longer needs the class and instance arguments
        self.embedding_dim = embedding_dim
        
        # Embedding layer with pretrained weights 
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.embedding.weight.data.copy_(torch.from_numpy(embedding_matrix))
        self.embedding.weight.requires_grad = False
        
        # LSTM layer (batch_first to match the padded batches and pack_padded_sequence below)
        self.lstm = nn.LSTM(embedding_dim, 
                            hidden_dim, 
                            num_layers=n_layers, 
                            bidirectional=True, 
                            dropout=dropout,
                            batch_first=True)
        
        # Dense layers 
        self.fc1 = nn.Linear(hidden_dim*2, 600)  # 2 for bidirectional 
        self.fc2 = nn.Linear(600, 600)
        
        # Dropout layer
        self.dropout = nn.Dropout(dropout)  
        
        # Output layer 
        self.out = nn.Linear(600, output_dim)
        
    def forward(self, inputs, inputs_lengths):
        
        # inputs = [batch_size, sent_length]; inputs_lengths = true length of each sequence
        
        embeddings = self.dropout(self.embedding(inputs))
        
        # embeddings = [batch_size, sent_length, emb_dim]
        assert embeddings.shape == (inputs.shape[0], inputs.shape[1], self.embedding_dim)
         
        # pack_padded_sequence before feeding to the LSTM. This is required so PyTorch knows 
        # which elements of the sequence are padded and ignores them in the computation 
        # Accomplished only after the embedding step; the lengths must live on the CPU
        embeds_pack = pack_padded_sequence(embeddings, inputs_lengths.cpu(), batch_first=True)
        
        _, (hidden, _) = self.lstm(embeds_pack)
        # Our task being classification, we are only interested in the final hidden state and not the LSTM output 
        # h_n and c_n = [num_directions * num_layers, batch_size, hidden_size]
        final_hidden_forward = hidden[-2,:,:] # [batch_size, hidden_dim]
        final_hidden_backward = hidden[-1,:,:] # [batch_size, hidden_dim]
        
        # Concat the final forward and backward hidden states 
        hidden = torch.cat((final_hidden_forward, final_hidden_backward), dim=1)
                
        # Dense Linear Layers 
        dense_outputs_1 = F.relu(self.fc1(hidden))
        dense_outputs_2 = self.dropout(F.relu(self.fc2(dense_outputs_1)))
        
        # Final output classification layer (raw logits; nn.CrossEntropyLoss applies log-softmax itself)
        final_output = self.out(dense_outputs_2)
    
        return final_output

# Instantiate the model
intent_model = IntentClassifier(
    vocab_size=Config.VOCAB_SIZE,
    embedding_dim=Config.EMB_SIZE,
    embedding_matrix=embedding_matrix,
    hidden_dim=Config.MODEL_PARAMS["hidden_size"],
    output_dim=Config.OUT_SIZE,
    n_layers=Config.MODEL_PARAMS["num_layers"],
    dropout=Config.MODEL_PARAMS["drop_out"])
