# Stack Overflow Tag Recommendation AI Model

This notebook demonstrates a multi-label classification project to automatically recommend relevant tags for Stack Overflow questions. The project explores both simple NLP approaches, such as TF-IDF and basic neural networks, and advanced transfer learning methods like BERT. The goal is to compare performance and build a model that efficiently predicts accurate tags, improving question categorization, searchability, and overall user experience.

**Approach: Data → Clean → Split → Tokenize → Dataset → DataLoader → Model → Train → Evaluate → Save**




# Install required packages

In [None]:
!pip install torchmetrics



# Import Necessary Libraries

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import pandas as pd
from torchmetrics.classification import MultilabelF1Score, MultilabelRecall, MultilabelPrecision
from torchmetrics.classification import MultilabelAccuracy, MultilabelExactMatch
from tqdm.auto import tqdm  # progress bars used by train_step and test_step

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

# Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Navigate to project directory

In [None]:
# List contents of your My Drive
!ls /content/drive/MyDrive/

 Banner_images	 DATA  'DS CLASS'   DS_PROJECTS   PROJECTS


In [None]:
%cd /content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow

/content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow


In [None]:
!ls /content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow

data			inference.ipynb  train_multilabel.ipynb
data_preparetion.ipynb	model		 transformer_train.ipynb


In [None]:
import os
cwd = os.getcwd()
cwd

'/content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow'

# Load data

This section focuses on loading the dataset. For machine learning projects, data is typically split into three sets:
*   **Training Data:** Used to train the model, allowing it to learn patterns.
*   **Validation Data:** Used during training to tune hyperparameters and detect overfitting. Because the model is never fit on these samples, they give a less biased estimate of performance than the training set while hyperparameters are being tuned.
*   **Test Data:** Used *after* training to evaluate the final model's performance on unseen data. It simulates how the model would perform in a real-world scenario.

In [None]:
train_data_file_path = '/content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow/data/train.csv'
val_data_file_path = '/content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow/data/val.csv'
test_data_file_path = '/content/drive/MyDrive/DS_PROJECTS/AI_Powered_Tag_Recommendation_for_Stack_Overflow/data/test.csv'

train_df = pd.read_csv(train_data_file_path)
val_df = pd.read_csv(val_data_file_path)
test_df = pd.read_csv(test_data_file_path)

print(f"Train shape: {train_df.shape}, Val shape: {val_df.shape}, Test shape: {test_df.shape}")

Train shape: (179903, 101), Val shape: (20015, 101), Test shape: (10738, 101)


In [None]:
train_df.head(5)

Unnamed: 0,question_summary,.net,ajax,algorithm,amazon-web-services,android,android-studio,angular,angularjs,arrays,...,ubuntu,unit-testing,unix,vim,visual-studio,visual-studio-code,windows,wpf,xcode,xml
0,Parsing JSON objects for HTML table I am tryin...,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,ListPopupWindow not obeying WRAP_CONTENT width...,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Convert from Long to date format I want to con...,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Removing Windows newlines on Linux (sed vs. aw...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,JSON.net Serialize C# object to JSON Issue I a...,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
train_df.iloc[0]['question_summary']

'Parsing JSON objects for HTML table I am trying to display a "leaderboard" table based on JSON data. I have read a lot about the JSON format and overcome some initial obstacles, but my Javascript knowledge is very limited and I need ...'

# Text Preprocessing Setup (NLTK)

This cell imports necessary libraries from `nltk` (Natural Language Toolkit) and downloads required NLTK data resources. These are essential for cleaning and preparing text data for machine learning models.

**Background Concepts:**
*   **`re` (Regular Expressions):** Used for pattern matching and text manipulation (e.g., removing punctuation).
*   **`nltk`:** A leading platform for building Python programs to work with human language data.
*   **`stopwords`:** Common words (like 'the', 'is', 'a') that often carry little meaning in NLP tasks and are usually removed.
*   **`wordnet`:** A lexical database for the English language, used here to help with lemmatization.
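*   **`WordNetLemmatizer`:** Reduces words to their base or dictionary form (lemma). `lemmatize` treats every word as a noun unless a `pos` argument is passed, so 'tables' -> 'table' works by default, while 'running' -> 'run' needs `pos='v'` and 'better' -> 'good' needs `pos='a'`. Lemmatization helps reduce the vocabulary size by treating different forms of a word as the same token.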
*   **`word_tokenize`:** Splits a text into individual words or tokens.
*   **`collections.Counter`:** A dictionary subclass for counting hashable objects, used here to count word frequencies.
*   **`nltk.download(...)`:** Downloads necessary NLTK data packages. `punkt` is for tokenization, `stopwords` for stop word lists, `wordnet` and `omw-1.4` (Open Multilingual WordNet) for lemmatization, and `punkt_tab` holds the Punkt tokenizer data in the format that recent NLTK releases load when `word_tokenize` is called.

In [None]:
import re
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from collections import Counter


nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")
nltk.download("punkt_tab")  # Punkt data that recent NLTK releases need for word_tokenize
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


# Text Cleaning Function

This function is defined to clean and normalize text data. Here's a breakdown of its steps:
1.  **`re.sub(r'[^a-z0-9\s]', '', text.lower())`:**
    *   `text.lower()`: Converts all text to lowercase to ensure consistency (e.g., 'Python' and 'python' are treated the same).
    *   `re.sub(r'[^a-z0-9\s]', '', ...)`: Uses regular expressions to remove any characters that are not lowercase letters (`a-z`), numbers (`0-9`), or whitespace (`\s`). This strips punctuation and special symbols. The characters are deleted rather than replaced by a space, so 'node.js' becomes 'nodejs' and 'c#' becomes 'c'.
2.  **`words = word_tokenize(text)`:** Splits the cleaned text into a list of individual words (tokens).
3.  **`tokens = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]`:**
    *   **List Comprehension:** Iterates through each `word` in the `words` list.
    *   `if word not in stop_words`: Filters out common stop words (e.g., 'a', 'the', 'is') that don't usually contribute much to the meaning of the text for classification.
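    *   `lemmatizer.lemmatize(word)`: Applies lemmatization to each remaining word. Because no `pos` argument is passed, every word is treated as a noun: plurals are reduced ('tables' becomes 'table'), but verb and adjective forms such as 'running' or 'better' are left unchanged.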

The function returns a list of cleaned, lemmatized, and non-stop words.

In [None]:
def text_preprocess(text):
  text = re.sub(r'[^a-z0-9\s]', '', text.lower())
  words = word_tokenize(text)
  tokens = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
  return tokens
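
To see what the regular-expression step keeps and discards, here is a minimal sketch that uses only `re`. The sample sentence and the helper name `strip_non_alphanumeric` are made up for illustration. Tokenization, stop-word removal, and lemmatization are left out so that the snippet runs without any NLTK data.

```python
import re

def strip_non_alphanumeric(text: str) -> str:
    """Lowercase the text and delete every character that is not a-z, 0-9 or whitespace."""
    return re.sub(r'[^a-z0-9\s]', '', text.lower())

# Hypothetical sample question title
sample = "C# and Node.js: How to parse JSON?"
cleaned = strip_non_alphanumeric(sample)

print(cleaned)          # c and nodejs how to parse json
print(cleaned.split())  # whitespace split as a rough stand-in for word_tokenize
```

Symbols are deleted rather than replaced, so a language name such as `C#` collapses to a single letter. That may matter when the tags to predict are themselves language names.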

# Tokenizer Function

This function converts a list of text `tokens` (words) into a numerical sequence, which is a format that neural networks can understand. It also handles vocabulary mapping and sequence padding/truncation.

**Background Concepts:**
*   **`vocab`:** A dictionary mapping each unique word to a unique integer ID. This is how text is converted into numbers.
*   **`<UNK>` (Unknown Token):** If a word in the input `tokens` is not found in the `vocab` (meaning it wasn't seen during training), it's mapped to the ID of the `<UNK>` token (typically 1). This prevents errors and handles out-of-vocabulary words.
*   **`<PAD>` (Padding Token):** Neural networks typically require input sequences to have the same length. Padding involves adding special `<PAD>` tokens (typically 0) to shorter sequences or truncating longer sequences to a `max_len`. This function adds padding tokens if the sequence is shorter than `max_len`.
*   **`max_len`:** The maximum length of a sequence. All sequences will be either padded to this length or truncated if they exceed it.

In [None]:
def text_to_numerical_sequence(tokens,vocab,max_len):
  seq = [vocab.get(word,vocab.get("<UNK>")) for word in tokens]
  if len(seq) < max_len:
    seq += [vocab["<PAD>"]] * (max_len - len(seq))
  else:
    seq = seq[:max_len]
  return seq
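
The padding, truncation, and `<UNK>` behaviour can be checked on a toy vocabulary. The sketch below repeats the function so it runs on its own; `toy_vocab` and the token lists are hypothetical.

```python
def text_to_numerical_sequence(tokens, vocab, max_len):
    """Map tokens to IDs, then pad with <PAD> or truncate to exactly max_len items."""
    seq = [vocab.get(word, vocab["<UNK>"]) for word in tokens]
    if len(seq) < max_len:
        seq += [vocab["<PAD>"]] * (max_len - len(seq))
    else:
        seq = seq[:max_len]
    return seq

# Toy vocabulary (hypothetical, for illustration only)
toy_vocab = {"<PAD>": 0, "<UNK>": 1, "parse": 2, "json": 3, "table": 4}

short = text_to_numerical_sequence(["parse", "json"], toy_vocab, max_len=5)
unknown = text_to_numerical_sequence(["parse", "yaml"], toy_vocab, max_len=3)
long_seq = text_to_numerical_sequence(["json"] * 8, toy_vocab, max_len=5)

print(short)     # [2, 3, 0, 0, 0]  -> padded with <PAD>
print(unknown)   # [2, 1, 0]        -> "yaml" mapped to <UNK>
print(long_seq)  # [3, 3, 3, 3, 3]  -> truncated to max_len
```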

# Data Preprocessing and Numerical Encoding for Training Data

This is a critical section that prepares the raw text data for input into the neural network.

1.  **`texts = train_df['question_summary'].tolist()`:** Extracts all `question_summary` strings from the training DataFrame into a list.
2.  **`processed_texts = [text_preprocess(t) for t in texts]`:** Applies the `text_preprocess` function to each `question_summary`, cleaning and lemmatizing the text. It prints an example of a `processed_text` to show the output.
3.  **`word_counts = Counter(...)`:** Counts the frequency of every word across all `processed_texts`. This is used to build the vocabulary.
4.  **`max_vocab_size = 30000`:** Sets a limit on the vocabulary size. Using a fixed size helps manage model complexity and memory.
5.  **`vocab = {"<PAD>":0, "<UNK>":1}`:** Initializes the vocabulary dictionary with special tokens:
    *   `<PAD>`: For padding sequences to a uniform length.
    *   `<UNK>`: For handling words not present in the defined vocabulary.
6.  **Building the Vocabulary:** It iterates through the `max_vocab_size` most common words from `word_counts` and assigns them a unique integer ID (starting from 2). Less frequent words or words beyond `max_vocab_size` will be treated as `<UNK>`.
7.  **`max_len = max(len(tokens) for tokens in processed_texts)`:** Computes the length of the longest processed text. It is printed for reference only; the encoding step uses the fixed `tokenizer_max_length` instead.
8.  **`tokenizer_max_length = 128` and `encoded_texts = [text_to_numerical_sequence(tokens, vocab, tokenizer_max_length) for tokens in processed_texts]`:** Converts each `processed_text` into a numerical sequence of exactly 128 IDs using the `text_to_numerical_sequence` function and the created `vocab`. Shorter texts are padded and longer ones are truncated.
9.  **`encoded_texts = torch.tensor(encoded_texts, dtype=torch.long)`:** Converts the list of numerical sequences into a PyTorch tensor with a `long` data type (suitable for indices).
10. **`label_cols = train_df.columns[1:]`:** Extracts the column names corresponding to the tags (all columns except the first one, which is `question_summary`). These are the target labels for classification.
11. **`labels = torch.tensor(train_df[label_cols].values, dtype=torch.float32)`:** Extracts the numerical tag data (0s and 1s) from the DataFrame and converts it into a PyTorch tensor with a `float32` data type. This is because multi-label classification often uses `BCEWithLogitsLoss`, which expects float targets.

In [None]:
texts = train_df['question_summary'].tolist()
print(f"texts len: {len(texts)}\ntexts: {texts[:5]}")
# preprocess all texts
processed_texts  = [text_preprocess(t) for t in texts]
print(f"processed_texts[0]:{processed_texts[0]}")

word_counts = Counter([word for tokens in processed_texts for word in tokens])
print("word_counts info: ",len(word_counts), type(word_counts), word_counts.most_common(10))

max_vocab_size = 30000
# reserve special tokens
vocab = {"<PAD>":0, "<UNK>":1}
for idx, (word, _) in enumerate(word_counts.most_common(max_vocab_size)):
  vocab[word] = idx+2

print("top 10 vocab: ",list(vocab.items())[:10])

max_len = max(len(tokens) for tokens in processed_texts)
print(f"max_len:{max_len}")

tokenizer_max_length = 128

encoded_texts = [text_to_numerical_sequence(tokens,vocab,tokenizer_max_length) for tokens in processed_texts]
print(f"encoded_texts[0]:{encoded_texts[0]}")

encoded_texts = torch.tensor(encoded_texts, dtype=torch.long)
print(f"type(encoded_texts): {type(encoded_texts)}")

label_cols = train_df.columns[1:]  # all columns except the first one
print(f"label_cols: {label_cols}")

labels = torch.tensor(train_df[label_cols].values, dtype=torch.float32)
print(f"labels.shape: {labels.shape}")
num_labels = labels.shape[1]
print(f"The number of labels is: {num_labels}")

texts len: 179903
texts: ['Parsing JSON objects for HTML table I am trying to display a "leaderboard" table based on JSON data. I have read a lot about the JSON format and overcome some initial obstacles, but my Javascript knowledge is very limited and I need ...', "ListPopupWindow not obeying WRAP_CONTENT width spec I'm trying to use ListPopupWindow to show a list of strings via an ArrayAdapter (eventually this will be a more complex custom adapter). Code is below. As shown in the screenshot, the resulting ...", 'Convert from Long to date format I want to convert Long value to String or Date in this format dd/mm/YYYY. I have this value in Long format: 1343805819061. It is possible to convert it to Date format?', 'Removing Windows newlines on Linux (sed vs. awk) Have some delimited files with improperly placed newline characters in the middle of fields (not line ends), appearing as ^M in Vim. They originate from freebcp (on Centos 6) exports of a MSSQL ...', "JSON.net Serialize C# obje
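
The vocabulary-building step can be sketched on a toy corpus. The token lists and the small cap `toy_max_vocab_size` are hypothetical; the loop is the same one used above with `max_vocab_size = 30000`.

```python
from collections import Counter

# Toy tokenised corpus (hypothetical)
toy_processed_texts = [
    ["parse", "json", "table"],
    ["json", "table", "html"],
    ["json", "date", "format"],
]

word_counts = Counter(word for tokens in toy_processed_texts for word in tokens)

toy_max_vocab_size = 2  # keep only the 2 most frequent words
vocab = {"<PAD>": 0, "<UNK>": 1}
for idx, (word, _) in enumerate(word_counts.most_common(toy_max_vocab_size)):
    vocab[word] = idx + 2  # IDs 0 and 1 are reserved for the special tokens

print(vocab)       # {'<PAD>': 0, '<UNK>': 1, 'json': 2, 'table': 3}
print(len(vocab))  # 4, i.e. the cap plus the two special tokens
```

Every word outside the cap ("parse", "html", "date", "format" here) is later encoded as `<UNK>`. `len(vocab)` is the value the model receives as `vocab_size`.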

# Preparing Validation Data for Model Validation

This section mirrors the preprocessing steps applied to the training data, but it's specifically for the validation dataset (`val_df`). The purpose is to prepare the validation text summaries and their corresponding labels in the same numerical and tensor format as the training data, so they can be fed into the model for evaluation during training.

**Key Steps (identical to training data preprocessing):**
1.  Extract `question_summary` from `val_df`.
2.  Preprocess the validation texts using `text_preprocess`.
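3.  Convert the processed validation texts into numerical sequences using the *same* `vocab` and `tokenizer_max_length` as the training data. The vocabulary is built from the training set only, so validation words that never appeared in training are mapped to `<UNK>`. This is crucial to ensure consistency between training and validation inputs.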
4.  Convert the numerical sequences into a PyTorch tensor (`val_encoded_texts`).
5.  Extract the label columns (tags) from `val_df`.
6.  Convert the validation labels into a PyTorch tensor (`val_labels`).
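
Since the validation set reuses the training vocabulary, one quick diagnostic is the share of validation tokens that fall back to `<UNK>`. The following is a minimal sketch of such a check, not part of the original notebook; `oov_rate`, `train_vocab`, and `toy_val_tokens` are hypothetical names and data.

```python
def oov_rate(token_lists, vocab):
    """Fraction of tokens that are missing from vocab and would be encoded as <UNK>."""
    total = sum(len(tokens) for tokens in token_lists)
    if total == 0:
        return 0.0
    unknown = sum(1 for tokens in token_lists for word in tokens if word not in vocab)
    return unknown / total

# Vocabulary as it would be built from the training split (toy values)
train_vocab = {"<PAD>": 0, "<UNK>": 1, "json": 2, "table": 3, "parse": 4}
# Tokenised validation texts (toy values)
toy_val_tokens = [["parse", "xml", "table"], ["json", "schema"]]

print(oov_rate(toy_val_tokens, train_vocab))  # 0.4
```

In the notebook the same function could be called as `oov_rate(val_processed_texts, vocab)`. A high value would suggest that the vocabulary cap is too small for the data.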


In [None]:
val_texts = val_df['question_summary'].tolist()
print(f"val_texts len: {len(val_texts)}\nval_texts: {val_texts[:5]}")
# preprocess all texts
val_processed_texts  = [text_preprocess(t) for t in val_texts]
print(f"val_processed_texts[0]:{val_processed_texts[0]}")

val_encoded_texts = [text_to_numerical_sequence(tokens,vocab,tokenizer_max_length) for tokens in val_processed_texts]
print(f"val_encoded_texts[0]:{val_encoded_texts[0]}")

val_encoded_texts = torch.tensor(val_encoded_texts, dtype=torch.long)
print(f"type(val_encoded_texts): {type(val_encoded_texts)}")

val_label_cols = val_df.columns[1:]  # all columns except the first one
print(f"val_label_cols: {val_label_cols}")

val_labels = torch.tensor(val_df[val_label_cols].values, dtype=torch.float32)
print(f"val_labels.shape: {val_labels.shape}")


val_texts len: 179903
val_texts: ['Parsing JSON objects for HTML table I am trying to display a "leaderboard" table based on JSON data. I have read a lot about the JSON format and overcome some initial obstacles, but my Javascript knowledge is very limited and I need ...', "ListPopupWindow not obeying WRAP_CONTENT width spec I'm trying to use ListPopupWindow to show a list of strings via an ArrayAdapter (eventually this will be a more complex custom adapter). Code is below. As shown in the screenshot, the resulting ...", 'Convert from Long to date format I want to convert Long value to String or Date in this format dd/mm/YYYY. I have this value in Long format: 1343805819061. It is possible to convert it to Date format?', 'Removing Windows newlines on Linux (sed vs. awk) Have some delimited files with improperly placed newline characters in the middle of fields (not line ends), appearing as ^M in Vim. They originate from freebcp (on Centos 6) exports of a MSSQL ...', "JSON.net Serialize

# `TextDataset` Custom Class

This custom `TextDataset` class is designed to hold your preprocessed numerical text sequences (`encoded_texts`) and their corresponding multi-hot encoded labels (`multi_hot_labels`).

*   **`__init__(self, encoded_texts, multi_hot_labels)`:** The constructor initializes the dataset with the preprocessed text data and its labels.
*   **`__len__(self)`:** Returns the total number of text samples in the dataset.
*   **`__getitem__(self, idx)`:** This method is called by the `DataLoader` to fetch a single sample. It returns a tuple containing the encoded text at `idx` and its corresponding multi-hot label.

In [None]:
class TextDataset(Dataset):
  """
  Custom Dataset class for text data.
  Args:
    encoded_texts (torch.Tensor): Encoded text sequences.
    multi_hot_labels (torch.Tensor): Multi-hot encoded labels.

  Methods:
    __len__: Returns the number of samples in the dataset.
    __getitem__: Retrieves a sample from the dataset.
  """
  def __init__(self,encoded_texts,multi_hot_labels):
    super().__init__()
    self.encoded_texts = encoded_texts
    self.multi_hot_labels = multi_hot_labels

  def __len__(self):
    return len(self.encoded_texts)

  def __getitem__(self,idx):
    return self.encoded_texts[idx], self.multi_hot_labels[idx]



# Creating Dataset and DataLoader

This section creates the `Dataset` and `DataLoader` instances for both the training and validation data.

**Background Concepts:**
*   **`train_dataset = TextDataset(encoded_texts, labels)`:** An instance of your custom `TextDataset` is created using the preprocessed training texts and labels.
*   **`train_dataloader = DataLoader(train_dataset, batch_size=128, shuffle=True)`:**
    *   `train_dataset`: The dataset to load data from.
    *   `batch_size=128`: The number of samples loaded per batch. This is a crucial hyperparameter affecting training speed and memory usage.
    *   `shuffle=True`: For training data, it's good practice to shuffle the data at the beginning of each epoch to prevent the model from learning the order of samples.
*   **Validation `Dataset` and `DataLoader`:** Similar instances are created for the validation data. Note that `shuffle=False` for the validation `DataLoader` because the order of validation samples doesn't typically affect evaluation metrics.

In [None]:
### Training
train_dataset = TextDataset(encoded_texts,labels)

train_dataloader = DataLoader(train_dataset,batch_size=128,shuffle=True)

### Validation
val_dataset = TextDataset(val_encoded_texts,val_labels)

val_dataloader = DataLoader(val_dataset,batch_size=128,shuffle=False)
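
To see what a `DataLoader` built on `TextDataset` yields, the sketch below runs the same class on small random tensors. The sizes (10 samples, sequence length 6, 4 tags, batch size 4) are hypothetical and chosen only to keep the output short.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    """Pairs each encoded text with its multi-hot label vector."""
    def __init__(self, encoded_texts, multi_hot_labels):
        self.encoded_texts = encoded_texts
        self.multi_hot_labels = multi_hot_labels

    def __len__(self):
        return len(self.encoded_texts)

    def __getitem__(self, idx):
        return self.encoded_texts[idx], self.multi_hot_labels[idx]

# Toy tensors: 10 samples, sequence length 6, 4 tags
toy_texts = torch.randint(0, 50, (10, 6), dtype=torch.long)
toy_labels = torch.randint(0, 2, (10, 4)).float()

toy_loader = DataLoader(TextDataset(toy_texts, toy_labels), batch_size=4, shuffle=False)

for X, y in toy_loader:
    print(X.shape, y.shape)  # two full batches of 4, then a final batch of 2

print(len(toy_loader))  # number of batches, not number of samples
```

`len(dataloader)` counts batches, which is why the training and evaluation functions later divide the accumulated loss by it to get a per-batch average.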


# Model Definition

This section defines the neural network architecture using `torch.nn.Module`. This model is designed for multi-label text classification, leveraging an Embedding layer, an LSTM, and an attention mechanism.

**Background Concepts:**
*   **`nn.Module`:** The base class for all neural network modules in PyTorch. Any custom model must inherit from it.
*   **`__init__(self, ...)`:** The constructor where you define the layers of your neural network.
    *   `vocab_size`: The total number of unique words in your vocabulary.
    *   `embedding_dim`: The size of the dense vector representation for each word. Words are converted from discrete IDs to continuous vectors.
    *   `hidden_dim`: The number of features in the hidden state of the LSTM. A larger hidden dimension allows the LSTM to capture more complex patterns.
    *   `num_layers`: The number of recurrent layers in the LSTM. More layers allow the model to learn hierarchies of temporal dependencies.
    *   `num_labels`: The total number of possible tags (output classes) for your multi-label problem.
    *   `dropout`: A regularization technique that randomly sets a fraction of input units to zero at each update during training. This helps prevent overfitting.
*   **`nn.Embedding(vocab_size, embedding_dim, padding_idx=0)`:**
    *   This layer takes integer word IDs and converts them into dense vectors (embeddings). Each word gets a unique `embedding_dim`-sized vector.
    *   `padding_idx=0`: Specifies that the padding token (which has ID 0 in our `vocab`) maps to an all-zero embedding that receives no gradient updates during training. The LSTM and the attention layer still process the padded positions, because no mask is applied in `forward`.
*   **`nn.LSTM(...)` (Long Short-Term Memory):**
    *   A type of recurrent neural network (RNN) particularly good at processing sequential data like text. LSTMs can capture long-range dependencies in text, overcoming the vanishing gradient problem of simpler RNNs.
    *   `input_size=embedding_dim`: Each input to the LSTM is a word embedding.
    *   `hidden_size=hidden_dim`: The size of the hidden state and cell state of the LSTM.
    *   `num_layers`: Stacks multiple LSTM layers.
    *   `batch_first=True`: Indicates that the input tensor will have batch dimension first (batch, sequence, features).
    *   `dropout`: Applied to the output of each LSTM layer except the last.
    *   `bidirectional=True`: The LSTM processes the sequence in both forward and backward directions, allowing it to capture context from both past and future words. This doubles the `hidden_dim` in the output (`hidden_dim * 2`).
*   **`nn.Linear(hidden_dim * 2, 1)` (Attention Layer):**
    *   After the bidirectional LSTM, the output for each word position is a concatenation of its forward and backward hidden states (`hidden_dim * 2`).
    *   An attention mechanism is used to give more weight to important words in the `question_summary`. This linear layer transforms the LSTM output into a single scalar score per word, which will later be used to compute attention weights.
*   **`nn.Dropout(...)`:** Applies dropout to prevent overfitting.
*   **`nn.Linear(hidden_dim * 2, hidden_dim)` (Fully Connected Layer 1):** A linear transformation from the pooled LSTM output to an intermediate hidden layer.
*   **`nn.ReLU()`:** Rectified Linear Unit activation function, which introduces non-linearity to the model.
*   **`nn.Linear(hidden_dim, num_labels)` (Fully Connected Layer 2):** The final output layer that maps the hidden features to `num_labels` outputs (one for each possible tag). These outputs are *logits* (raw scores) before activation.
*   **`forward(self, x)`:** Defines how data flows through the network.
    *   `embedded = self.embedding(x)`: Converts input word IDs to embeddings.
    *   `lstm_out, _ = self.lstm(embedded)`: Passes embeddings through the LSTM. `lstm_out` contains the hidden states for each time step.
    *   `attention_weights = torch.softmax(self.attention(lstm_out), dim=1)`: Calculates attention weights. The `attention` layer outputs scores, and `softmax` normalizes these scores into a probability distribution over the sequence, indicating the importance of each word.
    *   `pooled = torch.sum(attention_weights * lstm_out, dim=1)`: Applies the attention weights to the LSTM output. Instead of simple mean/max pooling, this performs a weighted sum, giving more influence to important words identified by the attention mechanism. This results in a single fixed-size vector representation for the entire sequence.
    *   The rest of the `forward` pass consists of applying dropout, passing through the fully connected layers, and ReLU activation to produce the final `logits`.
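
The attention pooling described above can be traced on random tensors. The first half of this sketch follows the same three operations as the `forward` pass. The second half is a hypothetical variant that masks `<PAD>` positions before the softmax; it is not part of the notebook's model and is shown only to illustrate how padded positions otherwise receive some attention weight. All sizes and token IDs are made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, seq_len, hidden_dim = 2, 5, 4
# Stand-in for the BiLSTM output: (batch, seq_len, hidden_dim * 2)
lstm_out = torch.randn(batch_size, seq_len, hidden_dim * 2)
# Token IDs with 0 as <PAD>; the second sequence has two padded positions
token_ids = torch.tensor([[5, 8, 3, 9, 2],
                          [7, 4, 6, 0, 0]])

attention = nn.Linear(hidden_dim * 2, 1)

# Unmasked pooling, following the same steps as the forward pass
scores = attention(lstm_out)                   # (batch, seq_len, 1)
weights = torch.softmax(scores, dim=1)         # normalised over the sequence axis
pooled = torch.sum(weights * lstm_out, dim=1)  # (batch, hidden_dim * 2)

# Hypothetical variant: give <PAD> positions zero weight
pad_mask = (token_ids == 0).unsqueeze(-1)      # (batch, seq_len, 1)
masked_scores = scores.masked_fill(pad_mask, float("-inf"))
masked_weights = torch.softmax(masked_scores, dim=1)
masked_pooled = torch.sum(masked_weights * lstm_out, dim=1)

print(pooled.shape)                 # torch.Size([2, 8])
print(weights[1].squeeze(-1))         # padded positions get non-zero weight
print(masked_weights[1].squeeze(-1))  # last two entries are exactly 0
```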

In [None]:
class AiStackOverflowTagRecommendation(nn.Module):
  def __init__(self,vocab_size,embedding_dim=300,hidden_dim=256,num_layers=2,num_labels=100,dropout=0.5):
    super().__init__()
    self.embedding = nn.Embedding(vocab_size,embedding_dim,padding_idx=0)
    self.lstm = nn.LSTM(
        input_size=embedding_dim,
        hidden_size=hidden_dim,
        num_layers=num_layers,
        batch_first=True,
        dropout=dropout if num_layers > 1 else 0,  # LSTM dropout only works with >1 layer
        bidirectional=True
    )
    # Attention layer for better pooling
    self.attention = nn.Linear(hidden_dim * 2, 1)
    self.dropout1 = nn.Dropout(dropout)
    self.fc1 = nn.Linear(hidden_dim * 2, hidden_dim)
    self.relu = nn.ReLU()
    self.dropout2 = nn.Dropout(dropout)
    self.fc2 = nn.Linear(hidden_dim,num_labels)

  def forward(self,x):
    embedded = self.embedding(x)
    lstm_out, _ = self.lstm(embedded)
    # Attention-based pooling instead of mean pooling
    attention_weights = torch.softmax(self.attention(lstm_out), dim=1)
    pooled = torch.sum(attention_weights * lstm_out, dim=1)
    dropped_out = self.dropout1(pooled)
    fc1_out = self.fc1(dropped_out)
    relu_out = self.relu(fc1_out)
    dropped2_out = self.dropout2(relu_out)
    logits = self.fc2(dropped2_out)
    return logits
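
As a sanity check on the architecture, the sketch below builds only the LSTM with the default sizes from the class above and confirms two things: the bidirectional output width is `hidden_dim * 2`, and the parameter count matches a hand calculation. The helper `lstm_direction_params` and the dummy input are introduced here for illustration.

```python
import torch
import torch.nn as nn

embedding_dim, hidden_dim, num_layers = 300, 256, 2

lstm = nn.LSTM(
    input_size=embedding_dim,
    hidden_size=hidden_dim,
    num_layers=num_layers,
    batch_first=True,
    bidirectional=True,
)

# Dummy batch of embeddings: (batch=2, seq_len=7, embedding_dim)
out, _ = lstm(torch.randn(2, 7, embedding_dim))
print(out.shape)  # torch.Size([2, 7, 512]) -> forward and backward states concatenated

def lstm_direction_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights and two bias vectors
    return 4 * hidden_size * (input_size + hidden_size + 2)

# Layer 1 reads embeddings; layer 2 reads the concatenated states of layer 1.
# Each layer has one set of weights per direction.
expected = (2 * lstm_direction_params(embedding_dim, hidden_dim)
            + 2 * lstm_direction_params(2 * hidden_dim, hidden_dim))
actual = sum(p.numel() for p in lstm.parameters())

print(actual, expected)  # 2719744 2719744
```

The embedding table adds a further `vocab_size * embedding_dim` parameters, so with a vocabulary of roughly 30,000 words it is by far the largest layer in the model.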



# Creating Train and Test Functions

This section introduces the core functions that encapsulate the training and testing (or validation) logic for a single epoch. Separating these concerns into functions makes the training loop cleaner, more organized, and easier to read or modify.

**Background Concepts:**
*   **Training Loop:** The iterative process where the model learns from the training data by repeatedly performing forward passes, calculating loss, and updating weights.
*   **Evaluation Loop:** The process where the model's performance is measured on unseen data (validation or test sets) without updating its weights.

In [None]:
def train_step(
    model: nn.Module,
    dataloader: torch.utils.data.DataLoader,
    loss_fn: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    device: torch.device,
    num_labels: int # Required for metric initialization
):
  """
  Trains a PyTorch model for a single epoch.

  Turns a target PyTorch model to training mode and then
  runs through all of the required training steps (forward
  pass, loss calculation, optimizer step).
  Args:
    model: A PyTorch model to be trained.
    dataloader: A DataLoader instance for the model to be trained on.
    loss_fn: A PyTorch loss function to minimize.
    optimizer: A PyTorch optimizer to help minimize the loss function.
    device: A target device to compute on (e.g. "cuda" or "cpu").
    num_labels: The total number of classes (tags) in the multi-label problem.

  Returns:
      A dictionary containing the average training loss and all accumulated
      training metrics (as float values).
  """
  model.train()
  # Initialize loss accumulator
  train_loss = 0

  # --- Initialize Metrics ---
  # Threshold is set to 0.5, which is typical for multi-label classification.
  THRESHOLD = 0.5

  # Accuracy Metrics
  exact_match_metric = MultilabelExactMatch(num_labels=num_labels, threshold=THRESHOLD).to(device)

  # F1, Recall, and Precision Metrics (micro-averaged, plus macro-averaged F1)
  micro_f1_metric = MultilabelF1Score(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)
  macro_f1_metric = MultilabelF1Score(num_labels=num_labels, threshold=THRESHOLD, average='macro').to(device)
  micro_recall_metric = MultilabelRecall(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)
  micro_precision_metric = MultilabelPrecision(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)

  # for batch, (X, y) in enumerate(dataloader):
  for X, y in tqdm(dataloader, desc="Training"):
    X, y = X.to(device), y.to(device)

    # 1. Forward pass
    y_logits = model(X)

    # y_prob = torch.sigmoid(y_logits) # No longer needed for loss calculation
    # y_pred = (torch.sigmoid(y_logits) > THRESHOLD).float() # Keep y_pred for metric calculation

    # 2. Calculate Loss
    loss = loss_fn(y_logits, y) # Calculate loss using logits and true labels
    train_loss += loss.item()

    # 3. Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # train_acc += (y_pred.round() == y).sum().item()/len(y_pred)

    # 4. Update Metrics (Pass logits and true labels)
    exact_match_metric.update(y_logits, y)
    micro_f1_metric.update(y_logits, y)
    macro_f1_metric.update(y_logits, y)
    micro_recall_metric.update(y_logits, y)
    micro_precision_metric.update(y_logits, y)


  # --- Final Metric Calculation ---
  train_loss_avg = train_loss / len(dataloader)

  results = {
      "train_loss": train_loss_avg,
      "train_exact_match_acc": exact_match_metric.compute().item(),
      "train_micro_f1": micro_f1_metric.compute().item(),
      "train_macro_f1": macro_f1_metric.compute().item(),
      "train_micro_recall": micro_recall_metric.compute().item(),
      "train_micro_precision": micro_precision_metric.compute().item(),
  }

  # Reset metrics for the next epoch (good practice)
  for metric in [macro_f1_metric, exact_match_metric, micro_f1_metric, micro_recall_metric, micro_precision_metric]:
      metric.reset()

  return results
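
`train_step` passes raw logits to `loss_fn`. The loss itself is not defined in this part of the notebook, so the sketch below assumes `nn.BCEWithLogitsLoss`, which the preprocessing section mentions. It shows that this loss matches a sigmoid followed by `nn.BCELoss`, and that a 0.5 probability threshold corresponds to a logit threshold of 0. The logits and targets are made up.

```python
import torch
import torch.nn as nn

# Toy batch: 2 questions, 3 tags
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [-0.5, 3.0, -2.0]])
targets = torch.tensor([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])

# Assumed loss: sigmoid and binary cross-entropy fused in one numerically stable op
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)
# Equivalent two-step version for comparison
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(loss_from_logits, loss_from_probs, atol=1e-6))  # True

# Thresholding probabilities at 0.5 is the same as thresholding logits at 0
preds_from_probs = (torch.sigmoid(logits) > 0.5).int()
preds_from_logits = (logits > 0).int()
print(preds_from_probs.tolist())  # [[1, 0, 1], [0, 1, 0]]
```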

In [None]:
def test_step(
    model: nn.Module,
    dataloader: torch.utils.data.DataLoader,
    loss_fn: torch.nn.Module,
    device: torch.device,
    num_labels: int # Required for metric initialization
):
  """
  Tests a PyTorch model for a single epoch.

  Turns a target PyTorch model to "eval" mode and then performs
  a forward pass on a testing dataset.

  Args:
    model: A PyTorch model to be tested.
    dataloader: A DataLoader instance for the model to be tested on.
    loss_fn: A PyTorch loss function to calculate loss on the test data.
    device: A target device to compute on (e.g. "cuda" or "cpu").
    num_labels: The total number of classes (tags) in the multi-label problem.

  Returns:
    A dictionary containing the average testing loss and all accumulated
    testing metrics (as float values).
  """
  model.eval()
  test_loss = 0
  # --- 1. Initialize ALL Metrics (using torchmetrics) ---
  THRESHOLD = 0.5

  # Accuracy Metrics (using optimal torchmetrics classes/parameters)
  # test_hamming_metric = MultilabelAccuracy(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)
  test_exact_match_metric = MultilabelExactMatch(num_labels=num_labels, threshold=THRESHOLD).to(device)

  # F1, Recall, and Precision Metrics (micro-averaged, plus macro-averaged F1)
  test_macro_f1_metric = MultilabelF1Score(num_labels=num_labels, threshold=THRESHOLD, average='macro').to(device)
  test_micro_f1_metric = MultilabelF1Score(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)
  test_micro_recall_metric = MultilabelRecall(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)
  test_micro_precision_metric = MultilabelPrecision(num_labels=num_labels, threshold=THRESHOLD, average='micro').to(device)

  # --- Inference Loop ---
  with torch.inference_mode():
    # for X, y in dataloader:
    for X, y in tqdm(dataloader, desc="Evaluating"):
      X, y = X.to(device), y.to(device)

      # 1. Forward pass
      test_logits = model(X)

      # 2. Calculate Loss (Note: y must be float for loss_fn)
      loss = loss_fn(test_logits, y.float())
      test_loss += loss.item()

      # 3. Update Metrics (Pass logits and true labels)
      # Metrics handle the sigmoid and thresholding internally.
      test_macro_f1_metric.update(test_logits, y)
      test_exact_match_metric.update(test_logits, y)
      test_micro_f1_metric.update(test_logits, y)
      test_micro_recall_metric.update(test_logits, y)
      test_micro_precision_metric.update(test_logits, y)

  # --- 2. Calculate Final Scores (Outside the loop, only once) ---
  test_loss_avg = test_loss / len(dataloader)

  results = {
      "test_loss": test_loss_avg,
      "test_macro_f1": test_macro_f1_metric.compute().item(),
      "test_exact_match_acc": test_exact_match_metric.compute().item(),
      "test_micro_f1": test_micro_f1_metric.compute().item(),
      "test_micro_recall": test_micro_recall_metric.compute().item(),
      "test_micro_precision": test_micro_precision_metric.compute().item(),
  }

  # Reset metrics for the next evaluation (essential for clean use)
  for metric in [test_macro_f1_metric, test_exact_match_metric, test_micro_f1_metric, test_micro_recall_metric, test_micro_precision_metric]:
      metric.reset()

  return results

# Creating a training and testing loop for a multi-label PyTorch model

In [None]:
from timeit import default_timer as timer
def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time.

    Args:
        start (float): Start time of computation (preferred in timeit format).
        end (float): End time of computation.
        device (torch.device or str, optional): Device that compute is running on. Defaults to None.

    Returns:
        float: time between start and end in seconds (higher is longer).
    """
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

In [None]:
torch.manual_seed(42)
model = AiStackOverflowTagRecommendation(
    vocab_size=len(vocab),
    embedding_dim=300,
    hidden_dim=256,
    num_layers=2,
    num_labels=num_labels,
    dropout=0.5
).to(device)

model


AiStackOverflowTagRecommendation(
  (embedding): Embedding(30002, 300, padding_idx=0)
  (lstm): LSTM(300, 256, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (attention): Linear(in_features=512, out_features=1, bias=True)
  (dropout1): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=512, out_features=256, bias=True)
  (relu): ReLU()
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=256, out_features=100, bias=True)
)

# Creating Loss Function and Optimizer

This section defines the crucial components needed for training a neural network: the loss function and the optimizer.

**Background Concepts:**
*   **Loss Function (`nn.BCEWithLogitsLoss()`):**
    *   This is the standard loss function for **multi-label classification** tasks where each output (tag) is an independent binary classification problem.
    *   `BCEWithLogitsLoss` combines a `Sigmoid` activation layer and `Binary Cross Entropy` loss in one function. This is numerically more stable than applying `Sigmoid` and then `BCELoss` separately.
    *   It expects raw logits (unnormalized scores) from the model's output and target labels (0s and 1s, typically `float32`). For each label, it calculates how well the model predicted the presence or absence of that label.
*   **Optimizer (`torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=1e-5)`):**
    *   The optimizer updates the model's parameters (weights and biases) during training to minimize the loss function.
    *   **`AdamW`:** Adam (Adaptive Moment Estimation) with decoupled weight decay. It adapts the step size for each parameter from running estimates of the gradient's first and second moments, and applies weight decay directly to the weights instead of folding it into the gradient.
    *   `model.parameters()`: Provides all learnable parameters of the network to the optimizer.
    *   `lr=0.001`: The learning rate, a hyperparameter that controls the step size of each update. A smaller learning rate takes smaller steps, which tends to be more stable but trains more slowly; a larger learning rate can converge faster but may overshoot the optimum.
    *   `weight_decay=1e-5`: A small regularization strength that shrinks the weights slightly at every step to reduce overfitting.
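
The stability claim above can be checked without PyTorch. The following is a minimal sketch in plain Python, assuming the commonly used rearrangement `max(x, 0) - x*y + log(1 + exp(-|x|))` for a single logit `x` and target `y`. It illustrates the idea only and is not PyTorch's internal implementation; `naive_bce` and `stable_bce_with_logits` are hypothetical helper names introduced for this example.

```python
import math


def naive_bce(logit: float, target: float) -> float:
    """Sigmoid first, then binary cross-entropy, as two separate steps."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def stable_bce_with_logits(logit: float, target: float) -> float:
    """Same loss computed directly from the logit, without forming log(0)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# For moderate logits the two formulations agree.
for x, y in [(0.0, 1.0), (2.5, 0.0), (-1.3, 1.0)]:
    assert math.isclose(naive_bce(x, y), stable_bce_with_logits(x, y), rel_tol=1e-9)

# For a large logit with target 0, sigmoid rounds to exactly 1.0 in float64,
# so the naive version ends up evaluating log(0) and raises an error.
try:
    naive_bce(40.0, 0.0)
    naive_failed = False
except ValueError:
    naive_failed = True

stable_value = stable_bce_with_logits(40.0, 0.0)
print(naive_failed, stable_value)
```

In the multi-label setting this per-logit loss is evaluated for every (sample, tag) pair and then averaged, which is what the default `reduction='mean'` of `nn.BCEWithLogitsLoss` does.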

In [None]:
loss_fn = nn.BCEWithLogitsLoss()
lr = 1e-3            # 0.001; an earlier variant of this cell used 3e-4
weight_decay = 1e-5  # 0.00001
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)


In [None]:
# Import tqdm for progress bar
from tqdm.auto import tqdm

torch.manual_seed(42)
train_start_time = timer()

epochs = 15

# Training and validation loop (test_step is run on val_dataloader each epoch)
for epoch in range(epochs):
  print(f"Epoch: {epoch+1}\n---------")
  train_metrics = train_step(
    model=model,
    dataloader=train_dataloader,
    loss_fn=loss_fn,
    optimizer=optimizer,
    device=device,
    num_labels=num_labels # Ensure num_labels is defined and passed here
  )
  # Unpack all metrics needed for printing from the dictionary
  train_loss = train_metrics['train_loss']
  train_exact_match_acc = train_metrics['train_exact_match_acc'] * 100
  train_macro_f1 = train_metrics["train_macro_f1"]
  train_micro_f1 = train_metrics['train_micro_f1']
  train_micro_recall = train_metrics['train_micro_recall']
  train_micro_precision = train_metrics['train_micro_precision']
  print(f"Train Loss: {train_loss:.5f} | Train Exact Match Acc: {train_exact_match_acc:.2f}% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: {train_macro_f1:.4f}/{train_micro_f1:.4f}/{train_micro_recall:.4f}/{train_micro_precision:.4f}")
  test_metrics = test_step(
      model=model,
      dataloader=val_dataloader,
      loss_fn=loss_fn,
      device=device,
      num_labels=num_labels
  )
  test_loss = test_metrics['test_loss']
  test_exact_match_acc = test_metrics['test_exact_match_acc'] * 100
  test_macro_f1 = test_metrics['test_macro_f1']
  test_micro_f1 = test_metrics['test_micro_f1']
  test_micro_recall = test_metrics['test_micro_recall']
  test_micro_precision = test_metrics['test_micro_precision']

  print(f"Test Loss: {test_loss:.5f} | Test Exact Match Acc: {test_exact_match_acc:.2f}% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: {test_macro_f1:.4f}/{test_micro_f1:.4f}/{test_micro_recall:.4f}/{test_micro_precision:.4f}")


train_end_time = timer()
print_train_time(train_start_time, train_end_time, device)


Epoch: 1
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.07486 | Train Exact Match Acc: 0.86% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.0056/0.0221/0.0122/0.1217


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.05491 | Test Exact Match Acc: 6.80% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.0149/0.1433/0.0787/0.7969
Epoch: 2
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.04269 | Train Exact Match Acc: 24.01% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.2630/0.4623/0.3286/0.7800


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.02960 | Test Exact Match Acc: 39.04% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.4533/0.6416/0.5162/0.8474
Epoch: 3
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.03129 | Train Exact Match Acc: 38.04% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.4882/0.6353/0.5211/0.8137


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.02445 | Test Exact Match Acc: 45.27% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.5808/0.7045/0.6019/0.8491
Epoch: 4
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.02733 | Train Exact Match Acc: 42.69% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.5581/0.6816/0.5810/0.8245


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.02174 | Test Exact Match Acc: 49.22% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6259/0.7368/0.6485/0.8530
Epoch: 5
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.02482 | Train Exact Match Acc: 45.45% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.5942/0.7086/0.6174/0.8313


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01992 | Test Exact Match Acc: 51.50% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6596/0.7608/0.6862/0.8537
Epoch: 6
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.02288 | Train Exact Match Acc: 47.95% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6217/0.7298/0.6465/0.8379


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01841 | Test Exact Match Acc: 54.29% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6808/0.7748/0.7009/0.8661
Epoch: 7
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.02134 | Train Exact Match Acc: 49.81% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6418/0.7457/0.6678/0.8442


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01720 | Test Exact Match Acc: 56.27% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7086/0.7915/0.7333/0.8596
Epoch: 8
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01997 | Train Exact Match Acc: 51.72% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6629/0.7612/0.6885/0.8509


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01631 | Test Exact Match Acc: 58.47% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7265/0.8026/0.7458/0.8689
Epoch: 9
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01878 | Train Exact Match Acc: 53.56% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6790/0.7731/0.7052/0.8554


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01547 | Test Exact Match Acc: 60.04% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7392/0.8144/0.7621/0.8744
Epoch: 10
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01788 | Train Exact Match Acc: 54.93% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.6933/0.7830/0.7188/0.8597


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01489 | Test Exact Match Acc: 61.19% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7519/0.8215/0.7709/0.8792
Epoch: 11
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01698 | Train Exact Match Acc: 56.48% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7083/0.7935/0.7333/0.8645


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01432 | Test Exact Match Acc: 63.11% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7644/0.8319/0.7836/0.8867
Epoch: 12
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01624 | Train Exact Match Acc: 57.87% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7202/0.8016/0.7441/0.8689


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01385 | Test Exact Match Acc: 64.19% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7782/0.8382/0.7935/0.8883
Epoch: 13
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01546 | Train Exact Match Acc: 59.17% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7322/0.8105/0.7568/0.8725


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01352 | Test Exact Match Acc: 65.51% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7905/0.8444/0.8046/0.8883
Epoch: 14
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01488 | Train Exact Match Acc: 60.32% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7428/0.8175/0.7660/0.8763


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01303 | Test Exact Match Acc: 66.85% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.8015/0.8528/0.8186/0.8900
Epoch: 15
---------


Training:   0%|          | 0/1406 [00:00<?, ?it/s]

Train Loss: 0.01425 | Train Exact Match Acc: 61.46% | Train Macro F1/Micro F1/Micro Recall/Micro Precision: 0.7530/0.8242/0.7765/0.8783


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Test Loss: 0.01273 | Test Exact Match Acc: 68.05% | Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.8115/0.8566/0.8188/0.8979
Train time on cuda: 2220.798 seconds


2220.797524862

| Metric        | What it Means                                                   |
| ------------- | --------------------------------------------------------------- |
| **Precision** | Out of predicted positives → how many were correct?             |
| **Recall**    | Out of actual positives → how many did we find?                 |
| **F1**        | Balance between precision & recall                              |
| **Micro**     | Global performance across all labels (frequent labels dominate) |
| **Macro**     | Average performance per label (rare labels treated equally)     |
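
To make the micro/macro distinction concrete, here is a small sketch in plain Python. The per-label counts are hypothetical and chosen only to contrast one frequent, well-predicted tag with one rare, mostly missed tag; `f1_from_counts` is a helper introduced for this example, and returning 0.0 when a label has no counts is an assumption of the sketch.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (tp, fp, fn) counts per label.
counts = {
    "frequent_tag": (90, 10, 10),  # many examples, predicted well
    "rare_tag": (1, 0, 9),         # few examples, mostly missed
}

# Macro: compute F1 per label, then take the unweighted mean.
macro_f1 = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)

# Micro: pool the counts over all labels, then compute a single F1.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro_f1 = f1_from_counts(total_tp, total_fp, total_fn)

print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```

With these counts the micro score stays high because the frequent tag dominates the pooled totals, while the macro score is pulled down by the rare tag. The same pattern is visible in the training log, where macro F1 trails micro F1 in every epoch.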


# Save model

In [None]:
os.makedirs("model", exist_ok=True)
model_save_path = "model/ai_stackoverflow_model.pth"
torch.save({
    'model_state_dict': model.state_dict(),
    'vocab': vocab,
    'labels': label_cols.tolist(),
    'tokenizer_max_length':128,
    'vocab_size': len(vocab),
    'embedding_dim': 300,
    'hidden_dim': 256,
    'num_layers': 2,
    'num_labels': len(label_cols),
    'dropout': 0.5
}, model_save_path)


# Loading the model for inference

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"device: {device}")
ckpt = torch.load("model/ai_stackoverflow_model.pth", map_location=device)

loaded_vocab = ckpt['vocab']
loaded_label_cols = ckpt['labels']
max_token_len = ckpt['tokenizer_max_length']

loaded_model = AiStackOverflowTagRecommendation(
    vocab_size=ckpt['vocab_size'],
    embedding_dim=ckpt['embedding_dim'],
    hidden_dim=ckpt['hidden_dim'],
    num_layers=ckpt['num_layers'],
    num_labels=ckpt['num_labels'],
    dropout=ckpt['dropout']
)

loaded_model.load_state_dict(ckpt['model_state_dict'])
loaded_model.to(device)
loaded_model.eval()


device: cuda


AiStackOverflowTagRecommendation(
  (embedding): Embedding(30002, 300, padding_idx=0)
  (lstm): LSTM(300, 256, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (attention): Linear(in_features=512, out_features=1, bias=True)
  (dropout1): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=512, out_features=256, bias=True)
  (relu): ReLU()
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=256, out_features=100, bias=True)
)

# Performing Inference on the Test Dataset

This section demonstrates how to use the trained model to predict tags for a portion of the `test_df`. The steps are similar to the single-prediction example:
1.  Take a subset of `question_summary` entries from `test_df`.
2.  For each summary, preprocess the text using the `text_preprocess` function.
3.  Convert the processed text into a numerical sequence using the `text_to_numerical_sequence` function with the loaded vocabulary (`loaded_vocab`) and the maximum sequence length stored in the checkpoint (`max_token_len`).
4.  Convert the numerical sequence into a PyTorch tensor, adding a batch dimension.
5.  Pass the input tensor through the `loaded_model` to get prediction logits.
6.  Apply a sigmoid function to convert logits to probabilities, and then apply the `threshold` (0.5) to get binary predictions.
7.  Map the binary predictions to the actual tag names using `loaded_label_cols`.
8.  Print the original question summary and the recommended tags.

# Predict Function

In [None]:
def predict_tags(text, model, vocab, label_cols, max_len=128, threshold=0.5, device="cpu"):
    """
    Predict tags for a single input text.
    """

    # ------------------ PREPROCESS TEXT ------------------
    tokens = text_preprocess(text)
    encoded = text_to_numerical_sequence(tokens, vocab, max_len)
    input_tensor = torch.tensor([encoded], dtype=torch.long).to(device)

    # ------------------ MODEL INFERENCE ------------------
    model.eval()
    with torch.inference_mode():
        logits = model(input_tensor)

    # Convert logits → probabilities
    probs = torch.sigmoid(logits)

    # Binary threshold (0 or 1 per label)
    binary_preds = (probs > threshold).int()

    # Find indices of predicted label(s)
    pred_indices = torch.nonzero(binary_preds[0]).flatten().tolist()

    # Map indices → label names
    predicted_tags = [label_cols[idx] for idx in pred_indices]

    return predicted_tags, probs.cpu().numpy().flatten()


In [None]:
# Test on 15 randomly sampled entries from test_df (fixed seed for reproducibility)
num_test_samples = 15
test_samples = test_df.sample(n=num_test_samples, random_state=42)
threshold = 0.5

print(f"--- Tag Recommendations for the first {num_test_samples} test samples ---")

for i, row in test_samples.iterrows():
  question_summary = row['question_summary']
  print(f"\nQuestion {i+1}: {question_summary}")

  # ----------- 1. GET TRUTH / GROUND-TRUTH LABELS -----------
  truth_label_indices = [idx for idx, col in enumerate(loaded_label_cols) if row[col] == 1]
  truth_tags = [loaded_label_cols[idx] for idx in truth_label_indices]

  print("Ground-Truth Tags:")
  if truth_tags:
    for tag in truth_tags:
      print(f"{tag}", end=", ")
  else:
    print("\nNo ground-truth tags")

  # ----------- 2. Preprocess + Predict (both handled inside predict_tags) -----------
  predicted_tags, probabilities = predict_tags(
      question_summary, loaded_model, loaded_vocab, loaded_label_cols,
      max_len=max_token_len, threshold=threshold, device=device
  )

  # ----------- 3. PRINT PREDICTED TAGS -----------
  print("\nPredicted Tags:")
  if predicted_tags:
      for tag in predicted_tags:
          print(f"{tag}", end=", ")
  else:
      print("\nNo tags predicted (no probability exceeded the threshold)")
  print("\n")

print("\n--- Test Dataset Inference Complete ---")


--- Tag Recommendations for the first 15 test samples ---

Question 6871: Python: access class property from string [duplicate] I have a class like the following: class User: def __init__(self): self.data = [] self.other_data = [] def doSomething(self, source): // if source = 'other_data' how ...
Ground-Truth Tags:
python, 
Predicted Tags:
python, 


Question 3882: MySQL PHP - SELECT WHERE id = array()? [duplicate] Possible Duplicate: MySQL query using an array Passing an array to mysql I have an array in PHP: $array = array(1, 4, 5, 7); As you can see, I have an array of different values, but I want to ...
Ground-Truth Tags:
arrays, html, mysql, php, 
Predicted Tags:
arrays, mysql, php, 


Question 6011: Position fixed doesn't work when using -webkit-transform I am using -webkit-transform (and -moz-transform / -o-transform) to rotate a div. Also have position fixed added so the div scrolls down with the user. In Firefox it works fine, but in webkit based ...
Ground-Truth Tags:
css, ht

# Task
Evaluate the trained model's performance on `test_df`: preprocess the test data, create a `TextDataset` and `DataLoader`, run the `test_step` function to compute and display the test loss and exact match accuracy, and summarize the overall performance.

## Prepare Test Data

### Subtask:
Preprocess the 'test_df' DataFrame to convert question summaries into numerical sequences and extract multi-hot encoded labels, using the same vocabulary and maximum sequence length established during training.


**Reasoning**:
I need to preprocess the `test_df` data by extracting the question summaries, cleaning them, converting them to numerical sequences using the pre-trained vocabulary and max length, and then preparing the labels in the required PyTorch tensor format. This follows the same steps as preparing the training and validation data.



In [None]:
test_texts = test_df['question_summary'].tolist()
print(f"test_texts len: {len(test_texts)}\ntest_texts: {test_texts[:5]}")

# preprocess all texts
test_processed_texts  = [text_preprocess(t) for t in test_texts]
print(f"test_processed_texts[0]:{test_processed_texts[0]}")

test_encoded_texts = [text_to_numerical_sequence(tokens, loaded_vocab, max_token_len) for tokens in test_processed_texts]
print(f"test_encoded_texts[0]:{test_encoded_texts[0]}")

test_encoded_texts = torch.tensor(test_encoded_texts, dtype=torch.long)
print(f"type(test_encoded_texts): {type(test_encoded_texts)}")

test_label_cols = test_df.columns[1:]  # all columns except the first one
print(f"test_label_cols: {test_label_cols}")

test_labels = torch.tensor(test_df[test_label_cols].values, dtype=torch.float32)
print(f"test_labels.shape: {test_labels.shape}")

test_texts len: 10738
test_texts: ["Sum of list of lists; returns sum list Let data = [[3,7,2],[1,4,5],[9,8,7]] Let's say I want to sum the elements for the indices of each list in the list, like adding numbers in a matrix column to get a single list. I am assuming that all ...", "success_url in UpdateView, based on passed value How can I set success_url based on a parameter? I really want to go back to where I came from, not some static place. In pseudo code: url(r'^entry/(?P<pk>\\d+)/edit/(?P<category>\\d+)', ...", 'NSLocale and country name I used this code for getting which country iPhone belong to: NSLocale *locale = [NSLocale currentLocale]; NSString *countryCode = [locale objectForKey: NSLocaleCountryCode]; NSString *country = [...', "No Linker option in Visual Studio Project Properties I'm trying to add a library to a Visual Studio C++ project, however, when I go to project properties I only have options for, Common Properties Startup Project Project Dependencies Debug ...", "m

**Reasoning**:
Now that the test data is preprocessed, I need to create a `TextDataset` and `DataLoader` for it, which will be used to feed the test data to the model for evaluation in batches.



In [None]:
### Test
test_dataset = TextDataset(test_encoded_texts,test_labels)

test_dataloader = DataLoader(test_dataset,batch_size=16,shuffle=False)

print("Test Dataset and DataLoader created successfully.")

Test Dataset and DataLoader created successfully.
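
As a quick sanity check, the number of batches a `DataLoader` yields can be worked out from the dataset size and batch size. The sketch below is plain Python and assumes the default `drop_last=False`, in which case the last, smaller batch is kept; `expected_num_batches` and `n_test` are hypothetical names introduced for this example.

```python
import math


def expected_num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields for a dataset of the given size."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


n_test = 10_738   # "test_texts len" printed by the preprocessing cell
batch_size = 16   # batch size passed to the test DataLoader

n_batches = expected_num_batches(n_test, batch_size)
print(n_batches)  # 10,738 = 671 * 16 + 2, so 671 full batches plus one partial batch
```

The evaluation progress bar over `test_dataloader` should therefore report this many steps. If it reports a different total, a different `DataLoader` is being iterated.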


**Reasoning**:
Now that the test dataset and dataloader are prepared, I will run the `test_step` function to evaluate the trained model's performance on the test data and display the results.



In [None]:
print("\n--- Evaluating Model on Test Data ---")
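# Evaluate on the held-out test set: pass test_dataloader here, not val_dataloader.
# NOTE: the output recorded below came from a run with val_dataloader (157 batches),
# which is why it repeats the epoch-15 validation metrics. Rerun this cell to
# obtain true test-set numbers.
test_metrics = test_step(
    model=model,
    dataloader=test_dataloader,
    loss_fn=loss_fn,
    device=device,
    num_labels=num_labels
)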
test_loss = test_metrics['test_loss']
test_exact_match_acc = test_metrics['test_exact_match_acc'] * 100
test_macro_f1 = test_metrics['test_macro_f1']
test_micro_f1 = test_metrics['test_micro_f1']
test_micro_recall = test_metrics['test_micro_recall']
test_micro_precision = test_metrics['test_micro_precision']

print(
    f"Final Test Loss: {test_loss:.5f} | "
    f"Final Test Exact Match Acc: {test_exact_match_acc:.2f}% | "
    f"Final Test Macro F1/Micro F1/Micro Recall/Micro Precision: {test_macro_f1:.4f}/{test_micro_f1:.4f}/{test_micro_recall:.4f}/{test_micro_precision:.4f}"
)
print("--- Test Evaluation Complete ---")


--- Evaluating Model on Test Data ---


Evaluating:   0%|          | 0/157 [00:00<?, ?it/s]

Final Test Loss: 0.01273 | Final Test Exact Match Acc: 68.05% | Final Test Macro F1/Micro F1/Micro Recall/Micro Precision: 0.8115/0.8566/0.8188/0.8979
--- Test Evaluation Complete ---
