# Self Attention Mechanism Assignment (Graded): Text Classification with AG News Dataset

Welcome to your programming assignment on the self-attention mechanism! You will build a deep learning model that uses self-attention to classify news articles from the AG News dataset.

## Problem Description

- In this assignment, you will explore and implement self-attention mechanisms for text classification using the AG News dataset.

- Your goal is to build and train a text classification model that utilizes self-attention to categorize news articles into predefined topics.

## Dataset Description

- The AG News dataset is a collection of news articles categorized into four classes:
    - World
    - Sports
    - Business
    - Sci/Tech

- The dataset consists of:
    - 120,000 training examples
    - 7,600 test examples

- Each example in the dataset contains:
    - Text: The news article text
    - Class: An integer (1-4) representing the category of the news article; subtracting 1 gives the 0-indexed labels (0-3) that the loss function expects

- For more information about the AG News dataset, you can visit the following link: [AG News Dataset](https://www.kaggle.com/datasets/amananandrai/ag-news-classification-dataset)
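
The later hints subtract 1 from the `Class` column, so the labels returned by `load_data()` appear to run from 1 to 4, while the model's 4-unit softmax output is indexed 0 to 3. The plain-NumPy sketch below shows that shift and why it matters: sparse categorical crossentropy uses each integer label as an index into the predicted probabilities, so an unshifted label of 4 points past the last class. `CLASS_NAMES`, `to_zero_indexed`, and this `sparse_categorical_crossentropy` helper are hypothetical names, and the name mapping assumes the indices follow the class order listed above.

```python
import numpy as np

# Hypothetical label names, assuming 0-indexed labels follow the order listed above.
CLASS_NAMES = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

def to_zero_indexed(labels):
    """Shift 1-based class labels (1..4) to 0-based labels (0..3)."""
    return np.asarray(labels) - 1

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class).

    Each integer label indexes a column of probs, so every label
    must lie in 0..num_classes-1.
    """
    y_true = np.asarray(y_true)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

raw = np.array([1, 4, 3])                  # labels as stored in Class (assumed 1..4)
probs = np.array([[0.7, 0.1, 0.1, 0.1],    # made-up softmax outputs for 4 classes
                  [0.1, 0.1, 0.1, 0.7],
                  [0.1, 0.1, 0.7, 0.1]])

y = to_zero_indexed(raw)
print(y, [CLASS_NAMES[int(i)] for i in y])                   # [0 3 2] ['World', 'Sci/Tech', 'Business']
print(round(sparse_categorical_crossentropy(y, probs), 4))   # 0.3567
```

With the unshifted `raw` labels, the same call raises an `IndexError`, because column 4 does not exist in a 4-column array.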


## Assignment Tasks

1. **Data Preparation and Exploration**
    - Explore the dataset structure, including the number of samples, class distribution, and text length statistics.
    
    - Implement a function to preprocess the text data, including tokenization and padding.

2. **Implement a Self-Attention Layer**
    - Create a custom Keras layer that implements the self-attention mechanism.

    - The layer should take a sequence of vectors as input (word embeddings, or BiLSTM outputs in the model built later) and output an attention-weighted representation.

    - Implement the attention score calculation and softmax normalization.

3. **Build the Text Classification Model**

    - Design a neural network architecture that incorporates the self-attention layer.
    
    - The model should include an embedding layer, the self-attention layer, and output layers for classification.
    
    - Compile the model with an appropriate loss function and optimizer.

4. **Train and Evaluate the Model**
    - Split the training data into training and validation sets.

    - Train the model on the training set and monitor its performance on the validation set.

    - Implement early stopping to prevent overfitting.

    - Evaluate the trained model on the test set and report accuracy and other relevant metrics.

5. **Prediction and Interpretation**
    - Use the trained model to make predictions on new, unseen news articles.

## Instructions

- Only write code where you see one of the following prompts:

    ```
    # YOUR CODE GOES HERE
    # YOUR CODE ENDS HERE
    # TODO
    ```

- Do not modify any other section of the code unless stated otherwise in the comments.

# Code Section

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import seaborn as sns
from helpers.methods import load_data, plot_results, detect_and_set_device
from tests.test_methods import test_vectorize_text, test_self_attention, test_create_model, test_train_model, test_evaluate_model

In [None]:
# Load data
train_data, test_data = load_data()

# Split training data into train and validation sets
train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)
print("Data split into train and validation sets")

## Task: Let's get to know our dataset

In [None]:
# Display the first few rows of the training data
train_data.head()
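
Each of the exploration TODOs in this task reduces to a short pandas call. A minimal sketch on a small made-up DataFrame (`toy`), assuming `train_data` has the same `Text` and `Class` columns used in the later hints:

```python
import pandas as pd

# Made-up stand-in for train_data with the same 'Text' and 'Class' columns.
toy = pd.DataFrame({
    "Text": ["Stocks rally on earnings", "Team wins the final",
             "New chip unveiled", "Markets slip", "Election results due"],
    "Class": [3, 2, 4, 3, 1],
})

num_samples = len(toy)                                   # equivalently toy.shape[0]
class_counts = toy["Class"].value_counts().sort_index()  # samples per label, ordered by label
toy["text_length"] = toy["Text"].str.len()               # length of each text in characters
length_stats = toy["text_length"].describe()             # count, mean, std, min, quartiles, max

print(num_samples)
print(class_counts)
print(length_stats)
```

`str.len()` counts characters; `toy["Text"].str.split().str.len()` counts whitespace-separated words instead, which relates more directly to the 100-token `sequence_length` used during preprocessing.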

### Shape of the dataset: Testing and Training

In [None]:
# TODO: Number of samples
train_samples = 
test_samples = 

print("Dataset Structure:")
print(f"Number of training samples: {train_samples}")
print(f"Number of test samples: {test_samples}")

### Class distribution

In [None]:
# TODO: Class distribution
class_distribution = 
print("\nClass distribution:")
print(class_distribution)

### Visualizing class distribution

In [None]:
# Visualize class distribution
plt.figure(figsize=(10, 6))
class_distribution.plot(kind='bar')
plt.title('Class Distribution in Training Set')
plt.xlabel('Class Label')
plt.ylabel('Number of Samples')
plt.xticks(rotation=0)
plt.show()

### Text length statistics

In [None]:
# TODO: Add a 'text_length' column to train_data (the histogram below uses it), then compute its summary statistics
text_length_stats = 

print("\nText length statistics:")
print(text_length_stats)

### Visualizing text length distribution

In [None]:
# Visualize text length distribution
plt.figure(figsize=(10, 6))
train_data['text_length'].hist(bins=50)
plt.title('Distribution of Text Lengths in Training Set')
plt.xlabel('Text Length')
plt.ylabel('Frequency')
plt.show()

## Task: Preprocessing the dataset

**Task Hints:**

Complete the vectorize_text function for preprocessing the text data.

- Use the TextVectorization layer from TensorFlow/Keras to convert the text data into sequences of integers.

- Set the maximum number of words in the vocabulary to 20,000 using the max_tokens parameter in TextVectorization.

- Set the maximum length for the sequences to 100 using the output_sequence_length parameter in TextVectorization.

- Adapt the vectorization layer to the training data using the adapt method, passing in the 'Text' column values from the train_data DataFrame.

- Return the configured vectorize_layer for use in the model.
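
To make these hints concrete, the sketch below reimplements in plain Python roughly what `tf.keras.layers.TextVectorization(max_tokens=..., output_sequence_length=...)` followed by `adapt()` does in its default integer output mode: standardize the text, build a frequency-ranked vocabulary in which index 0 is reserved for padding and index 1 for out-of-vocabulary tokens, then map each text to a fixed-length list of ids. It is a simplified illustration, not the Keras implementation; the punctuation regex and the alphabetical tie-breaking between equally frequent tokens are choices made for this sketch, and `standardize`, `build_vocab`, and `vectorize` are hypothetical names.

```python
import re
from collections import Counter

def standardize(text):
    # Lowercase and drop punctuation, approximating the layer's default
    # standardize="lower_and_strip_punctuation" (the exact regex differs).
    return re.sub(r"[^\w\s]", "", text.lower())

def build_vocab(texts, max_tokens):
    """Frequency-ranked vocabulary: index 0 = padding (''), index 1 = OOV ('[UNK]')."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    vocab = ["", "[UNK]"]  # the reserved entries count toward max_tokens
    # Most frequent first; ties broken alphabetically (a choice made for this sketch).
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(vocab) >= max_tokens:
            break
        vocab.append(tok)
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab, sequence_length):
    """Map text to exactly sequence_length ids: truncate long texts, pad short ones with 0."""
    ids = [vocab.get(tok, 1) for tok in standardize(text).split()]  # unknown words -> 1
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

vocab = build_vocab(["Stocks rally, stocks rise!", "Team wins"], max_tokens=5)
print(vocab)  # {'': 0, '[UNK]': 1, 'stocks': 2, 'rally': 3, 'rise': 4}
print(vectorize("Stocks fall, team wins", vocab, sequence_length=6))  # [2, 1, 1, 1, 0, 0]
```

In the real layer the padding and OOV entries also count toward `max_tokens`, so a limit of 20,000 leaves room for 19,998 learned words.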

In [9]:
# Text Vectorization
def vectorize_text(train_data, test_data):
    max_features = 20000     # vocabulary size (max_tokens)
    sequence_length = 100    # fixed sequence length (output_sequence_length)
    
    # YOUR CODE GOES HERE
    vectorize_layer

    # YOUR CODE ENDS HERE
    
    test_vectorize_text(vectorize_layer)
    
    return vectorize_layer

## Task: Building the Self-Attention Mechanism

**Task Hints:**

Implement the SelfAttention class as a custom layer in TensorFlow/Keras.

- Create a class that inherits from tf.keras.layers.Layer.

- In the __init__ method:
  - Initialize the parent class using super().
  - Create three Dense layers: W1 and W2 with the specified number of units, and V with one unit.

- Implement the call method to perform the self-attention mechanism:
  - Calculate the attention scores using the W1, W2, and V layers.
  - Apply softmax to get attention weights.
  - Use the attention weights to create a context vector.
  - Ensure the output shape is (batch_size, embedding_dim).

- Add comments to explain the shape of tensors at each step of the attention mechanism.

- Consider implementing a get_config method for serialization if you plan to save the model.

- Test the layer with sample input to ensure it produces the expected output shape.
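
The attention steps above can be traced in NumPy before writing the Keras layer. The sketch assumes additive (Bahdanau-style) scoring, `score = V(tanh(W1(x) + W2(x)))`, which is one common way to combine the three Dense layers described in the hints; the weight matrices are random placeholders and `self_attention_pool` is a hypothetical name. The key detail is that the softmax runs over the sequence axis (`axis=1`), so each example's weights sum to 1 across its timesteps.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Same computation as a Keras Dense layer with no activation: x @ w + b.
    return x @ w + b

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_pool(inputs, params):
    # inputs: (batch_size, seq_len, embedding_dim)
    hidden = np.tanh(dense(inputs, *params["W1"]) + dense(inputs, *params["W2"]))
    # hidden: (batch_size, seq_len, units)
    score = dense(hidden, *params["V"])
    # score: (batch_size, seq_len, 1)
    attention_weights = softmax(score, axis=1)
    # attention_weights: (batch_size, seq_len, 1), summing to 1 over seq_len
    context_vector = np.sum(attention_weights * inputs, axis=1)
    # context_vector: (batch_size, embedding_dim)
    return context_vector, attention_weights

batch_size, seq_len, embedding_dim, units = 2, 5, 8, 4
params = {
    "W1": (rng.normal(size=(embedding_dim, units)), np.zeros(units)),
    "W2": (rng.normal(size=(embedding_dim, units)), np.zeros(units)),
    "V": (rng.normal(size=(units, 1)), np.zeros(1)),
}
x = rng.normal(size=(batch_size, seq_len, embedding_dim))
context, weights = self_attention_pool(x, params)
print(context.shape, weights.shape)  # (2, 8) (2, 5, 1)
```

In TensorFlow the same steps map onto `tf.nn.tanh`, `tf.nn.softmax(score, axis=1)`, and `tf.reduce_sum(attention_weights * inputs, axis=1)`. Because W1 and W2 both see the same inputs here, their sum is itself a single affine map; the two-layer form mirrors Bahdanau attention, where W2 is applied to a separate query vector. This sketch does not mask padded positions, so padding tokens still receive some weight.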

In [None]:
# Self-Attention Layer
class SelfAttention(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units  # stored so a get_config() method can serialize the layer
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # TODO: inputs shape: (batch_size, seq_len, embedding_dim)
        
        
        # TODO: score shape: (batch_size, seq_len, 1)
        
        
        # TODO: attention_weights shape: (batch_size, seq_len, 1)
        
        
        # TODO: context_vector shape: (batch_size, embedding_dim)
        
        
        return context_vector
    
test_self_attention(SelfAttention(64))

## Task: Model Building

**Task Hints:**

Complete the create_model function to build the text classification model with self-attention.

- Use the Sequential API from Keras to create the model.

- Add the following layers in order:
  1. The vectorize_layer as the first layer to process input text.
  2. An Embedding layer with 20,000 input dimensions and 128 output dimensions.
  3. A Bidirectional LSTM layer with 64 units, setting return_sequences=True.
  4. The custom SelfAttention layer with 64 units.
  5. A Dense layer with 64 units and ReLU activation.
  6. A Dropout layer with a rate of 0.5 for regularization.
  7. A final Dense layer with 4 units (one for each class) and softmax activation.

- Compile the model with:
  - The sparse categorical crossentropy loss function.
  - The Adam optimizer.
  - Accuracy as the metric to monitor.

- Return the compiled model.

- Consider adding a summary() call to print the model architecture for verification.

- Ensure that the input and output shapes of each layer are compatible.
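
One way to check shape compatibility is to trace the shapes by hand. The plain-Python sketch below does this for the seven layers listed above, assuming a batch of 32 and the default `merge_mode='concat'` of Keras' `Bidirectional` wrapper, which is why the LSTM output has 2 * 64 = 128 features. The SelfAttention `units` (64) only sets the size of its internal scoring layers; its output keeps the feature size of its input. `trace_shapes` is a hypothetical helper, not a Keras API.

```python
def trace_shapes(batch_size=32, sequence_length=100, embedding_dim=128,
                 lstm_units=64, dense_units=64, num_classes=4):
    """Return (layer, output_shape) pairs for the architecture listed above."""
    b = batch_size
    bilstm_dim = 2 * lstm_units  # forward and backward outputs are concatenated
    return [
        ("TextVectorization", (b, sequence_length)),                # token ids
        ("Embedding", (b, sequence_length, embedding_dim)),         # one vector per token
        ("Bidirectional(LSTM)", (b, sequence_length, bilstm_dim)),  # return_sequences=True keeps seq_len
        ("SelfAttention", (b, bilstm_dim)),                         # weighted sum over seq_len
        ("Dense(relu)", (b, dense_units)),
        ("Dropout", (b, dense_units)),                              # shape unchanged
        ("Dense(softmax)", (b, num_classes)),                       # one probability per class
    ]

for name, shape in trace_shapes():
    print(f"{name:<20} {shape}")
```

Calling `model.summary()` after building the real model should show the same feature sizes, with the batch dimension reported as `None`.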

In [11]:
# Model Definition
def create_model(vectorize_layer):
    embedding_dim = 128  # Embedding output dimension (from the hints)
    
    # YOUR CODE GOES HERE


    # YOUR CODE ENDS HERE
    
    test_create_model(model)
    
    return model



## Task: Model Training

**Task Hints:**

Complete the train_model function to train the text classification model with early stopping.

- Set up an EarlyStopping callback:
  - Monitor the validation loss ('val_loss').
  - Set the patience to 3 epochs.
  - Enable restoring the best weights.

- Use the model.fit method to train the model:
  - Pass the training data (train_data['Text']) and labels (train_data['Class'] - 1).
  - Provide validation data using the validation_data parameter: (val_data['Text'], val_data['Class'] - 1).
  - Set the number of epochs to 20.
  - Use a batch size of 32.
  - Include the early stopping callback in the callbacks list.

- Remember to subtract 1 from the class labels to make them 0-indexed, as required by the loss function.

- Return the training history object for later analysis and visualization.

- Consider adding a verbose parameter to control the output during training.

- Ensure that the input data types match what the model expects (text data for inputs, integer labels for targets).
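
The callback itself is `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)`. The plain-Python sketch below approximates its stopping rule to show what patience means: training stops once `val_loss` has failed to improve for 3 consecutive epochs, and with `restore_best_weights=True` the model rolls back to the best epoch. `early_stopping` is a hypothetical helper; epochs are 0-indexed here, whereas Keras logs them from 1, and the real callback has further options (such as `min_delta`) that are not modelled.

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop here; weights restored from best_epoch
    return len(val_losses) - 1, best_epoch  # ran all epochs without triggering

val_losses_demo = [0.50, 0.42, 0.40, 0.41, 0.43, 0.45, 0.39]  # made-up values
print(early_stopping(val_losses_demo))  # (5, 2)
```

Here the best loss (0.40) occurs at epoch 2 and epochs 3 to 5 do not improve on it, so training stops after epoch 5 and never reaches the lower loss at epoch 6; a larger patience trades extra epochs for the chance to see such late improvements.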

In [12]:
# Training Function
def train_model(model, train_data, val_data, runtime_device):
    # TODO: Define early stopping callback
    
    with tf.device('/' + runtime_device + ':0'):
        # YOUR CODE GOES HERE
        history = model.fit(


        # YOUR CODE ENDS HERE
        )
        
    test_train_model(history)
    return history

## Task: Model Evaluation

**Task Hints:**

Complete the evaluate_model function to assess the performance of the trained model on the test data.

- Use model.evaluate to compute the loss and accuracy on the test set:
  - Pass the test data (test_data['Text']) and labels (test_data['Class'] - 1).
  - Remember to subtract 1 from the class labels to make them 0-indexed.
  - Print the test accuracy with 4 decimal places.

- Generate predictions using model.predict:
  - Use the test data (test_data['Text']) as input.
  - Convert the predicted probabilities to class labels using np.argmax.

- Create a classification report:
  - Use sklearn's classification_report function.
  - Compare the true labels (test_data['Class'] - 1) with the predicted classes.
  - Print the classification report, which includes precision, recall, and F1-score for each class.

- Return the predicted classes for further analysis or visualization.

- Consider adding additional evaluation metrics if needed, such as confusion matrix or ROC AUC score.

- Ensure that the input data types match what the model expects (text data for inputs, integer labels for targets).
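
`model.predict` returns one row of 4 softmax probabilities per article, and `np.argmax(..., axis=1)` picks the most probable class in each row. The NumPy sketch below shows that step on made-up probabilities, computes accuracy, and builds a confusion matrix (rows are true classes, columns are predicted classes) as one of the optional extra metrics mentioned above; `sklearn.metrics.confusion_matrix` produces the same layout. `to_class_labels` and `confusion_counts` are hypothetical names.

```python
import numpy as np

def to_class_labels(probs):
    """(num_samples, num_classes) probabilities -> (num_samples,) predicted labels."""
    return np.argmax(probs, axis=1)

def confusion_counts(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

probs = np.array([[0.80, 0.10, 0.05, 0.05],   # made-up model.predict output
                  [0.20, 0.60, 0.10, 0.10],
                  [0.10, 0.10, 0.30, 0.50],
                  [0.10, 0.10, 0.20, 0.60]])
y_true = np.array([1, 2, 3, 4]) - 1           # raw 1..4 labels shifted to 0..3
y_pred = to_class_labels(probs)
accuracy = float(np.mean(y_pred == y_true))

print(y_pred, accuracy)  # [0 1 3 3] 0.75
print(confusion_counts(y_true, y_pred, num_classes=4))
```

The off-diagonal count at row 2, column 3 records the one Business article predicted as Sci/Tech, which is the kind of per-class error a classification report summarizes as lower recall for class 2.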

In [13]:
# Evaluation Function
def evaluate_model(model, test_data):
    
    # TODO: Evaluate the model on the test data (model.evaluate returns [loss, accuracy])
    loss, accuracy = 
    print(f"Test accuracy: {accuracy:.4f}")
    
    # TODO: Make predictions on the test data
    
    # TODO: Convert the predictions to class labels

    
    print(classification_report(test_data['Class'] - 1, predicted_classes))
    
    test_evaluate_model(predicted_classes)
    
    return predicted_classes

## Driver code to run the built pipeline

In [None]:
#---------------- Do not change the code below ----------------#

def main():
    # Set device
    device = detect_and_set_device()
    # Load and preprocess data
    train_data, test_data = load_data()
    
    # Split training data into train and validation sets
    train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)
    
    # Create vectorization layer
    vectorize_layer = vectorize_text(train_data, test_data)
    
    # Create and compile model
    model = create_model(vectorize_layer)
    
    # Train model
    history = train_model(model, train_data, val_data, device)
    
    # Evaluate model
    predicted_classes = evaluate_model(model, test_data)
    
    return history, test_data, predicted_classes
    


if __name__ == "__main__":
    history, test_data, predicted_classes = main()

## Plot Training History

In [None]:
#---------------- Do not change the code below ----------------#
# Run this cell to visualize results
plot_results(history, test_data, predicted_classes)