<a href="https://colab.research.google.com/github/saitejamalyala/8051/blob/gh-pages/nlp_trends.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

@author: Adnan Ahmad

# Multiclass text classification

This notebook aims to demonstrate the use of some of the recent NLP trends to perform **Multiclass text classification**, with a focus on news data.

**Trends**

The following trends will be used in this notebook:

1. Knowledge distillation
2. Zero-shot learning
3. RoBERTa language model
4. Quantization
5. Performance evaluation


**Library**

We will be using the Hugging Face `transformers` and `datasets` libraries for this notebook.

**Steps**

1. Collect a dataset for the knowledge distillation task.
2. Use a pre-trained RoBERTa (large) model for zero-shot classification.
3. Distill the original model to produce a small student model for the task.
4. Use ONNX and quantization to further reduce the model size.
5. Compare the models on size, accuracy, and latency.
6. Use Gradio to create a user interface.

**Task summary**

* This notebook trains a small, efficient model (the student) to mimic the behavior of a large, accurate model (the teacher) on a specific text classification task. It does this with zero-shot distillation: the teacher labels unlabeled text with its zero-shot predictions, and the student is trained on those predictions, so no human-labeled training examples are needed for the task. The student here is DistilBERT (`distilbert-base-uncased`), which is smaller and faster than the RoBERTa-large teacher. The trained student is then evaluated on a test set and its performance is compared to the teacher. Finally, the student is converted to the ONNX format and quantized to reduce its size, and these versions are also evaluated on the test set.

Overall, this notebook demonstrates how zero-shot distillation and quantization can be used to obtain a small, efficient model.

**Results** 

- `Original model`: 355M parameters (about 1.4 GB in fp32), 69.33% accuracy, 667 sec (on test set).
- `Distilled model`: 255MB, 70.32% accuracy, 48 sec (on test set).
- `Quantized model`: 65MB, 70.41% accuracy, 858 sec (on test set).



# Part-1: Zero-shot classification.

In [None]:
!pip install --quiet ipython-autotime
%load_ext autotime

In [None]:
!pip install --quiet transformers
!pip install --quiet datasets==2.8.0

### Get the dataset

In [None]:
# Import the load_dataset function from the datasets module
from datasets import load_dataset

# Load the 'ag_news' dataset and split it into train and test sets
train, test = load_dataset('ag_news', split=['train', 'test'])

# Get the text of the first element in the train set
train[0]['text']





"Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again."

### Perform zero-shot classification

The `roberta-large-mnli` model is a RoBERTa-large model fine-tuned on the MNLI dataset, and it can be loaded by name through the transformers library. The model has around 355M parameters.

In [None]:
# Import the pipeline function from the transformers module
from transformers import pipeline

# Create a zero-shot classification pipeline using the 'roberta-large-mnli' model on GPU 0 (use device=-1 to run on CPU)
zero_shot_classifier = pipeline('zero-shot-classification', model="roberta-large-mnli", device=0)

# Define a sequence of text to classify
sequence = "A new moon has been discovered in Jupiter's orbit."

# Define the class names for the zero-shot classification
class_names = ["the world", "sports", "business", "science/tech"]

# Use the zero-shot classification pipeline to classify the sequence of text
result = zero_shot_classifier(sequence, class_names)

# Print the result
print(result)


Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


{'sequence': "A new moon has been discovered in Jupiter's orbit.", 'labels': ['science/tech', 'the world', 'business', 'sports'], 'scores': [0.7967993021011353, 0.08699766546487808, 0.07631053775548935, 0.039892517030239105]}
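
To make the mechanism behind the output above more concrete, here is a minimal sketch of how an NLI-based zero-shot classifier turns class names into a ranking: each class name is inserted into a hypothesis template, the NLI model scores how strongly the text entails each hypothesis, and the entailment scores are normalized with a softmax across the labels. The entailment logits below are made-up illustrative values, not outputs of `roberta-large-mnli`, and `build_hypotheses` and `rank_labels` are hypothetical helper names.

```python
import math

def build_hypotheses(class_names, template="This text is about {}."):
    # One NLI hypothesis per candidate label
    return [template.format(name) for name in class_names]

def rank_labels(class_names, entailment_logits):
    # Softmax over the entailment logits (max subtracted for numerical stability)
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Sort labels from most to least likely
    return sorted(zip(class_names, scores), key=lambda pair: pair[1], reverse=True)

class_names = ["the world", "sports", "business", "science/tech"]
hypotheses = build_hypotheses(class_names)

# Illustrative entailment logits, one per hypothesis (assumed values)
fake_entailment_logits = [0.5, -0.3, 0.4, 2.7]
ranked = rank_labels(class_names, fake_entailment_logits)

print(hypotheses[3])
print(ranked[0][0])
```

Because the scores are normalized across the candidate labels, they always sum to 1, which matches the `scores` list printed by the pipeline.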


### Save data and class labels

In [None]:
# Create a directory called 'agnews' (no error if it already exists)
!mkdir -p agnews

# Open a file called 'train_unlabeled.txt' in the 'agnews' directory in write mode
with open("agnews/train_unlabeled.txt", 'w') as f:
  # Iterate through the 'text' field of the train set
  for seq in train["text"]:
    # Write each sequence to the file and add a newline character
    f.write(seq + '\n')

time: 363 ms (started: 2023-01-02 14:02:05 +00:00)


In [None]:
# Open a file called 'class_names.txt' in the 'agnews' directory in write mode
with open("agnews/class_names.txt", 'w') as f:
  # Iterate through the class names
  for label in class_names:
    # Write each class name to the file and add a newline character
    f.write(label + '\n')


time: 941 µs (started: 2023-01-02 14:02:07 +00:00)


# Part-2: Zero-shot distillation.

The cell below runs the `distill_classifier.py` script from the `zero-shot-distillation` research project in the transformers repository with the specified arguments. The script performs zero-shot distillation to train a new, smaller student model for the classification task. The student is trained on the text in the `train_unlabeled.txt` file, and the class names are read from the `class_names.txt` file. The training file contains one example per line; the class names file simply lists the candidate labels. The labels are not mapped to the training examples, so no ground-truth annotations are used. The `--fp16` flag enables mixed-precision training for faster and more memory-efficient training.

**Note:** This process will take ~30 min to complete.
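
As a rough illustration of what training on teacher predictions means, the sketch below computes a soft-target cross-entropy: the student's logits are compared against the teacher's probability distribution rather than against a single hard label. This is a common formulation for this kind of distillation; the exact loss inside `distill_classifier.py` may differ, and the probabilities and logits here are made-up values.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of logits
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def soft_cross_entropy(student_logits, teacher_probs):
    # Cross-entropy between the teacher distribution and the student distribution
    log_p = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_p))

# Teacher's zero-shot probabilities for one example (assumed values)
teacher_probs = [0.09, 0.04, 0.08, 0.79]

# A student that agrees with the teacher, and one that does not (assumed values)
agreeing_student = [0.1, -0.8, 0.0, 2.2]
disagreeing_student = [2.2, 0.0, -0.8, 0.1]

loss_agree = soft_cross_entropy(agreeing_student, teacher_probs)
loss_disagree = soft_cross_entropy(disagreeing_student, teacher_probs)
print(loss_agree < loss_disagree)
```

Minimizing this loss over many unlabeled examples pushes the student's output distribution toward the teacher's, which is why no ground-truth labels are required.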

In [None]:
!git clone https://github.com/huggingface/transformers.git

Cloning into 'transformers'...
remote: Enumerating objects: 120766, done.
remote: Counting objects: 100% (159/159), done.
remote: Compressing objects: 100% (114/114), done.
remote: Total 120766 (delta 62), reused 96 (delta 31), pack-reused 120607
Receiving objects: 100% (120766/120766), 114.39 MiB | 22.07 MiB/s, done.
Resolving deltas: 100% (90227/90227), done.
time: 14.2 s (started: 2023-01-02 14:02:10 +00:00)


In [None]:
# Run the distill_classifier.py script from the transformers library
!python transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py \
--data_file ./agnews/train_unlabeled.txt \
--class_names_file ./agnews/class_names.txt \
--hypothesis_template "This text is about {}." \
--student_name_or_path distilbert-base-uncased \
--output_dir ./distilbert-base-uncased-agnews-student \
--fp16


Load the trained `distilled student model` and its associated `tokenizer` from the specified directory. 

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the trained student model's tokenizer from the specified directory
tokenizer = AutoTokenizer.from_pretrained("./distilbert-base-uncased-agnews-student")

# Load the trained student model from the specified directory
model = AutoModelForSequenceClassification.from_pretrained("./distilbert-base-uncased-agnews-student")

# Print the model's configuration
print(model.config)

DistilBertConfig {
  "_name_or_path": "./distilbert-base-uncased-agnews-student",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "the world",
    "1": "sports",
    "2": "business",
    "3": "science/tech"
  },
  "initializer_range": 0.02,
  "label2id": {
    "business": 2,
    "science/tech": 3,
    "sports": 1,
    "the world": 0
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "vocab_size": 30522
}

time: 1.57 s (started: 2023-01-02 14:56:50 +00:00)


### Create text-classification pipeline using new student model

In [None]:
# Import the TextClassificationPipeline class from the transformers module
from transformers import TextClassificationPipeline

# Create a text classification pipeline using the trained student model and its associated tokenizer
distilled_classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, 
                                                   # Return scores for all classes
                                                   return_all_scores=True, 
                                                   # Use device 0
                                                   device=0)

# Classify the sequence of text using the distilled classifier pipeline
result = distilled_classifier(sequence)

# Print the result
print(result)


[[{'label': 'the world', 'score': 0.1471155285835266}, {'label': 'sports', 'score': 0.029311053454875946}, {'label': 'business', 'score': 0.052956074476242065}, {'label': 'science/tech', 'score': 0.7706173062324524}]]


# Part-3: Zero-shot classifier VS Zero-shot distilled (student) classifier performance

This section measures the accuracy of the `zero-shot classifier` and the `distilled student classifier` on the test set. Each loop iterates through the test set in batches, uses the classifier to label the examples in each batch, and stores the predictions. It then computes the `accuracy` as the proportion of correct predictions and prints the accuracy and runtime. This shows how the original `zero-shot classifier` and the `distilled student classifier` perform on unseen data. Note that the two loops use different batch sizes (32 and 128), which affects the reported runtimes.
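
The evaluation pattern described above can be sketched independently of any model. In the snippet below, `toy_classifier` is a hypothetical stand-in for a real pipeline, and the texts and labels are invented for illustration; only the batching and the accuracy calculation mirror the notebook.

```python
def iter_batches(items, batch_size):
    # Yield consecutive slices of at most batch_size items
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def accuracy(preds, labels):
    # Proportion of predictions that match the reference labels
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def toy_classifier(batch):
    # Hypothetical stand-in for a model: predicts class 1 if the text mentions "match"
    return [1 if "match" in text else 0 for text in batch]

texts = ["stocks fall", "late match winner", "match report", "new chip", "oil prices"]
labels = [0, 1, 1, 0, 1]

preds = []
for batch in iter_batches(texts, batch_size=2):
    preds += toy_classifier(batch)

acc = accuracy(preds, labels)
print(f"Accuracy: {acc * 100:0.2f}%")
```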

### Original zero-shot classifier evaluation

In [None]:
# Import the necessary libraries
import numpy as np
from time import time
from tqdm.auto import tqdm

# Start the timer
start = time()

# Set the batch size
batch_size = 32

# Set the hypothesis template
hypothesis_template = "This text is about {}."

# Initialize an empty list for storing predictions
preds = []

# Iterate through the test set in batches
for i in tqdm(range(0, len(test), batch_size)):
  # Get the examples in the current batch
  examples = test[i:i+batch_size]['text']
  # Use the zero-shot classifier to classify the examples
  outputs = zero_shot_classifier(examples, class_names, hypothesis_template=hypothesis_template)
  # Store the predictions for the current batch
  preds += [class_names.index(o['labels'][0]) for o in outputs]

# Calculate the accuracy of the model
accuracy = np.mean(np.array(preds) == np.array(test['label']))

# Print the accuracy and runtime
print(f"Teacher model accuracy: {accuracy*100:0.2f}%")
print(f"Runtime: {time() - start : 0.2f} seconds")





Teacher model accuracy: 69.33%
Runtime:  667.78 seconds
time: 11min 7s (started: 2023-01-02 14:57:10 +00:00)


### Distilled student classifier evaluation

In [None]:
# Start the timer
start = time()

# Set the batch size
batch_size = 128

# The pipeline was created with return_all_scores=True, so each output is a list
# of per-class scores; the top class is selected with max() below

# Initialize an empty list for storing predictions
preds = []

# Iterate through the test set in batches
for i in tqdm(range(0, len(test), batch_size)):
  # Get the examples in the current batch
  examples = test[i:i+batch_size]['text']
  # Use the distilled classifier to classify the examples
  outputs = distilled_classifier(examples)
  # Store the predictions for the current batch
  preds += [class_names.index(max(o, key=lambda x: x['score'])['label']) for o in outputs]

# Calculate the accuracy of the model
accuracy = np.mean(np.array(preds) == np.array(test['label']))

# Print the accuracy and runtime
print(f"Distilled model accuracy: {accuracy*100:0.2f}%")
print(f"Runtime: {time() - start : 0.2f} seconds")



Distilled model accuracy: 70.32%
Runtime:  48.39 seconds
time: 48.4 s (started: 2023-01-02 15:08:23 +00:00)


# Part-4: Quantize distilled model

This code converts the `distilled student classifier` pipeline to `ONNX format` and saves the ONNX model to the specified file. An ONNX model can be run with any ONNX-compatible runtime, such as ONNX Runtime, on a range of platforms. The conversion is performed using the `convert_pytorch` function from the `convert_graph_to_onnx` module of the transformers library. The `opset` argument specifies the ONNX operator set version to target, and the `use_external_format` argument specifies whether the weights are stored in separate external data files or inside the single `.onnx` file.

### Convert to ONNX

In [None]:
!pip install transformers[onnx]
!pip install onnxruntime-gpu
# Restart runtime

In [None]:
!mkdir onnx

In [None]:
# Make sure the ``model`` and the ``pipeline`` are defined in previous sections

In [None]:
# Import the necessary modules (re-imported here because the runtime was restarted)
import transformers.convert_graph_to_onnx as onnx_convert
from pathlib import Path
from transformers import TextClassificationPipeline

# Move the model to the CPU
model = model.cpu()

# Create a text classification pipeline using the trained student model and its associated tokenizer
distilled_classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)

# Convert the distilled classifier pipeline to ONNX format
onnx_convert.convert_pytorch(distilled_classifier, 
                             # Target ONNX operator set (opset) version 12
                             opset=12, 
                             # Save the ONNX model to the specified file
                             output=Path("onnx/classifier.onnx"), 
                             # Keep the weights inside the .onnx file (no external data files)
                             use_external_format=False)


Using framework PyTorch: 1.13.0+cu116
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']




### Quantize model

This code quantizes the ONNX model: it converts the model's floating-point weights to 8-bit integers in order to reduce the model's size and, on hardware with fast integer kernels, potentially speed up inference. The quantization is performed using the `quantize_dynamic` function from the onnxruntime library. The `weight_type` argument specifies the integer type used for the weights, and the `optimize_model` argument specifies whether to apply graph optimizations to the model before quantizing it.
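
To show what converting floating-point weights to integers involves, here is a small worked sketch of affine (scale and zero-point) quantization to unsigned 8-bit values. It illustrates the general idea only; the exact scheme that `quantize_dynamic` applies may differ in its details, and the weight values are invented.

```python
def quantize_uint8(values):
    # Make sure the representable range includes 0.0 so zero maps exactly
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-lo / scale)
    # Map each float to an integer in [0, 255]
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate float values from the integers
    return [(x - zero_point) * scale for x in q]

weights = [-0.62, -0.11, 0.0, 0.27, 0.93]  # illustrative values
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(max_error <= scale / 2 + 1e-9)
```

Each weight is stored in one byte instead of four, which is where most of the size reduction comes from, at the cost of a rounding error of at most about half a quantization step per value.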

In [None]:
# Import the necessary modules
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the ONNX model
quantize_dynamic("onnx/classifier.onnx", 
                 # Save the quantized model to the specified file
                 "onnx/classifier_int8.onnx", 
                 # Use unsigned 8-bit integer weights
                 weight_type=QuantType.QUInt8,
                 # Apply graph optimizations before quantizing
                 optimize_model=True)


Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.0/attention/MatMul]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.0/attention/MatMul_1]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.1/attention/MatMul]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.1/attention/MatMul_1]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.2/attention/MatMul]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.2/attention/MatMul_1]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.3/attention/MatMul]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.3/attention/MatMul_1]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.4/attention/MatMul]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.4/attention/MatMul_1]
Ignore MatMul due to non constant B: /[/distilbert/transformer/layer.5/attention/MatMul]
Ignore MatM

In [None]:
# Import the necessary modules
import onnxruntime as ort
import numpy as np

# Load the ONNX model
session = ort.InferenceSession("onnx/classifier.onnx", 
                               # Use the CUDA execution provider
                               providers=['CUDAExecutionProvider'])

# Load the quantized ONNX model
session_int8 = ort.InferenceSession("onnx/classifier_int8.onnx",  
                                    # Use the CUDA execution provider
                                    providers=['CUDAExecutionProvider'])
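
Since model size is one of the quantities compared in this notebook, a small helper like the one below could be used to measure the exported files on disk. This is a sketch using only the standard library; `file_size_mb` and `size_report` are hypothetical helper names, and files that do not exist are simply skipped.

```python
import os

def file_size_mb(path):
    # Size of a file on disk, in mebibytes
    return os.path.getsize(path) / (1024 * 1024)

def size_report(paths):
    # Map each name to its file size, skipping files that are missing
    return {
        name: round(file_size_mb(path), 2)
        for name, path in paths.items()
        if os.path.exists(path)
    }

# Paths written by the conversion and quantization cells above
report = size_report({
    "onnx_fp32": "onnx/classifier.onnx",
    "onnx_int8": "onnx/classifier_int8.onnx",
})
print(report)
```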


# Part-5: Evaluate Distilled model VS Quantized distilled model

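This part evaluates the `quantized` and `non-quantized` student classifiers, both running as ONNX models. The two runs below use different batch sizes (128 and 256), so their runtimes are not strictly comparable. The quantized run is also far slower in this setup; one possible explanation, not verified here, is that the dynamically quantized operators are not accelerated by the CUDA execution provider and run on the CPU instead.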

### Non-quantized model evaluation

In [None]:
# Import the necessary modules
import numpy as np
from time import time
from tqdm.auto import tqdm

# Start the timer
start = time()

# Set the batch size
batch_size = 128

# Initialize an empty list for storing predictions
preds = []

# Iterate through the test set in batches
for i in tqdm(range(0, len(test), batch_size)):
  # Get the examples in the current batch
  examples = test[i:i+batch_size]['text']
  # Tokenize the examples
  examples_tokened = tokenizer(examples, 
                               # Use padding to ensure that all input sequences have the same length
                               padding="max_length", 
                               # Truncate the input sequences to the specified length
                               truncation=True, 
                               # Set the maximum input sequence length
                               max_length=128)

  # Create the input feed for the ONNX model
  input_feed = {
      # Set the input IDs
      "input_ids": np.array(examples_tokened['input_ids']),
      # Set the attention mask
      "attention_mask": np.array(examples_tokened['attention_mask']),
  }

  # Run the ONNX model and get the output
  out = session.run(input_feed=input_feed, 
                    # Specify the output tensor
                    output_names=['output_0'])[0]
  # Get the predictions from the output tensor
  predictions = np.argmax(out, axis=-1)
  # Store the predictions for the current batch
  preds += predictions.tolist()

# Calculate the accuracy of the model
accuracy = np.mean(np.array(preds) == np.array(test['label']))
# Print the accuracy and runtime
print(f"Distilled model accuracy: {accuracy*100:0.2f}%")
print(f"Runtime: {time() - start : 0.2f} seconds")


Distilled model accuracy: 70.29%
Runtime:  27.12 seconds


### Quantized model evaluation

In [None]:
# Start the timer
start = time()

# Set the batch size
batch_size = 256

# Initialize an empty list for storing predictions
preds = []

# Iterate through the test set in batches
for i in tqdm(range(0, len(test), batch_size)):
  # Get the examples in the current batch
  examples = test[i:i+batch_size]['text']
  # Tokenize the examples
  examples_tokened = tokenizer(examples, 
                               # Use padding to ensure that all input sequences have the same length
                               padding="max_length", 
                               # Truncate the input sequences to the specified length
                               truncation=True, 
                               # Set the maximum input sequence length
                               max_length=128)

  # Create the input feed for the ONNX model
  input_feed = {
      # Set the input IDs
      "input_ids": np.array(examples_tokened['input_ids']),
      # Set the attention mask
      "attention_mask": np.array(examples_tokened['attention_mask']),
  }

  # Run the quantized ONNX model and get the output
  out = session_int8.run(input_feed=input_feed, 
                         # Specify the output tensor
                         output_names=['output_0'])[0]
  # Get the predictions from the output tensor
  predictions = np.argmax(out, axis=-1)
  # Store the predictions for the current batch
  preds += predictions.tolist()


# Calculate the accuracy of the model
accuracy = np.mean(np.array(preds) == np.array(test['label']))
# Print the accuracy and runtime
print(f"Quantized distilled model accuracy: {accuracy*100:0.2f}%")
print(f"Runtime: {time() - start : 0.2f} seconds")


Quantized distilled model accuracy: 70.41%
Runtime:  858.91 seconds
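
The runtimes above include tokenization and use different batch sizes, which makes latency harder to compare across models. A sketch of a more controlled measurement is shown below: it times only the model call, per batch, after a warm-up pass. `time_batches` is a hypothetical helper and `fake_model` is a stand-in for a real inference call.

```python
import statistics
import time

def time_batches(run_batch, batches, warmup=1):
    # Run a few untimed batches first so one-off setup costs are excluded
    for batch in batches[:warmup]:
        run_batch(batch)
    timings = []
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        timings.append(time.perf_counter() - start)
    return {
        "batches": len(timings),
        "total_s": sum(timings),
        "median_s": statistics.median(timings),
    }

def fake_model(batch):
    # Hypothetical stand-in for an inference call
    return [len(text) % 4 for text in batch]

batches = [["a", "bb"], ["ccc", "dddd"], ["eeeee"]]
stats = time_batches(fake_model, batches)
print(stats["batches"])
```

In the notebook, `run_batch` could wrap a call such as `session.run(...)` on pre-tokenized inputs, using the same batch size for every model being compared.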


# Part-6: Demo app

In [None]:
!pip install --quiet gradio

In [None]:
import gradio as gr

def classify_text(text):
  # Tokenize the input text
  examples_tokened = tokenizer([text], 
                               # Use padding to ensure that all input sequences have the same length
                               padding="max_length", 
                               # Truncate the input sequences to the specified length
                               truncation=True, 
                               # Set the maximum input sequence length
                               max_length=128)
  
  # Create the input feed for the ONNX model
  input_feed = {
      # Set the input IDs
      "input_ids": np.array(examples_tokened['input_ids']),
      # Set the attention mask
      "attention_mask": np.array(examples_tokened['attention_mask']),
  }

  # Run the quantized ONNX model and get the output
  out = session_int8.run(input_feed=input_feed, 
                         # Specify the output tensor
                         output_names=['output_0'])[0]

  # Define the class names (same order as the student model's id2label mapping)
  class_names = ["the world", "sports", "business", "science/tech"]
  # Map each class name to its raw output score
  pred_dict = {name: float(score) for name, score in zip(class_names, out[0])}
  # Get the index of the highest-scoring class
  predictions = np.argmax(out, axis=-1)
  label = class_names[predictions[0]]

  # Return the predicted label together with all class scores
  return f"{label}: {pred_dict}"

In [None]:
demo = gr.Interface(
    classify_text,
    inputs=gr.Textbox(placeholder="Enter a string here"), 
    outputs=gr.Textbox()
    )

demo.launch()

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Note: opening Chrome Inspector may crash demo inside Colab notebooks.

To create a public link, set `share=True` in `launch()`.
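
The demo displays the raw model outputs. Assuming `output_0` holds unnormalized logits (an assumption about the exported model, not something verified here), a softmax would turn them into per-class probabilities that are easier to read. The sketch below uses made-up logits, and `logits_to_confidences` is a hypothetical helper.

```python
import math

CLASS_NAMES = ["the world", "sports", "business", "science/tech"]

def logits_to_confidences(logits, class_names=CLASS_NAMES):
    # Softmax over the logits (max subtracted for numerical stability)
    m = max(float(x) for x in logits)
    exps = [math.exp(float(x) - m) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(class_names, exps)}

# Illustrative logits for a single input (assumed values)
example_logits = [-1.2, -2.8, -2.2, 0.5]
confidences = logits_to_confidences(example_logits)
top_label = max(confidences, key=confidences.get)

print(top_label)
```

A dictionary of label-to-probability pairs like this is the kind of value Gradio's `gr.Label` output component is designed to display, which could replace the plain `gr.Textbox` output.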



