# Hardware used: AWS c6i.8xlarge (32 vCPUs, 64 GB memory), running in RHOAI

```
Note: Do not use GPU hardware for this notebook, as the pruned teacher model was trained on CPU machines.
```

# **Multi-Sequence Text Classification: Sparse Transfer Learning with the Python API**

In this example, you will fine-tune a 90% pruned BERT model onto the QQP dataset (a multi-sequence binary classification problem) using SparseML's Hugging Face Integration.

### **Sparse Transfer Learning Overview**

Sparse Transfer Learning is very similar to the typical fine-tuning you are used to when training models. With Sparse Transfer Learning, however, we start the training process from a pre-sparsified checkpoint and maintain the sparsity structure while fine-tuning occurs.

At the end, you will have a sparse model trained on your dataset, ready to be deployed with DeepSparse for GPU-class performance on CPUs!
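
To make "maintain the sparsity structure" concrete, here is a minimal sketch in plain Python. It is an illustration of the idea only, not SparseML's implementation: the helper names and the toy numbers are assumptions made for this example. The mask is computed once from the pre-sparsified weights and re-applied on every update, so pruned weights stay at zero even though the gradients are dense.

```python
def sparsity_mask(weights):
    """Return True where a weight is kept and False where it was pruned (exactly zero)."""
    return [w != 0.0 for w in weights]


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One SGD step that updates kept weights and pins pruned weights at zero."""
    return [w - lr * g if keep else 0.0 for w, g, keep in zip(weights, grads, mask)]


# Toy "pre-sparsified" weight vector (hypothetical values): 3 of 5 entries are pruned.
weights = [0.0, 0.8, 0.0, -0.5, 0.0]
mask = sparsity_mask(weights)

# Gradients are dense, so an unmasked update would fill the zeros back in.
grads = [0.3, -0.2, 0.1, 0.4, -0.6]
for _ in range(3):
    weights = masked_sgd_step(weights, grads, mask)

sparsity = weights.count(0.0) / len(weights)
print(weights, sparsity)
```

The kept weights move with the gradients while the fraction of zeros is unchanged, which is the property the transfer learning recipe is meant to preserve.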

### **Pre-Sparsified BERT**
SparseZoo, Neural Magic's open-source repository of pre-sparsified models, contains a 90% pruned version of BERT, which has been sparsified on the upstream Wikipedia and BookCorpus datasets with the masked language modeling objective. [Check out the model card](https://sparsezoo.neuralmagic.com/models/nlp%2Fmasked_language_modeling%2Fobert-base%2Fpytorch%2Fhuggingface%2Fwikipedia_bookcorpus%2Fpruned90-none). We will use this model as the starting point for the transfer learning process.

## **Installation**

Install SparseML via `pip`.

In [None]:
# !pip install sparseml[transformers]

In [2]:
import sparseml
from sparsezoo import Model
from sparseml.transformers.utils import SparseAutoModel
from sparseml.transformers.sparsification import Trainer, TrainingArguments
import numpy as np
from transformers import (
    AutoModelForSequenceClassification,
    AutoConfig, 
    AutoTokenizer, 
    EvalPrediction, 
    default_data_collator
)
from datasets import load_dataset, load_metric

## **Step 1: Load a Dataset**

SparseML is integrated with Hugging Face, so we can use the `datasets` library to load datasets from the Hugging Face Hub or from local files.

[QQP Dataset Card](https://huggingface.co/datasets/glue/viewer/qqp/test)

In [3]:
# load dataset from HF Hub
dataset = load_dataset("glue", "qqp")

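# alternatively, write the splits to local CSV files and load them back
# (note: despite its name, `dataset_from_json` below holds the CSV-loaded data)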
dataset["train"].to_csv("qqp-train.csv")
dataset["validation"].to_csv("qqp-validation.csv")
data_files = {
  "train": "qqp-train.csv",
  "validation": "qqp-validation.csv"
}
dataset_from_json = load_dataset("csv", data_files=data_files)


In [4]:
print(dataset_from_json["train"])

Dataset({
    features: ['question1', 'question2', 'label', 'idx'],
    num_rows: 363846
})


In [5]:
!head qqp-train.csv --lines=5

question1,question2,label,idx
How is the life of a math student? Could you describe your own experiences?,Which level of prepration is enough for the exam jlpt5?,0,0
How do I control my horny emotions?,How do you control your horniness?,1,1
What causes stool color to change to yellow?,What can cause stool to come out as little balls?,0,2
What can one do after MBBS?,What do i do after my MBBS ?,1,3


In [6]:
# configs for below
INPUT_COL_1 = "question1"
INPUT_COL_2 = "question2"
LABEL_COL = "label"
NUM_LABELS = len(dataset_from_json["train"].unique(LABEL_COL))

## **Step 2: Setup Evaluation Metric**

QQP is a multi-input binary classification problem where we predict one of two class labels (duplicate, not duplicate) for each input pair. We will use the GLUE `qqp` metric, which reports accuracy and F1, as the evaluation metric.

Since SparseML is integrated with Hugging Face, we can pass a `compute_metrics` function for evaluation (which will be passed to the `Trainer` class below).
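
As a rough illustration of what the evaluation reports, the sketch below computes accuracy, binary F1, and their mean (the "combined score") by hand on a made-up set of predictions. The helper names and the toy labels are assumptions for this example; the notebook itself relies on the Hugging Face metric rather than this code.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_binary(preds, labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions (1 = duplicate, 0 = not duplicate).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 1]

acc = accuracy(preds, labels)
f1 = f1_binary(preds, labels)
combined = (acc + f1) / 2  # mean of the two scores
print(acc, f1, combined)
```

Reporting F1 alongside accuracy is useful when the two classes are not equally frequent, since accuracy alone can hide poor recall on the less common class.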

In [7]:
# load the GLUE QQP metric (accuracy and F1)
metric = load_metric("glue", "qqp")

# setup metrics function
def compute_metrics(p: EvalPrediction):
  # some models return a tuple; the logits are the first element
  preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
  preds = np.argmax(preds, axis=1)
  result = metric.compute(predictions=preds, references=p.label_ids)
  # when several scores are reported, also log their mean
  if len(result) > 1:
      result["combined_score"] = np.mean(list(result.values())).item()
  return result


## **Step 3: Download Files for Sparse Transfer Learning**

First, we need to select a sparse checkpoint to begin the training process. In this case, we will fine-tune a 90% pruned version of BERT onto the QQP dataset. This model is available in SparseZoo, identified by the following stub:
```
zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none
```

Next, we need a sparsification recipe for use in the training process. Recipes are YAML files that encode the sparsity-related algorithms and parameters to be applied by SparseML. For Sparse Transfer Learning, we need a recipe that instructs SparseML to maintain sparsity during the training process and to apply quantization over the final few epochs.

In the case of QQP, there is a transfer learning recipe available in the SparseZoo, identified by the following stub:

```
zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none
```

Finally, SparseML has the optional ability to apply model distillation from a teacher model during the transfer learning process to boost accuracy. In this case, we will use a dense version of BERT trained on the QQP dataset which is hosted in SparseZoo. This model is identified by the following stub:

```
zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/base-none
```
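
The three stubs above share the same slash-separated layout. The sketch below splits a stub into named parts so the differences between them are easier to see. The field names are my own reading of the layout and should be treated as assumptions, not as official SparseZoo terminology; the `sparsezoo` client does not need this helper.

```python
from typing import NamedTuple


class ZooStub(NamedTuple):
    # Field names are illustrative assumptions, not official SparseZoo terms.
    domain: str
    task: str
    architecture: str
    framework: str
    repo: str
    dataset: str
    optimization: str


def parse_stub(stub: str) -> ZooStub:
    """Split a 'zoo:' stub into its seven slash-separated components."""
    prefix = "zoo:"
    if not stub.startswith(prefix):
        raise ValueError(f"not a SparseZoo stub: {stub!r}")
    parts = stub[len(prefix):].split("/")
    if len(parts) != 7:
        raise ValueError(f"expected 7 components, got {len(parts)}")
    return ZooStub(*parts)


student = parse_stub(
    "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none"
)
recipe = parse_stub(
    "zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none"
)
teacher = parse_stub(
    "zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/base-none"
)
print(student.dataset, recipe.optimization, teacher.optimization)
```

Read this way, the student is the upstream pruned checkpoint, while the recipe and teacher stubs both point at QQP text classification models that differ only in their last component.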

Use the `sparsezoo` Python client to download the models and recipe using their SparseZoo stubs.

In [8]:
# download 90% pruned upstream BERT trained on MLM objective
model_stub = "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none"
model_path = Model(model_stub, download_path="./model").training.path 

# download dense BERT trained on QQP dataset
teacher_stub = "zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/base-none"
teacher_path = Model(teacher_stub, download_path="./teacher").training.path 

# download transfer recipe for QQP
transfer_stub = "zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none"
recipe_path = Model(transfer_stub, download_path="./transfer_recipe").recipes.default.path


In [9]:
%ls ./model/training

all_results.json   special_tokens_map.json  training_args.bin
config.json        tokenizer_config.json    train_results.json
eval_results.json  tokenizer.json           vocab.txt
pytorch_model.bin  trainer_state.json


In [11]:
%ls ./teacher/training/

config.json        qqp-validation-metric.yaml  tokenizer.json
eval_results.txt   special_tokens_map.json     vocab.txt
pytorch_model.bin  tokenizer_config.json


In [12]:
%cat ./transfer_recipe/recipe.md

<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

---

version: 1.1.0

# General Variables
num_epochs: &num_epochs 13
init_lr: 1.5e-4 
final_lr: 0

qat_start_epoch: &qat_start_epoch 8.0
observer_epoch: &observer_epoch 12.0
quantize_embeddings: &quantize_embeddings 1

distill_hardness: &distill_hardness 1.0
distill_temperature: &distill_temperature 2.0

# Modifiers:

training_modifiers:
  - !EpochRangeModifier
      end_epoch: eval(num_epochs)

## **Step 4: Setup Hugging Face Model Objects**

Next, we will set up the Hugging Face `tokenizer`, `config`, and `model`. 

These are all native Hugging Face objects, so check out the Hugging Face docs for more details on `AutoModel`, `AutoConfig`, and `AutoTokenizer` as needed. 

We instantiate these classes by passing the local path to the directory containing the `pytorch_model.bin`, `tokenizer.json`, and `config.json` files from the SparseZoo download.

In [13]:
# shared tokenizer between teacher and student
# see examples for how to use models with different tokenizers
tokenizer = AutoTokenizer.from_pretrained(model_path)

# setup configs
model_config = AutoConfig.from_pretrained(model_path, num_labels=NUM_LABELS)
teacher_config = AutoConfig.from_pretrained(teacher_path, num_labels=NUM_LABELS)

# initialize model using familiar HF AutoModel
model_kwargs = {"config": model_config}
model_kwargs["state_dict"], s_delayed = SparseAutoModel._loadable_state_dict(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, **model_kwargs,)
SparseAutoModel.log_model_load(model, model_path, "student", s_delayed) # prints metrics on sparsity profile

# initialize teacher using familiar HF AutoModel
teacher_kwargs = {"config": teacher_config}
teacher_kwargs["state_dict"], t_delayed = SparseAutoModel._loadable_state_dict(teacher_path)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_path, **teacher_kwargs,)
SparseAutoModel.log_model_load(teacher, teacher_path, "teacher", t_delayed) # prints metrics on sparsity profile

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at ./model/training and are newly initialized: ['classifier.weight', 'bert.pooler.dense.bias', 'classifier.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-01-23 21:56:44 sparseml.transformers.utils.model INFO     Loaded student from ./model/training with 109483778 total params. Of those there are 85526016 prunable params which have 89.3777046740959 avg sparsity.
2024-01-23 21:56:45 sparseml.transformers.utils.model INFO     sparse model detected, all sparsification info: {"params_summary": {"total": 109483778, "sparse": 76441960, "sparsity_percent": 69.82035274668728, "prunable": 85526016, "prunable_sparse": 76441190, "prunable_sparsity_percent": 89.3777046740959, "quantizable": 85609730, "quantized": 0, "quantized_percent": 0.0}, "params_info": {"bert.encoder.layer.0.attention.self.query.weig
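
The two sparsity figures in the log above can be reproduced from the parameter counts it prints. This small sketch (the function name is an assumption for this example) shows why the overall figure is lower than the headline 90%: the total also counts parameters that are not prunable.

```python
def sparsity_percent(zero_params: int, total_params: int) -> float:
    """Percentage of parameters that are exactly zero."""
    return 100.0 * zero_params / total_params


# Counts copied from the "params_summary" entry in the log above.
overall = sparsity_percent(76441960, 109483778)   # all parameters
prunable = sparsity_percent(76441190, 85526016)   # prunable parameters only
print(round(overall, 2), round(prunable, 2))
```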

## **Step 5: Tokenize Dataset**

Run the tokenizer on the dataset. This is standard Hugging Face functionality.
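
For intuition about what a sentence-pair encoding looks like, here is a toy sketch. It uses whitespace splitting as a stand-in for BERT's WordPiece tokenizer and a simplified truncation rule, so it is an assumption-laden illustration of the layout (`[CLS] q1 [SEP] q2 [SEP]`, segment ids, padding to a fixed length), not the behavior of the real tokenizer.

```python
def encode_pair(q1: str, q2: str, max_len: int = 20) -> dict:
    """Toy pair encoding: [CLS] q1 [SEP] q2 [SEP], padded to max_len."""
    a = q1.lower().split()  # stand-in for WordPiece tokenization
    b = q2.lower().split()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # segment 0 covers [CLS] + q1 + [SEP]; segment 1 covers q2 + [SEP]
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    # simplified truncation: cut from the right (a real tokenizer is more careful)
    tokens = tokens[:max_len]
    token_type_ids = token_type_ids[:max_len]
    pad = max_len - len(tokens)
    return {
        "tokens": tokens + ["[PAD]"] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": [1] * len(tokens) + [0] * pad,
    }


encoded = encode_pair("What can one do after MBBS?", "What do i do after my MBBS ?")
print(encoded["tokens"])
print(encoded["token_type_ids"])
print(encoded["attention_mask"])
```

Padding every example to the same length is what allows `default_data_collator` to stack examples into a batch without further processing.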

In [14]:
MAX_LEN = 128

def preprocess_fn(examples):
  # tokenize one or two text columns, depending on the task
  if INPUT_COL_2 is None:
    args = (examples[INPUT_COL_1], )
  else:
    args = (examples[INPUT_COL_1], examples[INPUT_COL_2])
  # pad and truncate every example to the same fixed length
  result = tokenizer(*args,
                   padding="max_length",
                   max_length=min(tokenizer.model_max_length, MAX_LEN),
                   truncation=True)
  return result

# tokenize the dataset
tokenized_dataset = dataset_from_json.map(
    preprocess_fn,
    batched=True,
    desc="Running tokenizer on dataset"
)

Running tokenizer on dataset:   0%|          | 0/363846 [00:00<?, ? examples/s]

Running tokenizer on dataset:   0%|          | 0/40430 [00:00<?, ? examples/s]

## **Step 6: Run Training**

SparseML has a custom `Trainer` class that inherits from the [Hugging Face `Trainer` Class](https://huggingface.co/docs/transformers/main_classes/trainer). As such, the SparseML `Trainer` has all of the existing functionality of the HF trainer. However, in addition, we can supply a `recipe` and (optionally) a `teacher`. 


As we saw above, the `recipe` encodes the sparsity-related algorithms and hyperparameters of the training process in a YAML file. The SparseML `Trainer` parses the `recipe` and adjusts the training workflow to apply the algorithms in the recipe.

The `teacher` is an optional argument that instructs SparseML to apply model distillation to support the training process.
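
The recipe printed earlier sets `distill_hardness` to 1.0 and `distill_temperature` to 2.0. The sketch below shows a textbook knowledge distillation loss with those two knobs, in plain Python. It is an assumption about the general technique, written for illustration; the exact way SparseML combines the terms internally may differ, and all function names here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard task loss against the hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      hardness=1.0, temperature=2.0):
    """Blend the task loss with a temperature-softened teacher-matching term."""
    task_loss = cross_entropy(student_logits, label)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # the T^2 factor keeps gradient magnitudes comparable across temperatures
    kd_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return (1.0 - hardness) * task_loss + hardness * kd_loss


# Hypothetical logits for one question pair whose true label is 1 (duplicate).
loss = distillation_loss([0.2, 0.1], [-3.0, 3.0], label=1)
print(loss)
```

Under this formulation, a hardness of 1.0 means the student is trained entirely to match the teacher's softened predictions, and a hardness of 0.0 falls back to ordinary fine-tuning on the labels.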

***We run with only a subset of training samples for the purposes of a quick demo. Set `MAX_SAMPLES=None` to train on the entire dataset.***

In [15]:
# optionally run with subset (for timing)
MAX_SAMPLES = 2000
if MAX_SAMPLES is not None:
  train_dataset = tokenized_dataset["train"].select(range(MAX_SAMPLES))
  eval_dataset = tokenized_dataset["validation"].select(range(MAX_SAMPLES))
else:
  train_dataset = tokenized_dataset["train"]
  eval_dataset = tokenized_dataset["validation"]

# setup trainer arguments
training_args = TrainingArguments(
    output_dir="./training_output",
    do_train=True,
    do_eval=True,
    resume_from_checkpoint=False,
    logging_strategy="epoch",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    fp16=False)

# initialize trainer
trainer = Trainer(
    model=model,
    model_state_path=model_path,
    recipe=recipe_path,
    teacher=teacher,
    metadata_args=["per_device_train_batch_size","per_device_eval_batch_size","fp16"],
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics)

2024-01-23 21:57:33 sparseml.core.logger INFO     Logging all SparseML modifier-level logs to sparse_logs/23-01-2024_21.57.33.log
2024-01-23 21:57:33 sparseml.transformers.sparsification.trainer INFO     Loaded SparseML recipe variable into manager for recipe: ./transfer_recipe/recipe.md, recipe_variables: None and metadata {'per_device_train_batch_size': 32, 'per_device_eval_batch_size': 32, 'fp16': False}


In [16]:
# step 6: run training
%rm -rf training_output
train_result = trainer.train(resume_from_checkpoint=False)
trainer.save_model()
trainer.save_state()
trainer.save_optimizer_and_scheduler(training_args.output_dir)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2024-01-23 21:57:41 sparseml.transformers.sparsification.trainer INFO     Applied structure from SparseML recipe argument to model at epoch 0.0
2024-01-23 21:57:41 sparseml.pytorch.sparsification.distillation.modifier_distillation_base INFO     distillation modifier using distillation_teacher object
2024-01-23 21:57:41 sparseml.transformers.sparsification.trainer INFO     Modified the optimizer from the recipe for training with total_batch_size: 32 and steps_per_epoch: 63


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Combined Score |
|---|---|---|---|---|---|
| 1 | 1.5954 | 1.843563 | 0.5875 | 0.606205 | 0.596853 |
| 2 | 1.1266 | 1.042887 | 0.743 | 0.672611 | 0.707806 |
| 3 | 0.7896 | 1.202728 | 0.7315 | 0.679403 | 0.705451 |
| 4 | 0.5213 | 1.138175 | 0.7555 | 0.678501 | 0.717 |
| 5 | 0.3678 | 1.382125 | 0.7275 | 0.686601 | 0.707051 |
| 6 | 0.214 | 1.290257 | 0.759 | 0.69455 | 0.726775 |
| 7 | 0.1722 | 1.242969 | 0.763 | 0.693005 | 0.728003 |
| 8 | 0.1529 | 1.267217 | 0.7625 | 0.682274 | 0.722387 |
| 9 | 0.1808 | 1.693169 | 0.7155 | 0.682655 | 0.699077 |
| 10 | 0.1486 | 1.318492 | 0.7455 | 0.694294 | 0.719897 |


2024-01-23 22:07:27 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-63/recipe.yaml
2024-01-23 22:17:00 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-126/recipe.yaml
2024-01-23 22:25:46 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-189/recipe.yaml
2024-01-23 22:34:16 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-252/recipe.yaml
2024-01-23 22:42:57 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-315/recipe.yaml
2024-01-23 22:51:31 sparseml.transformers.sparsification.trainer INFO     Saved SparseML recipe with model state to ./training_output/checkpoint-378/recipe.yaml
2024-01-23 23:00:08 sparseml.transf
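
The checkpoint numbers in the log (63, 126, 189, ...) follow from the subset size and batch size used in this run. A minimal sketch of the arithmetic, assuming a single training process, no gradient accumulation, and a final partial batch that is kept (the function names are made up for this example):

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps in one pass over the data, counting the last partial batch."""
    return math.ceil(num_samples / batch_size)


def checkpoint_steps(num_samples: int, batch_size: int, num_epochs: int) -> list:
    """Global step at the end of each epoch, where save_strategy='epoch' saves."""
    per_epoch = steps_per_epoch(num_samples, batch_size)
    return [per_epoch * epoch for epoch in range(1, num_epochs + 1)]


# MAX_SAMPLES = 2000 and per_device_train_batch_size = 32, as in the cells above.
print(steps_per_epoch(2000, 32))
print(checkpoint_steps(2000, 32, 6))
```

Setting `MAX_SAMPLES=None` would change the first argument to the full training set size (363846 rows), so each epoch would take far more steps.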

## **Step 7: Export to ONNX**

Run the following to export the model to ONNX. The script creates a `deployment` folder containing the ONNX file and the necessary configuration files (e.g. `tokenizer.json`) for deployment with DeepSparse.

In [17]:
!sparseml.transformers.export_onnx \
  --model_path training_output \
  --task text_classification

2024-01-24 00:08:01 sparseml.transformers.export INFO     Attempting onnx export for model at /opt/app-root/src/neural-magic/sparse_transfer/training_output for task text-classification
2024-01-24 00:08:01 sparseml.transformers.export INFO     Using default sequence length of 512 (inferred from HF transformers config) 
Some weights of the model checkpoint at /opt/app-root/src/neural-magic/sparse_transfer/training_output were not used when initializing BertForSequenceClassification: ['bert.encoder.layer.1.attention.self.attention_scores_matmul.output_quant_stubs.0.activation_post_process.activation_post_process.min_val', 'bert.encoder.layer.7.attention.self.context_layer_matmul.output_quant_stubs.0.activation_post_process.activation_post_process.min_val', 'bert.encoder.layer.7.attention.self.attention_scores_matmul.input_quant_stubs.0.activation_post_process.activation_post_process.eps', 'bert.encoder.layer.5.attention.self.key.quant.activation_post_process.activation_post_process.max_v
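
Before moving on to deployment, it can help to confirm that the export produced the files you expect. The sketch below is a hypothetical helper, not part of SparseML. Apart from `tokenizer.json`, which is mentioned above, the default file names (`model.onnx`, `config.json`) are assumptions, so adjust them to whatever your export actually wrote.

```python
import tempfile
from pathlib import Path


def missing_deployment_files(
    deployment_dir,
    expected=("model.onnx", "config.json", "tokenizer.json"),  # assumed names
):
    """Return the expected file names that are not present in deployment_dir."""
    root = Path(deployment_dir)
    return [name for name in expected if not (root / name).is_file()]


# Self-contained demo on a temporary folder that is missing config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.onnx").touch()
    (Path(tmp) / "tokenizer.json").touch()
    demo_missing = missing_deployment_files(tmp)
print(demo_missing)
```

In the notebook you would call `missing_deployment_files("./deployment")` and expect an empty list if the assumed file names match your export.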

## **Deploy with DeepSparse**

In [18]:
%pip install deepsparse

Collecting deepsparse
  Downloading deepsparse-1.6.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (46.9 MB)
Installing collected packages: deepsparse
Successfully installed deepsparse-1.6.1
Note: you may need to restart the kernel to use updated packages.


In [1]:
from deepsparse import Pipeline

pipeline = Pipeline.create("text_classification", model_path="./deployment")

DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.1 COMMUNITY | (eff4f95d) (release) (optimized) (system=avx512_vnni, binary=avx512)


In [2]:
prediction = pipeline(
    sequences=[
        [
            "What is the plural of hypothesis?",
            "What is the plural of thesis?"
        ]
    ]
)
print(prediction) # label 0 is not a duplicate

labels=['LABEL_0'] scores=[0.9992422461509705]


In [3]:
prediction = pipeline(
    sequences=[
        [
          "Why don't people simply 'Google' instead of asking questions on Quora?",
          "Why do people ask Quora questions instead of just searching google?"
        ]
    ]
)
print(prediction) # label 1 is a duplicate

labels=['LABEL_1'] scores=[0.9939254522323608]
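
The pipeline returns generic names such as `LABEL_0` and `LABEL_1`. A small post-processing helper can turn them into readable strings. This is a sketch: the mapping follows the comments in the cells above (0 = not duplicate, 1 = duplicate), the function name is made up, and the inputs here are plain lists copied from the printed output rather than a live pipeline result.

```python
# Mapping taken from the cell comments above: label 0 = not duplicate, label 1 = duplicate.
LABEL_NAMES = {"LABEL_0": "not duplicate", "LABEL_1": "duplicate"}


def describe(labels, scores):
    """Pair each generic label with a readable name and a percentage score."""
    return [
        f"{LABEL_NAMES.get(label, label)} ({score:.1%})"
        for label, score in zip(labels, scores)
    ]


# Values copied from the two printed predictions above.
print(describe(["LABEL_0"], [0.9992422461509705]))
print(describe(["LABEL_1"], [0.9939254522323608]))
```

With a live pipeline, the same function could presumably be fed the `labels` and `scores` fields shown in the printed output, although the exact attribute access is an assumption based on that printout.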
