# **Text Classification: Sparse Transfer Learning with the Python API**

In this example, you will fine-tune a 90% pruned BERT model onto the SICK dataset (a multi-sequence classification problem) using SparseML's Hugging Face Integration.

### **Sparse Transfer Learning Overview**

Sparse Transfer Learning is very similar to the typical transfer learning process used to train NLP models, where we fine-tune a pretrained checkpoint onto a smaller downstream dataset. With Sparse Transfer Learning, however, we start the training process from a pre-sparsified checkpoint and maintain its sparsity while fine-tuning.
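The core idea of "maintaining sparsity" can be sketched in a few lines: record which weights are zero in the sparse checkpoint, take an ordinary gradient step, then re-apply that mask so pruned weights stay at zero. This is a minimal pure-Python illustration on a flat list of weights, using hypothetical helpers (`sparsity_mask`, `masked_sgd_step`); it is not SparseML's implementation, which handles this for you during training.

```python
from typing import List

def sparsity_mask(weights: List[float]) -> List[bool]:
    """True where a weight is non-zero (kept), False where it was pruned."""
    return [w != 0.0 for w in weights]

def masked_sgd_step(weights: List[float], grads: List[float],
                    mask: List[bool], lr: float) -> List[float]:
    """Plain SGD update followed by re-applying the pruning mask."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    # Pruned positions are forced back to zero so the sparsity pattern is preserved
    return [w if keep else 0.0 for w, keep in zip(updated, mask)]

weights = [0.0, 0.5, 0.0, -0.25]   # toy "pre-sparsified" weights (50% zeros)
grads = [0.1, 0.2, -0.3, 0.4]      # toy gradients, non-zero everywhere
mask = sparsity_mask(weights)
new_weights = masked_sgd_step(weights, grads, mask, lr=0.5)
print(new_weights)
```

Without the final masking line, every gradient step would turn the zeros into small non-zero values and the checkpoint would gradually become dense again.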

At the end, you will have a sparse model trained on your dataset, ready to be deployed with DeepSparse for GPU-class performance on CPUs!

### **Pre-Sparsified BERT**
SparseZoo, Neural Magic's open source repository of pre-sparsified models, contains a 90% pruned version of BERT, which has been sparsified on the upstream Wikipedia and BookCorpus datasets with the
masked language modeling objective. [Check out the model card](https://sparsezoo.neuralmagic.com/models/nlp%2Fmasked_language_modeling%2Fobert-base%2Fpytorch%2Fhuggingface%2Fwikipedia_bookcorpus%2Fpruned90-none). We will use this model as the starting point for the transfer learning process.


***Let's dive in!***

## **Installation**

Install SparseML via `pip`.



In [None]:
!pip install sparseml[transformers]

If you are running on Google Colab, restart the runtime after this step.

In [None]:
import sparseml
from sparsezoo import Model
from sparseml.transformers.utils import SparseAutoModel
from sparseml.transformers.sparsification import Trainer, TrainingArguments
import numpy as np
from transformers import (
    AutoModelForSequenceClassification,
    AutoConfig, 
    AutoTokenizer, 
    EvalPrediction, 
    default_data_collator
)
from datasets import load_dataset, load_metric

## **Step 1: Load a Dataset**

SparseML is integrated with Hugging Face, so we can use the Hugging Face `datasets` library to load datasets from the Hugging Face Hub or from local files.

[SICK Dataset Card](https://huggingface.co/datasets/sick)

In [None]:
# load_dataset from HF hub
dataset = load_dataset("sick")

# alternatively, write the splits to local CSV files and load them back
dataset["train"].to_csv("sick-train.csv")
dataset["validation"].to_csv("sick-validation.csv")
data_files = {
  "train": "sick-train.csv",
  "validation": "sick-validation.csv"
}
dataset_from_json = load_dataset("csv", data_files=data_files)
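If your data is stored as JSON rather than CSV, the same `load_dataset` pattern applies with the `"json"` builder, which reads JSON Lines files (one object per line). The sketch below uses only the standard library to write a tiny JSON Lines file with the column names this example expects (`sentence_A`, `sentence_B`, `label`); the file name and rows are hypothetical placeholders.

```python
import json
from pathlib import Path

# Hypothetical toy rows using the SICK column names from this example
rows = [
    {"sentence_A": "A man is playing a guitar",
     "sentence_B": "A person is playing an instrument", "label": 0},
    {"sentence_A": "A dog is running",
     "sentence_B": "No dog is running", "label": 2},
]

def write_jsonl(path: Path, records) -> None:
    """Write one JSON object per line (the JSON Lines layout)."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

path = Path("sick-toy-train.jsonl")
write_jsonl(path, rows)

# With Hugging Face `datasets` installed, this file could then be loaded with:
# load_dataset("json", data_files={"train": "sick-toy-train.jsonl"})
loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
print(loaded[0]["sentence_A"], loaded[0]["label"])
```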

In [None]:
!head -n 5 sick-train.csv

In [None]:
print(dataset_from_json["train"])

In [None]:
# configs
INPUT_COL_1 = "sentence_A"
INPUT_COL_2 = "sentence_B"
LABEL_COL = "label"
NUM_LABELS = len(dataset_from_json["train"].unique(LABEL_COL))

## **Step 2: Set Up the Evaluation Metric**

SICK is a multi-class classification problem where we predict one of three class labels for each input pair (entailment, neutral, or contradiction). We will use the `accuracy` metric (the percentage of correct predictions) as the evaluation metric.

Since SparseML is integrated with Hugging Face, we can define a standard `compute_metrics` function for evaluation and pass it to the `Trainer` class below.

In [None]:
metric = load_metric("accuracy")

def compute_metrics(p: EvalPrediction):
  preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
  preds = np.argmax(preds, axis=1)
  result = metric.compute(predictions=preds, references=p.label_ids)
  if len(result) > 1:
      result["combined_score"] = np.mean(list(result.values())).item()
  return result
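To see what `compute_metrics` does with the model output, here is a dependency-free sketch of the same steps: take the argmax over each row of logits to get a predicted class, then report the fraction of predictions that match the reference labels. The logits and labels are made-up values for illustration.

```python
from typing import List, Sequence

def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit in a row (first one wins on ties, like np.argmax)."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy_from_logits(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose argmax matches the reference label."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

# Toy 3-class logits (entailment, neutral, contradiction) for four input pairs
logits = [
    [2.1, 0.3, -1.0],   # predicts 0
    [0.1, 1.5, 0.2],    # predicts 1
    [-0.5, 0.0, 3.2],   # predicts 2
    [1.0, 0.9, 0.8],    # predicts 0, but the label is 1
]
labels = [0, 1, 2, 1]
print({"accuracy": accuracy_from_logits(logits, labels)})  # prints {'accuracy': 0.75}
```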


## **Step 3: Download Files for Sparse Transfer Learning**

First, we need to select a sparse checkpoint to begin the training process. In this case, we will fine-tune a 90% pruned version of BERT onto the SICK dataset. This model is available in SparseZoo, identified by the following stub:
```
zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none
```

Next, we need to create a sparsification recipe for use in the training process. Recipes are YAML files that encode the sparsity-related algorithms and parameters to be applied by SparseML. For Sparse Transfer Learning, we need a recipe that instructs SparseML to maintain sparsity during the training process and to apply quantization over the final few epochs.

In SparseZoo, there is a transfer recipe that was used to fine-tune BERT onto the MNLI task (which is also a multi-sequence, multi-class classification problem). Since SICK is a similar problem to MNLI, we will use the MNLI recipe, which is identified by the following stub:

```
zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none
```
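SparseZoo stubs are slash-separated paths, so their parts can be read off directly. The helper below splits a stub into labeled fields; the field names (`domain`, `sub_domain`, `architecture`, `framework`, `repo`, `dataset`, `sparse_tag`) are an informal reading of the two stubs above, not an official SparseZoo API.

```python
from typing import Dict

# Informal field names for the seven path segments of a SparseZoo stub (assumption)
STUB_FIELDS = ("domain", "sub_domain", "architecture", "framework",
               "repo", "dataset", "sparse_tag")

def parse_stub(stub: str) -> Dict[str, str]:
    """Split a 'zoo:...' stub into named segments."""
    prefix = "zoo:"
    if not stub.startswith(prefix):
        raise ValueError(f"expected a stub starting with {prefix!r}: {stub}")
    parts = stub[len(prefix):].split("/")
    if len(parts) != len(STUB_FIELDS):
        raise ValueError(f"expected {len(STUB_FIELDS)} segments, got {len(parts)}")
    return dict(zip(STUB_FIELDS, parts))

upstream = parse_stub("zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none")
recipe = parse_stub("zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none")
# Same architecture; different task, dataset, and sparsity tag
print(upstream["sub_domain"], "->", recipe["sub_domain"])
print(upstream["sparse_tag"], "->", recipe["sparse_tag"])
```

Reading the stubs this way makes the pairing explicit: both share `obert-base`, while the recipe stub adds `_quant` to the sparsity tag, matching the quantization stage in the recipe.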

Use the `sparsezoo` Python client to download the model and recipe using their SparseZoo stubs.

In [None]:
# downloads 90% pruned upstream BERT trained on MLM objective
model_stub = "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none" 
model_path = Model(model_stub, download_path="./model").training.path 

# downloads transfer recipe for MNLI (pruned90_quant)
transfer_stub = "zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none"
recipe_path = Model(transfer_stub, download_path="./transfer_recipe").recipes.default.path

We can see that the upstream model (trained on Wikipedia and BookCorpus) and its configuration files have been downloaded to the local directory.

In [None]:
%ls ./model/training

#### Inspecting the Recipe

Here is the transfer learning recipe:

```yaml
version: 1.1.0

num_epochs: 13
init_lr: 8e-5
final_lr: 0

qat_start_epoch: 8.0
observer_epoch: 12.0
quantize_embeddings: 1

distill_hardness: &distill_hardness 1.0
distill_temperature: &distill_temperature 3.0

weight_decay: 0.0

# Modifiers:

training_modifiers:
  - !EpochRangeModifier
      end_epoch: eval(num_epochs)
      start_epoch: 0.0

  - !LearningRateFunctionModifier
      start_epoch: 0
      end_epoch: eval(num_epochs)
      lr_func: linear
      init_lr: eval(init_lr)
      final_lr: eval(final_lr)

quantization_modifiers:
  - !QuantizationModifier
      start_epoch: eval(qat_start_epoch)
      disable_quantization_observer_epoch: eval(observer_epoch)
      freeze_bn_stats_epoch: eval(observer_epoch)
      quantize_embeddings: eval(quantize_embeddings)
      quantize_linear_activations: 0
      exclude_module_types: ['LayerNorm', 'Tanh']
      submodules:
        - bert.embeddings
        - bert.encoder
        - bert.pooler
        - classifier

distillation_modifiers:
  - !DistillationModifier
     hardness: eval(distill_hardness)
     temperature: eval(distill_temperature)
     distill_output_keys: [logits]

constant_modifiers:
  - !ConstantPruningModifier
      start_epoch: 0.0
      params: __ALL_PRUNABLE__

regularization_modifiers:
  - !SetWeightDecayModifier
      start_epoch: 0.0
      weight_decay: eval(weight_decay)
```


The `Modifiers` in the transfer learning recipe are the important items that encode how SparseML should modify the training process for Sparse Transfer Learning:
- `ConstantPruningModifier` tells SparseML to keep pruned weights fixed at 0 over all epochs, maintaining the sparsity structure of the network
- `QuantizationModifier` tells SparseML to quantize the weights with quantization-aware training over the last 5 epochs (epochs 8 to 13)
- `DistillationModifier` tells SparseML how to apply distillation during the training process, targeting the logits
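Quantization-aware training works by inserting "fake quantization" into the forward pass: values are rounded to an 8-bit grid and immediately mapped back to floats, so the network learns weights that tolerate the rounding error. Below is a minimal sketch of symmetric per-tensor int8 quantize/dequantize in pure Python; the `fake_quantize` helper is hypothetical, and SparseML's `QuantizationModifier` may use different observers and quantization schemes than this simplified version.

```python
from typing import List

def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Symmetric per-tensor quantize -> dequantize (the 'fake quant' used in QAT)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                    # all zeros: nothing to quantize
    scale = max_abs / qmax                     # float step between adjacent int levels
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]      # back to float, now on the int grid

weights = [0.0, 0.5, -1.27, 0.003]
fq = fake_quantize(weights)
errors = [abs(a - b) for a, b in zip(weights, fq)]
print(fq)
print(max(errors))  # rounding error is at most half a quantization step
```

In this symmetric scheme, a weight of exactly zero maps to integer 0 and back to 0.0, so the pruned weights remain zero after quantization.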

Below, SparseML's `Trainer` parses these modifiers and updates the training process to implement the algorithms they specify.

## **Step 4: Set Up Hugging Face Model Objects**

Next, we will set up the Hugging Face `tokenizer`, `config`, and `model`.

These are all native Hugging Face objects, so check out the Hugging Face docs for more details on `AutoModel`, `AutoConfig`, and `AutoTokenizer` as needed. 

We instantiate these classes by passing the local path to the directory containing the `pytorch_model.bin`, `tokenizer.json`, and `config.json` files from the SparseZoo download.

In [None]:
# initialize config, tokenizer
config = AutoConfig.from_pretrained(model_path, num_labels=NUM_LABELS)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# initialize model using familiar HF AutoModel
model_kwargs = {"config": config}
model_kwargs["state_dict"], s_delayed = SparseAutoModel._loadable_state_dict(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, **model_kwargs)
SparseAutoModel.log_model_load(model, model_path, "student", s_delayed) # prints metrics on sparsity profile

# FYI: SparseAutoModel also provides a factory method that performs the steps above
# and returns a (model, teacher) pair for use with distillation
# model, teacher = SparseAutoModel.text_classification_from_pretrained_distil(
#     model_name_or_path=model_path,
#     model_kwargs={"config": config},
# )
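`log_model_load` reports the sparsity profile of the loaded weights. The underlying measurement is the fraction of exactly-zero entries per parameter tensor; the sketch below computes it over a hypothetical `state_dict`-like mapping of names to flat lists of floats, so it runs without PyTorch. With a real model you would iterate over `model.named_parameters()` instead.

```python
from typing import Dict, List

def sparsity_report(state: Dict[str, List[float]]) -> Dict[str, float]:
    """Fraction of exactly-zero values in each tensor, plus an overall figure."""
    report = {}
    total_zeros = total_values = 0
    for name, values in state.items():
        zeros = sum(1 for v in values if v == 0.0)
        report[name] = zeros / len(values)
        total_zeros += zeros
        total_values += len(values)
    report["__overall__"] = total_zeros / total_values
    return report

# Hypothetical tensors: one 90% sparse, one dense
toy_state = {
    "encoder.layer.0.attention.self.query.weight": [0.0] * 9 + [0.42],
    "encoder.layer.0.attention.output.LayerNorm.weight": [1.0] * 10,
}
for name, frac in sparsity_report(toy_state).items():
    print(f"{name}: {frac:.0%}")
```

The overall figure is weighted by tensor size, which is why a "90% pruned" model can report a lower whole-model sparsity when some tensors are left dense.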

## **Step 5: Tokenize Dataset**

Run the tokenizer on the dataset. This is standard Hugging Face functionality.

In [None]:
MAX_LEN = 128
def preprocess_fn(examples):
  args = None
  if INPUT_COL_2 is None:
    args = (examples[INPUT_COL_1], )
  else:
    args = (examples[INPUT_COL_1], examples[INPUT_COL_2])
  result = tokenizer(*args,
                     padding="max_length",
                     max_length=min(tokenizer.model_max_length, MAX_LEN),
                     truncation=True)
  return result

# tokenize the dataset
tokenized_dataset = dataset_from_json.map(
    preprocess_fn,
    batched=True,
    desc="Running tokenizer on dataset"
)
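With `padding="max_length"` and `truncation=True`, every example comes out with exactly `max_length` token ids plus an attention mask marking which positions are real tokens. The toy sketch below reproduces that shape contract for a sentence pair using a hypothetical integer vocabulary and BERT-style `[CLS]`/`[SEP]` markers, truncating the longer sequence first; the real tokenizer uses WordPiece and its own truncation logic, so the exact ids will differ.

```python
from typing import Dict, List

PAD, CLS, SEP = 0, 101, 102   # ids chosen to resemble BERT's; the toy vocabulary is hypothetical

def toy_encode_pair(a: List[int], b: List[int], max_length: int) -> Dict[str, List[int]]:
    """Encode [CLS] a [SEP] b [SEP], truncating longest-first, then pad to max_length."""
    a, b = list(a), list(b)
    budget = max_length - 3                      # room left after the three special tokens
    while len(a) + len(b) > budget:              # drop a token from the longer sequence
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    ids = [CLS] + a + [SEP] + b + [SEP]
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return {"input_ids": ids + [PAD] * padding,
            "attention_mask": mask + [0] * padding}

short_pair = toy_encode_pair([7, 8], [9], max_length=8)
long_pair = toy_encode_pair(list(range(1000, 1010)), [5, 6], max_length=8)
print(short_pair)
print(long_pair)
```

Fixed-length outputs are what allow `default_data_collator` to stack examples into batches without any further padding.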

## **Step 6: Run Training**

SparseML has a custom `Trainer` class that inherits from the [Hugging Face `Trainer` Class](https://huggingface.co/docs/transformers/main_classes/trainer). As such, the SparseML `Trainer` has all of the existing functionality of the HF `Trainer`. In addition, we can supply a `recipe` and (optionally) a `teacher`.


As we saw above, the `recipe` encodes the sparsity-related algorithms and hyperparameters of the training process in a YAML file. The SparseML `Trainer` parses the `recipe` and adjusts the training workflow to apply the algorithms it contains. We use the `recipe_args` argument to override some of the recipe's variables: training runs for 15 epochs instead of the 13 used for MNLI, and the quantization start and observer epochs shift by 2 so that QAT still covers the final 5 epochs.
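The effect of the `recipe_args` override can be checked with a little arithmetic. The sketch below mimics it with a plain dictionary update, which is a simplification and not SparseML's recipe parser, and compares the length of the QAT phase and of the final phase with observers disabled before and after the override.

```python
import json

# Top-level variables from the MNLI transfer recipe shown above
recipe_vars = {"num_epochs": 13, "qat_start_epoch": 8.0, "observer_epoch": 12.0}

# The same JSON string passed to the Trainer as recipe_args
recipe_args = '{"num_epochs": 15, "qat_start_epoch": 10.0, "observer_epoch": 14.0}'

def resolve(base: dict, overrides_json: str) -> dict:
    """Simplified override: values from the JSON string win over the recipe defaults."""
    return {**base, **json.loads(overrides_json)}

def qat_schedule(v: dict) -> dict:
    """Length of the QAT phase and of the final phase with observers disabled."""
    return {
        "qat_epochs": v["num_epochs"] - v["qat_start_epoch"],
        "observer_disabled_epochs": v["num_epochs"] - v["observer_epoch"],
    }

print(qat_schedule(recipe_vars))
print(qat_schedule(resolve(recipe_vars, recipe_args)))
```

Both schedules give 5 QAT epochs with observers disabled for the last one; the override only adds 2 epochs of sparse fine-tuning before quantization begins.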

The `teacher` is an optional argument that instructs SparseML to apply model distillation to support the training process. We are not using a teacher here, so we set `teacher="disable"` to turn off distillation.
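If you do supply a teacher, the recipe's `DistillationModifier` adds a distillation term to the loss controlled by `hardness` and `temperature`. The sketch below shows a common Hinton-style formulation: KL divergence between temperature-softened teacher and student distributions, scaled by T², blended with the ordinary cross-entropy on the true label. The exact blend (`hardness * distill + (1 - hardness) * task`) is an assumption about how SparseML weights the terms, and the logits are toy values.

```python
import math
from typing import List

def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student: List[float], teacher: List[float], label: int,
                      hardness: float, temperature: float) -> float:
    """Assumed blend: hardness * soft (teacher) loss + (1 - hardness) * hard (label) loss."""
    soft = kl_divergence(softmax(teacher, temperature),
                         softmax(student, temperature)) * temperature ** 2
    hard = -math.log(softmax(student)[label])        # cross-entropy against the true label
    return hardness * soft + (1.0 - hardness) * hard

student = [1.0, 0.2, -0.5]
teacher = [2.0, 0.1, -1.0]
# Recipe values: distill_hardness = 1.0, distill_temperature = 3.0
print(distillation_loss(student, teacher, label=0, hardness=1.0, temperature=3.0))
```

Under this assumed blend, `distill_hardness: 1.0` would make the loss depend only on matching the teacher, with no direct weight on the ground-truth labels.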

In [None]:
# setup trainer arguments
training_args = TrainingArguments(
    output_dir="./training_output",
    do_train=True,
    do_eval=True,
    resume_from_checkpoint=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    save_total_limit=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4, 
    fp16=False)

# initialize trainer
trainer = Trainer(
    model=model,
    model_state_path=model_path,
    recipe=recipe_path,
    recipe_args='{"num_epochs": 15, "qat_start_epoch": 10.0, "observer_epoch": 14.0}',
    teacher="disable",
    metadata_args=["per_device_train_batch_size","per_device_eval_batch_size","fp16"],
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics)
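With `per_device_train_batch_size=8` and `gradient_accumulation_steps=4`, each optimizer step sees 32 examples per device. The sketch below estimates optimizer steps per epoch and the step at which QAT would begin (epoch 10 under the `recipe_args` override), assuming a single device and no dropped last batch. `NUM_TRAIN` is a placeholder, not SICK's actual split size, and the helper only roughly follows how the Hugging Face Trainer counts update steps.

```python
import math

def steps_per_epoch(num_examples: int, per_device_batch: int,
                    grad_accum: int, num_devices: int = 1) -> int:
    """Optimizer updates per epoch: batches per epoch divided by accumulation steps."""
    batches = math.ceil(num_examples / (per_device_batch * num_devices))
    return max(batches // grad_accum, 1)

NUM_TRAIN = 1000             # placeholder size, not the real SICK train split
steps = steps_per_epoch(NUM_TRAIN, per_device_batch=8, grad_accum=4)
effective_batch = 8 * 4      # examples per optimizer step on one device
total_steps = steps * 15     # num_epochs overridden to 15 in recipe_args
qat_start_step = steps * 10  # qat_start_epoch overridden to 10.0
print(effective_batch, steps, total_steps, qat_start_step)  # prints 32 31 465 310
```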

In [None]:
# step 6: run training
train_result = trainer.train()
trainer.save_model()
trainer.save_state()
trainer.save_optimizer_and_scheduler(training_args.output_dir)

## **Step 7: Export to ONNX**

Run the following to export the model to ONNX. The command creates a `deployment` folder containing the ONNX file and the configuration files (e.g., `tokenizer.json`) needed for deployment with DeepSparse.

In [None]:
!sparseml.transformers.export_onnx \
  --model_path training_output \
  --task text_classification
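Before pointing DeepSparse at the export, it can help to confirm that the expected files are present. The check below uses only the standard library; the file names it looks for (`model.onnx`, `config.json`, `tokenizer.json`) are assumptions based on the description above, so adjust the list to match what the export actually writes.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed contents of the deployment folder; adjust to what your export produces
EXPECTED_FILES = ("model.onnx", "config.json", "tokenizer.json")

def missing_files(folder: str, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]

missing = missing_files("./deployment")
print("deployment folder looks complete" if not missing else f"missing: {missing}")
```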

## **Optional: Deploy with DeepSparse**

In [None]:
%pip install deepsparse

In [None]:
from deepsparse import Pipeline

pipeline = Pipeline.create("text_classification", model_path="./deployment")

In [None]:
prediction = pipeline(
    sequences=[
        [
            "A brown dog is attacking another animal in front of the tall man in pants",
            "A brown dog is attacking another animal in front of the man in pants"
        ]
    ]
)
print(prediction) # label 0 is an entailment

In [None]:
prediction = pipeline(
    sequences=[
        [
          "A person is riding the bicycle on one wheel",
          "There is no man in a black jacket doing tricks on a motorbike"
        ]
    ]
)
print(prediction) # label 1 is neutral

In [None]:
prediction = pipeline(
    sequences=[
        [
          "There is no man in a black jacket doing tricks on a motorbike",
          "A person in a black jacket is doing tricks on a motorbike"
        ]
    ]
)
print(prediction) # label 2 is a contradiction

In [None]:
prediction = pipeline(
    sequences=[
        ["A brown dog is attacking another animal in front of the tall man in pants","A brown dog is attacking another animal in front of the man in pants"],
        ["A person is riding the bicycle on one wheel","There is no man in a black jacket doing tricks on a motorbike"],
        ["There is no man in a black jacket doing tricks on a motorbike","A person in a black jacket is doing tricks on a motorbike"],
    ]
)
print(prediction)
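Because the config was created with only `num_labels`, the predicted labels are likely generic strings (for example `LABEL_0`) rather than class names. A small lookup table based on the label comments above turns them into readable names. The `LABEL_<id>` format is an assumption about how labels are rendered when no `id2label` mapping is set; alternatively, you could pass `id2label`/`label2id` to `AutoConfig.from_pretrained` before training so the names are stored with the model.

```python
from typing import Dict, List

# Class names taken from the comments in the examples above (0, 1, 2)
SICK_ID2LABEL: Dict[int, str] = {0: "entailment", 1: "neutral", 2: "contradiction"}

def readable_labels(raw_labels: List[str]) -> List[str]:
    """Map generic 'LABEL_<id>' strings (assumed format) to SICK class names."""
    names = []
    for raw in raw_labels:
        prefix, _, idx = raw.rpartition("_")
        if prefix == "LABEL" and idx.isdigit() and int(idx) in SICK_ID2LABEL:
            names.append(SICK_ID2LABEL[int(idx)])
        else:
            names.append(raw)   # already readable or an unknown format: leave unchanged
    return names

print(readable_labels(["LABEL_0", "LABEL_1", "LABEL_2"]))
```

Assuming the pipeline output exposes a `labels` list, you could apply this as `readable_labels(prediction.labels)`.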