# **Sentiment Analysis: Sparse Transfer Learning with the Python API**

In this example, you will fine-tune a 90% pruned BERT model on the SST2 dataset using SparseML's Hugging Face integration.

### **Sparse Transfer Learning Overview**

Sparse Transfer Learning is very similar to the typical fine-tuning you are used to when training models. The difference is that we start the training process from a pre-sparsified checkpoint and maintain the sparsity structure while fine-tuning occurs. At the end, you will have a sparse model trained on your dataset, ready to be deployed with DeepSparse for GPU-class performance on CPUs!
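
To make "maintain the sparsity structure" concrete, here is a minimal sketch in plain Python. It is not SparseML's implementation: it only illustrates the idea that a mask recorded from the starting checkpoint is re-applied after every update, so pruned weights stay at zero. The weights, gradients, learning rate, and function names are made up for illustration.

```python
def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(w == 0.0 for w in weights) / len(weights)


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One SGD step that keeps pruned positions (mask == 0) at zero."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


# Hypothetical pre-sparsified weights: 8 of 10 are already pruned.
weights = [0.0, 0.5, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0]

# The mask is derived once from the checkpoint and never changes.
mask = [0.0 if w == 0.0 else 1.0 for w in weights]

# Dense gradients would revive the pruned weights if no mask were applied.
grads = [0.2, -0.1, 0.4, -0.3, 0.1, 0.2, -0.2, 0.1, 0.3, -0.4]

before = sparsity(weights)
for _ in range(3):
    weights = masked_sgd_step(weights, grads, mask)
after = sparsity(weights)

print(before, after)
```

Only the two surviving weights move during these updates, so the sparsity level is the same before and after training.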

### **Pre-Sparsified BERT**
SparseZoo, Neural Magic's open-source repository of pre-sparsified models, contains a 90% pruned version of BERT, which has been sparsified on the upstream Wikipedia and BookCorpus datasets with the masked language modeling objective. [Check out the model card](https://sparsezoo.neuralmagic.com/models/nlp%2Fmasked_language_modeling%2Fobert-base%2Fpytorch%2Fhuggingface%2Fwikipedia_bookcorpus%2Fpruned90-none). We will use this model as the starting point for the transfer learning process.


***Let's dive in!***

## **Installation**

Install SparseML via `pip`.



In [None]:
%pip uninstall torch -y
%pip install sparseml[torch]

If you are running on Google Colab, restart the runtime after this step.

In [None]:
import sparseml
from sparsezoo import Model
from sparseml.transformers.utils import SparseAutoModel
from sparseml.transformers.sparsification import Trainer, TrainingArguments
import numpy as np
from transformers import (
    AutoModelForSequenceClassification,
    AutoConfig, 
    AutoTokenizer, 
    EvalPrediction, 
    default_data_collator
)
from datasets import load_dataset, load_metric

## **Step 1: Load a Dataset**

SparseML is integrated with Hugging Face, so we can use the `datasets` library to load datasets from the Hugging Face Hub or from local files.

[SST2 Dataset Card](https://huggingface.co/datasets/glue/viewer/sst2/test)

In [45]:
# load dataset natively
dataset = load_dataset("glue", "sst2")




In [46]:
print(dataset)
print(dataset["train"][0])

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})
{'sentence': 'hide new secretions from the parental units ', 'label': 0, 'idx': 0}


In [None]:
# alternatively, save to csv and reload, as an example of loading from local files
dataset["train"].to_csv("sst2-train.csv")
dataset["validation"].to_csv("sst2-validation.csv")

data_files = {
  "train": "sst2-train.csv",
  "validation": "sst2-validation.csv"
}
dataset = load_dataset("csv", data_files=data_files)

In [44]:
print(dataset)
print(dataset["train"][0])

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['Unnamed: 0', 'sentence', 'label', 'idx'],
        num_rows: 872
    })
})
{'Unnamed: 0': 0, 'sentence': 'hide new secretions from the parental units ', 'label': 0, 'idx': 0}


In [4]:
!head sst2-train.csv --lines=5

,sentence,label,idx
0,hide new secretions from the parental units ,0,0
1,"contains no wit , only labored gags ",0,1
2,that loves its characters and communicates something rather beautiful about human nature ,1,2
3,remains utterly satisfied to remain the same throughout ,0,3


In [5]:
!head sst2-validation.csv --lines=5

,sentence,label,idx
0,it 's a charming and often affecting journey . ,1,0
1,unflinchingly bleak and desperate ,0,1
2,allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . ,1,2
3,"the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . ",1,3
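
The extra `Unnamed: 0` feature in the reloaded dataset corresponds to the first CSV column, whose header is empty (`,sentence,label,idx`) and which holds the row index. The sketch below uses only the standard `csv` module, not the `datasets` library, to show where that column comes from and how it can be dropped. The helper name `read_without_index` is hypothetical, and the sample rows are copied from the output above.

```python
import csv
import io

# First rows of sst2-train.csv as printed above; note the empty first header.
raw = (
    ",sentence,label,idx\n"
    "0,hide new secretions from the parental units ,0,0\n"
    '1,"contains no wit , only labored gags ",0,1\n'
)


def read_without_index(text):
    """Parse CSV text and drop any column whose header is empty."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name != ""]
    columns = [header[i] for i in keep]
    rows = [{header[i]: row[i] for i in keep} for row in reader]
    return columns, rows


columns, rows = read_without_index(raw)
print(columns)
print(rows[1])
```

The extra column is harmless for this tutorial because the tokenization step only reads the `sentence` and `label` columns. With the `datasets` library itself, `remove_columns` is one way to drop it after loading.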


## **Step 2: Setup Evaluation Metric**

Sentiment analysis on SST2 is a binary classification problem, so we use accuracy as the evaluation metric. We define a `compute_metrics` function in the usual Hugging Face style, which will be passed to the `Trainer` class below.

In [6]:
metric = load_metric("glue", "sst2")

def compute_metrics(p: EvalPrediction):
  preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
  preds = np.argmax(preds, axis=1)
  result = metric.compute(predictions=preds, references=p.label_ids)
  if len(result) > 1:
      result["combined_score"] = np.mean(list(result.values())).item()
  return result
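
The `compute_metrics` function above converts logits into predictions with an argmax and then scores them against the labels. As a rough illustration of that arithmetic, here is the same calculation in plain Python on made-up logits. The helper names are hypothetical, and this is a sketch rather than the code behind the GLUE metric.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Share of examples whose highest-scoring class matches the label."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for four sentences; columns are (negative, positive).
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.2], [1.2, 0.9]]
labels = [0, 1, 0, 0]

result = accuracy_from_logits(logits, labels)
print(result)  # {'accuracy': 0.75}
```

The third example is misclassified (predicted positive, labeled negative), so three of the four predictions are correct.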


## **Step 3: Download Files for Sparse Transfer Learning**

First, we need to select a sparse checkpoint to begin the training process. In this case, we will fine-tune a 90% pruned version of BERT on the SST2 dataset. This model is available in SparseZoo, identified by the following stub:
```
zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none
```

Next, we need to create or select a sparsification recipe to use in the training process. Recipes are YAML files that encode the sparsity-related algorithms and parameters to be applied by SparseML. For Sparse Transfer Learning, we use a recipe that instructs SparseML to maintain sparsity during the training process and to apply quantization over the final few epochs. In the case of SST2, there is a transfer learning recipe available in SparseZoo, identified by the following stub:
```
zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
```
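
As a rough mental model of what such a recipe schedules, the sketch below lists which phases are active in each epoch: the sparsity mask is held for the whole run, and quantization is switched on only near the end. The epoch counts, phase names, and function are hypothetical and are not read from the actual recipe file.

```python
def active_phases(epoch, quant_start_epoch):
    """Return the training phases active at a given epoch (illustrative only)."""
    phases = ["maintain_sparsity"]  # the pruning mask is kept in every epoch
    if epoch >= quant_start_epoch:
        phases.append("quantization")  # only the final few epochs
    return phases


NUM_EPOCHS = 10        # hypothetical total number of epochs
QUANT_START_EPOCH = 8  # hypothetical epoch where quantization begins

schedule = {e: active_phases(e, QUANT_START_EPOCH) for e in range(NUM_EPOCHS)}
for epoch, phases in schedule.items():
    print(epoch, phases)
```

The real schedule is defined by the recipe downloaded in the cells below, so treat these numbers purely as placeholders.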

Finally, SparseML has the optional ability to apply model distillation from a teacher model during the transfer learning process to boost accuracy. In this case, we will use a dense version of BERT trained on the SST2 dataset which is hosted in SparseZoo. This model is identified by the following stub:

```
zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/base-none
```
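
The three stubs above share the same path-like layout, which makes them easy to compare. The sketch below splits a stub into its segments using plain string handling. The field names are illustrative labels chosen for this example, not official SparseZoo terminology, and `parse_stub` is a hypothetical helper rather than part of the `sparsezoo` client.

```python
from typing import NamedTuple


class ZooStub(NamedTuple):
    # Illustrative field names; assumptions made for this example.
    domain: str
    task: str
    architecture: str
    framework: str
    source: str
    dataset: str
    variant: str


def parse_stub(stub: str) -> ZooStub:
    """Split a 'zoo:' stub with seven '/'-separated segments into fields."""
    prefix = "zoo:"
    if not stub.startswith(prefix):
        raise ValueError(f"expected a stub starting with {prefix!r}: {stub}")
    parts = stub[len(prefix):].split("/")
    if len(parts) != len(ZooStub._fields):
        raise ValueError(f"expected {len(ZooStub._fields)} segments: {stub}")
    return ZooStub(*parts)


student = parse_stub(
    "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/"
    "wikipedia_bookcorpus/pruned90-none"
)
teacher = parse_stub(
    "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/base-none"
)

print(student.task, student.dataset, student.variant)
print(teacher.task, teacher.dataset, teacher.variant)
```

Read this way, the student checkpoint is the pruned upstream model, while the teacher and the recipe stubs both point at SST2 sentiment analysis variants of the same architecture.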

Use the `sparsezoo` Python client to download the models and recipe using their SparseZoo stubs.

In [7]:
# downloads pruned-BERT model
model_stub = "zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none" 
download_dir = "./model"
zoo_model = Model(model_stub, download_path=download_dir)
model_path = zoo_model.training.path 

print(model_path)


./model/training


In [9]:
%ls ./model/training

all_results.json   special_tokens_map.json  trainer_state.json  vocab.txt
config.json        tokenizer_config.json    training_args.bin
pytorch_model.bin  tokenizer.json           train_results.json


In [8]:
# downloads transfer learning recipe
transfer_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
download_dir = "./transfer_recipe"
zoo_model = Model(transfer_stub, download_path=download_dir)
recipe_path = zoo_model.recipes.default.path

print(recipe_path)


./transfer_recipe/recipe/recipe_original.md


In [10]:
%ls ./transfer_recipe/recipe/recipe_original.md

./transfer_recipe/recipe/recipe_original.md


In [11]:
# downloads teacher
teacher_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/base-none"
download_dir = "./teacher"
zoo_model = Model(teacher_stub, download_path=download_dir)
teacher_path = zoo_model.training.path 


In [12]:
%ls ./teacher/training

all_results.json   special_tokens_map.json  training_args.bin
config.json        tokenizer_config.json    train_results.json
eval_results.json  tokenizer.json           vocab.txt
pytorch_model.bin  trainer_state.json


## **Step 4: Setup Hugging Face Model Objects**

Next, we will set up the Hugging Face tokenizer, config, and model. These are all native Hugging Face objects, so check out the Hugging Face docs for more details on `AutoModel`, `AutoConfig`, and `AutoTokenizer` as needed. We instantiate these classes by passing the local path to the directory containing the `pytorch_model.bin`, `tokenizer.json`, and `config.json` files from the SparseZoo download.

In [None]:
# shared tokenizer between teacher and student
# see examples for how to use models with different tokenizers
tokenizer = AutoTokenizer.from_pretrained(model_path)

# setup configs
model_config = AutoConfig.from_pretrained(model_path, num_labels=2)
teacher_config = AutoConfig.from_pretrained(teacher_path, num_labels=2)

model_kwargs = {"config": model_config}
model_kwargs["state_dict"], s_delayed = SparseAutoModel._loadable_state_dict(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, **model_kwargs,)

teacher_kwargs = {'config':teacher_config}
teacher_kwargs["state_dict"], t_delayed = SparseAutoModel._loadable_state_dict(teacher_path)
teacher = AutoModelForSequenceClassification.from_pretrained(teacher_path, **teacher_kwargs,)

# optional - prints metrics about sparsity profiles of the models
SparseAutoModel.log_model_load(model, model_path, "student", s_delayed)
SparseAutoModel.log_model_load(teacher, teacher_path, "teacher", t_delayed)

In [None]:
# factory function does the same as above
# model, teacher = SparseAutoModel.text_classification_from_pretrained_distil(
#     model_name_or_path=model_path,
#     model_kwargs={"config":model_config},
#     teacher_name_or_path=teacher_path,
#     teacher_kwargs={},
# )

## **Step 5: Tokenize Dataset**

Run the tokenizer on the dataset.

In [37]:
print(dataset["train"].features)

{'Unnamed: 0': Value(dtype='int64', id=None), 'sentence': Value(dtype='string', id=None), 'label': Value(dtype='int64', id=None), 'idx': Value(dtype='int64', id=None)}


In [30]:
print(dataset)

# setup dataset configuration
INPUT_COL_1 = "sentence"
INPUT_COL_2 = None
LABEL_COL = "label"
NUM_LABELS = len(dataset["train"].unique(LABEL_COL))

DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['Unnamed: 0', 'sentence', 'label', 'idx'],
        num_rows: 872
    })
})


In [39]:
MAX_LEN = 128
def preprocess_fn(examples):
  args = None
  if INPUT_COL_2 is None:
    args = (examples[INPUT_COL_1], )
  else:
    args = (examples[INPUT_COL_1], examples[INPUT_COL_2])
  result = tokenizer(*args, 
                   padding="max_length", 
                   max_length=min(tokenizer.model_max_length, MAX_LEN), 
                   truncation=True)
  return result

tokenized_dataset = dataset.map(
    preprocess_fn,
    batched=True,
    desc="Running tokenizer on dataset"
)
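
The `padding="max_length"` and `truncation=True` arguments above give every example the same length. The toy sketch below shows the effect on already-tokenized ids, using a hypothetical padding id of 0 and made-up token ids. A real tokenizer also handles special tokens when truncating, which this sketch ignores.

```python
PAD_ID = 0  # hypothetical padding token id


def pad_or_truncate(token_ids, max_length):
    """Cut or right-pad a list of token ids to exactly max_length."""
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


short_example = pad_or_truncate([101, 7, 8, 102], max_length=6)
long_example = pad_or_truncate(list(range(1, 11)), max_length=6)

print(short_example)
print(long_example)
```

Fixed-length inputs are what allow `default_data_collator` to stack examples into batches without any further padding logic.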


## **Step 6: Run Training**

SparseML has a custom `Trainer` class that inherits from the [Hugging Face `Trainer` Class](https://huggingface.co/docs/transformers/main_classes/trainer). As such, the SparseML `Trainer` has all of the existing functionality of the Hugging Face `Trainer`. In addition, we can supply a `recipe` and (optionally) a `teacher`.


As we saw above, the `recipe` encodes the sparsity-related algorithms and hyperparameters of the training process in a YAML file. The SparseML `Trainer` parses the `recipe` and adjusts the training workflow to apply the algorithms in the recipe.

The `teacher` is an optional argument that instructs SparseML to apply model distillation to support the training process.
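
For intuition about what distillation adds to training, here is a minimal sketch of one common formulation of a distillation loss, written in plain Python. It blends the usual cross-entropy on the true label with a temperature-softened KL term that pulls the student's predictions toward the teacher's. The parameter names `distill_weight` and `temperature`, their values, and the logits are assumptions for illustration; the exact loss SparseML applies is determined by the recipe and may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      distill_weight=0.5, temperature=2.0):
    """Weighted sum of the task loss and the teacher-matching loss."""
    task_loss = cross_entropy(student_logits, label)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    match_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return (1.0 - distill_weight) * task_loss + distill_weight * match_loss


student_logits = [0.2, 0.4]    # made-up student output for one sentence
teacher_logits = [-1.5, 2.0]   # made-up, more confident teacher output

loss_with_teacher = distillation_loss(student_logits, teacher_logits, label=1)
loss_matching = distillation_loss(student_logits, student_logits, label=1)
print(loss_with_teacher, loss_matching)
```

When the student already matches the teacher, the KL term vanishes and only the weighted task loss remains. A larger gap between the two outputs increases the total loss.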

In [None]:
training_args = TrainingArguments(
    output_dir="./training_output",
    do_train=True,
    do_eval=True,
    resume_from_checkpoint=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    save_total_limit=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    fp16=True)

trainer = Trainer(
    model=model,
    model_state_path=model_path,
    recipe=recipe_path,
    teacher=teacher,
    metadata_args=["per_device_train_batch_size","per_device_eval_batch_size","fp16"],
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics)

In [None]:
train_result = trainer.train(resume_from_checkpoint=False)
trainer.save_model()  # Saves the tokenizer too for easy upload
trainer.save_state()
trainer.save_optimizer_and_scheduler(training_args.output_dir)

In [None]:
!sparseml.transformers.export_onnx \
  --model_path training_output \
  --task sentiment_analysis

  warn(f"Failed to load image Python extension: {e}")
2023-02-23 22:52:29.490758: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-23 22:52:29.490906: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-23 22:52:30 sparseml.transformers.export INFO     Attempting onnx export for model at training_output/checkpoint-630 for task sentiment-analysis
INFO:sparseml.transformers.export:Attempting onnx export for model at training_output/checkpoint-630 for task sentiment-analysis
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at trai