# **Text Classification: Sparse Transfer Learning with the CLI**

In this example, you will fine-tune a 90% pruned BERT model onto some text classification datasets using SparseML's CLI.

### **Sparse Transfer Learning Overview**

Sparse Transfer Learning is very similar to the typical fine-tuning you are used to when training models. However, with Sparse Transfer Learning, we start the training process from a pre-sparsified checkpoint and maintain the sparsity structure while the fine-tuning occurs. At the end, you will have a sparse model trained on your dataset, ready to be deployed with DeepSparse for GPU-class performance on CPUs!

### **Pre-Sparsified BERT**
SparseZoo, Neural Magic's open source repository of pre-sparsified models, contains a 90% pruned version of BERT, which has been sparsified on the upstream Wikipedia and BookCorpus datasets with the
masked language modeling objective. [Check out the model card](https://sparsezoo.neuralmagic.com/models/nlp%2Fmasked_language_modeling%2Fobert-base%2Fpytorch%2Fhuggingface%2Fwikipedia_bookcorpus%2Fpruned90-none). We will use this model as the starting point for the transfer learning process.
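
The SparseZoo stub for this model is a `zoo:` prefix followed by slash-separated parts. As a rough illustration, the sketch below splits the stub into those parts. The field names (`domain`, `sub_domain`, and so on) are labels chosen for this sketch based on the visible layout of the stub; they are an assumption, not a SparseZoo API.

```python
from typing import Dict

# Field names are assumptions chosen for this sketch; they label the
# positions visible in the stub and are not part of any SparseZoo API.
FIELDS = [
    "domain", "sub_domain", "architecture",
    "framework", "repo", "dataset", "sparse_tag",
]


def parse_stub(stub: str) -> Dict[str, str]:
    """Split a `zoo:` stub into its slash-separated parts."""
    prefix = "zoo:"
    if not stub.startswith(prefix):
        raise ValueError("expected a stub starting with 'zoo:'")
    parts = stub[len(prefix):].split("/")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}")
    return dict(zip(FIELDS, parts))


stub = (
    "zoo:nlp/masked_language_modeling/obert-base/pytorch/"
    "huggingface/wikipedia_bookcorpus/pruned90-none"
)
info = parse_stub(stub)
print(info["sub_domain"], info["dataset"], info["sparse_tag"])
```

Reading the stub this way makes it easier to see how the upstream checkpoint (`masked_language_modeling`, `pruned90-none`) differs from the task-specific stubs used later in this example.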


***Let's dive in!***

## **Installation**

Install SparseML via `pip`.



In [None]:
%pip uninstall torch torchvision -y
%pip install sparseml[torch]

If you are running on Google Colab, restart the runtime after this step.

In [None]:
!sparseml.transformers.text_classification --help

## **Sparse Transfer Learning with MNLI**

SparseML's CLI enables you to kick off training workflows with utilities like dataset loading, checkpoint saving, metric reporting, and logging handled for you. All we have to do is pass a `model_name_or_path` (the starting checkpoint), a `task_name` (the GLUE task to train on), and a `recipe` (a YAML file specifying the sparsity-related parameters) and we are up and running. The `recipe` is critical for instructing the training script how to modify the training process with sparsity-related algorithms. For Sparse Transfer Learning, we will use a `recipe` that instructs SparseML to maintain sparsity during the training process and to apply quantization over the final few epochs.

### **Run Transfer Learning**

For MNLI, there is a pre-made transfer learning recipe available in [SparseZoo](https://sparsezoo.neuralmagic.com/models/nlp%2Ftext_classification%2Fobert-base%2Fpytorch%2Fhuggingface%2Fmnli%2Fpruned90_quant-none). As such, we kick off transfer learning with the following:

In [None]:
!sparseml.transformers.train.text_classification \
  --task_name mnli \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none \
  --distill_teacher zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/base-none \
  --output_dir sparse_quantized_bert-text_classification_mnli \
  --do_train --do_eval --max_seq_length 128 --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 32 --gradient_accumulation_steps 4 --preprocessing_num_workers 32 \
  --seed 5114

Let's discuss the key arguments:
- `--task_name mnli` instructs SparseML to download and fine-tune onto the MNLI dataset. You can pass any GLUE task as the task name and SparseML automatically downloads the dataset from the Hugging Face hub.

- `--model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none` specifies the starting checkpoint for the fine tuning. Here, we passed a SparseZoo stub identifying the 90% pruned version of BERT trained with masked language modeling on the Wikipedia and BookCorpus datasets. SparseML downloads the checkpoint from the Zoo when the script starts.

- `--recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none` specifies the recipe to be applied by SparseML. Here, we passed a SparseZoo stub identifying the transfer learning recipe for the MNLI dataset. SparseML downloads the recipe from the Zoo when the script starts. See below for the details of what this recipe looks like.

- `--distill_teacher zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/base-none` is an optional argument that specifies a model to use as a teacher for distillation during the training process. We passed a SparseZoo stub identifying a dense BERT model trained on MNLI. SparseML downloads the teacher from the Zoo when the script starts.

The script downloads the starting checkpoint, the teacher model, and transfer learning recipe from SparseZoo as well as the MNLI 
dataset and trains the model for 12 epochs, converging to ~82.5% accuracy on the validation set.
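
To give some intuition for what the teacher model contributes, here is a minimal sketch of a generic knowledge distillation loss in plain Python. It is not SparseML's implementation: the function names and the `temperature` and `alpha` parameters are hypothetical, and the formula is the common textbook form (a weighted sum of cross-entropy on the true label and a temperature-scaled KL divergence to the teacher's outputs).

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 2.0,  # hypothetical hyperparameter
    alpha: float = 0.5,        # hypothetical weight on the hard-label term
) -> float:
    # Hard-label term: cross-entropy against the dataset label.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl


student = [2.0, 0.5, -1.0]   # made-up logits for the 3 MNLI classes
teacher = [3.0, 0.2, -2.0]
print(distillation_loss(student, teacher, label=0))
```

When the student's logits match the teacher's, the KL term is zero and only the hard-label term remains.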

#### **Transfer Learning Recipe**

Here's what the transfer learning recipe for the MNLI dataset looks like.

The "Modifiers" are the important items that encode how SparseML should modify the training process for Sparse Transfer Learning:
- `ConstantPruningModifier` tells SparseML to pin weights at 0 over all epochs, maintaining the sparsity structure of the network
- `QuantizationModifier` tells SparseML to quantize the weights with quantization-aware training over the last 5 epochs
- `DistillationModifier` tells SparseML how to apply distillation to the model, including the layer and some hyperparameters

SparseML parses the modifiers and updates the training process to implement the algorithms and hyperparameters specified in the recipes.
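
As a conceptual illustration of "pinning weights at 0", the sketch below applies a fixed mask after every gradient step so that pruned positions never become non-zero. This is a simplified stand-in written in plain Python, not how SparseML implements `ConstantPruningModifier`; the helper names and the toy weights are made up for the example.

```python
from typing import List


def make_mask(weights: List[float]) -> List[int]:
    # 1 where the weight is non-zero (trainable), 0 where it was pruned.
    return [0 if w == 0.0 else 1 for w in weights]


def sgd_step(
    weights: List[float], grads: List[float], mask: List[int], lr: float = 0.1
) -> List[float]:
    # Plain SGD update; masked positions are reset to exactly zero,
    # so the sparsity pattern of the starting checkpoint is preserved.
    return [(w - lr * g) if m else 0.0 for w, g, m in zip(weights, grads, mask)]


weights = [0.0, 0.5, 0.0, -1.2, 0.0]   # toy "pre-sparsified" layer
mask = make_mask(weights)
grads = [0.3, -0.1, 0.7, 0.2, -0.4]    # toy gradients, reused every step

for _ in range(3):
    weights = sgd_step(weights, grads, mask)

sparsity = weights.count(0.0) / len(weights)
print(weights, sparsity)
```

Only the weights that were non-zero in the starting checkpoint move during training, which is why the fine-tuned model keeps the sparsity level of the checkpoint it started from.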

In [None]:
from sparsezoo import Model
transfer_stub = "zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none"
download_dir = "./transfer_recipe-mnli"
zoo_model = Model(transfer_stub, download_path=download_dir)
recipe_path = zoo_model.recipes.default.path
print(recipe_path)

In [None]:
%cat ./transfer_recipe-mnli/recipe/recipe_original.md

### **Export to ONNX**

Once you have trained your model, export to ONNX in order to deploy with DeepSparse. The artifacts of the training process 
are saved to your local filesystem. 

Run the following to convert your PyTorch checkpoint to ONNX:

In [None]:
!sparseml.transformers.export_onnx \
  --model_path sparse_quantized_bert-text_classification_mnli \
  --task text_classification

The script above creates a `deployment` folder in your local directory, which has all of the files needed for deployment with DeepSparse including the `model.onnx`, `config.json`, and `tokenizer.json` files.
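
Before deploying, it can be useful to confirm that the export produced the files listed above. The following is a small sketch using only the standard library; the `./deployment` path is an assumption about where the folder ends up on your machine, so adjust it to match your setup.

```python
from pathlib import Path
from typing import List

# File names taken from the description of the deployment folder above.
EXPECTED_FILES = ["model.onnx", "config.json", "tokenizer.json"]


def missing_deployment_files(deployment_dir: str) -> List[str]:
    """Return the expected files that are not present in the folder."""
    root = Path(deployment_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# "./deployment" is an assumed location; change it if your export differs.
missing = missing_deployment_files("./deployment")
print("missing files:", missing)
```

An empty list means all three files were found.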

## **Sparse Transfer Learning with QQP**

SparseML's CLI enables you to kick off training workflows with utilities like dataset loading, checkpoint saving, metric reporting, and logging handled for you. All we have to do is pass a `model_name_or_path` (the starting checkpoint), a `task_name` (the GLUE task to train on), and a `recipe` (a YAML file specifying the sparsity-related parameters) and we are up and running. The `recipe` is critical for instructing the training script how to modify the training process with sparsity-related algorithms. For Sparse Transfer Learning, we will use a `recipe` that instructs SparseML to maintain sparsity during the training process and to apply quantization over the final few epochs.

### **Run Transfer Learning**

For QQP, there is a pre-made transfer learning recipe available in [SparseZoo](https://sparsezoo.neuralmagic.com/models/nlp%2Ftext_classification%2Fobert-base%2Fpytorch%2Fhuggingface%2Fqqp%2Fpruned90_quant-none). As such, we kick off transfer learning with the following:

In [None]:
!sparseml.transformers.train.text_classification \
  --task_name qqp \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none \
  --distill_teacher zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/base-none \
  --output_dir obert_base_pruned90_quant_qqp \
  --do_train --do_eval --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 32 --per_device_eval_batch_size 32 --preprocessing_num_workers 32 \
  --max_seq_length 128 \
  --seed 10194

Let's discuss the key arguments:
- `--task_name qqp` instructs SparseML to download and fine-tune onto the QQP dataset. You can pass any GLUE task as the task name and SparseML automatically downloads the dataset from the Hugging Face hub.

- `--model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none` specifies the starting checkpoint for the fine tuning. Here, we passed a SparseZoo stub identifying the 90% pruned version of BERT trained with masked language modeling on the Wikipedia and BookCorpus datasets. SparseML downloads the checkpoint from the Zoo when the script starts.

- `--recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none` specifies the recipe to be applied by SparseML. Here, we passed a SparseZoo stub identifying the transfer learning recipe for the QQP dataset. SparseML downloads the recipe from the Zoo when the script starts. See below for the details of what this recipe looks like.

- `--distill_teacher zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/base-none` is an optional argument that specifies a model to use as a teacher for distillation during the training process. We passed a SparseZoo stub identifying a dense BERT model trained on QQP. SparseML downloads the teacher from the Zoo when the script starts.

The script downloads the starting checkpoint, the teacher model, and transfer learning recipe from SparseZoo as well as the QQP 
dataset and trains the model for 3 epochs, converging to ~91% accuracy on the validation set.
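
The MNLI and QQP commands set their batch flags differently: MNLI uses `--per_device_train_batch_size 8` with `--gradient_accumulation_steps 4`, while QQP uses `--per_device_train_batch_size 32` with no accumulation flag. A quick worked calculation, assuming a single training device and that omitting the accumulation flag means one step, shows that both amount to the same number of samples per optimizer update. The helper name is made up for this sketch.

```python
def effective_batch_size(
    per_device: int, accumulation_steps: int = 1, num_devices: int = 1
) -> int:
    # Samples contributing to each optimizer update.
    return per_device * accumulation_steps * num_devices


mnli = effective_batch_size(per_device=8, accumulation_steps=4)
qqp = effective_batch_size(per_device=32)
print(mnli, qqp)  # 32 32
```

If you train on several devices, multiply by the device count to get the global batch size.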

#### **Transfer Learning Recipe**

Here's what the transfer learning recipe for the QQP dataset looks like.

The "Modifiers" are the important items that encode how SparseML should modify the training process for Sparse Transfer Learning:
- `ConstantPruningModifier` tells SparseML to pin weights at 0 over all epochs, maintaining the sparsity structure of the network
- `QuantizationModifier` tells SparseML to quantize the weights with quantization-aware training over the last 5 epochs
- `DistillationModifier` tells SparseML how to apply distillation to the model, including the layer and some hyperparameters

SparseML parses the modifiers and updates the training process to implement the algorithms and hyperparameters specified in the recipes.
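
To make "quantization-aware training" a little more concrete, here is a simplified sketch of the fake-quantization idea: weights are rounded to an integer grid and mapped back to floats, so the training process sees the rounding error it will face after quantization. This is a generic symmetric per-tensor INT8 illustration, not SparseML's `QuantizationModifier`; the function name and toy weights are hypothetical.

```python
from typing import List


def fake_quantize(weights: List[float], num_bits: int = 8) -> List[float]:
    # Symmetric per-tensor quantization: float -> signed integer -> float.
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    quantized = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # clamp to integer range
        quantized.append(q * scale)                  # back to float
    return quantized


weights = [0.0, 0.53, -1.2, 0.007]   # toy layer; 0.0 is a pruned weight
quantized = fake_quantize(weights)
print(quantized)
```

Note that a weight of exactly zero stays zero under this scheme, so quantizing in this way does not disturb the sparsity pattern.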

In [None]:
from sparsezoo import Model
transfer_stub = "zoo:nlp/text_classification/obert-base/pytorch/huggingface/qqp/pruned90_quant-none"
download_dir = "./transfer_recipe-qqp"
zoo_model = Model(transfer_stub, download_path=download_dir)
recipe_path = zoo_model.recipes.default.path
print(recipe_path)

In [None]:
%cat ./transfer_recipe-qqp/recipe/recipe_original.md

### **Export to ONNX**

Once you have trained your model, export to ONNX in order to deploy with DeepSparse. The artifacts of the training process 
are saved to your local filesystem. 

Run the following to convert your PyTorch checkpoint to ONNX:

In [None]:
!sparseml.transformers.export_onnx \
  --model_path obert_base_pruned90_quant_qqp \
  --task text_classification

The script above creates a `deployment` folder in your local directory, which has all of the files needed for deployment with DeepSparse including the `model.onnx`, `config.json`, and `tokenizer.json` files.

## **Sparse Transfer Learning with a Custom Dataset: Single Input Classification**

Beyond the built-in GLUE tasks, we can also use datasets for single-input multi-class classification problems. The datasets can be passed either as Hugging Face Hub dataset identifiers or via local CSV/JSON files.

Let's try to transfer onto the [TweetEval Emotion Dataset](https://huggingface.co/datasets/tweet_eval). This dataset contains single sentences with 4 labels representing the emotion of the tweet (`0=anger, 1=joy, 2=optimism, 3=sadness`).

For simplicity, we will perform the fine-tuning without distillation. Although the transfer learning recipe contains distillation modifiers, we can turn them off by setting `--distill_teacher disable`.

### **Using a Hugging Face Dataset**

Let's walk through how to pass a Hugging Face dataset identifier to the CLI.

#### **Inspecting TweetEval Dataset**

In [None]:
from datasets import load_dataset
from pprint import pprint

emotion = load_dataset("tweet_eval", "emotion")
print(emotion)
pprint(emotion["train"][0])

#### **Kick off Training**

To configure the training script to use the emotion dataset, we replaced the `--task_name` argument with `--dataset_name tweet_eval --dataset_config_name emotion --input_column_names text --label_column_name label`. Since the TweetEval dataset contains multiple subsets (e.g. there is a subset
that classifies text into an emoji), we pass the `--dataset_config_name` to specify the subset.


Run the following to kick off the training process:

In [None]:
!sparseml.transformers.train.text_classification \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none \
  --recipe_args '{"num_epochs":12,"qat_start_epoch":7.0, "observer_epoch": 11.0}' \
  --distill_teacher disable \
  --dataset_name tweet_eval --dataset_config_name emotion \
  --input_column_names "text" --label_column_name "label" \
  --output_dir sparse_quantized_bert-text_classification_tweet_eval_emotion \
  --do_train --do_eval --max_seq_length 128 --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 32 --gradient_accumulation_steps 4 --preprocessing_num_workers 32 \
  --seed 5114

You will notice that we used the transfer learning recipe from the SST2 sentiment analysis task (identified by the SparseZoo stub `zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none`).

Since the TweetEval Emotion dataset is a single sentence multi-class classification problem, we used the transfer learning recipe from the sentiment analysis task (a single sentence binary classification problem) as the starting point.



To update a recipe, you can download the YAML file from SparseZoo, make updates to the YAML directly, and pass the local path to SparseML. In this case, we used `--recipe_args '{"num_epochs":12,"qat_start_epoch":7.0, "observer_epoch": 11.0}'` to modify the recipe on the fly, setting it to run for 12 epochs with quantization-aware training starting at epoch 7.
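
Because `--recipe_args` takes a JSON string, quoting mistakes are easy to make when typing it by hand. One way to avoid them, sketched below with the standard library only, is to build the dictionary in Python and let `json.dumps` and `shlex.quote` produce the shell-safe flag. The keys are the ones used in the command above; the variable names are just for this example.

```python
import json
import shlex

# Recipe variables taken from the training command above.
recipe_args = {"num_epochs": 12, "qat_start_epoch": 7.0, "observer_epoch": 11.0}

# Serialize to JSON, then quote the result so the shell passes it as one argument.
flag = "--recipe_args " + shlex.quote(json.dumps(recipe_args))
print(flag)

# Round-trip check: the shell would hand back the same dictionary.
assert json.loads(shlex.split(flag)[1]) == recipe_args
```

You can paste the printed flag into the training command in place of the hand-written one.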

### **Using Local CSV/JSON Files**

Let's walk through how to pass a CSV/JSON dataset to the CLI.

#### **Save Dataset as a CSV File**

For this example, we use Hugging Face `datasets` to create a CSV file for the TweetEval dataset that can be passed to SparseML's CLI, but you can use any framework you want to create the CSV.

In [None]:
from datasets import load_dataset
from pprint import pprint

emotion = load_dataset("tweet_eval", "emotion")
print(emotion)
pprint(emotion["train"][0])
emotion["train"].to_csv("./emotion-train.csv")
emotion["validation"].to_csv("./emotion-validation.csv")

We can see that the data is a CSV file with text and label as the columns:

In [None]:
!head ./emotion-train.csv --lines=10

In [None]:
!head ./emotion-validation.csv --lines=10
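
If your data does not come from Hugging Face `datasets`, a file with the same two columns can be produced with Python's built-in `csv` module. The sketch below is one possible way to do it; the rows and the `./custom-train.csv` file name are made-up placeholders, and the labels follow the TweetEval Emotion mapping described earlier.

```python
import csv

# Made-up example rows; "label" uses 0=anger, 1=joy, 2=optimism, 3=sadness.
rows = [
    {"text": "What a great day, loving it!", "label": 1},
    {"text": "I can't believe they cancelled, so annoyed", "label": 0},
]

# Hypothetical output path for this sketch.
with open("./custom-train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header and row count.
with open("./custom-train.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    header = reader.fieldnames
    loaded = list(reader)

print(header, len(loaded))
```

The `csv` module quotes fields that contain commas, so tweets with punctuation stay in a single `text` column.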

#### **Kick off Training**

To use the local files with the CLI, pass `--train_file ./emotion-train.csv --validation_file ./emotion-validation.csv  --input_column_names text --label_column_name label`.

Run the following:

In [None]:
!sparseml.transformers.text_classification \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none \
  --recipe_args '{"num_epochs":12,"qat_start_epoch":7.0, "observer_epoch": 11.0}' \
  --distill_teacher disable \
  --train_file ./emotion-train.csv --validation_file ./emotion-validation.csv \
  --input_column_names "text" --label_column_name "label" \
  --output_dir sparse_quantized_bert-text_classification_tweet_eval_emotion-csv \
  --do_train --do_eval --max_seq_length 128 --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 32 --gradient_accumulation_steps 4 --preprocessing_num_workers 32 \
  --seed 5114

## **Sparse Transfer Learning with a Custom Dataset: Multiple Input Classification**

Beyond the built-in GLUE tasks, we can also use datasets for multiple-input multi-class classification problems. The datasets can be passed either as Hugging Face Hub dataset identifiers or via local CSV/JSON files.

Let's try to transfer onto the [SICK (Sentences Involving Compositional Knowledge) Dataset](https://huggingface.co/datasets/sick),
which includes 10,000 pairs of sentences with entailment relationships (`0=entailment, 1=neutral, 2=contradiction`).

For simplicity, we will perform the fine-tuning without distillation. Although the transfer learning recipe contains distillation modifiers, we can turn them off by setting `--distill_teacher disable`.

### **Using a Hugging Face Dataset**

Let's walk through how to pass a Hugging Face dataset identifier to the CLI.

#### **Inspecting SICK Dataset**

`sentence_A` and `sentence_B` are the input columns. `label` is the label column.

In [None]:
from datasets import load_dataset
from pprint import pprint

sick = load_dataset("sick")
print(sick)
pprint(sick["train"][0])

#### **Kick off Training**

To configure the training script to use the SICK dataset, we replaced the `--task_name` argument with `--dataset_name sick --input_column_names 'sentence_A,sentence_B' --label_column_name label`.


Run the following to kick off the training process:

In [None]:
!sparseml.transformers.text_classification \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none \
  --distill_teacher disable \
  --dataset_name sick --input_column_names 'sentence_A,sentence_B' --label_column_name 'label' \
  --output_dir sparse_quantized_bert-text_classification_sick \
  --do_train --do_eval --max_seq_length 128 --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 32 --gradient_accumulation_steps 4 --preprocessing_num_workers 32 \
  --seed 5114

You will notice that we used the same recipe as we did in the MNLI case (identified by the SparseZoo stub `zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none`). 

Since the SICK dataset is a multi-sentence multi-class classification problem (like MNLI, it is an entailment problem), we used the transfer learning recipe from the MNLI task as the starting point.

To update a recipe, you can download the YAML file from SparseZoo, make updates to the YAML directly, and pass the local path to SparseML. Alternatively, you can use `--recipe_args` to update it on the fly.

### **Using Local CSV/JSON Files**

Let's walk through how to pass a CSV/JSON dataset to the CLI.

#### **Save Dataset as a CSV File**

For this example, we use Hugging Face `datasets` to create a CSV file for the SICK dataset that can be passed to SparseML's CLI, but you can use any framework you want to create the CSV.

In [None]:
from datasets import load_dataset

sick = load_dataset("sick")
print(sick)
print(sick["train"][0])
sick["train"].to_csv("./sick-train.csv")
sick["validation"].to_csv("./sick-validation.csv")

We can see that the data is a CSV file with `sentence_A` and `sentence_B` as the input columns and `label` as the label column:

In [None]:
!head ./sick-train.csv --lines=10

In [None]:
!head ./sick-validation.csv --lines=10
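
When you build a sentence-pair CSV yourself, it can help to check the columns and labels before starting a long training run. Below is a small sketch of such a check using the standard library; the function name and the sample rows are invented for illustration, and the allowed labels follow the SICK mapping described earlier.

```python
import csv
import io
from typing import IO, Iterable, List, Set


def check_pairs_csv(
    handle: IO[str],
    input_columns: Iterable[str],
    label_column: str,
    allowed_labels: Set[str],
) -> List[str]:
    """Return a list of problems found in a CSV; empty means it looks fine."""
    problems: List[str] = []
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    for col in list(input_columns) + [label_column]:
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    for i, row in enumerate(reader, start=1):
        if row[label_column] not in allowed_labels:
            problems.append(f"row {i}: unexpected label {row[label_column]!r}")
    return problems


# Made-up sample in the same shape as the SICK CSV columns used here.
sample = io.StringIO(
    "sentence_A,sentence_B,label\n"
    "A dog is running,An animal is moving,0\n"
    "A man is cooking,Nobody is cooking,2\n"
)
problems = check_pairs_csv(sample, ["sentence_A", "sentence_B"], "label", {"0", "1", "2"})
print(problems)
```

To check a real file, pass an open handle such as `open("./sick-train.csv", newline="", encoding="utf-8")` instead of the in-memory sample.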

#### **Kick off Training**

To use the local files with the CLI, pass `--train_file ./sick-train.csv --validation_file ./sick-validation.csv  --input_column_names 'sentence_A,sentence_B' --label_column_name label`.

Run the following:

In [None]:
!sparseml.transformers.train.text_classification \
  --model_name_or_path zoo:nlp/masked_language_modeling/obert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned90-none \
  --recipe zoo:nlp/text_classification/obert-base/pytorch/huggingface/mnli/pruned90_quant-none \
  --distill_teacher disable \
  --train_file ./sick-train.csv --validation_file ./sick-validation.csv --input_column_names 'sentence_A,sentence_B' --label_column_name 'label' \
  --output_dir sparse_quantized_bert-text_classification_sick-csv \
  --do_train --do_eval --max_seq_length 128 --evaluation_strategy epoch --logging_steps 1000 --save_steps 1000 \
  --per_device_train_batch_size 8 --per_device_eval_batch_size 32 --gradient_accumulation_steps 4 --preprocessing_num_workers 32 \
  --seed 5114