
A notebook using Neural Magic's [SparseML](https://github.com/neuralmagic/sparseml) library to convert a dense Hugging Face model into a lightweight, super-fast sparsified model! This notebook has the potential to unlock thousands of dense models that were previously fine-tuned and uploaded to Hugging Face's Model Hub. 🚀🚀🚀

<br>

To learn more about sparse transfer learning, check out the docs [here](https://docs.neuralmagic.com/get-started/transfer-a-sparsified-model/nlp-text-classification).

<br>

This notebook allows devs to:
*   Download a dense BERT base uncased model from the Hugging Face Model Hub.
*   Distill this dense model (teacher) onto a sparse pre-trained transformer (sparse student) by using the [emotion dataset](https://huggingface.co/datasets/emotion).
*   Export our model to ONNX format.
*   Run inference using the super fast DeepSparse Engine.

There are only a few requirements for sparse-transferring a Hugging Face model:
*   Does SparseML currently support the NLP task?
*   Does SparseML currently support the model architecture?
*   Is the dataset of interest available for download/training?
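The architecture question can be answered from the `config.json` that `save_pretrained()` writes next to the weights. Below is a minimal sketch of such a pre-flight check. The allow-lists `SUPPORTED_TASKS` and `SUPPORTED_MODEL_TYPES` are hypothetical placeholders, not SparseML's actual support matrix, so check the docs for the real lists.

```python
import json

# Hypothetical allow-lists for illustration only; they are NOT SparseML's
# official support matrix. Check the SparseML docs for the real lists.
SUPPORTED_TASKS = {"text_classification"}
SUPPORTED_MODEL_TYPES = {"bert"}


def check_requirements(config, task):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if task not in SUPPORTED_TASKS:
        problems.append(f"task {task!r} is not in the supported set")
    model_type = config.get("model_type")
    if model_type not in SUPPORTED_MODEL_TYPES:
        problems.append(f"model_type {model_type!r} is not in the supported set")
    return problems


# In practice you would load the real file, e.g.:
#   config = json.load(open("dense_model/config.json"))
example_config = json.loads(
    '{"model_type": "bert", "architectures": ["BertForSequenceClassification"]}'
)
print(check_requirements(example_config, "text_classification"))
```

The dataset question is left out because it depends on where your data lives.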

<br>

---

<br>

In the example below, we'll sparse-transfer a dense BERT base uncased model onto a 6-layer pruned, quantized BERT from the Neural Magic [SparseZoo](https://sparsezoo.neuralmagic.com/?domain=nlp&sub_domain=masked_language_modeling&page=1). We'll use a [dense BERT](https://huggingface.co/nateraw/bert-base-uncased-emotion?text=I+like+you.+I+love+you) previously fine-tuned on the emotion dataset, which is a multi-class classification task.

Emotion is a dataset of English Twitter messages with six basic emotions: `anger`, `fear`, `joy`, `love`, `sadness`, and `surprise`.
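"Quantized" means the student stores its weights as low-precision integers instead of 32-bit floats. Below is a rough sketch of the general idea using symmetric, per-tensor INT8 quantization. It is a generic illustration under that assumption, not the exact scheme SparseML's recipe applies, and the weight values are made up.

```python
def quantize_int8_symmetric(weights):
    """Quantize floats to int8 codes using one per-tensor scale and a zero-point of 0."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.01]  # hypothetical weight values
codes, scale = quantize_int8_symmetric(weights)
restored = dequantize(codes, scale)
print(codes, scale)
```

Each restored value is within half a quantization step (`scale / 2`) of the original. That rounding error is what quantization-aware training learns to tolerate.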

In [None]:
!nvidia-smi # double check you're in a GPU runtime

In [None]:
!pip install transformers

Now let's download a dense BERT base uncased model that was previously fine-tuned on the emotion dataset and save its tokenizer and weights to a folder called `dense_model`.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("nateraw/bert-base-uncased-emotion")
model = AutoModelForSequenceClassification.from_pretrained("nateraw/bert-base-uncased-emotion")
tokenizer.save_pretrained("/content/dense_model")
model.save_pretrained("/content/dense_model")
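You can sanity-check the dense teacher by turning its six output logits into a predicted emotion. Below is a minimal sketch in plain Python. The logits are hypothetical, and the `ID2LABEL` order is an assumption. In a real run, read the mapping from `model.config.id2label` instead of hard-coding it.

```python
import math

# Assumed label order for illustration only; in a real run use model.config.id2label.
ID2LABEL = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, id2label=ID2LABEL):
    """Return the highest-probability label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits for one sentence, not real model output.
label, confidence = predict_label([-1.2, 4.1, 0.3, -0.5, -0.9, -1.0])
print(label, round(confidence, 3))
```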

Let's now install the SparseML library so we can start the sparse transfer from the dense model to a sparse model. After installing, you may need to restart the runtime and re-run the cell to make sure the dependencies were installed correctly.

In [None]:
!pip install sparseml[torch]

Run the following CLI command to start sparse transfer learning. The dense model, passed in the `distill_teacher` argument, transfers its knowledge onto a 6-layer pruned, quantized BERT base student model, passed in the `model_name_or_path` argument. For sparse transfer, the dense model's architecture needs to match the student model's architecture, which in this case is BERT base uncased.
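The teacher-to-student "knowledge transfer" is knowledge distillation. Below is a rough sketch of the classic soft-target formulation. The student matches the teacher's temperature-softened probabilities while also fitting the true label. The `alpha` and `temperature` values, and the logits, are hypothetical. SparseML configures distillation through the recipe, and its exact loss may differ from this generic version.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Blend the teacher-matching loss with ordinary cross-entropy on the true label."""
    hard = -math.log(softmax(student_logits)[label])
    soft = soft_target_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1 - alpha) * hard


# Hypothetical 6-class logits (one per emotion) for a single example.
teacher = [0.2, 3.5, 0.1, -1.0, -0.4, 0.0]
student = [0.0, 1.5, 0.8, -0.2, 0.1, 0.3]
print(round(distillation_loss(student, teacher, label=1), 4))
```

Softening with a temperature above 1 exposes the teacher's relative preferences among the wrong classes. The `T^2` factor keeps the soft term's gradients on a comparable scale.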

The modifiers required for this transfer are defined in the `recipe`. To learn more about recipes and their modifiers, check out the [docs](https://docs.neuralmagic.com/user-guide/recipes).
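The student is already pruned: `pruned80` in its SparseZoo stub suggests roughly 80% of the weights are zero. In sparse transfer, a recipe typically keeps that sparsity fixed while fine-tuning rather than pruning further. Below is a generic sketch of that idea under those assumptions, not SparseML's modifier code. Magnitude pruning creates the zeros, and a mask re-applied after every update keeps them at zero. All weights and gradients are made up.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest magnitudes (unstructured)."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_mask(weights):
    """1.0 where a weight survived pruning, 0.0 where it is already zero."""
    return [0.0 if w == 0.0 else 1.0 for w in weights]


def masked_sgd_step(weights, grads, mask, lr):
    """Plain SGD update, then re-apply the mask so pruned weights stay at zero."""
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical 10-weight layer pruned to 80% sparsity, then fine-tuned for one step.
layer = magnitude_prune([0.3, -0.05, 0.9, 0.01, -0.7, 0.02, 0.04, -0.03, 0.06, 0.08], 0.8)
mask = sparsity_mask(layer)
layer = masked_sgd_step(layer, grads=[0.1] * 10, mask=mask, lr=0.01)
print(sparsity(layer))  # 0.8
```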

Compared with traditional fine-tuning, sparse transfer learning is much more sensitive to its training parameters. A good heuristic is to start by calibrating the two most critical ones: the initial learning rate (`init_lr`) and the number of epochs (`num_epochs`). Both are hard-coded in the recipe. To speed things up for this example, we've already experimented with different values for these two parameters and override the recipe with custom values passed in the `recipe_args` argument. For any model you sparse-transfer, you will most likely end up overriding these values while tuning.
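Because `recipe_args` is a JSON string, it is easy to generate when sweeping these two values. Below is a minimal sketch that builds one shell-quoted command per combination. The sweep values and output-directory naming are hypothetical. Only the flags that change are shown, so append the remaining flags from the training command in this notebook before running anything.

```python
import json
import shlex

# Hypothetical sweep values; the notebook's tuned pair is 9 epochs with init_lr 5.7e-5.
EPOCHS = [6, 9]
LEARNING_RATES = [3e-5, 5.7e-5]


def recipe_args(num_epochs, init_lr):
    """Serialize the recipe overrides exactly as --recipe_args expects a JSON string."""
    return json.dumps({"num_epochs": num_epochs, "init_lr": init_lr})


def train_command(num_epochs, init_lr, output_dir):
    """Build a shell-safe command line containing only the flags that vary per run."""
    args = [
        "sparseml.transformers.train.text_classification",
        "--output_dir", output_dir,
        "--recipe_args", recipe_args(num_epochs, init_lr),
    ]
    return shlex.join(args)  # quotes the JSON so the shell passes it through intact


for epochs in EPOCHS:
    for lr in LEARNING_RATES:
        print(train_command(epochs, lr, f"sparse_model_e{epochs}_lr{lr:g}"))
```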

The following command produces a sparse model with roughly 92.5% accuracy on the validation set, with a total training time of about 47 minutes on a T4 GPU.

In [None]:
!sparseml.transformers.train.text_classification \
  --output_dir sparse_model \
  --model_name_or_path zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/6layer_pruned80_quant-none-vnni \
  --distill_teacher ./dense_model/ \
  --recipe zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni?recipe_type=transfer-text_classification \
  --dataset_name "emotion" \
  --recipe_args '{"num_epochs":9, "init_lr":0.000057}' \
  --do_train \
  --do_eval \
  --eval_steps 200 \
  --max_seq_length 128 \
  --evaluation_strategy steps \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --preprocessing_num_workers 8 \
  --fp16 \
  --seed 42 \
  --save_strategy steps \
  --save_steps 200 \
  --save_total_limit 3 \
  --overwrite_output_dir \
  --load_best_model_at_end
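To see how often evaluation and checkpointing actually happen, you can work out the step counts from the flags above. Below is a back-of-the-envelope sketch. It assumes the emotion training split has 16,000 examples, training runs on a single GPU, and the recipe's `num_epochs` sets the training length.

```python
import math

# Assumptions: 16,000 training examples, one GPU, recipe num_epochs controls training length.
TRAIN_EXAMPLES = 16_000
BATCH_SIZE = 32   # --per_device_train_batch_size
NUM_EPOCHS = 9    # "num_epochs" in --recipe_args
EVAL_STEPS = 200  # --eval_steps (and --save_steps)

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # the last partial batch still counts
total_steps = steps_per_epoch * NUM_EPOCHS
num_evals = total_steps // EVAL_STEPS

print(steps_per_epoch, total_steps, num_evals)  # 500 4500 22
```

With `--save_total_limit 3`, only the most recent of those checkpoints stay on disk, plus the best one when `--load_best_model_at_end` is set.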

Now that we have a sparse model, we can export its PyTorch weights to ONNX format with the following command:

In [None]:
!sparseml.transformers.export_onnx --model_path sparse_model --task 'text_classification' --sequence_length 128

Let's do the same for the dense model:

In [None]:
!sparseml.transformers.export_onnx --model_path dense_model --task 'text_classification' --sequence_length 128
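Both exports use `--sequence_length 128`, so inputs fed to the ONNX models are assumed to be exactly 128 tokens long. Below is a minimal sketch of padding or truncating token ids and building the matching attention mask. The `pad_id=0` default is an assumption based on BERT uncased's `[PAD]` token, and the ids in the usage line are hypothetical.

```python
def pad_or_truncate(token_ids, max_len=128, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_len long."""
    ids = token_ids[:max_len]
    mask = [1] * len(ids)  # 1 marks real tokens
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


input_ids, attention_mask = pad_or_truncate([101, 1045, 2293, 2017, 102])  # hypothetical ids
print(len(input_ids), sum(attention_mask))
```

This naive truncation can cut off the final `[SEP]` token. In practice, let the Hugging Face tokenizer handle it with `padding="max_length", truncation=True, max_length=128`.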

Let's now install [DeepSparse](https://github.com/neuralmagic/deepsparse) and benchmark the two models on Colab's single CPU to compare their speeds!
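A latency benchmark boils down to warming up, timing many single calls, and summarizing the samples. Below is a generic sketch of that loop. It is not how `deepsparse.benchmark` is implemented internally. The `run_once` callable is a stand-in for one inference call on a batch of size 1.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, iterations=100):
    """Median and mean wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialization settle first
        run_once()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), statistics.mean(samples)


# Stand-in workload; in practice run_once would run the model on one batch.
median_ms, mean_ms = measure_latency_ms(lambda: sum(range(10_000)))
print(f"median {median_ms:.3f} ms, mean {mean_ms:.3f} ms")
```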

In [None]:
!pip install deepsparse

In [None]:
!deepsparse.benchmark dense_model/model.onnx --batch_size 1

In [None]:
!deepsparse.benchmark sparse_model/model.onnx --batch_size 1

Pretty incredible: the dense model has a latency of 378 ms while the new sparse model has a latency of 46 ms, an 8X speedup!! 🤯🤯🤯
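Here is the arithmetic behind that claim, using the two latencies reported above. It assumes requests are processed one at a time at batch size 1, so throughput is simply the inverse of latency.

```python
dense_ms = 378.0  # dense BERT latency reported above (batch size 1)
sparse_ms = 46.0  # sparse BERT latency reported above (batch size 1)

speedup = dense_ms / sparse_ms
# One item per call, one call at a time: items/sec = 1000 / latency in ms.
dense_throughput = 1000.0 / dense_ms
sparse_throughput = 1000.0 / sparse_ms

print(f"speedup {speedup:.1f}x")  # speedup 8.2x
print(f"dense {dense_throughput:.1f} items/s, sparse {sparse_throughput:.1f} items/s")
```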

<br>

For more resources, you can always give [SparseML](https://github.com/neuralmagic/sparseml) and [DeepSparse](https://github.com/neuralmagic/deepsparse) a ⭐, and let us know what you think on our [Slack community channel](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)!