# Run this notebook on NVIDIA GPUs with CUDA, as it involves fine-tuning and transfer learning

In [7]:
# Note: CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).
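Before you launch the training cells, check that a GPU is actually visible. The sketch below assumes PyTorch is the backend, which is what SparseML's Hugging Face integration and the `pytorch` zoo stubs use. It returns `False` instead of raising when torch is not installed yet.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device."""
    # Avoid an ImportError on machines where torch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```

If this prints `False`, the `--fp16` flag used in the training commands will fail, because mixed-precision training in the Hugging Face Trainer requires a CUDA device.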

## Sparse Transfer

In [2]:
# !pip install sparseml
# !pip install sparseml[transformers]

### To create a teacher for the desired text classification dataset, we fine-tune a dense BERT model from the SparseZoo

## NLP

In [1]:
# Models are saved locally under models/ (see --output_dir in each command)
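The `!` prefix in the next cell is an IPython shell escape. You can also drive the CLI from plain Python by passing an argument list to `subprocess`, which avoids hand-quoting the `--recipe_args` JSON. This is a minimal sketch. It uses only the flags and zoo stubs that appear in the teacher command, and `teacher_command` and `run_teacher` are hypothetical helper names.

```python
import json
import shlex
import subprocess
from typing import List

# Zoo stub copied from the teacher command in this notebook.
BASE_STUB = (
    "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/"
    "wikipedia_bookcorpus/base-none"
)


def teacher_command(output_dir: str = "models/teacher", init_lr: float = 3e-5) -> List[str]:
    """Build the teacher fine-tuning invocation as an argument list (no shell quoting)."""
    return [
        "sparseml.transformers.text_classification",
        "--output_dir", output_dir,
        "--model_name_or_path", BASE_STUB,
        "--recipe", BASE_STUB + "?recipe_type=transfer-text_classification",
        # json.dumps encodes the same value as the hand-quoted '{"init_lr":0.00003}'.
        "--recipe_args", json.dumps({"init_lr": init_lr}),
        "--task_name", "sst2",
        "--max_seq_length", "128",
        "--per_device_train_batch_size", "32",
        "--per_device_eval_batch_size", "32",
        "--do_train", "--do_eval",
        "--evaluation_strategy", "epoch",
        "--fp16",
        "--save_strategy", "epoch",
        "--save_total_limit", "1",
    ]


def run_teacher(dry_run: bool = True) -> str:
    """Return the equivalent shell command; launch training only when dry_run is False."""
    cmd = teacher_command()
    if not dry_run:
        # Requires sparseml[transformers] and a CUDA GPU (because of --fp16).
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    print(run_teacher(dry_run=True))
```

Keeping the arguments in a list also makes it easy to change one value, such as `init_lr`, without editing a quoted shell string.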

In [3]:
!sparseml.transformers.text_classification \
    --output_dir models/teacher \
    --model_name_or_path "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none" \
    --recipe "zoo:nlp/masked_language_modeling/bert-base/pytorch/huggingface/wikipedia_bookcorpus/base-none?recipe_type=transfer-text_classification" \
    --recipe_args '{"init_lr":0.00003}' \
    --task_name sst2 \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 --per_device_eval_batch_size 32 \
    --do_train --do_eval --evaluation_strategy epoch --fp16  \
    --save_strategy epoch --save_total_limit 1

Downloading (…)ng/pytorch_model.bin: 100%|███| 418M/418M [00:06<00:00, 69.1MB/s]
Downloading (…)g/train_results.json: 100%|██████| 462/462 [00:00<00:00, 206kB/s]
Downloading (…)training/config.json: 100%|██████| 605/605 [00:00<00:00, 285kB/s]
Downloading (…)okenizer_config.json: 100%|██████| 285/285 [00:00<00:00, 130kB/s]
Downloading (…)ining/tokenizer.json: 100%|███| 455k/455k [00:00<00:00, 12.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████| 112/112 [00:00<00:00, 49.2kB/s]
Downloading (…)g/trainer_state.json: 100%|█| 91.9k/91.9k [00:00<00:00, 6.11MB/s]
Downloading (…)ng/eval_results.json: 100%|██████| 355/355 [00:00<00:00, 145kB/s]
Downloading (…)e/training/vocab.txt: 100%|███| 226k/226k [00:00<00:00, 6.60MB/s]
Downloading (…)ng/training_args.bin: 100%|██| 2.30k/2.30k [00:00<00:00, 760kB/s]
Downloading (…)ing/all_results.json: 100%|██████| 796/796 [00:00<00:00, 334kB/s]
Downloading readme: 100%|██████████████████| 31.9k/31.9k [00:00<00:00, 34.5MB/s]
Downloading data files:   0%
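After the teacher run completes, you can inspect its evaluation metrics before distilling. The sketch below assumes the script follows the Hugging Face Trainer convention of writing `eval_results.json` into `--output_dir`; the zoo download above includes a file with that name, which suggests it does. The key name `eval_accuracy` is also an assumption.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def read_eval_metrics(output_dir: str) -> Optional[Dict[str, float]]:
    """Load eval_results.json from a training output directory, or None if it is absent."""
    path = Path(output_dir) / "eval_results.json"
    if not path.is_file():
        return None
    with path.open() as f:
        return json.load(f)


if __name__ == "__main__":
    metrics = read_eval_metrics("models/teacher")
    if metrics is None:
        print("No eval_results.json yet; has the teacher finished training?")
    else:
        # "eval_accuracy" is the assumed key for the SST-2 accuracy metric.
        print("Teacher eval accuracy:", metrics.get("eval_accuracy"))
```

The same helper works for `models/sparsified`, so you can compare the teacher and the sparsified student side by side.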

### With the teacher trained, distill it into a sparsified student model
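In the next cell, the `--distill_teacher` flag points the student run at the trained teacher. The recipe, which this notebook does not show, sets the exact distillation loss and its weighting. As a generic illustration only, here is the classic soft-target (Hinton-style) knowledge-distillation loss. It compares temperature-softened teacher and student output distributions with a KL divergence. The function names and the default temperature are illustrative assumptions, not SparseML's API.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Soft-target KD loss: T^2 * KL(teacher || student) on softened distributions."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Two-class (SST-2 style) logits for one example; values are hypothetical.
    print(distillation_loss([0.5, 0.2], [2.0, -1.0]))
```

In practice, a term like this is usually combined with the ordinary cross-entropy loss on the true labels, so the student learns from both the teacher's soft predictions and the ground truth.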

In [6]:
!sparseml.transformers.text_classification \
    --output_dir models/sparsified \
    --model_name_or_path "zoo:nlp/masked_language_modeling/distilbert-none/pytorch/huggingface/wikipedia_bookcorpus/pruned80_quant-none-vnni" \
    --recipe "zoo:nlp/sentiment_analysis/distilbert-none/pytorch/huggingface/sst2/pruned80_quant-none-vnni" \
    --distill_teacher models/teacher \
    --task_name sst2 \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 --per_device_eval_batch_size 32 \
    --do_train --do_eval --evaluation_strategy epoch --fp16 \
    --save_strategy epoch --save_total_limit 1

Downloading (…)ng/training_args.bin: 100%|█| 3.23k/3.23k [00:00<00:00, 1.57MB/s]
Downloading (…)okenizer_config.json: 100%|██████| 421/421 [00:00<00:00, 190kB/s]
Downloading (…)d/training/vocab.txt: 100%|███| 226k/226k [00:00<00:00, 4.52MB/s]
Downloading (…)ining/tokenizer.json: 100%|███| 695k/695k [00:00<00:00, 10.4MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████| 112/112 [00:00<00:00, 51.9kB/s]
Downloading (…)ng/pytorch_model.bin: 100%|███| 256M/256M [00:03<00:00, 70.0MB/s]
Downloading (…)ng/eval_results.json: 100%|██████| 269/269 [00:00<00:00, 127kB/s]
Downloading (…)training/recipe.yaml: 100%|██| 1.55k/1.55k [00:00<00:00, 596kB/s]
Downloading (…)g/train_results.json: 100%|█████| 199/199 [00:00<00:00, 92.7kB/s]
Downloading (…)training/config.json: 100%|██████| 615/615 [00:00<00:00, 296kB/s]
Downloading (…)lidation-metric.yaml: 100%|█████| 177/177 [00:00<00:00, 80.7kB/s]
Downloading (…)ing/all_results.json: 100%|██████| 448/448 [00:00<00:00, 193kB/s]
Downloading (…)g/trainer_sta
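The stub names ending in `pruned80_quant-none-vnni` suggest a model with roughly 80% of its weights pruned and then quantized, where VNNI refers to Intel's Vector Neural Network Instructions for INT8 arithmetic. The recipe controls how SparseML actually prunes, and it may differ from what follows. The sketch below only illustrates the general idea of unstructured magnitude pruning to a target sparsity, on a plain list of hypothetical weights.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of weights (unstructured)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.02]
    p = magnitude_prune(w, 0.8)
    print(p, sparsity_of(p))  # only the two largest-magnitude weights survive
```

Real pruning recipes typically raise sparsity gradually over training while fine-tuning continues, so the remaining weights can compensate. Distilling from the teacher at the same time helps recover accuracy.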

#### Exporting to ONNX for inference

In [1]:
!sparseml.transformers.export_onnx \
    --model_path models/sparsified \
    --task 'text-classification' --finetuning_task sst2 \
    --sequence_length 128

2024-01-18 12:41:33 sparseml.transformers.export INFO     Attempting onnx export for model at /home/jupyter/neural-magic/models/sparsified for task text-classification
Some weights of the model checkpoint at /home/jupyter/neural-magic/models/sparsified were not used when initializing DistilBertForSequenceClassification: ['distilbert.transformer.layer.2.attention.attention_scores_matmul.input_quant_stubs.1.activation_post_process.activation_post_process.eps', 'distilbert.transformer.layer.1.attention.attention_scores_matmul.output_quant_stubs.0.activation_post_process.activation_post_process.eps', 'distilbert.transformer.layer.4.attention.context_layer_matmul.input_quant_stubs.0.activation_post_process.fake_quant_enabled', 'distilbert.transformer.layer.0.attention.attention_scores_matmul.input_quant_stubs.0.activation_post_process.activation_post_process.eps', 'distilbert.transformer.layer.3.ffn.lin2.module.weight_fake_quant.activation_post_process.min_val', 'distilbert.transformer.laye
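The export above passes `--sequence_length 128`, matching the `--max_seq_length 128` used in training. This sketch assumes the exported graph expects inputs of exactly that length. In that case, tokenized inputs must be padded or truncated to 128 tokens at inference time, with an attention mask that marks the real tokens. The example is tokenizer-free and uses hypothetical token ids. It also assumes a pad id of 0, which BERT-family vocabularies commonly use for `[PAD]`; check the actual tokenizer. `pad_or_truncate` is a hypothetical helper.

```python
from typing import List, Tuple

SEQ_LEN = 128  # matches --sequence_length used for the export above


def pad_or_truncate(
    token_ids: List[int], seq_len: int = SEQ_LEN, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly seq_len long."""
    # Naive truncation: a real tokenizer would keep the final [SEP] token.
    ids = token_ids[:seq_len]
    mask = [1] * len(ids)  # 1 marks a real token
    n_pad = seq_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


if __name__ == "__main__":
    ids, mask = pad_or_truncate([101, 2023, 102])  # hypothetical token ids
    print(len(ids), sum(mask))
```

In a real pipeline, the tokenizer saved alongside the model would handle this step with its own padding and truncation options. The sketch only shows the shape contract a fixed-length export implies.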