# Distilling knowledge in Transformer models and generating test predictions for GLUE tasks using *torchdistill*

## 1. Make sure you have access to a GPU/TPU
Google Colab: Runtime -> Change runtime type -> Hardware accelerator: "GPU" or "TPU"

In [None]:
!nvidia-smi

Thu Jun  3 03:02:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
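
The same check can be done from Python before launching long jobs. The following is a minimal sketch that only uses the standard library; the helper name `gpu_visible` is hypothetical, and it assumes that `nvidia-smi -L` (which lists the detected GPUs) is an adequate signal that a GPU runtime is attached. It does not detect TPUs.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools available in this runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, the runtime type has most likely not been switched to "GPU" yet.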

## 2. Clone the torchdistill repository to use its example code and configuration files

In [None]:
!git clone https://github.com/yoshitomo-matsubara/torchdistill

Cloning into 'torchdistill'...
remote: Enumerating objects: 5231, done.
remote: Counting objects: 100% (1013/1013), done.
remote: Compressing objects: 100% (374/374), done.
remote: Total 5231 (delta 575), reused 982 (delta 561), pack-reused 4218
Receiving objects: 100% (5231/5231), 1.24 MiB | 20.17 MiB/s, done.
Resolving deltas: 100% (3189/3189), done.


## 3. Install dependencies and *torchdistill*

In [None]:
!pip install -r torchdistill/examples/hf_transformers/requirements.txt
!pip install torchdistill

Collecting accelerate
  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)
     |████████████████████████████████| 51kB 7.5MB/s 
Collecting datasets>=1.1.3
  Downloading https://files.pythonhosted.org/packages/94/f8/ff7cd6e3b400b33dcbbfd31c6c1481678a2b2f669f521ad20053009a9aa3/datasets-1.7.0-py3-none-any.whl (234kB)
     |████████████████████████████████| 235kB 34.9MB/s 
Collecting sentencepiece!=0.1.92
  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_6
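
After installation, it can be useful to confirm which versions ended up in the environment. The sketch below is one way to do that, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the package list is taken from the install output in this notebook.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as they appear in the pip commands and output of this notebook.
for name, version in installed_versions(
    ["torchdistill", "accelerate", "datasets", "sentencepiece"]
).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```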

## (Optional) Configure Accelerate for mixed-precision training (about 2x speedup)

If you are **not** using Google Colab Pro, fine-tuning a base-sized model on the following 9 tasks with a Tesla K80 will exceed 12 hours, the maximum session lifetime for free Google Colab users.
With mixed-precision training, you can complete all 9 fine-tuning jobs within that limit.
[This table](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#mixed-precision-training) gives you a good idea of how long it takes to fine-tune BERT-Base on a Titan RTX with and without mixed precision.

In [None]:
!accelerate config

In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0
Which type of machine are you using? ([0] No distributed training, [1] multi-GPU, [2] TPU): 0
How many processes in total will you use? [1]: 1
Do you wish to use FP16 (mixed precision)? [yes/NO]: yes
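
As background on what FP16 changes numerically, the sketch below illustrates one reason mixed-precision training setups typically scale the loss: very small values underflow to zero in half precision. It is only an illustration using the standard library; the gradient value and the scale factor are hypothetical, and it does not reproduce what Accelerate does internally.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


tiny_grad = 1e-8  # hypothetical gradient value, below the FP16 subnormal range
scale = 1024.0    # hypothetical loss-scaling factor

# Stored directly in FP16, the value underflows to 0.0 and the update is lost.
print("unscaled in FP16:", to_fp16(tiny_grad))

# Scaling first keeps the value representable; dividing afterwards (in higher
# precision) recovers an approximation of the original gradient.
recovered = to_fp16(tiny_grad * scale) / scale
print("recovered after scaling:", recovered)
```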


## 4. Distill knowledge in Transformer models for GLUE tasks
The following examples demonstrate how to distill knowledge from a fine-tuned BERT-Large (uncased) into a pretrained BERT-Base (uncased) on each of the GLUE datasets.  
**Note**: Test splits for GLUE tasks in the `datasets` package are not labeled, so this example uses only the training and validation splits, following [Hugging Face's example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification).
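
For readers new to knowledge distillation, here is a minimal sketch of the commonly used soft-target loss for a single classification example, written in plain Python. It is an illustration under assumptions, not the loss used by these runs: the actual criterion and its hyperparameters are defined in each YAML config, and the `temperature` and `alpha` values below are hypothetical. Regression tasks such as STS-B need a different formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL divergence between temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for a two-class task; the true class is index 1.
print(kd_loss(student_logits=[0.2, 0.9], teacher_logits=[-1.0, 2.5], label=1))
```

The `temperature ** 2` factor is the usual correction that keeps the gradient magnitude of the soft-target term comparable across temperatures.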

### 4.1 CoLA task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task cola \
  --log log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 03:03:01.360176: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 03:03:03	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/cola/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)
2021/06/02 03:03:03	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 03:03:03	INFO	filelock	Lock 139654721540560 acquired on /root/.cache/huggingface/transformers/eedef380ce365bc39df6f2de366cb59636f721cdf5e2dbb42868d244e14ae7ad.51ed9871d1da956df05ec6d5bf494e731858cf63eb41e7e9f916bd5b46be2278.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-cola

### 4.2 SST-2 task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task sst2 \
  --log log/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 03:07:49.381458: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 03:07:51	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/sst2/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='sst2', test_only=False, world_size=1)
2021/06/02 03:07:51	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 03:07:51	INFO	filelock	Lock 140107769119120 acquired on /root/.cache/huggingface/transformers/8e456da2df0487e2723d5c9a14dbbc8b15a3bac0e29fe9083effe4d206ea0115.d9ebfbafdb59660a02fc9ce1616d7406b757d783672f5959e587d5ccb82a7850.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-sst2

### 4.3 MRPC task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task mrpc \
  --log log/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 03:30:00.642949: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 03:30:02	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mrpc/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)
2021/06/02 03:30:02	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 03:30:03	INFO	filelock	Lock 139909012588176 acquired on /root/.cache/huggingface/transformers/b9bd611980e6b9a94ed8faf7032113a29ff5a748986795182e7e5e274b5f7399.6ed35abb2634e07a995f226e139bc86550bf086755ba56d4fbc054c847ab745f.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mrpc

### 4.4 STS-B task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task stsb \
  --log log/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 03:34:57.164794: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 03:34:58	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/stsb/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)
2021/06/02 03:34:59	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 03:34:59	INFO	filelock	Lock 139701413451152 acquired on /root/.cache/huggingface/transformers/90815121d17e57ac2f3e84d161527081f158431040397b0d706c024deb16abe8.73e95a2822231762b04780041d908a2c57b1a49edab562d859e258c67b2c3c53.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-stsb

### 4.5 QQP task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task qqp \
  --log log/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 03:38:54.293623: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 03:38:56	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/qqp/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)
2021/06/02 03:38:56	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 03:38:56	INFO	filelock	Lock 140443960170000 acquired on /root/.cache/huggingface/transformers/c6730a889404372cd78a39f068b75ca306635a3917e558492a19a0d45744a44d.f536da3d1200cfceae13ba24b76fd5963abfa15c8bb1c0f10dc5607f87d61f41.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qqp/res

### 4.6 MNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task mnli \
  --log log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-02 15:41:39.173367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/02 15:41:41	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021/06/02 15:41:41	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/02 15:41:41	INFO	filelock	Lock 140424696020240 acquired on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-mnli

### 4.7 QNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task qnli \
  --log log/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-03 03:06:20.905872: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/03 03:06:23	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/qnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
2021/06/03 03:06:23	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/03 03:06:23	INFO	filelock	Lock 140398783211984 acquired on /root/.cache/huggingface/transformers/20af61132d3eccaf2a0b4fd9a18767272ba96e31e9a0b1d0035ef88dcdf1825b.92355a00abb9ce3df48e653f7303627944e38f408924eaed14015d4b5ab463c2.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-qnli

### 4.8 RTE task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task rte \
  --log log/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-03 04:17:06.686595: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/03 04:17:08	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/rte/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)
2021/06/03 04:17:08	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/03 04:17:08	INFO	filelock	Lock 139704299298960 acquired on /root/.cache/huggingface/transformers/16ae5f5e66414330b3b1301297beabe73eab3d3ca4743f255d1d899cef7b3e57.0d01319f22c08f39829b3b7d9a37c919dbc12ba1748f82f78b2c7ab8a4b40add.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-rte/res

### 4.9 WNLI task

In [None]:
!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \
  --config torchdistill/configs/sample/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.yaml \
  --task wnli \
  --log log/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.txt \
  --private_output leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/

2021-06-03 04:20:10.071306: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021/06/03 04:20:12	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/wnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='wnli', test_only=False, world_size=1)
2021/06/03 04:20:12	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021/06/03 04:20:12	INFO	filelock	Lock 140066006998800 acquired on /root/.cache/huggingface/transformers/04eef7afba7d6430e4bcc3fab62c73ea6bd06850ab82561e857383ebcfb81159.16b0b1ea3b7f500238dcbd1293b26497ebd58e316af660c029812a5db5c8c7c6.lock
https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli
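
The nine commands in sections 4.1 to 4.9 differ only in the task name, so they can be generated from one template. The sketch below rebuilds exactly the paths and flags used in this notebook; the helper name `build_command` is hypothetical, and the script only prints the commands rather than running them.

```python
GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte", "wnli"]
RUN_NAME = "bert_base_uncased_from_bert_large_uncased"


def build_command(task):
    """Return the accelerate launch command for one GLUE task as an argument list."""
    return [
        "accelerate", "launch",
        "torchdistill/examples/hf_transformers/text_classification.py",
        "--config", f"torchdistill/configs/sample/glue/{task}/kd/{RUN_NAME}.yaml",
        "--task", task,
        "--log", f"log/glue/{task}/kd/{RUN_NAME}.txt",
        "--private_output", f"leaderboard/glue/kd/{RUN_NAME}/",
    ]


for task in GLUE_TASKS:
    print(" ".join(build_command(task)))
```

To actually run a task from Python, each list could be passed to `subprocess.run(cmd, check=True)`; on a free Colab session it is probably safer to launch the tasks one at a time, as in the cells above.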

## 5. Validate your prediction files for GLUE leaderboard
To make sure your prediction files contain the right number of samples (lines), run `wc -l <your prediction dir path>/*`; you should see the following output.

```
   1105 AX.tsv
   1064 CoLA.tsv
   9848 MNLI-mm.tsv
   9797 MNLI-m.tsv
   1726 MRPC.tsv
   5464 QNLI.tsv
 390966 QQP.tsv
   3001 RTE.tsv
   1822 SST-2.tsv
   1380 STS-B.tsv
    147 WNLI.tsv
 426320 total
```

In [None]:
!wc -l leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/*

   1105 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/AX.tsv
   1064 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/CoLA.tsv
   9848 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-mm.tsv
   9797 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-m.tsv
   1726 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MRPC.tsv
   5464 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QNLI.tsv
 390966 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QQP.tsv
   3001 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/RTE.tsv
   1822 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/SST-2.tsv
   1380 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/STS-B.tsv
    147 leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/WNLI.tsv
 426320 total
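
The comparison against the expected counts can also be automated. The following sketch uses the line counts listed above and counts newline characters, as `wc -l` does; the helper names `count_newlines` and `find_mismatches` are hypothetical.

```python
from pathlib import Path

# Expected line counts, copied from the listing in this section.
EXPECTED_LINES = {
    "AX.tsv": 1105, "CoLA.tsv": 1064, "MNLI-mm.tsv": 9848, "MNLI-m.tsv": 9797,
    "MRPC.tsv": 1726, "QNLI.tsv": 5464, "QQP.tsv": 390966, "RTE.tsv": 3001,
    "SST-2.tsv": 1822, "STS-B.tsv": 1380, "WNLI.tsv": 147,
}


def count_newlines(path):
    """Count newline characters in a file, reading it in 1 MiB chunks."""
    with open(path, "rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))


def find_mismatches(pred_dir, expected=EXPECTED_LINES):
    """Return {file name: (expected, actual)}; actual is None if the file is missing."""
    mismatches = {}
    for name, want in expected.items():
        path = Path(pred_dir) / name
        got = count_newlines(path) if path.is_file() else None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


problems = find_mismatches("leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased")
print("all files OK" if not problems else problems)
```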


## 6. Zip the submission files and download to make a submission

In [None]:
!zip bert_base_uncased_from_bert_large_uncased-submission.zip leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/*

  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/AX.tsv (deflated 82%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/CoLA.tsv (deflated 64%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-mm.tsv (deflated 83%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MNLI-m.tsv (deflated 83%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/MRPC.tsv (deflated 64%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QNLI.tsv (deflated 84%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/QQP.tsv (deflated 73%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/RTE.tsv (deflated 84%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/SST-2.tsv (deflated 64%)
  adding: leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/STS-B.tsv (deflated 56%)
  adding: leaderboard/glue/kd/bert_ba

Download the zip file from the "Files" menu.  
To submit the file to the GLUE system, refer to the [GLUE benchmark website](https://gluebenchmark.com/).
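
If you prefer to build the archive from Python rather than with the `zip` command, the standard `zipfile` module is enough. This is a minimal sketch; the helper name `zip_predictions` and its `base_dir` parameter are hypothetical. With the default `base_dir`, entries keep their relative directory path, which mirrors what the `zip` command above stores. Check the GLUE instructions for the layout the submission system expects.

```python
import zipfile
from pathlib import Path


def zip_predictions(pred_dir, zip_path, base_dir="."):
    """Zip every file in pred_dir; entry names are paths relative to base_dir."""
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(pred_dir).iterdir()):
            if not path.is_file():
                continue  # skip subdirectories
            arcname = path.relative_to(base_dir).as_posix()
            zf.write(path, arcname)
            added.append(arcname)
    return added
```

For this notebook, the call would be `zip_predictions("leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased", "bert_base_uncased_from_bert_large_uncased-submission.zip")`.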

## 7. More sample configurations, models, datasets...
You can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/sample/) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.  
If you would like to use larger datasets (e.g., **ImageNet** and **COCO**) and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/official) used in some published papers.  
Experiments with such large datasets and models will require your own machine because of the limited disk space and session time (12 hours for the free version and 24 hours for Colab Pro) on Google Colab.


# Colab examples for training student models without teacher models
You can find Colab examples for training student models without teacher models in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.