Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko,
Seungjoon Park,
Minchan Jeong,
Sukjin Hong,
Euijai Ahn,
Du-Seong Chang,
Se-Young Yun
```shell
pip install -r requirements.txt
python download_glue_data.py
```
For the BERT experiments, you need to prepare both a teacher model and a student model. Download them from the following links:
- BERT-base : https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip
- BERT-small : https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-6_H-768_A-12.zip
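The links above point to zip archives of the checkpoints. A dry-run sketch of fetching and unpacking them (the commands are only printed here, not executed; drop the `echo` to actually download):

```shell
# Dry-run sketch: print the fetch/unzip commands for both checkpoints.
for url in \
  "https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-12_H-768_A-12.zip" \
  "https://storage.googleapis.com/bert_models/2020_02_20/uncased_L-6_H-768_A-12.zip"
do
  echo wget "$url"
  echo unzip "$(basename "$url")"
done
```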
Then, first fine-tune the teacher model, and then conduct ILD. Only `pytorch_model.bin` and `config.json` are needed.
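Since only those two files are needed, a minimal pre-flight check can catch a broken setup early. This helper is illustrative, not part of the repo; the function name and paths are assumptions:

```shell
# Illustrative pre-flight check (not part of the repo): verify that a
# model directory contains the two files ILD actually needs.
check_model_dir() {
  dir="$1"
  for f in pytorch_model.bin config.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      return 1
    fi
  done
  echo "ok: $dir"
}
```

Run it as `check_model_dir path/to/teacher` after unzipping and fine-tuning.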
```shell
bash run_ft_standard.sh ${task_name}
```
```shell
bash scripts/standard_glue_truncated_bert.sh 0 ${task_name}
bash scripts/standard_glue_truncated_bert.sh 1 ${task_name}
bash scripts/standard_glue_truncated_bert.sh 2 ${task_name}
bash scripts/standard_glue_truncated_bert.sh 3 ${task_name}
```
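The four commands above differ only in their first argument; assuming that argument simply indexes a setting inside the script (as the commands suggest), they can be wrapped in a loop. A dry-run sketch (it prints the commands rather than executing them; the task name is an example, remove the `echo` to actually run):

```shell
# Dry-run sketch: print the four ILD commands for one task.
task_name=RTE  # example GLUE task; substitute your own
for idx in 0 1 2 3; do
  echo bash scripts/standard_glue_truncated_bert.sh "$idx" "$task_name"
done
```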