
Consistency-regularized Intermediate Layer Distillation (EACL2023 Findings)

arXiv | EACL | Slide | Code

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, Sukjin Hong, Euijai Ahn, Du-Seong Chang, Se-Young Yun

Requirements

Python modules

pip install -r requirements.txt

Example to Run

Prepare the GLUE datasets

python download_glue_data.py
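If download_glue_data.py is the standard GLUE download helper, it also accepts flags for the output directory and the task subset; the flag names below are an assumption, so check python download_glue_data.py --help first.

python download_glue_data.py --data_dir glue_data --tasks all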

Prepare the pre-trained Language Models

For BERT experiments, you need to prepare both a teacher model and a student model; download them from these links.

Then, first fine-tune the teacher model and then conduct ILD. Only pytorch_model.bin and config.json are needed for each checkpoint; a sketch of a possible layout follows.
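Assuming the training scripts take a model directory containing those two files, a layout like the following should be enough (the directory names here are hypothetical; point the scripts at whatever paths you actually use):

models/teacher_bert_base/
    pytorch_model.bin
    config.json
models/student_truncated_bert/
    pytorch_model.bin
    config.json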

Examples for script files

Fine-tuning

bash run_ft_standard.sh ${task_name}

CR-ILD

bash scripts/standard_glue_truncated_bert.sh 0 ${task_name}

TinyBERT-like KD

bash scripts/standard_glue_truncated_bert.sh 1 ${task_name}

BERT-EMD

bash scripts/standard_glue_truncated_bert.sh 2 ${task_name}

Patient KD

bash scripts/standard_glue_truncated_bert.sh 3 ${task_name}
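For example, assuming the task argument takes the usual GLUE task names (the exact casing expected by the scripts is an assumption), a full run on RTE with CR-ILD would be:

# fine-tune the teacher on RTE, then distill the student with CR-ILD (method id 0, per the mapping above)
bash run_ft_standard.sh RTE
bash scripts/standard_glue_truncated_bert.sh 0 RTE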
