[COLING2025] Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment
After setting up the environment, configure distributed training with the accelerate library. The following command walks you through specifying the number of GPUs and the DeepSpeed settings:
accelerate config
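The prompts write a configuration file (by default under ~/.cache/huggingface/accelerate/). As a rough illustration only, a single-node, multi-GPU DeepSpeed setup could be captured in a file like the sketch below; the process count, ZeRO stage, and precision are placeholder values, not settings prescribed by this repository.

# Illustrative accelerate config for 1 node / 4 GPUs with DeepSpeed ZeRO-2 (example values only).
cat > accelerate_ds_config.yaml <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
num_machines: 1
num_processes: 4
machine_rank: 0
mixed_precision: bf16
main_training_function: main
use_cpu: false
EOF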
-
Distillation for Pre-Training Task
bash run_pretrain.sh
You can view and change the adjustable parameter settings in pretrain/kd_pretrain.py.
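run_pretrain.sh presumably wraps an accelerate launch of that script; as a minimal hand-rolled sketch (reusing the illustrative config file from above), the equivalent direct launch would be:

# Sketch: launch the pre-training distillation script directly, assuming kd_pretrain.py is the entry point.
accelerate launch --config_file accelerate_ds_config.yaml pretrain/kd_pretrain.py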
-
Distillation for Downstream Tasks
bash run_sft.sh
You can view and change the adjustable parameter settings in sft/kd_sft.py.
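If the script parses its options with argparse or HfArgumentParser, you can list the adjustable parameters from the command line before editing them, and restrict the GPUs used in the usual way; both lines below are generic examples rather than commands defined by this repository.

# Print the script's adjustable parameters (works if it uses argparse/HfArgumentParser).
python sft/kd_sft.py --help
# Run downstream-task distillation on GPUs 0 and 1 only.
CUDA_VISIBLE_DEVICES=0,1 bash run_sft.sh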
If you find this work useful, please cite it as follows. Thank you!
@misc{peng2024enhancingknowledgedistillationlarge,
title={Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment},
author={Tianyu Peng and Jiajun Zhang},
year={2024},
eprint={2409.12545},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.12545},
}