# Knowledge Distillation

PyTorch implementations of algorithms for knowledge distillation.
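The common idea behind the papers listed below is training a small student model to match the outputs of a larger teacher. As a point of reference only, here is a generic sketch of the classic soft-target objective (temperature-scaled KL divergence blended with hard-label cross-entropy). It is not taken from this repository's code; the function name and default hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, labels,
                        temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss (illustrative, not this repo's API)."""
    # KL divergence between temperature-softened teacher and student distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Ordinary cross-entropy against the hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```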

## Setup

### build

```sh
$ docker build -t kd -f Dockerfile .
```

### run

```sh
$ docker run -v local_data_path:/data -v project_path:/app -p 0.0.0.0:8084:8084 -it kd
```

## Experiments

  1. Task-specific distillation from BERT into a BiLSTM student. Data: SST-2 binary sentiment classification. A minimal sketch of the setup follows.
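This experiment follows the recipe of paper 3 below (Tang et al., 2019): a single-layer BiLSTM student is trained to match the logits of a fine-tuned BERT teacher on SST-2, using an MSE penalty on the logits combined with cross-entropy on the gold labels. The snippet is only a sketch under those assumptions; the class and function names are illustrative and not this repository's actual API.

```python
import torch
import torch.nn as nn

class BiLSTMStudent(nn.Module):
    """Minimal BiLSTM sentence classifier used as the student (illustrative)."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)            # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)               # h_n: (2, batch, hidden_dim)
        sentence = torch.cat([h_n[0], h_n[1]], dim=-1)  # concat final fwd/bwd states
        return self.classifier(sentence)                # (batch, num_classes)

def distillation_objective(student_logits, teacher_logits, labels, alpha=0.5):
    """Tang et al. (2019)-style objective: MSE against the frozen BERT teacher's
    logits, blended with cross-entropy on the SST-2 hard labels."""
    mse = nn.functional.mse_loss(student_logits, teacher_logits)
    ce = nn.functional.cross_entropy(student_logits, labels)
    return alpha * mse + (1.0 - alpha) * ce
```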

## Papers

  1. Cristian Bucila, Rich Caruana, Alexandru Niculescu-Mizil "Model Compression" (2006).

  2. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" (2019) https://arxiv.org/abs/1910.01108.

  3. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks" (2019) https://arxiv.org/abs/1903.12136.

  4. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (2019) https://arxiv.org/abs/1909.11942.

  5. Rafael Müller, Simon Kornblith, Geoffrey Hinton "Subclass Distillation" (2020) https://arxiv.org/abs/2002.03936.

  6. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova "Well-Read Students Learn Better: On the Importance of Pre-training Compact Models" (2020) https://arxiv.org/abs/1908.08962.
