A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility


  • Exploring knowledge distillation of DNNs for efficient hardware solutions
  • Author: Haitong Li
  • Framework: PyTorch
  • Dataset: CIFAR-10


  • A framework for exploring "shallow" and "deep" knowledge distillation (KD) experiments
  • Hyperparameters defined universally in "params.json" files (avoiding long argparse commands)
  • Hyperparameter searching and result synthesizing (as a table)
  • Progress bar, TensorBoard support, and checkpoint saving/loading
  • Pretrained teacher models available for download


  • Clone the repo

    git clone
  • Install the dependencies (including PyTorch)

    pip install -r requirements.txt


  • ./: main entry point for training/evaluation with or without KD on CIFAR-10
  • ./experiments/: JSON files for each experiment; directories for hypersearch
  • ./model/: teacher and student DNNs, knowledge distillation (KD) loss definition, dataloader
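The KD loss that lives under ./model/ combines a soft-target term (KL divergence between temperature-softened teacher and student distributions) with a hard-target cross-entropy term, following Hinton et al. Below is an illustrative PyTorch sketch of that loss, not the repo's exact implementation; the function name and the `alpha`/`T` defaults are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.9, T=4.0):
    """Hypothetical KD loss sketch: alpha weights soft vs. hard targets,
    T is the softmax temperature (values here are assumed defaults)."""
    # Soft-target term: KLDivLoss expects log-probabilities for the student
    # and probabilities for the teacher. The T*T factor keeps gradient
    # magnitudes comparable across temperatures.
    soft = nn.KLDivLoss(reduction='batchmean')(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1)) * (alpha * T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels) * (1.0 - alpha)
    return soft + hard
```

Setting alpha=0 recovers plain supervised training; alpha=1 trains purely on the teacher's soft targets.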

Key notes about usage for your experiments:

  • Download the zip file for pretrained teacher model checkpoints from this Box folder
  • Simply move the unzipped subfolders into 'knowledge-distillation-pytorch/experiments/' (replacing the existing ones if necessary; follow the default path naming)
  • Run the training entry point to train the 5-layer CNN with ResNet-18's dark knowledge, or to train ResNet-18 with knowledge distilled from state-of-the-art deeper models
  • Use the hyperparameter-search script for hypersearch
  • Hyperparameters are defined universally in params.json files; refer to the training script's header for details

Train (dataset: CIFAR-10)

Note: all the hyperparameters can be found and modified in 'params.json' under 'model_dir'
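As a hedged illustration of the params.json mechanism, the snippet below writes and reads such a file; the field names are assumptions for illustration, not the repo's exact schema (check the actual file under your model_dir):

```python
import json

# Illustrative params.json contents -- keys are hypothetical examples of
# the hyperparameters an experiment directory might define.
example = {
    "alpha": 0.9,           # weight of the soft (distillation) loss term
    "temperature": 4,       # softmax temperature for soft targets
    "learning_rate": 1e-3,
    "batch_size": 128,
    "num_epochs": 30,
}

with open("params.json", "w") as f:
    json.dump(example, f, indent=4)

# Training code would load the file back the same way:
with open("params.json") as f:
    params = json.load(f)
```

Editing the JSON file, rather than passing flags, keeps every experiment's configuration versioned next to its checkpoints.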

-- Train a 5-layer CNN with knowledge distilled from a pre-trained ResNet-18 model

python --model_dir experiments/cnn_distill

-- Train a ResNet-18 model with knowledge distilled from a pre-trained ResNext-29 teacher

python --model_dir experiments/resnet18_distill/resnext_teacher

-- Hyperparameter search for a specified experiment ('parent_dir/params.json')

python --parent_dir experiments/cnn_distill_alpha_temp

-- Synthesize results of the recent hypersearch experiments

python --parent_dir experiments/cnn_distill_alpha_temp
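Conceptually, a hypersearch like the one above expands a grid over KD hyperparameters into one params.json per job under the parent directory, then trains each job and tabulates its metrics. A sketch of that expansion step (the grid values, directory naming, and field names are assumptions, not the repo's exact layout):

```python
import itertools
import json
import os

# Hypothetical grid over the two KD hyperparameters the experiments vary.
alphas = [0.5, 0.9, 0.95]
temperatures = [1, 4, 20]

base = {"learning_rate": 1e-3, "batch_size": 128, "num_epochs": 30}
parent_dir = "experiments/cnn_distill_alpha_temp"

for alpha, T in itertools.product(alphas, temperatures):
    job_dir = os.path.join(parent_dir, f"alpha_{alpha}_temp_{T}")
    os.makedirs(job_dir, exist_ok=True)
    params = dict(base, alpha=alpha, temperature=T)
    with open(os.path.join(job_dir, "params.json"), "w") as f:
        json.dump(params, f, indent=4)
    # A real search would now launch training with --model_dir set to
    # job_dir, then read each job's saved metrics to build a results table.
```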

Results: "Shallow" and "Deep" Distillation

Quick takeaways (more details to be added):

  • Knowledge distillation provides regularization for both shallow DNNs and state-of-the-art DNNs
  • Training with an unlabeled or partial dataset can benefit from the dark knowledge of teacher models
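"Dark knowledge" is the information carried by the teacher's softened output distribution: at high temperature, the relative probabilities of the wrong classes reveal which classes the teacher considers similar. A small self-contained demonstration (the logits are made up for illustration, not taken from a real teacher):

```python
import math

def softmax_T(logits, T):
    # Temperature-scaled softmax: higher T flattens the distribution,
    # exposing the teacher's relative ranking of the wrong classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for one image: class 3 is correct and
# class 5 is a visually similar class the teacher also rates highly.
logits = [1.0, 0.5, 0.2, 9.0, 0.1, 6.0, 0.3, 0.2, 0.4, 0.1]

p_hard = softmax_T(logits, T=1)   # dominated by the correct class
p_soft = softmax_T(logits, T=20)  # flattened; class similarity visible
```

At T=1 the output is close to one-hot and carries little beyond the label; at high T the student can also learn that class 5 resembles class 3.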

- Knowledge distillation from ResNet-18 to 5-layer CNN

Model Dropout = 0.5 No Dropout
5-layer CNN 83.51% 84.74%
5-layer CNN w/ ResNet18 84.49% 85.69%

- Knowledge distillation from deeper models to ResNet-18

Model Test Accuracy
Baseline ResNet-18 94.175%
+ KD WideResNet-28-10 94.333%
+ KD PreResNet-110 94.531%
+ KD DenseNet-100 94.729%
+ KD ResNext-29-8 94.788%


H. Li, "Exploring Knowledge Distillation of Deep Neural Nets for Efficient Hardware Solutions," CS230 Report, 2018.

G. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, 2015.

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for Thin Deep Nets," arXiv preprint arXiv:1412.6550, 2014.
