Knowledge Distillation (KD) is a simple way to compress a model while preserving the original model's performance. This repository provides an implementation of the paper "Distilling the Knowledge in a Neural Network" with some changes. Please check the references for detailed explanations.
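For quick orientation, the distillation objective combines a soft-target term (teacher and student logits softened with a temperature `T`) with the usual hard-label cross entropy. Below is a minimal sketch of that idea; the function name `kd_loss` and the default values of `T` and `alpha` are illustrative assumptions, not necessarily the exact code in `models/Loss.py`.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Hinton-style distillation loss: alpha * soft-target KL + (1 - alpha) * hard-label CE.

    T and alpha are illustrative defaults, not values taken from this repository.
    """
    # Soften both distributions with temperature T and compare them with KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale, as suggested in the paper, so gradients keep the same magnitude
    # Standard cross entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```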
- Python >= 3.6
- PyTorch >= 1.3
.
|--experiments (scripts to run and the training/test results)
|--train_net.py (main solver to train the model)
|--data/
| |--data_loader.py (Data queue module)
|--models/
| |--Loss.py (Loss functions that are used in this project)
|--engine/
| |--trainer.py
| |--inference.py
| |--solver.py
|--utils/
| |--logger.py (module for visualizing images and training plots)
| |--checkpointer.py
| |--measure.py
Make sure that all directories and files are located as shown above.
- Check that all requirements are installed before running
- Additional features to improve the baseline (Hinton et al., 2015)
- Train teachers
I added two scripts: one for training the teacher model and another for training the student using the teacher.
If you have your own model, specify your teacher model in models/ and build_model, and set the arguments correctly before running.
$ ./experiments/exp1/train.sh
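Under the hood, the student-training script runs a distillation loop along these lines. This is only a hedged sketch: the function name, the hyperparameters `T` and `alpha`, and the loop structure are assumptions for illustration; the actual code lives in `engine/trainer.py` and `models/Loss.py`.

```python
import torch
import torch.nn.functional as F

def train_student_one_epoch(student, teacher, loader, optimizer, device, T=4.0, alpha=0.9):
    """One epoch of distillation: the frozen teacher supplies soft targets for the student.

    Hypothetical sketch, not the repository's exact trainer code.
    """
    teacher.eval()    # the teacher is frozen and only provides soft targets
    student.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        with torch.no_grad():
            teacher_logits = teacher(images)
        student_logits = student(images)
        # Soft-target KL term plus hard-label cross entropy, as in the loss sketch above.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        loss = alpha * soft + (1.0 - alpha) * hard
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```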
A results table will appear.
$ tensorboard --logdir=experiments/ --port=6666
then open 'localhost:6666' in a web browser to see the accuracy and loss graphs.
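The curves shown in TensorBoard come from scalar logging along the lines of the sketch below. The tag names and log directory here are assumptions for illustration; the repository's own logging lives in `utils/logger.py`.

```python
from torch.utils.tensorboard import SummaryWriter

# Hypothetical logging sketch; tag names and the log directory are illustrative.
writer = SummaryWriter(log_dir="experiments/exp1")

def log_step(step, train_loss, val_accuracy):
    """Write one point of the loss and accuracy curves displayed in TensorBoard."""
    writer.add_scalar("train/loss", train_loss, global_step=step)
    writer.add_scalar("val/accuracy", val_accuracy, global_step=step)
```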
- "Distilling the Knowledge in a Neural Network"
- "Python implementaion of hinton's KD"
- I refered CS230 report to understand loss function(Cross entropy + KL_div)