KnowledgeDistillation Layer (Caffe implementation)

This is a CPU implementation of knowledge distillation in Caffe.
This code is heavily based on softmax_loss_layer.hpp and softmax_loss_layer.cpp.

Please refer to the paper:

Hinton, G., Vinyals, O., and Dean, J. Distilling the Knowledge in a Neural Network. arXiv:1503.02531, 2015.

Installation

  1. Install Caffe in your directory $CAFFE
  2. Clone this repository into your directory $ROOT:
cd $ROOT
git clone https://github.com/wentianli/knowledge_distillation_caffe.git
  3. Copy the layer files into your Caffe folder:
cp $ROOT/knowledge_distillation_layer.hpp $CAFFE/include/caffe/layers
cp $ROOT/knowledge_distillation_layer.cpp $CAFFE/src/caffe/layers
  4. Modify $CAFFE/src/caffe/proto/caffe.proto
     Add an optional KnowledgeDistillationParameter field to LayerParameter:
message LayerParameter {
  ...

  //next available layer-specific ID
  optional KnowledgeDistillationParameter knowledge_distillation_param = 147;
}


     Add the KnowledgeDistillationParameter message:

message KnowledgeDistillationParameter {
  optional float temperature = 1 [default = 1];
}
  5. Rebuild Caffe (a minimal build sketch follows).
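The exact commands depend on how you built Caffe; the following is a minimal sketch for the Makefile build (CMake users would re-run cmake and make instead). A clean rebuild ensures that caffe.pb.h/caffe.pb.cc are regenerated from the modified caffe.proto and that the new layer is compiled in.

cd $CAFFE
make clean
make all -j8
make test -j8 && make runtest   # optional: build and run the unit tests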

Usage

The KnowledgeDistillation layer has one layer-specific parameter: temperature.

The layer takes 2 or 3 input blobs:
bottom[0]: the logits of the student
bottom[1]: the logits of the teacher
bottom[2] (optional): the labels
The logits are first divided by the temperature T and then mapped to probability distributions over classes by the softmax function. The layer computes the KL divergence between the two distributions rather than the cross entropy. The gradients with respect to the student logits are multiplied by T^2, as suggested in the paper.
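In the notation of Hinton et al. (2015), with student logits z, teacher logits v, and temperature T, the computation amounts to the following (a sketch of the math, not an excerpt from the code):

p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}, \qquad
q_i = \frac{\exp(v_i / T)}{\sum_j \exp(v_j / T)}

\mathrm{loss} = \mathrm{KL}(q \,\|\, p) = \sum_i q_i \log \frac{q_i}{p_i}, \qquad
\frac{\partial\, \mathrm{loss}}{\partial z_i} = \frac{p_i - q_i}{T} \;\xrightarrow{\;\times T^2\;}\; T\,(p_i - q_i)

Since KL(q || p) and the cross entropy differ only by the entropy of q, which is constant with respect to the student, both yield the same gradients for the student logits.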

  1. Common setting in a prototxt (2 input blobs are given):
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "taecher_logits"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 } #usually larger than 1
  loss_weight: 1
}
  2. If you use ignore_label, 3 input blobs must be given:
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "taecher_logits"
  bottom: "label"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 }
  loss_param {ignore_label: 2}
  loss_weight: 1
}
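For a full distillation setup, the KD loss is typically combined with the ordinary hard-label loss on the student, with loss_weight balancing the two objectives (as suggested in the paper). A minimal sketch, assuming the network already produces blobs named student_logits, teacher_logits, and label; the layer names and weights below are illustrative only:

# soft targets: KL divergence between softened student and teacher predictions
layer {
  name: "KD"
  type: "KnowledgeDistillation"
  bottom: "student_logits"
  bottom: "teacher_logits"
  top: "KL_div"
  include { phase: TRAIN }
  knowledge_distillation_param { temperature: 4 }
  loss_weight: 1
}
# hard targets: standard cross-entropy against the ground-truth labels
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "student_logits"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 0.5
}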
