add self distillation and example for resnet (#1473)
sywangyi committed Nov 11, 2022
1 parent ffa9a39 commit acdd4ca
Showing 15 changed files with 1,685 additions and 0 deletions.
13 changes: 13 additions & 0 deletions docs/distillation.md
@@ -7,6 +7,8 @@ Distillation

1.2. [Intermediate Layer Knowledge Distillation](#intermediate-layer-knowledge-distillation)

1.3. [Self Distillation](#self-distillation)

2. [Distillation Support Matrix](#distillation-support-matrix)
3. [Get Started with Distillation API](#get-started-with-distillation-api)
4. [Examples](#examples)
@@ -35,12 +37,23 @@ $$L_{KD} = \sum\limits_i D(T_t^{n_i}(F_t^{n_i}), T_s^{m_i}(F_s^{m_i}))$$

Where $D$ is a distance measurement as before, $F_t^{n_i}$ is the output feature of the $n_i$-th layer of the teacher model, and $F_s^{m_i}$ is the output feature of the $m_i$-th layer of the student model. Since the dimensions of $F_t^{n_i}$ and $F_s^{m_i}$ usually differ, the transformations $T_t^{n_i}$ and $T_s^{m_i}$ are needed to match them. Specifically, the transformation can take forms such as the identity mapping, a linear transformation, a 1×1 convolution, etc.
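
As an illustration only (a minimal sketch, not this library's implementation), the loss above can be expressed in PyTorch with mean-squared error as the distance $D$, the identity as the teacher-side transformation $T_t$, and a 1×1 convolution as the student-side transformation $T_s$; the class name and the hook-based feature collection are assumptions made for the example.

```python
import torch.nn as nn
import torch.nn.functional as F

class IntermediateKDLoss(nn.Module):
    """Illustrative sketch of L_KD = sum_i D(T_t(F_t), T_s(F_s)) with D = MSE,
    T_t = identity, and T_s = a 1x1 convolution that maps each student feature
    map to the teacher's channel count (spatial sizes assumed to match)."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # one adapter (the transformation T_s) per distilled layer pair
        self.adapters = nn.ModuleList(
            nn.Conv2d(s, t, kernel_size=1)
            for s, t in zip(student_channels, teacher_channels)
        )

    def forward(self, student_feats, teacher_feats):
        # student_feats / teacher_feats: lists of intermediate feature maps,
        # typically collected with forward hooks during the forward pass
        loss = 0.0
        for adapter, f_s, f_t in zip(self.adapters, student_feats, teacher_feats):
            loss = loss + F.mse_loss(adapter(f_s), f_t.detach())  # teacher acts as a fixed target
        return loss
```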

### Self Distillation

Self-distillation is a one-stage training method in which the teacher and the student are trained together within a single model. It attaches several attention modules and shallow classifiers at different depths of the network and distills knowledge from the deepest classifier into the shallower classifiers. Unlike conventional knowledge distillation, where knowledge is transferred from a separate teacher model to a student model, self-distillation transfers knowledge within the same model, from the deeper layers to the shallower ones. A minimal sketch of the per-classifier loss is given after the figure below.
The additional classifiers also allow the network to run in a dynamic manner: inference can exit early at a shallower classifier, which yields significant acceleration.
<br>

<img src="./imgs/self-distillation.png" alt="Architecture" width=800 height=350>

Architecture from the paper [Self-Distillation: Towards Efficient and Compact Neural Networks](https://ieeexplore.ieee.org/document/9381661)
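
To make the training objective concrete, the following is a minimal sketch (an illustration, not this repository's implementation) of the loss attached to one shallow classifier: a cross-entropy term against the ground-truth labels plus a temperature-scaled KL-divergence term that distills the softened predictions of the deepest classifier into the shallower one. The function name and the weighting scheme are assumptions made for the example.

```python
import torch.nn.functional as F

def shallow_classifier_loss(shallow_logits, deep_logits, labels,
                            temperature=3.0, alpha=0.3):
    # supervised term: the shallow classifier still learns from the labels
    ce = F.cross_entropy(shallow_logits, labels)
    # distillation term: match the softened output of the deepest classifier,
    # which plays the role of the teacher inside the same network
    kd = F.kl_div(
        F.log_softmax(shallow_logits / temperature, dim=1),
        F.softmax(deep_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd
```

The total training loss sums this term over all shallow classifiers together with the ordinary cross-entropy loss of the deepest classifier; the referenced paper additionally uses a feature-level hint loss between the deepest and the shallower feature maps.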

## Distillation Support Matrix

|Distillation Algorithm |PyTorch |TensorFlow |
|------------------------------------------------|:--------:|:---------:|
|Knowledge Distillation |&#10004; |&#10004; |
|Intermediate Layer Knowledge Distillation |&#10004; |Will be supported|
|Self Distillation |&#10004; |&#10006; |

## Get Started with Distillation API

Binary file added docs/imgs/self-distillation.png
20 changes: 20 additions & 0 deletions examples/README.md
@@ -336,6 +336,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
<th>Student Model</th>
<th>Teacher Model</th>
<th>Domain</th>
<th>Approach </th>
<th>Examples</th>
</tr>
</thead>
@@ -344,6 +345,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td>MobileNet</td>
<td>DenseNet201</td>
<td>Image Recognition</td>
<td>Knowledge Distillation</td>
<td><a href="./tensorflow/image_recognition/tensorflow_models/distillation">pb</a></td>
</tr>
</tbody>
@@ -613,6 +615,7 @@ Intel® Neural Compressor validated examples with multiple compression technique
<th>Student Model</th>
<th>Teacher Model</th>
<th>Domain</th>
<th>Approach</th>
<th>Examples</th>
</tr>
</thead>
@@ -621,60 +624,77 @@ Intel® Neural Compressor validated examples with multiple compression technique
<td>CNN-2</td>
<td>CNN-10</td>
<td>Image Recognition</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/image_recognition/CNN-2/distillation/eager">eager</a></td>
</tr>
<tr>
<td>MobileNet V2-0.35</td>
<td>WideResNet40-2</td>
<td>Image Recognition</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/image_recognition/MobileNetV2-0.35/distillation/eager">eager</a></td>
</tr>
<tr>
<td>ResNet18|ResNet34|ResNet50|ResNet101</td>
<td>ResNet18|ResNet34|ResNet50|ResNet101</td>
<td>Image Recognition</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/image_recognition/torchvision_models/distillation/eager">eager</a></td>
</tr>
<tr>
<td>ResNet18|ResNet34|ResNet50|ResNet101</td>
<td>ResNet18|ResNet34|ResNet50|ResNet101</td>
<td>Image Recognition</td>
<td>Self Distillation</td>
<td><a href="./pytorch/image_recognition/torchvision_models/self_distillation/eager">eager</a></td>
</tr>
<tr>
<td>VGG-8</td>
<td>VGG-13</td>
<td>Image Recognition</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/image_recognition/VGG-8/distillation/eager">eager</a></td>
</tr>
<tr>
<td>BlendCNN</td>
<td>BERT-Base</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/blendcnn/distillation/eager">eager</a></td>
</tr>
<tr>
<td>DistilBERT</td>
<td>BERT-Base</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/huggingface_models/question-answering/distillation/eager">eager</a></td>
</tr>
<tr>
<td>BiLSTM</td>
<td>RoBERTa-Base</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/huggingface_models/text-classification/distillation/eager">eager</a></td>
</tr>
<tr>
<td>TinyBERT</td>
<td>BERT-Base</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/huggingface_models/text-classification/distillation/eager">eager</a></td>
</tr>
<tr>
<td>BERT-3</td>
<td>BERT-Base</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/huggingface_models/text-classification/distillation/eager">eager</a></td>
</tr>
<tr>
<td>DistilRoBERTa</td>
<td>RoBERTa-Large</td>
<td>Natural Language Processing</td>
<td>Knowledge Distillation</td>
<td><a href="./pytorch/nlp/huggingface_models/text-classification/distillation/eager">eager</a></td>
</tr>
</tbody>
@@ -0,0 +1,22 @@
Details **TBD**
### Prepare requirements
```shell
pip install -r requirements.txt
```
### Run self distillation
```shell
bash run_distillation.sh --topology=(resnet18|resnet34|resnet50|resnet101) --config=conf.yaml --output_model=path/to/output_model --dataset_location=path/to/dataset --use_cpu=(0|1)
```
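For example, a run for ResNet-50 might look like the following (the output and dataset paths are placeholders, and `--use_cpu=0` is assumed to select GPU training):
```shell
bash run_distillation.sh --topology=resnet50 --config=conf.yaml \
    --output_model=./saved_results --dataset_location=./data/cifar100 --use_cpu=0
```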
### CIFAR100 benchmark
https://github.com/weiaicunzai/pytorch-cifar100

### Papers
[Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation](https://openaccess.thecvf.com/content_ICCV_2019/html/Zhang_Be_Your_Own_Teacher_Improve_the_Performance_of_Convolutional_Neural_ICCV_2019_paper.html)

[Self-Distillation: Towards Efficient and Compact Neural Networks](https://ieeexplore.ieee.org/document/9381661)

### Our results on CIFAR100 (accuracy, %)
| Model    | Baseline | Classifier1 | Classifier2 | Classifier3 | Classifier4 | Ensemble |
| :------: | :-------:| :---------: | :---------: | :---------: | :---------: | :------: |
| ResNet50 | 80.88    | 82.06       | 83.64       | 83.85       | 83.41       | 85.10    |
