Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Attention Transfer

PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer"
Conference paper at ICLR2017:

What's in this repo so far:

  • Activation-based AT code for CIFAR-10 experiments
  • Code for ImageNet experiments (ResNet-18-ResNet-34 student-teacher)
  • Jupyter notebook to visualize attention maps of ResNet-34 visualize-attention.ipynb


Coming:

  • grad-based AT
  • Scenes and CUB activation-based AT code

The code uses PyTorch. Note that the original experiments were done using torch-autograd. So far we have validated that the CIFAR-10 experiments are exactly reproducible in PyTorch, and we are in the process of doing so for ImageNet (results are very slightly worse in PyTorch, due to hyperparameters).


    author = {Sergey Zagoruyko and Nikos Komodakis},
    title = {Paying More Attention to Attention: Improving the Performance of
             Convolutional Neural Networks via Attention Transfer},
    booktitle = {ICLR},
    url = {},
    year = {2017}}


First install PyTorch, then install torchnet:

pip install git+

Then install the other Python packages:

pip install -r requirements.txt



This section describes how to reproduce the results in Table 1 of the paper.

First, train teachers:

python --save logs/resnet_40_1_teacher --depth 40 --width 1
python --save logs/resnet_16_2_teacher --depth 16 --width 2
python --save logs/resnet_40_2_teacher --depth 40 --width 2
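
The `--depth` and `--width` flags above appear to follow the Wide ResNet (WRN-d-k) naming. As a rough sketch of what they select, assuming the standard CIFAR WRN layout (three groups of basic blocks with base widths 16, 32, 64); the helper name `wrn_config` is hypothetical and not part of this repo:

```python
def wrn_config(depth, width):
    """Return (blocks_per_group, channels_per_group) for a CIFAR-style WRN.

    Assumes the usual WRN-d-k convention: depth = 6 * n + 4, where n is the
    number of basic blocks in each of the three groups, and the width factor
    multiplies the base channel counts (16, 32, 64).
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth should be of the form 6n + 4")
    blocks_per_group = (depth - 4) // 6
    channels = [int(base * width) for base in (16, 32, 64)]
    return blocks_per_group, channels


if __name__ == "__main__":
    # The three teachers trained above, by (depth, width).
    for depth, width in [(40, 1), (16, 2), (40, 2)]:
        print(depth, width, wrn_config(depth, width))
```

Under this reading, `resnet_16_2_teacher` is a shallow but wide network, while `resnet_40_1_teacher` is deeper and thinner.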

To train with activation-based AT, run:

python --save logs/at_16_1_16_2 --teacher_id resnet_16_2_teacher --beta 1e+3
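
To illustrate what `--beta` weights, here is a minimal dependency-free sketch of an activation-based attention transfer loss. It assumes the attention map is the channel-wise mean of squared activations, flattened and L2-normalized, and that the AT terms are added to the cross-entropy with weight beta/2. The function names are illustrative; the actual training code operates on batched PyTorch tensors and may differ in detail.

```python
import math


def attention_map(feature):
    """Flattened, L2-normalized spatial attention map.

    feature is a nested list indexed as [channel][row][col].
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    flat = [
        sum(feature[c][i][j] ** 2 for c in range(channels)) / channels
        for i in range(height)
        for j in range(width)
    ]
    # Fall back to 1.0 so an all-zero map does not divide by zero.
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def at_loss(student_feature, teacher_feature):
    """Mean squared difference between the two normalized attention maps.

    The channel counts may differ; the spatial sizes must match.
    """
    s = attention_map(student_feature)
    t = attention_map(teacher_feature)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)


def total_loss(cross_entropy, group_losses, beta=1e3):
    """Assumed combination: cross-entropy plus beta/2 times the summed AT terms."""
    return cross_entropy + beta / 2 * sum(group_losses)
```

Because the maps are normalized, the loss ignores the overall scale of the activations and depends only on where the spatial energy is concentrated. The student and teacher may therefore have different widths, as in the 16-1 student and 16-2 teacher pair above.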

To train with KD:

python --save logs/kd_16_1_16_2 --teacher_id resnet_16_2_teacher --alpha 0.9
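
For reference, a minimal dependency-free sketch of a knowledge distillation loss of the kind `--alpha` usually controls. It assumes the common formulation: a temperature-softened KL term weighted by alpha, plus the ordinary cross-entropy weighted by 1 - alpha. The default temperature of 4 and the function names are assumptions made for illustration, not values confirmed by this README.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha=0.9, temperature=4.0):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Standard cross-entropy of the student against the hard label (T = 1).
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

With alpha set to 0.9 as in the command above, most of the training signal under this formulation comes from matching the teacher's softened outputs rather than from the hard labels.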

We plan to add AT+KD with decaying beta soon, to get the best knowledge transfer results.


Pretrained model

We provide a ResNet-18 model pretrained with activation-based AT:

Model                     val error (top-1, top-5)
ResNet-18                 30.4, 10.8
ResNet-18-ResNet-34-AT    29.3, 10.0

Download link:

Model definition:

Convergence plot:

Train from scratch

Download pretrained weights for ResNet-34 (see also functional-zoo for more information):


Prepare the data following fb.resnet.torch and run training (e.g. using 2 GPUs):

python --imagenetpath ~/ILSVRC2012 --depth 18 --width 1 \
                   --teacher_params resnet-34-export.hkl --gpu_id 0,1 --ngpu 2 \
                   --beta 1e+3