# Knowledge distillation: A good teacher is patient and consistent

*by Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov*

## Introduction

We publish all teacher models and configurations for the main experiments of the paper, as well as training logs and student models.

Please read the main big_vision README to learn how to run configs, and remember that each config file contains an example invocation in the top-level comment.

## Results

We provide a Colab notebook to read and plot the logfiles of a few runs that we reproduced on Cloud.
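If the Colab is not convenient, the logfiles can also be read and plotted with a few lines of Python. The sketch below is only illustrative: the filename `big_vision_metrics.txt` and the JSON-lines layout with `step` and a validation-accuracy field are assumptions, not something this README specifies; adjust the names to match the actual logs.

```python
# Minimal sketch for plotting a training log. Assumptions (not guaranteed by
# this README): the log is a JSON-lines file whose records contain a "step"
# field and a validation-accuracy field.
import json

import matplotlib.pyplot as plt

LOGFILE = "big_vision_metrics.txt"  # hypothetical filename
ACC_KEY = "val/prec@1"              # hypothetical metric name

steps, accs = [], []
with open(LOGFILE) as f:
    for line in f:
        record = json.loads(line)
        if ACC_KEY in record:
            steps.append(record["step"])
            accs.append(record[ACC_KEY])

plt.plot(steps, accs)
plt.xlabel("step")
plt.ylabel("ImageNet top-1 (val)")
plt.show()
```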

### ImageNet-1k

The file `bit_i1k.py` is the configuration that reproduces our distillation runs on ImageNet-1k, reported in Figures 1 and 5 (left) and in the first row of Table 1.
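As a quick sanity check before launching a run, such a config can be imported and inspected directly. The sketch below assumes big_vision's usual ml_collections convention (a `get_config()` entry point) and guesses the module path; both are assumptions rather than something this README states.

```python
# Illustrative sketch: load and print the ImageNet-1k distillation config.
# Assumptions: the file follows big_vision's ml_collections convention and
# exposes get_config(); the module path below is a guess at the repo layout.
import importlib

cfg_module = importlib.import_module("big_vision.configs.proj.distill.bit_i1k")
config = cfg_module.get_config()  # ml_collections.ConfigDict with the run settings

print(config)  # inspect datasets, teacher/student models, schedule, etc.
```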

We release both student and teacher models:

| Model | Download link | Resolution | ImageNet top-1 acc. (paper) |
| :--- | :---: | :---: | :---: |
| BiT-R50x1 | link | 160 | 80.5 |
| BiT-R50x1 | link | 224 | 82.8 |
| BiT-R152x2 | link | 224 | 83.0 |
| BiT-R152x2 | link | 384 | 84.3 |
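A downloaded checkpoint can be inspected as sketched below. This assumes the released files are NumPy `.npz` archives mapping parameter names to arrays (the common big_vision checkpoint format), which this README does not state explicitly; the filename is a placeholder.

```python
# Minimal sketch for inspecting a downloaded checkpoint. Assumptions (not
# stated in this README): the released files are .npz archives of named
# parameter arrays; the filename below is a placeholder.
import numpy as np

ckpt = np.load("bit_r50x1_distilled_224.npz")  # placeholder filename
for name in list(ckpt.files)[:10]:
    print(name, ckpt[name].shape)
```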

### Flowers/Pet/Food/Sun

The files `bigsweep_flowers_pet.py` and `bigsweep_food_sun.py` can be used to reproduce the distillation runs on these datasets, shown in Figures 3, 4, 9-12 and Table 4.

While our open-source release does not currently support hyper-parameter sweeps, we still provide an example of the sweeps at the end of the configs for reference.
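Since the release does not run the sweeps for you, one way to reproduce them is to loop over the swept values externally and launch one run per setting. The sketch below is purely illustrative: the field names and values are assumptions, and the real grid should be taken from the sweep example at the end of the config files.

```python
# Purely illustrative sketch of an external hyper-parameter sweep. The swept
# fields (lr, wd, epochs) and their values are assumptions; take the real grid
# from the sweep example at the end of the configs.
import itertools

learning_rates = [0.001, 0.003, 0.01]
weight_decays = [1e-5, 1e-4, 1e-3]
epochs = [100, 1000]

for lr, wd, ep in itertools.product(learning_rates, weight_decays, epochs):
    workdir = f"workdirs/flowers_lr{lr}_wd{wd}_ep{ep}"
    # Launch one training run per setting, e.g. by passing these values as
    # config overrides on the command line (see the main big_vision README
    # for the exact invocation).
    print(f"run: lr={lr} wd={wd} epochs={ep} -> {workdir}")
```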

## Teacher models

Links to all teacher models we used can be found in `common.py`.