# Knowledge distillation: A good teacher is patient and consistent

*by Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov*

## Introduction

We publish all teacher models and configurations for the main experiments of the paper, as well as training logs and student models.

Please read the main big_vision README to learn how to run configs, and remember that each config file contains an example invocation in the top-level comment.

## Results

We provide a Colab notebook to read and plot the logfiles of a few runs that we reproduced on Cloud.
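If the Colab is not convenient, the logfiles can also be read and plotted with a few lines of Python. The sketch below is only illustrative: the filename `big_vision_metrics.txt` and the JSON-lines layout with `step` and a validation-accuracy field are assumptions, not something this README specifies; adjust the names to match the actual logs.

```python
# Minimal sketch for plotting a training log. Assumptions (not guaranteed by
# this README): the log is a JSON-lines file whose records contain a "step"
# field and a validation-accuracy field.
import json

import matplotlib.pyplot as plt

LOGFILE = "big_vision_metrics.txt"  # hypothetical filename
ACC_KEY = "val/prec@1"              # hypothetical metric name

steps, accs = [], []
with open(LOGFILE) as f:
    for line in f:
        record = json.loads(line)
        if ACC_KEY in record:
            steps.append(record["step"])
            accs.append(record[ACC_KEY])

plt.plot(steps, accs)
plt.xlabel("step")
plt.ylabel("ImageNet top-1 (val)")
plt.show()
```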

### ImageNet-1k

The file `bit_i1k.py` is the configuration that reproduces our distillation runs on ImageNet-1k, reported in Figures 1 and 5 (left) and in the first row of Table 1.
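As a quick sanity check before launching a run, such a config can be imported and inspected directly. The sketch below assumes big_vision's usual ml_collections convention (a `get_config()` entry point) and guesses the module path; both are assumptions rather than something this README states.

```python
# Illustrative sketch: load and print the ImageNet-1k distillation config.
# Assumptions: the file follows big_vision's ml_collections convention and
# exposes get_config(); the module path below is a guess at the repo layout.
import importlib

cfg_module = importlib.import_module("big_vision.configs.proj.distill.bit_i1k")
config = cfg_module.get_config()  # ml_collections.ConfigDict with the run settings

print(config)  # inspect datasets, teacher/student models, schedule, etc.
```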

We release both student and teacher models:

| Model | Download link | Resolution | ImageNet top-1 acc. (paper) |
| :--- | :---: | :---: | :---: |
| BiT-R50x1 | link | 160 | 80.5 |
| BiT-R50x1 | link | 224 | 82.8 |
| BiT-R152x2 | link | 224 | 83.0 |
| BiT-R152x2 | link | 384 | 84.3 |
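A downloaded checkpoint can be inspected as sketched below. This assumes the released files are NumPy `.npz` archives mapping parameter names to arrays (the common big_vision checkpoint format), which this README does not state explicitly; the filename is a placeholder.

```python
# Minimal sketch for inspecting a downloaded checkpoint. Assumptions (not
# stated in this README): the released files are .npz archives of named
# parameter arrays; the filename below is a placeholder.
import numpy as np

ckpt = np.load("bit_r50x1_distilled_224.npz")  # placeholder filename
for name in list(ckpt.files)[:10]:
    print(name, ckpt[name].shape)
```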

### Flowers/Pet/Food/Sun

The files `bigsweep_flowers_pet.py` and `bigsweep_food_sun.py` can be used to reproduce the distillation runs on these datasets, shown in Figures 3, 4, 9-12 and Table 4.

While our open-source release does not currently support hyper-parameter sweeps, we still provide an example of the sweeps at the end of the configs for reference.
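Since the release does not run the sweeps for you, one way to reproduce them is to loop over the swept values externally and launch one run per setting. The sketch below is purely illustrative: the field names and values are assumptions, and the real grid should be taken from the sweep example at the end of the config files.

```python
# Purely illustrative sketch of an external hyper-parameter sweep. The swept
# fields (lr, wd, epochs) and their values are assumptions; take the real grid
# from the sweep example at the end of the configs.
import itertools

learning_rates = [0.001, 0.003, 0.01]
weight_decays = [1e-5, 1e-4, 1e-3]
epochs = [100, 1000]

for lr, wd, ep in itertools.product(learning_rates, weight_decays, epochs):
    workdir = f"workdirs/flowers_lr{lr}_wd{wd}_ep{ep}"
    # Launch one training run per setting, e.g. by passing these values as
    # config overrides on the command line (see the main big_vision README
    # for the exact invocation).
    print(f"run: lr={lr} wd={wd} epochs={ep} -> {workdir}")
```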

## Teacher models

Links to all teacher models we used can be found in `common.py`.