# Knowledge Distillation

> How to apply knowledge distillation with fasterai

In [None]:
#all_slow

In [None]:
#hide
from fasterai.distill.all import *
from fastai.vision.all import *

We'll illustrate how to use knowledge distillation to transfer the knowledge of a ResNet34 (the teacher) to a ResNet18 (the student).

Let's grab some data.

In [None]:
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(64))
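In the Pets dataset, cat breeds have capitalized file names and dog breeds have lowercase ones, so `label_func` turns the task into a two-class cat-vs-dog problem. The short check below illustrates this on two file names; the names themselves are assumed examples in the style of the dataset, not read from disk.

```python
def label_func(f): return f[0].isupper()

# Illustrative file names (assumed, in the style of the Pets dataset)
examples = ["Abyssinian_1.jpg", "beagle_32.jpg"]

for name in examples:
    # True means "cat" (capitalized breed name), False means "dog"
    print(name, label_func(name))
```

This is why the student models below are built with `num_classes=2`.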

The first step is to train the teacher model. We start from a pretrained model so that we get good results on our dataset.

In [None]:
teacher = cnn_learner(dls, resnet34, metrics=accuracy)
teacher.unfreeze()
teacher.fit_one_cycle(5)

epoch,train_loss,valid_loss,accuracy,time
0,0.719888,2.204128,0.786198,00:07
1,0.51262,0.474024,0.826116,00:07
2,0.326589,0.292092,0.869418,00:07
3,0.178782,0.176971,0.928281,00:07
4,0.091608,0.172914,0.935047,00:07


### Without KD

We'll now train a ResNet18 from scratch, without any help from the teacher model, to use as a baseline.

In [None]:
student = Learner(dls, resnet18(num_classes=2), metrics=accuracy)
student.fit_one_cycle(5)

epoch,train_loss,valid_loss,accuracy,time
0,0.60839,0.611538,0.64682,00:06
1,0.572184,0.619386,0.635318,00:06
2,0.515913,0.480325,0.757781,00:06
3,0.433941,0.453154,0.769959,00:06
4,0.352041,0.420598,0.79364,00:06


### With KD

And now we train the same model, but with the help of the teacher.

In [None]:
student = Learner(dls, resnet18(num_classes=2), metrics=accuracy)
kd = KnowledgeDistillation(teacher, T=10)
student.fit_one_cycle(5, cbs=kd)

epoch,train_loss,valid_loss,accuracy,time
0,0.59721,0.738136,0.703654,00:07
1,0.554334,0.682321,0.684032,00:07
2,0.518414,0.505285,0.747632,00:07
3,0.443556,0.435712,0.778078,00:07
4,0.359205,0.386629,0.817997,00:07
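To give some intuition for the temperature `T=10` passed to `KnowledgeDistillation` above, here is a minimal plain-Python sketch of a commonly used distillation loss: a hard-label cross-entropy term plus a KL divergence between temperature-softened teacher and student outputs. This is an assumption about the general technique, not the exact loss fasterai computes; the function names, the `alpha` weighting, and the `T**2` scaling are illustrative choices.

```python
import math

def softmax(logits, T=1.0):
    """Softmax of `logits` after dividing them by the temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, target, T=10.0, alpha=0.5):
    """Hypothetical helper: weighted sum of hard-label CE and softened KL."""
    # Hard-label term: ordinary cross-entropy at temperature 1
    p_student = softmax(student_logits)
    ce = -math.log(p_student[target])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student))
    # T**2 keeps the soft term on a comparable scale as T changes (assumed convention)
    return (1 - alpha) * ce + alpha * (T ** 2) * kl

# A higher temperature flattens the distribution, exposing the teacher's
# relative confidence in the non-target class
print(softmax([4.0, 0.0], T=1.0))
print(softmax([4.0, 0.0], T=10.0))
print(kd_loss([1.0, 0.5], [4.0, 0.0], target=0))
```

With a high temperature, the teacher's output is much less peaked, so the student receives a richer training signal than the one-hot label alone.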


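With the teacher's help, the student model performs better: it reaches 81.8% validation accuracy after 5 epochs, compared with 79.4% for the baseline trained without distillation.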