# Using Pretrained PyTorch Models

This notebook covers how to use pretrained models with the fastai library. Three sources of models are covered:
+ Fastai models: https://github.com/fastai/fastai/blob/fe4ab9ee0100f1ea390787beb0b7a1dd82412e61/fastai/torch_imports.py
+ Cadene models: https://github.com/Cadene/pretrained-models.pytorch
+ Torchvision models: https://pytorch.org/docs/master/torchvision/models.html

In fact, the Torchvision models are included in the Cadene repository as well, so the steps for Cadene models apply to them essentially unchanged.

I will be using a dogs and cats image dataset for demonstration in all cases (the fastai v1 cells below download the Oxford-IIIT Pet dataset through `untar_data(URLs.PETS)`). I will also assume some familiarity with the FAI library. The dataset can be found at http://files.fast.ai/data/. The purpose of this notebook is only to show you how to get the models working. Any other kind of tuning (such as adding more fc layers) can easily be incorporated on a case-by-case basis.

## Using FAI models

In [1]:
import matplotlib
matplotlib.use('Agg')

In [2]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [3]:
# from fastai.imports import *
# from fastai.transforms import *
# from fastai.conv_learner import *
# from fastai.model import *
# from fastai.dataset import *
# from fastai.sgdr import *
# from fastai.plots import *

In [4]:
from fastai import *
from fastai.vision import *

In [5]:
bs = 64

In [6]:
path = untar_data(URLs.PETS); path

PosixPath('/home/haider/.fastai/data/oxford-iiit-pet')

In [7]:
path.ls()

[PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/annotations'),
 PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images')]

In [8]:
path_anno = path/'annotations'
path_img = path/'images'

The first thing we do when we approach a problem is to take a look at the data. We _always_ need to understand very well what the problem is and what the data looks like before we can figure out how to solve it. Taking a look at the data means understanding how the data directories are structured, what the labels are and what some sample images look like.

Image classification datasets differ mainly in the way labels are stored. In this particular dataset, labels are stored in the filenames themselves, so we need to extract them to classify the images into the correct categories. Fortunately, the fastai library has a handy function made exactly for this: `ImageDataBunch.from_name_re` gets the labels from the filenames using a [regular expression](https://docs.python.org/3.6/library/re.html).

In [9]:
fnames = get_image_files(path_img)
fnames[:5]

[PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images/leonberger_84.jpg'),
 PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images/american_pit_bull_terrier_78.jpg'),
 PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images/newfoundland_13.jpg'),
 PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images/english_setter_63.jpg'),
 PosixPath('/home/haider/.fastai/data/oxford-iiit-pet/images/Persian_38.jpg')]

In [10]:
np.random.seed(2)
pat = re.compile(r'/([^/]+)_\d+\.jpg$')  # escape the dot so it matches a literal "."

In [11]:
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs
                                  ).normalize(imagenet_stats)
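
To make the label extraction concrete, here is a minimal sketch that uses only Python's `re` module on a few of the filenames listed above. It uses a variant of the pattern with the dot escaped; `sample_paths` and `labels` are illustrative names and not part of fastai.

```python
import re

# Same idea as `pat` above, with the dot before "jpg" escaped (illustrative variant).
pat = re.compile(r'/([^/]+)_\d+\.jpg$')

# A few filenames copied from the listing above.
sample_paths = [
    '/home/haider/.fastai/data/oxford-iiit-pet/images/leonberger_84.jpg',
    '/home/haider/.fastai/data/oxford-iiit-pet/images/american_pit_bull_terrier_78.jpg',
    '/home/haider/.fastai/data/oxford-iiit-pet/images/Persian_38.jpg',
]

# Group 1 captures everything between the last "/" and the trailing "_<number>.jpg".
labels = [pat.search(p).group(1) for p in sample_paths]
print(labels)
```

Because the capture group cannot contain a `/`, the match always starts at the last path component, so multi-word breed names such as `american_pit_bull_terrier` are kept whole.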

#### FAI models Method-1 : `create_cnn (pretrained)`

This uses the pretrained model and downloads its weights automatically. For a few architectures you might need to download the weights file from http://files.fast.ai/models/ and put it inside the fastai folder.

The architecture argument is a function that builds the model, not a model instance. Here it is `models.resnet34`, which is passed without being called.

In [12]:
learn1 = create_cnn(data, models.resnet34, metrics=error_rate)

In [15]:
learn1.fit_one_cycle(1)

epoch,train_loss,valid_loss,error_rate
1,0.849185,0.300581,0.093369


In [17]:
learn1.unfreeze()
learn1.fit_one_cycle(1)

epoch,train_loss,valid_loss,error_rate
1,0.599341,0.320468,0.109608


#### FAI models Method-2 : `create_cnn (from_model_data)`

In this case the required argument is the model itself, which must inherit from the PyTorch class `nn.Module`.

In [22]:
model = models.resnet34(pretrained=True)

learn2 = create_cnn(data, models.resnet34, metrics=error_rate)

We can look at the model using `learn.model` or `learn.models.model`, and compare the plain `model` with the one obtained from the pretrained method.

In [23]:
learn1.model

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (rel

In [24]:
model

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Co

In [25]:
children(model)[-2:]

[AvgPool2d(kernel_size=7, stride=1, padding=0),
 Linear(in_features=512, out_features=1000, bias=True)]

In [30]:
children(learn1.model)[1][-9:]  # [1] is the head

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Lambda()
  (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25)
  (4): Linear(in_features=1024, out_features=512, bias=True)
  (5): ReLU(inplace)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5)
  (8): Linear(in_features=512, out_features=37, bias=True)
)

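As we can see, the pretrained method (`create_cnn` here, `ConvLearner.pretrained` in older fastai versions) changes the last few layers of the input model. Specifically, it replaces the `AvgPool2d` with `AdaptiveConcatPool2d`, flattens the result, and adds two linear layers with batch norm, ReLU and dropout. The older `ConvLearner.pretrained` also ended the head with a LogSoftmax layer, whereas the head printed above ends with the final `Linear` layer.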

The reason for using adaptive pooling layers is that they allow the model to accept any image size rather than being restricted to a fixed size such as 224 x 224. An adaptive pooling layer is specified by its output size rather than its kernel size.
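
The difference between specifying a kernel size and specifying an output size can be seen in a small PyTorch sketch. The 7x7 and 10x10 feature-map sizes are illustrative assumptions standing in for what a convolutional body might produce at two input resolutions.

```python
import torch
import torch.nn as nn

fixed_pool = nn.AvgPool2d(kernel_size=7, stride=1)   # kernel size is fixed
adaptive_pool = nn.AdaptiveAvgPool2d(output_size=1)  # output size is fixed

# Feature maps for two hypothetical input resolutions (batch of 2, 512 channels).
small = torch.randn(2, 512, 7, 7)
large = torch.randn(2, 512, 10, 10)

fixed_shapes = [tuple(fixed_pool(x).shape) for x in (small, large)]
adaptive_shapes = [tuple(adaptive_pool(x).shape) for x in (small, large)]

print(fixed_shapes)     # spatial size depends on the input size
print(adaptive_shapes)  # spatial size is 1x1 for both inputs
```

With the fixed kernel, the larger feature map leaves a 4x4 grid that a following `Linear(512, ...)` layer could not consume, while the adaptive layer always reduces to one value per channel.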

We will now try to replicate what happens inside the pretrained method so that we can apply it to other models as well. It is a good idea to read the function definitions of `ConvnetBuilder`, `ConvLearner.pretrained` and `ConvLearner.from_model_data` if you haven't already done so.

We will now write a custom head and append it to the end of the existing model. There are again two ways to do so.

If you have a fixed-size input, say 224x224, you can directly append another linear layer after the existing model, followed by a log softmax. In this case the existing final linear layer still produces its 1000-dimensional output, which was trained on ImageNet, and the new layer maps that output to your classes. Usually it is a good idea to retrain the fully connected layers, which brings us to the other method.

The other way is to remove the average pooling, replace it with adaptive concat pooling and add your own linear layer.
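
The second approach can be sketched in plain PyTorch on a toy network. Everything here is a stand-in chosen for illustration: `toy_pretrained` plays the role of a pretrained model whose last two children are a pooling layer and a classifier, and `ConcatPool` is a small reimplementation of the concat-pooling idea, not fastai's own class.

```python
import torch
import torch.nn as nn

class ConcatPool(nn.Module):
    """Concatenate adaptive max and average pooling along the channel axis."""
    def __init__(self):
        super().__init__()
        self.ap = nn.AdaptiveAvgPool2d(1)
        self.mp = nn.AdaptiveMaxPool2d(1)

    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], dim=1)

# Toy stand-in for a pretrained network; it is only sliced, never run as a whole.
toy_pretrained = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(kernel_size=7, stride=1),  # fixed-size pooling to be removed
    nn.Linear(8, 1000),                     # original classifier to be removed
)

# Keep everything except the last two children, then attach a new head.
body = nn.Sequential(*list(toy_pretrained.children())[:-2])
head = nn.Sequential(
    ConcatPool(),
    nn.Flatten(),
    nn.Linear(2 * 8, 37),  # concat pooling doubles the 8 channels; 37 classes
)
new_model = nn.Sequential(body, head)

new_model.eval()
with torch.no_grad():
    out_a = new_model(torch.randn(4, 3, 64, 64))
    out_b = new_model(torch.randn(4, 3, 96, 96))
print(tuple(out_a.shape), tuple(out_b.shape))
```

Both input sizes yield one row of 37 scores per image, which is the property the adaptive head is meant to provide.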

In [35]:
create_head(nf=1024, nc=37, ps=0.5, bn_final=False)  # Model head that takes nf features, runs through lin_ftrs, and ends with nc classes. ps is the probability of the dropouts

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Lambda()
  (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25)
  (4): Linear(in_features=1024, out_features=512, bias=True)
  (5): ReLU(inplace)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5)
  (8): Linear(in_features=512, out_features=37, bias=True)
)

In [37]:
custom_head1 = create_head(nf=1024, nc=37, ps=0.5, bn_final=False)

In [40]:
# custom_head1 = nn.Sequential(nn.ReLU(), nn.BatchNorm1d(1000),
#                             nn.Linear(in_features=1000, out_features=2), nn.LogSoftmax())
model_ch1 = nn.Sequential(*list(children(model)[:-2]),custom_head1)

In [41]:
model_ch1

Sequential(
  (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace)
  (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, ker

In [42]:
# learn2_ch1 = ConvLearner.from_model_data(model_ch1, data)
learn2_ch1 = create_cnn(data, model_ch1, metrics=error_rate)

TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not bool

The `TypeError` above arises because `create_cnn` expects an architecture function such as `models.resnet34` and calls it with the `pretrained` flag. Passing an already-built `nn.Module` like `model_ch1` turns that call into a forward pass on a boolean. The results below appear to come from the older `ConvLearner.from_model_data` call shown in the commented line, which accepts a built model directly.

In [19]:
learn2_ch1.fit(1e-2, 1, cycle_len=1, best_save_name='dc_ach1', metrics=[accuracy])

HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))

epoch      trn_loss   val_loss   accuracy                     
    0      0.070181   0.057458   0.976     



[array([0.05746]), 0.976]

It is also a good idea to look at the number of parameters in each layer. This can easily be done using `learn.summary()`.

In [21]:
learn2_ch1.summary()

OrderedDict([('Conv2d-1',
              OrderedDict([('input_shape', [-1, 3, 224, 224]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', True),
                           ('nb_params', 9408)])),
             ('BatchNorm2d-2',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('trainable', True),
                           ('nb_params', 128)])),
             ('ReLU-3',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 112, 112]),
                           ('nb_params', 0)])),
             ('MaxPool2d-4',
              OrderedDict([('input_shape', [-1, 64, 112, 112]),
                           ('output_shape', [-1, 64, 56, 56]),
                           ('nb_params', 0)])),
             ('Conv2d-5',
              OrderedDict([('input_shape', [-1, 64, 56, 56
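
The parameter counts reported in the summary can be cross-checked in plain PyTorch. This is a minimal sketch that rebuilds only the first four layers shown in the model printout; `count_params` and `stem` are illustrative names.

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# The first four layers of the ResNet printed earlier.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

counts = [(type(layer).__name__, count_params(layer)) for layer in stem]
for name, n in counts:
    print(f"{name:12s} {n}")
```

The convolution has 3 x 64 x 7 x 7 = 9408 weights and no bias, and the batch norm layer has 64 scales plus 64 shifts, which agrees with the `nb_params` entries in the summary output.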

Before we can apply method 2, we need to know the output dimension after `AdaptiveConcatPool2d` and `Flatten`. Note that we only need to do this for one input size, because adaptive pooling ensures the output dimension stays the same for any input size. To find it, we create a custom head that stops at the flatten layer, pass one minibatch through it, and note the output size.

In [23]:
custom_head2 = nn.Sequential(AdaptiveConcatPool2d(), Flatten())
model_ch2 = nn.Sequential(*list(children(model))[:-2], custom_head2)

In [25]:
learn2_ch2 = ConvLearner.from_model_data(model_ch2, data)

In [26]:
learn2_ch2.models.model(V(next(iter(data.trn_dl))[0]))

Variable containing:
  2.1928   0.2979   0.0000  ...    0.9831   0.2992   0.6577
  1.6296   3.7968   4.8847  ...    1.0180   0.6856   0.5800
  4.4176   6.9506   2.2004  ...    2.3628   0.2063   0.2136
           ...               ⋱              ...            
  5.5922   3.3013   0.2849  ...    0.5287   0.0443   1.3995
  4.4066   0.9158   1.2564  ...    1.0420   0.1587   0.7406
  5.9668   2.0380   0.0000  ...    2.5776   0.2262   0.6182
[torch.cuda.FloatTensor of size 64x1024 (GPU 0)]

So we now know that the output size is 1024.

To check that the output size is independent of the input shape, we create another data object with a different image size:

In [27]:
sz2 = 299
data2 = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz2))

In [30]:
learn2_ch2_d2 = ConvLearner.from_model_data(model_ch2, data2)

In [31]:
learn2_ch2_d2.models.model(V(next(iter(data2.trn_dl))[0]))

Variable containing:
  4.5496   0.1859   3.1584  ...    0.4355   0.3250   1.0967
  0.9409   2.3361  17.4928  ...    0.0437   1.1525   1.8423
  2.7002   2.8415   1.1543  ...    0.1919   0.3417   0.0063
           ...               ⋱              ...            
  3.2327   4.7135   2.4346  ...    0.6134   0.3094   0.2462
  1.0582   4.7659  12.3103  ...    0.3718   1.0124   0.8549
  3.3615   2.7063   0.9418  ...    0.9223   0.1614   0.3817
[torch.cuda.FloatTensor of size 64x1024 (GPU 0)]

Voila, the output size is again 1024.
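
The same check can be done without a data loader by pushing a dummy batch through the body. The sketch below uses a toy two-layer body with 32 output channels as a stand-in for a real backbone; `probe_num_features` and `concat_pool` are hypothetical helper names. Concat pooling stacks a max pool and an average pool, so the flattened width is twice the channel count, which is why the 512-channel resnet34 body gave 1024 above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def concat_pool(x):
    """Adaptive max and average pooling to 1x1, concatenated over channels."""
    return torch.cat([F.adaptive_max_pool2d(x, 1), F.adaptive_avg_pool2d(x, 1)], dim=1)

def probe_num_features(body, sizes=(224, 299)):
    """Return the flattened feature width for each square input size."""
    body.eval()
    widths = []
    with torch.no_grad():
        for sz in sizes:
            dummy = torch.zeros(1, 3, sz, sz)           # one fake RGB image
            feats = concat_pool(body(dummy)).flatten(1)  # shape (1, n_features)
            widths.append(feats.shape[1])
    return widths

# Toy body with 32 output channels (illustrative stand-in for a pretrained body).
toy_body = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
)

widths = probe_num_features(toy_body)
print(widths)
```

The value returned for any one size is the `in_features` to use for the first `BatchNorm1d` and `Linear` layers of the custom head.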

Now we can create the rest of the custom head:

In [35]:
custom_head3 = nn.Sequential(AdaptiveConcatPool2d(), Flatten(), nn.BatchNorm1d(1024),
                            nn.Linear(in_features=1024, out_features=512), nn.ReLU(),
                            nn.BatchNorm1d(512), nn.Linear(in_features=512, out_features=2),
                            nn.LogSoftmax(dim=1))  # explicit dim: normalize over the class axis
model_ch3 = nn.Sequential(*list(children(model))[:-2], custom_head3)

In [36]:
learn2_ch3 = ConvLearner.from_model_data(model_ch3, data)

In [38]:
learn2_ch3.fit(1e-2, 1, cycle_len=1, best_save_name='dc_ach3', metrics=[accuracy])

HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))

epoch      trn_loss   val_loss   accuracy                     
    0      0.06065    0.042667   0.983     



[array([0.04267]), 0.983]

## Using Cadene Models and Torchvision Models

If you have been using PyTorch for some time, chances are you have come across this fantastic repository: https://github.com/Cadene/pretrained-models.pytorch. It contains pretrained weights for many models along with their definitions, and exposes them through very simple function calls.

The repository also includes all the models in the torchvision repository (https://pytorch.org/docs/master/torchvision/models.html). So if you want to use any torchvision model, look up its name in the Cadene repository and proceed in the same way.

The easiest way to install it is to run `pip install pretrainedmodels` in your fastai conda environment.

The cells below are copied from the repository's docs.

In [6]:
import pretrainedmodels

In [7]:
print(pretrainedmodels.model_names)

['fbresnet152', 'bninception', 'resnext101_32x4d', 'resnext101_64x4d', 'inceptionv4', 'inceptionresnetv2', 'alexnet', 'densenet121', 'densenet169', 'densenet201', 'densenet161', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'inceptionv3', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19_bn', 'vgg19', 'nasnetamobile', 'nasnetalarge', 'dpn68', 'dpn68b', 'dpn92', 'dpn98', 'dpn131', 'dpn107', 'xception', 'senet154', 'se_resnet50', 'se_resnet101', 'se_resnet152', 'se_resnext50_32x4d', 'se_resnext101_32x4d', 'cafferesnet101', 'pnasnet5large', 'polynet']


For demo purposes, let's use `se_resnet50`. Using `pretrained='imagenet'` downloads the model to `$HOME/.torch/models`. This can be changed by exporting the environment variable `$TORCH_MODEL_ZOO` to a different location.

In [8]:
model_name = 'se_resnet50' # could be fbresnet152 or inceptionresnetv2
model_cadene = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')

In [9]:
list(children(model_cadene))[-2:]

[AvgPool2d(kernel_size=7, stride=1, padding=0, ceil_mode=False, count_include_pad=True),
 Linear(in_features=2048, out_features=1000, bias=True)]

We will use the same technique as we did for the custom head on the fastai resnet34 model.

In [10]:
custom_head2 = nn.Sequential(AdaptiveConcatPool2d(), Flatten())
cadene_model_ch2 = nn.Sequential(*list(children(model_cadene))[:-2], custom_head2)
learn3_tmp = ConvLearner.from_model_data(cadene_model_ch2, data)

In [11]:
learn3_tmp.models.model(V(next(iter(data.trn_dl))[0]))

Variable containing:
  2.1128   2.2811   0.9118  ...    0.0000   0.0281   0.7989
  2.3026   1.2391   0.0042  ...    0.9331   0.0986   0.7122
  5.4738   0.8968   1.4244  ...    0.5880   1.3534   0.1224
           ...               ⋱              ...            
  2.4001   1.2857   1.2013  ...    0.0000   0.0066   0.1230
  1.3911   2.5906   0.0000  ...    1.3579   0.2531   0.2078
  3.1068   0.0000   7.8722  ...    1.8038   0.0805   0.7115
[torch.cuda.FloatTensor of size 16x4096 (GPU 0)]

In [15]:
custom_head3 = nn.Sequential(AdaptiveConcatPool2d(), Flatten(), nn.BatchNorm1d(4096),
                            nn.Linear(in_features=4096, out_features=512), nn.ReLU(),
                            nn.BatchNorm1d(512), nn.Linear(in_features=512, out_features=2),
                            nn.LogSoftmax(dim=1))  # explicit dim: normalize over the class axis
cadene_model_ch3 = nn.Sequential(*list(children(model_cadene))[:-2], custom_head3)

In [16]:
learn3_cadene = ConvLearner.from_model_data(cadene_model_ch3, data)

In [17]:
learn3_cadene.fit(1e-2, 1, cycle_len=1, best_save_name='dc_cd_ach3', metrics=[accuracy])

HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))

epoch      trn_loss   val_loss   accuracy                       
    0      0.073795   0.04389    0.9855    



[array([0.04389]), 0.9855]

Voila, now we can use any of the pretrained models that Cadene has provided.