In [4]:
#| include: false
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *

## Introduction

The [fastai deep learning library (as of 2021)](https://www.fast.ai/2020/02/13/fastai-A-Layered-API-for-Deep-Learning/) is a layered API that has 4 levels of abstraction.

- Application layer
- High level API
- Mid level API
- Low level API

![](https://github.com/pranath/blog/raw/master/images/fastai-layered.png "The fastai layered API")

In this article we will look at how to build custom applications in the fastai library, by looking at how current fastai image model applications are actually built.

## Fastai Image Model Applications

### cnn_learner

When using this application, the first parameter we need to give it is an architecture, which will be used as the *body* of the network. Usually this will be a ResNet architecture with pre-trained weights that are automatically downloaded for you.

Next, the final layer of the pre-trained model is cut off. In fact, everything from the final pooling layer onwards is cut. For each architecture, fastai keeps a dictionary of information called *model_meta* that identifies where this cut point is. Here, for example, is the entry for ResNet50.

In [5]:
model_meta[resnet50]

{'cut': -2,
 'split': <function fastai.vision.learner._resnet_split>,
 'stats': ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])}
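
To make the `cut` value concrete, here is a minimal sketch that treats a ResNet as a plain list of its top-level layer names (as they appear in torchvision) and slices it at -2. It only illustrates the indexing; it is not how fastai builds the body internally.

```python
# Top-level child modules of a torchvision ResNet, in order (names only).
resnet_children = ["conv1", "bn1", "relu", "maxpool",
                   "layer1", "layer2", "layer3", "layer4",
                   "avgpool", "fc"]

cut = -2  # the 'cut' entry from model_meta for ResNets

body_layers = resnet_children[:cut]     # kept for transfer learning
removed_layers = resnet_children[cut:]  # dropped and replaced by a new head

print(body_layers)
print(removed_layers)
```

The two removed entries are the pooling layer and the final fully connected layer.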

Key parts of the network are:

- **Head** - The part of the network specialised for a particular task i.e. with a CNN the part after the adaptive average pooling layer
- **Body** - Everything else not the Head including the Stem
- **Stem** - The first layers of the network

If we take all the layers before the cut point of -2, we get the body of the model that fastai will keep to use for transfer learning. Then we can add a new head.

In [6]:
#| output: false
create_head(20,2)

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Flatten(full=False)
  (2): BatchNorm1d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=40, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=2, bias=False)
)

With this function we can choose how many extra layers to add at the end, how much dropout to apply, and what kind of pooling to use. By default fastai adds 2 linear layers rather than just one, as the fastai authors have found this helps transfer learning work more quickly and easily than a single extra layer.
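
The layer sizes in the printed head follow a simple pattern: the concatenated average and max pooling doubles the number of incoming features, and the two linear layers then go through a hidden size of 512. The sketch below reproduces that arithmetic. `head_linear_sizes` is a hypothetical helper written for this article, not a fastai function, and it assumes the default settings shown in the output above.

```python
def head_linear_sizes(n_in, n_out, hidden=512, concat_pool=True):
    """Return (in_features, out_features) for the two linear layers of a head.

    Assumption: concat pooling stacks average and max pooling, so it doubles
    the number of features, as in the printed create_head(20, 2) output.
    """
    pooled = n_in * 2 if concat_pool else n_in
    return [(pooled, hidden), (hidden, n_out)]

print(head_linear_sizes(20, 2))
```

For `n_in=20` and `n_out=2` this gives 40 to 512 and 512 to 2, matching the two `Linear` layers printed above.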

### unet_learner

This architecture is most often used for image segmentation tasks. 

We start off building this in the same way as the cnn_learner, by chopping off the old head. For image segmentation, we have to add a very different type of head, so that we end up with a model that actually generates an image for segmentation.

One way we could do this is to add layers that increase the grid size in a CNN, for example by duplicating each pixel to make an image twice as big. This is known as *nearest neighbour interpolation*. Another approach uses strides, in this case a stride of one half, which is known as *transposed convolution*. However, neither of these approaches works well in practice.

The key problem is that there is simply not enough information in these downsampled activations alone to recreate something close to the original image quality needed for segmentation. It's a big ask, and perhaps not a realistic one.

The solution to this problem is our friend *skip connections* again. This time, however, they do not skip across a single layer. Instead, they reach far across to the opposite side of the architecture.
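
As a small illustration of nearest neighbour interpolation, the sketch below doubles the size of a tiny grid by repeating each value along both axes. It uses plain Python lists rather than tensors, and `upsample_nearest` is a name made up for this example.

```python
def upsample_nearest(grid, scale=2):
    """Enlarge a 2D grid by repeating every value `scale` times in each direction."""
    out = []
    for row in grid:
        # Repeat each value horizontally...
        wide = [value for value in row for _ in range(scale)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(scale))
    return out

small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small)
for row in big:
    print(row)
```

The output grid is twice as large, but it contains no new information, which is the limitation described here.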

![](https://github.com/pranath/blog/raw/master/images/unet.png "The Unet architecture")

Here the left half of the model is a CNN, and the right half consists of transposed convolutional layers, with the extra skip connections shown in gray. This helps the Unet do a much better job of generating the type of images we want for segmentation. One challenge with Unets is that the exact architecture depends on the image size. However, fastai has a *DynamicUnet* object that automatically generates the correct architecture based on the data and image sizes given.
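
To get a feel for why the layout depends on the image size, here is a toy bookkeeping sketch. It assumes each encoder stage halves the grid, each decoder stage doubles it, and a skip connection concatenates the stored encoder activation of the same grid size. The channel counts and the name `unet_decoder_plan` are illustrative assumptions; this is not how *DynamicUnet* is implemented.

```python
def unet_decoder_plan(image_size, encoder_channels):
    """Trace channels and grid sizes through a toy U-Net with skip connections."""
    skips = []
    size = image_size
    for ch in encoder_channels:
        size //= 2                 # each encoder stage halves the grid
        skips.append((ch, size))   # remember the activation for a skip connection

    ch, size = skips.pop()         # the deepest activation starts the decoder
    plan = []
    while skips:
        skip_ch, skip_size = skips.pop()
        size *= 2                  # each decoder stage doubles the grid
        if size != skip_size:
            raise ValueError(f"grid {size} does not match skip grid {skip_size}")
        # Concatenating along the channel axis adds the channel counts.
        plan.append({"in_channels": ch + skip_ch,
                     "out_channels": skip_ch,
                     "grid": size})
        ch = skip_ch
    return plan

for stage in unet_decoder_plan(224, [64, 128, 256, 512]):
    print(stage)
```

With an image size of 224 every grid lines up. With a size such as 100 the halved grids no longer double back to matching sizes and the sketch raises an error, which is the kind of mismatch a size-aware builder has to handle.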

### A Siamese Network

Let's now try to create a custom model. [In an earlier article we looked at creating a Siamese network model](2021-05-30-fastai-midlevel-api.html). Let's recap the details of that model.

In [7]:
#| include: false
from fastai.vision.all import *
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

class SiameseImage(fastuple):
    def show(self, ctx=None, **kwargs): 
        img1,img2,same_breed = self
        if not isinstance(img1, Tensor):
            if img2.size != img1.size: img2 = img2.resize(img1.size)
            t1,t2 = tensor(img1),tensor(img2)
            t1,t2 = t1.permute(2,0,1),t2.permute(2,0,1)
        else: t1,t2 = img1,img2
        line = t1.new_zeros(t1.shape[0], t1.shape[1], 10)
        return show_image(torch.cat([t1,line,t2], dim=2), 
                          title=same_breed, ctx=ctx)
    
def label_func(fname):
    return re.match(r'^(.*)_\d+\.jpg$', fname.name).groups()[0]

class SiameseTransform(Transform):
    def __init__(self, files, label_func, splits):
        self.labels = files.map(label_func).unique()
        self.lbl2files = {l: L(f for f in files if label_func(f) == l) for l in self.labels}
        self.label_func = label_func
        self.valid = {f: self._draw(f) for f in files[splits[1]]}
        
    def encodes(self, f):
        f2,t = self.valid.get(f, self._draw(f))
        img1,img2 = PILImage.create(f),PILImage.create(f2)
        return SiameseImage(img1, img2, t)
    
    def _draw(self, f):
        same = random.random() < 0.5
        cls = self.label_func(f)
        if not same: cls = random.choice(L(l for l in self.labels if l != cls)) 
        return random.choice(self.lbl2files[cls]),same
    
splits = RandomSplitter()(files)
tfm = SiameseTransform(files, label_func, splits)
tls = TfmdLists(files, tfm, splits=splits)
dls = tls.dataloaders(after_item=[Resize(224), ToTensor], 
    after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])

Let's now build a custom model for the Siamese task. We will use a pre-trained model, pass 2 images through it, concatenate the results, then send them to a custom head that will return 2 predictions.

In terms of overall architecture, let's define the model like this.

In [8]:
class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder,self.head = encoder,head
    
    def forward(self, x1, x2):
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)

We can create a body/encoder by taking a pre-trained model and cutting it. We just need to specify where we want to cut, and the cut position for a ResNet is -2.

In [9]:
encoder = create_body(resnet34, cut=-2)

Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.cache/torch/hub/checkpoints/resnet34-333f7ec4.pth



We can then create a head. If we look at the encoder/body, it will tell us the last layer has 512 features, so this head will take 2*512 input features, as we will have 2 images.

In [10]:
head = create_head(512*2, 2, ps=0.5)

We can now build our model from our constructed head and body.

In [11]:
model = SiameseModel(encoder, head)
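
To see how the pieces fit together, the sketch below traces tensor shapes through the model. It assumes 224x224 inputs, a ResNet body that downsamples by a factor of 32 and outputs 512 channels, and a head whose concat pooling doubles the feature count. `siamese_shape_trace` is a hypothetical helper that only does arithmetic on shapes; it runs no model.

```python
def siamese_shape_trace(batch_size, image_size=224, n_ftrs=512, n_out=2):
    """Return the shapes after the encoder, the concatenation, the pooling and the head."""
    grid = image_size // 32                              # assumed downsampling factor
    encoded = (batch_size, n_ftrs, grid, grid)           # one image through the encoder
    concatenated = (batch_size, 2 * n_ftrs, grid, grid)  # torch.cat on dim=1
    pooled = (batch_size, 2 * concatenated[1])           # avg + max pooling, flattened
    output = (batch_size, n_out)                         # two predictions per pair
    return encoded, concatenated, pooled, output

for shape in siamese_shape_trace(64):
    print(shape)
```

The concatenation is why the head is created with `512*2` input features.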

Before we can use a Learner to train the model we need to define 2 more things. Firstly, a loss function. We can use cross-entropy here, but as our targets are boolean we need to convert them to integers or PyTorch will throw an error.

Secondly, we need to define a custom splitter that tells the fastai library how to split the model into parameter groups, which makes it possible to train only the head of the model when we do transfer learning. Here we want 2 parameter groups: one for the encoder/body and one for the head. So let's define a splitter as well.

In [12]:
def loss_func(out, targ):
    return nn.CrossEntropyLoss()(out, targ.long())

def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]
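
The reason the target has to be an integer is that cross-entropy uses it as a class index: the loss is the negative log-probability the model assigns to the class at that index. Here is a minimal pure-Python sketch of that idea for a single example. It is an illustration of the formula, not PyTorch's implementation.

```python
import math

def cross_entropy(logits, target_index):
    """Negative log-softmax of the logit at `target_index`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target_index]

same_breed = True                     # boolean target, as in our dataset
target_index = int(same_breed)        # converted to a class index (0 or 1)
loss = cross_entropy([0.2, 1.5], target_index)
print(loss)
```

The loss is smaller when the logit at the target index is the larger of the two.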

We can now define a learner using our data, model, loss function, splitter and a metric. As we are defining a learner manually here, we also have to call freeze manually, to ensure only the last parameter group, i.e. the head, is trained.

In [13]:
learn = Learner(dls, model, loss_func=loss_func, 
                splitter=siamese_splitter, metrics=accuracy)
learn.freeze()
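
As a simplified picture of what freezing means for our two parameter groups, the sketch below returns one flag per group saying whether it is trained. `trainable_flags` is a hypothetical helper for illustration only, and it ignores details such as how batch norm layers are treated.

```python
def trainable_flags(n_groups, frozen=True):
    """Return one flag per parameter group: True if the group is trained."""
    if not frozen:
        return [True] * n_groups
    # When frozen, only the last group (the head) is trained.
    return [False] * (n_groups - 1) + [True]

print(trainable_flags(2, frozen=True))   # encoder frozen, head trained
print(trainable_flags(2, frozen=False))  # everything trained
```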

Let's now train our model.

In [14]:
learn.fit_one_cycle(4, 3e-3)

epoch,train_loss,valid_loss,accuracy,time
0,0.523447,0.334643,0.861299,03:03
1,0.373501,0.231564,0.913396,03:02
2,0.299143,0.209658,0.920162,03:02
3,0.251663,0.188553,0.928281,03:03


This has trained only our head. Let's now unfreeze the whole model to make it all trainable, and use discriminative learning rates. This will give a lower learning rate for the body and a higher one for the head.

In [15]:
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-6,1e-4))

epoch,train_loss,valid_loss,accuracy,time
0,0.23514,0.188717,0.924222,04:15
1,0.233328,0.179823,0.932341,04:12
2,0.210744,0.172465,0.928958,04:12
3,0.224448,0.176144,0.930311,04:14
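
To show how a `slice` of learning rates can map onto parameter groups, here is a minimal sketch. It assumes the rates are spaced evenly on a multiplicative scale between the two ends of the slice, which is my understanding of what fastai does. `spread_lrs` is a hypothetical helper, not a fastai function.

```python
def spread_lrs(start, stop, n_groups):
    """Spread learning rates from `start` to `stop` across parameter groups."""
    if n_groups == 1:
        return [stop]
    # Constant ratio between the rates of neighbouring groups.
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

body_lr, head_lr = spread_lrs(1e-6, 1e-4, n_groups=2)
print(body_lr, head_lr)
```

With our 2 groups, the body gets the lower end of the slice and the head gets the upper end. With more groups, the intermediate groups would get rates in between.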


## Points to consider with architectures

There are a few points to consider when training models in practice. If you are running out of memory or time, then training a smaller model could be a good approach. If you are not training long enough to actually overfit, then you are probably not taking advantage of the capacity of your model.

So you should first try to get to the point where your model is overfitting.

![](https://github.com/pranath/blog/raw/master/images/practical_principles.png "Practical principles of applying deep learning in practice")

When faced with a model that overfits, many people reach for the wrong fix first, i.e. a smaller model or more regularisation. Using a smaller model should be one of the last steps you try, as it reduces the capacity of your model to actually learn what is needed.

A better approach is to try to use **more data**, for example by labelling more data or by using data augmentation. Mixup can be useful for this. Only once you are using much more data and are still overfitting should you consider more generalisable architectures. For example, adding batch norm could help here.

After this, if it's still not working, you could use regularisation, such as adding dropout to the last layers, or throughout the model. Only after these have failed should you consider using a smaller model.

## Conclusion

In this article we have looked at how to build custom fastai application architectures, using image model examples.