# Transfer Learning with ResNet

In this notebook we load a small dataset of dolphin and elephant images. We classify the images with CNNs and compare two approaches to see which works better:
1. Training a CNN from scratch versus
2. Fine-tuning a pretrained ResNet.

In [7]:
import torch
from torchvision import transforms
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as datasets

torch.manual_seed(0)


Let's load our data and have a look at the shape of some images:

In [8]:
dataset = datasets.ImageFolder(root='../data/anmials')
for i, data in enumerate(dataset):
    print(data)
    if i == 10:
        break

(<PIL.Image.Image image mode=RGB size=300x179 at 0x7FDD242741F0>, 0)
(<PIL.Image.Image image mode=RGB size=300x179 at 0x7FDD2427FA90>, 0)
(<PIL.Image.Image image mode=RGB size=300x166 at 0x7FDD2437E8E0>, 0)
(<PIL.Image.Image image mode=RGB size=300x259 at 0x7FDD242741F0>, 0)
(<PIL.Image.Image image mode=RGB size=300x225 at 0x7FDD2437E280>, 0)
(<PIL.Image.Image image mode=RGB size=300x277 at 0x7FDD2427FA90>, 0)
(<PIL.Image.Image image mode=RGB size=300x183 at 0x7FDD2437EEE0>, 0)
(<PIL.Image.Image image mode=RGB size=300x225 at 0x7FDD2427F670>, 0)
(<PIL.Image.Image image mode=RGB size=300x214 at 0x7FDD2437E8E0>, 0)
(<PIL.Image.Image image mode=RGB size=300x223 at 0x7FDD2427FA90>, 0)
(<PIL.Image.Image image mode=RGB size=300x277 at 0x7FDD2437E280>, 0)


All images have a width of 300 pixels but varying heights. To use them for transfer learning, we bring them to the standard input shape (224, 224), the format of ImageNet, the dataset on which most pretrained models are trained.

To get them into this shape, we first resize each image so that its shorter side (here the height) becomes 224 pixels; the width is scaled by the same factor, so the aspect ratio is preserved. We then take the 224×224 square at the center of the image.
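
The sizes produced by this resize-then-crop step can be checked with a minimal pure-Python sketch. It assumes torchvision's rule for an integer `size` (the shorter edge becomes `size`, the longer edge becomes `int(size * long / short)`) and a center crop whose offset is rounded; the helper names are hypothetical and nothing here calls torchvision.

```python
def resize_shorter_edge(width, height, size=224):
    """Return (width, height) after scaling the shorter edge to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop


# Sizes (width, height) taken from the printed PIL images above.
for w, h in [(300, 179), (300, 166), (300, 277)]:
    rw, rh = resize_shorter_edge(w, h)
    print((w, h), "->", (rw, rh), "crop box:", center_crop_box(rw, rh))
```

Note that an image taller than 224 pixels, such as 300x277, is scaled down rather than up; in every case the crop removes only the left and right margins.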

In [9]:
image_transforms = transforms.Compose([
    transforms.Resize(size=224),      # shorter side -> 224, aspect ratio kept
    transforms.CenterCrop(size=224),  # central 224 x 224 square
    transforms.ToTensor(),            # PIL image -> float tensor in [0, 1], shape (3, H, W)
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet per-channel mean and std
])
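
`ToTensor` scales pixel values to [0, 1], and `Normalize` then subtracts each channel's mean and divides by its standard deviation. The two lists are the ImageNet channel statistics, so the inputs match the distribution the pretrained network was trained on. A pure-Python sketch of the per-pixel computation (the helper `normalize_pixel` is hypothetical):

```python
# ImageNet per-channel statistics, as passed to transforms.Normalize above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Apply (x - mean) / std channel-wise to one RGB pixel with values in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, STD))


black = normalize_pixel((0.0, 0.0, 0.0))
white = normalize_pixel((1.0, 1.0, 1.0))
print("black ->", [round(v, 3) for v in black])
print("white ->", [round(v, 3) for v in white])
```

After normalization the values are no longer in [0, 1]; roughly, they span about -2 to +2.6 per channel.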

In [10]:
data = datasets.ImageFolder(root='../data/anmials', transform=image_transforms)

print(len(data), "data points")

129 data points


Next, we split the data into training and test sets and define the data loaders.

In [11]:
train_set, test_set = torch.utils.data.random_split(data, [100, 29])

batch_size = 10
trainloader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                          shuffle=True)
testloader = torch.utils.data.DataLoader(test_set, batch_size=29,
                                         shuffle=False)
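
The split sizes above are hard-coded for this dataset of 129 images. The plain-Python arithmetic below derives the training size from the chosen test size and counts the batches per epoch, assuming `DataLoader`'s default `drop_last=False` (the last, possibly incomplete batch is kept):

```python
import math

n_total = 129   # len(data) printed above
n_test = 29     # test split size passed to random_split
n_train = n_total - n_test
batch_size = 10

# With drop_last=False the number of batches is the ceiling of n / batch_size.
train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / 29)  # the test loader uses batch_size=29
print(n_train, "training images ->", train_batches, "batches per epoch")
print(n_test, "test images ->", test_batches, "batch")
```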

## Tasks:
### Task 1
Train a CNN from scratch to identify the object in the image (dolphin or elephant). For this, use the same CNN architecture as in notebook `5_CNN_CIFAR10.ipynb` (to make this work, you need to adjust some network parameters for this dataset). Train for 20 epochs on the training data and then compute the accuracy on the test data.

In [12]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
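        self.fc1 = nn.Linear(44944, 120)  # 16 channels x 53 x 53 after two conv + pool stages on 224 x 224 inputs
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 2)       # two classes: dolphin and elephant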

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
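
Compared with a network for 32x32 CIFAR-10 images, the parameters that depend on the data are the input size of `fc1` and the number of outputs of `fc3` (2 classes here). The value 44944 follows from the output-size formula for convolution and pooling layers, `(size + 2 * padding - kernel) // stride + 1`, sketched in plain Python below (the helper `conv_out` is hypothetical):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=5)            # conv1: 224 -> 220
size = conv_out(size, kernel=2, stride=2)  # pool:  220 -> 110
size = conv_out(size, kernel=5)            # conv2: 110 -> 106
size = conv_out(size, kernel=2, stride=2)  # pool:  106 -> 53
flat_features = 16 * size * size           # conv2 has 16 output channels
print(size, flat_features)
```

Starting from 32 instead of 224, the same steps give 5 and therefore 16 * 5 * 5 = 400 input features for `fc1`.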

In [13]:
import torch.optim as optim

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(20):
    net.train()
    epoch_loss = 0.0  # sum of the per-batch mean losses in this epoch
    for inputs, labels in trainloader:
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    print(f'{epoch + 1}, loss: {epoch_loss:.5f}')

print('Finished Training')

1, loss: 6.88152
2, loss: 6.62755
3, loss: 6.11002
4, loss: 5.09790
5, loss: 3.92423
6, loss: 3.11455
7, loss: 2.17701
8, loss: 1.75489
9, loss: 1.16424
10, loss: 0.91531
11, loss: 0.78906
12, loss: 0.47361
13, loss: 0.53425
14, loss: 0.29821
15, loss: 0.20747
16, loss: 0.13045
17, loss: 0.06855
18, loss: 0.04904
19, loss: 0.04147
20, loss: 0.03367
Finished Training
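
The printed number is the sum of the 10 per-batch mean losses of an epoch (100 training images, batch size 10), not an average. A quick sanity check, assuming the untrained network predicts roughly uniform probabilities over the two classes: each batch loss is then about ln 2, so the first epoch should print roughly 10 × ln 2 ≈ 6.93, close to the 6.88 above.

```python
import math

num_classes = 2
batches_per_epoch = 100 // 10  # 100 training images, batch_size = 10

# Cross-entropy of a uniform prediction over the classes is ln(num_classes).
uniform_batch_loss = math.log(num_classes)
expected_first_epoch = batches_per_epoch * uniform_batch_loss
print(f"{expected_first_epoch:.3f}")
```

Dividing the printed value by the number of batches gives the mean loss per batch instead.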


In [14]:
correct = 0
total = 0

net.eval()  # switch layers such as dropout and batch norm to inference behavior
with torch.no_grad():
    for images, labels in testloader:
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest score (logit) is the prediction
        predicted = torch.argmax(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the test images: {100 * correct // total} %')

Accuracy of the network on the test images: 86 %
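
With only 29 test images, the accuracy moves in steps of about 3.4 percentage points, and `100 * correct // total` truncates. A quick check of which number of correct predictions prints as 86 %:

```python
total = 29  # size of the test set

# Which counts of correct predictions print as 86 % with integer division?
matches = [c for c in range(total + 1) if 100 * c // total == 86]
print(matches)

# Exact percentage and the change caused by a single image.
print(f"{100 * matches[0] / total:.1f} %, one image = {100 / total:.1f} points")
```

So the from-scratch network misclassified 4 of the 29 test images.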


### Task 2
Instead of training a CNN from scratch, load a pretrained ResNet18 and train only its last layer (see the lecture slides for how this works). Train again for 20 epochs and compare the results.

In [15]:
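# Load ImageNet weights; on torchvision < 0.13 use resnet18(pretrained=True) instead.
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)

# freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False

# replace the 1000-class ImageNet head with a new, trainable 2-class layer
fc_inputs = model.fc.in_features
model.fc = nn.Linear(fc_inputs, 2)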
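
After freezing, only the newly created `model.fc` has `requires_grad=True`, because it is constructed after the loop. A pure-Python count of how few weights this leaves to train, compared with the from-scratch network of Task 1 (512 is `fc_inputs` for ResNet18; the helper names are hypothetical):

```python
def linear_params(in_features, out_features):
    """Weights plus biases of nn.Linear."""
    return in_features * out_features + out_features


def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of nn.Conv2d with a square kernel."""
    return in_channels * out_channels * kernel * kernel + out_channels


# New ResNet18 head: 512 input features, two output classes.
resnet_head = linear_params(512, 2)

# The from-scratch Net from Task 1, layer by layer.
scratch_net = (conv_params(3, 6, 5) + conv_params(6, 16, 5)
               + linear_params(44944, 120) + linear_params(120, 84)
               + linear_params(84, 2))
print("trainable in ResNet18 head:", resnet_head)
print("trainable in scratch Net:  ", scratch_net)
```

In the notebook itself, `sum(p.numel() for p in model.parameters() if p.requires_grad)` should give the same 1026 for the ResNet. Almost all of the scratch network's parameters sit in `fc1`.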

In [16]:
import torch.optim as optim

net = model

criterion = nn.CrossEntropyLoss()
# only the new fc layer is trainable, so only its parameters are optimized
optimizer = optim.SGD(net.fc.parameters(), lr=0.001, momentum=0.9)

for epoch in range(20):
    net.train()
    epoch_loss = 0.0  # sum of the per-batch mean losses in this epoch
    for inputs, labels in trainloader:
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    print(f'{epoch + 1}, loss: {epoch_loss:.5f}')

print('Finished Training')

1, loss: 6.21116
2, loss: 3.06055
3, loss: 2.19162
4, loss: 1.54697
5, loss: 1.20638
6, loss: 1.26148
7, loss: 1.34913
8, loss: 0.98523
9, loss: 0.76459
10, loss: 1.12265
11, loss: 0.80007
12, loss: 0.80555
13, loss: 1.16048
14, loss: 1.41405
15, loss: 0.62329
16, loss: 1.00320
17, loss: 1.26670
18, loss: 0.60839
19, loss: 0.56927
20, loss: 0.47967
Finished Training


In [17]:
correct = 0
total = 0

net.eval()  # use the stored batch-norm statistics instead of test-batch statistics
with torch.no_grad():
    for images, labels in testloader:
        outputs = net(images)
        predicted = torch.argmax(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the test images: {100 * correct // total} %')

Accuracy of the network on the test images: 100 %
