<img style="float: right;" src="../bvlecture_exercises/htwlogo.jpg">

# Exercise: Where is Waldo?

**Author**: _Erik Rodner_ <br>
**Lecture**: Computer Vision and Machine Learning I

In this exercise, we design an awkward convolutional neural network purely for template matching, i.e., finding a known structure in an image by sliding the template over it.

**Can you spot the errors in architecture, the optimization, or the evaluation code?**


In [None]:
# import some dependencies
import torchvision
import torch
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

import torch.optim as optim
import time
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from skimage.feature import peak_local_max
from collections import OrderedDict
torch.set_printoptions(linewidth=120)

### Defining the simulation and the corresponding data iterator

Our dataset is generated synthetically:
1. **Input image generation**:
    1. A small icon is placed at a random location in a larger image.
    2. Salt-and-pepper noise is added.
2. **Target image generation**:
    1. A 2D Gaussian of fixed size is placed at the position of the icon (done by convolving an impulse with a large Gaussian filter).
    
The 2D Gaussian simulates a filter output whose highest peak lies at the icon location; this is exactly what we want the model to output after training. The dataset nominally consists of 128 examples, but since every example is generated randomly on the fly, 128 is an arbitrary value that simply defines the length of an epoch.

In [None]:
class MyIterableDataset(torch.utils.data.Dataset):
    def __init__(self, icon_fn="waldo_small.jpg"):
        """ Initialization of the dataset and preparing it """
        super().__init__()

        img_waldo = Image.open(icon_fn)

        # some preprocessing independent of the random images
        # later on - no gradients required here
        with torch.no_grad():
            # transform the PIL image to a tensor 
            self.input_tensor = transforms.ToTensor()(img_waldo)
            # add a black border around the icon - this defines the size of the large image
            self.input_tensor = transforms.functional.pad(self.input_tensor, 32, fill=0)
            # ... the icon is now placed in the middle
            
            # target image should have the same size as input image
            self.target_tensor = torch.zeros((1,self.input_tensor.shape[1],self.input_tensor.shape[2]))
            # add a single white pixel in the middle
            self.target_tensor[:, self.target_tensor.shape[1]//2, self.target_tensor.shape[2]//2] = 1
            # convolve with a Gaussian filter generating a 2D Gaussian in the middle
            self.target_tensor = transforms.GaussianBlur(61, (10, 10))(self.target_tensor)
            # normalize the target image
            self.target_tensor = self.target_tensor / torch.max(self.target_tensor)
            
    def __len__(self):
        """ Return an arbitrary length that defines an epoch """
        return 128
        
    def __getitem__(self, idx):
        """ return a single item of the dataset """
        
        # sample some translation parameters
        # we could also change the code here to allow for rotation and scaling
        # note: get_params expects img_size as [width, height], not the tensor shape (C, H, W)
        affine_params = transforms.RandomAffine.get_params(degrees=(0, 0),
                                                           translate=(0.3, 0.3),
                                                           scale_ranges=(1, 1),
                                                           shears=(0, 0),
                                                           img_size=[self.target_tensor.shape[2],
                                                                     self.target_tensor.shape[1]])

        # apply the same transformation to both image and target
        transformed_input = transforms.functional.affine(self.input_tensor, *affine_params)
        transformed_target = transforms.functional.affine(self.target_tensor, *affine_params)
        
        # add salt-and-pepper noise to the input image
        random_mask = torch.rand_like(transformed_input)
        transformed_input[random_mask<0.05] = 0
        transformed_input[random_mask>0.95] = 1
        
        return transformed_input, transformed_target
        
        
# the dataset object
train_dataset = MyIterableDataset()
# the data loader that returns the data in batches
train_data_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, num_workers=0)

### Data visualization

Let's visualize a batch of data.

In [None]:
# Get the first batch
input_batch, target_batch = next(iter(train_data_loader))

In [None]:
plt.figure(figsize=(20,10))
grid_inputs = torchvision.utils.make_grid(nrow=8, tensor=input_batch)
plt.imshow(np.transpose(grid_inputs, axes=(1,2,0)))
plt.title("input images");
plt.figure(figsize=(20,10))
grid_targets = torchvision.utils.make_grid(nrow=8, tensor=target_batch)
plt.imshow(np.transpose(grid_targets, axes=(1,2,0)))
plt.title("target images");

### Define the model

We use the simplest ConvNet possible: a single convolution layer followed by a rectified linear unit (ReLU).
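Why can a single convolution layer do template matching at all? `F.conv2d` computes a cross-correlation, so using the template itself as the kernel gives a response map that peaks where the image content aligns with the template. A minimal sketch, assuming a hypothetical random 7x7 RGB "icon" pasted into an otherwise black 64x64 image (no noise):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
template = torch.rand(3, 7, 7)            # hypothetical 3-channel 7x7 "icon"
image = torch.zeros(1, 3, 64, 64)         # batch containing one black image
top, left = 20, 35                        # where the icon is pasted
image[0, :, top:top + 7, left:left + 7] = template

# F.conv2d is a cross-correlation, so the template itself serves as the kernel;
# padding=3 keeps the 64x64 size so that output pixels align with input pixels
response = F.conv2d(image, template.unsqueeze(0), padding=3)   # shape (1, 1, 64, 64)
peak = torch.argmax(response[0, 0])
row, col = divmod(peak.item(), response.shape[-1])
print(row, col)  # center of the pasted icon
```

In the exercise, the kernel is not given but learned, so gradient descent has to discover something template-like from the Gaussian targets.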

In [None]:
# define the network as a PyTorch module
class Network(nn.Module): 
    def __init__(self):
        super().__init__() 
        
        # defining the layers of the convnet by defining
        # a list of transformations that will be applied later on
        self.layers = [
             ("conv1", nn.Conv2d(in_channels=3, out_channels=1, kernel_size=(5,5), padding=2)),
             ("relu1", nn.ReLU()) 
        ]
        
        # the transformation is a composition of all previous transformations
        self.seq = torch.nn.Sequential( OrderedDict(self.layers) )
        
        
    def forward(self, t): 
        # forward pass
        return self.seq(t) 
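The prediction must have the same spatial size as the target for the pixel-wise loss. With stride 1, a convolution produces `H_out = H + 2 * padding - kernel_size + 1`, so `padding = kernel_size // 2` preserves the size for any odd kernel. A quick sketch, assuming a hypothetical 96x96 input and a few arbitrary kernel sizes:

```python
import torch
import torch.nn as nn

x = torch.rand(8, 3, 96, 96)  # hypothetical batch of 8 RGB images
shapes = {}
for k in (5, 15, 31):
    # stride 1: H_out = H + 2 * padding - k + 1, so padding = k // 2 keeps H for odd k
    conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=k, padding=k // 2)
    shapes[k] = tuple(conv(x).shape)
    print(k, shapes[k])
```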

### Optimization loop
In contrast to previous notebooks, we use a PyTorch optimizer that provides
Adam, a variant of stochastic gradient descent with adaptive per-parameter step sizes.
The ```optimizer``` performs the gradient update for us, which we previously implemented manually.
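To see what `optimizer.step()` does, the sketch below compares one step of `torch.optim.Adam` against the textbook Adam update written out by hand. It assumes a toy quadratic loss on a hypothetical 5-element parameter vector and the default betas `(0.9, 0.999)`.

```python
import torch

torch.manual_seed(0)
lr, eps = 0.1, 1e-8
w = torch.randn(5, requires_grad=True)       # toy parameter vector
w_before = w.detach().clone()

optimizer = torch.optim.Adam([w], lr=lr, eps=eps)   # default betas=(0.9, 0.999)
loss = (w ** 2).sum()                          # toy quadratic loss, gradient 2 * w
optimizer.zero_grad()
loss.backward()
g = w.grad.detach().clone()
optimizer.step()

# manual first Adam step: m = (1 - b1) * g and v = (1 - b2) * g**2;
# after bias correction this gives m_hat = g and v_hat = g**2
m_hat, v_hat = g, g ** 2
w_manual = w_before - lr * m_hat / (v_hat.sqrt() + eps)

print(torch.allclose(w.detach(), w_manual, atol=1e-6))
print((w_before - w.detach()).abs())  # step size per entry
```

A consequence worth keeping in mind: on the first step, `m_hat / sqrt(v_hat)` is just the sign of the gradient, so every parameter moves by almost exactly `lr`, regardless of how large or small its gradient is.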

In [None]:
# initialize the network
cnn_model = Network()
# define the optimizer and its parameters

optimizer = optim.Adam(lr=0.1, params=cnn_model.parameters())

# training loop
for epoch in range(5):
    start_time = time.time()
    total_loss = 0
    # loop through all batches of the data
    for batch in train_data_loader:
        input_imgs, target_imgs = batch
        pred_imgs = cnn_model(input_imgs) # get preds
        
        # as a loss we will use a quadratic loss function here 
        # however, this is likely not a good choice
        loss = F.mse_loss(pred_imgs, target_imgs)
        optimizer.zero_grad() # zero grads
        loss.backward() # calculates gradients 
        optimizer.step() # update the weights
        
        total_loss += loss.item()
        
    end_time = time.time() - start_time    
    print("Epoch no.",epoch+1 ,"|total_loss: ", total_loss, "| epoch_duration: ", round(end_time,2),"sec")
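The comment in the loop says that the quadratic loss is likely not a good choice. One reason: the targets are almost entirely zero, so MSE barely rewards getting the peak in the right place. The sketch below, assuming an analytic 128x128 Gaussian target with sigma 10 (similar to, but not identical to, the `GaussianBlur` output used above), compares an all-zero prediction with a perfectly shaped prediction whose peak is 20 pixels off.

```python
import torch
import torch.nn.functional as F

size, sigma = 128, 10.0
coords = torch.arange(size, dtype=torch.float32)
yy, xx = torch.meshgrid(coords, coords, indexing="ij")

def gaussian_map(cy, cx):
    """Unnormalized 2D Gaussian with peak value 1 at (cy, cx)."""
    return torch.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

target = gaussian_map(64, 64)
all_zero = torch.zeros_like(target)      # predicts "no Waldo anywhere"
shifted = gaussian_map(64, 84)           # right shape, peak 20 px to the right

mse_zero = F.mse_loss(all_zero, target).item()
mse_shifted = F.mse_loss(shifted, target).item()
print(f"all zeros: {mse_zero:.4f}, shifted peak: {mse_shifted:.4f}")
```

Here the all-zero map obtains the lower loss (roughly 0.019 vs. 0.024), even though the shifted map still has a clear peak close to the correct position.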

### Testing on the training set

We now evaluate our model on the training set... wait, isn't this illegal?
Since every batch is generated randomly on the fly, the evaluation images are fresh samples rather than stored training images, so it is not :)

In [None]:
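input_batch, target_batch = next(iter(train_data_loader))
# inference only: no gradients are needed; calling the module (rather than .forward) also runs registered hooks
with torch.no_grad():
    pred_batch = cnn_model(input_batch)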

In [None]:
total_err = 0
for i in range(target_batch.shape[0]):
    with torch.no_grad():
        target_peak = peak_local_max(np.array(target_batch[i,0,...]), num_peaks=1)
        pred_peak = peak_local_max(np.array(pred_batch[i,0,...]), num_peaks=1)
        
    if pred_peak.shape != (1,2):
        pred_peak = np.array([[0,0]])
    euclid_err = np.linalg.norm(target_peak-pred_peak)
    total_err += euclid_err
    print (f"Example no. {i}: euclidean distance between peaks is {euclid_err:.2f}px")
    
total_err /= target_batch.shape[0]    
print (f"Average error is {total_err:.2f}px")
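The evaluation above relies on `skimage.feature.peak_local_max` returning an array of `[row, col]` coordinates. The sketch below, assuming hypothetical Gaussian maps with a known 3-row, 4-column offset, shows the returned format, the resulting Euclidean error, and why the `(1, 2)` shape check is needed: a constant map (for example, a ReLU output that is zero everywhere) has no local maximum at all.

```python
import numpy as np
from skimage.feature import peak_local_max

rows, cols = np.mgrid[0:40, 0:50]
# hypothetical target map: Gaussian bump centered at row 12, column 30
target_map = np.exp(-((rows - 12) ** 2 + (cols - 30) ** 2) / (2 * 5.0 ** 2))
# hypothetical prediction: the same bump shifted by 3 rows and 4 columns
pred_map = np.exp(-((rows - 15) ** 2 + (cols - 34) ** 2) / (2 * 5.0 ** 2))

target_peak = peak_local_max(target_map, num_peaks=1)   # array of shape (1, 2): [[row, col]]
pred_peak = peak_local_max(pred_map, num_peaks=1)
error = np.linalg.norm(target_peak - pred_peak)          # Euclidean distance in pixels
print(target_peak, pred_peak, error)

# a constant map has no local maximum, so no coordinates are returned
flat_peaks = peak_local_max(np.zeros((40, 50)), num_peaks=1)
print(flat_peaks.shape)
```

Note also that `peak_local_max` excludes peaks near the image border by default (`exclude_border=True`), which matters when the icon is translated close to an edge.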

In [None]:
plt.figure(figsize=(20,10))
grid_inputs = torchvision.utils.make_grid(nrow=8, tensor=input_batch)
plt.imshow(np.transpose(grid_inputs, axes=(1,2,0)))
plt.title("input images");
plt.figure(figsize=(20,10))
grid_targets = torchvision.utils.make_grid(nrow=8, tensor=target_batch)
plt.imshow(np.transpose(grid_targets, axes=(1,2,0)))
plt.title("target images");
plt.figure(figsize=(20,10))
grid_preds = torchvision.utils.make_grid(nrow=8, tensor=pred_batch)
plt.imshow(np.transpose(grid_preds, axes=(1,2,0)))
plt.title("predictions");

### Visualizing the parameters of the first convolutional layer

Let us visualize the learned filter parameters.

In [None]:
plt.figure()
with torch.no_grad():
    conv_parameters = np.array(cnn_model.seq[0].weight[0,0,:,:])
plt.imshow(conv_parameters, cmap=plt.cm.gray)
plt.colorbar()
plt.show()
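The convolution weight has the layout `(out_channels, in_channels, kH, kW)`, so indexing `weight[0,0,:,:]` shows only the filter applied to the first (red) input channel. A minimal sketch for turning all three channels into one RGB image, assuming an untrained stand-in layer with the same shape as `conv1` (in the notebook, `cnn_model.seq[0]` would take its place):

```python
import torch
import torch.nn as nn

# stand-in for cnn_model.seq[0]: same shape as the notebook's conv1 layer (untrained here)
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=(5, 5), padding=2)
# weight layout is (out_channels, in_channels, kH, kW) = (1, 3, 5, 5)
weights = conv.weight.detach()[0]            # (3, 5, 5): one 5x5 filter per input channel
rgb = weights.permute(1, 2, 0)               # (5, 5, 3) for display as an RGB image
# min-max normalize to [0, 1], since imshow clips RGB floats outside that range
rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min())
print(weights.shape, rgb.shape)
```

The result can be shown with `plt.imshow(rgb.numpy())`, or each `weights[c]` can be plotted separately in grayscale to keep the per-channel sign information that the min-max normalization hides.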