<img style="float: right;" src="../htwlogo.jpg">

# Convolutions and Dataloaders

**Author**: _Erik Rodner_ <br>
**Lecture**: Computer Vision and Machine Learning I

In this notebook, we visualize the output of typical layers in a convolutional neural network. We also introduce a predefined dataset and data loader.

In [None]:
# import some dependencies
import torchvision
import torch
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

import torch.optim as optim
import time
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from collections import OrderedDict
torch.set_printoptions(linewidth=120)
from torchvision.datasets import CIFAR10

### Defining the dataset and the data loader

We will use the predefined CIFAR-10 dataset from torchvision together with PyTorch's `DataLoader` to get some example images.

In [None]:
train_dataset = CIFAR10 (root="./cifar10",
                         train=True, 
                         download=True, 
                         transform=transforms.Compose([transforms.ToTensor()]))

In [None]:
train_data_loader = torch.utils.data.DataLoader(train_dataset, batch_size=1, num_workers=0)

Fetch the first image from the data loader (as a batch of size 1).

In [None]:
input_batch, labels = next(iter(train_data_loader))
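To see which shapes a data loader with `batch_size=1` returns without downloading anything, here is a minimal sketch that uses a random stand-in dataset. `toy_images`, `toy_targets` and `toy_loader` are hypothetical names and not part of the notebook.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR-10: 8 random RGB images of size 32x32
toy_images = torch.rand(8, 3, 32, 32)
toy_targets = torch.randint(0, 10, (8,))
toy_dataset = TensorDataset(toy_images, toy_targets)

# Same loader settings as above: one image per batch, no shuffling
toy_loader = DataLoader(toy_dataset, batch_size=1, num_workers=0)
toy_batch, toy_labels = next(iter(toy_loader))

print(toy_batch.shape)   # torch.Size([1, 3, 32, 32])
print(toy_labels.shape)  # torch.Size([1])
```

The leading dimension of size 1 is the batch dimension, which is why the notebook indexes `input_batch[0,...]` before plotting.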

### Convolution operation

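Let's define a $3 \times 3 \times 3$ mean filter as a convolutional layer, i.e. a filter size of 3 in $x$, in $y$ as well as across the channels. Since no padding is used, the $32 \times 32$ input image yields a $30 \times 30$ output with a single channel.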

In [None]:
conv_parameters = torch.ones((1,3,3,3))/27
conv_result = F.conv2d(input_batch, weight=conv_parameters)
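The following sketch checks on a random input that this filter really computes the mean over each $3 \times 3 \times 3$ window. The variables `x`, `mean_kernel`, `y` and `manual_mean` are illustrative names chosen for this example.

```python
import torch
import torch.nn.functional as F

# Random stand-in for one RGB image of size 32x32
x = torch.rand(1, 3, 32, 32)

# Weight shape: (output channels, input channels, kernel height, kernel width)
mean_kernel = torch.ones((1, 3, 3, 3)) / 27
y = F.conv2d(x, weight=mean_kernel)
print(y.shape)  # torch.Size([1, 1, 30, 30])

# The top-left output value equals the mean of the top-left 3x3 patch over all channels
manual_mean = x[0, :, 0:3, 0:3].mean()
print(torch.allclose(y[0, 0, 0, 0], manual_mean, atol=1e-5))

# With padding=1 the spatial size of the input is preserved
y_padded = F.conv2d(x, weight=mean_kernel, padding=1)
print(y_padded.shape)  # torch.Size([1, 1, 32, 32])
```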

In [None]:
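# convert the input tensor back to a PIL image for display
input_pil = transforms.ToPILImage()(input_batch[0,...])
# the filter has a single output channel, which we show as a grayscale image
conv_result_np = np.array(conv_result[0,0,...])
plt.figure()
plt.subplot(1,2,1)
plt.imshow(input_pil)
plt.title("input image")
plt.subplot(1,2,2)
plt.imshow(conv_result_np, cmap=plt.cm.gray)
plt.title("mean filter result")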

### Maximum pooling operation

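We now visualize the result of a maximum pooling operation. Max pooling is a non-linear filter operation: each output value is the largest value in its window. It could therefore also be used, for example, to reduce dark impulse ("pepper") noise. With `kernel_size=2` and `stride=2`, it additionally halves the spatial resolution.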

In [None]:
pooling_result = F.max_pool2d(input_batch, kernel_size=2, stride=2)
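A tiny worked example may help to see what max pooling computes and why it is non-linear. The tensors `grid`, `a` and `b` are made up for this sketch.

```python
import torch
import torch.nn.functional as F

# 4x4 single-channel input with the values 0..15, row by row
grid = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
pooled = F.max_pool2d(grid, kernel_size=2, stride=2)
print(pooled[0, 0])
# tensor([[ 5.,  7.],
#         [13., 15.]])

# Non-linearity: pooling a sum differs from the sum of the pooled inputs
a = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])
b = torch.tensor([[[[0.0, 0.0], [0.0, 1.0]]]])
pool_of_sum = F.max_pool2d(a + b, kernel_size=2)
sum_of_pools = F.max_pool2d(a, kernel_size=2) + F.max_pool2d(b, kernel_size=2)
print(pool_of_sum.item(), sum_of_pools.item())  # 1.0 2.0
```

Each non-overlapping $2 \times 2$ block is reduced to its maximum, so the $4 \times 4$ input becomes a $2 \times 2$ output.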

In [None]:
plt.figure(figsize=(15,10))
plt.subplot(1,4,1)
plt.imshow(input_pil)
for i in range(3):
    plt.subplot(1,4,i+2)
    pooling_result_np = np.array(pooling_result[0,i,...])
    plt.imshow(pooling_result_np, cmap=plt.cm.gray)
    plt.title(f"channel {i} of the max-pool result")
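The remark above about impulse noise can be illustrated with a small sketch. It assumes a max filter with `stride=1` and `padding=1` (so the image size is kept) rather than the downsampling setting used in this notebook, and all variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

# Constant gray 8x8 image with two isolated dark ("pepper") pixels
pepper = torch.full((1, 1, 8, 8), 0.5)
pepper[0, 0, 2, 2] = 0.0
pepper[0, 0, 5, 6] = 0.0

# 3x3 max filter that keeps the image size
pepper_filtered = F.max_pool2d(pepper, kernel_size=3, stride=1, padding=1)
print(bool((pepper_filtered == 0.5).all()))  # True: the dark pixels are removed

# A bright ("salt") pixel is spread to its neighbors instead of being removed
salt = torch.full((1, 1, 8, 8), 0.5)
salt[0, 0, 4, 4] = 1.0
salt_filtered = F.max_pool2d(salt, kernel_size=3, stride=1, padding=1)
print(int((salt_filtered == 1.0).sum()))  # 9
```

This is why the noise reduction only works for a certain kind of impulse noise: the maximum suppresses isolated dark outliers but enlarges bright ones.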