# Stride and padding

We have seen that we can control the behavior of a convolutional layer by specifying the number of filters and the size of each filter. For instance, to increase the number of nodes in a convolutional layer, you could increase the number of filters. To increase the size of the detected patterns, you could increase the size of your filter. But there are even more hyperparameters that we can tune.

One of these hyperparameters is referred to as the stride of the convolution. The stride is just the amount by which the filter slides over the image. In the last notebook, we used a stride of one: we moved the convolution window horizontally and vertically across the image one pixel at a time. A stride of one makes the convolutional layer roughly the same width and height as the input image. Look at the next picture:

<img src="assets/Stride1.png">

In that image, the purple convolutional layer is shown as a stack of feature maps. If we instead make the stride equal to two, the convolutional layer is about half the width and height of the image.

<img src="assets/Stride2.png">
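
To make "roughly the same" and "about half" concrete, the sketch below counts how many positions a filter can occupy along one dimension when it must stay entirely inside the image. The function name `output_size`, the 28-pixel image width, and the 3-pixel filter are illustrative assumptions, not values taken from the figures above.

```python
def output_size(input_size, filter_size, stride):
    """Number of filter positions along one dimension, with no padding.

    Assumes the filter must fit entirely inside the image.
    """
    return (input_size - filter_size) // stride + 1


# Hypothetical example: a 28-pixel-wide image and a 3-pixel-wide filter.
width_stride_1 = output_size(28, 3, stride=1)  # 26: roughly the input width
width_stride_2 = output_size(28, 3, stride=2)  # 13: about half the input width
print(width_stride_1, width_stride_2)
```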

We say roughly because it depends on what we do at the edges of the image. To see why the treatment of the edges matters, consider our toy example of a five-by-five grayscale image. Let's say we now have a different filter with a height and width of two, and that the stride is also two. Then, as before, we start with the filter in the top left corner of the image and calculate the value for the first node in the convolutional layer.

<img src="assets/filter2x2.png">

We then move the filter two units to the right and do the same. But when we move the filter two more units to the right, the filter extends outside the image. Do we still want to keep the corresponding convolutional node? For now, let's just populate the places where the filter extends outside with a question mark and proceed as planned.

So now, how do we deal with these nodes where the filter extended outside the image? We could, as a first option, just get rid of them. Note that if we choose this option, it is possible that our convolutional layer has no information about some regions of the image. In our example, this is the case for the right and bottom edges of the image.

<img src="assets/Filter2x2onImage.png">

As a second option, we could plan ahead for this case by padding the image with zeros to give the filter more space to move. Now, when we populate the convolutional layer, we get contributions from every region in the image. 

<img src="assets/StrideOption2.png">
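
The two options can be compared on the five-by-five toy image. The following is a minimal sketch in plain Python. The helper name `convolve`, the pixel values, the all-ones filter, and the choice to add zeros only on the right and bottom edges are assumptions made for illustration; deep learning libraries have their own padding conventions.

```python
def convolve(image, kernel, stride, pad=0):
    """Slide `kernel` over `image` and return the grid of weighted sums.

    `pad` rows and columns of zeros are appended to the bottom and right
    edges (an illustrative choice; libraries may pad differently).
    """
    width = len(image[0]) + pad
    grid = [row + [0] * pad for row in image]
    grid += [[0] * width for _ in range(pad)]

    k = len(kernel)
    output = []
    # Only keep positions where the filter fits inside the (padded) image.
    for top in range(0, len(grid) - k + 1, stride):
        out_row = []
        for left in range(0, width - k + 1, stride):
            total = sum(
                grid[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 5x5 grayscale image: every pixel is 1, so each output value
# counts how many real pixels the filter covered at that position.
image = [[1] * 5 for _ in range(5)]
kernel = [[1, 1], [1, 1]]

dropped = convolve(image, kernel, stride=2)              # option 1: discard
with_padding = convolve(image, kernel, stride=2, pad=1)  # option 2: zero-pad

print(dropped)       # [[4, 4], [4, 4]]
print(with_padding)  # [[4, 4, 2], [4, 4, 2], [2, 2, 1]]
```

With the first option the output is 2x2 and the last row and column of the image never contribute. With zero padding the output is 3x3 and every pixel contributes, although the edge nodes see fewer real pixels.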

# Pooling layers

We are now ready to introduce the second and final type of layer that we will need before building our own convolutional neural networks. These so-called pooling layers often take convolutional layers as input.

Recall that a convolutional layer is a stack of feature maps where we have one feature map for each filter. A complicated dataset with many different object categories will require a large number of filters, each responsible for finding a pattern in the image. 

<img src="assets/MultiCNNs.png">

More filters means a bigger stack, which means that the dimensionality of our convolutional layers can get quite large. Higher dimensionality means we will need to use more parameters, which can lead to overfitting. Thus, we need a method for reducing this dimensionality. This is the role of pooling layers within a convolutional neural network. We will focus on two different types of pooling layers.

The first type is a max pooling layer. Max pooling layers take a stack of feature maps as input. In the following image, we have enlarged and visualized all three of the feature maps. As with convolutional layers, we will define a window size and stride. In this case, we will use a window size of two and a stride of two.

<img src="assets/CNNwithMap.png">

We will work with each feature map separately. Let's begin with the first feature map. We start with our window in the top left corner of the image. The value of the corresponding node in the max pooling layer is calculated by just taking the maximum of the pixels contained in the window. In our case, we had a 1, 9, 5, and 4 in our window, so nine was the maximum. If we continue this process and do it for all of our feature maps, the output is a stack with the same number of feature maps, but each feature map has been reduced in width and height.

<img src="assets/CNNwithMapResult.png">

In this case, the width and height are half of that of the previous convolutional layer. 
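
The max pooling step described above can be sketched in plain Python. The helper name `max_pool` and most of the 4x4 feature map are hypothetical; only the top-left window (1, 9, 5, 4) is taken from the walkthrough.

```python
def max_pool(feature_map, window=2, stride=2):
    """Return the maximum of each window as it slides over `feature_map`."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[top + i][left + j]
                for i in range(window)
                for j in range(window)
            )
            for left in range(0, width - window + 1, stride)
        ]
        for top in range(0, height - window + 1, stride)
    ]


# Hypothetical 4x4 feature map; the top-left window holds 1, 9, 5, and 4.
feature_map = [
    [1, 9, 6, 4],
    [5, 4, 7, 7],
    [5, 1, 2, 9],
    [6, 7, 6, 0],
]

pooled = max_pool(feature_map)
print(pooled)  # [[9, 7], [7, 9]]: half the width and height of the input
```

Applying the function to each feature map in the stack separately keeps the number of feature maps unchanged while halving their width and height.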

## Other kinds of pooling

The other type of pooling is average pooling. It is worth noting that some architectures choose to use [average pooling](https://pytorch.org/docs/stable/nn.html#avgpool2d), which averages the pixel values in a given window. So in a 2x2 window, this operation will see four pixel values and return a single value, the average of those four, as output.

This kind of pooling is typically not used for image classification problems, because max pooling is better at picking out the most important details about edges and other features in an image. Average pooling is instead used in applications for which smoothing an image is preferable.
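
For comparison with max pooling, here is a minimal average pooling sketch written in plain Python rather than PyTorch. The helper name `average_pool` and the 4x4 feature map are hypothetical values chosen for illustration.

```python
def average_pool(feature_map, window=2, stride=2):
    """Return the mean of each window as it slides over `feature_map`."""
    height, width = len(feature_map), len(feature_map[0])
    area = window * window
    return [
        [
            sum(
                feature_map[top + i][left + j]
                for i in range(window)
                for j in range(window)
            ) / area
            for left in range(0, width - window + 1, stride)
        ]
        for top in range(0, height - window + 1, stride)
    ]


# Hypothetical 4x4 feature map; the top-left window holds 1, 9, 5, and 4.
feature_map = [
    [1, 9, 6, 4],
    [5, 4, 7, 7],
    [5, 1, 2, 9],
    [6, 7, 6, 0],
]

averaged = average_pool(feature_map)
print(averaged)  # [[4.75, 6.0], [4.75, 4.25]]
```

In the top-left window the strong response of 9 is smoothed down to 4.75, whereas max pooling would keep the 9. This illustrates why averaging suits smoothing while taking the maximum preserves the most prominent detail.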