## **Valid vs Same convolutions**


| Type                | Definition                                                                                                                   | Output Size                                                        | Padding                                                                                                     | Advantages                                                                                                        | Limitations                                                                                 | Use Cases                                                                                                                           |
|---------------------|------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| Valid Convolutions (a.k.a. no-padding convolutions)  | Convolutions in which the filter is applied only at positions where it lies entirely inside the input; no padding is added. | Shrinks with each layer: with stride 1, each spatial dimension loses (filter size - 1) pixels. | None. The filter never extends past the image border, so border pixels are covered by fewer filter positions than central pixels. | Introduces no artificial data; every output is computed from real image pixels. Suitable when the integrity of the original image matters. | Limits network depth because the spatial dimensions shrink at every layer. Information near the borders is underrepresented and may be lost. | When the relevant content lies in the central part of the image, or when a shrinking feature map is acceptable. |
| Padded Convolutions (a.k.a. same convolutions) | Convolutions in which padding is added around the input so that the filter can also be centered on border pixels. With stride 1 (and, for an odd filter size, (filter size - 1)/2 pixels of padding per side), the output feature map has the same spatial dimensions as the input. | Same as the input when the stride is 1; does not shrink across layers. | Added around the border of the image: typically zero-padding, though other schemes such as replicate or reflect padding are also used. | Allows deeper architectures because the spatial dimensions of the feature map are preserved. Border pixels contribute to more output positions than in valid convolutions, so border information is better retained. | The padded values are artificial data that can affect learning and may cause edge artifacts. | When border information is important, or when the feature-map size must stay constant across many stacked convolutional layers. |


The term "same convolution" refers to the padding scheme that gives the output feature map the same spatial dimensions as the input when the stride is 1. The name is literal: the spatial size is the same before and after the convolution. For an odd filter size k, this is achieved by padding each side of the input with (k - 1)/2 pixels, e.g. 1 pixel for a 3x3 filter.
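A minimal pure-Python sketch makes the difference concrete. It implements a single-channel, stride-1 2D convolution (strictly, the cross-correlation that CNN layers compute) with optional zero-padding. The helper names `pad2d` and `conv2d`, and the all-ones 32x32 input and 3x3 kernel, are hypothetical and chosen only for illustration. The "same" call pads each side by `(3 - 1) // 2 = 1` pixel.

```python
def pad2d(image, p):
    """Surround a 2D list with p rows/columns of zeros on every side."""
    if p == 0:
        return [row[:] for row in image]
    width = len(image[0]) + 2 * p
    zero_row = [0] * width
    padded = [zero_row[:] for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend(zero_row[:] for _ in range(p))
    return padded


def conv2d(image, kernel, padding=0):
    """Stride-1 2D cross-correlation with a square kernel (what CNN layers call convolution)."""
    x = pad2d(image, padding)
    k = len(kernel)
    out_h = len(x) - k + 1
    out_w = len(x[0]) - k + 1
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1] * 32 for _ in range(32)]   # hypothetical 32x32 input of ones
kernel = [[1] * 3 for _ in range(3)]    # hypothetical 3x3 kernel of ones

valid = conv2d(image, kernel, padding=0)
same = conv2d(image, kernel, padding=(3 - 1) // 2)

print(len(valid), len(valid[0]))  # 30 30
print(len(same), len(same[0]))    # 32 32
print(same[0][0], same[1][1])     # 4 9
```

The corner value of 4 (rather than 9) in the same-padded output shows the effect of the artificial zeros at the border.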

Here's why this terminology isn't as weird as it might seem:

1. **Descriptive Naming**: The term "same" intuitively describes the outcome of the convolution: the output size is the "same" as the input size. This is in contrast to "valid" convolution, where the output is smaller than the input whenever the filter is larger than 1x1.

2. **Simplicity**: It simplifies the explanation and implementation of network architectures. When designing or reading about neural networks, knowing that a convolutional layer uses "same" padding immediately indicates that, at stride 1, the layer's output matches its input in width and height.

3. **Consistency Across Implementations**: The term "same" has become a standard padding option in deep learning libraries and frameworks such as TensorFlow and Keras. This consistency gives practitioners a common vocabulary.

4. **Practicality**: In practice, maintaining the same dimensionality is often desired, especially when you want to build deep networks without reducing the spatial resolution of the feature maps. This type of padding allows for deep networks that can retain detailed spatial information throughout the layers.

5. **Convention**: Over time, "same" has become the conventional term for this padding technique. While it may have seemed arbitrary at first, its widespread use has cemented its place in the machine learning and computer vision lexicon.

In conclusion, the term "same convolution" may seem a bit odd initially, but it serves as a straightforward descriptor of what the padding does to the spatial dimensions of the convolutional output, making it a practical term that has been widely adopted in the field.

**CONVOLUTION**

In convolutional neural networks, the relationship between input size and output size for a convolutional layer is governed by the following formula:

$${Output\ Size} = \left\lfloor \frac{{Input\ Size} - {Filter\ Size} + 2 \times {Padding}}{{Stride}} \right\rfloor + 1$$

Where:

- **Input Size** is the height or width of the input image (assuming a square input; otherwise, apply the formula to height and width separately).
- **Filter Size** (also known as Kernel Size) is the height or width of the filter (kernel) used in the convolutional layer.
- **Padding** is the number of pixels of padding added to each side of the input image before the convolution (hence the factor of 2).
- **Stride** is the number of pixels by which the filter moves during the convolution.
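The formula translates directly into a one-line helper. This is a sketch; the function name `conv_output_size` is hypothetical. Integer floor division (`//`) discards a final position where the filter would not fit, which only matters when the stride does not divide the numerator evenly.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension.

    padding is the number of pixels added to EACH side of the input,
    which is why it appears as 2 * padding. Floor division drops a
    final partial window.
    """
    return (input_size - filter_size + 2 * padding) // stride + 1


print(conv_output_size(32, 3))                        # 30 (valid, stride 1)
print(conv_output_size(32, 3, padding=1))             # 32 (same, stride 1)
print(conv_output_size(32, 3, padding=1, stride=2))   # 16
```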

In the example considered here, neither padding nor stride is set explicitly, so the default values apply. The default stride is typically 1 and the default padding typically 0 (these are, for instance, the defaults of PyTorch's `nn.Conv2d`). Thus, for each dimension (height and width), the formula simplifies to:

$${Output\ Size} = {Input\ Size} - {Filter\ Size} + 1$$

Applying the simplified formula to this example:

- **Input Size (H and W)**: 32 (height and width of the input)
- **Filter Size**: 3 (the kernel size of the convolutional layer)
- **Padding**: 0 (no padding mentioned, assuming the default)
- **Stride**: 1 (assuming the default stride)

The output size for each dimension would therefore be:

$${Output\ Size} = 32 - 3 + 1 = 30$$

Hence, both the output height and width are 30, which is consistent with the output tensor shape `[1, 8, 30, 30]` (in NCHW layout: a batch of 1, 8 output channels, and a 30x30 feature map).
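The same arithmetic accounts for the full output shape. The sketch below assumes an NCHW layout (batch, channels, height, width), a layer with 8 output channels, and an input with 3 channels; the input channel count is an assumption, since it does not affect the output shape. The function name `conv2d_output_shape` is hypothetical.

```python
def conv2d_output_shape(input_shape, out_channels, kernel_size, padding=0, stride=1):
    """Output shape (N, C_out, H_out, W_out) of a 2D conv layer on an NCHW input.

    Each of the out_channels filters spans all input channels and produces
    one feature map, so C_in does not appear in the result.
    """
    n, _c_in, h, w = input_shape

    def size(dim):
        return (dim - kernel_size + 2 * padding) // stride + 1

    return (n, out_channels, size(h), size(w))


# Hypothetical input: batch of 1, 3 channels (assumed), 32x32 pixels.
print(conv2d_output_shape((1, 3, 32, 32), out_channels=8, kernel_size=3))
# (1, 8, 30, 30)
```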

**POOLING**

The output size of a pooling operation is given by a formula very similar to the one used for convolutional layers:

$${Output\ Size} = \left\lfloor \frac{{Input\ Size} - {Pool\ Size} + 2 \times {Padding}}{{Stride}} \right\rfloor + 1$$

Where:

- **Input Size** is the height or width of the input.
- **Pool Size** is the size of the window over which the pooling operation is done.
- **Padding** is the number of pixels of padding added to each side of the input.
- **Stride** is the number of pixels by which the pooling window moves after each operation.

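The floor function $\lfloor\ \rfloor$ is applied because the output size must be an integer: if the last window position would run past the (padded) input, it is dropped, so any fractional part is truncated. (Some frameworks can round up instead, e.g. PyTorch's pooling layers with `ceil_mode=True`.)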

In many frameworks, a pooling layer's stride defaults to the pool size and its padding to 0 unless specified otherwise. For example, with a pool size of 2 and neither stride nor padding specified, the stride is 2 and the padding is 0.
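The pooling formula, including the floor and the convention that an unspecified stride equals the pool size, can be sketched as follows. The function name `pool_output_size` is hypothetical; this is not any particular framework's API.

```python
def pool_output_size(input_size, pool_size, stride=None, padding=0):
    """Spatial output size of a pooling layer along one dimension.

    An unspecified stride defaults to the pool size (a common convention);
    floor division implements the floor in the formula.
    """
    if stride is None:
        stride = pool_size
    return (input_size - pool_size + 2 * padding) // stride + 1


print(pool_output_size(30, 2))   # 15
print(pool_output_size(31, 2))   # 15: the last row/column is dropped
print(pool_output_size(30, 3))   # 10
```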

For example, applying the formula to the 30x30 convolution output with a 2x2 max pooling window, no padding (padding = 0), and a stride of 2 (the typical setting for max pooling with a 2x2 window) gives, for each dimension:

$${Output\ Size} = \left\lfloor \frac{30 - 2 + 2 \times 0}{2} \right\rfloor + 1 = \left\lfloor \frac{28}{2} \right\rfloor + 1 = 14 + 1 = 15$$

So, after a 2x2 max pooling operation on an input of size 30x30 (assuming stride of 2 and no padding), the output dimension would be 15x15.
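A short pure-Python max-pooling sketch confirms the 15x15 result. The helper `max_pool2d` and the position-encoding 30x30 feature map are hypothetical; the sketch assumes no padding and skips any window that would extend past the edge.

```python
def max_pool2d(x, pool_size=2, stride=None):
    """Max pooling over a 2D list with no padding.

    stride defaults to pool_size; windows that would run past the edge
    are skipped, which is what the floor in the formula expresses.
    """
    if stride is None:
        stride = pool_size
    out_h = (len(x) - pool_size) // stride + 1
    out_w = (len(x[0]) - pool_size) // stride + 1
    return [
        [
            max(
                x[i * stride + a][j * stride + b]
                for a in range(pool_size)
                for b in range(pool_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 30x30 feature map whose values encode their position.
feature_map = [[r * 30 + c for c in range(30)] for r in range(30)]
pooled = max_pool2d(feature_map)
print(len(pooled), len(pooled[0]))   # 15 15
print(pooled[0][0])                  # 31: max of 0, 1, 30, 31
```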
