
# **HOMEWORK 11**
## **Understanding Deconvolution in Autoencoders**

### **Theoretical part**

1. **What is a transposed convolution?**

 A transposed convolution (sometimes called a "deconvolution", although it is not the mathematical inverse of a convolution) is a layer that reverses the *spatial* effect of a standard convolution. It restores the larger shape, but not the original values.
 Its goal is to upsample a smaller input feature map into a larger output using learnable weights. It is commonly used in the decoders of autoencoders, in GANs (to generate images), and in U-Net (to recover spatial resolution).
  
  We can think of it as answering the following question: "How do we reconstruct a larger image that might have been downsampled by a convolution before?"
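
A minimal 1D sketch in plain Python shows that a transposed convolution restores the *shape* of a strided convolution's input, not its values. This is an illustration under stated assumptions, not a library implementation: the helpers `conv1d` and `conv_transpose1d` and the kernel weights are hypothetical, and the sketch assumes stride 2, kernel size 3, no padding, and reuses the same kernel in both directions.

```python
# Minimal 1D sketch (hypothetical helpers), assuming stride 2, kernel size 3,
# no padding, and the cross-correlation convention of deep learning libraries.

def conv1d(x, w, stride):
    """Strided convolution: each output is a weighted sum of a window of x."""
    k = len(w)
    n_out = (len(x) - k) // stride + 1
    return [sum(w[j] * x[i * stride + j] for j in range(k)) for i in range(n_out)]

def conv_transpose1d(y, w, stride):
    """Transposed convolution: each input value adds a scaled copy of w to the output."""
    k = len(w)
    n_out = (len(y) - 1) * stride + k
    out = [0.0] * n_out
    for i, v in enumerate(y):
        for j in range(k):
            out[i * stride + j] += v * w[j]  # overlapping copies are summed
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.5, 0.25]  # hypothetical kernel weights

y = conv1d(x, w, stride=2)                # length 5 -> 2
x_rec = conv_transpose1d(y, w, stride=2)  # length 2 -> 5

print(len(x), len(y), len(x_rec))  # 5 2 5
print(x_rec)                       # [2.75, 1.375, 6.9375, 3.125, 1.5625]
print(x_rec == x)                  # False: the shape is restored, not the values
```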


2. **How does it differ from a regular convolution?**

 A regular convolution typically downsamples, i.e. it reduces the spatial size of its input (whenever the stride is greater than 1 or no padding is used), whereas a transposed convolution upsamples, i.e. it increases the spatial size of its input. Their uses also differ: regular convolutions are mainly used for encoding (feature extraction), transposed convolutions for decoding (reconstruction, generation).
 They also operate differently: in a regular convolution, the kernel combines a neighbourhood of input values into a single output value, compressing information into a smaller area. In a transposed convolution, the kernel spreads each input value over a neighbourhood of the output, effectively expanding it.
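
To make the difference concrete, a 1D convolution can be written as a matrix product $y = Cx$, where each row of $C$ holds a shifted copy of the kernel. The sketch below builds $C$ and shows that $C$ maps 5 values to 2, while $C^\top$ maps 2 values back to 5, which is where the name "transposed convolution" comes from. The helpers `conv_matrix`, `transpose`, and `matvec` and the kernel values are hypothetical, and the sketch assumes stride 2, kernel size 3, and no padding.

```python
# Matrix view of a 1D convolution (hypothetical helpers), assuming stride 2,
# kernel size 3, no padding.

def conv_matrix(n_in, w, stride):
    """Return the (n_out x n_in) matrix C such that the convolution of x equals C @ x."""
    k = len(w)
    n_out = (n_in - k) // stride + 1
    C = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for j in range(k):
            C[i][i * stride + j] = w[j]  # row i holds the kernel shifted by i * stride
    return C

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * e for m, e in zip(row, v)) for row in M]

w = [1.0, 0.5, 0.25]  # hypothetical kernel weights
C = conv_matrix(5, w, stride=2)
C_T = transpose(C)

print(len(C), "x", len(C[0]))      # 2 x 5: the convolution maps 5 values to 2
print(len(C_T), "x", len(C_T[0]))  # 5 x 2: the transposed convolution maps 2 values to 5

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = matvec(C, x)     # downsample: [2.75, 6.25]
up = matvec(C_T, y)  # upsample back to length 5
print(up)            # [2.75, 1.375, 6.9375, 3.125, 1.5625]
```

Each column of $C^\top$ carries a shifted copy of the kernel, so multiplying by $C^\top$ adds one scaled kernel copy per input value; where copies overlap (here at the middle position), their contributions are summed.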


3. **How does it upsample feature maps?**

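 The main steps for increasing the spatial dimensions of an input tensor are the following:
  1. Insert (stride - 1) zeros between neighbouring input elements, along both rows and columns.

  2. Surround this expanded grid with (kernel size - 1 - padding) zeros on every side.

  3. Slide the kernel over this grid one cell at a time (in the cross-correlation convention used by deep learning libraries, the kernel is first rotated by 180°).

  4. Multiply and sum values at each position, just like in a regular convolution.

  5. The result is an upsampled output grid whose values are determined by the learned kernel.

  Equivalently, each input value can be seen as placing a scaled copy of the kernel onto the output at stride-spaced positions; where these copies overlap, their contributions add up.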
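
The zero-insertion procedure above can be sketched in plain Python for a single-channel 2D input. This is an illustrative sketch, not how libraries implement the layer: the function names are hypothetical, and it assumes square inputs and kernels, dilation 1, and `padding <= kernel size - 1`. It flips the kernel by 180° because deep learning libraries implement convolution as cross-correlation.

```python
# Zero-insertion ("gather") view of a 2D transposed convolution.
# Assumptions: single channel, square shapes, dilation 1, padding <= k - 1.

def insert_zeros(x, stride):
    """Put (stride - 1) zeros between neighbouring rows and columns."""
    n = len(x)
    size = (n - 1) * stride + 1
    out = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            out[i * stride][j * stride] = x[i][j]
    return out

def pad_border(x, p):
    """Surround the grid with p rings of zeros."""
    size = len(x) + 2 * p
    out = [[0] * size for _ in range(size)]
    for i, row in enumerate(x):
        for j, v in enumerate(row):
            out[i + p][j + p] = v
    return out

def conv2d_valid(x, w):
    """Slide w with stride 1 and take the multiply-and-sum at every position."""
    k = len(w)
    n_out = len(x) - k + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(n_out)] for i in range(n_out)]

def conv_transpose2d_gather(x, w, stride=1, padding=0):
    k = len(w)
    grid = insert_zeros(x, stride)
    grid = pad_border(grid, k - 1 - padding)   # border of (k - 1 - padding) zeros
    w_flipped = [row[::-1] for row in w[::-1]]  # rotate the kernel by 180 degrees
    return conv2d_valid(grid, w_flipped)

x = [[1, 2],
     [3, 4]]
w = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]  # all-ones kernel, chosen only to make the overlaps easy to read

y = conv_transpose2d_gather(x, w, stride=2, padding=0)
for row in y:
    print(row)
# [1, 1, 3, 2, 2]
# [1, 1, 3, 2, 2]
# [4, 4, 10, 6, 6]
# [3, 3, 7, 4, 4]
# [3, 3, 7, 4, 4]
```

The centre value 10 is where all four input values overlap, since each 3×3 copy of the kernel reaches it.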

  An example makes this clearer. Consider:

  - Input feature map: a 2D grid of size 2×2

  - Kernel: size 3×3

  - Stride: 2

  - Padding: 0

  We'll show what happens when we apply a transposed convolution.

- **Step 1**: Insert Zeros. This depends on the stride.

  If stride = 2, insert (stride - 1) = 1 zero between each row and column.

  For example, for a 2×2 input: <br>
  Input: <br>
  [1 2] <br>
  [3 4]

  Expanded: <br>
  [1 0 2] <br>
  [0 0 0]<br>
  [3 0 4] <br>
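  Now we have a 3×3 grid, where the non-zero values are the original inputs, and the rest are zeros.

  Note: With larger strides, more zeros are inserted.

  Before the kernel is slid over it, this grid is also surrounded by (kernel size - 1 - padding) = 3 - 1 - 0 = 2 rings of zeros, giving a 7×7 grid. Without this border, a 3×3 kernel would fit only once on the 3×3 grid.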

- **Step 2**: Slide the kernel over the expanded grid.

  Let’s say the kernel is: <br>
  [a b c] <br>
  [d e f] <br>
  [g h i] <br>
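  At each position (moving one cell at a time), we perform the usual element-wise multiplication and summation between the kernel and the overlapping 3×3 region of the expanded grid, and write the sum to the corresponding output cell. In the convention used by deep learning libraries, the kernel is first rotated by 180°, so it becomes [i h g; f e d; c b a].

  For instance, placing the kernel on the top-left 3×3 region of the expanded grid gives the top-left output value.

  The operation is called "transposed" because, if a convolution is written as a matrix product $y = Cx$, the transposed convolution computes $C^\top y$. Equivalently, it is the gradient of a convolution with respect to its input; operationally, it "spreads out" the input.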

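- **Step 3**: sum overlapping contributions. <br>
  The same computation can be described from the input side: each input value multiplies the whole kernel, and this scaled 3×3 copy is placed in the output starting at position (2r, 2c) for the input element in row r and column c. Because the copies are only stride = 2 cells apart but 3 cells wide, neighbouring copies overlap, so some output positions receive contributions from several input values.

  In those cases, we sum the values contributed by each overlapping copy.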

- **Final result** <br>
  The result is an output grid larger than the input. For example:

  Starting from a 2×2 input, a 3×3 kernel, and stride = 2 (with padding = 0 and dilation = 1), we get a 5×5 output.


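  **Note**: formula to compute the output size (for dilation = 1 and no extra output padding):

  $$\text{Output size} = (\text{Input size} - 1) \times \text{stride} - 2 \times \text{padding} + \text{kernel size}$$

  Example, with input size = 2, stride = 2, padding = 0, and kernel size = 3:

  $$(2 - 1) \times 2 - 2 \times 0 + 3 = 5$$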
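
The overlap-and-sum view and the output-size formula can be checked with a small sketch that works from the input side: each input value stamps a scaled copy of the kernel into the output, and padding crops the borders afterwards. The function names are hypothetical, and the sketch assumes a single channel, square shapes, and dilation 1.

```python
# Scatter ("stamping") view of a 2D transposed convolution plus the size formula.
# Assumptions: single channel, square shapes, dilation 1; names are hypothetical.

def output_size(n_in, kernel, stride=1, padding=0):
    """Output size = (Input size - 1) * stride - 2 * padding + kernel size."""
    return (n_in - 1) * stride - 2 * padding + kernel

def conv_transpose2d_scatter(x, w, stride=1, padding=0):
    """Each input value adds a scaled copy of the kernel; overlapping copies are summed."""
    n, k = len(x), len(w)
    full = (n - 1) * stride + k  # size of the result before cropping
    out = [[0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    # Padding removes `padding` rows/columns from every border of the full result.
    return [row[padding:full - padding] for row in out[padding:full - padding]]

x = [[1, 2],
     [3, 4]]
w = [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]  # all-ones kernel, chosen only to make the overlaps visible

for row in conv_transpose2d_scatter(x, w, stride=2, padding=0):
    print(row)
# [1, 1, 3, 2, 2]
# [1, 1, 3, 2, 2]
# [4, 4, 10, 6, 6]
# [3, 3, 7, 4, 4]
# [3, 3, 7, 4, 4]

# The simulated output size agrees with the formula for a few settings.
for n_in, k, s, p in [(2, 3, 2, 0), (2, 3, 2, 1), (3, 4, 2, 1), (4, 2, 3, 0)]:
    x_ones = [[1] * n_in for _ in range(n_in)]
    w_ones = [[1] * k for _ in range(k)]
    size = len(conv_transpose2d_scatter(x_ones, w_ones, stride=s, padding=p))
    print(f"input={n_in} kernel={k} stride={s} padding={p}: {size} (formula: {output_size(n_in, k, s, p)})")
```

This produces the same 5×5 grid as the zero-insertion procedure, which is why the two descriptions of the operation are interchangeable.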

4. **What are stride, padding, and kernel size, and how do they influence the result in a transposed convolution?**

  1. **Stride**:
     - in a regular convolution, the stride is the step with which the kernel moves over the input; in a transposed convolution, it is the distance between the output patches generated from neighbouring input values.

     - a larger stride places these contributions further apart, thereby increasing the output size.

     - mathematically, the output size grows roughly in proportion to the stride (approximately input size × stride).

  2. **Padding**:
     - in a transposed convolution, padding crops the output: `padding` rows and columns are removed from each border of the full result.

     - in contrast to a regular convolution (where padding adds zeros around the input and enlarges the output), in a transposed convolution it reduces the output size.

     - choosing the padding to match the corresponding convolution in the encoder lets the transposed convolution restore the original spatial size (in encoder-decoder architectures like autoencoders or U-Nets), although an extra one-sided output padding is sometimes needed when the strided convolution rounded its output size down.

  3. **Kernel size**:
     - the kernel size is the height and width of the filter (e.g., 3×3).

     - the kernel is applied to each input element, generating a kernel-sized region of influence in the output.

     - larger kernels increase the area affected by each input value, which leads to more overlap between neighbouring patches and a larger output.

  They are all related to the output size by the previous formula: <br>
  Output size = (Input size - 1) * stride - 2 * padding + kernel size
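
The effect of these parameters in an encoder-decoder pair can be explored with a short shape calculator. This sketch assumes dilation 1 and follows the output-size formulas documented for PyTorch's `Conv2d` and `ConvTranspose2d`; `output_padding` is the `ConvTranspose2d` argument that adds extra rows/columns on one side to compensate for the rounding in a strided convolution. The helper names and the 28-pixel input are hypothetical choices for illustration.

```python
# Shape bookkeeping for an encoder/decoder pair, assuming dilation 1.
# Helper names are hypothetical; the formulas match PyTorch's documented
# Conv2d / ConvTranspose2d output sizes for dilation = 1.

def conv_out(n, kernel, stride=1, padding=0):
    """Regular convolution: floor((n + 2 * padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def conv_transpose_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Transposed convolution: (n - 1) * stride - 2 * padding + kernel + output_padding."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

n = 28  # e.g. the side of a small grayscale image
encoded = conv_out(n, kernel=3, stride=2, padding=1)                                  # 14
decoded = conv_transpose_out(encoded, kernel=3, stride=2, padding=1)                  # 27
fixed = conv_transpose_out(encoded, kernel=3, stride=2, padding=1, output_padding=1)  # 28
print(encoded, decoded, fixed)  # 14 27 28

# A 4x4 kernel with stride 2 and padding 1 halves the size and then exactly doubles it.
print(conv_out(28, 4, 2, 1), conv_transpose_out(14, 4, 2, 1))  # 14 28
```

With kernel 3, stride 2, and padding 1, the encoder rounds 27/2 down, so the mirrored decoder comes back one pixel short; either an extra output padding or an even kernel size restores the original size.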

# **Manual diagram**

[My Diagram (PDF)](https://github.com/glorivaas/Machine_Learning25/blob/main/CamScanner%2006-02-2025%2011.18.pdf)