When we train very deep neural networks, they often struggle to learn. Simply stacking more layers doesn’t always make the model better. Beyond a certain depth, a plain network can end up with *higher* training error than a shallower one, because the signal (and, during backpropagation, the gradient) gets “washed out” or distorted as it passes through so many layers.

<figure>
  <img src="asset/super_resolution.png" alt="Super Resolution" width="520">
</figure>
<figure>
  <img src="asset/how_to_upsample.png" alt="How to upsample a low-resolution image" width="520">
</figure>
<figure>
  <img src="asset/image_minus.png" alt="Subtracting one image from another to get the residual" width="520">
</figure>
<figure>
  <img src="asset/residual_update.png" alt="Adding the residual back to the image" width="520">
</figure>


### 1. The Key Idea: Learn the *Difference*, Not the Whole Thing

Look at the middle panels:

You see one image being subtracted from another, producing a *noisy difference pattern*. That “difference” represents the **residual** — the *small tweak* or *correction* needed to turn one image into the other.

So instead of learning:

> "How to create the new image from scratch"

the network learns:

> "What needs to change from the input image to get the desired result."

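This is usually much easier to optimize. If the desired output is already close to the input, the layers only need to produce a small correction (or push their output toward zero), instead of rebuilding the whole image through many nonlinear layers.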
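To make the subtraction concrete, here is a minimal PyTorch sketch under stated assumptions: a synthetic smooth image stands in for the high-resolution target, and bicubic interpolation with `torch.nn.functional.interpolate` stands in for the upsampling step. It computes the residual a network would have to learn and checks that adding it back recovers the target.

```python
import math

import torch
import torch.nn.functional as F

# Assumption: a synthetic, mostly smooth 64x64 image (a sinusoid plus one sharp
# vertical edge) replaces a real photo. Shape: (N, C, H, W) = (1, 3, 64, 64).
yy, xx = torch.meshgrid(torch.linspace(0, 1, 64), torch.linspace(0, 1, 64), indexing="ij")
image = 0.5 + 0.25 * torch.sin(2 * math.pi * xx) * torch.cos(2 * math.pi * yy) + 0.2 * (xx > 0.5).float()
hr = image[None, None].repeat(1, 3, 1, 1)

# Simulate a low-resolution input (2x downsampling), then upsample it back.
lr = F.interpolate(hr, scale_factor=0.5, mode="bicubic", align_corners=False)
upsampled = F.interpolate(lr, size=hr.shape[-2:], mode="bicubic", align_corners=False)

# The residual: the correction needed to turn the upsampled image into the target.
residual = hr - upsampled

# Input + residual recovers the target (up to floating-point rounding).
reconstructed = upsampled + residual

print(f"mean |target|   = {hr.abs().mean():.4f}")
print(f"mean |residual| = {residual.abs().mean():.4f}")
```

On a smooth image like this one, the residual is much smaller on average than the image itself, which is why it is the easier thing to learn.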

---

### 2. The Skip (Shortcut) Connection

In the final panel, you see the original image being **added** to the “difference” image (the residual).
That’s exactly what a **ResNet block** does:

> Output = Input + Residual
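The same idea in symbols, writing $\mathbf{x}$ for the block's input and $\mathcal{F}$ for the residual branch (notation chosen here for illustration). Differentiating the sum shows what the shortcut contributes during backpropagation:

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x},
\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathcal{F}(\mathbf{x})}{\partial \mathbf{x}} + I
$$

The identity matrix $I$ means the gradient always has a direct route back through the block, even when the branch's own Jacobian is small. The addition requires $\mathcal{F}(\mathbf{x})$ and $\mathbf{x}$ to have the same shape.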

Here, the residual is the output of some small stack of layers — say, two convolutional layers — that learn what extra detail should be added.

This simple addition (a **skip connection**) lets the original input pass through the block untouched and, during backpropagation, gives the gradient a direct path back to earlier layers. The model only needs to learn the adjustments, just like small brushstrokes refining a painting.
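A minimal sketch of the skip connection as a generic PyTorch wrapper. The `Residual` class is a hypothetical helper written for this example, not a PyTorch API. The branch is two 3x3 convolutions, as described above, and its last convolution is deliberately zero-initialized so the branch starts out producing "no adjustment". That makes the two claims easy to check: the block then behaves as the identity, and the gradient reaches the input unchanged through the shortcut.

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Hypothetical helper: output = input + branch(input)."""

    def __init__(self, branch: nn.Module):
        super().__init__()
        self.branch = branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.branch(x)  # the skip connection is just this addition


# Residual branch: two 3x3 convolutions with padding=1, so channels and
# spatial size are unchanged and the addition is shape-compatible.
branch = nn.Sequential(
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
)

# Demonstration only: zero the last conv so the branch initially outputs zeros.
nn.init.zeros_(branch[2].weight)
nn.init.zeros_(branch[2].bias)

block = Residual(branch)
x = torch.randn(2, 16, 8, 8, requires_grad=True)
y = block(x)

# With a zero branch the block is exactly the identity...
print("identity:", torch.equal(y, x))

# ...and the gradient of sum(y) w.r.t. x is all ones, carried by the shortcut.
y.sum().backward()
print("gradient passes through:", torch.equal(x.grad, torch.ones_like(x)))
```

Once training starts, the last convolution's weights move away from zero and the branch begins to contribute its learned correction on top of the unchanged input.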


<figure>
  <img src="asset/ResNet.PNG" alt="ResNet architecture" width="800">
</figure>

<figure>
  <img src="asset/residual_block.png" alt="Residual block" width="520">
</figure>
<figure>
  <img src="asset/residual_block_two_methods.png" alt="Two residual block designs: post-activation and pre-activation" width="520">
</figure>

| Version   | Design          | Name            | Used In                    | Key Feature                                                     |
| --------- | --------------- | --------------- | -------------------------- | --------------------------------------------------------------- |
| **Left**  | Post-activation | Original ResNet | ResNet-18/34/50/101/152    | BN after each convolution; ReLU applied after the addition      |
| **Right** | Pre-activation  | Improved ResNet | ResNet-v2 / later versions | BN and ReLU before each convolution; nothing after the addition |

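For very deep networks, the right one (pre-activation) generally trains more easily. Because nothing is applied after the addition, the shortcut becomes a pure identity path, so signals and gradients pass from block to block unchanged. The original post-activation block is still widely used, for example in torchvision's standard ResNet models.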
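The two designs differ only in where batch normalization and ReLU sit relative to the addition. A minimal sketch of both as basic (two-convolution) blocks; the class names `PostActBlock` and `PreActBlock` are hypothetical. Both assume the block keeps the channel count and spatial size (stride 1), so the identity shortcut applies. Downsampling blocks in real ResNets instead put a projection (a 1x1 convolution) on the shortcut.

```python
import torch
import torch.nn as nn


class PostActBlock(nn.Module):
    """Original ResNet basic block: conv-BN-ReLU, conv-BN, add, then ReLU."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # activation AFTER the addition


class PreActBlock(nn.Module):
    """Pre-activation (ResNet-v2) basic block: BN-ReLU-conv, BN-ReLU-conv, then add."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # nothing after the addition: the shortcut is a pure identity


x = torch.randn(4, 64, 32, 32)
for block in (PostActBlock(64), PreActBlock(64)):
    print(type(block).__name__, tuple(block(x).shape))
```

One visible consequence: the post-activation block's output is always non-negative (it ends in a ReLU), while the pre-activation block passes the raw sum through, so the next block sees the input plus the correction with no nonlinearity in between.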

```python
import torch
import torch.nn as nn
```