<a href="https://colab.research.google.com/github/samiha-mahin/An-Image-Processing-Repo/blob/main/Spatial_Attention_In_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Spatial Attention in CNNs**



### 🔹 **What is Spatial Attention in CNN?**

In a CNN, feature maps carry information at every spatial location, but not every **region (spatial location)** is equally important:

* Some areas of the image matter more for the prediction (like the object region).
* Other areas may be just background noise.

**Spatial Attention** helps the model **focus on "where" the important features are located** in the image.



### 🔹 **How It Works**

1. Take the feature maps from a CNN layer (say size $H \times W \times C$).

   * $H, W$ = height & width of the feature map
   * $C$ = number of channels

2. **Compress along channels**

   * Apply **max pooling** and **average pooling** along the channel axis, i.e., take the max and the mean of the $C$ values at each spatial location → this gives 2 feature maps of size $H \times W \times 1$.
   * Why?

     * Max pooling highlights the strongest features at each location.
     * Average pooling gives general context.

3. **Combine**

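   * Concatenate these 2 maps along the channel axis (size $H \times W \times 2$) → apply a convolution (usually $7 \times 7$) with a single output channel and padding that preserves $H \times W$ → then a **sigmoid**.
   * This produces a **spatial attention map** of size $H \times W \times 1$, with one weight between 0 and 1 per location.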

4. **Re-weight feature maps**

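   * Multiply this attention map element-wise with the original feature map (the single-channel map is broadcast across all $C$ channels) → features at important spatial locations are kept, and features elsewhere are scaled down.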
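
The four steps above can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: the function names `conv2d_same`, `sigmoid`, and `spatial_attention` are made up for this example, the feature map is stored channels-last as $H \times W \times C$, and the $7 \times 7$ kernel is random, standing in for weights that would normally be learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Naive zero-padded convolution (cross-correlation, as in deep learning).

    x: (H, W, C_in), kernel: (k, k, C_in) with odd k -> output: (H, W).
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    height, width = x.shape[:2]
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)
    return out

def spatial_attention(F, kernel):
    # Step 2: compress along channels -> two maps of shape (H, W, 1)
    avg_map = F.mean(axis=-1, keepdims=True)
    max_map = F.max(axis=-1, keepdims=True)
    # Step 3: concatenate -> (H, W, 2), convolve, squash to (0, 1)
    pooled = np.concatenate([avg_map, max_map], axis=-1)
    attn = sigmoid(conv2d_same(pooled, kernel))[..., None]  # (H, W, 1)
    # Step 4: re-weight; attn is broadcast across the C channels
    return attn * F, attn

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 8, 16))                  # Step 1: H=8, W=8, C=16
kernel = rng.normal(scale=0.1, size=(7, 7, 2))   # stand-in for learned weights
out, attn = spatial_attention(F, kernel)
print(out.shape, attn.shape)
```

The output keeps the shape of the input, while the attention map has a single channel, so every channel at a given location is scaled by the same weight.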



### 🔹 **Formula**

If $F \in \mathbb{R}^{H \times W \times C}$ is the input feature map:

$$
M_s(F) = \sigma\left( f^{7 \times 7}\left( [\text{AvgPool}(F);\ \text{MaxPool}(F)] \right) \right)
$$

where:

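* $\sigma$ = sigmoid
* $f^{7 \times 7}$ = convolution with 7×7 kernel
* AvgPool, MaxPool = average and max pooling along the channel axis
* $[\,\cdot\,;\,\cdot\,]$ = concatenation along the channel axis
* $M_s(F)$ = spatial attention map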

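Then the output is the element-wise product ($\otimes$) of the attention map and the input, with $M_s(F)$ broadcast across the $C$ channels: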

$$
F' = M_s(F) \otimes F
$$
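
Written out per location, the pooling and re-weighting steps look as follows. The indices $i, j$ (spatial position) and $c$ (channel) are introduced here only for illustration:

$$
\text{AvgPool}(F)_{i,j} = \frac{1}{C} \sum_{c=1}^{C} F_{i,j,c}, \qquad
\text{MaxPool}(F)_{i,j} = \max_{1 \le c \le C} F_{i,j,c}
$$

$$
F'_{i,j,c} = M_s(F)_{i,j} \cdot F_{i,j,c}
$$

Because the sigmoid keeps $0 < M_s(F)_{i,j} < 1$, each location is scaled by one factor that is shared by all of its channels.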



### 🔹 **Intuition**

Think of it like telling the CNN:
👉 "Don’t waste time looking at the sky or background pixels. **Focus on the object area**, where the useful information is!"

---



# **Difference Between Spatial Attention (as in CBAM) and Squeeze-and-Excitation (SE) Blocks**



## 🔹 **1. SENet (Squeeze-and-Excitation Attention)**

* **Focuses on channels (“what” features are important).**
* Workflow:

  1. **Squeeze**: Apply global average pooling to the feature map → compress the spatial dimensions → get a vector of size $1 \times 1 \times C$.
  2. **Excitation**: Pass this vector through 2 fully connected (FC) layers (with a ReLU between them) followed by a sigmoid → get one weight between 0 and 1 for each channel.
  3. **Reweight**: Multiply each channel of the original feature map by its weight.
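
The squeeze, excitation, and reweight steps can be sketched in NumPy as below. Treat it as a rough illustration: `se_block` is a name invented for this example, the weights are random stand-ins for learned parameters, and the ReLU between the two FC layers and the reduction ratio `r` (which shrinks the hidden layer to `C // r` units) follow the usual SE design rather than anything specified in the list above.

```python
import numpy as np

def se_block(F, W1, b1, W2, b2):
    """Squeeze-and-Excitation on a channels-last feature map F of shape (H, W, C)."""
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    s = F.mean(axis=(0, 1))
    # Excitation: FC -> ReLU -> FC -> sigmoid, one weight per channel
    h = np.maximum(0.0, s @ W1 + b1)               # (C // r,)
    w = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))       # (C,), values in (0, 1)
    # Reweight: w is broadcast over all H x W locations
    return F * w, w

rng = np.random.default_rng(0)
height, width, C, r = 8, 8, 16, 4                  # r: assumed reduction ratio
F = rng.normal(size=(height, width, C))
W1 = rng.normal(scale=0.1, size=(C, C // r))       # stand-ins for learned weights
b1 = np.zeros(C // r)
W2 = rng.normal(scale=0.1, size=(C // r, C))
b2 = np.zeros(C)

out, w = se_block(F, W1, b1, W2, b2)
print(out.shape, w.shape)
```

Here the weight vector has one entry per channel, so every location inside a channel is scaled by the same value.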

✨ **Key Idea**:
SE tells the model **which feature maps (channels)** are important (e.g., texture vs color vs shape).
It answers **“what” to focus on**.


## 🔹 **2. Spatial Attention (e.g., in CBAM)**

* **Focuses on spatial locations (“where” is important).**
* Workflow:

  1. Apply **average pooling** + **max pooling** along channels → get 2 spatial maps of size $H \times W \times 1$.
  2. Concatenate them → pass through a convolution + sigmoid → get an attention map of size $H \times W \times 1$.
  3. Multiply this map with the original feature map, location by location (all channels at a location share the same weight).

✨ **Key Idea**:
Spatial attention tells the model **which regions in the image** are important (e.g., object vs background).
It answers **“where” to focus**.



## 🔹 **Comparison Table**

| Aspect                 | SE Block (Squeeze-and-Excitation) | Spatial Attention (CBAM style)            |
| ---------------------- | --------------------------------- | ----------------------------------------- |
| **Focus**              | Channel-wise (what features)      | Spatial (where in the image)              |
| **Mechanism**          | Global avg pooling + FC layers    | Avg + max pooling across channels + conv  |
| **Attention map size** | $1 \times 1 \times C$             | $H \times W \times 1$                     |
| **Attention type**     | “What is important?”              | “Where is important?”                     |
| **Granularity**        | Global per-channel weights        | Local pixel/region weights                |
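
To make the size and granularity rows concrete, here is a toy NumPy example. The weights are hand-picked for illustration only, not learned, and the feature map is all ones so the effect of each weighting is easy to read off.

```python
import numpy as np

F = np.ones((4, 4, 3))  # toy feature map: H=4, W=4, C=3

# SE-style weights: one value per channel, shape (1, 1, C)
channel_w = np.array([0.1, 0.5, 0.9]).reshape(1, 1, 3)

# Spatial-attention-style weights: one value per location, shape (H, W, 1)
spatial_w = np.zeros((4, 4, 1))
spatial_w[1:3, 1:3, 0] = 1.0  # pretend the object sits in the 2x2 center

by_channel = F * channel_w    # same scaling at every location
by_location = F * spatial_w   # same scaling for every channel

print(by_channel[0, 0], by_channel[3, 2])    # identical across locations
print(by_location[1, 1], by_location[0, 0])  # center kept, corner suppressed
```

Both products have the full $H \times W \times C$ shape; the difference is only in which axis the weights vary along.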



## 🔹 **Together**

* **SE block = channel attention only.**
* **Spatial attention = location attention only.**
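* **CBAM = both (channel + spatial), applied in sequence: channel attention first, then spatial attention.**
  CBAM's channel module works like an SE block but pools with both average and max, which is why CBAM is often described as a **generalization of SE**.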
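
A minimal NumPy sketch of that combination is shown below. It assumes the CBAM ordering (channel attention first, then spatial attention) and a channel module that feeds average- and max-pooled vectors through a shared two-layer MLP. The function names and the random weights are placeholders for this example, not taken from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """CBAM-style channel weights from a shared MLP on pooled vectors."""
    def mlp(v):
        return np.maximum(0.0, v @ W1) @ W2        # FC -> ReLU -> FC
    avg_vec = F.mean(axis=(0, 1))                  # (C,)
    max_vec = F.max(axis=(0, 1))                   # (C,)
    return sigmoid(mlp(avg_vec) + mlp(max_vec))    # (C,)

def spatial_attention(F, kernel):
    """Spatial weights from channel-wise avg/max maps and a k x k convolution."""
    pooled = np.stack([F.mean(axis=-1), F.max(axis=-1)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(pooled, ((pad, pad), (pad, pad), (0, 0)))      # zero padding
    height, width = F.shape[:2]
    logits = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            logits[i, j] = np.sum(padded[i:i + k, j:j + k, :] * kernel)
    return sigmoid(logits)[..., None]              # (H, W, 1)

def cbam(F, W1, W2, kernel):
    F_c = F * channel_attention(F, W1, W2)         # "what": re-weight channels
    return F_c * spatial_attention(F_c, kernel)    # "where": re-weight locations

rng = np.random.default_rng(0)
C, r = 16, 4                                       # r: assumed reduction ratio
F = rng.normal(size=(8, 8, C))
W1 = rng.normal(scale=0.1, size=(C, C // r))       # stand-ins for learned weights
W2 = rng.normal(scale=0.1, size=(C // r, C))
kernel = rng.normal(scale=0.1, size=(7, 7, 2))
out = cbam(F, W1, W2, kernel)
print(out.shape)
```

Note that the spatial module sees the channel-refined map `F_c`, not the raw input, so the two attentions are chained rather than computed independently.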


