
## PATTERN 1 — Attention Inside Residual Branch (CBAM)

**Input**

\[
x \in \mathbb{R}^{C \times H \times W}
\]



**Flow (single block, start to finish)**

```text
x
│
├─ Skip (identity) path
│   ├─ identity = x
│   └─ (if the channel count or resolution changes)
│       identity = Conv1×1(stride) → BatchNorm
│
├─ Residual branch F(x)
│   ├─ f = Conv3×3(stride)(x)
│   ├─ f = BatchNorm(f)
│   ├─ f = ReLU(f)
│   ├─ f = Conv3×3(stride=1)(f)
│   ├─ f = BatchNorm(f)                     # f = F(x)
│   │
│   ├─ Channel Attention
│   │   ├─ α_c = ChannelAttention(f)        # (C × 1 × 1)
│   │   └─ f   = α_c ⊙ f
│   │
│   ├─ Spatial Attention
│   │   ├─ α_s = SpatialAttention(f)        # (1 × H × W)
│   │   └─ f   = α_s ⊙ f
│   │
│   └─ net effect: f = A(F(x)) ⊙ F(x)
│
└─ Residual Add
    ├─ y = identity + f
    └─ y = ReLU(y)   (optional)
```
**One-line equation:**

\[
y = \mathrm{skip}(x) + A(F(x)) \odot F(x)
\]

**Ground rules**

* The input to attention is F(x), not x

* Attention does not produce new features; it produces a mask (α)

* The mask is multiplied (⊙), not added

* Addition happens only once, at the very end, with the skip path

* The skip path is not affected by attention
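A minimal sketch of these rules with plain tensors. The tensors `f`, `identity`, `alpha_c` and `alpha_s` are hypothetical random stand-ins, not outputs of a trained module: the masks are squashed into (0, 1) by a sigmoid, applied to F(x) by broadcast multiplication, and the only addition is the final one with the skip path.

```python
import torch

torch.manual_seed(0)
B, C, H, W = 2, 8, 16, 16

f = torch.randn(B, C, H, W)          # stand-in for F(x), the residual-branch output
identity = torch.randn(B, C, H, W)   # stand-in for skip(x); attention never touches it

# Masks come out of a sigmoid, so every entry lies in (0, 1)
alpha_c = torch.sigmoid(torch.randn(B, C, 1, 1))  # channel mask: one weight per channel
alpha_s = torch.sigmoid(torch.randn(B, 1, H, W))  # spatial mask: one weight per pixel

# Masks are multiplied (broadcast over the missing axes), never added
f_att = alpha_s * (alpha_c * f)      # A(F(x)) ⊙ F(x), shape (B, C, H, W)

# The only addition: the skip path joins at the very end
y = torch.relu(identity + f_att)

print(f_att.shape)  # torch.Size([2, 8, 16, 16])
```

Because both masks lie in (0, 1), each element of `f_att` can only be scaled down relative to `f`; attention reweights existing features rather than creating new ones.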

----


## First, let's define a CBAM attention module for the implementation

If you are not familiar with attention, I recommend skimming this file:
> **Torch CNN - Part_2\Attention Mekanizmaları**

-------------------------




# 1) Writing CBAM from scratch (PyTorch)

CBAM consists of two parts:

* Channel Attention: produces a (B,C,1,1) mask

* Spatial Attention: produces a (B,1,H,W) mask

The two masks are then applied one after the other, each by multiplication (channel first, then spatial).
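For reference, the two masks can be written as follows, using the α_c / α_s notation of the flow above and matching what the code below computes (σ is the sigmoid, the MLP is shared between the two pooled descriptors, and k is the spatial kernel size, 7 or 3):

\[
\alpha_c = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(f)) + \mathrm{MLP}(\mathrm{MaxPool}(f))\big) \in \mathbb{R}^{C \times 1 \times 1},
\qquad f' = \alpha_c \odot f
\]

\[
\alpha_s = \sigma\big(\mathrm{Conv}^{k \times k}([\,\mathrm{mean}_c(f');\ \max_c(f')\,])\big) \in \mathbb{R}^{1 \times H \times W},
\qquad f'' = \alpha_s \odot f'
\]

Here AvgPool/MaxPool pool over the spatial axes, mean_c/max_c pool over the channel axis, and [·;·] stacks the two resulting maps into a 2-channel input. The result f'' is the A(F(x)) ⊙ F(x) term of the one-line equation.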

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 4)

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # CBAM paper: shared MLP (implemented here with 1x1 convs)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B,C,H,W)
        avg_out = self.mlp(self.avg_pool(x))  # (B,C,1,1)
        max_out = self.mlp(self.max_pool(x))  # (B,C,1,1)
        attn = self.sigmoid(avg_out + max_out)
        return attn  # (B,C,1,1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        assert kernel_size in (3, 7)
        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B,C,H,W)
        # channel-wise pooling -> (B,1,H,W) + (B,1,H,W)
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        cat = torch.cat([avg_map, max_map], dim=1)  # (B,2,H,W)
        attn = self.sigmoid(self.conv(cat))         # (B,1,H,W)
        return attn


class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction=reduction)
        self.sa = SpatialAttention(kernel_size=spatial_kernel)

    def forward(self, x):
        # Channel attention: (B,C,1,1) mask broadcast over H and W
        x = self.ca(x) * x
        # Spatial attention: (B,1,H,W) mask broadcast over C
        x = self.sa(x) * x
        return x
```

# 2) Pattern-1 Residual Block: “Attention inside residual branch” + CBAM

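Rule:

* Compute f = F(x)

* f = CBAM(f) (returns the masked f)

* y = identity + f (the skip path joins here; identity = x unless it needs downsampling)

When downsampling is needed (stride=2 or more output channels than input channels), we match the skip path to F(x) with a 1x1 conv + BatchNorm.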
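A quick shape check of why the 1x1 conv is needed. The sizes (in_ch=64, out_ch=128, stride=2, a 32x32 input) are arbitrary assumptions for illustration: with padding=1, the strided 3x3 conv of the branch halves the resolution and changes the channel count, and a 1x1 conv with the same stride brings the skip path to exactly the same shape, so the two can be added.

```python
import torch
import torch.nn as nn

in_ch, out_ch, stride = 64, 128, 2
x = torch.randn(1, in_ch, 32, 32)

# First conv of the residual branch: changes channels and (with stride=2) resolution
branch_conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False)

# Skip-path projection: 1x1 conv with the same stride, followed by BatchNorm
skip = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
    nn.BatchNorm2d(out_ch),
)

f = branch_conv(x)
identity = skip(x)
print(f.shape, identity.shape)  # torch.Size([1, 128, 16, 16]) torch.Size([1, 128, 16, 16])

y = identity + f  # only valid because the two shapes now match
```

Adding `x` directly to `f` here would fail, since (1, 64, 32, 32) and (1, 128, 16, 16) cannot be broadcast together.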

```python
class Pattern_1_Residual(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()

        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(out_ch)
        self.act   = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(out_ch)

        # PATTERN 1: CBAM acts on F(x), which has out_ch channels
        self.cbam = CBAM(out_ch, reduction=reduction, spatial_kernel=spatial_kernel)

        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            # 1x1 conv + BN so the skip path matches F(x) in channels and resolution
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        f = self.act(self.bn1(self.conv1(x)))
        f = self.bn2(self.conv2(f))   # f = F(x); conv2 takes f, not x

        # attention inside the residual branch: f -> cbam(f) (the multiplication happens inside CBAM)
        f = self.cbam(f)              # f = A(F(x)) ⊙ F(x)

        y = identity + f
        y = self.act(y)
        return y
```

> **Note: The `* x` multiplications inside CBAM apply the mask, so we do not write a separate `alpha * f` inside the block.**

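### What happens if we put CBAM before the convs?

#### If CBAM is used as pre-attention:

* The channel + spatial masks filter x itself

* The convs then only process the “selected regions”

This sometimes works, but:

* If CBAM's spatial mask concentrates on the “wrong” regions in an early layer,
the model can cut off information from the very start: features the mask pushes toward zero reach the convs already weakened, so the branch has little to work with there.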
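To make the difference concrete, here is a minimal sketch of both placements side by side. The names `CompactCBAM`, `conv_branch`, `PostAttentionBlock` and `PreAttentionBlock` are hypothetical; the compact CBAM is re-declared only so the snippet runs on its own, and the pre-attention variant is one possible reading (y = x + F(CBAM(x)), skip path unchanged). The only difference between the two blocks is whether CBAM sees x (before the convs) or F(x) (after them).

```python
import torch
import torch.nn as nn


class CompactCBAM(nn.Module):
    """Hypothetical compact CBAM: channel mask, then spatial mask, both applied by multiplication."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.mlp = nn.Sequential(  # shared MLP implemented with 1x1 convs
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))  # global avg pool -> (B,C,1,1)
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))   # global max pool -> (B,C,1,1)
        x = torch.sigmoid(avg + mx) * x                   # channel mask
        maps = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)  # (B,2,H,W)
        return torch.sigmoid(self.spatial(maps)) * x      # spatial mask (B,1,H,W)


def conv_branch(ch: int) -> nn.Sequential:
    # F(.): Conv/BN/ReLU -> Conv/BN, keeping channels and resolution for simplicity
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch),
    )


class PostAttentionBlock(nn.Module):
    """Pattern 1: y = ReLU(x + CBAM(F(x))); the convs see all of x."""
    def __init__(self, ch: int):
        super().__init__()
        self.branch = conv_branch(ch)
        self.cbam = CompactCBAM(ch)

    def forward(self, x):
        return torch.relu(x + self.cbam(self.branch(x)))


class PreAttentionBlock(nn.Module):
    """Pre-attention variant: y = ReLU(x + F(CBAM(x))); the convs only see the masked x."""
    def __init__(self, ch: int):
        super().__init__()
        self.branch = conv_branch(ch)
        self.cbam = CompactCBAM(ch)

    def forward(self, x):
        return torch.relu(x + self.branch(self.cbam(x)))


x = torch.randn(2, 64, 16, 16)
post_out = PostAttentionBlock(64)(x)
pre_out = PreAttentionBlock(64)(x)
print(post_out.shape, pre_out.shape)  # both torch.Size([2, 64, 16, 16])
```

Both blocks produce the same shape; what changes is which features the convolutions get to see, which is exactly the risk described above.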

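# 3) Integrating it into a model (a small CNN)

The model below consists of:

* Stem (3→32 channels)

* A plain conv block (32→64)

* Two Pattern-1 blocks: `ResCBAM_Block(64, 64, stride=1)` (same resolution) and `ResCBAM_Block(64, 128, stride=2)` (downsample + channel increase)

* Another plain conv block (128→128)

* Classifier (global average pooling + Linear)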

```python
import torch
import torch.nn as nn

# Compact re-definition of the same CBAM modules.
# ResCBAM_Block below uses CBAM from section 1; CBAM_1 is an equivalent alternative.
class ChannelAttention_1(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 4)

        self.avg = nn.AdaptiveAvgPool2d(1)
        self.max = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.mlp(self.avg(x)) + self.mlp(self.max(x)))  # (B,C,1,1)


class SpatialAttention_1(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        assert kernel_size in (3, 7)
        padding = 3 if kernel_size == 7 else 1
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_map = torch.mean(x, dim=1, keepdim=True)    # (B,1,H,W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)  # (B,1,H,W); with dim, torch.max returns (values, indices)

        a = torch.cat([avg_map, max_map], dim=1)        # (B,2,H,W)
        return self.sigmoid(self.conv(a))               # (B,1,H,W)


class CBAM_1(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.ca = ChannelAttention_1(channels, reduction)
        self.sa = SpatialAttention_1(spatial_kernel)

    def forward(self, x):
        x = self.ca(x) * x
        x = self.sa(x) * x
        return x
```

```python
class ResCBAM_Block(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, reduction=16, spatial_kernel=7):
        super().__init__()
        self.act = nn.ReLU(inplace=True)

        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(out_ch)

        self.cbam  = CBAM(out_ch, reduction=reduction, spatial_kernel=spatial_kernel)

        self.skip = None
        if stride != 1 or in_ch != out_ch:
            self.skip = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch)
            )

    def forward(self, x):
        identity = x if self.skip is None else self.skip(x)

        f = self.act(self.bn1(self.conv1(x)))
        f = self.bn2(self.conv2(f))   # f = F(x)

        f = self.cbam(f)              # f = A(F(x)) ⊙ F(x) (CBAM multiplies internally)

        y = identity + f              # classic residual addition
        y = self.act(y)
        return y
```

### MODEL - CNN

```python
class SimpleCNN_With_ResCBAM(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()

        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )

        # Plain conv block
        self.block1 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )

        # Pattern 1 integrated here
        self.rescbam1 = ResCBAM_Block(64, 64, stride=1)     # same resolution
        self.rescbam2 = ResCBAM_Block(64, 128, stride=2)    # downsample + channel increase

        # Continue with a plain conv block
        self.block2 = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )

        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.stem(x)
        x = self.block1(x)

        x = self.rescbam1(x)
        x = self.rescbam2(x)

        x = self.block2(x)
        x = self.head(x)
        return x
```

```python
if __name__ == "__main__":
    model = SimpleCNN_With_ResCBAM(num_classes=10)
    x = torch.randn(4, 3, 32, 32)
    y = model(x)
    print(y.shape)
```

Output:

```text
torch.Size([4, 10])
```


---

# The clearest integration logic (rule)

If the model we have contains the following structure anywhere:

`... -> Conv/BN/ReLU -> Conv/BN -> ...`

we turn it into:

* `identity = x` (or the 1x1 conv + BN projection when channels or resolution change)
* `F(x) = convconv(x)`
* `F_att = CBAM(F(x))` (the multiplication happens inside)
* `y = identity + F_att`

The answer to the nagging “where do I put it?” question:

* Treat the output of the conv block as F(x)

* Apply CBAM on top of F(x)

* Then add the result to the skip path
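The rule can be packaged as a small wrapper that turns any existing `Conv/BN/ReLU -> Conv/BN` segment into a Pattern-1 block. This is a sketch under stated assumptions: `ResidualAttentionWrapper` and `SigmoidChannelGate` are hypothetical names, the gate is a deliberately tiny stand-in for CBAM (any module that returns `mask * f`, such as CBAM, fits the same slot), and the optional `skip` module plays the role of the 1x1 conv + BN projection.

```python
from typing import Optional

import torch
import torch.nn as nn


class SigmoidChannelGate(nn.Module):
    """Hypothetical tiny stand-in for CBAM: returns mask * f, just as CBAM does."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f):
        mask = torch.sigmoid(self.fc(f.mean(dim=(2, 3), keepdim=True)))  # (B,C,1,1)
        return mask * f


class ResidualAttentionWrapper(nn.Module):
    """y = ReLU(skip(x) + attention(branch(x))): attention only ever sees F(x)."""
    def __init__(self, branch: nn.Module, attention: nn.Module, skip: Optional[nn.Module] = None):
        super().__init__()
        self.branch = branch        # F(x): the existing conv-conv segment
        self.attention = attention  # must return the masked feature map, e.g. CBAM
        self.skip = skip            # None -> identity
        self.act = nn.ReLU()

    def forward(self, x):
        identity = x if self.skip is None else self.skip(x)
        f = self.branch(x)          # F(x)
        f_att = self.attention(f)   # A(F(x)) ⊙ F(x); the multiplication happens inside
        return self.act(identity + f_att)


# Usage: wrap an existing Conv/BN/ReLU -> Conv/BN segment (64 -> 128 channels, stride 2)
segment = nn.Sequential(
    nn.Conv2d(64, 128, 3, stride=2, padding=1, bias=False), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, 3, padding=1, bias=False), nn.BatchNorm2d(128),
)
projection = nn.Sequential(nn.Conv2d(64, 128, 1, stride=2, bias=False), nn.BatchNorm2d(128))
block = ResidualAttentionWrapper(segment, SigmoidChannelGate(128), skip=projection)

out = block(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 128, 16, 16])
```

Passing the attention module in as an argument keeps the placement rule explicit: whatever module is plugged in, it is applied to F(x) only, and the skip path never goes through it.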

---
