# Spatial Dropout (Dropout2d): From the Basics to the Details (CNN)

This notebook covers:
- What is Spatial Dropout?
- How does it differ from classic Dropout?
- Why does it work well in CNNs?
- The mathematical view: how is the mask applied?
- PyTorch implementation (nn.Dropout2d / custom)
- How it differs from DropBlock
- Practical settings: where to place it and which rate makes sense


## 1) The dropout family: what are we regularizing?

**Regularization**: reducing the model's overfitting to the training set.

The core idea of dropout:
- Randomly zero some activations during training
- Keep the model from depending on any single signal
- Create a kind of *ensemble* effect (different sub-networks)

But CNNs have a problem:
- Pixels within a feature map (channel) are **highly correlated**.
- When classic (element-wise) dropout removes random pixels, its effect weakens, because neighboring pixels still carry nearly the same information.

This is why dropout variants better suited to CNNs exist:
- **Spatial Dropout / Dropout2d**: drops an entire channel
- **DropBlock**: drops a spatial block
- **DropPath / Stochastic Depth**: drops an entire path / block


## 2) What is Spatial Dropout?

Spatial Dropout (in PyTorch, in practice **`nn.Dropout2d`**):
- Input tensor shape: **[B, C, H, W]**
- For each sample (B), each channel (C) is either **kept entirely** or **zeroed entirely**.
- Mask shape: **[B, C, 1, 1]** (then broadcast over H, W)

That is:
- Classic dropout: `mask ~ Bernoulli(keep)` with shape `[B,C,H,W]`
- Spatial dropout: `mask ~ Bernoulli(keep)` with shape `[B,C,1,1]`

👉 Important: Spatial Dropout targets **channels**, **not individual spatial positions (H,W)**.
Do not let the name mislead you: "spatial" refers to the fact that a feature map's whole spatial extent is dropped together.
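
The two mask shapes above can be compared side by side. This is a minimal sketch, assuming an all-ones input and a keep probability of 0.5 (both chosen only for illustration); it builds the masks by hand with `torch.bernoulli` rather than through `nn.Dropout2d`, and it omits the rescaling step.

```python
import torch

torch.manual_seed(0)

B, C, H, W = 2, 4, 3, 3
keep = 0.5  # keep probability (illustrative value)
x = torch.ones(B, C, H, W)

# Classic dropout: one independent Bernoulli draw per element.
mask_elem = torch.bernoulli(torch.full((B, C, H, W), keep))

# Spatial dropout: one draw per (sample, channel), shared by all H x W positions.
mask_chan = torch.bernoulli(torch.full((B, C, 1, 1), keep))

y_elem = x * mask_elem
y_chan = x * mask_chan  # broadcasting expands [B, C, 1, 1] to [B, C, H, W]

print(mask_elem.shape, mask_chan.shape)
# Fraction of surviving elements per channel: always 0 or 1 for the channel-wise mask.
chan_survival = y_chan.mean(dim=(2, 3))
print(chan_survival)
```

Because the input is all ones, the per-channel mean of `y_chan` directly shows which channels survived.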


## 3) AmaÃ§ ne?

CNNâ€™de her kanal genelde bir pattern yakalar:
- edge / texture / orientation / local structure gibi

Model bazen birkaÃ§ kanala aÅŸÄ±rÄ± yaslanÄ±r.
**Spatial Dropout** der ki:
> "BazÄ± kanallarÄ± komple kapatÄ±yorum, diÄŸer kanallardan da aynÄ± iÅŸi Ã¶ÄŸren."

Bu ÅŸunlarÄ± saÄŸlar:
- Kanal co-adaptation azalÄ±r (kanallarÄ±n birbirine aÅŸÄ±rÄ± baÄŸÄ±mlÄ±lÄ±ÄŸÄ± kÄ±rÄ±lÄ±r)
- Daha robust feature Ã¶ÄŸrenimi
- KÃ¼Ã§Ã¼k veri / domain shift olan senaryolarda fayda


## 4) The mathematical formula (simple and clear)

Input:  
$$x \in \mathbb{R}^{B \times C \times H \times W}$$

Keep probability: $q = 1 - p$  
Mask:  
$$m \sim \text{Bernoulli}(q) \quad , \quad m \in \{0,1\}^{B \times C \times 1 \times 1}$$

Application (inverted dropout):
$$y = \frac{x \odot m}{q}$$

Because $\mathbb{E}[m] = q$, the expected value is preserved during training:
$$\mathbb{E}[y] = x$$

In eval mode, dropout is disabled:
$$y = x$$
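
The expectation claim can be checked empirically. The following is a rough Monte Carlo sketch, assuming `p = 0.5` and 20,000 independent masks (both arbitrary choices): it applies the inverted-dropout formula by hand and averages the results, which should land close to `x`.

```python
import torch

torch.manual_seed(0)

B, C, H, W = 2, 4, 3, 3
p = 0.5
q = 1.0 - p
n_draws = 20_000  # number of independent masks (illustrative choice)

x = torch.randn(B, C, H, W)

# One [B, C, 1, 1] mask per draw, stacked along a leading dimension.
masks = torch.bernoulli(torch.full((n_draws, B, C, 1, 1), q))

# Inverted dropout for every draw; x broadcasts over the leading dimension.
y = x * masks / q

# Averaging over the draws should approach x.
max_err = (y.mean(dim=0) - x).abs().max().item()
print(f"max |mean(y) - x| = {max_err:.4f}")
```

The error shrinks roughly with the square root of the number of draws, so a small nonzero value is expected.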


In [1]:
import torch
import torch.nn as nn

torch.manual_seed(0)

B, C, H, W = 2, 4, 3, 3
x = torch.randn(B, C, H, W)
x

tensor([[[[-1.1258, -1.1524, -0.2506],
          [-0.4339,  0.8487,  0.6920],
          [-0.3160, -2.1152,  0.3223]],

         [[-1.2633,  0.3500,  0.3081],
          [ 0.1198,  1.2377,  1.1168],
          [-0.2473, -1.3527, -1.6959]],

         [[ 0.5667,  0.7935,  0.5988],
          [-1.5551, -0.3414,  1.8530],
          [ 0.7502, -0.5855, -0.1734]],

         [[ 0.1835,  1.3894,  1.5863],
          [ 0.9463, -0.8437, -0.6136],
          [ 0.0316, -0.4927,  0.2484]]],


        [[[ 0.4397,  0.1124,  0.6408],
          [ 0.4412, -0.1023,  0.7924],
          [-0.2897,  0.0525,  0.5229]],

         [[ 2.3022, -1.4689, -1.5867],
          [-0.6731,  0.8728,  1.0554],
          [ 0.1778, -0.2303, -0.3918]],

         [[ 0.5433, -0.3952,  1.5091],
          [ 2.0820,  1.7067,  2.3804],
          [-1.1256, -0.3170, -1.0925]],

         [[-0.0852,  0.3276, -0.7607],
          [-1.5991,  0.0185, -0.7504],
          [ 0.1854,  0.6211,  0.6382]]]])

## 5) Ready-made usage in PyTorch: nn.Dropout2d

In PyTorch, **`nn.Dropout2d`**:
- applies a **per-channel** mask to an (N, C, H, W) input
- zeroes all (H, W) positions of a dropped channel together

Note:
- The PyTorch documentation describes the dropped unit as an entire channel, i.e. a 2D feature map, so you may also see this called "feature map dropout".
- The logic: *channel-wise dropout*.


In [2]:
drop2d = nn.Dropout2d(p=0.5)
drop2d.train()

y = drop2d(x)
y

tensor([[[[-2.2517, -2.3047, -0.5012],
          [-0.8678,  1.6974,  1.3840],
          [-0.6320, -4.2304,  0.6445]],

         [[-2.5267,  0.7000,  0.6163],
          [ 0.2397,  2.4753,  2.2336],
          [-0.4946, -2.7053, -3.3919]],

         [[ 0.0000,  0.0000,  0.0000],
          [-0.0000, -0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000],
          [ 0.0000, -0.0000,  0.0000]]],


        [[[ 0.8794,  0.2248,  1.2816],
          [ 0.8823, -0.2046,  1.5849],
          [-0.5793,  0.1050,  1.0457]],

         [[ 0.0000, -0.0000, -0.0000],
          [-0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000]],

         [[ 0.0000, -0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000],
          [-0.0000, -0.0000, -0.0000]],

         [[-0.1704,  0.6552, -1.5214],
          [-3.1982,  0.0370, -1.5009],
          [ 0.3708,  1.2423,  1.2764]]]])

In [3]:
# See which channels were zeroed (for sample 0).
# If a channel is entirely 0, its mean(abs) comes out as 0.
channel_energy = y[0].abs().mean(dim=(1,2))
channel_energy

tensor([1.6126, 1.7092, 0.0000, 0.0000])

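In the `channel_energy` values above, **channels that are 0** have been dropped entirely. The surviving channels are scaled by 1/q = 2, which is why their values are double those in `x`.

Also note this critical point:
- Dropout2d is active in training mode
- it is inactive in eval mode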


In [4]:
drop2d.eval()
y_eval = drop2d(x)
torch.allclose(x, y_eval)

True
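
To make the contrast with element-wise dropout concrete, the sketch below runs `nn.Dropout` and `nn.Dropout2d` on the same input and reports the fraction of zeroed entries per channel. The 8×8 spatial size and the helper name `zero_fraction_per_channel` are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(2, 4, 8, 8)

drop_elem = nn.Dropout(p=0.5).train()    # element-wise dropout
drop_chan = nn.Dropout2d(p=0.5).train()  # channel-wise dropout

def zero_fraction_per_channel(t):
    # Fraction of exactly-zero entries in each (sample, channel) slice.
    return (t == 0).float().mean(dim=(2, 3))

frac_elem = zero_fraction_per_channel(drop_elem(x))
frac_chan = zero_fraction_per_channel(drop_chan(x))

print(frac_elem)  # values scattered around p
print(frac_chan)  # each value is exactly 0 or 1
```

With `nn.Dropout`, every channel loses roughly half of its pixels; with `nn.Dropout2d`, a channel is either untouched (apart from rescaling) or gone.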

## 6) Custom Spatial Dropout implementation (full control)

Sometimes you want to:
- inspect and debug how the mask is generated
- use a different scaling strategy
- apply it only to specific layers

The module below implements exactly this formula:
$$y = x \odot m / q$$
where $m$ has shape `[B,C,1,1]`.


In [5]:
class SpatialDropout2D(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        if not (0.0 <= p < 1.0):
            raise ValueError("p must be in [0,1).")
        self.p = float(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if (not self.training) or self.p == 0.0:
            return x
        q = 1.0 - self.p
        # mask: [B, C, 1, 1]
        mask = torch.empty((x.size(0), x.size(1), 1, 1), device=x.device, dtype=x.dtype).bernoulli_(q)
        return x * mask / q

sd = SpatialDropout2D(p=0.5)
sd.train()
y2 = sd(x)
y2

tensor([[[[-2.2517, -2.3047, -0.5012],
          [-0.8678,  1.6974,  1.3840],
          [-0.6320, -4.2304,  0.6445]],

         [[-2.5267,  0.7000,  0.6163],
          [ 0.2397,  2.4753,  2.2336],
          [-0.4946, -2.7053, -3.3919]],

         [[ 0.0000,  0.0000,  0.0000],
          [-0.0000, -0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000]],

         [[ 0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000],
          [ 0.0000, -0.0000,  0.0000]]],


        [[[ 0.8794,  0.2248,  1.2816],
          [ 0.8823, -0.2046,  1.5849],
          [-0.5793,  0.1050,  1.0457]],

         [[ 4.6044, -2.9378, -3.1734],
          [-1.3462,  1.7457,  2.1107],
          [ 0.3557, -0.4607, -0.7835]],

         [[ 1.0866, -0.7903,  3.0182],
          [ 4.1639,  3.4134,  4.7607],
          [-2.2512, -0.6340, -2.1849]],

         [[-0.0000,  0.0000, -0.0000],
          [-0.0000,  0.0000, -0.0000],
          [ 0.0000,  0.0000,  0.0000]]]])
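
For debugging, it can help to get the mask back alongside the output. The function below is a hypothetical helper (the name `spatial_dropout_with_mask` and the optional `generator` argument are assumptions, not part of the module above); it applies the same formula and returns the mask so the result can be verified directly.

```python
import torch

def spatial_dropout_with_mask(x, p=0.5, generator=None):
    """Hypothetical helper: channel-wise inverted dropout that also returns the mask."""
    if not (0.0 <= p < 1.0):
        raise ValueError("p must be in [0, 1).")
    q = 1.0 - p
    # Keep probability q for every (sample, channel) pair.
    probs = torch.full((x.size(0), x.size(1), 1, 1), q, device=x.device, dtype=x.dtype)
    mask = torch.bernoulli(probs, generator=generator)
    return x * mask / q, mask

g = torch.Generator().manual_seed(0)  # dedicated generator for reproducible masks
x = torch.randn(2, 4, 3, 3)
y, mask = spatial_dropout_with_mask(x, p=0.5, generator=g)

print(mask.flatten(1))                     # one 0/1 entry per (sample, channel)
print(torch.allclose(y, x * mask / 0.5))   # output matches the formula
```

Passing a dedicated `torch.Generator` keeps the mask reproducible without resetting the global seed.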

## 7) Where is it used? (CNN block flow)

General practice:
- Conv → Norm → Activation → **Spatial Dropout**

Example:
- Conv → BN → ReLU → Dropout2d

In a residual block:
- At the end of the main branch, or after attention

Caution:
- In very early layers (close to the input), heavy dropout can destroy fine detail.
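
One way to read "at the end of the main branch" is sketched below. The class name `ResidualBlockSD`, the two-convolution layout, and the default `sd_p=0.1` are assumptions chosen for illustration; real architectures vary in where exactly they place the dropout.

```python
import torch
import torch.nn as nn

class ResidualBlockSD(nn.Module):
    """Hypothetical residual block with Dropout2d at the end of the main branch."""

    def __init__(self, channels, sd_p=0.1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Dropout2d(p=sd_p),  # drops whole channels of the residual signal
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # The skip connection is left untouched; only the branch output is dropped.
        return self.act(x + self.branch(x))

block = ResidualBlockSD(16, sd_p=0.1).train()
inp = torch.randn(2, 16, 32, 32)
out = block(inp)
print(out.shape)
```

Since the skip path is never masked, a dropped channel still carries the identity signal forward, which makes this placement relatively gentle.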


## 8) How it differs from DropBlock (in short)

- **Spatial Dropout**: switches off an entire channel (along the C axis).
- **DropBlock**: switches off a *block* over H×W within the same channel.

When to use which?
- Channel dependency problem → Spatial Dropout
- Over-reliance on a local region / spatial correlation → DropBlock
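
To see the difference in the mask pattern, here is a simplified DropBlock sketch. It is not a reference implementation: it assumes an odd `block_size`, samples seed points over the whole map rather than only in the valid interior, and uses a common approximation for the seed rate `gamma`. The function name `drop_block_2d` is made up for this example.

```python
import torch
import torch.nn.functional as F

def drop_block_2d(x, p=0.1, block_size=3, training=True):
    """Simplified DropBlock sketch (odd block_size only); not a reference implementation."""
    if not training or p == 0.0:
        return x
    if block_size % 2 == 0:
        raise ValueError("this sketch assumes an odd block_size")
    _, _, H, W = x.shape
    # Seed rate chosen so that roughly a fraction p of units ends up dropped.
    valid = (H - block_size + 1) * (W - block_size + 1)
    gamma = min(p * H * W / (block_size ** 2 * valid), 1.0)
    seeds = torch.bernoulli(torch.full_like(x, gamma))
    # Grow each seed into a block_size x block_size square of dropped units.
    dropped = F.max_pool2d(seeds, kernel_size=block_size, stride=1, padding=block_size // 2)
    keep_mask = 1.0 - dropped
    # Rescale so the overall activation magnitude stays roughly unchanged.
    scale = keep_mask.numel() / keep_mask.sum().clamp(min=1.0)
    return x * keep_mask * scale

torch.manual_seed(0)
x = torch.ones(1, 2, 8, 8)
y = drop_block_2d(x, p=0.2, block_size=3)
# Per-channel zero fraction: typically strictly between 0 and 1, unlike Dropout2d.
print((y == 0).float().mean(dim=(2, 3)))
```

Here the zeros form contiguous squares inside a channel, whereas Spatial Dropout removes the channel as a whole.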


## 9) Practical settings (what works in the field)

- Typical p: **0.05–0.3**
- For very deep or heavily overfitting models, **0.2–0.4** is worth trying
- With BN/SyncBN, do not overdo the dropout rate (especially with small batches)
- Usually place Dropout2d **after the activation**

Tip:
- If you already use strong augmentation + weight decay + DropPath,
  keep Spatial Dropout low (p≈0.05–0.15).
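
When trying several rates, it can be convenient to change `p` on an existing model instead of rebuilding it. The sketch below assumes a hypothetical helper `set_dropout2d_p` and a toy `nn.Sequential` model; it relies only on the fact that `nn.Dropout2d` stores its rate in the `p` attribute.

```python
import torch
import torch.nn as nn

def set_dropout2d_p(model: nn.Module, p: float) -> int:
    """Hypothetical helper: set p on every nn.Dropout2d in the model; returns how many were changed."""
    count = 0
    for m in model.modules():
        if isinstance(m, nn.Dropout2d):
            m.p = p
            count += 1
    return count

# Toy model used only to demonstrate the helper.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Dropout2d(p=0.3),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(), nn.Dropout2d(p=0.3),
)

for p in (0.0, 0.1, 0.2):
    n = set_dropout2d_p(model, p)
    rates = [m.p for m in model.modules() if isinstance(m, nn.Dropout2d)]
    print(p, n, rates)
```

In a real ablation, each setting would be followed by a fresh training run with the same seed and schedule so that only `p` differs.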


## 10) Mini example block

The block below: Conv → BN → SiLU → Dropout2d.
In a real project you would integrate it into residual/attention blocks.


In [6]:
class ConvBNActSD(nn.Module):
    def __init__(self, cin, cout, k=3, s=1, p=1, act="silu", sd_p=0.1):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, k, stride=s, padding=p, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.SiLU(inplace=True) if act == "silu" else nn.ReLU(inplace=True)
        self.sd = nn.Dropout2d(p=sd_p)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.sd(x)
        return x

blk = ConvBNActSD(3, 16, sd_p=0.2)
blk.train()
out = blk(torch.randn(2,3,32,32))
out.shape

torch.Size([2, 16, 32, 32])

## 11) Checklist (for your repo)

- [ ] Simple integration with `nn.Dropout2d`
- [ ] Verify the mask with the custom `SpatialDropout2D`
- [ ] Ablation: p=0.0 / 0.1 / 0.2
- [ ] Lower p when combined with DropBlock + DropPath

Next topic once this is done: **Stochastic Depth**.
