# Stochastic Depth: Model Integration (DropPath/SD) | End to End

This notebook is **integration-focused** ✅  
It follows the same flow as the Spatial Dropout notebook:

1) Briefly: what is Stochastic Depth and where does it sit?  
2) The cleanest implementation: `stochastic_depth()` + a `StochasticDepth` module  
3) **Integration into a residual block** (the right place: the branch)  
4) **Stage / whole-model integration** (the `make_stage` pattern)  
5) **Drop rate schedule** (linear increase with depth)  
6) A "does it work?" test with a CIFAR-style mini model  
7) A suggestion for binding to the same API as DropPath

> Note: Stochastic Depth = randomly bypassing the residual branch during training.  
> In eval mode, all blocks are active.


In [1]:

# ===== 0) Imports =====
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
device


'cuda'

## 1) The one critical rule of integration

Residual structure:
\[
y = x + F(x)
\]

Where Stochastic Depth is applied:
- the output of the **F(x)** branch

Correct:
1) `out = F(x)`
2) `out = SD(out)`  ✅
3) `y = x + out`

Wrong:
- dropping the identity/skip path
- randomly zeroing after the addition
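
The difference between the correct and the wrong placement can be seen on plain tensors. The snippet below is a minimal sketch, not part of the original notebook: `branch_out` is a random stand-in for F(x), the `keep` mask is fixed by hand rather than sampled, and the 1/q rescaling is left out to keep the comparison readable.

```python
import torch

torch.manual_seed(0)

x = torch.randn(4, 8)            # block input (identity path)
branch_out = torch.randn(4, 8)   # stand-in for F(x)

# Hypothetical per-sample keep mask: samples 0 and 2 keep the branch, 1 and 3 drop it.
keep = torch.tensor([1.0, 0.0, 1.0, 0.0]).view(4, 1)

# Correct: mask only the branch, then add the untouched identity.
y_correct = x + branch_out * keep

# Wrong: mask after the addition, which also zeroes the skip path.
y_wrong = (x + branch_out) * keep

dropped = keep.squeeze(1) == 0
print(torch.equal(y_correct[dropped], x[dropped]))  # dropped samples pass through unchanged
print(bool((y_wrong[dropped] == 0).all()))          # dropped samples lose their signal entirely
```

With the correct placement a dropped block behaves like an identity mapping for that sample, so the block's input still reaches the later layers.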


## 2) Core implementation

- Drop prob: **p**
- Keep prob: **q = 1-p**
- Sample-wise mask: `[B, 1, 1, 1]` (broadcast over all channels and spatial positions)

Inverted scaling, with a per-sample mask `m ~ Bernoulli(q)`:
\[
out = out \cdot \frac{m}{q}
\]
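
The reason for dividing by q is that the factor m/q has an expected value of 1, so the branch output keeps the same mean in training as in eval. The following is a small empirical check under assumed values (`p = 0.3` and 200,000 draws are arbitrary choices for illustration):

```python
import torch

torch.manual_seed(0)

p = 0.3          # drop probability (arbitrary value for this check)
q = 1.0 - p      # keep probability

# One Bernoulli(q) draw per "sample"; many draws to estimate the mean.
m = torch.empty(200_000).bernoulli_(q)
scale = m / q    # factor that multiplies the branch output

print(scale.unique())        # only two values occur: 0 and 1/q
print(scale.mean().item())   # close to 1.0
```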


In [None]:
def stochastic_depth(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Zero the whole input per sample with probability p; scale kept samples by 1/(1-p)."""
    if not (0.0 <= p < 1.0):
        raise ValueError("p must be in [0, 1).")
    if (not training) or p == 0.0:
        return x

    q = 1.0 - p
    # sample-wise mask: [B, 1, 1, 1] (or broadcast for ndim)
    shape = (x.size(0),) + (1,) * (x.ndim - 1)
    mask = torch.empty(shape, device=x.device, dtype=x.dtype).bernoulli_(q)
    return x * mask / q


class StochasticDepth(nn.Module):
    def __init__(self, p: float = 0.1):
        super().__init__()
        if not (0.0 <= p < 1.0):
            raise ValueError("p must be in [0, 1).")
        self.p = float(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return stochastic_depth(x, p=self.p, training=self.training)

# quick sanity
sd = StochasticDepth(p=0.5)
sd.train()
t = sd(torch.randn(4, 8, 16, 16))
t.shape

torch.Size([4, 8, 16, 16])
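
A shape check alone does not show that the mask is sample-wise. A slightly stronger check, sketched below, feeds an all-ones tensor so that each sample comes out either entirely 0 (dropped) or entirely 1/q (kept and rescaled). The function is repeated in compact form only so that the cell runs on its own.

```python
import torch

def stochastic_depth(x, p, training):
    # Compact copy of the function above so this cell runs on its own.
    if not training or p == 0.0:
        return x
    q = 1.0 - p
    shape = (x.size(0),) + (1,) * (x.ndim - 1)
    mask = torch.empty(shape, device=x.device, dtype=x.dtype).bernoulli_(q)
    return x * mask / q

torch.manual_seed(0)
p = 0.5
x = torch.ones(8, 3, 4, 4)

out = stochastic_depth(x, p, training=True)
per_sample = out.flatten(1)
# Every sample is constant: all 0.0 (dropped) or all 2.0 (kept, scaled by 1/q).
print(per_sample[:, 0].tolist())

# In eval mode the function is the identity.
print(torch.equal(stochastic_depth(x, p, training=False), x))
```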

## 3) Integrating into a residual block (the classic pattern)

Here SD is applied to the branch output **after the second conv and its BN**.

Flow:
- conv1 → bn1 → act
- conv2 → bn2
- **sd(out)**
- out + identity
- act


In [None]:
class BasicResBlockSD(nn.Module):
    def __init__(self, cin: int, cout: int, stride: int = 1, sd_p: float = 0.0, act: str = "relu"):
        super().__init__()
        self.conv1 = nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(cout)

        self.conv2 = nn.Conv2d(cout, cout, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(cout)

        self.act = nn.ReLU(inplace=True) if act == "relu" else nn.SiLU(inplace=True)
        self.sd = StochasticDepth(p=sd_p)

        self.shortcut = nn.Identity()
        if stride != 1 or cin != cout:
            self.shortcut = nn.Sequential(
                nn.Conv2d(cin, cout, 1, stride=stride, bias=False),
                nn.BatchNorm2d(cout)
            )

    def forward(self, x):
        identity = self.shortcut(x)

        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        # ✅ Stochastic Depth on the branch only; the shortcut is left untouched
        out = self.sd(out)

        out = out + identity
        out = self.act(out)
        return out

blk = BasicResBlockSD(32, 32, sd_p=0.2).to(device)
blk.train()
y = blk(torch.randn(4,32,32,32, device=device))
y.shape

torch.Size([4, 32, 32, 32])

## 4) Whole-model integration: stage + SD schedule

The usual recommendation in a real project:
- **sd_p increases linearly** across the blocks
- early blocks: small sd_p
- deep blocks: large sd_p

Linear schedule for block index i = 0, …, L-1 (L = total number of blocks):
\[
p_i = p_{max} \cdot \frac{i}{L-1}
\]


In [4]:

def make_sd_probs(num_blocks: int, p_max: float):
    if num_blocks <= 1:
        return [p_max]
    return [p_max * i / (num_blocks - 1) for i in range(num_blocks)]

make_sd_probs(8, 0.3)


[0.0,
 0.04285714285714286,
 0.08571428571428572,
 0.12857142857142856,
 0.17142857142857143,
 0.21428571428571427,
 0.2571428571428571,
 0.3]
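
One way to read a schedule is through the expected number of active residual branches per sample, which is the sum of the keep probabilities 1 - p_i. The sketch below computes it for the schedule printed above; `expected_active` is a name introduced here for illustration, and the schedule function is repeated so the cell is self-contained.

```python
def make_sd_probs(num_blocks, p_max):
    # Same linear schedule as above, repeated so the cell is self-contained.
    if num_blocks <= 1:
        return [p_max]
    return [p_max * i / (num_blocks - 1) for i in range(num_blocks)]

probs = make_sd_probs(8, 0.3)

# Each branch is kept with probability 1 - p_i, so the expected count is the sum.
expected_active = sum(1.0 - p for p in probs)
print(round(expected_active, 4))  # 6.8
```

So with `p_max = 0.3` over 8 blocks, a training sample passes through about 6.8 of the 8 branches on average.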

### 4.1) make_stage

The function below:
- creates `num_blocks` blocks
- gives each block its own `sd_p`
- applies the stride only in the first block (classic ResNet stage)




In [None]:
def make_stage(cin: int, cout: int, num_blocks: int, stride: int, sd_probs, act="relu"):
    layers = []
    for i in range(num_blocks):
        s = stride if i == 0 else 1
        layers.append(BasicResBlockSD(cin if i == 0 else cout, cout, stride=s, sd_p=sd_probs[i], act=act))
    return nn.Sequential(*layers)

# example: a 3-block stage with sd_p rising from 0.0 to 0.2
sd_probs = make_sd_probs(3, 0.2)
stage = make_stage(32, 64, num_blocks=3, stride=2, sd_probs=sd_probs)

## 5) Mini model: CIFAR-like ResNet + SD

- Stem
- Stage1 (32)
- Stage2 (64, downsample)
- Stage3 (128, downsample)
- GAP + FC

SD schedule:
- All blocks are counted across stages
- sd_p increases linearly up to `sd_max`


In [6]:

class TinyResNetStochasticDepth(nn.Module):
    def __init__(self, num_classes=100, sd_max=0.2, blocks=(2,2,2), act="relu"):
        super().__init__()

        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True) if act == "relu" else nn.SiLU(inplace=True),
        )

        total_blocks = sum(blocks)
        all_probs = make_sd_probs(total_blocks, sd_max)

        idx = 0
        probs1 = all_probs[idx: idx+blocks[0]]; idx += blocks[0]
        probs2 = all_probs[idx: idx+blocks[1]]; idx += blocks[1]
        probs3 = all_probs[idx: idx+blocks[2]]; idx += blocks[2]

        self.stage1 = make_stage(32, 32, blocks[0], stride=1, sd_probs=probs1, act=act)
        self.stage2 = make_stage(32, 64, blocks[1], stride=2, sd_probs=probs2, act=act)
        self.stage3 = make_stage(64, 128, blocks[2], stride=2, sd_probs=probs3, act=act)

        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = TinyResNetStochasticDepth(num_classes=10, sd_max=0.3, blocks=(2,2,2)).to(device)
model.train()
logits = model(torch.randn(2,3,32,32, device=device))
logits.shape


torch.Size([2, 10])
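
The constructor above slices the global schedule into three lists by hand. For a different number of stages the same slicing could be written as a loop. The sketch below assumes a hypothetical helper, `split_by_stage`, that is not part of the notebook; it only illustrates how the per-stage lists relate to the global schedule.

```python
def make_sd_probs(num_blocks, p_max):
    # Same linear schedule as in section 4, repeated so the cell is self-contained.
    if num_blocks <= 1:
        return [p_max]
    return [p_max * i / (num_blocks - 1) for i in range(num_blocks)]

def split_by_stage(all_probs, blocks):
    """Hypothetical helper: slice one global schedule into per-stage lists."""
    per_stage, idx = [], 0
    for n in blocks:
        per_stage.append(all_probs[idx: idx + n])
        idx += n
    return per_stage

blocks = (2, 2, 2)
per_stage = split_by_stage(make_sd_probs(sum(blocks), 0.3), blocks)
for i, probs in enumerate(per_stage, start=1):
    print(f"stage{i}:", [round(p, 2) for p in probs])
```

The first block of stage 1 gets `sd_p = 0.0` and the last block of the final stage gets `sd_max`, which matches the model built above with `sd_max=0.3`.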

## 6) Verifying the integration: train vs eval

- `model.train()` → the branch is dropped for some samples
- `model.eval()` → no drop, all blocks are active

With the same input, train and eval outputs do not have to match (BN, dropout, etc.), but you can observe whether SD is active: repeated forward passes usually differ in train mode and are identical in eval mode.


In [7]:

x = torch.randn(2,3,32,32, device=device)

model.train()
y_train_1 = model(x)
y_train_2 = model(x)  # SD masks may differ between the two passes

model.eval()
y_eval_1 = model(x)
y_eval_2 = model(x)

print("train outputs equal? ", torch.allclose(y_train_1, y_train_2))
print("eval outputs equal?  ", torch.allclose(y_eval_1, y_eval_2))


train outputs equal?  False
eval outputs equal?   True


## 7) Practical settings (short and to the point)

- Small CIFAR-like models: `sd_max = 0.05 – 0.2`
- Deeper models/Transformers: `sd_max = 0.1 – 0.5`

Combining with other regularizers:
- If Dropout/SpatialDropout is also present, keep their rates low.
- Let SD be the main (block-level) regularizer.

Most common mistake:
- Putting SD on the **skip path** ❌
- Correct place: the **residual branch** ✅
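
For the DropPath binding mentioned in the outline, one possible approach is a thin adapter that reuses the same function under a DropPath-style name. This is a sketch under assumptions: the class name `DropPath` and the argument name `drop_prob` are chosen here to resemble common DropPath implementations and are not defined anywhere in this notebook. The function is repeated in compact form so the cell runs on its own.

```python
import torch
import torch.nn as nn

def stochastic_depth(x, p, training):
    # Compact copy of the core function so this cell runs on its own.
    if not training or p == 0.0:
        return x
    q = 1.0 - p
    shape = (x.size(0),) + (1,) * (x.ndim - 1)
    mask = torch.empty(shape, device=x.device, dtype=x.dtype).bernoulli_(q)
    return x * mask / q

class DropPath(nn.Module):
    """Hypothetical adapter: StochasticDepth behavior, DropPath-style argument name."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = float(drop_prob)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return stochastic_depth(x, self.drop_prob, self.training)

    def extra_repr(self) -> str:
        return f"drop_prob={self.drop_prob}"

# The mask shape follows x.ndim, so [B, tokens, dim] inputs work as well.
dp = DropPath(0.1).eval()
tokens = torch.randn(2, 5, 16)
print(torch.equal(dp(tokens), tokens))  # identity in eval mode
```

Because the mask is built as `[B, 1, ..., 1]` from `x.ndim`, the same module can sit on the residual branch of a conv block or of a Transformer block without changes.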
