
---

## Day 22: Regularization Techniques & Dropout

### 1. Theory: Why Do We Need Regularization?

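* **Overfitting**

  * The model fits the training data too closely, so it generalizes poorly to new data
  * The more complex the model and the more parameters it has, the higher the risk of overfitting

* **Purpose of regularization**

  * Penalize the model parameters to keep the model from becoming overly complex
  * As a result, steer training toward a model that generalizes better (one that also works well on new data)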

#### 1.1 L2 Regularization (Weight Decay)

* **Principle**: add the sum of squared weights to the loss function

  $$
    \mathcal{L}_{\text{total}}
      = \mathcal{L}_{\text{data}}
      + \lambda \sum_i w_i^2
  $$

  * $\lambda$: regularization strength (a hyperparameter)
  * Larger weights incur a larger penalty → weights are pushed toward 0

* **PyTorch usage**:

  ```python
  optimizer = torch.optim.SGD(
      model.parameters(),
      lr=0.01,
      weight_decay=1e-4  # L2 strength; SGD adds weight_decay * w to each gradient
  )
  ```
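
`torch.optim.SGD` applies `weight_decay` by adding `weight_decay * w` to each parameter's gradient. The penalty $\lambda \sum_i w_i^2$ in the formula above has gradient $2\lambda w_i$, so the two match when $\lambda$ is `weight_decay / 2`. The sketch below checks this for a single weight in plain Python. The function names and numbers are illustrative assumptions, not PyTorch code.

```python
# Plain-Python sketch: one SGD step on a single weight, computed two ways.

def sgd_step_with_weight_decay(w, grad_data, lr, weight_decay):
    """SGD update where the decay term weight_decay * w is added to the gradient."""
    return w - lr * (grad_data + weight_decay * w)

def sgd_step_with_explicit_penalty(w, grad_data, lr, lam):
    """SGD update on data_loss + lam * w**2, whose penalty gradient is 2 * lam * w."""
    return w - lr * (grad_data + 2.0 * lam * w)

# Illustrative values (assumptions): current weight, data-loss gradient, learning rate.
w, grad_data, lr = 0.8, 0.3, 0.01
weight_decay = 1e-4

via_decay = sgd_step_with_weight_decay(w, grad_data, lr, weight_decay)
via_penalty = sgd_step_with_explicit_penalty(w, grad_data, lr, lam=weight_decay / 2)

print(via_decay, via_penalty)  # the two updates agree
```

In practice the factor of 2 is absorbed into the choice of $\lambda$, but it matters when comparing a hand-written penalty against the optimizer's `weight_decay` argument.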

#### 1.2 L1 Regularization

* **Principle**: add the sum of the absolute values of the weights to the loss function

  $$
    \mathcal{L}_{\text{total}}
      = \mathcal{L}_{\text{data}}
      + \lambda \sum_i |w_i|
  $$

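  * Tends to drive many parameters toward 0 (plain gradient steps leave them near 0 rather than exactly at 0), which favors **sparse** models
* **PyTorch usage** (implemented by hand):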

  ```python
  # inside the training loop, after data_loss has been computed
  l1_lambda = 1e-5
  l1_norm = sum(p.abs().sum() for p in model.parameters())
  loss = data_loss + l1_lambda * l1_norm
  loss.backward()
  optimizer.step()
  ```
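
Why L1 favors sparsity while L2 does not can be seen from the size of the penalty's pull on a weight. The L1 penalty's (sub)gradient is $\lambda\,\mathrm{sign}(w)$, the same size for every nonzero weight, whereas the L2 gradient $2\lambda w$ shrinks as the weight shrinks. The plain-Python sketch below applies a few penalty-only steps to a small and a large weight. The `shrink` helper and all values are hypothetical, chosen only for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def shrink(w, lr, lam, penalty, steps):
    """Apply `steps` gradient steps using only the penalty term (no data loss)."""
    for _ in range(steps):
        if penalty == "l1":
            w -= lr * lam * sign(w)    # subgradient of lam * |w|: constant-size step
        else:
            w -= lr * 2.0 * lam * w    # gradient of lam * w**2: step proportional to w
    return w

small, large = 0.05, 1.0
for name in ("l1", "l2"):
    print(name,
          shrink(small, lr=0.1, lam=0.1, penalty=name, steps=3),
          shrink(large, lr=0.1, lam=0.1, penalty=name, steps=3))
```

After three steps L1 has removed more than half of the small weight but only a few percent of the large one, while L2 has scaled both by the same factor.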

#### 1.3 Dropout

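* **Principle**: randomly deactivate (drop) neurons during training

  * Has the effect of training a different sub-network on each mini-batch
  * Reduces reliance on any single neuron, which mitigates overfitting
* **PyTorch usage**: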

  ```python
  import torch
  import torch.nn as nn

  class MLP(nn.Module):
      def __init__(self):
          super().__init__()
          self.fc1 = nn.Linear(784, 256)
          self.dropout = nn.Dropout(p=0.5)  # probability of dropping each activation
          self.fc2 = nn.Linear(256, 10)

      def forward(self, x):
          x = torch.relu(self.fc1(x))
          x = self.dropout(x)           # active only in training mode
          return self.fc2(x)
  ```
* **Caveats**:

  * Dropout is applied only in `model.train()` mode; it is disabled in `model.eval()` mode
  * A dropout probability that is too large (p > 0.7) can hinder training instead of helping it
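
The train/eval difference can be made concrete with a small plain-Python sketch of inverted dropout, the scheme `nn.Dropout` uses: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, and during evaluation the input passes through unchanged. The `dropout` function here is a hypothetical stand-in for illustration, not PyTorch's implementation, and it assumes `p < 1`.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout on a list of floats (assumes 0 <= p < 1).

    Training: each value is zeroed with probability p, survivors are scaled by 1 / (1 - p).
    Evaluation: values pass through unchanged.
    """
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10000

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False, rng=rng)

print(sum(train_out) / len(train_out))  # close to 1.0: the expected activation is preserved
print(eval_out[:3])                     # unchanged inputs in evaluation mode
```

Because of the `1 / (1 - p)` scaling, the layer after dropout sees activations of the same expected magnitude in both modes, which is why no extra rescaling is needed at evaluation time.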

---


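#### Exercises

1. Add **L1 regularization** by hand

   * Vary `l1_lambda` and observe the training and validation accuracy
2. Combine with a **learning-rate scheduler** (`StepLR`, etc.) and compare performance
3. Vary the **dropout probability** (0.2, 0.5, 0.8) and record how model performance changes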
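
For exercise 2, it helps to know what schedule to expect before wiring it into training. `torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma)` multiplies the learning rate by `gamma` every `step_size` epochs, with `scheduler.step()` called once per epoch after the optimizer updates. The plain-Python sketch below reproduces that step-decay rule. The `step_lr` helper and the values `base_lr=0.01`, `step_size=3`, `gamma=0.1` are assumptions chosen for illustration.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate after `epoch` completed epochs under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

# Illustrative settings (assumptions): decay by 10x every 3 epochs over 10 epochs.
schedule = [step_lr(base_lr=0.01, step_size=3, gamma=0.1, epoch=e) for e in range(10)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr={lr:.6f}")
```

With these settings the learning rate stays at 0.01 for epochs 0 to 2, drops to 0.001 for epochs 3 to 5, and so on, which is worth keeping in mind when comparing accuracy curves against a constant-rate run.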



In [41]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader as loader
from torchvision import datasets, transforms

In [43]:
transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))])

train_data=datasets.MNIST(root='./data', download=True, train=True, transform=transform)
test_data=datasets.MNIST(root='./data', download=True, train=False, transform=transform)
train_loader=loader(train_data, batch_size=64, shuffle=True)
test_loader=loader(test_data, batch_size=1000, shuffle=False)  # evaluation order does not matter, so no shuffling

In [66]:
class Model(nn.Module):
    def __init__(self, dropout_p=0.0):
        super(Model, self).__init__()
        self.fc1=nn.Linear(28*28,256)
        self.drop=nn.Dropout(p=dropout_p)
        self.fc2=nn.Linear(256, 10)

    def forward(self, x):
        x=x.view(-1, 28*28)
        x=self.fc1(x)
        x=torch.relu(x)
        x=self.drop(x)
        return self.fc2(x)

In [127]:
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
def train(model, train_loader,test_loader, criter, optimizer, num_epoch):
    for epoch in range(num_epoch):
        model.train()   # switch back to training mode each epoch (eval() is called below)
        total_loss=0.0
        correct=0
        for data, target in train_loader:
            data,target=data.to(device), target.to(device)
            optimizer.zero_grad()
            out=model(data)             # output is already on the model's device
            loss=criter(out, target)
            loss.backward()
            optimizer.step()
            total_loss+=loss.item()
            correct+=(out.argmax(1)==target).sum().item()
        avg_acc=correct/len(train_loader.dataset)

        model.eval()    # disable dropout for evaluation
        correct=0
        with torch.no_grad():
            for data, target in test_loader:
                data,target=data.to(device), target.to(device)
                correct+=(model(data).argmax(1)==target).sum().item()
        val_acc = correct / len(test_loader.dataset)
        print(f"Epoch {epoch+1}: train_acc={avg_acc:.4f}, val_acc={val_acc:.4f}")

cuda


In [115]:
model=Model(dropout_p=0.0).to(device)
num_epoch=10
lr=1e-4
criter = nn.CrossEntropyLoss()
optimy=optim.SGD(model.parameters(), lr=lr)
print("=== Base MLP ===")
train(model, train_loader,test_loader, criter, optimy, num_epoch)

=== Base MLP ===
Epoch 1: train_acc=0.7778, val_acc=0.8801
Epoch 2: train_acc=0.8425, val_acc=0.8179
Epoch 3: train_acc=0.8409, val_acc=0.8808
Epoch 4: train_acc=0.8701, val_acc=0.8651
Epoch 5: train_acc=0.8640, val_acc=0.8612
Epoch 6: train_acc=0.8571, val_acc=0.8482
Epoch 7: train_acc=0.8537, val_acc=0.8652
Epoch 8: train_acc=0.8586, val_acc=0.8680
Epoch 9: train_acc=0.8618, val_acc=0.8523
Epoch 10: train_acc=0.8505, val_acc=0.8575


In [131]:
def train(model, train_loader, val_loader, criterion, optimizer, num_epochs):
    for epoch in range(1, num_epochs+1):
        # --- Training phase ---
        model.train()
        total_loss = 0.0
        correct = 0

        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)

            optimizer.zero_grad()           # clear gradients left over from the previous step
            logits = model(xb)
            loss = criterion(logits, yb)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            correct += (logits.argmax(1) == yb).sum().item()

        train_acc = correct / len(train_loader.dataset)

        # --- Evaluation phase ---
        model.eval()
        correct = 0
        with torch.no_grad():
            for xb, yb in val_loader:
                xb, yb = xb.to(device), yb.to(device)
                logits = model(xb)
                correct += (logits.argmax(1) == yb).sum().item()

        val_acc = correct / len(val_loader.dataset)

        print(f"Epoch {epoch}: "
              f"train_acc={train_acc:.4f}, "
              f"val_acc={val_acc:.4f}, "
              f"avg_loss={total_loss/len(train_loader):.4f}")


In [133]:
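# 2) Regularization + Dropout MLP
model=Model(dropout_p=0.0).to(device)
num_epoch=10
lr=1e-4
criter = nn.CrossEntropyLoss()
reg_model = Model(dropout_p=0.1).to(device)
# L2 (weight_decay=1e-4) is commented out below, so opt2 applies dropout only
opt2 = optim.SGD(reg_model.parameters(), lr=0.01)#, weight_decay=1e-4)
print("\n=== Regularized MLP (L2 + Dropout) ===")
# NOTE: optimy was built from the earlier base model's parameters, so reg_model's
# weights are never updated by this call; opt2 is the optimizer that should be passed.
train(reg_model, train_loader,test_loader, criter, optimy, num_epoch)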


=== Regularized MLP (L2 + Dropout) ===
Epoch 1: train_acc=0.0871, val_acc=0.0855, avg_loss=2.3139
Epoch 2: train_acc=0.0854, val_acc=0.0855, avg_loss=2.3143
Epoch 3: train_acc=0.0869, val_acc=0.0855, avg_loss=2.3144
Epoch 4: train_acc=0.0852, val_acc=0.0855, avg_loss=2.3142
Epoch 5: train_acc=0.0854, val_acc=0.0855, avg_loss=2.3145
Epoch 6: train_acc=0.0864, val_acc=0.0855, avg_loss=2.3142
Epoch 7: train_acc=0.0856, val_acc=0.0855, avg_loss=2.3143
Epoch 8: train_acc=0.0862, val_acc=0.0855, avg_loss=2.3140
Epoch 9: train_acc=0.0864, val_acc=0.0855, avg_loss=2.3145
Epoch 10: train_acc=0.0857, val_acc=0.0855, avg_loss=2.3145
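
A loss that stays near 2.31 with accuracy below 10% across all epochs is roughly what a 10-class classifier produces when its predictions are close to uniform and its weights are not changing. The reference values can be worked out directly. This is a small sanity-check sketch, and the variable names are illustrative.

```python
import math

num_classes = 10

# Cross-entropy of a prediction that assigns probability 1/10 to every class.
uniform_loss = -math.log(1.0 / num_classes)
# Accuracy of guessing among 10 roughly balanced classes.
chance_accuracy = 1.0 / num_classes

print(f"uniform-prediction loss: {uniform_loss:.4f}")   # ln(10), about 2.3026
print(f"chance accuracy: {chance_accuracy:.2f}")
```

Comparing the first-epoch loss against $\ln(10)$ is a quick way to tell whether a model is learning at all before looking at regularization effects.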


In [135]:
reg_model = Model(dropout_p=0.1).to(device)
optimizer = optim.SGD(
    reg_model.parameters(),
    lr=1e-4,
    weight_decay=1e-4
)
criterion = nn.CrossEntropyLoss()
train(
    model=reg_model,
    train_loader=train_loader,
    val_loader=test_loader,
    criterion=criterion,
    optimizer=optimizer,
    num_epochs=10
)


Epoch 1: train_acc=0.1775, val_acc=0.3011, avg_loss=2.2635
Epoch 2: train_acc=0.3375, val_acc=0.4226, avg_loss=2.1658
Epoch 3: train_acc=0.4395, val_acc=0.5014, avg_loss=2.0806
Epoch 4: train_acc=0.5096, val_acc=0.5600, avg_loss=1.9982
Epoch 5: train_acc=0.5612, val_acc=0.6092, avg_loss=1.9148
Epoch 6: train_acc=0.6023, val_acc=0.6433, avg_loss=1.8298
Epoch 7: train_acc=0.6280, val_acc=0.6731, avg_loss=1.7467
Epoch 8: train_acc=0.6533, val_acc=0.6931, avg_loss=1.6641
Epoch 9: train_acc=0.6746, val_acc=0.7119, avg_loss=1.5835
Epoch 10: train_acc=0.6928, val_acc=0.7277, avg_loss=1.5062
