# Wandb
A tool that makes managing machine learning experiments easier.

### Model Experiment Pipeline
1. Design
2. Experimentation and development
3. Deployment and operation

#### Required components (Configuration)
- Dataset
- Metric
- Model
- Hyper-parameter

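#### Model experiments
When training a machine learning or deep learning model, the configuration values have to be chosen appropriately. Doing this manually has several drawbacks:
- Changing hyperparameters and re-checking performance after each change is inefficient
- Records can easily be left out
- Organizing the results is cumbersome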

### WandB (Weights & Biases)
A machine learning experiment tracking tool that helps you build better models faster.

#### Key features
![img](https://1039519455-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lqya5RvLedGEWPhtkjU%2F-MjWyygstFtfMrBD1N_X%2F-MjWzA_zt-9vgPF0IXgn%2FW%26B%20Diagram%20-%2020210913.png?alt=media&token=4ef27c1e-dc8b-43cf-9435-99ba770bc42d)

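1. Experiments: provides a dashboard for tracking model experiments
2. Artifacts: versions datasets and models
3. Tables: logs data so it can be visualized and queried
4. Sweeps: tunes hyperparameters automatically to optimize them
5. Reports: organizes experiments into documents for sharing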

In [1]:
import wandb

In [2]:
# login
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: C:\Users\spec3/.netrc


True
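
The first line of the login output says that W&B could not detect the notebook's name and that setting `WANDB_NOTEBOOK_NAME` enables code saving. A minimal sketch of doing so, assuming the cell runs before `wandb.login()` or `wandb.init()`; the file name is a placeholder, not the actual name of this notebook:

```python
import os

# Placeholder file name (an assumption); replace it with the real notebook name.
os.environ["WANDB_NOTEBOOK_NAME"] = "wandb_tutorial.ipynb"
```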

In [2]:
# config setting
config = {
    'epochs': 5,
    'classes': 10,
    'batch_size': 128,
    'kernels': [16, 32],
    'weight_decay': 0.0005,
    'learning_rate': 1e-3,
    'dataset': 'MNIST',
    'architecture': 'CNN',
    'seed': 42,
}
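
The config includes a `seed` entry, but none of the cells shown here apply it. One possible way to use it is sketched below; `set_seed` is a hypothetical helper, not part of the original notebook or of W&B, and it only seeds Python's `random` module and PyTorch.

```python
import random

import torch


def set_seed(seed):
    """Seed Python's and PyTorch's random number generators (hypothetical helper)."""
    random.seed(seed)
    torch.manual_seed(seed)


# 42 is the value stored under 'seed' in the config above.
set_seed(42)
```

In the notebook this would be called as `set_seed(config['seed'])` before the data loaders and the model are created.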

In [9]:
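import torch
import torchvision
import torch.nn as nn
from torchvision import datasets, transforms  # transforms is used in make_loader
from torch.utils.data import DataLoader
import torch.optim as optim
from tqdm import tqdm  # progress bar used in train

# device referenced by train and run
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")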

In [7]:
# dataset
def make_loader(batch_size, train=True):
    full_dataset = datasets.MNIST(root='./data/MNIST', train=train,
                                    download=True,  transform=transforms.ToTensor())

    loader = DataLoader(dataset=full_dataset,
                        batch_size=batch_size,
                        shuffle=train,  # shuffle only the training split
                        pin_memory=True, num_workers=2)
    return loader

In [8]:
# model
class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
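
The `7 * 7 * kernels[-1]` input size of the final linear layer follows from the 28x28 MNIST input: each 5x5 convolution with padding 2 keeps the spatial size, and each 2x2 max pool with stride 2 halves it. The arithmetic can be checked with the standard output-size formula; `conv_out` is a helper introduced here for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # MNIST images are 28x28
for _ in range(2):  # layer1 and layer2 use the same kernel, stride and padding
    size = conv_out(size, kernel=5, stride=1, padding=2)  # Conv2d keeps the size
    size = conv_out(size, kernel=2, stride=2)  # MaxPool2d halves it

kernels = [16, 32]  # same values as in the config
fc_in_features = size * size * kernels[-1]
print(size, fc_in_features)  # 7 1568
```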

In [10]:
# train
def train(model, loader, criterion, optimizer, config):
    wandb.watch(model, criterion, log="all", log_freq=10)

    example_ct = 0
    for epoch in tqdm(range(config.epochs)):
        cumu_loss = 0
        for images, labels in loader:

            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)
            cumu_loss += loss.item()

            optimizer.zero_grad()
            loss.backward()

            optimizer.step()

            example_ct +=  len(images)

        avg_loss = cumu_loss / len(loader)
        wandb.log({"loss": avg_loss}, step=epoch)
        print(f"TRAIN: EPOCH {epoch + 1:04d} / {config.epochs:04d} | Epoch LOSS {avg_loss:.4f}")

In [11]:
# run
def run(config=None):
    wandb.init(project='test-pytorch', entity='pebpung', config=config)

    config = wandb.config

    train_loader = make_loader(batch_size=config.batch_size, train=True)
    test_loader = make_loader(batch_size=config.batch_size, train=False)

    model = ConvNet(config.kernels, config.classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

    train(model, train_loader, criterion, optimizer, config)
    test(model, test_loader)
    return model
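
Because `run` takes `config=None` and reads its settings back from `wandb.config`, it can also serve as the target function of a sweep, the Sweeps feature listed earlier. A sketch, assuming a random search over learning rate and batch size; the search space and the `launch_sweep` helper are illustrative, while `wandb.sweep` and `wandb.agent` are the W&B calls for registering and running a sweep:

```python
# Illustrative search space; parameter names mirror the keys of the config dict.
sweep_config = {
    "method": "random",  # "grid" and "bayes" are the other documented methods
    "metric": {"name": "loss", "goal": "minimize"},  # "loss" is what train() logs
    "parameters": {
        "learning_rate": {"values": [1e-2, 1e-3, 1e-4]},
        "batch_size": {"values": [64, 128, 256]},
        # Held fixed so that run() still finds every key it reads from wandb.config.
        "epochs": {"value": 5},
        "classes": {"value": 10},
        "kernels": {"value": [16, 32]},
    },
}


def launch_sweep(train_fn, count=5):
    """Register the sweep and run `count` trials of train_fn in this process."""
    import wandb  # imported lazily so the config above can be inspected offline

    sweep_id = wandb.sweep(sweep_config, project="test-pytorch")
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Calling `launch_sweep(run, count=5)` would then start five trials. It needs a logged-in W&B account, so it is not executed here.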