## 2017121261 / Department of Software / 허태영 / HW#1

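#### PyTorch Lightning CIFAR-10

Train a model on the CIFAR-10 dataset using PyTorch Lightning. Any model architecture may be used.

Log the experiment results (loss and accuracy) with Weights & Biases.

Submit the Git repository link and the link to the Weights & Biases project where the experiment results were logged.

Name the Weights & Biases project in the format [StudentID_Name_AssignmentName] so the submitter can be identified.

[Weights & Biases reference link]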

https://wandb.ai/wandb_fc/korean/reports/Weights-Biases-Pytorch-Lightning---VmlldzozNzAxOTg
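The naming rule above can be applied mechanically. The helper below is a hypothetical sketch, not part of the assignment: it joins the three parts with underscores. The assignment label `HW1` is an assumption taken from the "HW#1" heading. Whether W&B accepts Korean characters in project names is not verified here, so a romanized name may be the safer choice.

```python
def make_project_name(student_id: str, name: str, assignment: str) -> str:
    """Hypothetical helper: build a [StudentID_Name_AssignmentName] project name."""
    parts = [student_id.strip(), name.strip(), assignment.strip()]
    if any(not part for part in parts):
        raise ValueError("student_id, name, and assignment must all be non-empty")
    return "_".join(parts)


# "HW1" is an assumed label for this assignment.
project = make_project_name("2017121261", "허태영", "HW1")
print(project)  # 2017121261_허태영_HW1

# The string would then be handed to the logger, e.g.:
# wandb_logger = WandbLogger(project=project, job_type="cifar10_clf")
```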

## Libraries

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import random_split, DataLoader

from torchmetrics import Accuracy

from torchvision import transforms
from torchvision.datasets import CIFAR10

import wandb
import os
```

## WandB

```python
wandb.login()
```

```
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: hty. Use `wandb login --relogin` to force relogin
True
```
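The first output line says code saving needs `WANDB_NOTEBOOK_NAME`. A minimal way to set it, assuming the cell runs in the same kernel before `wandb.login()`. The filename `hw1_cifar10.ipynb` is hypothetical and should be replaced with the notebook's real name.

```python
import os

# Hypothetical notebook filename; set before wandb.login() / wandb.init() run.
os.environ["WANDB_NOTEBOOK_NAME"] = "hw1_cifar10.ipynb"
```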

## Static variables

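```python
# Larger batches when a GPU is available, smaller ones on CPU.
BATCH_SIZE = 256 if torch.cuda.is_available() else 64
# Half of the CPU cores (os.cpu_count() can return None, hence the fallback).
NUM_WORKERS = (os.cpu_count() or 2) // 2
```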
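These constants determine the batch counts that show up later in the training log. Here is a quick check. It assumes the 45,000 / 5,000 train/validation split from the DataModule below, the 10,000-image CIFAR-10 test set, and the GPU batch size of 256.

```python
import math

# Assumed sizes: the DataModule's 45,000 / 5,000 split and the 10,000-image test set.
TRAIN_SIZE, VAL_SIZE, TEST_SIZE = 45_000, 5_000, 10_000
GPU_BATCH_SIZE = 256  # BATCH_SIZE on a CUDA machine
MAX_EPOCHS = 10       # max_epochs passed to the Trainer

# DataLoader keeps the last partial batch by default (drop_last=False), hence ceil.
train_batches = math.ceil(TRAIN_SIZE / GPU_BATCH_SIZE)
val_batches = math.ceil(VAL_SIZE / GPU_BATCH_SIZE)
test_batches = math.ceil(TEST_SIZE / GPU_BATCH_SIZE)

print(train_batches, val_batches, train_batches + val_batches)  # 176 20 196
print(test_batches)                                            # 40
print(train_batches * MAX_EPOCHS)                              # 1760
```

These numbers line up with the logs. The `196/196` per-epoch progress bar is consistent with Lightning counting training and validation batches together. `Testing: 40/40` matches the test batch count, and 1,760 matches `trainer/global_step` in the run summary after 10 epochs.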

## CIFAR-10 DataModule

```python
class CIFAR10DataModule(pl.LightningDataModule):
    # `seed` is an added parameter; 42 is an arbitrary default.
    def __init__(self, batch_size, num_workers, data_dir: str = '../data', seed: int = 42):
        super().__init__()
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.data_dir = data_dir
        self.seed = seed

        # ToTensor() scales pixels to [0, 1]; Normalize with mean = std = 0.5 maps them to [-1, 1].
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])

        self.num_classes = 10

    def prepare_data(self):
        # Download only; no state is assigned here.
        CIFAR10(self.data_dir, train=True, download=True)
        CIFAR10(self.data_dir, train=False, download=True)

    def setup(self, stage=None):
        if stage == 'fit' or stage is None:
            cifar_full = CIFAR10(self.data_dir, train=True, transform=self.transform)
            # Seeded generator: every setup() call yields the same 45,000 / 5,000 split,
            # so validation images cannot drift into the training set between calls.
            self.cifar_train, self.cifar_val = random_split(
                cifar_full, [45000, 5000],
                generator=torch.Generator().manual_seed(self.seed),
            )

        if stage == 'test' or stage is None:
            self.cifar_test = CIFAR10(self.data_dir, train=False, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(
            self.cifar_train,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
            shuffle=True,
        )

    def val_dataloader(self):
        return DataLoader(
            self.cifar_val,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )

    def test_dataloader(self):
        return DataLoader(
            self.cifar_test,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
        )
```
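`Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` applies `(x - mean) / std` per channel. With mean = std = 0.5, that reduces to `2x - 1`, which maps `ToTensor()`'s [0, 1] range onto [-1, 1]. The torch sketch below shows the transform and its inverse. The `denormalize` helper is hypothetical; it could be used, for example, to turn images back into displayable [0, 1] values.

```python
import torch

# Per-channel statistics used by the DataModule's Normalize transform.
MEAN = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
STD = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)


def normalize(x: torch.Tensor) -> torch.Tensor:
    # Same formula as transforms.Normalize: (x - mean) / std, broadcast over H and W.
    return (x - MEAN) / STD


def denormalize(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical inverse: maps [-1, 1] back to [0, 1].
    return x * STD + MEAN


img = torch.rand(3, 32, 32)  # stand-in for one ToTensor() output in [0, 1)
z = normalize(img)
print(bool(z.min() >= -1), bool(z.max() <= 1))  # True True
print(torch.allclose(denormalize(z), img))      # True
```

Because `MEAN` and `STD` have shape `(3, 1, 1)`, the same functions also broadcast over a batch of shape `(N, 3, 32, 32)`.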

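```python
class ImagePredictionLogger(pl.callbacks.Callback):
    """Logs predictions on a fixed batch of validation images to W&B after each validation epoch."""

    def __init__(self, val_samples, num_samples=32):
        super().__init__()
        self.num_samples = num_samples
        self.val_imgs, self.val_labels = val_samples

    def on_validation_epoch_end(self, trainer, pl_module):
        # Move the cached samples to the model's device.
        val_imgs = self.val_imgs.to(device=pl_module.device)
        val_labels = self.val_labels.to(device=pl_module.device)
        # The model returns log-probabilities; argmax gives the predicted class index.
        logits = pl_module(val_imgs)
        preds = torch.argmax(logits, -1)
        # .tolist() turns tensors into plain ints so captions read "3", not "tensor(3, device=...)".
        images = val_imgs[:self.num_samples]
        pred_ids = preds[:self.num_samples].tolist()
        label_ids = val_labels[:self.num_samples].tolist()
        trainer.logger.experiment.log({
            "examples": [wandb.Image(x, caption=f"Prediction:{pred}, Label:{y}")
                         for x, pred, y in zip(images, pred_ids, label_ids)]
        })
```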
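The callback captions each image with a raw class index. The sketch below shows the same caption logic using class names instead. The name list follows the order of torchvision's `CIFAR10(...).classes`. `make_captions` is a hypothetical helper and is not part of the callback as written.

```python
import torch

# CIFAR-10 class names in torchvision's label order (index 0..9).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def make_captions(log_probs: torch.Tensor, labels: torch.Tensor, num_samples: int = 32):
    """Hypothetical helper: captions with class names for the first num_samples items."""
    preds = torch.argmax(log_probs, dim=-1)
    return [
        f"Prediction:{CIFAR10_CLASSES[p]}, Label:{CIFAR10_CLASSES[y]}"
        for p, y in zip(preds[:num_samples].tolist(), labels[:num_samples].tolist())
    ]


# Two fake samples: the first is predicted "cat" (index 3), the second "ship" (index 8).
fake_log_probs = torch.full((2, 10), -10.0)
fake_log_probs[0, 3] = 0.0
fake_log_probs[1, 8] = 0.0
print(make_captions(fake_log_probs, torch.tensor([3, 0])))
# ['Prediction:cat, Label:cat', 'Prediction:ship, Label:airplane']
```

Inside `on_validation_epoch_end`, the list returned by `make_captions(logits, val_labels, self.num_samples)` could replace the inline f-string captions.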

```python
class LitModel(pl.LightningModule):
    def __init__(self, input_shape, num_classes, learning_rate=2e-4):
        super().__init__()

        # log hyperparameters
        self.save_hyperparameters()
        self.learning_rate = learning_rate

        # Four 3x3 convolutions (stride 1, no padding) and two 2x2 max-pooling layers.
        self.conv1 = nn.Conv2d(3, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 32, 3, 1)
        self.conv3 = nn.Conv2d(32, 64, 3, 1)
        self.conv4 = nn.Conv2d(64, 64, 3, 1)

        self.pool1 = nn.MaxPool2d(2)
        self.pool2 = nn.MaxPool2d(2)

        n_sizes = self._get_conv_output(input_shape)

        self.fc1 = nn.Linear(n_sizes, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, num_classes)

        # torchmetrics >= 0.11 requires `task` and `num_classes`; older releases accepted Accuracy().
        self.accuracy = Accuracy(task="multiclass", num_classes=num_classes)

    def _get_conv_output(self, shape):
        """Return the flattened size of the conv block's output, i.e. the input size of fc1."""
        batch_size = 1
        dummy = torch.rand(batch_size, *shape)  # no Variable wrapper needed in modern PyTorch
        with torch.no_grad():
            output_feat = self._forward_features(dummy)
        return output_feat.view(batch_size, -1).size(1)

    def _forward_features(self, x):
        """Return the feature tensor from the conv block."""
        x = F.relu(self.conv1(x))
        x = self.pool1(F.relu(self.conv2(x)))
        x = F.relu(self.conv3(x))
        x = self.pool2(F.relu(self.conv4(x)))
        return x

    def forward(self, x):
        """Return per-class log-probabilities (used for training and inference)."""
        x = self._forward_features(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.log_softmax(self.fc3(x), dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)  # log-probabilities, hence nll_loss rather than cross_entropy
        loss = F.nll_loss(logits, y)

        # training metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('train_loss', loss, on_step=True, on_epoch=True, logger=True)
        self.log('train_acc', acc, on_step=True, on_epoch=True, logger=True)

        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)

        # validation metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('val_loss', loss, prog_bar=True)
        self.log('val_acc', acc, prog_bar=True)
        return loss

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)

        # test metrics
        preds = torch.argmax(logits, dim=1)
        acc = self.accuracy(preds, y)
        self.log('test_loss', loss, prog_bar=True)
        self.log('test_acc', acc, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)
```
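`_get_conv_output` finds the input size of `fc1` by passing a dummy tensor through the conv block. The same number can be derived by hand with the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below assumes the layer settings in `LitModel`: 3x3 convolutions with stride 1 and no padding, and 2x2 max-pooling with stride 2. It also totals the parameter counts.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dim for Conv2d/MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_in * c_out * k * k + c_out  # weights + one bias per output channel


def linear_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights + biases


s = 32
for kernel, stride in [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]:
    s = conv_out(s, kernel, stride)  # conv1, conv2, pool1, conv3, conv4, pool2
# spatial sizes along the way: 30, 28, 14, 12, 10, 5

n_sizes = 64 * s * s  # value _get_conv_output returns for input_shape (3, 32, 32)

total = (
    conv_params(3, 32, 3) + conv_params(32, 32, 3)
    + conv_params(32, 64, 3) + conv_params(64, 64, 3)
    + linear_params(n_sizes, 512) + linear_params(512, 128) + linear_params(128, 10)
)
print(s, n_sizes, linear_params(n_sizes, 512))  # 5 1600 819712
print(total, round(total * 4 / 1e6, 3))          # 952234 3.809
```

The results agree with the `fc1` row (819 K) and the 952 K total in the model summary. The 3.809 MB estimate follows from 4 bytes per float32 parameter.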


```python
dm = CIFAR10DataModule(
    batch_size=BATCH_SIZE,
    num_workers=0,  # load in the main process; Lightning warns this may be a bottleneck
    data_dir='../data',
)

dm.prepare_data()
dm.setup()
```

```
Files already downloaded and verified
Files already downloaded and verified
```
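`random_split` draws a new permutation each time it is called unless it is given a seeded generator. Here `dm.setup()` is called by hand, and recent Lightning releases call `setup("fit")` again inside `trainer.fit`. With an unseeded split, the two calls can disagree, so some images cached for `ImagePredictionLogger` could end up in the training set. The standalone sketch below uses `range(50_000)` as a stand-in for the training set and an arbitrary seed of 42.

```python
import torch
from torch.utils.data import random_split

data = range(50_000)  # stand-in for the 50,000-image CIFAR-10 training set

# Unseeded: each call draws a fresh permutation from the global RNG.
a_train, a_val = random_split(data, [45_000, 5_000])
b_train, b_val = random_split(data, [45_000, 5_000])
overlap = len(set(a_val.indices) & set(b_train.indices))
print(overlap > 0)  # True: most of the first validation split lands in the second training split

# Seeded: a fresh generator with the same seed reproduces the same split.
c_train, c_val = random_split(data, [45_000, 5_000], generator=torch.Generator().manual_seed(42))
d_train, d_val = random_split(data, [45_000, 5_000], generator=torch.Generator().manual_seed(42))
print(c_val.indices == d_val.indices)  # True
```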


```python
# Samples required by the custom ImagePredictionLogger callback to log image predictions.
val_samples = next(iter(dm.val_dataloader()))
val_imgs, val_labels = val_samples[0], val_samples[1]
val_imgs.shape, val_labels.shape
```

```
(torch.Size([256, 3, 32, 32]), torch.Size([256]))
```

```python
model = LitModel((3, 32, 32), dm.num_classes)

# Initialize wandb logger
wandb_logger = WandbLogger(project='homework_1_v2', job_type='cifar10_clf')

# Callbacks
early_stop_callback = pl.callbacks.EarlyStopping(monitor="val_loss")
checkpoint_callback = pl.callbacks.ModelCheckpoint()

# Trainer
# `gpus=1` was removed in Lightning 2.0; accelerator/devices is the current equivalent.
trainer = pl.Trainer(
    max_epochs=10,
    logger=wandb_logger,
    accelerator="gpu" if torch.cuda.is_available() else "cpu",
    devices=1,
    callbacks=[
        early_stop_callback,
        ImagePredictionLogger(val_samples),
        checkpoint_callback,
    ],
)

# Train
trainer.fit(model, dm)

# Evaluate on the test set. Older releases accepted `test_dataloaders=`;
# newer ones expect `dataloaders=` or `datamodule=`.
trainer.test(model, datamodule=dm)

# Close wandb
wandb.finish()
```

```
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]

  | Name     | Type      | Params
---------------------------------------
0 | conv1    | Conv2d    | 896
1 | conv2    | Conv2d    | 9.2 K
2 | conv3    | Conv2d    | 18.5 K
3 | conv4    | Conv2d    | 36.9 K
4 | pool1    | MaxPool2d | 0
5 | pool2    | MaxPool2d | 0
6 | fc1      | Linear    | 819 K
7 | fc2      | Linear    | 65.7 K
8 | fc3      | Linear    | 1.3 K
9 | accuracy | Accuracy  | 0
---------------------------------------
952 K     Trainable params
0         Non-trainable params
952 K     Total params
3.809     Total estimated model params size (MB)

The dataloader, {name}, does not have many workers which may be a bottleneck.

Epoch 9: 100%|██████████| 196/196 [00:08<00:00, 23.14it/s, loss=1.13, v_num=u9dh, val_loss=1.180, val_acc=0.579]

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Testing: 100%|██████████| 40/40 [00:01<00:00, 28.64it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.593999981880188, 'test_loss': 1.1393182277679443}
--------------------------------------------------------------------------------
```

Run history:

| Metric | History |
|---|---|
| epoch | ▁▁▁▁▂▂▂▂▃▃▃▃▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇█████ |
| test_acc | ▁ |
| test_loss | ▁ |
| train_acc_epoch | ▁▄▅▅▆▆▇▇██ |
| train_acc_step | ▁▁▁▂▃▅▄▄▄▄▄▆▅▆▅▆▆▅▆▆▇▆▆▆▇▇▇▇▆▇█▇▇█▇ |
| train_loss_epoch | █▅▄▄▃▃▂▂▁▁ |
| train_loss_step | ██▇▆▅▄▅▄▄▅▄▄▃▄▃▃▃▄▃▃▂▃▃▂▂▂▂▂▂▃▂▂▁▁▂ |
| trainer/global_step | ▁▁▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇████ |
| val_acc | ▁▃▄▅▆▆▇▇██ |
| val_loss | █▅▄▄▃▃▃▂▁▁ |

Run summary:

| Metric | Value |
|---|---|
| epoch | 9.0 |
| test_acc | 0.594 |
| test_loss | 1.13932 |
| train_acc_epoch | 0.59629 |
| train_acc_step | 0.59375 |
| train_loss_epoch | 1.13737 |
| train_loss_step | 1.19624 |
| trainer/global_step | 1760.0 |
| val_acc | 0.5792 |
| val_loss | 1.18458 |
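`EarlyStopping(monitor="val_loss")` relies on its defaults: minimize the monitored value, `patience=3`, `min_delta=0.0`. The function below is a simplified, hypothetical model of that patience rule, not Lightning's implementation. The loss sequences are made up for illustration and are not values from this run. Because the run reached epoch 9, which is all of `max_epochs=10`, the rule evidently never triggered here. That is consistent with the steadily falling `val_loss` history.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch at which a min-mode patience rule would stop, or None.

    Simplified model: stop once the monitored value has failed to beat the best
    value seen so far by more than min_delta for `patience` consecutive checks.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up sequences, NOT the values logged in this run.
print(early_stop_epoch([1.50, 1.40, 1.30, 1.25, 1.20]))        # None (always improving)
print(early_stop_epoch([1.50, 1.40, 1.45, 1.42, 1.41, 1.39]))  # 4
```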
