AI Expert - Learning with Noisy Labels Basic Practice
====
Teaching assistants: 김재윤, 김우재, 조윤기

## Instruction
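> Hello. In this practice session, you will implement the basic building blocks of learning with noisy labels yourself, with the goal of understanding how to cope with noisy data, one of the most common problems in machine learning. Along the way you will pick up simple techniques for handling and mitigating noisy labels.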
## Preparation
> First, open the **File** tab at the top left of the window and click **Save a copy in Drive** to create your own copy of this Colab file, then work through the exercise in that copy.
## Reference materials
> Below is the documentation for PyTorch and NumPy, the libraries mainly used in this exercise.
* PyTorch \[[Documentation](https://pytorch.org/docs/stable/index.html)\]
* NumPy  \[[Documentation](https://numpy.org/doc/stable/)\]





## Step 1: Set up the environment
Set up the basic environment needed for the exercise.

### Step 1-1: Import the necessary libraries
Import the libraries used to train a model with noisy labels.

In [1]:
import csv
import os
import os.path
import pickle
from typing import Any, Callable, Optional, Tuple
import random

import numpy as np
from PIL import Image
from tqdm.notebook import tqdm

import torch
from torch.backends import cudnn
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torchvision.datasets.utils import check_integrity, download_and_extract_archive
from torchvision.datasets.vision import VisionDataset


random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.cuda.manual_seed_all(1)
cudnn.deterministic = True
cudnn.benchmark = False

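# Use the GPU when one is available; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')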

### Step 1-2: Construct the dataset with noisy labels
Construct the label-noise dataset used in this exercise. We use CIFAR-10, a standard image dataset. To keep the exercise efficient within the limited time, only a subset of the dataset is used.
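
The subset is taken by keeping only the first few images of each class. A minimal sketch of that per-class truncation on a toy label list is shown here; `truncate_per_class` is a hypothetical helper used only for illustration and is not part of the notebook.

```python
from collections import defaultdict


def truncate_per_class(samples, labels, max_per_class):
    """Keep the first `max_per_class` samples of each label, preserving order.

    Hypothetical helper for illustration only.
    """
    counts = defaultdict(int)
    kept_samples, kept_labels = [], []
    for sample, label in zip(samples, labels):
        if counts[label] >= max_per_class:
            continue  # this class already has enough samples
        counts[label] += 1
        kept_samples.append(sample)
        kept_labels.append(label)
    return kept_samples, kept_labels


samples = ["a", "b", "c", "d", "e", "f"]
labels = [0, 0, 1, 0, 1, 1]
kept_samples, kept_labels = truncate_per_class(samples, labels, max_per_class=2)
print(kept_samples, kept_labels)
```

Because the loop keeps the earliest samples, the result depends on the order of the underlying data files rather than on a random draw.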

In [2]:
class CIFAR10(VisionDataset):
    """Modified from `CIFAR10 <https://www.cs.toronto.edu/~kriz/cifar.html>`_ Dataset.

    Args:
        root (string): Root directory of dataset where directory
            ``cifar-10-batches-py`` exists or will be saved to if download is set to True.
        train (bool, optional): If True, creates dataset from training set, otherwise
            creates from test set.
        transform (callable, optional): A function/transform that takes in an PIL image
            and returns a transformed version. E.g, ``transforms.RandomCrop``
        target_transform (callable, optional): A function/transform that takes in the
            target and transforms it.
        download (bool, optional): If true, downloads the dataset from the internet and
            puts it in root directory. If dataset is already downloaded, it is not
            downloaded again.

    """

    base_folder = "cifar-10-batches-py"
    url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    filename = "cifar-10-python.tar.gz"
    tgz_md5 = "c58f30108f718f92721af3b95e74349a"
    train_list = [
        ["data_batch_1", "c99cafc152244af753f735de768cd75f"],
        ["data_batch_2", "d4bba439e000b95fd0a9bffe97cbabec"],
        ["data_batch_3", "54ebc095f3ab1f0389bbae665268c751"],
        ["data_batch_4", "634d18415352ddfa80567beed471001a"],
        ["data_batch_5", "482c414d41f54cd18b22e5b47cb7c3cb"],
    ]

    test_list = [
        ["test_batch", "40351d587109b95175f43aff81a1287e"],
    ]
    meta = {
        "filename": "batches.meta",
        "key": "label_names",
        "md5": "5ff9c542aee3614f3951f8cda6e48888",
    }

    def __init__(
        self,
        root: str,
        train: bool = True,
        transform: Optional[Callable] = None,
        target_transform: Optional[Callable] = None,
        download: bool = False,
        num_images_per_class: int = 1000
    ) -> None:

        super().__init__(root, transform=transform, target_transform=target_transform)

        self.train = train  # training set or test set

        if download:
            self.download()

        if not self._check_integrity():
            raise RuntimeError("Dataset not found or corrupted. You can use download=True to download it")

        if self.train:
            downloaded_list = self.train_list
        else:
            downloaded_list = self.test_list

        self.data: Any = []
        self.targets = []

        # now load the pickled numpy arrays
        for file_name, checksum in downloaded_list:
            file_path = os.path.join(self.root, self.base_folder, file_name)
            with open(file_path, "rb") as f:
                entry = pickle.load(f, encoding="latin1")
                self.data.append(entry["data"])
                if "labels" in entry:
                    self.targets.extend(entry["labels"])
                else:
                    self.targets.extend(entry["fine_labels"])

        self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
        self.data = self.data.transpose((0, 2, 3, 1))  # convert to HWC

        self._load_meta()

        # Keep at most `num_images_per_class` images per class
        # (a non-positive value keeps the full split)
        if num_images_per_class > 0:
            self.trun_data = []
            self.trun_targets = []
            count_dict = {}
            for data, target in zip(self.data, self.targets):
                if target not in count_dict:
                    count_dict[target] = 0
                if count_dict[target] >= num_images_per_class:
                    continue
                count_dict[target] += 1
                self.trun_data.append(data)
                self.trun_targets.append(target)
            self.data, self.targets = self.trun_data, self.trun_targets


    def _load_meta(self) -> None:
        path = os.path.join(self.root, self.base_folder, self.meta["filename"])
        if not check_integrity(path, self.meta["md5"]):
            raise RuntimeError("Dataset metadata file not found or corrupted. You can use download=True to download it")
        with open(path, "rb") as infile:
            data = pickle.load(infile, encoding="latin1")
            self.classes = data[self.meta["key"]]
        self.class_to_idx = {_class: i for i, _class in enumerate(self.classes)}

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        """
        Args:
            index (int): Index

        Returns:
            tuple: (image, target) where target is index of the target class.
        """
        img, target = self.data[index], self.targets[index]

        # doing this so that it is consistent with all other datasets
        # to return a PIL Image
        img = Image.fromarray(img)

        if self.transform is not None:
            img = self.transform(img)

        if self.target_transform is not None:
            target = self.target_transform(target)

        return img, target

    def __len__(self) -> int:
        return len(self.data)

    def _check_integrity(self) -> bool:
        for filename, md5 in self.train_list + self.test_list:
            fpath = os.path.join(self.root, self.base_folder, filename)
            if not check_integrity(fpath, md5):
                return False
        return True

    def download(self) -> None:
        if self._check_integrity():
            print("Files already downloaded and verified")
            return
        download_and_extract_archive(self.url, self.root, filename=self.filename, md5=self.tgz_md5)

    def extra_repr(self) -> str:
        split = "Train" if self.train is True else "Test"
        return f"Split: {split}"

Below, we apply image preprocessing to the CIFAR-10 dataset defined above and load the dataset before any label noise is injected.

In [3]:
batch_size = 128
eval_batch_size = 100

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                          (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

train_dataset = CIFAR10(root='~/data', train=True, download=True, transform=transform_train, num_images_per_class=2000)
testset = CIFAR10(root='~/data', train=False, download=True, transform=transform_test, num_images_per_class=-1)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /root/data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:03<00:00, 46541686.64it/s]


Extracting /root/data/cifar-10-python.tar.gz to /root/data
Files already downloaded and verified
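
To make the preprocessing concrete: `ToTensor` scales 8-bit pixel values to the range [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The sketch below reproduces that arithmetic for a single RGB pixel in plain Python. `normalize_pixel` is a hypothetical helper that only mirrors the formula; it is not a torchvision function.

```python
MEAN = (0.4914, 0.4822, 0.4465)  # per-channel means used in the transforms above
STD = (0.2023, 0.1994, 0.2010)   # per-channel standard deviations used above


def normalize_pixel(rgb_uint8, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, mean, std)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

After normalization the inputs are roughly centered on zero, with white and black pixels landing at about +2.5 and -2.4 in the red channel.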


**[ *Problem-1* ]** Implement noisy labels. Assign a **random label** to the fraction of the data given by the noise ratio \(noise_ratio\).

In [4]:
def inject_label_noise(dataset, noise_ratio=0.5):
  """
    Inject label noise into a given dataset (the dataset is modified in place).

    Args:
        dataset (torch.utils.data.Dataset): dataset to add noise to.
        noise_ratio (float): fraction of samples whose label is re-drawn. Default, 0.5
  """
  noisy_labels = dataset.targets.copy()

  """
    Q. Write your code to inject label noise.
    Randomly assign labels to a subset of the data.
    The Python random library and numpy.random are helpful here.
  """
  N = len(noisy_labels)
  num_noise = int(N * noise_ratio)
  num_class = 10

  # Choose which samples to corrupt, without replacement
  indices_noise = random.sample(range(N), num_noise)
  # Draw a uniformly random class for each; it may coincide with the original label
  label_noise = np.random.choice(num_class, num_noise)

  for idx, new_label in zip(indices_noise, label_noise):
    noisy_labels[idx] = int(new_label)  # store a plain Python int

  dataset.targets = noisy_labels
  return dataset

In [10]:
np.random.choice(10, 20)

array([5, 8, 9, 5, 0, 0, 1, 7, 6, 9, 2, 4, 5, 2, 4, 2, 4, 7, 7, 9])
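
Because the random label is drawn uniformly over all classes, it sometimes equals the original label. With 10 classes and a noise ratio of 0.5, the expected fraction of labels that actually change is therefore about 0.5 × (1 − 1/10) = 0.45. The sketch below checks this on synthetic labels using only the standard library; `effective_noise_rate` is a hypothetical helper, and the toy labels stand in for the real dataset.

```python
import random


def effective_noise_rate(clean_labels, noisy_labels):
    """Fraction of positions whose noisy label differs from the clean label."""
    assert len(clean_labels) == len(noisy_labels)
    changed = sum(c != n for c, n in zip(clean_labels, noisy_labels))
    return changed / len(clean_labels)


rng = random.Random(0)
num_classes, n, noise_ratio = 10, 20000, 0.5

# Synthetic "clean" labels standing in for the dataset targets
clean = [rng.randrange(num_classes) for _ in range(n)]
noisy = list(clean)
for idx in rng.sample(range(n), int(n * noise_ratio)):
    noisy[idx] = rng.randrange(num_classes)  # may equal the clean label

rate = effective_noise_rate(clean, noisy)
expected = noise_ratio * (1 - 1 / num_classes)
print(f"measured={rate:.3f}, expected={expected:.3f}")
```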

Below, we use the inject_label_noise function implemented above to add noisy labels to the dataset and build the data loaders from it.

In [5]:
noisy_train_dataset = inject_label_noise(train_dataset, noise_ratio=0.5)

train_dataloader = torch.utils.data.DataLoader(noisy_train_dataset, batch_size=batch_size, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(testset, batch_size=eval_batch_size, shuffle=False)

In [None]:
noisy_train_dataset.targets

## Step 2: Train an image classification model with noisy labels
Implement several loss functions for learning with noisy labels and train models with them.

### Step 2-1: Create an image classification model
In this exercise, we build a simple image classification model using ResNet18.

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(
            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion *
                               planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.classifier = nn.Linear(512*block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out


def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])


def ResNet34():
    return ResNet(BasicBlock, [3, 4, 6, 3])


def ResNet50():
    return ResNet(Bottleneck, [3, 4, 6, 3])


def ResNet101():
    return ResNet(Bottleneck, [3, 4, 23, 3])


def ResNet152():
    return ResNet(Bottleneck, [3, 8, 36, 3])
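
As a sanity check on the architecture, the spatial size of the feature map can be traced by hand with the usual convolution output-size formula, `floor((n + 2p - k) / s) + 1`. The sketch below applies it to a 32×32 input. It only follows the strided 3×3 convolution at the start of each stage, since the remaining 3×3 convolutions use stride 1 and padding 1 and keep the size unchanged; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = conv_out(32, kernel=3, stride=1, padding=1)  # stem convolution
sizes = [size]
for stride in (1, 2, 2, 2):  # strides of layer1 .. layer4
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    sizes.append(size)

pooled = conv_out(size, kernel=4, stride=4, padding=0)  # F.avg_pool2d(out, 4)
print(sizes, pooled)
```

The final 4×4 map is pooled down to 1×1, so with `BasicBlock` (expansion 1) the flattened feature has 512 values, matching the input size of the `classifier` layer.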

Below is a class that trains the model with the given loss function \(criterion\).

In [13]:
class Trainer(object):
    def __init__(self, model, device, criterion):
        super(Trainer, self).__init__()
        self.model = model
        self.device = device
        self.criterion = criterion

    def train(self, train_dataloader, optimizer, epoch):
        self.model.train()

        for i in range(epoch):
            for j, data in tqdm(enumerate(train_dataloader), total=len(train_dataloader), desc='Epoch [{} / {}]'.format(i+1,epoch)):
                inputs, targets = data
                inputs, targets = inputs.to(self.device), targets.to(self.device)
                outputs = self.model(inputs)
                loss = self.criterion(outputs, targets)  # compute the loss with the given criterion

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

Below is a class that evaluates the trained model.

In [14]:
class Evaluator(object):
    def __init__(self, model, device):
        super(Evaluator, self).__init__()
        self.model = model
        self.device = device

    def test(self, test_dataloader):
        self.model.eval()
        correct = 0
        total = 0
        # Gradients are not needed for evaluation, so disable autograd tracking
        with torch.no_grad():
            for j, data in tqdm(enumerate(test_dataloader), total=len(test_dataloader), desc='Evaluation'):
                inputs, targets = data
                inputs, targets = inputs.to(self.device), targets.to(self.device)

                outputs = self.model(inputs)
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()

        accuracy = correct/total
        print('\nAccuracy: {:.2%} \n'.format(accuracy))

### Step 2-2: Train a baseline model with cross-entropy loss
Create an image classification model and train the baseline with the cross-entropy loss.


In [15]:
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.cuda.manual_seed_all(1)

baseline_model = ResNet18().to(device)
baseline_loss = nn.CrossEntropyLoss()
baseline_optimizer = optim.SGD(baseline_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
baseline_trainer = Trainer(model=baseline_model, device=device, criterion=baseline_loss)
baseline_trainer.train(train_dataloader, baseline_optimizer, epoch=20)


In [16]:
baseline_evaluator = Evaluator(model=baseline_model, device=device)
baseline_evaluator.test(test_dataloader)



Accuracy: 76.51% 



### Step 2-3: Train a model with bootstrapping loss
Create an image classification model and train it with the bootstrapping loss.


Bootstrapping is one way to correct the loss when labels are noisy: it uses a mixture of the GT label and the model's own prediction as the training target.

As shown in the lecture material, bootstrapping can be written as:

$\ell_B = -(\beta y_i + (1-\beta) z_i)^T \log(h_i)$.

Here $y_i$ is the GT label, which may be noisy, and $z_i$ is the model's prediction. $h_i$ is the model's softmax output, and $\beta$ is a hyperparameter that balances the weights of the GT label and the model's prediction.

Soft bootstrapping uses the model's prediction as is, whereas hard bootstrapping picks the single class with the highest prediction score and uses its one-hot encoded vector. In this exercise we implement both versions to see how bootstrapping performs when labels are noisy.
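
To make the difference between the two variants concrete, the sketch below builds the mixed target for a single sample in plain Python and evaluates the loss against it. `bootstrap_target` and `bootstrap_loss` are hypothetical helpers, the numbers are made up for illustration, and the soft variant assumes the prediction $z_i$ is the softmax output itself.

```python
import math


def bootstrap_target(y_onehot, probs, beta, hard=False):
    """Mix the given label with the model's prediction: beta * y + (1 - beta) * z."""
    if hard:
        top = max(range(len(probs)), key=lambda i: probs[i])
        z = [1.0 if i == top else 0.0 for i in range(len(probs))]  # one-hot argmax
    else:
        z = list(probs)  # soft: use the predicted probabilities as they are
    return [beta * y + (1.0 - beta) * zi for y, zi in zip(y_onehot, z)]


def bootstrap_loss(y_onehot, probs, beta, hard=False):
    """Cross-entropy between the mixed target and the predicted probabilities."""
    target = bootstrap_target(y_onehot, probs, beta, hard)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


y = [0.0, 1.0, 0.0]  # the (possibly noisy) label says class 1
h = [0.7, 0.2, 0.1]  # the model's softmax output prefers class 0

soft = bootstrap_target(y, h, beta=0.8)             # approximately [0.14, 0.84, 0.02]
hard = bootstrap_target(y, h, beta=0.8, hard=True)  # approximately [0.2, 0.8, 0.0]
print(soft, bootstrap_loss(y, h, beta=0.8))
print(hard, bootstrap_loss(y, h, beta=0.8, hard=True))
```

In both cases part of the target mass moves from the given label to what the model already believes, which reduces the penalty for disagreeing with a label that may be wrong.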

**[ *Problem-2-1* ]** Implement SoftBootstrapping loss

In [17]:
import torch
from torch.nn import Module
import torch.nn.functional as F


class SoftBootstrappingLoss(Module):
    """
    ``Loss(t, p) = - (beta * t + (1 - beta) * p) * log(p)``

    Args:
        beta (float): bootstrap parameter. Default, 0.95
        reduce (bool): computes mean of the loss. Default, True.
        as_pseudo_label (bool): Stop gradient propagation for the term ``(1 - beta) * p``.
            Can be interpreted as pseudo-label.
    """
    def __init__(self, beta=0.95, reduce=True, as_pseudo_label=True):
        super(SoftBootstrappingLoss, self).__init__()
        self.beta = beta
        self.reduce = reduce
        self.as_pseudo_label = as_pseudo_label

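    def forward(self, y_pred, y):
        # cross_entropy = - t * log(p)
        beta_xentropy = self.beta * F.cross_entropy(y_pred, y, reduction='none')

        ''' Implement here '''
        # Treat the prediction as a fixed pseudo-label (no gradient) when as_pseudo_label is set
        y_pred_z = y_pred.detach() if self.as_pseudo_label else y_pred
        z = F.softmax(y_pred_z, dim=1)

        bootstrap = - (1 - self.beta) * torch.sum(z * F.log_softmax(y_pred, dim=1), dim=1)

        # Average over the batch unless per-sample losses are requested
        loss = beta_xentropy + bootstrap
        return torch.mean(loss) if self.reduce else loss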


In [18]:
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.cuda.manual_seed_all(1)

soft_boostrapping_model = ResNet18().to(device)
soft_boostrapping_loss = SoftBootstrappingLoss()
soft_boostrapping_optimizer = optim.SGD(soft_boostrapping_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
soft_boostrapping_trainer = Trainer(model=soft_boostrapping_model, device=device, criterion=soft_boostrapping_loss)
soft_boostrapping_trainer.train(train_dataloader, soft_boostrapping_optimizer, epoch=20)


In [19]:
soft_boostrapping_evaluator = Evaluator(model=soft_boostrapping_model, device=device)
soft_boostrapping_evaluator.test(test_dataloader)



Accuracy: 74.54% 



**[ *Problem-2-2* ]** Implement HardBootstrapping loss

In [21]:
class HardBootstrappingLoss(nn.Module):
    """
    ``Loss(t, p) = - (beta * t + (1 - beta) * z) * log(p)``
    where ``z = argmax(p)``

    Args:
        beta (float): bootstrap parameter. Default, 0.8
        reduce (bool): computes mean of the loss. Default, True.

    """
    def __init__(self, beta=0.8, reduce=True):
        super(HardBootstrappingLoss, self).__init__()
        self.beta = beta
        self.reduce = reduce

    def forward(self, y_pred, y):
        # cross_entropy = - t * log(p)
        beta_xentropy = self.beta * F.cross_entropy(y_pred, y, reduction='none')

        ''' Implement here '''
        # z = argmax(p): the predicted class, used as a hard pseudo-label (no gradient)
        z = y_pred.detach().argmax(dim=1)
        z = z.view(-1, 1)
        bootstrap = F.log_softmax(y_pred, dim=1).gather(1, z).view(-1)
        bootstrap = -(1 - self.beta) * bootstrap

        # Average over the batch unless per-sample losses are requested
        loss = beta_xentropy + bootstrap
        return torch.mean(loss) if self.reduce else loss

In [22]:
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.cuda.manual_seed_all(1)

hard_boostrapping_model = ResNet18().to(device)
hard_boostrapping_loss = HardBootstrappingLoss()
hard_boostrapping_optimizer = optim.SGD(hard_boostrapping_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
hard_boostrapping_trainer = Trainer(model=hard_boostrapping_model, device=device, criterion=hard_boostrapping_loss)
hard_boostrapping_trainer.train(train_dataloader, hard_boostrapping_optimizer, epoch=20)


In [23]:
hard_boostrapping_evaluator = Evaluator(model=hard_boostrapping_model, device=device)
hard_boostrapping_evaluator.test(test_dataloader)



Accuracy: 76.03% 



### Step 2-4: Train a model with mixup
Create an image classification model and train it with mixup.


As in the lecture material, the equations below give the mixup data augmentation for the input and for the loss, respectively.
<div align="center">
$x = \delta x_a + (1 - \delta) x_b$

$\ell = \delta \ell_a + (1 - \delta) \ell_b$
</div>

The figure below shows an example of weighted alpha blending of two images according to the input equation above.
<div align="center">
<img src="https://drive.google.com/uc?export=view&id=1f3FXcUujGppFyqFC6NvL8SyZX7JTYx2o" width="400"/>
</div>

Here $\delta$ is the mixup parameter, which is drawn at random at every training iteration. It is usually sampled from a beta distribution, which is what we implement and use in this exercise. The figure below shows the probability density function (PDF) of the beta distribution for different parameters.
<div align="center">
<img src="https://drive.google.com/uc?export=view&id=1vTfz8b4bz1bWJLEdVw_WZfCknOI6v4oG" width="400"/>
</div>
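
For a small shape parameter such as 0.2, the beta distribution is U-shaped, so most sampled values of $\delta$ lie close to 0 or 1 and each mixed image is dominated by one of its two sources. The sketch below checks this empirically. It swaps in `random.betavariate` from the standard library in place of `np.random.beta` to stay dependency-free, and the 0.1 / 0.9 thresholds are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)
alpha = 0.2  # shape parameter of Beta(alpha, alpha)

samples = [rng.betavariate(alpha, alpha) for _ in range(10000)]
mean = sum(samples) / len(samples)
near_edges = sum(s < 0.1 or s > 0.9 for s in samples) / len(samples)

print(f"mean={mean:.3f}, fraction near 0 or 1={near_edges:.3f}")
```

The distribution is symmetric, so the mean stays near 0.5 even though few individual samples fall near the middle.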

In this exercise we implement the mixup algorithm and check, through the improvement in accuracy, that it makes training more robust even when labels are noisy.
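
The two mixup equations can be traced on toy values before writing the tensor version. This is a minimal sketch in plain Python, where `mixup_pair` and `mixup_loss` are hypothetical helpers, the "images" are four-element lists, and the two loss values are made up.

```python
def mixup_pair(x_a, x_b, delta):
    """Blend two inputs element-wise: delta * x_a + (1 - delta) * x_b."""
    return [delta * a + (1.0 - delta) * b for a, b in zip(x_a, x_b)]


def mixup_loss(loss_a, loss_b, delta):
    """Blend the losses computed against each of the two targets."""
    return delta * loss_a + (1.0 - delta) * loss_b


x_a = [1.0, 0.0, 0.0, 1.0]
x_b = [0.0, 1.0, 1.0, 0.0]
delta = 0.75  # fixed here; during training it is sampled at every iteration

mixed = mixup_pair(x_a, x_b, delta)  # [0.75, 0.25, 0.25, 0.75]
loss = mixup_loss(2.0, 0.4, delta)   # 0.75 * 2.0 + 0.25 * 0.4, about 1.6
print(mixed, loss)
```

The same coefficient weights both the inputs and the losses, so a sample that contributes 75% of the pixels also contributes 75% of the supervision.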

**[ *Problem-3* ]** Implement training with mixup

In [24]:
def mixup_data(x, y, alpha=0.2):
    '''Returns mixed inputs, pairs of targets, and delta'''

    # Mixing coefficient drawn from Beta(alpha, alpha)
    delta = np.random.beta(alpha, alpha)
    """
    Q. Write your code to get mixed inputs, pairs of targets, and delta.
    """
    batch_size = x.size(0)
    # Random permutation that pairs each sample with another sample of the same batch
    index = torch.randperm(batch_size).to(x.device)

    mixed_x = delta * x + (1 - delta) * x[index, :]
    y_a, y_b = y, y[index]

    return mixed_x, y_a, y_b, delta


class mixup_criterion(nn.Module):
    """
    Args:
        criterion (nn.Module): base loss applied to each of the two targets.
    """
    def __init__(self, criterion):
        super(mixup_criterion, self).__init__()
        self.criterion = criterion

    def forward(self, y_pred, y_a, y_b, delta):
        """
        Q. Write your code to compute mixup-loss.
        """
        loss = delta * self.criterion(y_pred, y_a) + (1-delta) * self.criterion(y_pred, y_b)

        return loss



In [25]:
class MixupTrainer(object):
    def __init__(self, model, device, criterion=mixup_criterion(criterion=nn.CrossEntropyLoss())):
        super(MixupTrainer, self).__init__()
        self.model = model
        self.device = device
        self.criterion = criterion

    def train(self, train_dataloader, optimizer, epoch):
        self.model.train()

        for i in range(epoch):
            for j, data in tqdm(enumerate(train_dataloader), total=len(train_dataloader), desc='Epoch [{} / {}]'.format(i+1,epoch)):
                inputs, targets = data
                inputs, targets = inputs.to(self.device), targets.to(self.device)
                inputs, targets_a, targets_b, delta = mixup_data(inputs, targets)

                outputs = self.model(inputs)
                loss = self.criterion(outputs, targets_a, targets_b, delta)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

In [26]:
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed(1)
torch.cuda.manual_seed_all(1)

mixup_model = ResNet18().to(device)
mixup_loss = mixup_criterion(criterion=nn.CrossEntropyLoss())
mixup_optimizer = optim.SGD(mixup_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
mixup_trainer = MixupTrainer(model=mixup_model, device=device, criterion=mixup_loss)
mixup_trainer.train(train_dataloader, mixup_optimizer, epoch=20)


In [27]:
mixup_evaluator = Evaluator(model=mixup_model, device=device)
mixup_evaluator.test(test_dataloader)



Accuracy: 77.35% 

