## Loss Functions

The core of deep learning is to update a network's weights and biases in the direction that reduces the loss. The loss function is therefore the part that defines how training proceeds.
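
As a minimal sketch of this idea, one manual gradient-descent step on a one-parameter model lowers the loss. The toy data, the learning rate of 0.05, and the single weight `w` are assumptions made for illustration and are not part of the original notebook.

```python
import torch

# Toy data following y = 2x (illustrative values only).
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([2.0, 4.0, 6.0])

# A single trainable weight, starting at 0.
w = torch.zeros(1, requires_grad=True)

def mse(pred, target):
    return torch.mean((pred - target) ** 2)

loss_before = mse(w * x, y)
loss_before.backward()            # stores d(loss)/dw in w.grad

with torch.no_grad():
    w -= 0.05 * w.grad            # move w against the gradient

loss_after = mse(w * x, y)
print(loss_before.item(), loss_after.item())
```

An optimizer such as Adam automates this update for every parameter; the loss function only has to return a differentiable scalar.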

PyTorch (and PyTorch Lightning, which builds on it) provides not only MSE for regression and cross entropy for classification but also many other loss functions, predefined in torch.nn.functional.

> torch.nn.functional
* mse_loss      : element-wise mean squared error.
* cross_entropy : cross entropy loss between input logits and target (class indices or class probabilities).
* binary_cross_entropy : Binary Cross Entropy between the target and input probabilities.
* binary_cross_entropy_with_logits :  Binary Cross Entropy between target and input logits.
* kl_div : Kullback-Leibler divergence Loss
* l1_loss : mean element-wise absolute value difference.
* smooth_l1_loss : uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.
* nll_loss : negative log likelihood loss.
* poisson_nll_loss : Poisson negative log likelihood loss.
* gaussian_nll_loss : Gaussian negative log likelihood loss.
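
A few of the functions listed above can be called as follows. This is a small sketch with made-up tensors; it also checks two relationships that are easy to get wrong: `cross_entropy` expects raw logits, and the `_with_logits` variant applies the sigmoid internally.

```python
import torch
from torch.nn import functional as F

# Regression-style losses on made-up values.
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
mse = F.mse_loss(pred, target)    # mean of squared errors (0, 0, 4)
l1 = F.l1_loss(pred, target)      # mean of absolute errors (0, 0, 2)

# Classification: cross_entropy takes raw logits and class indices.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])
ce = F.cross_entropy(logits, labels)

# cross_entropy is log_softmax followed by nll_loss.
ce_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# binary_cross_entropy expects probabilities; the *_with_logits
# variant takes logits and applies the sigmoid itself.
bin_logits = torch.tensor([0.3, -1.2])
bin_target = torch.tensor([1.0, 0.0])
bce_a = F.binary_cross_entropy_with_logits(bin_logits, bin_target)
bce_b = F.binary_cross_entropy(torch.sigmoid(bin_logits), bin_target)
```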


A variety of metrics, such as Accuracy and AUROC, are also available through torchmetrics.
https://torchmetrics.readthedocs.io/en/stable/

In most cases the built-in functions are sufficient, but when needed you can define and use an arbitrary loss or metric, as follows.

In [16]:
import numpy as np
import torch
from torch import nn
from torch.nn import functional as F
import torch.optim as optim

import pytorch_lightning as pl
from pytorch_lightning.accelerators import accelerator
from torchmetrics import functional as FM
from torchinfo import summary

from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import torch.utils.data as data
from torch.utils.data import DataLoader
import pandas as pd
import matplotlib.pyplot as plt


To aid understanding, we start completely from scratch and implement even the one-hot encoding by hand.

In [17]:
class Onehot(object) :
    def __call__(self, sample):
        # np.eye(10) is the 10x10 identity matrix; row n is the one-hot vector for class n, e.g. 0 --> (1,0,0,...,0)
        target = np.eye(10)[sample]
        return torch.FloatTensor(target)
    

Check that it works. (It could of course be written as a plain function as well.)

In [18]:
a = Onehot()
a(5)

tensor([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.])

It works. (Because np.eye(10) is hard-coded here, it will fail for labels of 10 or above, that is, for datasets with more than 10 classes.)
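
One way to remove the 10-class limit is to pass the class count in and let `torch.nn.functional.one_hot` build the vector. The class name `OnehotN` and its `num_classes` parameter are hypothetical and not part of the original notebook; this is a sketch, not a drop-in requirement.

```python
import torch
from torch.nn import functional as F

class OnehotN:
    """Hypothetical variant of Onehot with a configurable class count."""

    def __init__(self, num_classes):
        self.num_classes = num_classes

    def __call__(self, sample):
        # F.one_hot expects an integer (int64) tensor of class indices.
        index = torch.as_tensor(sample, dtype=torch.long)
        return F.one_hot(index, num_classes=self.num_classes).float()

encode = OnehotN(num_classes=12)
print(encode(11))
```

The result is cast to float so that it can be compared directly with softmax outputs in an MSE-style loss.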

Next, define what has to be done when the data is loaded. \
X data: load it and convert it to a tensor \
Y data: load it and one-hot encode it


In [19]:
y_transform = transforms.Compose([Onehot()])         # target one-hot encoding 
x_transform = transforms.Compose([transforms.ToTensor()])  # image transform 

Load the MNIST data and apply the transforms. Of the arguments, transform is applied to the x data and target_transform is applied to the y data.

In [20]:
train_dataset = MNIST('', transform=x_transform, target_transform=y_transform, train=True)
test_dataset = MNIST('', transform=x_transform, target_transform=y_transform, train=False)

Create the data loaders. (The test data is simply used as the validation set.)

In [21]:
batch_size = 128
trainDatLoader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
valDataLoader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

Build a simple module to run (three linear layers).

In [22]:
class Model(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear1 = nn.Linear(28*28, 64)
        self.linear2 = nn.Linear(64, 32)
        self.linear3 = nn.Linear(32, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.flatten(x)
        x1 = self.linear1(x)
        x1 = self.relu(x1)
        x2 = self.linear2(x1)
        x2 = self.relu(x2)
        x3 = self.linear3(x2)
        return x3
               

Next, define MSE as a user-defined function instead of using the built-in one. A loss function is computed from y_hat and y, and its final result must be a single scalar value.

In [23]:
def custom_loss_mse(pred, target):
    error = torch.mean(torch.square(pred-target)) 
    return error
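
As a quick sanity check, the custom function can be compared with the built-in `F.mse_loss` on a fake batch. The batch below (random softmax outputs and four arbitrary labels) is made up for illustration.

```python
import torch
from torch.nn import functional as F

def custom_loss_mse(pred, target):
    return torch.mean(torch.square(pred - target))

torch.manual_seed(0)
# Fake batch: 4 samples, 10 class probabilities each.
pred = torch.softmax(torch.randn(4, 10), dim=-1)
target = F.one_hot(torch.tensor([3, 1, 4, 1]), num_classes=10).float()

loss = custom_loss_mse(pred, target)
builtin = F.mse_loss(pred, target)   # default reduction is 'mean'
print(loss.dim(), torch.allclose(loss, builtin))
```

The result is a 0-dimensional tensor, which is the "single scalar value" a loss has to return.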

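Whatever the exact value is, any quantity can be used as long as it gets smaller when the prediction is right and larger when it is wrong. A quantity based on the absolute value works as well.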

In [24]:
def custom_mean_abs_error(y_pred, y_true):
    # NOTE: this is abs(mean(diff)), not mean(abs(diff)); signed errors cancel before abs is taken.
    error = torch.abs(torch.mean(y_true - y_pred))
    return error 
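
One detail is worth checking here: `custom_mean_abs_error` takes the absolute value of the mean difference, so positive and negative errors cancel before `abs` is applied. With softmax outputs and one-hot targets, every row of both tensors sums to 1, so the mean difference is zero up to floating-point rounding no matter how good the predictions are. The sketch below compares it with an element-wise mean absolute error; `mean_abs_error` is a hypothetical name and the tensors are made-up values.

```python
import torch

def custom_mean_abs_error(y_pred, y_true):
    # As defined in the notebook: abs of the mean difference.
    return torch.abs(torch.mean(y_true - y_pred))

def mean_abs_error(y_pred, y_true):
    # Hypothetical alternative: mean of the element-wise absolute differences.
    return torch.mean(torch.abs(y_true - y_pred))

y_true = torch.tensor([[0.0, 1.0, 0.0]])
good = torch.tensor([[0.05, 0.90, 0.05]])   # confident and correct
bad = torch.tensor([[0.70, 0.10, 0.20]])    # confident and wrong

for name, y_pred in [("good", good), ("bad", bad)]:
    print(name,
          custom_mean_abs_error(y_pred, y_true).item(),
          mean_abs_error(y_pred, y_true).item())
```

This is consistent with the `error` values on the order of 1e-10 that appear in the training log later in this notebook: they reflect rounding error rather than prediction quality.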

In [25]:
class myModel(pl.LightningModule):

    def __init__(self):
        super().__init__()
        self.layers = Model()


    def forward(self, x):
        out = self.layers(x)
        out = torch.softmax(out, dim=-1) 
        return(out)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = custom_loss_mse(y_pred, y)  ## use the custom MSE as the loss
        error = custom_mean_abs_error(y_pred, y)  ## also compute the error so that it is logged
        metrics = {'loss' : loss, 'error' : error}
        self.log_dict(metrics)
        return loss


    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = custom_loss_mse(y_pred, y)
        error = custom_mean_abs_error(y_pred, y)
        metrics = {'val_loss':loss, 'val_error':error}
        self.log_dict(metrics)
        #return loss # validation_step does not need to return anything

    def configure_optimizers(self):
        return torch.optim.Adam( self.parameters(), lr=0.001)


    

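This is almost the same as the earlier model, except that test_step is not defined because no separate test run is performed (the test data is used for validation).

Also note that a custom loss function is used.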

In [26]:
model = myModel()

In [27]:
summary(model, input_size=(8, 1, 28, 28))

Layer (type:depth-idx)                   Output Shape              Param #
myModel                                  [8, 10]                   --
├─Model: 1-1                             [8, 10]                   --
│    └─Flatten: 2-1                      [8, 784]                  --
│    └─Linear: 2-2                       [8, 64]                   50,240
│    └─ReLU: 2-3                         [8, 64]                   --
│    └─Linear: 2-4                       [8, 32]                   2,080
│    └─ReLU: 2-5                         [8, 32]                   --
│    └─Linear: 2-6                       [8, 10]                   330
Total params: 52,650
Trainable params: 52,650
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.42
Input size (MB): 0.03
Forward/backward pass size (MB): 0.01
Params size (MB): 0.21
Estimated Total Size (MB): 0.24

In [28]:
epoch = 3
name = 'custom_loss_model' 
logger = pl.loggers.CSVLogger("logs", name=name)

In [29]:
trainer = pl.Trainer(max_epochs= epoch, logger=logger, accelerator='auto')
trainer.fit(model, trainDatLoader, valDataLoader)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name   | Type  | Params | Mode 
-----------------------------------------
0 | layers | Model | 52.6 K | train
-----------------------------------------
52.6 K    Trainable params
0         Non-trainable params
52.6 K    Total params
0.211     Total estimated model params size (MB)


c:\Users\msong\anaconda3\envs\py3_11_8\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\msong\anaconda3\envs\py3_11_8\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_epochs=3` reached.


In [30]:
version_num = logger.version
history = pd.read_csv(f'./logs/{name}/version_{version_num}/metrics.csv')
history

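|   | epoch | error | loss | step | val_error | val_loss |
|---|-------|-------|------|------|-----------|----------|
| 0 | 0 | 4.656613e-10 | 0.05228 | 49 | NaN | NaN |
| 1 | 0 | 1.862645e-10 | 0.020543 | 99 | NaN | NaN |
| 2 | 0 | 3.72529e-10 | 0.017961 | 149 | NaN | NaN |
| 3 | 0 | 7.450581e-10 | 0.012716 | 199 | NaN | NaN |
| 4 | 0 | 7.683411e-10 | 0.015975 | 249 | NaN | NaN |
| 5 | 0 | 6.519258e-10 | 0.009064 | 299 | NaN | NaN |
| 6 | 0 | 1.280569e-10 | 0.012318 | 349 | NaN | NaN |
| 7 | 0 | 1.164153e-10 | 0.0145 | 399 | NaN | NaN |
| 8 | 0 | 4.190952e-10 | 0.011022 | 449 | NaN | NaN |
| 9 | 0 | NaN | NaN | 468 | 4.714471e-10 | 0.011748 |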


In [31]:
history.groupby('epoch').last().drop('step', axis=1)

| epoch | error | loss | val_error | val_loss |
|-------|-------|------|-----------|----------|
| 0 | 4.190952e-10 | 0.011022 | 4.714471e-10 | 0.011748 |
| 1 | 5.820766e-10 | 0.009602 | 5.104428e-10 | 0.008709 |
| 2 | 1.833541e-10 | 0.005417 | 3.904488e-10 | 0.007579 |
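
The per-epoch summary works because training and validation metrics are written on separate rows, leaving the other columns empty (NaN once loaded by pandas), and `GroupBy.last()` returns the last non-null value of each column within a group. A minimal sketch with a made-up log in the same layout:

```python
import numpy as np
import pandas as pd

# Toy log in the same layout as metrics.csv (values are made up).
toy = pd.DataFrame({
    "epoch":    [0, 0, 0, 1, 1, 1],
    "step":     [49, 99, 99, 149, 199, 199],
    "loss":     [0.50, 0.40, np.nan, 0.30, 0.20, np.nan],
    "val_loss": [np.nan, np.nan, 0.45, np.nan, np.nan, 0.25],
})

# last() skips NaN per column, so training and validation values
# from different rows end up on one row per epoch.
per_epoch = toy.groupby("epoch").last().drop("step", axis=1)
print(per_epoch)
```

Note that this keeps only the last logged training value of each epoch, not an average over the epoch.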

