### PyTorch lighting

#### PyTorch lighting
머신러닝, 딥러닝 모델 구축을 할 때에 공통된 부분들을 반복해서 작성할 필요 없이 대신 처리해주고, 머신러닝 모델 구축의 탬플릿 코드로써 기능을 하며, 다른 사람이 작성한 코드를 쉽게 볼 수 있도록 공통된 스타일을 갖도록 하고 ,모델의 개별적인 부분은 유연하게 커스터마이징하여 실험할 수 있는 라이브러리

<img src="https://miro.medium.com/max/2000/1*-GDzOk_UJElhGtnc9g6zHA.png" width="700">

#### 설치

In [1]:
!pip install pytorch-lightning

Collecting pytorch-lightning
  Using cached pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Collecting torchmetrics>=0.7.0 (from pytorch-lightning)
  Using cached torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Collecting lightning-utilities>=0.10.0 (from pytorch-lightning)
  Using cached lightning_utilities-0.11.9-py3-none-any.whl.metadata (5.2 kB)
Collecting aiohttp!=4.0.0a0,!=4.0.0a1 (from fsspec[http]>=2022.5.0->pytorch-lightning)
  Downloading aiohttp-3.11.11-cp311-cp311-win_amd64.whl.metadata (8.0 kB)
Collecting aiohappyeyeballs>=2.3.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2022.5.0->pytorch-lightning)
  Using cached aiohappyeyeballs-2.4.4-py3-none-any.whl.metadata (6.1 kB)
Collecting aiosignal>=1.1.2 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2022.5.0->pytorch-lightning)
  Using cached aiosignal-1.3.2-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting attrs>=17.3.0 (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2022.5.0->pytorch-lightning)
  Do

##### PyTorch Lightning을 사용하여 딥러닝 모델을 작성하는 순서

1. Lightning Module에서 상속된 새로운 Lightning Module 클래스를 작성
2. DataLoader를 통해 학습할 데이터를 준비
3. Trainer 객체를 만들고, 그 Trainer에 데이터와 Lightning Module 클래스를 주어 학습

In [5]:
!pip install --upgrade sympy

Collecting sympy
  Using cached sympy-1.13.3-py3-none-any.whl.metadata (12 kB)
Using cached sympy-1.13.3-py3-none-any.whl (6.2 MB)
Installing collected packages: sympy
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.1
    Uninstalling sympy-1.13.1:
      Successfully uninstalled sympy-1.13.1
Successfully installed sympy-1.13.3


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.5.1+cu124 requires sympy==1.13.1; python_version >= "3.9", but you have sympy 1.13.3 which is incompatible.


In [7]:
!pip install sympy==1.13.1

Collecting sympy==1.13.1
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Installing collected packages: sympy
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.3
    Uninstalling sympy-1.13.3:
      Successfully uninstalled sympy-1.13.3
Successfully installed sympy-1.13.1


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
import os

import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

train_loader = DataLoader(MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()))

trainer = pl.Trainer()
model = LitModel()
trainer.fit(model, train_loader)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [WinError 10061] 대상 컴퓨터에서 연결을 거부했으므로 연결하지 못했습니다>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9.91M/9.91M [00:12<00:00, 788kB/s] 


Extracting d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\train-images-idx3-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [WinError 10061] 대상 컴퓨터에서 연결을 거부했으므로 연결하지 못했습니다>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28.9k/28.9k [00:00<00:00, 154kB/s]


Extracting d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\train-labels-idx1-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [WinError 10061] 대상 컴퓨터에서 연결을 거부했으므로 연결하지 못했습니다>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1.65M/1.65M [00:00<00:00, 1.72MB/s]


Extracting d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\t10k-images-idx3-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
<urlopen error [WinError 10061] 대상 컴퓨터에서 연결을 거부했으므로 연결하지 못했습니다>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4.54k/4.54k [00:00<00:00, 4.54MB/s]
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
d:\01_Programming\100_HugoBank\Mine\study-pytorch\torch-cpu-env\Lib\site-packages\pytorch_lightning\trainer\connectors\logger_connector\logger_connector.py:76: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
d:\01_Programming\100_HugoBank\Mine\study-pytorch\torch-cpu-env\Lib\site-packages\pytorch_lightning\loops\utilities.py:73: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.

  | Name | Type   | Params | Mode 
----

Extracting d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw\t10k-labels-idx1-ubyte.gz to d:\01_Programming\100_HugoBank\Mine\study-pytorch\pytorch_chatbot\MNIST\raw

Epoch 1:  18%|█▊        | 11026/60000 [00:49<03:38, 224.26it/s, v_num=0]


Detected KeyboardInterrupt, attempting graceful shutdown ...


NameError: name 'exit' is not defined

#### LightningModule Class

trainer와 모델이 상호작용을 할 수 있도록 pytorch의 nn.Module의 상위 클래스인 lightning module을 구현
- 기존 PyTorch는 DataLoader, Mode, optimizer, Training loof 등을 전부 따로따로 코드로 구현
- Pytorch Lightning에서는 Lightning Model class 안에 이 모든것을 한번에 구현

In [2]:
import pytorch_lightning as pl

In [3]:
# 이 형태
class Classifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            # ...
        )

    def forward(self, x):
        pass

    def training_step(self, batch, batch_idx):
        pass

    # 학습 중간에 모델의 성능을 체크하는 용도로 사용
    def validation_step(self, batch, batch_idx):
        pass

    # validation_step의 결과로 무엇인가 할 일이 있으면 validation_epoch_end 메서드에 작성
    def validation_epoch_end(self, validation_step_outputs):
        for pred in validation_step_outputs:
            pass

    # test 데이터로더에서 제공하는 배치를 가지고 확인하고 싶은 통계량을 기록하는데 사용
    def test_step(self, batch, batch_idx):
        pass

    # 모델의 최적 파라미터를 찾을 때 사용할 optimizer와 scheduler를 구현
    def configure_optimizers(self):
        pass

- 확인 결과 GPU PyTorch와 PyTorch ligthing 사이에 호환성 문제 해결 안됨!

##### Boston House 예제로 소스 설명

In [9]:
# 판다스 설치
!pip install pandas

Collecting pandas
  Using cached pandas-2.2.3-cp311-cp311-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2025.1-py2.py3-none-any.whl.metadata (1.4 kB)
Using cached pandas-2.2.3-cp311-cp311-win_amd64.whl (11.6 MB)
Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Downloading tzdata-2025.1-py2.py3-none-any.whl (346 kB)
   ---------------------------------------- 0.0/346.8 kB ? eta -:--:--
   - -------------------------------------- 10.2/346.8 kB ? eta -:--:--
   -------- ------------------------------ 71.7/346.8 kB 975.2 kB/s eta 0:00:01
   -------------------------------------- - 337.9/346.8 kB 3.0 MB/s eta 0:00:01
   ---------------------------------------- 346.8/346.8 kB 2.7 MB/s eta 0:00:00
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.2.3 pytz-2024.2 tzdata-2025.1



[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [7]:
import torch
import pytorch_lightning as pl
from torch import Tensor, nn
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import Dataset, DataLoader
from torch.nn import functional as F
import numpy as np

In [None]:
# Boston 집값 데이터를 읽어온다.
boston = fetch_openml(name='boston', version=1)  # 대신 openml에서 가져오기
X, y = boston.data, boston.target

In [16]:
class SklearnDataset(Dataset):
    def __init__(self, X: np.ndarray, y: np.ndarray):
        '''
        데이터셋 초기화 함수
        X: 입력 데이터 (특성)
        y: 타겟 데이터 (레이블)
        '''
        super().__init__()
        scaler = MinMaxScaler()         # MinMaxScaler를 사용하여 데이터를 0과 1 사이로 정규화

        scaler.fit(X)                   # 입력 데이터를 학습하여 정규화 파라미터를 계산
        self.X = scaler.transform(X)    # 입력 데이터 X를 정규화된 값으로 변환
        self.Y = y                      # 타겟 데이터 y는 그대로 저장

    def __len__(self):
        '''
        데이터셋의 크기 (샘플의 개수)를 반환하는 함수
        '''
        return len(self.X)
    
    def __getitem__(self, idx):
        '''    
        주어진 인덱스 idx에 해당하는 입력 데이터 X와 타겟 데이터 y를 반환하는 함수
        X는 np.float32 형으로 변환되고, y도 np.float32 형으로 변환됨
        '''
        x = self.X[idx].astype(np.float32)  # X 데이터를 float32로 변환
        y = self.Y[idx].astype(np.float32)  # y 데이터를 float32로 변환
        return x, y

In [17]:
# 위에서 정의한 SklearnDataset 클래스로 인스턴스 생성
bostonds = SklearnDataset(X, y)

# DataLoader 객체 생성
train_loader = DataLoader(
                    bostonds,           # 데이터셋 객체 (여기서는 SklearnDataset)
                    batch_size=32,      # 배치 크기 설정 (한 번에 32개의 샘플을 처리)
                    shuffle=True,       # 데이터를 랜덤하게 섞어서 학습에 사용
                    drop_last=True,     # 마지막 배치가 미니 배치 크기보다 작을 경우 버리기
                )

In [18]:
# 선형 회귀 모델을 위한 클래스 정의
class LinRegModel(pl.LightningModule):
    
    # 초기화 메서드. 모델을 정의하고, 사용할 신경망 레이어를 설정
    def __init__(self, input_dim: int):
        '''
        모델의 입력 차원과 출력 차원을 정의
        input_dim: 입력 데이터의 특성 수 (여기서는 13개 특성)
        '''
        super().__init__()
        # 선형 회귀 레이어 정의: 입력 차원(input_dim)과 출력 차원(1)을 설정
        self.linear = nn.Linear(in_features=13, out_features=1, bias=True)

    # 순전파 함수. 입력 x에 대해 모델의 예측값을 계산
    # 모델의 추론 결과를 제공하고 싶을 때 사용
    def forward(self, x):
        '''모델의 입력 x를 받아 예측값 y_hat을 계산'''
        y_hat = self.linear(x)      # 선형 회귀 레이어를 통해 예측값 계산
        return y_hat

    # 훈련 과정에서 한 배치에 대해 손실을 계산
    # 단일 배치에서의 손실을 반환. train loop로 자동 반복
    def training_step(self, batch, batch_idx):
        '''훈련 데이터의 한 배치(batch)에서 입력(x)와 출력(y)을 받아 손실(loss)을 계산'''
        x, y = batch        # 배치에서 입력(x)와 출력(y)을 추출
        x = x.view(x.size(0), -1)       # 입력을 평탄화 (2D 텐서로 변환)
        y_hat = self(x)         # 모델을 통해 예측값 y_hat 계산
        loss = F.mse_loss(y_hat, y, reduction="sum")        # 평균 제곱 오차(MSE) 손실 계산
        return loss

    # 옵티마이저 설정 (Adam 옵티마이저 사용)
    def configure_optimizers(self):
        '''
        모델 파라미터에 대한 옵티마이저를 설정합니다. 
        여기서는 Adam 옵티마이저를 사용'''
        return torch.optim.Adam(self.parameters(), lr=1e-4)  # 학습률 1e-4로 Adam 옵티마이저 설정

In [14]:
# PyTorch Lightning Trainer 객체 생성
trainer = pl.Trainer()              # 모델 훈련을 위한 Trainer 객체 생성. 기본 설정으로 실행
# 모델 인스턴스 생성
model = LinRegModel(input_dim=13)   # LinRegModel을 생성하고 입력 차원(input_dim)을 13으로 설정
# 모델 훈련 시작
trainer.fit(model, train_loader)    # train_loader에서 제공되는 데이터로 모델을 훈련

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
d:\01_Programming\100_HugoBank\Mine\study-pytorch\torch-cpu-env\Lib\site-packages\pytorch_lightning\loops\utilities.py:73: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.

  | Name   | Type   | Params | Mode 
------------------------------------------
0 | linear | Linear | 14     | train
------------------------------------------
14        Trainable params
0         Non-trainable params
14        Total params
0.000     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
d:\01_Programming\100_HugoBank\Mine\study-pytorch\torch-cpu-env\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in th

Epoch 2:  27%|██▋       | 4/15 [00:00<00:00, 276.02it/s, v_num=1] 

  loss = F.mse_loss(y_hat, y, reduction="sum")


Epoch 999: 100%|██████████| 15/15 [00:00<00:00, 319.14it/s, v_num=1]

`Trainer.fit` stopped: `max_epochs=1000` reached.


Epoch 999: 100%|██████████| 15/15 [00:00<00:00, 267.80it/s, v_num=1]
