# Custom Dataloader

* Goal: prepare a custom dataset (.npy files, etc.) so that `torch.utils.data.DataLoader` can fetch samples from it correctly

## - ex) FONT-50 (final project dataset) DataLoader

#### (0) Subclass `torch.utils.data.Dataset`

In [1]:
import numpy as np
import torch
import os
import glob
from torch.utils.data import Dataset

In [2]:
class FontDataset(Dataset):
    def __init__(self, dataroot):
        pass

#### (1) Define `__init__` / `__getitem__` / `__len__`

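cf) [python dunder](https://mingrammer.com/underscore-in-python/): names wrapped in double underscores, such as `__init__`, are special methods that Python calls implicitly (for example, `len(obj)` calls `obj.__len__()`). This differs from a single leading underscore (`_name`), which is the convention for private classes/functions/variables/methods meant to be used only inside one module.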
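To make the implicit calls concrete, here is a minimal sketch with a toy container. The `Playlist` class and its contents are hypothetical and only serve to illustrate the mechanism; they are not part of the FONT-50 code.

```python
class Playlist:
    """Toy container (hypothetical) that implements two dunder methods."""

    def __init__(self, songs):
        # Runs when Playlist(...) is called
        self.songs = list(songs)

    def __getitem__(self, index):
        # Runs when playlist[index] is evaluated
        return self.songs[index]

    def __len__(self):
        # Runs when len(playlist) is evaluated
        return len(self.songs)


playlist = Playlist(["a", "b", "c"])
first_song = playlist[0]   # equivalent to playlist.__getitem__(0)
n_songs = len(playlist)    # equivalent to playlist.__len__()
```

A `Dataset` subclass relies on the same mechanism: indexing the dataset and asking for its length go through `__getitem__` and `__len__`.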

i. Define `__init__`: set up and store everything the class needs for its later operations

In [3]:
class FontDataset(Dataset):
    def __init__(self, dataroot):
        # Collect the full path of every .npy file directly under dataroot
        files = glob.glob(os.path.join(dataroot, '*.npy'))
        # Sort so that the index -> file mapping is deterministic
        self.entry = sorted(files)

<img src="../../shared/custom_combined.png" alt="Drawing" style="width: 1000px;" align="left"/>
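One detail of the file listing is worth checking: `sorted` on path strings is lexicographic, not numeric. The sketch below uses made-up file names in a temporary directory (an assumption for illustration; the real FONT-50 file names are not shown here) to see what order `__init__` would store.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as dataroot:
    # Create empty placeholder files; only the names matter here
    for name in ["2.npy", "10.npy", "1.npy", "notes.txt"]:
        open(os.path.join(dataroot, name), "w").close()

    # Same pattern as in __init__: keep only *.npy, then sort the paths
    entry = sorted(glob.glob(os.path.join(dataroot, "*.npy")))
    names = [os.path.basename(p) for p in entry]

print(names)
```

With names like these, `10.npy` sorts before `2.npy`. If numeric order matters, zero-padded file names (`01.npy`, `02.npy`, ...) are one way to keep string order and numeric order aligned.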

ii. Define `__getitem__`: load one sample, process it, and return it

In [4]:
class FontDataset(Dataset):
    def __init__(self, dataroot):
        # Collect the full path of every .npy file directly under dataroot
        files = glob.glob(os.path.join(dataroot, '*.npy'))
        # Sort so that the index -> file mapping is deterministic
        self.entry = sorted(files)
        
    def __getitem__(self, index):
        single_npy_path = self.entry[index]  # path of the index-th sample
        
        # Load the file once; it stores [data, label]
        sample = np.load(single_npy_path, allow_pickle=True)
        single_npy_tensor = torch.from_numpy(sample[0])  # NumPy array -> tensor
        single_npy_label = sample[1]  # saved as a plain int, so no tensor conversion is needed

        return (single_npy_tensor, single_npy_label)
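The indexing with `[0]` and `[1]` only works if each `.npy` file stores a two-element object array. The sketch below writes and reads back one such file. The image shape `(32, 32)`, the `float32` dtype, the label value, and the file name are all assumptions for illustration; the actual FONT-50 layout may differ.

```python
import os
import tempfile

import numpy as np
import torch

# ASSUMPTION: one sample = [image array, integer label]
image = np.zeros((32, 32), dtype=np.float32)
label = 7

# An object array can hold an ndarray and an int side by side
sample = np.empty(2, dtype=object)
sample[0] = image
sample[1] = label

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample_0.npy")  # hypothetical file name
    np.save(path, sample, allow_pickle=True)
    # Object arrays are pickled, so loading needs allow_pickle=True
    loaded = np.load(path, allow_pickle=True)

loaded_tensor = torch.from_numpy(loaded[0])  # data part as a tensor
loaded_label = loaded[1]                     # label part, still a plain int
```

Because `allow_pickle=True` unpickles arbitrary Python objects, it should only be used on files from a trusted source.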


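iii. (optional) Define `__len__`: return the size of the dataset. It is optional for `Dataset` itself, but the default samplers used by `DataLoader` call `len(dataset)`, so in practice it should be defined.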

In [5]:
class FontDataset(Dataset):
    def __init__(self, dataroot):
        # Collect the full path of every .npy file directly under dataroot
        files = glob.glob(os.path.join(dataroot, '*.npy'))
        # Sort so that the index -> file mapping is deterministic
        self.entry = sorted(files)
        
    def __getitem__(self, index):
        single_npy_path = self.entry[index]  # path of the index-th sample
        
        # Load the file once; it stores [data, label]
        sample = np.load(single_npy_path, allow_pickle=True)
        single_npy_tensor = torch.from_numpy(sample[0])  # NumPy array -> tensor
        single_npy_label = sample[1]  # saved as a plain int, so no tensor conversion is needed

        return (single_npy_tensor, single_npy_label)

    def __len__(self):
        return len(self.entry)
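With all three methods in place, the dataset can be handed to `DataLoader`. The following is a minimal sketch rather than the project's actual pipeline: it uses a condensed version of the same class, and the synthetic data (10 samples of shape `(32, 32)`, labels `i % 5`, zero-padded file names) and `batch_size=4` are assumptions chosen only to make the example self-contained.

```python
import glob
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class FontDataset(Dataset):
    def __init__(self, dataroot):
        self.entry = sorted(glob.glob(os.path.join(dataroot, "*.npy")))

    def __getitem__(self, index):
        sample = np.load(self.entry[index], allow_pickle=True)
        return torch.from_numpy(sample[0]), sample[1]

    def __len__(self):
        return len(self.entry)


with tempfile.TemporaryDirectory() as dataroot:
    # Write 10 synthetic samples in the assumed [image, label] layout
    for i in range(10):
        sample = np.empty(2, dtype=object)
        sample[0] = np.full((32, 32), i, dtype=np.float32)
        sample[1] = i % 5
        np.save(os.path.join(dataroot, f"{i:02d}.npy"), sample, allow_pickle=True)

    dataset = FontDataset(dataroot)
    loader = DataLoader(dataset, batch_size=4, shuffle=False)

    # Iterate while the temporary files still exist
    batches = []
    for images, labels in loader:
        # images: stacked tensor (batch, 32, 32); labels: 1-D integer tensor
        batches.append((tuple(images.shape), labels.tolist()))

print(len(dataset), batches)
```

The default collate function stacks the per-sample tensors along a new batch dimension and converts the plain `int` labels into a single integer tensor, which is why `__getitem__` can return the label without wrapping it in a tensor.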

<hr>

## ref

* [pytorch.org - Writing Custom Datasets, Dataloaders and Transforms](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html)
* [How to Use PyTorch Custom DataLoader](https://greeksharifa.github.io/pytorch/2018/11/10/pytorch-usage-03-How-to-Use-PyTorch/#custom-dataloader-%EB%A7%8C%EB%93%A4%EA%B8%B0)