<a href="https://colab.research.google.com/github/nuyhc/RhythmStudy/blob/main/1.%20PyTorch/tf2torch/2_%EC%98%88%EC%88%A0%20%EC%9E%91%ED%92%88%20%ED%99%94%EA%B0%80%20%EB%B6%84%EB%A5%98_%EB%B0%95%EC%A7%80%ED%98%84.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 월간 데이터 예술 작품 화가 분류 AI 경진대회
알고리즘 | 비전 | 분류 | Macro f1 score


### [주제]  
예술 작품을 화가 별로 분류하는 AI 모델 개발  
* 예술 작품의 일부분만 주어지는 테스트 데이터셋에 대해 올바르게 화가를   
분류해낼 수 있는 예술 작품의 전문가인 AI 모델을 만들기  


### [데이터]  
일부분만 주어지는 예술 작품을 화가 별로 분류하는 AI 모델 개발

* 학습 데이터셋은 대표적인 화가 50명에 대한 예술 작품(이미지) 제공
* 테스트 데이터셋은 대표적인 화가 50명에 대한 예술 작품(이미지)의 일부분(약 1/4)만 제공
* 학습에 활용할 수 있는 화가 50명에 대한 특징 정보(csv) 추가 제공

### [Reference]
https://dacon.io/competitions/official/236006/codeshare/7078?page=1&dtype=recent

# 1. Library Load

In [1]:
import os
import cv2
import math
import random
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
from scipy.stats import beta

from sklearn import preprocessing
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, sampler

import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

import torchvision
import torchvision.models as models
import torchvision.transforms.functional

import warnings
warnings.filterwarnings(action='ignore')

In [2]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

In [3]:
# Load Data
path_to_zip_file = '/content/drive/MyDrive/리듬스터디/data/예술_작품_화가_분류/open.zip'
directory_to_extract_to = '/content/data'

import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

# 2. Hyperparameter Setting

In [4]:
CFG = {
    'IMG_SIZE_H': 220,
    'IMG_SIZE_W': 275,
    'EPOCHS': 50,
    'LEARNING_RATE': 3e-4,
    'BATCH_SIZE': 64,
    'SEED': 41,
    'PATIENCE' : 3
}

# 3. Fixed RandomSeed
seed를 고정하면, 매번 프로그램을 실행할 때마다 생성되는 난수들의 수열을 같게할 수 있다.   

### Controlling sources of randomness
* PyTorch random number generator
* Random number generators in other libraries (NumPy)

### CUDA convolution benchmarking
* cuDNN library
* 벤치마킹 노이즈나 다른 하드웨어에서, 벤치마크가 subsequent runs에서 다른 알고리즘들을 선택
* Disabling the benchmarking feature with `torch.backends.cudnn.benchmark = False` causes cuDNN to deterministically select an algorithm, possibly at the cost of reduced performance.
* However, if you do not need reproducibility across multiple executions of your application, then performance might improve if the benchmarking feature is enabled with `torch.backends.cudnn.benchmark = True.`


### Avoiding nondeterministic algorithms
* `torch.use_deterministic_algorithms()` lets you configure PyTorch to use deterministic algorithms instead of nondeterministic ones where available, and to throw an error if an operation is known to be nondeterministic (and without a deterministic alternative).

https://pytorch.org/docs/stable/notes/randomness.html

In [5]:
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # 같은 환경에서의 랜덤 값을 각각 생성
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True

seed_everything(CFG['SEED'])  # Seed 고정

# 4. Data Pre-processiong

In [6]:
train_df = pd.read_csv('/content/data/train.csv')
display(train_df.head())
train_df.shape

Unnamed: 0,id,img_path,artist
0,0,./train/0000.jpg,Diego Velazquez
1,1,./train/0001.jpg,Vincent van Gogh
2,2,./train/0002.jpg,Claude Monet
3,3,./train/0003.jpg,Edgar Degas
4,4,./train/0004.jpg,Hieronymus Bosch


(5911, 3)

In [7]:
info_df = pd.read_csv('/content/data/artists_info.csv')
display(info_df.head())
info_df.shape

Unnamed: 0,name,years,genre,nationality
0,Amedeo Modigliani,1884 - 1920,Expressionism,Italian
1,Vasiliy Kandinskiy,1866 - 1944,"Expressionism,Abstractionism",Russian
2,Diego Rivera,1886 - 1957,"Social Realism,Muralism",Mexican
3,Claude Monet,1840 - 1926,Impressionism,French
4,Rene Magritte,1898 - 1967,"Surrealism,Impressionism",Belgian


(50, 4)

In [8]:
# LabelEncoder
le = preprocessing.LabelEncoder()
info_df['genre'] = le.fit_transform(info_df['genre'])
info_df = info_df.rename(columns={'name':'artist'})
info_df.head()

Unnamed: 0,artist,years,genre,nationality
0,Amedeo Modigliani,1884 - 1920,5,Italian
1,Vasiliy Kandinskiy,1866 - 1944,6,Russian
2,Diego Rivera,1886 - 1957,23,Mexican
3,Claude Monet,1840 - 1926,10,French
4,Rene Magritte,1898 - 1967,26,Belgian


In [9]:
len(info_df['genre'].unique())  # 장르는 총 31가지

31

In [10]:
# train_df 와 info_df 병합하기
# merge
train_merge = pd.merge(train_df, info_df[['artist', 'genre']], on='artist', how='left')
train_merge.isna().sum()

id            0
img_path      0
artist        0
genre       220
dtype: int64

In [26]:
new_train = train_merge.replace(np.nan, 14) # nan값을 14로 채우기

In [27]:
new_train.head()

Unnamed: 0,id,img_path,artist,genre
0,0,/content/data/train,Diego Velazquez,1.0
1,1,/content/data/train,Vincent van Gogh,16.0
2,2,/content/data/train,Claude Monet,10.0
3,3,/content/data/train,Edgar Degas,10.0
4,4,/content/data/train,Hieronymus Bosch,14.0


In [28]:
# colab 이미지 경로 설정하기
def img_path_change(img_path):
    return '/content/data/train'

# train['img_path'] = train['img_path'].apply(img_path_change)
new_train['img_path'] = new_train['img_path'].apply(img_path_change)

In [29]:
new_train.sample(5)

Unnamed: 0,id,img_path,artist,genre
3182,3182,/content/data/train,Paul Cezanne,16.0
3617,3617,/content/data/train,Edgar Degas,10.0
1498,1498,/content/data/train,Albrecht Du rer,14.0
3611,3611,/content/data/train,Salvador Dali,25.0
3146,3146,/content/data/train,Edvard Munch,29.0


# 5. Train / Validation Split

In [30]:
new_train['genre'] = new_train['genre'].astype(int)

In [31]:
train_df, val_df, _, _ = train_test_split(new_train, new_train['genre'].values, test_size=0.2, random_state=CFG['SEED'])

In [33]:
train_df = train_df.sort_values(by=['id'])
train_df.head()

Unnamed: 0,id,img_path,artist,genre
0,0,/content/data/train,Diego Velazquez,1
2,2,/content/data/train,Claude Monet,10
3,3,/content/data/train,Edgar Degas,10
5,5,/content/data/train,Pierre-Auguste Renoir,10
6,6,/content/data/train,Rene Magritte,26
