<a href="https://colab.research.google.com/github/ryuqae/EntityRelation/blob/main/Preprocessing%20and%20Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Final Project: 2021년 국립국어원 인공지능 언어능력 평가

- [2021년 국립국어원 인공지능 언어능력 평가](https://corpus.korean.go.kr/task/taskList.do?taskId=1&clCd=END_TASK&subMenuId=sub01) 는 9월 1일부터 시작하여 11월 1일까지 마감된 [네 가지 과제에](https://corpus.korean.go.kr/task/taskDownload.do?taskId=1&clCd=END_TASK&subMenuId=sub02) 대한 언어능력 평가 대회
- 여기서 제시된 과제를 그대로 수행하여 그 결과를 [최종 선정된 결과들](https://corpus.korean.go.kr/task/taskLeaderBoard.do?taskId=4&clCd=END_TASK&subMenuId=sub04)과 비교할 수 있도록 수행
- 아직 테스트 셋의 정답이 공식적으로 공개되고 있지 않아, 네 가지 과제의 자료에서 evaluation dataset으로 가지고 성능을 비교할 계획
- 기말 발표전까지 정답셋이 공개될 경우 이 정답셋을 가지고 성능 검증
- Transformers 기반 방법론, 신경망 등 각자 생각한 방법대로 구현 가능
- 현재 대회기간이 종료되어 자료가 다운로드 가능하지 않으니 첨부된 자료 참조
- 개인적으로 하거나 최대 두명까지 그룹 허용. 
- 이 노트북 화일에 이름을 변경하여 작업하고 제출. 제출시 화일명을 FinalProject_[DS또는 CL]_학과_이름.ipynb
- 마감 12월 6일(월) 23:59분까지.
- 12월 7일, 9일 기말 발표 presentation 예정

## 리더보드

- 최종발표전까지 각조는 각 태스크별 실행성능을 **시도된 여러 방법의 결과들을 지속적으로**  [리더보드](https://docs.google.com/spreadsheets/d/1-uenfp5GolpY2Gf0TsFbODvj585IIiFKp9fvYxcfgkY/edit#gid=0)에 해당 팀명(구성원 이름 포함)을 입력하여 공개하여야 함. 
- 최종 마감일에 이 순위와 실제 제출한 프로그램의 수행 결과를 비교하여 성능을 확인

In [None]:
!pip install transformers

from functools import partial
from tqdm.notebook import trange, tqdm

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from transformers import AutoTokenizer, RobertaPreTrainedModel, AutoConfig, RobertaModel
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
from transformers import AdamW, get_linear_schedule_with_warmup, AutoTokenizer
import torch.nn.functional as F

import torchtext
from torchtext.legacy.data import Field, TabularDataset, BucketIterator

import os
import numpy as np
import random
import math
import time
import re
import logging

print("\n-- Packages Version --")
# print(f"{'NLTK':<10} | {nltk.__version__:>7}")
print(f"{'PyTorch':<10} | {torch.__version__:>7}")
print(f"{'TorchText':<10} | {torchtext.__version__:>7}")

print("\n-- CUDA Availability --")
# Set and check available cuda device

GPU_NUM = 0
device = torch.device(f'cuda:{GPU_NUM}' if torch.cuda.is_available() else 'cpu')
torch.cuda.set_device(device)

if device.type=='cuda':
    print(f"{torch.cuda.get_device_properties(device).name}\nDevice Num: {torch.cuda.current_device()}")
else:
    print('Warning: cuda is not available')
    
    
def init_logger():
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )

def set_seed(SEED=2934):
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


-- Packages Version --
PyTorch    | 1.10.0+cu111
TorchText  |  0.11.0

-- CUDA Availability --
Tesla P100-PCIE-16GB
Device Num: 0


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!nvidia-smi

Mon Dec  6 12:58:16 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    28W / 250W |      2MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
from transformers import AutoModel, AutoTokenizer, RobertaPreTrainedModel, AutoConfig, RobertaModel

ADDITIONAL_SPECIAL_TOKENS = ["<e1>", "</e1>", "<e2>", "</e2>"]
tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large", return_token_type_ids=False)
tokenizer.add_special_tokens({"additional_special_tokens": ADDITIONAL_SPECIAL_TOKENS})

4

In [None]:
from glob import glob

data_dir = glob('*.tsv')

In [None]:
import pandas as pd

def load_data(dataset_dir, mode = 'train'):
    dataset = pd.read_csv(dataset_dir, delimiter='\t')
    if mode == 'test':
        dataset["ANSWER"] = [0] * len(dataset)
    dataset["ANSWER"] = dataset["ANSWER"].astype(int)
    return dataset

In [None]:
set_seed()

In [None]:
class HomonymDataset(torch.utils.data.Dataset):
    def __init__(self, tokenized_dataset, labels):
        self.tokenized_dataset = tokenized_dataset
        self.labels = labels
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.tokenized_dataset.items()}
        item['labels'] = torch.tensor(self.labels[idx], dtype=torch.long)
        return item

    def __len__(self):
        return len(self.labels)

In [None]:
def convert_sentence_to_features(dataset, tokenizer, max_len):
    # from distutils.util import strtobool

    max_seq_len=max_len
    pad_token=tokenizer.pad_token_id
    add_sep_token=False
    mask_padding_with_zero=True
    
    all_input_ids = []
    all_attention_mask = []
    all_e1_mask=[]
    all_e2_mask=[]
    all_label=[]
    m_len=0
        
    err = 0

    for idx in tqdm(range(len(dataset))):
        sentence = '[CLS]' + dataset['SENTENCE1'][idx][:dataset['start_s1'][idx]] \
        + '<e1>' + dataset['SENTENCE1'][idx][dataset['start_s1'][idx]:dataset['end_s1'][idx]] \
        + '</e1>' + dataset['SENTENCE1'][idx][dataset['end_s1'][idx]:] + '[SEP]' + dataset['SENTENCE2'][idx][:dataset['start_s2'][idx]] \
        + ' <e2> ' + dataset['SENTENCE2'][idx][dataset['start_s2'][idx]:dataset['end_s2'][idx]] \
        + ' </e2> ' + dataset['SENTENCE2'][idx][dataset['end_s2'][idx]:] + '[SEP]'
        
        token = tokenizer.tokenize(sentence)
        m_len = max(m_len, len(token))
        e11_p = token.index("<e1>")  # the start position of entity1
        e12_p = token.index("</e1>")  # the end position of entity1
        e21_p = token.index("<e2>")  # the start position of entity2
        e22_p = token.index("</e2>")  # the end position of entity2

        token[e11_p] = "$"
        token[e12_p] = "$"
        token[e21_p] = "#"
        token[e22_p] = "#"

        special_tokens_count = 1

        # masks for entity
        if len(token) < max_seq_len - special_tokens_count:
            input_ids = tokenizer.convert_tokens_to_ids(token)
            attention_mask = [1 if mask_padding_with_zero else 0] * len(input_ids)

            padding_length = max_seq_len - len(input_ids)
            input_ids = input_ids + ([pad_token] * padding_length)
            attention_mask = attention_mask + ([0 if mask_padding_with_zero else 1] * padding_length)

            e1_mask = [0] * len(attention_mask)
            e2_mask = [0] * len(attention_mask)

            for i in range(e11_p, e12_p + 1):
                e1_mask[i] = 1
            
            for j in range(e21_p, e22_p + 1):
                e2_mask[j] = 1

            assert len(input_ids) == max_seq_len, "Error with input length {} vs {}".format(len(input_ids), max_seq_len)
            assert len(attention_mask) == max_seq_len, "Error with attention mask length {} vs {}".format(
                len(attention_mask), max_seq_len
            )

            all_input_ids.append(input_ids)
            all_attention_mask.append(attention_mask)
            all_e1_mask.append(e1_mask)
            all_e2_mask.append(e2_mask)
            all_label.append(dataset['ANSWER'][idx])

    all_features = {
        'input_ids' : torch.tensor(all_input_ids),
        'attention_mask' : torch.tensor(all_attention_mask),
        'e1_mask' : torch.tensor(all_e1_mask),
        'e2_mask' : torch.tensor(all_e2_mask)
    }

    return HomonymDataset(all_features, all_label)


In [None]:
train_raw = load_data(data_dir[0])

# Model

In [None]:
class FCLayer(nn.Module):
    def __init__(self, input_dim, output_dim, dropout=0.0, use_activation=True):
        super().__init__()
        self.use_activation = use_activation
        self.input_dim = input_dim
        self.output_dim = output_dim
        
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(self.input_dim, self.output_dim)
        self.tanh = nn.Tanh()
        
    def forward(self, x):
        x = self.dropout(x)
        if self.use_activation:
            x = self.tanh(x)
        x = self.fc1(x)
        return x

In [None]:
class HomonymNet(RobertaPreTrainedModel):
    def __init__(self, checkpoint, config, dropout_rate):
        super().__init__(config)
        self.model = AutoModel.from_pretrained(checkpoint, config=config)
        self.num_labels = config.num_labels

        self.cls_fclayer = FCLayer(config.hidden_size, config.hidden_size, dropout_rate)
        self.e1_fclayer = FCLayer(config.hidden_size, config.hidden_size, dropout_rate)
        self.e2_fclayer = FCLayer(config.hidden_size, config.hidden_size, dropout_rate)
        self.label_classifier = FCLayer(
            config.hidden_size * 3, # concat cls, e1, e2 output
            config.num_labels,
            dropout_rate,
            use_activation=False,
        )
    
    @staticmethod
    def entity_hidden_average(hidden_output, entity_mask):

        unsq_entity_mask = entity_mask.unsqueeze(1)
        length_tensor = (entity_mask != 0).sum(dim=1).unsqueeze(1)

        entity_sum = torch.bmm(unsq_entity_mask.float(), hidden_output).squeeze(1)
        entity_average = entity_sum.float() / length_tensor.float()  # broadcasting

        return entity_average


    def forward(self, input_ids, attention_mask, e1_mask, e2_mask, labels):
        outputs = self.model(input_ids, attention_mask=attention_mask)
        sequence_output = outputs[0]

        sentence_representation = self.cls_fclayer(outputs.pooler_output)

        e1_hidden = self.entity_hidden_average(sequence_output, e1_mask)
        e2_hidden = self.entity_hidden_average(sequence_output, e2_mask)

        e1_hidden = self.e1_fclayer(e1_hidden)
        e2_hidden = self.e2_fclayer(e2_hidden)

        concat_hidden = torch.cat([sentence_representation, e1_hidden, e2_hidden], dim=-1)
        logits = self.label_classifier(concat_hidden)
        outputs = (logits,) + outputs[2:]

        loss_func = nn.CrossEntropyLoss()
        loss = loss_func(logits.view(-1, self.num_labels), labels.view(-1))
        outputs = (loss,) + outputs

        return outputs

In [None]:
# test = convert_sentence_to_features(train_raw[10:13], tokenizer, 100)

# Train

In [None]:
num_train_epochs = 10
adam_learning_rate = 1e-5
adam_epsilon = 1e-8
weight_decay = 1e-2

gradient_accumulation_steps = 2

max_len = 200
batch_size = 16
eval_batch_size = 16

In [None]:
def compute_metrics(preds, labels):
    assert len(preds) == len(labels)
    return acc_and_f1(preds, labels)

def simple_accuracy(preds, labels):
    return (preds == labels).mean()

def acc_and_f1(preds, labels, average="macro"):
    acc = simple_accuracy(preds, labels)
    return {
        "acc": acc,
    }

def init_logger():
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )

In [None]:
config = AutoConfig.from_pretrained(
    "klue/roberta-large",
    num_labels=2
)

model = HomonymNet(
    "klue/roberta-large",
    config=config,
    dropout_rate=0.1
)

model.to(device)

# Prepare optimizer and schedule (linear warmup and decay)
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
        "weight_decay": weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = AdamW(
    optimizer_grouped_parameters,
    lr=adam_learning_rate,
    eps=adam_epsilon,
)


# train_dataloader = DataLoader(train_dataset, batch_size=16)
# total_steps = len(train_dataloader) * num_train_epochs
# scheduler = get_linear_schedule_with_warmup(optimizer, 
#                                             num_warmup_steps = 0,
#                                             num_training_steps = total_steps)

Some weights of the model checkpoint at klue/roberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.bias', 'lm_head.decoder.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at klue/roberta-large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it f

In [None]:
model.zero_grad()

In [None]:
test_raw = load_data("/content/NIKL_SKT_WiC_Dev.tsv")
test_dataset = convert_sentence_to_features(test_raw, tokenizer, max_len=max_len)

base_dir = '/content/drive/MyDrive/NLP_FinalProject/'

def save_checkpoint(model, model_dir):
    if not os.path.exists(model_dir):
        os.makedirs(model_dir)

    model_to_save = model.module if hasattr(model, 'module') else model
    model_to_save.save_pretrained(model_dir)


def evaluate(model, fold_id, best_score, mode='valid'):
    if mode=='valid':
        test_dataloader = DataLoader(valid_dataset, batch_size=eval_batch_size)
    elif mode=='test':
        test_dataloader = DataLoader(test_dataset, batch_size=eval_batch_size)

    predict = None
    eval_loss = 0.0
    nb_eval_steps = 0

    model.eval()

    for batch in tqdm(test_dataloader, desc="Evaluating"):
        batch = tuple(batch[t].to(device) for t in batch)
        with torch.no_grad():
            inputs = {
                "input_ids":batch[0],
                "attention_mask":batch[1],
                "e1_mask":batch[2],
                "e2_mask":batch[3],
                "labels":batch[4]
            }
            outputs = model(**inputs)
            tmp_eval_loss, logits = outputs[:2]
            eval_loss += tmp_eval_loss.mean().item()
        nb_eval_steps+=1

        if predict is None:
            predict = logits.detach().cpu().numpy()
            out_label_ids = inputs["labels"].detach().cpu().numpy()
        else:
            predict = np.append(predict, logits.detach().cpu().numpy(), axis=0)
            out_label_ids = np.append(out_label_ids, inputs["labels"].detach().cpu().numpy(), axis=0)
        # print(list(zip(predict_label, out_label_ids)))


    predict_label = np.argmax(predict, axis=1)
    result = compute_metrics(predict_label, out_label_ids)

    eval_loss = eval_loss / nb_eval_steps
    results = {"loss": eval_loss}
    results.update(result)

    if mode =='valid':
        if result['acc'] > best_score:
            save_checkpoint(model, f"{base_dir}model_fold_{fold_id}")
            best_score = result['acc']
            print(f"Saved new best model - acc : {best_score}")

    return results, best_score

  0%|          | 0/1166 [00:00<?, ?it/s]

In [None]:
from sklearn.model_selection import KFold
kfold = KFold(n_splits=5, random_state=2934, shuffle=True)

best_score=0

for fold_id, (train_ids, valid_ids) in tqdm(enumerate(kfold.split(train_raw))):

    train_dataset = convert_sentence_to_features(train_raw.iloc[train_ids].reset_index(), tokenizer, max_len=max_len)
    valid_dataset = convert_sentence_to_features(train_raw.iloc[valid_ids].reset_index(), tokenizer, max_len=max_len)

    train_dataloader = DataLoader(train_dataset, batch_size=batch_size)
    total_steps = len(train_dataloader) * num_train_epochs
    scheduler = get_linear_schedule_with_warmup(optimizer, 
                                                num_warmup_steps = 0,
                                                num_training_steps = total_steps)

    print(f"FOLD {fold_id+1} : Split train dataset to train vs valid")

    train_loss = 0.0
    fold_best_score = 0

    for epoch_step in tqdm(range(num_train_epochs), desc="Epoch"):
        for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration")):
            model.train()
            batch = tuple(batch[t].to(device) for t in batch)
            inputs = {
                "input_ids":batch[0],
                "attention_mask":batch[1],
                "e1_mask":batch[2],
                "e2_mask":batch[3],
                "labels":batch[4]
            }

            outputs = model(**inputs)
            loss = outputs[0]
            loss.backward()

            train_loss += loss.item()

            optimizer.step()
            scheduler.step()
            model.zero_grad()

        valid_loss, fold_best_score = evaluate(model, fold_id+1, fold_best_score, 'valid')
        test_loss, _ = evaluate(model, fold_id+1, fold_best_score, 'test')
            
        print(f"============================= Epoch #{epoch_step+1} =============================")
        print(f" - train: {train_loss}")
        print(f" - valid: {valid_loss}")
        print(f" - test : {test_loss}")
    
    print(f"FOLD {fold_id+1} Best Validation Score : {fold_best_score}")

0it [00:00, ?it/s]

  0%|          | 0/6198 [00:00<?, ?it/s]

  0%|          | 0/1550 [00:00<?, ?it/s]

FOLD 1 : Split train dataset to train vs valid


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

  import sys


Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.867741935483871


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 175.21946163475513
 - valid: {'loss': 0.31914254404681247, 'acc': 0.867741935483871}
 - test : {'loss': 0.36808864792732343, 'acc': 0.8361921097770154}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 248.15106100030243
 - valid: {'loss': 0.3816377884641136, 'acc': 0.8574193548387097}
 - test : {'loss': 0.4302674982543678, 'acc': 0.8336192109777015}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.8832258064516129


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 283.81133846961893
 - valid: {'loss': 0.3488727323326868, 'acc': 0.8832258064516129}
 - test : {'loss': 0.37221019574096553, 'acc': 0.8739279588336192}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.8941935483870967


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 299.3879340351559
 - valid: {'loss': 0.40615314065560354, 'acc': 0.8941935483870967}
 - test : {'loss': 0.43369562131646155, 'acc': 0.8782161234991424}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 308.37615987303434
 - valid: {'loss': 0.43474665818467123, 'acc': 0.8935483870967742}
 - test : {'loss': 0.3894203013954971, 'acc': 0.9030874785591767}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.896774193548387


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 316.3015630117152
 - valid: {'loss': 0.5324585501245012, 'acc': 0.896774193548387}
 - test : {'loss': 0.6379061541248955, 'acc': 0.8653516295025729}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.8987096774193548


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 320.7822742186836
 - valid: {'loss': 0.505437695896541, 'acc': 0.8987096774193548}
 - test : {'loss': 0.5206475736065417, 'acc': 0.8867924528301887}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 323.51892579710693
 - valid: {'loss': 0.5388643840962742, 'acc': 0.8980645161290323}
 - test : {'loss': 0.5865686733187144, 'acc': 0.8833619210977701}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9045161290322581


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 324.6999657414999
 - valid: {'loss': 0.5453895315931169, 'acc': 0.9045161290322581}
 - test : {'loss': 0.5584979182328911, 'acc': 0.8893653516295026}


Iteration:   0%|          | 0/387 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 326.06763824200607
 - valid: {'loss': 0.5214242364948114, 'acc': 0.9045161290322581}
 - test : {'loss': 0.5232703040331229, 'acc': 0.902229845626072}
FOLD 1 Best Validation Score : 0.9045161290322581


  0%|          | 0/6198 [00:00<?, ?it/s]

  0%|          | 0/1550 [00:00<?, ?it/s]

FOLD 2 : Split train dataset to train vs valid


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9870801033591732


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 53.86945734539768
 - valid: {'loss': 0.0463369293618448, 'acc': 0.9870801033591732}
 - test : {'loss': 0.2958194660341801, 'acc': 0.8893653516295026}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 74.8571943702409
 - valid: {'loss': 0.048732020168507605, 'acc': 0.9819121447028424}
 - test : {'loss': 0.3677190268055013, 'acc': 0.8936535162950258}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 85.26454370561987
 - valid: {'loss': 0.04102078299926229, 'acc': 0.9857881136950905}
 - test : {'loss': 0.3686970810295076, 'acc': 0.9030874785591767}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 90.40160452730197
 - valid: {'loss': 0.07665746079964236, 'acc': 0.9722222222222222}
 - test : {'loss': 0.6237288067059126, 'acc': 0.8653516295025729}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 92.87033745017106
 - valid: {'loss': 0.08827089782744693, 'acc': 0.979328165374677}
 - test : {'loss': 0.7379534253983141, 'acc': 0.8807890222984562}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9903100775193798


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 94.59809174108523
 - valid: {'loss': 0.03554285421349275, 'acc': 0.9903100775193798}
 - test : {'loss': 0.5202185789104539, 'acc': 0.9125214408233276}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9928940568475452


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 95.50805905049856
 - valid: {'loss': 0.032596244213638464, 'acc': 0.9928940568475452}
 - test : {'loss': 0.5664403973812974, 'acc': 0.9108061749571184}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9941860465116279


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 97.34151098701477
 - valid: {'loss': 0.02663809347576876, 'acc': 0.9941860465116279}
 - test : {'loss': 0.4553027201220848, 'acc': 0.9176672384219554}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 97.98967271219226
 - valid: {'loss': 0.026419475138064842, 'acc': 0.9928940568475452}
 - test : {'loss': 0.49951921217258155, 'acc': 0.9159519725557461}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 98.44368171421957
 - valid: {'loss': 0.02610808132306271, 'acc': 0.9941860465116279}
 - test : {'loss': 0.5155281654681372, 'acc': 0.9150943396226415}
FOLD 2 Best Validation Score : 0.9941860465116279


  0%|          | 0/6198 [00:00<?, ?it/s]

  0%|          | 0/1550 [00:00<?, ?it/s]

FOLD 3 : Split train dataset to train vs valid


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9948353776630084


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 23.03365597523225
 - valid: {'loss': 0.0201018752422803, 'acc': 0.9948353776630084}
 - test : {'loss': 0.4272961227194572, 'acc': 0.8962264150943396}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9987088444157521


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 35.03341858070053
 - valid: {'loss': 0.007050957784720351, 'acc': 0.9987088444157521}
 - test : {'loss': 0.3787213703974274, 'acc': 0.9116638078902229}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 38.840546692328644
 - valid: {'loss': 0.009196584025930767, 'acc': 0.9961265332472563}
 - test : {'loss': 0.49387541103500904, 'acc': 0.9005145797598628}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 41.905452725586656
 - valid: {'loss': 0.010036989585940544, 'acc': 0.9967721110393802}
 - test : {'loss': 0.531471793046253, 'acc': 0.902229845626072}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 43.67794245341793
 - valid: {'loss': 0.012746295056054693, 'acc': 0.9961265332472563}
 - test : {'loss': 0.5771112779000087, 'acc': 0.9013722126929674}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 46.038247143194894
 - valid: {'loss': 0.008385462366877431, 'acc': 0.9980632666236281}
 - test : {'loss': 0.5183175773513228, 'acc': 0.9176672384219554}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 47.90858751388805
 - valid: {'loss': 0.012190682144962208, 'acc': 0.9961265332472563}
 - test : {'loss': 0.49956698048157555, 'acc': 0.9090909090909091}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 48.13246570282172
 - valid: {'loss': 0.012393204618608262, 'acc': 0.9954809554551324}
 - test : {'loss': 0.5632018422841374, 'acc': 0.9056603773584906}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 48.33737765034493
 - valid: {'loss': 0.009350194090768933, 'acc': 0.9967721110393802}
 - test : {'loss': 0.5430890839094514, 'acc': 0.9133790737564322}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 48.69416635111884
 - valid: {'loss': 0.007574125187883763, 'acc': 0.9974176888315042}
 - test : {'loss': 0.5367746263214591, 'acc': 0.91852487135506}
FOLD 3 Best Validation Score : 0.9987088444157521


  0%|          | 0/6199 [00:00<?, ?it/s]

  0%|          | 0/1549 [00:00<?, ?it/s]

FOLD 4 : Split train dataset to train vs valid


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9948320413436692


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 14.340447859820415
 - valid: {'loss': 0.012649543001316488, 'acc': 0.9948320413436692}
 - test : {'loss': 0.4612082270698061, 'acc': 0.8816466552315609}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 21.138514818474505
 - valid: {'loss': 0.012324593243244364, 'acc': 0.9941860465116279}
 - test : {'loss': 0.5887299491561413, 'acc': 0.8756432246998285}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.998062015503876


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 25.422293943589466
 - valid: {'loss': 0.00898792289722364, 'acc': 0.998062015503876}
 - test : {'loss': 0.5502358857679424, 'acc': 0.8919382504288165}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9993540051679587


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 30.293977473254927
 - valid: {'loss': 0.004288544263582696, 'acc': 0.9993540051679587}
 - test : {'loss': 0.4887046898529412, 'acc': 0.8945111492281304}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 32.20459563817349
 - valid: {'loss': 0.012885348679475035, 'acc': 0.9961240310077519}
 - test : {'loss': 0.7020425592440106, 'acc': 0.8782161234991424}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 33.49730655534586
 - valid: {'loss': 0.00584012367730371, 'acc': 0.9993540051679587}
 - test : {'loss': 0.5332156104620842, 'acc': 0.902229845626072}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 33.93697104850071
 - valid: {'loss': 0.007001000319150146, 'acc': 0.9987080103359173}
 - test : {'loss': 0.5461816046478305, 'acc': 0.9065180102915952}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 34.4138125157624
 - valid: {'loss': 0.0056400260636832004, 'acc': 0.9987080103359173}
 - test : {'loss': 0.5548668404440483, 'acc': 0.9159519725557461}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 34.561080071773176
 - valid: {'loss': 0.004255652458727427, 'acc': 0.9993540051679587}
 - test : {'loss': 0.600477618486478, 'acc': 0.9073756432246999}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 34.750978058926194
 - valid: {'loss': 0.003963587763829525, 'acc': 0.9993540051679587}
 - test : {'loss': 0.6334551879094076, 'acc': 0.9048027444253859}
FOLD 4 Best Validation Score : 0.9993540051679587


  0%|          | 0/6199 [00:00<?, ?it/s]

  0%|          | 0/1549 [00:00<?, ?it/s]

FOLD 5 : Split train dataset to train vs valid


Epoch:   0%|          | 0/10 [00:00<?, ?it/s]

Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9967679379444085


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 13.872146839308698
 - valid: {'loss': 0.013408834579731962, 'acc': 0.9967679379444085}
 - test : {'loss': 0.4438205896214939, 'acc': 0.8919382504288165}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 21.008705467324035
 - valid: {'loss': 0.015807862932728987, 'acc': 0.9961215255332903}
 - test : {'loss': 0.4861819090657351, 'acc': 0.8842195540308748}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 24.24530072401467
 - valid: {'loss': 0.020304530551828102, 'acc': 0.995475113122172}
 - test : {'loss': 0.5977132456720614, 'acc': 0.888507718696398}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9980607627666451


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 25.1652614123268
 - valid: {'loss': 0.012374705481428482, 'acc': 0.9980607627666451}
 - test : {'loss': 0.5601290590964036, 'acc': 0.8987993138936535}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 25.382759531661577
 - valid: {'loss': 0.0181459012459452, 'acc': 0.9974143503555268}
 - test : {'loss': 0.6379242664018309, 'acc': 0.8979416809605489}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 26.445605373701255
 - valid: {'loss': 0.012452246974303824, 'acc': 0.9961215255332903}
 - test : {'loss': 0.6590665788397218, 'acc': 0.8962264150943396}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Saved new best model - acc : 0.9987071751777634


Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 27.375090847719548
 - valid: {'loss': 0.007617001662662549, 'acc': 0.9987071751777634}
 - test : {'loss': 0.5870426107737567, 'acc': 0.9142367066895368}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 27.486990069237436
 - valid: {'loss': 0.008579147422246886, 'acc': 0.9987071751777634}
 - test : {'loss': 0.60213552526495, 'acc': 0.9099485420240138}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 27.8457896581549
 - valid: {'loss': 0.009496904418446105, 'acc': 0.9980607627666451}
 - test : {'loss': 0.5904141553148218, 'acc': 0.9048027444253859}


Iteration:   0%|          | 0/388 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/97 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/73 [00:00<?, ?it/s]

 - train: 27.929937062446697
 - valid: {'loss': 0.009420584432631322, 'acc': 0.9980607627666451}
 - test : {'loss': 0.5948210134378539, 'acc': 0.9039451114922813}
FOLD 5 Best Validation Score : 0.9987071751777634
