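# Final Project: 2021 National Institute of Korean Language AI Language Proficiency Evaluation

- The [2021 National Institute of Korean Language AI Language Proficiency Evaluation](https://corpus.korean.go.kr/task/taskList.do?taskId=1&clCd=END_TASK&subMenuId=sub01) was a language-ability competition on [four tasks](https://corpus.korean.go.kr/task/taskDownload.do?taskId=1&clCd=END_TASK&subMenuId=sub02) that opened on September 1 and closed on November 1.
- Carry out the tasks exactly as posed there, so that your results can be compared with the [final selected results](https://corpus.korean.go.kr/task/taskLeaderBoard.do?taskId=4&clCd=END_TASK&subMenuId=sub04).
- The test-set answers have not yet been officially released, so performance will be compared using the evaluation dataset included in the data for the four tasks.
- If the answer set is released before the final presentations, performance will be verified against it.
- You may implement any approach you choose, such as Transformers-based methods or other neural networks.
- The competition period has ended and the data can no longer be downloaded, so refer to the attached data.
- Work individually or in groups of at most two.
- Rename this notebook file, work in it, and submit it. Name the submitted file FinalProject_[DS or CL]_Department_Name.ipynb.
- Deadline: Monday, December 6, 23:59.
- Final presentations are scheduled for December 7 and 9.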

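## Leaderboard

- Until the final presentation, each team must keep publishing its per-task performance, **continuously adding the results of every method it has tried**, on the [leaderboard](https://docs.google.com/spreadsheets/d/1-uenfp5GolpY2Gf0TsFbODvj585IIiFKp9fvYxcfgkY/edit#gid=0), entered under the team name (including member names).
- On the final deadline, this ranking will be compared with the actual results of running the submitted program to confirm performance.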

# Task 4. Yes/No (Polar) Questions

In [1]:
!pip install git+https://git@github.com/SKTBrain/KoBERT.git@master

Collecting git+https://****@github.com/SKTBrain/KoBERT.git@master
  Cloning https://****@github.com/SKTBrain/KoBERT.git (to revision master) to /tmp/pip-req-build-2nyvz4dz
Collecting gluonnlp>=0.6.0
  Downloading gluonnlp-0.10.0.tar.gz (344 kB)
Collecting mxnet>=1.4.0
  Downloading mxnet-1.8.0.post0-py2.py3-none-manylinux2014_x86_64.whl (46.9 MB)
Collecting graphviz<0.9.0,>=0.8.1
  Downloading graphviz-0.8.4-py2.py3-none-any.whl (16 kB)
Collecting onnxruntime>=0.3.0
  Downloading onnxruntime-1.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
Building wheels for collected packages: kobert, gluonnlp
  Building wheel for kobert (setup.py) ... done
  Created wheel for kobert: filename=kobert-0.1.2-py3-none-any.whl size=13124 sha256=d9bd2d9138f3

In [2]:
!pip install sentencepiece



# Import

In [10]:
import argparse
import re
from tqdm import tqdm
import numpy as np
import torch
from torch import nn
from torch.nn import DataParallel
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler
from transformers import PreTrainedModel,PretrainedConfig
from transformers import BertPreTrainedModel,BertModel, AdamW
from transformers import ElectraModel,ElectraForSequenceClassification, ElectraTokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model & I/O

In [11]:
# KoELECTRA
class QA_Model(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.max_length = config.max_length
        self.hidden_size = config.hidden_size
        # The wrapped model already contains its own classification head.
        self.bert = ElectraForSequenceClassification.from_pretrained("monologg/koelectra-base-v3-discriminator")
        # NOTE: dense, dropout, activation and classifier are defined but never used in forward().
        self.dense = nn.Linear(config.hidden_size,config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.activation = nn.Tanh()

        self.classifier = nn.Linear(config.hidden_size, config.num_labels) # output: (batch, num_labels); 2 labels (true or false)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        bert_outputs = self.bert(input_ids,
                                 attention_mask=attention_mask,token_type_ids=token_type_ids)
        # No labels are passed, so index 0 holds the logits, shape (batch, num_labels).
        return bert_outputs[0]

# KLUE RoBERTa
class QA_Model1(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.max_length = config.max_length
        self.hidden_size = config.hidden_size
        # The wrapped model already contains its own classification head.
        self.bert = AutoModelForSequenceClassification.from_pretrained("klue/roberta-large")
        # NOTE: dense, dropout, activation and classifier are defined but never used in forward().
        self.dense = nn.Linear(config.hidden_size,config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.activation = nn.Tanh()

        self.classifier = nn.Linear(config.hidden_size, config.num_labels) # output: (batch, num_labels); 2 labels (true or false)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        bert_outputs = self.bert(input_ids,
                                 attention_mask=attention_mask,token_type_ids=token_type_ids)
        # No labels are passed, so index 0 holds the logits, shape (batch, num_labels).
        return bert_outputs[0]
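
Both wrappers return `bert_outputs[0]`, which for a Hugging Face sequence-classification model called without labels is the logits tensor of shape (batch, num_labels). As a minimal, library-free sketch of how one row of such logits maps to a True/False prediction (the helper names `softmax` and `predict_label` and the example logits are hypothetical, not taken from the notebook):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return (label, probability); label 1 means True, 0 means False."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]

# Hypothetical logits for a single (context, question) pair.
label, prob = predict_label([-1.2, 0.8])
print(label, round(prob, 3))
```

The notebook's `accuracy` method only needs the argmax step; the softmax is shown because `nn.CrossEntropyLoss` applies it internally to the same logits.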

In [12]:
# KoELECTRA
class QA_trainer():
    def __init__(self,num_labels,hidden_size,max_length,hidden_dropout_prob,batch_size,epoch,model_ver,learning_rate,weight_decay):
        self.hidden_size = hidden_size
        self.max_length = max_length
        self.hidden_dropout_prob = hidden_dropout_prob
        self.num_labels = num_labels
        self.batch_size = batch_size
        self.epoch = epoch
        self.model_ver = model_ver
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay
        self.model = QA_Model.from_pretrained(self.model_ver,num_labels=self.num_labels,hidden_size=self.hidden_size,hidden_dropout_prob=self.hidden_dropout_prob,max_length=self.max_length)
        self.tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v3-discriminator")
        self.sep_token = self.tokenizer.sep_token
    def make_dataset(self):
        '''
            data =
            [
            sentence_id : str
            Context : str
            Question : str
            ( True : 1 or False : 0 ) : int 
            ]
        '''
        train_data = []
        with open('./data/task4/SKT_BoolQ_Train.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                last_data = int(re.sub('\n','',info.split('\t')[-1]))
                pre_data.append(last_data)
                train_data.append(pre_data)
        val_data = []
        with open('./data/task4/SKT_BoolQ_Dev.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                last_data = int(re.sub('\n','',info.split('\t')[-1]))
                pre_data.append(last_data)
                val_data.append(pre_data)
        test_data = []
        with open('./data/task4/SKT_BoolQ_Test.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                test_data.append(pre_data)
        #train_data
        train_data_context = [data[1] for data in train_data]
        train_data_question = [data[2] for data in train_data]
        train_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([]),'answer' : torch.tensor([])}
        for data in range(len(train_data_context)):
            train_data_tokenized = self.tokenizer.encode_plus(train_data_context[data],train_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = train_data_tokenized['input_ids']
            truncated_attention_masks = train_data_tokenized['attention_mask']
            truncated_token_type_ids = train_data_tokenized['token_type_ids']
            train_result['input_ids'] = torch.cat([train_result['input_ids'], truncated_input_ids], dim = 0) 
            train_result['attention_mask'] = torch.cat([train_result['attention_mask'], truncated_attention_masks], dim = 0)
            train_result['token_type_ids'] = torch.cat([train_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        train_result['input_ids'] = train_result['input_ids'].long()
        train_result['attention_mask'] = train_result['attention_mask'].long()
        train_result['token_type_ids'] = train_result['token_type_ids'].long()
        train_data_answer = torch.tensor([data[3] for data in train_data])
        train_result['answer'] = train_data_answer

        #val_data
        val_data_context = [data[1] for data in val_data]
        val_data_question = [data[2] for data in val_data]
        val_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([]),'answer' : torch.tensor([])}
        for data in range(len(val_data_context)):
            val_data_tokenized = self.tokenizer.encode_plus(val_data_context[data],val_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = val_data_tokenized['input_ids']
            truncated_attention_masks = val_data_tokenized['attention_mask']
            truncated_token_type_ids = val_data_tokenized['token_type_ids']
            val_result['input_ids'] = torch.cat([val_result['input_ids'], truncated_input_ids], dim = 0) 
            val_result['attention_mask'] = torch.cat([val_result['attention_mask'], truncated_attention_masks], dim = 0)
            val_result['token_type_ids'] = torch.cat([val_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        val_result['input_ids'] = val_result['input_ids'].long()
        val_result['attention_mask'] = val_result['attention_mask'].long()
        val_result['token_type_ids'] = val_result['token_type_ids'].long()
        val_data_answer = torch.tensor([data[3] for data in val_data])
        val_result['answer'] = val_data_answer

        #test_data
        test_data_context = [data[1] for data in test_data]
        test_data_question = [data[2] for data in test_data]
        test_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([])}
        for data in range(len(test_data_context)):
            test_data_tokenized = self.tokenizer.encode_plus(test_data_context[data],test_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = test_data_tokenized['input_ids']
            truncated_attention_masks = test_data_tokenized['attention_mask']
            truncated_token_type_ids = test_data_tokenized['token_type_ids']
            test_result['input_ids'] = torch.cat([test_result['input_ids'], truncated_input_ids], dim = 0) 
            test_result['attention_mask'] = torch.cat([test_result['attention_mask'], truncated_attention_masks], dim = 0)
            test_result['token_type_ids'] = torch.cat([test_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        test_result['input_ids'] = test_result['input_ids'].long()
        test_result['attention_mask'] = test_result['attention_mask'].long()
        test_result['token_type_ids'] = test_result['token_type_ids'].long()

        train_dataset = TensorDataset(train_result["input_ids"], train_result["attention_mask"],train_result['token_type_ids'],train_result['answer'])
        val_dataset = TensorDataset(val_result["input_ids"], val_result["attention_mask"],val_result['token_type_ids'],val_result['answer'])
        test_dataset = TensorDataset(test_result["input_ids"], test_result["attention_mask"],test_result['token_type_ids'])

        train_data_loader = DataLoader(train_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)
        val_data_loader = DataLoader(val_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)
        test_data_loader = DataLoader(test_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)

        return train_data_loader, val_data_loader, test_data_loader

    def accuracy(self,predict,label):
        predict_answer = predict.argmax(dim=-1)
        correct = predict_answer.eq(label.view_as(predict_answer)).sum()
        return correct.float() / label.shape[0]

    def QA_evaluate(self,model,device,loss_func,data):
        model.eval()
        epoch_losses = []
        epoch_accs = []
        with torch.no_grad():
            for batch in data:
            # for batch in tqdm(data,desc='dev_batch'):
                input_id = batch[0].to(device)
                attention_mask = batch[1].to(device)
                token_type_ids = batch[2].to(device)
                answer = batch[3].to(device)
                predictions = model(input_ids=input_id, attention_mask = attention_mask,token_type_ids=token_type_ids)
                loss = loss_func(predictions,answer)
                acc = self.accuracy(predictions, answer)
                epoch_losses.append(loss.item())
                epoch_accs.append(acc.item())
        return epoch_losses, epoch_accs

    def QA_model_train(self,model,device,optimizer,loss_func,train,print_epoch): # validation added
        epoch_loss = 0
        epoch_acc = 0
        model.train()
        for batch in train:
        # for batch in tqdm(train,desc='train_batch'):
            optimizer.zero_grad()
            input_id = batch[0].to(device)
            attention_mask = batch[1].to(device)
            token_type_ids = batch[2].to(device)
            answer = batch[3].to(device)
            predictions = model(input_ids=input_id, attention_mask = attention_mask,token_type_ids=token_type_ids)
            loss = loss_func(predictions,answer)
            acc = self.accuracy(predictions, answer)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        train_loss = epoch_loss / len(train)
        train_acc = epoch_acc / len(train)
        # print(f'Epoch: {print_epoch+1:02}')
        # print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
        return train_loss,train_acc

    def QA_Train(self,train,val):
        epochs = self.epoch
        train_losses = []
        train_accs = []
        valid_losses = []
        valid_accs = []
        best_valid_acc = float('-inf')
        model = self.model
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = DataParallel(model)
        model = model.to(device)
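        learning_rate = self.learning_rate
        weight_decay = self.weight_decay
        # NOTE: `weight_decay` is passed as AdamW's `eps` (numerical-stability term), not as a
        # weight-decay coefficient, so the logged run used the optimizer's default weight decay.
        optimizer = AdamW(model.parameters(), lr=learning_rate, eps=weight_decay)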
        loss_func = nn.CrossEntropyLoss()
        for epoch in tqdm(range(epochs)):

            train_loss, train_acc = self.QA_model_train(model,device,optimizer,loss_func,train,epoch)
            valid_loss, valid_acc = self.QA_evaluate(model,device,loss_func,val)

            train_losses.append(train_loss)
            train_accs.append(train_acc)
            valid_losses.extend(valid_loss)
            valid_accs.extend(valid_acc)
    
            epoch_valid_loss = np.mean(valid_loss)
            epoch_valid_acc = np.mean(valid_acc)
            
            if epoch_valid_acc > best_valid_acc:
                best_valid_acc = epoch_valid_acc
                torch.save(model.state_dict(), 'QA_Best_Model.pt')
            print('\n')
            print(f'epoch: {epoch+1}')
            print(f'train_loss: {train_loss:.3f}, train_acc: {train_acc:.3f}')
            print(f'valid_loss: {epoch_valid_loss:.3f}, valid_acc: {epoch_valid_acc:.3f}')

    def QA_Test(self,eval):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        loss_func = nn.CrossEntropyLoss()
        new_model = QA_Model.from_pretrained(self.model_ver,num_labels=self.num_labels,hidden_size=self.hidden_size,hidden_dropout_prob=self.hidden_dropout_prob,max_length=self.max_length)
        loaded_state_dict = torch.load('QA_Best_Model.pt',map_location=torch.device('cpu'))
        remove_module_state_dict = {}
        for key in loaded_state_dict.keys():
            # Strip only the leading 'module.' prefix added by DataParallel.
            key_remove_module = re.sub(r'^module\.','',key)
            value = loaded_state_dict[key]
            remove_module_state_dict[key_remove_module] = value
        new_model.load_state_dict(remove_module_state_dict)
        new_model = new_model.to(device)
        eval_loss, eval_acc = self.QA_evaluate(new_model,device,loss_func,eval)
        epoch_test_loss = np.mean(eval_loss)
        epoch_test_acc = np.mean(eval_acc)

        print(f'test_loss: {epoch_test_loss:.3f}, test_acc: {epoch_test_acc:.3f}')
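
`make_dataset` parses each TSV by hand: it splits every line on tabs, strips the newline with `re.sub`, and converts the last column to an integer label. A sketch of the same row layout read with the standard `csv` module follows. The function name `read_boolq_tsv`, the header names, and the in-memory sample rows are made up for illustration, and it assumes that fields contain no embedded tabs.

```python
import csv
import io

def read_boolq_tsv(handle):
    """Return rows as [id, context, question, label], skipping the header line."""
    # QUOTE_NONE keeps quotation marks inside the text from being interpreted.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        # The last column is the label: 1 for True, 0 for False.
        rows.append(fields[:-1] + [int(fields[-1])])
    return rows

# Hypothetical stand-in for a labeled TSV file.
sample = io.StringIO(
    "ID\tText\tQuestion\tAnswer\n"
    "1\tSeoul is the capital of Korea.\tIs Seoul the capital?\t1\n"
    "2\tCats are mammals.\tAre cats reptiles?\t0\n"
)
rows = read_boolq_tsv(sample)
print(rows)
```

With a real file, the handle would come from something like `open(path, encoding="utf-8", newline="")`.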

# KLUE RoBERTa
class QA_trainer1():
    def __init__(self,num_labels,hidden_size,max_length,hidden_dropout_prob,batch_size,epoch,model_ver,learning_rate,weight_decay):
        self.hidden_size = hidden_size
        self.max_length = max_length
        self.hidden_dropout_prob = hidden_dropout_prob
        self.num_labels = num_labels
        self.batch_size = batch_size
        self.epoch = epoch
        self.model_ver = model_ver
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay
        self.model = QA_Model1.from_pretrained(self.model_ver,num_labels=self.num_labels,hidden_size=self.hidden_size,hidden_dropout_prob=self.hidden_dropout_prob,max_length=self.max_length)
        self.tokenizer = AutoTokenizer.from_pretrained("klue/roberta-large")
        self.sep_token = self.tokenizer.sep_token
        
    def make_dataset(self):
        '''
            data =
            [
            sentence_id : str
            Context : str
            Question : str
            ( True : 1 or False : 0 ) : int 
            ]
        '''
        train_data = []
        with open('./data/task4/SKT_BoolQ_Train.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                last_data = int(re.sub('\n','',info.split('\t')[-1]))
                pre_data.append(last_data)
                train_data.append(pre_data)
        val_data = []
        with open('./data/task4/SKT_BoolQ_Dev.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                last_data = int(re.sub('\n','',info.split('\t')[-1]))
                pre_data.append(last_data)
                val_data.append(pre_data)
        test_data = []
        with open('./data/task4/SKT_BoolQ_Test.tsv',encoding='UTF8') as f:
            infos = f.readlines()
            for info in infos[1:]:
                pre_data = info.split('\t')[:-1]
                test_data.append(pre_data)
        #train_data
        train_data_context = [data[1] for data in train_data]
        for n, context in enumerate(train_data_context):
            sen_list = context.split('.')
            new_context = ''
            if len(sen_list)>=2:
                for sen in sen_list[:-2]:
                    new_context = new_context+sen+'.'+' [CLS]'
                new_context = new_context+sen_list[-2]+'.'
                train_data_context[n] = new_context
        train_data_question = [data[2] for data in train_data]
        train_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([]),'answer' : torch.tensor([])}
        for data in range(len(train_data_context)):
            train_data_tokenized = self.tokenizer.encode_plus(train_data_context[data],train_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = train_data_tokenized['input_ids']
            truncated_attention_masks = train_data_tokenized['attention_mask']
            truncated_token_type_ids = train_data_tokenized['token_type_ids']
            train_result['input_ids'] = torch.cat([train_result['input_ids'], truncated_input_ids], dim = 0) 
            train_result['attention_mask'] = torch.cat([train_result['attention_mask'], truncated_attention_masks], dim = 0)
            train_result['token_type_ids'] = torch.cat([train_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        train_result['input_ids'] = train_result['input_ids'].long()
        train_result['attention_mask'] = train_result['attention_mask'].long()
        train_result['token_type_ids'] = train_result['token_type_ids'].long()
        train_data_answer = torch.tensor([data[3] for data in train_data])
        train_result['answer'] = train_data_answer

        #val_data
        val_data_context = [data[1] for data in val_data]
        for n, context in enumerate(val_data_context):
            sen_list = context.split('.')
            new_context = ''
            if len(sen_list)>=2:
                for sen in sen_list[:-2]:
                    new_context = new_context+sen+'.'+' [CLS]'
                new_context = new_context+sen_list[-2]+'.'
                val_data_context[n] = new_context
        val_data_question = [data[2] for data in val_data]
        val_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([]),'answer' : torch.tensor([])}
        for data in range(len(val_data_context)):
            val_data_tokenized = self.tokenizer.encode_plus(val_data_context[data],val_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = val_data_tokenized['input_ids']
            truncated_attention_masks = val_data_tokenized['attention_mask']
            truncated_token_type_ids = val_data_tokenized['token_type_ids']
            val_result['input_ids'] = torch.cat([val_result['input_ids'], truncated_input_ids], dim = 0) 
            val_result['attention_mask'] = torch.cat([val_result['attention_mask'], truncated_attention_masks], dim = 0)
            val_result['token_type_ids'] = torch.cat([val_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        val_result['input_ids'] = val_result['input_ids'].long()
        val_result['attention_mask'] = val_result['attention_mask'].long()
        val_result['token_type_ids'] = val_result['token_type_ids'].long()
        val_data_answer = torch.tensor([data[3] for data in val_data])
        val_result['answer'] = val_data_answer

        #test_data
        test_data_context = [data[1] for data in test_data]
        for n, context in enumerate(test_data_context):
            sen_list = context.split('.')
            new_context = ''
            if len(sen_list)>=2:
                for sen in sen_list[:-2]:
                    new_context = new_context+sen+'.'+' [CLS]'
                new_context = new_context+sen_list[-2]+'.'
                test_data_context[n] = new_context
        test_data_question = [data[2] for data in test_data]
        test_result = {'input_ids' : torch.tensor([]) , 'attention_mask' : torch.tensor([]),'token_type_ids': torch.tensor([])}
        for data in range(len(test_data_context)):
            test_data_tokenized = self.tokenizer.encode_plus(test_data_context[data],test_data_question[data],return_token_type_ids=True,max_length= self.max_length, padding ='max_length', return_attention_mask=True, truncation=True,return_tensors='pt' )
            truncated_input_ids = test_data_tokenized['input_ids']
            truncated_attention_masks = test_data_tokenized['attention_mask']
            truncated_token_type_ids = test_data_tokenized['token_type_ids']
            test_result['input_ids'] = torch.cat([test_result['input_ids'], truncated_input_ids], dim = 0) 
            test_result['attention_mask'] = torch.cat([test_result['attention_mask'], truncated_attention_masks], dim = 0)
            test_result['token_type_ids'] = torch.cat([test_result['token_type_ids'], truncated_token_type_ids], dim = 0)
        test_result['input_ids'] = test_result['input_ids'].long()
        test_result['attention_mask'] = test_result['attention_mask'].long()
        test_result['token_type_ids'] = test_result['token_type_ids'].long()

        train_dataset = TensorDataset(train_result["input_ids"], train_result["attention_mask"],train_result['token_type_ids'],train_result['answer'])
        val_dataset = TensorDataset(val_result["input_ids"], val_result["attention_mask"],val_result['token_type_ids'],val_result['answer'])
        test_dataset = TensorDataset(test_result["input_ids"], test_result["attention_mask"],test_result['token_type_ids'])

        train_data_loader = DataLoader(train_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)
        val_data_loader = DataLoader(val_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)
        test_data_loader = DataLoader(test_dataset, batch_size=self.batch_size, shuffle=False, drop_last=False)

        return train_data_loader, val_data_loader, test_data_loader

    def accuracy(self,predict,label):
        predict_answer = predict.argmax(dim=-1)
        correct = predict_answer.eq(label.view_as(predict_answer)).sum()
        return correct.float() / label.shape[0]

    def QA_evaluate(self,model,device,loss_func,data):
        model.eval()
        epoch_losses = []
        epoch_accs = []
        with torch.no_grad():
            for batch in data:
            # for batch in tqdm(data,desc='dev_batch'):
                input_id = batch[0].to(device)
                attention_mask = batch[1].to(device)
                token_type_ids = batch[2].to(device)
                answer = batch[3].to(device)
                predictions = model(input_ids=input_id, attention_mask = attention_mask,token_type_ids=token_type_ids)
                loss = loss_func(predictions,answer)
                acc = self.accuracy(predictions, answer)
                epoch_losses.append(loss.item())
                epoch_accs.append(acc.item())
        return epoch_losses, epoch_accs

    def QA_model_train(self,model,device,optimizer,loss_func,train,print_epoch): # validation added
        epoch_loss = 0
        epoch_acc = 0
        model.train()
        for batch in train:
        # for batch in tqdm(train,desc='train_batch'):
            optimizer.zero_grad()
            input_id = batch[0].to(device)
            attention_mask = batch[1].to(device)
            token_type_ids = batch[2].to(device)
            answer = batch[3].to(device)
            predictions = model(input_ids=input_id, attention_mask = attention_mask,token_type_ids=token_type_ids)
            loss = loss_func(predictions,answer)
            acc = self.accuracy(predictions, answer)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            epoch_acc += acc.item()
        train_loss = epoch_loss / len(train)
        train_acc = epoch_acc / len(train)
        # print(f'Epoch: {print_epoch+1:02}')
        # print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc*100:.2f}%')
        return train_loss,train_acc

    def QA_Train(self,train,val):
        epochs = self.epoch
        train_losses = []
        train_accs = []
        valid_losses = []
        valid_accs = []
        best_valid_acc = float('-inf')
        model = self.model
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = DataParallel(model)
        model = model.to(device)
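        learning_rate = self.learning_rate
        weight_decay = self.weight_decay
        # NOTE: `weight_decay` is passed as AdamW's `eps` (numerical-stability term), not as a
        # weight-decay coefficient, so the optimizer's default weight decay applies.
        optimizer = AdamW(model.parameters(), lr=learning_rate, eps=weight_decay)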
        loss_func = nn.CrossEntropyLoss()
        for epoch in tqdm(range(epochs)):

            train_loss, train_acc = self.QA_model_train(model,device,optimizer,loss_func,train,epoch)
            valid_loss, valid_acc = self.QA_evaluate(model,device,loss_func,val)

            train_losses.append(train_loss)
            train_accs.append(train_acc)
            valid_losses.extend(valid_loss)
            valid_accs.extend(valid_acc)
    
            epoch_valid_loss = np.mean(valid_loss)
            epoch_valid_acc = np.mean(valid_acc)
            
            if epoch_valid_acc > best_valid_acc:
                best_valid_acc = epoch_valid_acc
                torch.save(model.state_dict(), 'QA_Best_Model.pt')
            print('\n')
            print(f'epoch: {epoch+1}')
            print(f'train_loss: {train_loss:.3f}, train_acc: {train_acc:.3f}')
            print(f'valid_loss: {epoch_valid_loss:.3f}, valid_acc: {epoch_valid_acc:.3f}')

    def QA_Test(self,eval):
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        loss_func = nn.CrossEntropyLoss()
        new_model = QA_Model1.from_pretrained(self.model_ver,num_labels=self.num_labels,hidden_size=self.hidden_size,hidden_dropout_prob=self.hidden_dropout_prob,max_length=self.max_length)
        loaded_state_dict = torch.load('QA_Best_Model.pt',map_location=torch.device('cpu'))
        remove_module_state_dict = {}
        for key in loaded_state_dict.keys():
            # Strip only the leading 'module.' prefix added by DataParallel.
            key_remove_module = re.sub(r'^module\.','',key)
            value = loaded_state_dict[key]
            remove_module_state_dict[key_remove_module] = value
        new_model.load_state_dict(remove_module_state_dict)
        new_model = new_model.to(device)
        eval_loss, eval_acc = self.QA_evaluate(new_model,device,loss_func,eval)
        epoch_test_loss = np.mean(eval_loss)
        epoch_test_acc = np.mean(eval_acc)

        print(f'test_loss: {epoch_test_loss:.3f}, test_acc: {epoch_test_acc:.3f}')
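
The only preprocessing difference in `QA_trainer1.make_dataset` is the loop, repeated for the train, dev, and test contexts, that appends ` [CLS]` after every sentence-ending period except the last. A sketch of that loop as a single helper could look like this. The name `mark_sentences` is hypothetical, and the function is intended to reproduce the loop's behavior, including discarding any text that follows the final period.

```python
def mark_sentences(context, marker=" [CLS]"):
    """Insert `marker` after every period-terminated sentence except the last."""
    parts = context.split(".")
    if len(parts) < 2:
        return context  # no period at all: leave the context unchanged
    # parts[-1] is whatever follows the final period; the original loop drops it.
    sentences = [part + "." for part in parts[:-1]]
    return marker.join(sentences)

print(mark_sentences("A. B. C."))
```

Because the split is on every `.`, decimals and abbreviations such as `3.14` are also treated as sentence boundaries; a real sentence splitter would be needed to avoid that.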

# Train

In [30]:
def Training():
    trainer = QA_trainer(num_labels=2,hidden_size=768,hidden_dropout_prob=0.0,
                         max_length=512,batch_size=2,epoch=10,model_ver='monologg/koelectra-base-v3-discriminator',learning_rate=5e-6,weight_decay=5e-9)
    train,val,test = trainer.make_dataset()
    trainer.QA_Train(train,val)
    # The test split has no labels, so the best checkpoint is re-scored on the dev set.
    trainer.QA_Test(val)
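
`QA_Test` reloads `QA_Best_Model.pt`, which `QA_Train` saved from the `DataParallel`-wrapped model, so every key carries a leading `module.` that has to be removed before `load_state_dict`. A sketch of that renaming on a plain dict is below. The helper name `strip_module_prefix` and the example keys are illustrative only; in the real checkpoint the values would be tensors.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy whose keys have one leading `prefix` removed, if present."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned

saved = {
    "module.bert.classifier.weight": "w",  # placeholder values, not tensors
    "module.dense.bias": "b",
    "submodule.scale": "s",                # 'module.' is not at the start: untouched
}
cleaned = strip_module_prefix(saved)
print(sorted(cleaned))
```

An unanchored pattern such as `'module.'` would also rewrite the third key, which is why only the prefix is matched here. Saving `model.module.state_dict()` instead would avoid the renaming altogether, since `DataParallel` exposes the wrapped model as `.module`.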

In [31]:
Training()

You are using a model of type electra to instantiate a model of type bert. This is not supported for all configurations of models and can yield errors.
Some weights of the model checkpoint at monologg/koelectra-base-v3-discriminator were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.weight', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.bias']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of



epoch: 1
train_loss: 0.630, train_acc: 0.614
valid_loss: 0.491, valid_acc: 0.774

epoch: 2
train_loss: 0.347, train_acc: 0.855
valid_loss: 0.508, valid_acc: 0.804

epoch: 3
train_loss: 0.186, train_acc: 0.934
valid_loss: 0.638, valid_acc: 0.796

epoch: 4
train_loss: 0.108, train_acc: 0.965
valid_loss: 0.723, valid_acc: 0.813

epoch: 5
train_loss: 0.072, train_acc: 0.979
valid_loss: 0.777, valid_acc: 0.801

epoch: 6
train_loss: 0.053, train_acc: 0.985
valid_loss: 0.827, valid_acc: 0.816

epoch: 7
train_loss: 0.044, train_acc: 0.985
valid_loss: 0.964, valid_acc: 0.801

epoch: 8
train_loss: 0.035, train_acc: 0.990
valid_loss: 0.909, valid_acc: 0.814

epoch: 9
train_loss: 0.024, train_acc: 0.993
valid_loss: 1.024, valid_acc: 0.807

epoch: 10
train_loss: 0.020, train_acc: 0.994
valid_loss: 1.001, valid_acc: 0.817

100%|██████████| 10/10 [1:02:21<00:00, 374.16s/it]




test_loss: 1.001, test_acc: 0.817
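
In the log above, training loss falls steadily while validation loss rises after epoch 1 and validation accuracy fluctuates between 0.774 and 0.817. The following is a minimal sketch of choosing a checkpoint epoch from such a history. The `history` list is copied from the log, and `best_epoch` is a hypothetical helper that is not part of the notebook. How `task4_best_model.pt` was actually selected is not shown in this section.

```python
# Validation metrics copied from the training log: (epoch, valid_loss, valid_acc).
history = [
    (1, 0.491, 0.774), (2, 0.508, 0.804), (3, 0.638, 0.796), (4, 0.723, 0.813),
    (5, 0.777, 0.801), (6, 0.827, 0.816), (7, 0.964, 0.801), (8, 0.909, 0.814),
    (9, 1.024, 0.807), (10, 1.001, 0.817),
]


def best_epoch(history, key="acc"):
    """Hypothetical helper: pick the epoch with the highest valid_acc or lowest valid_loss."""
    if key == "acc":
        return max(history, key=lambda row: row[2])[0]
    if key == "loss":
        return min(history, key=lambda row: row[1])[0]
    raise ValueError("key must be 'acc' or 'loss'")


print(best_epoch(history, "acc"))   # 10
print(best_epoch(history, "loss"))  # 1
```

The two criteria disagree here (epoch 10 by accuracy, epoch 1 by loss), so the selection criterion should be stated alongside any reported score.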


# Load Model

In [15]:
# Use the GPU when available; the checkpoint itself is first loaded onto the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loss_func = nn.CrossEntropyLoss()
new_model = QA_Model.from_pretrained('monologg/koelectra-base-v3-discriminator',num_labels=2,hidden_size=768,hidden_dropout_prob=0.0,max_length=512)
loaded_state_dict = torch.load('./model/task4_best_model.pt',map_location=torch.device('cpu'))
# new_model = QA_Model1.from_pretrained('klue/roberta-large',num_labels=2,hidden_size=768,hidden_dropout_prob=0.0,max_length=512)
# loaded_state_dict = torch.load('./QA_Best_Model.pt',map_location=torch.device('cpu'))
# Strip the leading 'module.' prefix from every key so the names match the unwrapped model.
remove_module_state_dict = {}
for key, value in loaded_state_dict.items():
    # Anchor and escape the pattern: an unescaped '.' matches any character,
    # and an unanchored pattern would also alter names that merely contain 'module.'.
    key_remove_module = re.sub(r'^module\.', '', key)
    remove_module_state_dict[key_remove_module] = value
new_model.load_state_dict(remove_module_state_dict)
new_model = new_model.to(device)

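
The loading cell above renames checkpoint keys by dropping a `module.` prefix. Such a prefix typically appears when a model is saved while wrapped in `nn.DataParallel`, though this section does not show how the checkpoint was saved. A minimal sketch of the renaming step on a plain dictionary follows. `strip_module_prefix` and the example keys are hypothetical and stand in for real tensors.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Hypothetical helper: return a copy whose keys have a leading `prefix` removed."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only a prefix at the very start of the key is removed.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Hypothetical keys for illustration; the integers stand in for tensors.
example = OrderedDict([
    ("module.classifier.weight", 1),
    ("module.encoder.submodule.bias", 2),
    ("classifier.bias", 3),
])
cleaned = strip_module_prefix(example)
print(list(cleaned))
# ['classifier.weight', 'encoder.submodule.bias', 'classifier.bias']
```

Matching only at the start of the key leaves an inner name such as `submodule.` intact, which an unanchored substitution would not.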

# Test

In [16]:
# Rebuild the trainer to recreate the dataset splits; no training is run in this cell.
trainer = QA_trainer(num_labels=2,hidden_size=768,hidden_dropout_prob=0.0,
                         max_length=512,batch_size=4,epoch=8,model_ver='monologg/koelectra-base-v3-discriminator',learning_rate=1e-5,weight_decay=5e-9)
# trainer = QA_trainer1(num_labels=2,hidden_size=768,hidden_dropout_prob=0.0,
#                          max_length=512,batch_size=4,epoch=8,model_ver='klue/roberta-large',learning_rate=1e-5,weight_decay=5e-9)
train,val,test = trainer.make_dataset()
# The official test labels are not public, so the validation split serves as the evaluation set.
eval_loss, eval_acc = trainer.QA_evaluate(new_model,device,loss_func,val)
# Average the loss and accuracy values returned by QA_evaluate.
epoch_test_loss = np.mean(eval_loss)
epoch_test_acc = np.mean(eval_acc)

print(f'test_loss: {epoch_test_loss:.3f}, test_acc: {epoch_test_acc:.3f}')


test_loss: 0.881, test_acc: 0.824
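
The reported score is `np.mean` over the values returned by `QA_evaluate`. Assuming those are per-batch accuracies (the function body is not shown in this section), a plain mean equals the per-example accuracy only when every batch has the same size. The sketch below compares the two on hypothetical numbers. `weighted_mean` is an illustrative helper and not part of the notebook.

```python
def weighted_mean(batch_values, batch_sizes):
    """Hypothetical helper: average per-batch values weighted by batch size."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical per-batch accuracies; the last batch is smaller than batch_size=4.
batch_acc = [1.0, 0.5, 1.0]
batch_sizes = [4, 4, 1]

plain = sum(batch_acc) / len(batch_acc)          # every batch counts equally
weighted = weighted_mean(batch_acc, batch_sizes)  # every example counts equally
print(f"plain: {plain:.3f}, weighted: {weighted:.3f}")
```

With a batch size of 4 and a large evaluation set the gap is usually small, but the weighted form gives exact per-example accuracy.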
