# [Module 2.1] Model Training Scratch




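This notebook performs the following main tasks.
- 1. Environment setup
- 2. Data loading
- 3. Using the Hugging Face Electra tokenizer and pre-trained model
- 4. Creating a custom torch Dataset and preparing for training
- 5. Model fine-tuning
    - 5.1. Fine-tuning with Trainer
    - 5.2. Training with a Python script and Trainer
    - 5.3. Fine-tuning with native PyTorch
    - 5.4. Training with a Python script and PyTorch
- 6. Restarting the kernel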

    
---
### Reference:
- Reference material for fine-tuning with a custom dataset
    - [Fine-tuning with custom datasets](https://huggingface.co/transformers/v3.2.0/custom_datasets.html)

# 1. Environment Setup

In [1]:
%load_ext autoreload
%autoreload 2

# Add the src folder to the module search path
import torch
import sys
sys.path.append('./src')
import config
from data_util import read_nsmc_split

In [2]:
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
# logger.setLevel(logging.WARNING)
logger.addHandler(logging.StreamHandler(sys.stdout))

In [3]:
%store -r local_train_output_path
%store -r local_test_output_path



# 2. Data Loading


## 2.1. Loading the Training Data

In [4]:
train_texts, train_labels = read_nsmc_split(local_train_output_path)

In [5]:
logger.info(f"len: {len(train_texts)} \nSample: {train_texts[0:5]}")
logger.info(f"len: {len(train_labels)} \nSample: {train_labels[0:5]}")

len: 149552 
Sample: ['흠   포스터보고 초딩영화줄    오버연기조차 가볍지 않구나', '너무재밓었다그래서보는것을추천한다', '교도소 이야기구먼   솔직히 재미는 없다  평점 조정', '사이몬페그의 익살스런 연기가 돋보였던 영화 스파이더맨에서 늙어보이기만 했던 커스틴 던스트가 너무나도 이뻐보였다', '막 걸음마 뗀 세부터 초등학교 학년생인 살용영화 ㅋㅋㅋ   별반개도 아까움']
len: 149552 
Sample: [1, 0, 0, 1, 0]
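
`read_nsmc_split` lives in `src/data_util.py` and is not shown in this notebook. As a rough idea of what such a loader might do, here is a minimal sketch. It assumes the public NSMC layout (tab-separated `id`, `document`, `label` columns with a header row); the function name `read_nsmc_tsv` and the rule of skipping empty reviews are hypothetical and not taken from the actual helper.

```python
import csv
import io


def read_nsmc_tsv(file_obj):
    """Hypothetical loader: return (texts, labels) from an NSMC-style TSV stream."""
    # QUOTE_NONE keeps quote characters inside reviews from being parsed as CSV quoting.
    reader = csv.DictReader(file_obj, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts, labels = [], []
    for row in reader:
        document = (row.get("document") or "").strip()
        if not document:  # assumption: rows with an empty review are skipped
            continue
        texts.append(document)
        labels.append(int(row["label"]))
    return texts, labels


# Tiny in-memory stand-in for ratings_train.txt
sample = io.StringIO(
    "id\tdocument\tlabel\n"
    "1\tgreat movie\t1\n"
    "2\t\t0\n"
    "3\tboring\t0\n"
)
texts, labels = read_nsmc_tsv(sample)
print(texts, labels)
```

With a real file, the same function could be called as `read_nsmc_tsv(open(path, encoding="utf-8"))`.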


## 2.2. Creating the Validation Dataset

In [6]:
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)

# 3. Using the Hugging Face Electra Tokenizer and Pre-trained Model

## 3.1. Loading the Electra Library

In [7]:
# from datasets import load_dataset
from transformers import (
    ElectraModel, 
    ElectraTokenizer, 
    ElectraForSequenceClassification, 
    Trainer, 
    TrainingArguments, 
    set_seed
)
# from transformers.trainer_utils import get_last_checkpoint



## 3.2. Specifying the Pre-trained model_id and tokenizer_id
- [KoElectra Git](https://github.com/monologg/KoELECTRA)
- KoElectra Model
    - Small:
        - monologg/koelectra-small-v3-discriminator
    - Base: 
        - monologg/koelectra-base-v3-discriminator
        


In [8]:
tokenizer_id = 'monologg/koelectra-small-v3-discriminator'
model_id = "monologg/koelectra-small-v3-discriminator"


## 3.3. Creating Input Encodings for the Electra Model

In [9]:
%%time 

tokenizer = ElectraTokenizer.from_pretrained(tokenizer_id)

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
# test_encodings = tokenizer(test_texts, truncation=True, padding=True)



CPU times: user 33.1 s, sys: 316 ms, total: 33.4 s
Wall time: 33.6 s
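
In the tokenizer call above, `truncation=True` cuts sequences that exceed the maximum length and `padding=True` pads every sequence to the longest one in the batch, with an attention mask marking the real tokens. The toy sketch below illustrates only that idea using made-up token ids. It does not call the tokenizer, and `pad_and_truncate` is a hypothetical helper. One simplification to be aware of: the real tokenizer keeps the closing special token when it truncates, whereas this sketch simply slices.

```python
def pad_and_truncate(batch_ids, max_length, pad_id=0):
    """Toy illustration of truncation followed by pad-to-longest."""
    # Truncate: keep at most max_length ids per sequence.
    truncated = [ids[:max_length] for ids in batch_ids]
    # Pad: extend every sequence to the longest one remaining in the batch.
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token ids; pad_id=0 mirrors "pad_token_id": 0 in the model config.
toy_batch = [[2, 11, 12, 3], [2, 21, 3], [2, 31, 32, 33, 34, 35, 3]]
enc = pad_and_truncate(toy_batch, max_length=6)
for ids, mask in zip(enc["input_ids"], enc["attention_mask"]):
    print(ids, mask)
```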


In [10]:
logger.info(f'type of train_encoding: {type(train_encodings)}')

type of train_encoding: <class 'transformers.tokenization_utils_base.BatchEncoding'>


# 4. Creating a Custom torch Dataset and Preparing for Training

## 4.1. Creating the Custom torch Dataset

In [11]:
from data_util import NSMCDataset

train_dataset = NSMCDataset(train_encodings, train_labels)
val_dataset = NSMCDataset(val_encodings, val_labels)
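
`NSMCDataset` is defined in `src/data_util.py` and is not shown here. The sketch below follows the dataset pattern from the Hugging Face custom datasets guide linked at the top of this notebook, so it is likely close in spirit, but the class name `ReviewDataset` and the toy encodings are assumptions for illustration only.

```python
import torch


class ReviewDataset(torch.utils.data.Dataset):
    """Wraps tokenizer encodings and labels so each item is a dict of tensors."""

    def __init__(self, encodings, labels):
        self.encodings = encodings  # mapping such as {"input_ids": [...], "attention_mask": [...]}
        self.labels = labels

    def __getitem__(self, idx):
        # Pick the idx-th row of every encoding field and convert it to a tensor.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


# Toy encodings standing in for the tokenizer output
toy_encodings = {
    "input_ids": [[2, 11, 3, 0], [2, 21, 22, 3]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
toy_dataset = ReviewDataset(toy_encodings, [1, 0])
first = toy_dataset[0]
print(len(toy_dataset), first)
```

Returning the label under the key `labels` matters because that is the argument name the Hugging Face sequence classification models expect when computing the loss.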

In [12]:
logger.info(f"len(train_dataset) : {len(train_dataset)}")
logger.info(f"len(val_dataset) : {len(val_dataset)}")


len(train_dataset) : 119641
len(val_dataset) : 29911
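
The two sizes logged above follow from splitting the 149,552 training rows with `test_size=.2`. A quick check of the arithmetic, assuming the validation size is rounded up and the training set takes the remainder (which is, to my understanding, how scikit-learn handles a fractional `test_size`); `split_sizes` is a hypothetical helper:

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_val) for a fractional validation size."""
    n_val = math.ceil(test_size * n_samples)  # 0.2 * 149552 = 29910.4, rounded up
    n_train = n_samples - n_val
    return n_train, n_val


n_train, n_val = split_sizes(149552, 0.2)
print(n_train, n_val)  # 119641 29911
```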


## 4.2. Creating Dataset Metadata

In [13]:
from train_util import create_train_meta
# Prepare model labels - useful in inference API
seed = 100

# Set seed before initializing model
set_seed(seed)
    
num_labels, label2id, id2label = create_train_meta(train_dataset)
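
`create_train_meta` comes from `src/train_util.py` and is not shown. Judging by the values logged later in this notebook (`label2id: {'negative': '0', 'positive': '1'}`), it returns the class count plus two string-keyed lookup tables. A minimal sketch that produces the same shape of output is given below; the helper name `build_label_meta` and the idea of passing the class names in directly are assumptions, since the real function takes the dataset.

```python
def build_label_meta(class_names):
    """Hypothetical helper: derive num_labels, label2id and id2label from class names."""
    label2id = {name: str(i) for i, name in enumerate(class_names)}
    id2label = {str(i): name for i, name in enumerate(class_names)}
    return len(class_names), label2id, id2label


toy_num_labels, toy_label2id, toy_id2label = build_label_meta(["negative", "positive"])
print(toy_num_labels, toy_label2id, toy_id2label)
```

Passing these mappings to `from_pretrained` stores them in the model config, so predictions can be reported as `negative` or `positive` rather than as raw class indices.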

# 5. Model Fine-Tuning

## 5.1. Fine-tuning with Trainer

In [14]:
%%time

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=256,  # batch size per device during training
    per_device_eval_batch_size=256,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=100,
)

model = ElectraForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()

Some weights of the model checkpoint at monologg/koelectra-small-v3-discriminator were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at monologg/koelectra-small-v3-discriminator and are newly initialized

Step,Training Loss
100,0.6923
200,0.5977
300,0.4434
400,0.3888




Training completed. Do not forget to share your model on huggingface.co/models =)




CPU times: user 1min 52s, sys: 36.3 s, total: 2min 28s
Wall time: 2min 26s


TrainOutput(global_step=468, training_loss=0.5070040490892198, metrics={'train_runtime': 140.3045, 'train_samples_per_second': 852.724, 'train_steps_per_second': 3.336, 'total_flos': 845575811718948.0, 'train_loss': 0.5070040490892198, 'epoch': 1.0})
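
The `global_step=468` in the output above can be reproduced from the dataset size and batch size. The sketch below assumes a single GPU and no gradient accumulation, and `steps_per_epoch` is a hypothetical helper. It also points at something worth noticing: `warmup_steps=500` is larger than the total number of steps, so, assuming the default linear warmup, the learning rate would still be ramping up when this one-epoch run ends.

```python
import math


def steps_per_epoch(num_examples, batch_size, grad_accum=1, num_devices=1):
    """Optimizer steps in one epoch; the last partial batch still counts as a step."""
    batches = math.ceil(num_examples / (batch_size * num_devices))
    return max(batches // grad_accum, 1)


total_steps = steps_per_epoch(119641, 256)
print(total_steps)  # 468

# Fraction of the peak learning rate reached by the final step under linear warmup
warmup_steps = 500
warmup_fraction_reached = min(1.0, total_steps / warmup_steps)
print(warmup_fraction_reached)
```

If the goal is to train at the full learning rate for part of the epoch, a smaller `warmup_steps` or more epochs would be needed.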

## 5.2. Training with a Python Script and Trainer

In [15]:
class ParamsScript:
    def __init__(self):
        self.epochs = 1        
        self.train_batch_size = 256
        self.eval_batch_size = 256
        self.test_batch_size = 256         
        self.learning_rate = 5e-5
        self.warmup_steps = 0      
        self.weight_decay = 0.01
        self.fp16 = True
        self.tokenizer_id = 'monologg/koelectra-small-v3-discriminator'
        self.model_id = 'monologg/koelectra-small-v3-discriminator'     
        # SageMaker Container environment        
        self.output_data_dir = f"{config.output_data_dir}"                                            
        self.model_dir = f"{config.model_dir}"                                       
        self.train_data_dir = f"{config.train_data_dir}"               
        self.checkpoint_dir = f"{config.checkpoint_dir}"                                               
        self.is_evaluation = True
        self.is_test = True
        self.test_data_dir = f"{config.test_data_dir}"                               
        self.eval_ratio = 0.5
        self.use_subset_train_sampler = False
        self.disable_tqdm = True        
        self.logging_steps = 50
        self.seed = 100
                        
script_args = ParamsScript()
print("# of epochs: ", script_args.epochs)

# of epochs:  1


In [16]:
%%time 
from train_lib import train_Trainer
train_Trainer(script_args)

##### Args: 
 {'epochs': 1, 'train_batch_size': 256, 'eval_batch_size': 256, 'test_batch_size': 256, 'learning_rate': 5e-05, 'warmup_steps': 0, 'weight_decay': 0.01, 'fp16': True, 'tokenizer_id': 'monologg/koelectra-small-v3-discriminator', 'model_id': 'monologg/koelectra-small-v3-discriminator', 'output_data_dir': 'output/nsmc', 'model_dir': 'models/nsmc', 'train_data_dir': 'data/nsmc/train', 'checkpoint_dir': 'checkpoint/nsmc', 'is_evaluation': True, 'is_test': True, 'test_data_dir': 'data/nsmc/test', 'eval_ratio': 0.5, 'use_subset_train_sampler': False, 'disable_tqdm': True, 'logging_steps': 50, 'seed': 100}
device: cuda
train_data_filenames ['data/nsmc/train/ratings_train.txt']
train_file_path data/nsmc/train/ratings_train.txt
len: 149552 
Sample: ['흠   포스터보고 초딩영화줄    오버연기조차 가볍지 않구나', '너무재밓었다그래서보는것을추천한다', '교도소 이야기구먼   솔직히 재미는 없다  평점 조정', '사이몬페그의 익살스런 연기가 돋보였던 영화 스파이더맨에서 늙어보이기만 했던 커스틴 던스트가 너무나도 이뻐보였다', '막 걸음마 뗀 세부터 초등학교 학년생인 살용영화 ㅋㅋㅋ   별반개도 아까움']
len: 149552 
Sample: [1, 0, 0, 1, 0]

loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of train_dataset : 74776
dataset size with frac: 1 ==> 74776
num_labels: 2
label2id: {'negative': '0', 'positive': '1'}
id2label: {'0': 'negative', '1': 'positive'}


loading configuration file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/config.json from cache at /home/ec2-user/.cache/huggingface/transformers/bd0f09888c5a5619ddb9de81d4a9936a94e5f45064f9a23ba6d39241ceebce02.d2485d28e5c07ca60bfa4fe84af673e0df83401e5c56bcdd991878cb4966eb34
Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 128,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 256,
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "label2id": {
    "negative": "0",
    "positive": "1"
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 4,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_drop



loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of val_dataset : 74776
dataset size with frac: 1 ==> 74776


PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
Using amp fp16 backend
***** Running training *****
  Num examples = 74776
  Num Epochs = 1
  Instantaneous batch size per device = 256
  Total train batch size (w. parallel, distributed & accumulation) = 256
  Gradient Accumulation steps = 1
  Total optimization steps = 293


{'loss': 0.6625, 'learning_rate': 4.1467576791808876e-05, 'epoch': 0.17}
{'loss': 0.496, 'learning_rate': 3.293515358361775e-05, 'epoch': 0.34}
{'loss': 0.4314, 'learning_rate': 2.4402730375426623e-05, 'epoch': 0.51}
{'loss': 0.405, 'learning_rate': 1.5870307167235497e-05, 'epoch': 0.68}
{'loss': 0.4003, 'learning_rate': 7.337883959044369e-06, 'epoch': 0.85}


***** Running Evaluation *****
  Num examples = 74776
  Batch size = 256
Saving model checkpoint to checkpoint/nsmc/checkpoint-293
Configuration saved in checkpoint/nsmc/checkpoint-293/config.json
Model weights saved in checkpoint/nsmc/checkpoint-293/pytorch_model.bin


{'eval_loss': 0.3717063069343567, 'eval_accuracy': 0.8420482507756499, 'eval_f1': 0.8415715416292202, 'eval_precision': 0.8426226866152731, 'eval_recall': 0.8405230159155458, 'eval_runtime': 28.193, 'eval_samples_per_second': 2652.286, 'eval_steps_per_second': 10.393, 'epoch': 1.0}




Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from checkpoint/nsmc/checkpoint-293 (score: 0.8420482507756499).


{'train_runtime': 87.1084, 'train_samples_per_second': 858.424, 'train_steps_per_second': 3.364, 'train_loss': 0.4650391067661116, 'epoch': 1.0}
test_data_filenames ['data/nsmc/test/ratings_test.txt']
test_file_path data/nsmc/test/ratings_test.txt
len: 49832 
Sample: ['뭐야 이 평점들은     나쁘진 않지만 점 짜리는 더더욱 아니잖아', '지루하지는 않은데 완전 막장임    돈주고 보기에는', '만 아니었어도 별 다섯 개 줬을텐데   왜 로 나와서 제 심기를 불편하게 하죠', '음악이 주가 된  최고의 음악영화', '진정한 쓰레기']
len: 49832 
Sample: [0, 0, 0, 1, 0]


loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of test_dataset : 49832
dataset size with frac: 1 ==> 49832


***** Running Prediction *****
  Num examples = 49832
  Batch size = 256


Test Metrics: {'accuracy': 0.841, 'f1': 0.842, 'precision': 0.844, 'recall': 0.84}


Saving model checkpoint to models/nsmc
Configuration saved in models/nsmc/config.json
Model weights saved in models/nsmc/pytorch_model.bin


CPU times: user 2min 12s, sys: 21.1 s, total: 2min 33s
Wall time: 2min 34s
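
The `learning_rate` values in the training log above decrease by a constant amount every 50 steps, which is consistent with a linear decay from `5e-5` to zero over the 293 optimization steps with `warmup_steps=0`. A small sketch of such a schedule, written from scratch as an illustration rather than taken from the library, reproduces the logged values:

```python
def linear_schedule_lr(step, base_lr, total_steps, warmup_steps=0):
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# logging_steps=50, so the log shows the rate at steps 50, 100, ..., 250
logged_steps = (50, 100, 150, 200, 250)
schedule = {step: linear_schedule_lr(step, 5e-5, 293) for step in logged_steps}
for step, lr in schedule.items():
    print(step, lr)
```

At step 50 this gives `5e-5 * 243 / 293`, about `4.1468e-05`, matching the first logged line.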


## 5.3. Fine-tuning with native PyTorch

### Creating the train data loaders
- When using a subset of the dataset for debugging
    - train_sample_loader
    - eval_sample_loader
- When using the full dataset
    - train_loader
    - eval_loader

In [17]:
from torch.utils.data import DataLoader, SubsetRandomSampler


from train_util import create_random_sampler
    
subset_train_sampler = create_random_sampler(train_dataset, frac=0.01, is_shuffle=True, logger=logger)
train_sampler = create_random_sampler(train_dataset, frac=1, is_shuffle=True, logger=logger)

subset_eval_sampler = create_random_sampler(val_dataset, frac=0.001, is_shuffle=False, logger=logger)
eval_sampler = create_random_sampler(val_dataset, frac=1, is_shuffle=False, logger=logger)

# subset_test_sampler = create_random_sampler(test_dataset, frac=0.001, is_shuffle=False, logger=logger)
# test_sampler = create_random_sampler(test_dataset, frac=1, is_shuffle=False, logger=logger)
    
train_sample_loader = DataLoader(dataset=train_dataset, 
                          shuffle=False, 
                          batch_size=16, 
                          sampler=subset_train_sampler)    

train_loader = DataLoader(dataset=train_dataset, 
                          shuffle=False, 
                          batch_size=16, 
                          sampler=train_sampler)    

eval_sample_loader = DataLoader(dataset=val_dataset, 
                          shuffle=False, 
                          batch_size=16, 
                          sampler=subset_eval_sampler)    

eval_loader = DataLoader(dataset=val_dataset, 
                          shuffle=False, 
                          batch_size=16, 
                          sampler=eval_sampler)    


dataset size with frac: 0.01 ==> 1196
dataset size with frac: 1 ==> 119641
dataset size with frac: 0.001 ==> 29
dataset size with frac: 1 ==> 29911
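
The subset sizes logged above match `int(dataset_size * frac)`: 1% of 119,641 is 1,196 and 0.1% of 29,911 is 29. `create_random_sampler` itself is not shown, so the following is only a sketch of how the index selection might work. `subset_indices` is a hypothetical helper, and the fixed seed is an assumption.

```python
import random


def subset_indices(dataset_size, frac, shuffle, seed=100):
    """Pick int(dataset_size * frac) indices, optionally after shuffling."""
    indices = list(range(dataset_size))
    if shuffle:
        random.Random(seed).shuffle(indices)  # local RNG keeps the choice reproducible
    keep = int(dataset_size * frac)           # fractional sizes are rounded down
    return indices[:keep]


train_subset = subset_indices(119641, 0.01, shuffle=True)
eval_subset = subset_indices(29911, 0.001, shuffle=False)
print(len(train_subset), len(eval_subset))  # 1196 29
```

A list of indices like this could then be handed to `SubsetRandomSampler`, which is imported in the cell above, and passed to `DataLoader` through its `sampler` argument.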


### Defining Parameters

In [18]:
class Params:
    def __init__(self):
        self.epochs = 1        
        self.batch_size = 256
        self.lr = 0.001
        self.log_interval = 50
        self.model_dir = config.model_dir
                        
args = Params()
print("# of epochs: ", args.epochs)

# of epochs:  1


### Loading the Model

In [19]:
from transformers import AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

model = ElectraForSequenceClassification.from_pretrained(
    model_id, num_labels=num_labels, label2id=label2id, id2label=id2label
)

model.to(device)
model.train()

optimizer = AdamW(model.parameters(), lr=5e-5)

loading configuration file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/config.json from cache at /home/ec2-user/.cache/huggingface/transformers/bd0f09888c5a5619ddb9de81d4a9936a94e5f45064f9a23ba6d39241ceebce02.d2485d28e5c07ca60bfa4fe84af673e0df83401e5c56bcdd991878cb4966eb34
Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 128,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 256,
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "label2id": {
    "negative": "0",
    "positive": "1"
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 4,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_drop

### Running the Training Loop

In [20]:
from train_util import train_epoch, eval_epoch, save_best_model
import time

epochs = 1
best_acc = 0
for epoch in range(epochs):
    start_time = time.time()

    train_epoch(args, 
                model, 
                train_sample_loader, 
                optimizer, 
                epoch, 
                device, 
                logger,
                sampler=None, 
                )            

    elapsed_time = time.time() - start_time    
    print("The time elapse of epoch {:03d}".format(epoch) + " is: " + 
                time.strftime("%H: %M: %S", time.gmtime(elapsed_time)))

    acc = eval_epoch(args, 
               model, 
               epoch, 
               device, 
               logger,
               eval_sample_loader)
    
    best_acc = save_best_model(model, 
                               acc, 
                               epoch, 
                               best_acc,
                               args.model_dir,
                               logger)            
    # best_hr, best_ndcg, best_epoch = test(args, NCF_model, epoch, test_loader, best_hr, model_dir)


The time elapse of epoch 000 is: 00: 00: 03
Train Epoch: 0 Acc=0.814904;
the model is saved at models/nsmc/pytorch_model.bin
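
`train_epoch` and `eval_epoch` are imported from `src/train_util.py`, so their bodies are not visible in this notebook. The sketch below shows the general shape of a native PyTorch train and evaluate step on a toy linear classifier with random data. Everything in it (the model, the data, the function names `train_one_epoch` and `evaluate`) is a stand-in chosen so the example runs on its own, not the notebook's actual implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy binary classification data in place of the tokenized reviews
features = torch.randn(64, 8)
labels = (features.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


def train_one_epoch(model, loader, optimizer, device):
    """One pass over the loader: forward, loss, backward, optimizer step."""
    model.train()
    total_loss = 0.0
    for batch_features, batch_labels in loader:
        batch_features = batch_features.to(device)
        batch_labels = batch_labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)


@torch.no_grad()
def evaluate(model, loader, device):
    """Accuracy over the loader, with gradients disabled."""
    model.eval()
    correct, seen = 0, 0
    for batch_features, batch_labels in loader:
        preds = model(batch_features.to(device)).argmax(dim=1)
        correct += (preds.cpu() == batch_labels).sum().item()
        seen += batch_labels.size(0)
    return correct / seen


avg_loss = train_one_epoch(model, loader, optimizer, device)
accuracy = evaluate(model, loader, device)
print(f"loss={avg_loss:.4f} acc={accuracy:.4f}")
```

With a Hugging Face model the forward pass would instead receive `input_ids`, `attention_mask` and `labels`, and the loss could be read from the model output rather than computed with a separate loss function.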


## 5.4. Training with a Python Script and PyTorch

In [21]:
class ParamsScript:
    def __init__(self):
        self.epochs = 1        
        self.train_batch_size = 256
        self.eval_batch_size = 256
        self.test_batch_size = 256         
        self.learning_rate = 5e-5
        self.warmup_steps = 0      
        self.fp16 = True
        self.tokenizer_id = 'monologg/koelectra-small-v3-discriminator'
        self.model_id = 'monologg/koelectra-small-v3-discriminator'     
        # SageMaker Container environment        
        self.output_data_dir = f"{config.output_data_dir}"                                            
        self.model_dir = f"{config.model_dir}"                                       
        self.train_data_dir = f"{config.train_data_dir}"               
        self.checkpoint_dir = f"{config.checkpoint_dir}"                                               
        self.is_evaluation = config.is_evaluation                               
        self.is_test = True
        self.test_data_dir = f"{config.test_data_dir}"                               
        self.eval_ratio = 0.5
        self.use_subset_train_sampler = True 
        self.log_interval = 50
        self.n_gpus = 1                        
        self.seed = 100
                        
script_args = ParamsScript()
print("# of epochs: ", script_args.epochs)

# of epochs:  1


In [22]:
%%time 
from train_lib import train
train(script_args)

##### Args: 
 {'epochs': 1, 'train_batch_size': 256, 'eval_batch_size': 256, 'test_batch_size': 256, 'learning_rate': 5e-05, 'warmup_steps': 0, 'fp16': True, 'tokenizer_id': 'monologg/koelectra-small-v3-discriminator', 'model_id': 'monologg/koelectra-small-v3-discriminator', 'output_data_dir': 'output/nsmc', 'model_dir': 'models/nsmc', 'train_data_dir': 'data/nsmc/train', 'checkpoint_dir': 'checkpoint/nsmc', 'is_evaluation': 'True', 'is_test': True, 'test_data_dir': 'data/nsmc/test', 'eval_ratio': 0.5, 'use_subset_train_sampler': True, 'log_interval': 50, 'n_gpus': 1, 'seed': 100}
device: cuda
train_data_filenames ['data/nsmc/train/ratings_train.txt']
train_file_path data/nsmc/train/ratings_train.txt
len: 149552 
Sample: ['흠   포스터보고 초딩영화줄    오버연기조차 가볍지 않구나', '너무재밓었다그래서보는것을추천한다', '교도소 이야기구먼   솔직히 재미는 없다  평점 조정', '사이몬페그의 익살스런 연기가 돋보였던 영화 스파이더맨에서 늙어보이기만 했던 커스틴 던스트가 너무나도 이뻐보였다', '막 걸음마 뗀 세부터 초등학교 학년생인 살용영화 ㅋㅋㅋ   별반개도 아까움']
len: 149552 
Sample: [1, 0, 0, 1, 0]


loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of train_dataset : 74776
dataset size with frac: 0.01 ==> 747
num_labels: 2
label2id: {'negative': '0', 'positive': '1'}
id2label: {'0': 'negative', '1': 'positive'}


loading configuration file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/config.json from cache at /home/ec2-user/.cache/huggingface/transformers/bd0f09888c5a5619ddb9de81d4a9936a94e5f45064f9a23ba6d39241ceebce02.d2485d28e5c07ca60bfa4fe84af673e0df83401e5c56bcdd991878cb4966eb34
Model config ElectraConfig {
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "embedding_size": 128,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 256,
  "id2label": {
    "0": "negative",
    "1": "positive"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "label2id": {
    "negative": "0",
    "positive": "1"
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "electra",
  "num_attention_heads": 4,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "summary_activation": "gelu",
  "summary_last_drop



loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of val_dataset : 74776
dataset size with frac: 1 ==> 74776
The time elapse of epoch 000 is: 00: 00: 00
Train Epoch: 0 Acc=0.502182;
the model is saved at models/nsmc/pytorch_model.bin
test_data_filenames ['data/nsmc/test/ratings_test.txt']
test_file_path data/nsmc/test/ratings_test.txt
len: 49832 
Sample: ['뭐야 이 평점들은     나쁘진 않지만 점 짜리는 더더욱 아니잖아', '지루하지는 않은데 완전 막장임    돈주고 보기에는', '만 아니었어도 별 다섯 개 줬을텐데   왜 로 나와서 제 심기를 불편하게 하죠', '음악이 주가 된  최고의 음악영화', '진정한 쓰레기']
len: 49832 
Sample: [0, 0, 0, 1, 0]


loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/vocab.txt from cache at /home/ec2-user/.cache/huggingface/transformers/32dc9196217c0cc26c7dd705168e8615ea2d82613aa5b672d7647b8e8d58545f.541023ff50f833a9bab3e48e78ae1856cf6744bdb336c86e797eaf675b62b2b8
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer_config.json from cache at /home/ec2-user/.cache/huggingface/transformers/a6c32c62ff893fb2aa32dabc5722c9f9eb7243cc1cb4514b9ba3fdb1b52704d6.35f013c4fd3572cfdddbbdf6223ef162dd4fb536bf83007533f201addf3287b7
loading file https://huggingface.co/monologg/koelectra-small-v3-discriminator/resolve/main/tokenizer.json from cache at None
loading configura

size of test_dataset : 49832
dataset size with frac: 1 ==> 49832
Test Accuracy: Acc=0.496422;
CPU times: user 1min 25s, sys: 10.7 s, total: 1min 36s
Wall time: 1min 36s


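# 6. Restarting the Kernel

- After all of the cells above have run, the notebook still holds GPU memory, as shown in the figure below. (Run `nvidia-smi` in a terminal to check.)
![before-nvidia-smi.png](../2_WarmingUp/img/before-nvidia-smi.png)

- Running the cell below restarts this notebook's kernel, after which you can confirm that the memory has been released.
![after-nvidia-smi.png](../2_WarmingUp/img/after-nvidia-smi.png)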

In [23]:
import IPython

IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}