# History
- [KoELECTRA Jaehyeong](#koelectra-jaehyeong)
- [KoELECTRA V3](#koelectra-v3)
- [KoELECTRA Naver NER](#koelectra-naver-ner)
- [RoBERTa](#roberta)
- [KoELECTRA By Data](#koelectra-by-data)
- [10 Epochs](#10-epochs)
- [Compare Models](#compare-models)

In [None]:
MAX_LEN = 128
VALID_SPLIT = 0.1
BATCH_SIZE = 64 # set 32 for large models
EPOCHS = 2
LEARNING_RATE = 1e-5
DR_RATE = 0.3
WARMUP_STEPS = 500
WEIGHT_DECAY = 0.01
METRIC = 'accuracy'
MODEL_PATH = {
    'v3':'monologg/koelectra-base-v3-discriminator',
    'naver_ner':'monologg/koelectra-base-v3-naver-ner',
    'jaehyeong':'jaehyeong/koelectra-base-v3-generalized-sentiment-analysis',
    'roberta':'klue/roberta-large'
}
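The `trainer` body is not shown, so it is an assumption that `LEARNING_RATE` and `WARMUP_STEPS` feed Hugging Face's default linear warmup-then-decay schedule. Under that assumption, the sketch below computes the learning rate at a given optimizer step. With 704 total steps (the count logged for the batch-64, 2-epoch runs), the 500 warmup steps take up about 71% of training, so the peak rate is reached at step 500 and then decays to zero over the remaining 204 steps.

```python
def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0.

    Follows the shape of transformers' get_linear_schedule_with_warmup,
    which is assumed (not confirmed) to be the schedule used here.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 704  # "Total optimization steps" logged for the batch-64, 2-epoch runs

warmup_share = WARMUP_STEPS / TOTAL_STEPS
print(f"warmup covers {warmup_share:.0%} of training")
for step in (0, 250, 500, 602, 704):
    print(step, linear_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS))
```

For the 10-epoch runs (3,520 or 11,250 steps) the same 500 warmup steps are a much smaller share of training.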

In [None]:
from datasets import load_dataset
from datasets.arrow_dataset import Dataset
from transformers import Trainer

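def trainer(dataset: Dataset, model_name: str, model_path: str, labeled=True) -> Trainer:
    # Placeholder: the real body is not included in this export. Judging by the
    # logs, it builds train/validation sets, fine-tunes `model_path`, evaluates and
    # checkpoints every epoch under ./saved/models/<model_name>/, and reloads the
    # best checkpoint by accuracy at the end. Trainer() with no model raises an error.
    return Trainer()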

train_dataset = load_dataset('csv', data_files='./data/train.csv', split='train')

## KoELECTRA Jaehyeong

In [None]:
koelectra_jaehyeong_trainer = trainer(train_dataset, 'koelectra_jaehyeong', MODEL_PATH['jaehyeong'])

In [None]:
koelectra_jaehyeong_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 704


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.373962 | 0.6868 |
| 2 | 0.495500 | 0.328414 | 0.6904 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_jaehyeong/checkpoint-352
Configuration saved in ./saved/models/koelectra_jaehyeong/checkpoint-352/config.json
Model weights saved in ./saved/models/koelectra_jaehyeong/checkpoint-352/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_jaehyeong/checkpoint-704
Configuration saved in ./saved/models/koelectra_jaehyeong/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_jaehyeong/checkpoint-704/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_jaehyeong/checkpoint-704 (score: 0.6904).


TrainOutput(global_step=704, training_loss=0.45169840075752954, metrics={'train_runtime': 911.4667, 'train_samples_per_second': 49.371, 'train_steps_per_second': 0.772, 'total_flos': 2960052526080000.0, 'train_loss': 0.45169840075752954, 'epoch': 2.0})
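The "Total optimization steps" value and the checkpoint numbers in the logs above follow directly from the dataset size, batch size, and epoch count. A minimal sketch, assuming one optimizer step per batch (gradient accumulation is 1 in every log in this notebook) and that the final partial batch is kept:

```python
import math
from typing import List


def steps_per_epoch(num_examples: int, batch_size: int, grad_accum: int = 1) -> int:
    # One step per batch; the last, smaller batch still counts as a batch.
    batches = math.ceil(num_examples / batch_size)
    return max(1, batches // grad_accum)


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    return steps_per_epoch(num_examples, batch_size) * epochs


def epoch_checkpoints(num_examples: int, batch_size: int, epochs: int) -> List[str]:
    # Per-epoch checkpoints are named after the global step at which they are saved.
    spe = steps_per_epoch(num_examples, batch_size)
    return [f"checkpoint-{spe * e}" for e in range(1, epochs + 1)]


print(total_steps(22500, 64, 2))        # 704: batch-64 runs
print(total_steps(22500, 32, 2))        # 1408: batch-32 runs
print(total_steps(35989, 32, 2))        # 2250: augmented data
print(epoch_checkpoints(22500, 64, 2))  # ['checkpoint-352', 'checkpoint-704']
```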

In [None]:
koelectra_jaehyeong_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64


{'epoch': 2.0,
 'eval_accuracy': 0.6904,
 'eval_loss': 0.32841435074806213,
 'eval_runtime': 17.9545,
 'eval_samples_per_second': 139.241,
 'eval_steps_per_second': 2.228}
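In the progress tables, epoch 1 shows "No log" for training loss in the batch-64 runs, while in the 10-epoch KoELECTRA run the loss at epochs 4 and 7 repeats the previous row. Both patterns fit the assumption that training loss is logged every 500 steps (the `TrainingArguments.logging_steps` default) and that each row shows the most recent logged value. The sketch below computes, for each epoch, the step of the last loss log at or before that epoch's evaluation; `None` corresponds to "No log".

```python
from typing import List, Optional


def last_log_step_per_epoch(steps_per_epoch: int, epochs: int,
                            logging_steps: int = 500) -> List[Optional[int]]:
    """Step of the latest training-loss log at each epoch-end evaluation.

    Assumes loss is logged whenever global_step is a multiple of
    logging_steps; None means nothing has been logged yet ("No log").
    """
    result: List[Optional[int]] = []
    for epoch in range(1, epochs + 1):
        end_step = steps_per_epoch * epoch
        last = (end_step // logging_steps) * logging_steps
        result.append(last if last > 0 else None)
    return result


# 352 steps/epoch = 22,500 examples at batch size 64
print(last_log_step_per_epoch(352, 2))
print(last_log_step_per_epoch(352, 9))
# 704 steps/epoch = 22,500 examples at batch size 32: a loss is logged in every epoch
print(last_log_step_per_epoch(704, 2))
```

The same step (1000, then 2000) showing up for two consecutive epochs is what produces the repeated 0.326900 and 0.284500 values in the 10-epoch table.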

## KoELECTRA V3

In [None]:
koelectra_v3_trainer = trainer(train_dataset, 'koelectra_v3', MODEL_PATH['v3'])

In [None]:
koelectra_v3_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 704


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.386784 | 0.668 |
| 2 | 0.508400 | 0.340945 | 0.6812 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_v3/checkpoint-352
Configuration saved in ./saved/models/koelectra_v3/checkpoint-352/config.json
Model weights saved in ./saved/models/koelectra_v3/checkpoint-352/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_v3/checkpoint-704
Configuration saved in ./saved/models/koelectra_v3/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_v3/checkpoint-704/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_v3/checkpoint-704 (score: 0.6812).


TrainOutput(global_step=704, training_loss=0.4639513384212147, metrics={'train_runtime': 912.8539, 'train_samples_per_second': 49.296, 'train_steps_per_second': 0.771, 'total_flos': 2960052526080000.0, 'train_loss': 0.4639513384212147, 'epoch': 2.0})

In [None]:
koelectra_v3_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64


{'epoch': 2.0,
 'eval_accuracy': 0.6812,
 'eval_loss': 0.34094521403312683,
 'eval_runtime': 18.1314,
 'eval_samples_per_second': 137.882,
 'eval_steps_per_second': 2.206}

## KoELECTRA Naver NER

In [None]:
koelectra_ner_trainer = trainer(train_dataset, 'koelectra_ner', MODEL_PATH['naver_ner'])

In [None]:
koelectra_ner_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 704


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.386787 | 0.6648 |
| 2 | 0.500400 | 0.346429 | 0.6756 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_batch64/checkpoint-352
Configuration saved in ./saved/models/koelectra_batch64/checkpoint-352/config.json
Model weights saved in ./saved/models/koelectra_batch64/checkpoint-352/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_batch64/checkpoint-704
Configuration saved in ./saved/models/koelectra_batch64/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_batch64/checkpoint-704/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_batch64/checkpoint-704 (score: 0.6756).


TrainOutput(global_step=704, training_loss=0.45949892564253375, metrics={'train_runtime': 911.3405, 'train_samples_per_second': 49.378, 'train_steps_per_second': 0.772, 'total_flos': 2960052526080000.0, 'train_loss': 0.45949892564253375, 'epoch': 2.0})

In [None]:
koelectra_ner_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64


{'epoch': 2.0,
 'eval_accuracy': 0.6756,
 'eval_loss': 0.3464292585849762,
 'eval_runtime': 18.1138,
 'eval_samples_per_second': 138.016,
 'eval_steps_per_second': 2.208}

## RoBERTa

In [None]:
roberta_large_trainer = trainer(train_dataset, 'roberta_large', MODEL_PATH['roberta'])

In [None]:
roberta_large_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 1408


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.402 | 0.338354 | 0.6816 |
| 2 | 0.322 | 0.313305 | 0.7128 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_large/checkpoint-704
Configuration saved in ./saved/models/roberta_large/checkpoint-704/config.json
Model weights saved in ./saved/models/roberta_large/checkpoint-704/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_large/checkpoint-1408
Configuration saved in ./saved/models/roberta_large/checkpoint-1408/config.json
Model weights saved in ./saved/models/roberta_large/checkpoint-1408/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/roberta_large/checkpoint-1408 (score: 0.7128).


TrainOutput(global_step=1408, training_loss=0.34661728143692017, metrics={'train_runtime': 3120.8492, 'train_samples_per_second': 14.419, 'train_steps_per_second': 0.451, 'total_flos': 1.048429868544e+16, 'train_loss': 0.34661728143692017, 'epoch': 2.0})

In [None]:
roberta_large_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32


{'epoch': 2.0,
 'eval_accuracy': 0.7128,
 'eval_loss': 0.3133045434951782,
 'eval_runtime': 54.4888,
 'eval_samples_per_second': 45.881,
 'eval_steps_per_second': 1.45}
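A small summary of the four runs above, using the `eval_accuracy` and `train_runtime` values copied from their logs. RoBERTa ran at batch size 32 (1,408 steps) while the KoELECTRA runs used 64 (704 steps), so the runtimes are not a like-for-like comparison.

```python
# (run name, eval_accuracy, train_runtime in seconds), copied from the logs above
results = [
    ("koelectra_jaehyeong", 0.6904, 911.4667),
    ("koelectra_v3", 0.6812, 912.8539),
    ("koelectra_ner", 0.6756, 911.3405),
    ("roberta_large", 0.7128, 3120.8492),
]

# Rank by validation accuracy, best first.
ranked = sorted(results, key=lambda r: r[1], reverse=True)
for name, acc, runtime in ranked:
    print(f"{name:<20} acc={acc:.4f}  train={runtime / 60:5.1f} min")

# Accuracy gap between the best RoBERTa run and the best KoELECTRA run.
gap = ranked[0][1] - ranked[1][1]
```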

## KoELECTRA By Data

In [None]:
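def MODEL_NAME(suffix: str) -> str:
    # Run name, also used as the checkpoint folder under ./saved/models/
    return 'koelectra_' + suffix

# Rebinds MODEL_PATH (the dict above) to the single checkpoint used by the KoELECTRA runs below
MODEL_PATH = 'jaehyeong/koelectra-base-v3-generalized-sentiment-analysis'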

train_vanila_dataset = load_dataset('csv', data_files='./data/train.csv', split='train')
train_aug_dataset = load_dataset('csv', data_files='./data/train_aug.csv', split='train')
train_cleaned_vanila_dataset = load_dataset('csv', data_files='./data/train_cleaned.csv', split='train')
train_cleaned_aug_dataset = load_dataset('csv', data_files='./data/train_aug_cleaned.csv', split='train')
train_vanila_dataset = train_vanila_dataset.train_test_split(test_size=VALID_SPLIT)
train_aug_dataset = train_aug_dataset.train_test_split(test_size=VALID_SPLIT)
train_cleaned_vanila_dataset = train_cleaned_vanila_dataset.train_test_split(test_size=VALID_SPLIT)
train_cleaned_aug_dataset = train_cleaned_aug_dataset.train_test_split(test_size=VALID_SPLIT)
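The train/validation counts in the logs below (22,500/2,500, 35,989/3,999 and 35,988/3,999) are consistent with `train_test_split` rounding the test share up and giving the remainder to training. The sketch reproduces that arithmetic. The full dataset sizes it uses (25,000, 39,988 and 39,987 rows) are inferred by adding the logged train and validation counts, not read from the CSV files.

```python
import math
from typing import Tuple


def split_sizes(n_rows: int, test_size: float) -> Tuple[int, int]:
    """(train rows, validation rows) for a fractional test_size."""
    # The test share is rounded up; training gets everything that is left.
    n_test = math.ceil(test_size * n_rows)
    return n_rows - n_test, n_test


VALID_SPLIT = 0.1
# Inferred totals: vanilla / cleaned vanilla, augmented, cleaned augmented
for n in (25000, 39988, 39987):
    print(n, split_sizes(n, VALID_SPLIT))
```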

### Vanilla Data

In [None]:
koelectra_vanila_trainer = trainer(train_vanila_dataset, MODEL_NAME('vanila'), MODEL_PATH)

In [None]:
koelectra_vanila_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 1408


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.488 | 0.336751 | 0.6836 |
| 2 | 0.3352 | 0.320104 | 0.698 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_vanila/checkpoint-704
Configuration saved in ./saved/models/koelectra_vanila/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_vanila/checkpoint-704/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_vanila/checkpoint-1408
Configuration saved in ./saved/models/koelectra_vanila/checkpoint-1408/config.json
Model weights saved in ./saved/models/koelectra_vanila/checkpoint-1408/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_vanila/checkpoint-1408 (score: 0.698).


TrainOutput(global_step=1408, training_loss=0.3849831060929732, metrics={'train_runtime': 1032.3314, 'train_samples_per_second': 43.591, 'train_steps_per_second': 1.364, 'total_flos': 2960052526080000.0, 'train_loss': 0.3849831060929732, 'epoch': 2.0})

In [None]:
koelectra_vanila_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 32


{'epoch': 2.0,
 'eval_accuracy': 0.698,
 'eval_loss': 0.3201037645339966,
 'eval_runtime': 20.4758,
 'eval_samples_per_second': 122.095,
 'eval_steps_per_second': 3.858}

### Augmented Data

In [None]:
koelectra_aug_trainer = trainer(train_aug_dataset, MODEL_NAME('aug'), MODEL_PATH)

In [None]:
koelectra_aug_trainer.train()

***** Running training *****
  Num examples = 35989
  Num Epochs = 2
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 2250


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.3749 | 0.331634 | 0.702926 |
| 2 | 0.3324 | 0.320971 | 0.708927 |


***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_aug/checkpoint-1125
Configuration saved in ./saved/models/koelectra_aug/checkpoint-1125/config.json
Model weights saved in ./saved/models/koelectra_aug/checkpoint-1125/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_aug/checkpoint-2250
Configuration saved in ./saved/models/koelectra_aug/checkpoint-2250/config.json
Model weights saved in ./saved/models/koelectra_aug/checkpoint-2250/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_aug/checkpoint-2250 (score: 0.708927231807952).


TrainOutput(global_step=2250, training_loss=0.3871005588107639, metrics={'train_runtime': 1644.359, 'train_samples_per_second': 43.773, 'train_steps_per_second': 1.368, 'total_flos': 4734636904937472.0, 'train_loss': 0.3871005588107639, 'epoch': 2.0})

In [None]:
koelectra_aug_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32


{'epoch': 2.0,
 'eval_accuracy': 0.708927231807952,
 'eval_loss': 0.3209707736968994,
 'eval_runtime': 33.383,
 'eval_samples_per_second': 119.791,
 'eval_steps_per_second': 3.744}
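Because `train_aug_dataset` is split after augmentation, an augmented copy of a sentence can land in validation while its source sentence stays in training, which may flatter the augmented run's accuracy. Whether this happens depends on how `train_aug.csv` was built, which is not shown here. One way to rule it out, assuming each row can be traced back to the original sentence it came from (the `source_ids` list below is hypothetical; the CSV schema is not shown), is to split by source group rather than by row. This is a sketch of a group-aware split, not what the notebook does.

```python
import math
import random
from typing import List, Sequence, Tuple


def group_split(source_ids: Sequence[int], test_size: float,
                seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split row indices so all rows sharing a source id land on the same side.

    `source_ids[i]` is the (hypothetical) id of the original sentence that
    row i was derived from; augmented copies share their source's id.
    """
    groups = sorted(set(source_ids))
    random.Random(seed).shuffle(groups)
    n_test_groups = math.ceil(test_size * len(groups))
    test_groups = set(groups[:n_test_groups])
    train_idx = [i for i, g in enumerate(source_ids) if g not in test_groups]
    test_idx = [i for i, g in enumerate(source_ids) if g in test_groups]
    return train_idx, test_idx


# Toy example: 10 original sentences, some followed by one augmented copy.
ids = [0, 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9]
train_idx, test_idx = group_split(ids, test_size=0.2)
print(len(train_idx), len(test_idx))
```

Row counts per side will no longer match a plain 90/10 split exactly, since whole groups move together.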

### Cleaned Data (Vanilla)

In [None]:
koelectra_cleaned_vanila_trainer = trainer(train_cleaned_vanila_dataset, MODEL_NAME('cleaned_vanila'), MODEL_PATH)

In [None]:
koelectra_cleaned_vanila_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 2
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 704


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.365218 | 0.688 |
| 2 | 0.480400 | 0.329157 | 0.6888 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_cleaned_vanila/checkpoint-352
Configuration saved in ./saved/models/koelectra_cleaned_vanila/checkpoint-352/config.json
Model weights saved in ./saved/models/koelectra_cleaned_vanila/checkpoint-352/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_cleaned_vanila/checkpoint-704
Configuration saved in ./saved/models/koelectra_cleaned_vanila/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_cleaned_vanila/checkpoint-704/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_cleaned_vanila/checkpoint-704 (score: 0.6888).


TrainOutput(global_step=704, training_loss=0.44087453321977094, metrics={'train_runtime': 939.5772, 'train_samples_per_second': 47.894, 'train_steps_per_second': 0.749, 'total_flos': 2960052526080000.0, 'train_loss': 0.44087453321977094, 'epoch': 2.0})

In [None]:
koelectra_cleaned_vanila_trainer.evaluate() # cleaned vanilla

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64


{'epoch': 2.0,
 'eval_accuracy': 0.6888,
 'eval_loss': 0.32915711402893066,
 'eval_runtime': 19.7101,
 'eval_samples_per_second': 126.839,
 'eval_steps_per_second': 2.029}

### Cleaned Data (Augmented)

In [None]:
koelectra_cleaned_aug_trainer = trainer(train_cleaned_aug_dataset, MODEL_NAME('cleaned_aug'), MODEL_PATH)

In [None]:
koelectra_cleaned_aug_trainer.train()

***** Running training *****
  Num examples = 35988
  Num Epochs = 2
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 2250


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.3752 | 0.35233 | 0.671418 |
| 2 | 0.3356 | 0.340606 | 0.684171 |


***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_cleaned/checkpoint-1125
Configuration saved in ./saved/models/koelectra_cleaned/checkpoint-1125/config.json
Model weights saved in ./saved/models/koelectra_cleaned/checkpoint-1125/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/koelectra_cleaned/checkpoint-2250
Configuration saved in ./saved/models/koelectra_cleaned/checkpoint-2250/config.json
Model weights saved in ./saved/models/koelectra_cleaned/checkpoint-2250/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./saved/models/koelectra_cleaned/checkpoint-2250 (score: 0.6841710427606902).


TrainOutput(global_step=2250, training_loss=0.3881656019422743, metrics={'train_runtime': 1638.1853, 'train_samples_per_second': 43.936, 'train_steps_per_second': 1.373, 'total_flos': 4734505347047424.0, 'train_loss': 0.3881656019422743, 'epoch': 2.0})

In [None]:
koelectra_cleaned_aug_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32


{'epoch': 2.0,
 'eval_accuracy': 0.6841710427606902,
 'eval_loss': 0.3406062722206116,
 'eval_runtime': 33.4348,
 'eval_samples_per_second': 119.606,
 'eval_steps_per_second': 3.739}

## 10 Epochs

### KoELECTRA

In [None]:
koelectra_trainer = trainer(train_dataset, MODEL_NAME('epoch10'), MODEL_PATH)

In [None]:
koelectra_trainer.train()

***** Running training *****
  Num examples = 22500
  Num Epochs = 10
  Instantaneous batch size per device = 64
  Total train batch size (w. parallel, distributed & accumulation) = 64
  Gradient Accumulation steps = 1
  Total optimization steps = 3520


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.363845 | 0.698 |
| 2 | 0.480100 | 0.32023 | 0.706 |
| 3 | 0.326900 | 0.31076 | 0.7148 |
| 4 | 0.326900 | 0.313553 | 0.7084 |
| 5 | 0.300700 | 0.313368 | 0.714 |
| 6 | 0.284500 | 0.316001 | 0.7164 |
| 7 | 0.284500 | 0.320307 | 0.7056 |
| 8 | 0.272400 | 0.322899 | 0.7044 |
| 9 | 0.259700 | 0.326668 | 0.698 |


***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_vanila_epoch10/checkpoint-352
Configuration saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-352/config.json
Model weights saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-352/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_vanila_epoch10/checkpoint-704
Configuration saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-704/config.json
Model weights saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-704/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64
Saving model checkpoint to ./saved/models/koelectra_vanila_epoch10/checkpoint-1056
Configuration saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-1056/config.json
Model weights saved in ./saved/models/koelectra_vanila_epoch10/checkpoint-105

TrainOutput(global_step=3168, training_loss=0.3174802233474423, metrics={'train_runtime': 3998.7379, 'train_samples_per_second': 56.268, 'train_steps_per_second': 0.88, 'total_flos': 1.332023636736e+16, 'train_loss': 0.3174802233474423, 'epoch': 9.0})

In [None]:
koelectra_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 2500
  Batch size = 64


{'epoch': 9.0,
 'eval_accuracy': 0.7164,
 'eval_loss': 0.3160010278224945,
 'eval_runtime': 17.3735,
 'eval_samples_per_second': 143.897,
 'eval_steps_per_second': 2.302}
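This run stopped after epoch 9 of 10 (`global_step=3168`, which is 9 × 352), and `evaluate()` reports 0.7164, the epoch-6 accuracy. That is consistent with reloading the best checkpoint at the end and with early stopping at a patience of 3 evaluations (best at epoch 6, then three epochs without improvement), although the `trainer` body that would configure this is not shown. A minimal, library-free sketch of that stopping rule, replayed on the logged accuracies:

```python
from typing import List, Optional, Tuple


def early_stop(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed, for a metric to maximise.

    Training stops once `patience` consecutive evaluations fail to beat
    the best score seen so far; otherwise it runs through all epochs.
    """
    best_score: Optional[float] = None
    best_epoch = 0
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        if best_score is None or score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# Validation accuracy per epoch, copied from the 10-epoch KoELECTRA log
acc = [0.698, 0.706, 0.7148, 0.7084, 0.714, 0.7164, 0.7056, 0.7044, 0.698]
best, stopped = early_stop(acc, patience=3)
# With per-epoch checkpoints at 352 steps/epoch, the best one would be checkpoint-<best_step>.
best_step = best * 352
print(best, stopped, best_step)
```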

### RoBERTa

In [None]:
roberta_trainer = trainer(train_cleaned_aug_dataset, 'roberta_epoch10', 'klue/roberta-large')

In [None]:
roberta_trainer.train(resume_from_checkpoint=True)

Loading model from ./saved/models/roberta_epoch10/checkpoint-1125.
***** Running training *****
  Num examples = 35988
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 11250
  Continuing training from checkpoint, will skip to saved global_step
  Continuing training from epoch 1
  Continuing training from global step 1125
  Will skip the first 1 epochs then the first 0 batches in the first epoch. If this takes a lot of time, you can add the `--ignore_data_skip` flag to your launch command, but you will resume the training on data already seen by your model.



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 2 | 0.3087 | 0.330596 | 0.698675 |
| 3 | 0.2546 | 0.3425 | 0.692673 |
| 4 | 0.2054 | 0.361063 | 0.703926 |
| 5 | 0.1542 | 0.427955 | 0.695674 |


***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-2250
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-2250/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-2250/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-3375
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-3375/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-3375/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-4500
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-4500/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-4500/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
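`train(resume_from_checkpoint=True)` resumes from the latest `checkpoint-N` folder in the output directory, and the "Will skip the first K epochs then the first B batches" line is the global step divided by the steps per epoch (1,125 here: 35,988 examples at batch size 32). The sketch below reproduces both pieces without the library, as a stand-in for `transformers.trainer_utils.get_last_checkpoint` rather than a copy of it. It compares steps numerically, which matters because a plain string comparison would rank `checkpoint-2250` above `checkpoint-10125`.

```python
import os
import re
from typing import Dict, Iterable, Optional, Tuple

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint_name(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint-N name with the largest N, or None if there is none."""
    steps: Dict[str, int] = {}
    for name in names:
        match = _CHECKPOINT_RE.match(name)
        if match:
            steps[name] = int(match.group(1))  # numeric step, not string order
    return max(steps, key=lambda n: steps[n]) if steps else None


def last_checkpoint(output_dir: str) -> Optional[str]:
    """Full path of the latest checkpoint subfolder in output_dir, if any."""
    dirs = [d for d in os.listdir(output_dir)
            if os.path.isdir(os.path.join(output_dir, d))]
    name = latest_checkpoint_name(dirs)
    return None if name is None else os.path.join(output_dir, name)


def resume_position(global_step: int, steps_per_epoch: int) -> Tuple[int, int]:
    """(epochs already completed, batches to skip in the next epoch)."""
    return divmod(global_step, steps_per_epoch)


print(resume_position(1125, 1125))  # (1, 0): first resume in this section
print(resume_position(6750, 1125))  # (6, 0): second resume
```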
  

In [None]:
roberta_trainer.train(resume_from_checkpoint=True)

Loading model from ./saved/models/roberta_epoch10/checkpoint-6750.
***** Running training *****
  Num examples = 35988
  Num Epochs = 10
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 11250
  Continuing training from checkpoint, will skip to saved global_step
  Continuing training from epoch 6
  Continuing training from global step 6750
  Will skip the first 6 epochs then the first 0 batches in the first epoch. If this takes a lot of time, you can add the `--ignore_data_skip` flag to your launch command, but you will resume the training on data already seen by your model.



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 7 | 0.0944 | 0.541915 | 0.687922 |
| 8 | 0.0787 | 0.591995 | 0.689672 |
| 9 | 0.066 | 0.641404 | 0.688422 |


***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-7875
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-7875/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-7875/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-9000
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-9000/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-9000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32
Saving model checkpoint to ./saved/models/roberta_epoch10/checkpoint-10125
Configuration saved in ./saved/models/roberta_epoch10/checkpoint-10125/config.json
Model weights saved in ./saved/models/roberta_epoch10/checkpoint-10125/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)

TrainOutput(global_step=10125, training_loss=0.025756338661099658, metrics={'train_runtime': 7299.2176, 'train_samples_per_second': 49.304, 'train_steps_per_second': 1.541, 'total_flos': 7.652792490242458e+16, 'train_loss': 0.025756338661099658, 'epoch': 9.0})
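The speed metrics in this `TrainOutput` appear to be computed for the whole configured run divided by the wall-clock time of this call only: 10 × 35,988 / 7,299.2 s ≈ 49.304 samples/s and 11,250 / 7,299.2 s ≈ 1.541 steps/s match the logged values. Because this call resumed at step 6,750 and trained only epochs 7 to 9, those figures overstate throughput. Counting just the epochs actually run gives roughly 14.8 samples/s, in line with the 14.419 samples/s of the earlier, non-resumed `roberta_large` run. The sketch checks this arithmetic against the logged numbers.

```python
def samples_per_second(num_examples: int, epochs: float, runtime_s: float) -> float:
    # Examples processed per second of wall-clock training time.
    return num_examples * epochs / runtime_s


# As reported: all 10 configured epochs over the resumed call's runtime.
reported = samples_per_second(35988, 10, 7299.2176)
# Epochs actually trained in this call: (10125 - 6750) steps / 1125 steps per epoch.
epochs_run = (10125 - 6750) / 1125
actual = samples_per_second(35988, epochs_run, 7299.2176)
# Sanity check on a run that was not resumed (koelectra_jaehyeong, 2 epochs).
baseline = samples_per_second(22500, 2, 911.4667)
print(f"reported={reported:.3f}  actual={actual:.1f}  baseline={baseline:.3f}")
```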

In [None]:
roberta_trainer.evaluate()

***** Running Evaluation *****
  Num examples = 3999
  Batch size = 32


{'epoch': 9.0,
 'eval_accuracy': 0.7039259814953739,
 'eval_loss': 0.3610629737377167,
 'eval_runtime': 83.2874,
 'eval_samples_per_second': 48.014,
 'eval_steps_per_second': 1.501}