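### Implementing training interruption and resumption --
* Using all of train.csv for both train and valid, check that the results match between
    * training 5 epochs in a single run
    * training 1 epoch at a time, resuming from the saved checkpoint, for 5 epochs in total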
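For a resumed run to reproduce a single uninterrupted run, the checkpoint generally has to carry more than the model weights. The sketch below is a toy illustration of that idea, not the code inside `run_training`: `build`, `train_one_epoch`, `save_checkpoint`, `load_checkpoint`, and `final_weights` are hypothetical helpers, the linear model stands in for the BERT model, and `torch.randperm` stands in for the shuffling `DataLoader`. It assumes CPU execution; on GPU the CUDA RNG state would need the same treatment.

```python
import os
import tempfile

import torch
from torch import nn


def build(seed):
    """Create a fresh toy model, optimizer and scheduler (stand-ins for the real ones)."""
    torch.manual_seed(seed)
    model = nn.Linear(4, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2, weight_decay=1e-6)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
    return model, optimizer, scheduler


def train_one_epoch(model, optimizer, scheduler, x, y):
    # randperm draws from the global RNG, standing in for DataLoader(shuffle=True).
    for i in torch.randperm(len(x)):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x[i]), y[i])
        loss.backward()
        optimizer.step()
    scheduler.step()


def save_checkpoint(path, model, optimizer, scheduler):
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),  # Adam moments and step counts
            "scheduler": scheduler.state_dict(),  # position on the LR curve
            "rng": torch.get_rng_state(),  # CPU RNG only; CUDA keeps its own state
        },
        path,
    )


def load_checkpoint(path, model, optimizer, scheduler):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    torch.set_rng_state(ckpt["rng"])


def final_weights(resume):
    """Train 2 epochs, optionally saving and reloading everything after epoch 1."""
    gen = torch.Generator().manual_seed(0)
    x = torch.randn(16, 4, generator=gen)
    y = torch.randn(16, 2, generator=gen)
    model, optimizer, scheduler = build(seed=42)
    train_one_epoch(model, optimizer, scheduler, x, y)
    if resume:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "checkpoint.pth")
            save_checkpoint(path, model, optimizer, scheduler)
            # Rebuild from a different seed so that only the checkpoint can explain a match.
            model, optimizer, scheduler = build(seed=123)
            load_checkpoint(path, model, optimizer, scheduler)
    train_one_epoch(model, optimizer, scheduler, x, y)
    return model.weight.detach().clone()
```

Calling `final_weights(resume=False)` and `final_weights(resume=True)` and comparing the two tensors with `torch.allclose` is the toy analogue of the check this notebook performs. Dropping any one of the four checkpoint entries is a quick way to see which piece of state a given mismatch comes from.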

In [1]:
import pandas as pd
import numpy as np

import torch
from transformers import AdamW

from bert_utils import *
from config import *

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.

****** SEED fixed : 42 ******




In [2]:
train_df = pd.read_csv(data_path+"train.csv")

In [3]:
train_df["clean_text"] = train_df["text"].map(clean_text)

In [4]:
tokenizer = define_tokenizer("cl-tohoku/bert-base-japanese")

In [5]:
train_ds = HateSpeechDataset(train_df, tokenizer=tokenizer, max_length=76, num_classes=2, text_col="clean_text", label_name="label")
valid_ds = HateSpeechDataset(train_df, tokenizer=tokenizer, max_length=76, num_classes=2, text_col="clean_text", label_name="label")

In [6]:
train_loader = DataLoader(
    train_ds, batch_size=32, num_workers=2, shuffle=True, pin_memory=True, drop_last=True
)
valid_loader = DataLoader(
    valid_ds, batch_size=64, num_workers=2, shuffle=False, pin_memory=True
)

In [7]:
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)

Some weights of the model checkpoint at cl-tohoku/bert-base-japanese were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


concatenate-4


### Starting from the initial state, resume one epoch at a time up to 5 epochs --

In [8]:
# Define Optimizer and Scheduler --
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-6)
scheduler = fetch_scheduler(optimizer=optimizer, scheduler="None")
model.to(device)

model, history = run_training(
    model, train_loader, valid_loader,
    optimizer, scheduler, 1, device,
    True, 1, "_ALL_epoch1", "./test/02_5epoch_restart/",
    log=None, save_checkpoint=True, load_checkpoint=None
)

concatenate-4
*** *** NOT implemented *** *** 
        --> CosineAnnealingLR *** *** 
[INFO] Using GPU : NVIDIA GeForce RTX 3090



100%|██████████| 164/164 [00:11<00:00, 14.74it/s, Epoch=1, LR=7.6e-6, Train_Loss=0.214] 
100%|██████████| 83/83 [00:05<00:00, 15.12it/s, Epoch=1, LR=7.6e-6, Valid_Loss=0.126]


Valid Loss Improved : inf ---> 0.126232
Model Saved

Training Complete in 0h 0m 21s
Best Loss: 0.1262


In [9]:
# Define Optimizer and Scheduler --
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-6)
scheduler = fetch_scheduler(optimizer=optimizer, scheduler="None")
model.to(device)

model, history = run_training(
    model, train_loader, valid_loader,
    optimizer, scheduler, 1, device,
    True, 1, "_ALL_epoch2", "./test/02_5epoch_restart/",
    log=None, save_checkpoint=True, load_checkpoint="test/02_5epoch_restart/checkpoint-fold_ALL_epoch1.pth"
)

concatenate-4
*** *** NOT implemented *** *** 
        --> CosineAnnealingLR *** *** 
[INFO] Using GPU : NVIDIA GeForce RTX 3090



100%|██████████| 164/164 [00:10<00:00, 15.90it/s, Epoch=2, LR=2.72e-6, Train_Loss=0.115]
100%|██████████| 83/83 [00:05<00:00, 15.57it/s, Epoch=2, LR=2.72e-6, Valid_Loss=0.0732]


Valid Loss Improved : 0.126232 ---> 0.073167
Model Saved

Training Complete in 0h 0m 20s
Best Loss: 0.0732


In [10]:
# Define Optimizer and Scheduler --
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-6)
scheduler = fetch_scheduler(optimizer=optimizer, scheduler="None")
model.to(device)

model, history = run_training(
    model, train_loader, valid_loader,
    optimizer, scheduler, 1, device,
    True, 1, "_ALL_epoch3", "./test/02_5epoch_restart/",
    log=None, save_checkpoint=True, load_checkpoint="test/02_5epoch_restart/checkpoint-fold_ALL_epoch2.pth"
)

concatenate-4
*** *** NOT implemented *** *** 
        --> CosineAnnealingLR *** *** 
[INFO] Using GPU : NVIDIA GeForce RTX 3090



100%|██████████| 164/164 [00:10<00:00, 15.04it/s, Epoch=3, LR=1.06e-7, Train_Loss=0.0797]
100%|██████████| 83/83 [00:05<00:00, 15.21it/s, Epoch=3, LR=1.06e-7, Valid_Loss=0.0641]


Valid Loss Improved : 0.073167 ---> 0.064123
Model Saved

Training Complete in 0h 0m 21s
Best Loss: 0.0641


In [11]:
# Define Optimizer and Scheduler --
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-6)
scheduler = fetch_scheduler(optimizer=optimizer, scheduler="None")
model.to(device)

model, history = run_training(
    model, train_loader, valid_loader,
    optimizer, scheduler, 1, device,
    True, 1, "_ALL_epoch4", "./test/02_5epoch_restart/",
    log=None, save_checkpoint=True, load_checkpoint="test/02_5epoch_restart/checkpoint-fold_ALL_epoch3.pth"
)

concatenate-4
*** *** NOT implemented *** *** 
        --> CosineAnnealingLR *** *** 
[INFO] Using GPU : NVIDIA GeForce RTX 3090



100%|██████████| 164/164 [00:10<00:00, 15.23it/s, Epoch=4, LR=2.29e-6, Train_Loss=0.0733]
100%|██████████| 83/83 [00:05<00:00, 15.31it/s, Epoch=4, LR=2.29e-6, Valid_Loss=0.056] 


Valid Loss Improved : 0.064123 ---> 0.056000
Model Saved

Training Complete in 0h 0m 21s
Best Loss: 0.0560


In [12]:
# Define Optimizer and Scheduler --
model = HateSpeechModel("cl-tohoku/bert-base-japanese", 2, custom_header="concatenate-4", dropout=0.2, n_msd=None)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-6)
scheduler = fetch_scheduler(optimizer=optimizer, scheduler="None")
model.to(device)

model, history = run_training(
    model, train_loader, valid_loader,
    optimizer, scheduler, 1, device,
    True, 1, "_ALL_epoch5", "./test/02_5epoch_restart/",
    log=None, save_checkpoint=True, load_checkpoint="test/02_5epoch_restart/checkpoint-fold_ALL_epoch4.pth"
)

concatenate-4
*** *** NOT implemented *** *** 
        --> CosineAnnealingLR *** *** 
[INFO] Using GPU : NVIDIA GeForce RTX 3090



100%|██████████| 164/164 [00:10<00:00, 15.79it/s, Epoch=5, LR=7.16e-6, Train_Loss=0.0693]
100%|██████████| 83/83 [00:05<00:00, 15.29it/s, Epoch=5, LR=7.16e-6, Valid_Loss=0.0355]


Valid Loss Improved : 0.056000 ---> 0.035544
Model Saved

Training Complete in 0h 0m 18s
Best Loss: 0.0355
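
To decide whether the resumed run matches a single 5-epoch run, the per-epoch validation losses of the two runs can be compared within a small tolerance. The following is a minimal sketch under assumptions: `first_mismatch` and `losses_match` are hypothetical helpers, the tolerances are arbitrary defaults, and the structure of the `history` object returned by `run_training` is not shown in this notebook, so the losses are passed as plain lists. The `resumed` values are the validation losses logged above.

```python
import math


def first_mismatch(run_a, run_b, rel_tol=1e-6, abs_tol=1e-8):
    """Return the 1-based epoch where the two loss sequences first disagree, or None."""
    for epoch, (a, b) in enumerate(zip(run_a, run_b), start=1):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return epoch
    if len(run_a) != len(run_b):
        # One run has extra epochs; the first missing epoch counts as the mismatch.
        return min(len(run_a), len(run_b)) + 1
    return None


def losses_match(run_a, run_b, rel_tol=1e-6, abs_tol=1e-8):
    """True when both runs have the same number of epochs and every loss agrees."""
    return first_mismatch(run_a, run_b, rel_tol=rel_tol, abs_tol=abs_tol) is None


# Validation loss per epoch from the resumed run logged above.
resumed = [0.126232, 0.073167, 0.064123, 0.056000, 0.035544]
```

With the losses of the single 5-epoch run collected in a list of the same form, `losses_match(resumed, single_run)` gives the overall answer, and `first_mismatch` points at the epoch where the two runs diverge. Because the logged values are rounded to six decimals, a tolerance tighter than that rounding would need the unrounded losses.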
