# Overview
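This notebook trains Longformer-base-4096 with 5-fold cross-validation and computes the CV score after postprocessing the predictions with the average probability of each predicted class. Note that the Colab branch of Config below sets model_name to 'roberta-large' with max_length = 512, so the training log shown here comes from RoBERTa-large.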

The companion inference notebook is linked below.

https://www.kaggle.com/ytakayama/inference-pytorch-longformer-5fold

# Reference
The following notebooks were very informative. Thanks.
- https://www.kaggle.com/abhishek/two-longformers-are-better-than-1
- https://www.kaggle.com/cdeotte/pytorch-bigbird-ner-cv-0-615
- https://www.kaggle.com/nbroad/corrected-train-csv-feedback-prize


# How to infer
- Compute the probabilities of the 15 classes with each of the 5 fold models (Longformer).

The 15 classes are OUTPUT_LABELS in the "constants" cell: 14 combinations of the 2 NER tag prefixes (B-/I-) with the 7 discourse elements, plus "other" (`0`).

- Pick the class with the highest probability.
 - Inference on test data: average the probabilities of the 5 fold predictions.
 - Validation on train data: use the probabilities from each fold's own model.
- Postprocess based on the probability of the predicted class and on how many consecutive words share that predicted class.
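
The averaging step in the list above can be sketched as follows. This is a minimal, self-contained illustration rather than the notebook's actual code: the function names `average_folds` and `predict` and the toy probabilities are assumptions made for the example, and it uses 2 folds and 3 classes instead of 5 folds and 15 classes.

```python
def average_folds(fold_probs):
    """Average per-token class probabilities across folds.

    fold_probs[f][t][c] is the probability of class c for token t from fold f.
    """
    n_fold = len(fold_probs)
    n_token = len(fold_probs[0])
    n_class = len(fold_probs[0][0])
    return [
        [sum(fold_probs[f][t][c] for f in range(n_fold)) / n_fold
         for c in range(n_class)]
        for t in range(n_token)
    ]


def predict(avg_probs):
    """Return (class index, probability) of the most likely class per token."""
    return [max(enumerate(p), key=lambda pair: pair[1]) for p in avg_probs]


# Hypothetical toy input: 2 folds, 2 tokens, 3 classes.
fold_probs = [
    [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]],  # fold 0
    [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]],  # fold 1
]
avg_probs = average_folds(fold_probs)
preds = predict(avg_probs)
print([cls for cls, _ in preds])  # → [0, 1]
```

For validation there is nothing to average, since each training essay is scored only by the fold model that did not see it.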

# Customizing this notebook for Google Colab
- Prepare the data: the competition data and the following dataset

https://www.kaggle.com/nbroad/corrected-train-csv-feedback-prize
- Set the directory variables of the Config class (e.g. data_dir)

## setup environment

In [None]:
import os

if os.environ.get("KAGGLE_KERNEL_RUN_TYPE") is None:
    ON_KAGGLE = False
else:
    ON_KAGGLE = True
if not ON_KAGGLE:
    import shutil
    from requests import get

    from google.colab import drive, files
    # mount Google Drive
    drive.mount("/content/drive")
    %cd drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/main/
    !pip install  -qq sentencepiece transformers torch==1.9.1 torchvision==0.10.1 torchAudio==0.9.1 torchtext==0.10.1
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/main


Config

In [None]:
class Config:
    name = 'fb_nb011'
    model_savename = 'roberta-large'
    if ON_KAGGLE:
        model_name = '../input/pt-longformer-base' # https://www.kaggle.com/kishalmandal/pt-longformer-base
        # base_dir = '/content/drive/MyDrive/petfinder'
        data_dir = '../input/feedback-prize-2021/'
        pre_data_dir = './preprocessed/'
        model_dir = '.'
        output_dir = '.'
    else:
        # customize for my own Google Colab Environment
        model_name = 'roberta-large'
        # model_name = 'allenai/longformer-base-4096' # download from Internet
        base_dir = '/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/'
        data_dir = os.path.join(base_dir, 'input/feedback-prize-2021/')
        pre_data_dir = os.path.join(base_dir, 'data/preprocessed')
        model_dir = os.path.join(base_dir, f'model/{name}')
        output_dir = os.path.join(base_dir, f'output/{name}')
    is_debug = False
    load_texts = True
    n_epoch = 3 # not to exceed runtime limits on Kaggle
    n_fold = 5
    verbose_steps = 500
    random_seed = 71
    max_length = 512
    train_batch_size = 4
    valid_batch_size = 4
    lr = 5e-5
    num_labels = 15
    label_subtokens = True
    output_hidden_states = True
    hidden_dropout_prob = 0.1
    layer_norm_eps = 1e-7
    add_pooling_layer = False
    if is_debug:
        debug_sample = 1000
        verbose_steps = 16
        n_epoch = 1
        n_fold = 2

constants

In [None]:
IGNORE_INDEX = -100
NON_LABEL = -1
OUTPUT_LABELS = ['0', 'B-Lead', 'I-Lead', 'B-Position', 'I-Position', 'B-Claim', 'I-Claim', 'B-Counterclaim', 'I-Counterclaim', 
                 'B-Rebuttal', 'I-Rebuttal', 'B-Evidence', 'I-Evidence', 'B-Concluding Statement', 'I-Concluding Statement']
LABELS_TO_IDS = {v:k for k,v in enumerate(OUTPUT_LABELS)}
IDS_TO_LABELS = {k:v for k,v in enumerate(OUTPUT_LABELS)}

MIN_THRESH = {
    "I-Lead": 9,
    "I-Position": 5,
    "I-Evidence": 14,
    "I-Claim": 3,
    "I-Concluding Statement": 11,
    "I-Counterclaim": 6,
    "I-Rebuttal": 4,
}

PROB_THRESH = {
    "I-Lead": 0.7,
    "I-Position": 0.55,
    "I-Evidence": 0.65,
    "I-Claim": 0.55,
    "I-Concluding Statement": 0.7,
    "I-Counterclaim": 0.5,
    "I-Rebuttal": 0.55,
}
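
As a small illustration of how the 15 labels relate to the 7 discourse elements, the sketch below rebuilds the same list programmatically. The names `ELEMENTS`, `labels`, `labels_to_ids` and `ids_to_labels` are introduced only for this example; the notebook itself writes OUTPUT_LABELS out by hand.

```python
# The seven discourse elements, in the order used by OUTPUT_LABELS.
ELEMENTS = ['Lead', 'Position', 'Claim', 'Counterclaim',
            'Rebuttal', 'Evidence', 'Concluding Statement']

# '0' marks words outside any discourse element;
# every element contributes one B- (begin) and one I- (inside) tag.
labels = ['0']
for element in ELEMENTS:
    labels += [f'B-{element}', f'I-{element}']

labels_to_ids = {label: i for i, label in enumerate(labels)}
ids_to_labels = {i: label for label, i in labels_to_ids.items()}

print(len(labels))               # → 15
print(labels_to_ids['I-Claim'])  # → 6
```

MIN_THRESH and PROB_THRESH are keyed by the I- tags only because the postprocessing step rewrites every B- tag to its I- counterpart before grouping words into spans.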

In [None]:
if not ON_KAGGLE:
    if not os.path.exists(Config.model_dir):
        os.makedirs(Config.model_dir, exist_ok=True)
    if not os.path.exists(Config.output_dir):
        os.makedirs(Config.output_dir, exist_ok=True)

### libraries

In [None]:
# if not ON_KAGGLE:
#     !pip install -qq transformers

In [None]:
# general
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
import random
from tqdm.notebook import tqdm
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
import gc
from collections import defaultdict
# nlp
from sklearn.feature_extraction.text import CountVectorizer
import torch
import torch.nn as nn
from transformers import LongformerConfig, LongformerModel, LongformerTokenizerFast
from transformers import AutoConfig, AutoModel, AutoTokenizer
from torch.utils.data import Dataset, DataLoader
from torch.cuda.amp import autocast, GradScaler

## preprocess
Use the corrected train.csv from the notebook below.

https://www.kaggle.com/nbroad/corrected-train-csv-feedback-prize/notebook

In [None]:
if ON_KAGGLE:
    df_alltrain = pd.read_csv('../input/corrected-train-csv-feedback-prize/corrected_train.csv')
else:
    df_alltrain = pd.read_csv(f'{Config.data_dir}/corrected_train.csv')

In [None]:
def agg_essays(train_flg):
    folder = 'train' if train_flg else 'test'
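    names, texts = [], []
    for f in tqdm(list(os.listdir(f'{Config.data_dir}/{folder}'))):
        names.append(f.replace('.txt', ''))
        # close each file promptly instead of leaking the handle
        with open(f'{Config.data_dir}/{folder}/' + f, 'r') as fp:
            texts.append(fp.read())
    # build the DataFrame once, after all essays have been read
    df_texts = pd.DataFrame({'id': names, 'text': texts})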

    df_texts['text_split'] = df_texts.text.str.split()
    print('Completed tokenizing texts.')
    return df_texts

In [None]:
def ner(df_texts, df_train):
    all_entities = []
    for _,  row in tqdm(df_texts.iterrows(), total=len(df_texts)):
        total = len(row['text_split'])
        entities = ['0'] * total

        for _, row2 in df_train[df_train['id'] == row['id']].iterrows():
            discourse = row2['discourse_type']
            list_ix = [int(x) for x in row2['predictionstring'].split(' ')]
            entities[list_ix[0]] = f'B-{discourse}'
            for k in list_ix[1:]: entities[k] = f'I-{discourse}'
        all_entities.append(entities)

    df_texts['entities'] = all_entities
    print('Completed mapping discourse to each token.')
    return df_texts
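
To make the tagging rule of `ner` concrete, here is a toy version that works on plain lists instead of DataFrames. The helper name `tag_words` and the example annotations are hypothetical; only the B-/I- assignment logic mirrors the cell above.

```python
def tag_words(n_words, annotations):
    """Build one BIO tag per word from (discourse_type, predictionstring) pairs."""
    entities = ['0'] * n_words
    for discourse_type, predictionstring in annotations:
        word_ix = [int(x) for x in predictionstring.split()]
        entities[word_ix[0]] = f'B-{discourse_type}'   # first word of the span
        for k in word_ix[1:]:
            entities[k] = f'I-{discourse_type}'        # remaining words
    return entities


# Hypothetical essay of 7 words with two annotated spans.
tags = tag_words(7, [('Lead', '0 1 2'), ('Claim', '4 5')])
print(tags)
# → ['B-Lead', 'I-Lead', 'I-Lead', '0', 'B-Claim', 'I-Claim', '0']
```

Words that no annotation covers keep the `'0'` tag, which is how the "other" class arises.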

In [None]:
if not Config.load_texts:    
    def preprocess(df_train = None):
        if df_train is None:
            train_flg = False
        else:
            train_flg = True
        
        df_texts = agg_essays(train_flg)

        if train_flg:
            df_texts = ner(df_texts, df_train)
        return df_texts
    
    alltrain_texts = preprocess(df_alltrain)
    test_texts = preprocess()
    # alltrain_texts.to_pickle('../input/fb-data/alltrain_texts_correct.pkl')
    # test_texts.to_pickle('../input/fb-data/test_texts_correct.pkl')
else:
    alltrain_texts = pd.read_pickle('../input/fb-data/alltrain_texts_correct.pkl')
    test_texts = pd.read_pickle('../input/fb-data/test_texts_correct.pkl')

In [None]:
if Config.is_debug:
    alltrain_texts = alltrain_texts.sample(Config.debug_sample).reset_index(drop=True)
print(len(alltrain_texts))

15594


set seed & split train/validation folds

In [None]:
def seed_everything(seed=Config.random_seed):
    # os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed%(2**32-1))
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic=True
    torch.backends.cudnn.benchmark = False

seed_everything()
# device optimization
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

print(f'Using device: {device}')

Using device: cuda


In [None]:
def split_fold(df_train):
    ids = df_train['id'].unique()
    kf = KFold(n_splits=Config.n_fold, shuffle = True, random_state=Config.random_seed)
    for i_fold, (_, valid_index) in enumerate(kf.split(ids)):
        df_train.loc[valid_index,'fold'] = i_fold
    return df_train

alltrain_texts = split_fold(alltrain_texts)
alltrain_texts.head()

| | id | text | text_split | entities | fold |
|---|---|---|---|---|---|
| 0 | F48EF80D2ED3 | There are many programs in the world around yo... | [There, are, many, programs, in, the, world, a... | [B-Lead, I-Lead, I-Lead, I-Lead, I-Lead, I-Lea... | 0.0 |
| 1 | F8FB4470A52F | Dear Senator,\n\n"The Electoral College is a p... | [Dear, Senator,, "The, Electoral, College, is,... | [0, 0, B-Lead, I-Lead, I-Lead, I-Lead, I-Lead,... | 0.0 |
| 2 | F176A8CF72BB | In my opinion i don't think that is fair. i th... | [In, my, opinion, i, don't, think, that, is, f... | [B-Position, I-Position, I-Position, I-Positio... | 4.0 |
| 3 | EBDE7FC748A4 | Unmasking the Face\n\nThe face on Mars was rea... | [Unmasking, the, Face, The, face, on, Mars, wa... | [0, 0, 0, B-Position, I-Position, I-Position, ... | 3.0 |
| 4 | F6C40C564E5E | Luke think you should join the seagoing cowboy... | [Luke, think, you, should, join, the, seagoing... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, B-Claim, I-Clai... | 4.0 |
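
The idea behind the fold column can be sketched without scikit-learn. The snippet below is a stand-in for `KFold(shuffle=True)`, not a reimplementation: it shuffles the ids once with a fixed seed and deals them round-robin into folds, which gives the same fold sizes but not the same assignment. `assign_folds` and the essay ids are hypothetical.

```python
import random


def assign_folds(ids, n_fold, seed):
    """Shuffle the ids once, then deal them into n_fold groups of near-equal size."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    return {essay_id: i % n_fold for i, essay_id in enumerate(shuffled)}


essay_ids = [f'essay_{i:02d}' for i in range(12)]  # hypothetical ids
folds = assign_folds(essay_ids, n_fold=5, seed=71)
sizes = [list(folds.values()).count(f) for f in range(5)]
print(sizes)  # → [3, 3, 2, 2, 2]
```

One point worth keeping in mind about `split_fold` above: it writes the fold number with `df_train.loc[valid_index, 'fold']`, where `valid_index` holds positions in the array of unique ids. That appears to assume one row per essay id and a default 0..n-1 index, which holds for the per-essay table built by `agg_essays`.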


## dataset

In [None]:
class FeedbackPrizeDataset(Dataset):
    def __init__(self, dataframe, tokenizer, max_len, has_labels):
        self.len = len(dataframe)
        self.data = dataframe
        self.tokenizer = tokenizer
        self.max_len = max_len
        self.has_labels = has_labels
    
    def __getitem__(self, index):
        text = self.data['text'][index]
        encoding = self.tokenizer(
            text.split(),
            is_split_into_words = True,
            padding = 'max_length',
            truncation = True,
            max_length = self.max_len
        )
        word_ids = encoding.word_ids()

        # targets
        if self.has_labels:
            word_labels = self.data.entities[index]
            prev_word_idx = None
            labels_ids = []
            for word_idx in word_ids:
                if word_idx is None:
                    labels_ids.append(IGNORE_INDEX)
                elif word_idx != prev_word_idx:
                    labels_ids.append(LABELS_TO_IDS[word_labels[word_idx]])
                else:
                    if Config.label_subtokens:
                        labels_ids.append(LABELS_TO_IDS[word_labels[word_idx]])
                    else:
                        labels_ids.append(IGNORE_INDEX)
                prev_word_idx = word_idx
            encoding['labels'] = labels_ids
        # convert to torch.tensor
        item = {k: torch.as_tensor(v) for k, v in encoding.items()}
        word_ids2 = [w if w is not None else NON_LABEL for w in word_ids]
        item['word_ids'] = torch.as_tensor(word_ids2)
        return item

    def __len__(self):
        return self.len
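
The label alignment inside `__getitem__` can be followed on a hand-written `word_ids` list, without loading a tokenizer. This is a sketch under assumptions: `align_labels` is a hypothetical helper that mirrors the loop above, and the `word_ids` values are made up to resemble what a fast tokenizer returns (None for special and padding tokens, a repeated index for sub-tokens of one word).

```python
IGNORE_INDEX = -100


def align_labels(word_ids, word_label_ids, label_subtokens=True):
    """Expand word-level label ids to token level using the tokenizer's word_ids."""
    aligned = []
    prev_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            aligned.append(IGNORE_INDEX)               # special or padding token
        elif word_idx != prev_word_idx or label_subtokens:
            aligned.append(word_label_ids[word_idx])   # first sub-token, or all sub-tokens
        else:
            aligned.append(IGNORE_INDEX)               # later sub-token, left unlabeled
        prev_word_idx = word_idx
    return aligned


# Hypothetical sequence: <s>, word 0, word 1 split into two sub-tokens, </s>, padding.
word_ids = [None, 0, 1, 1, None, None]
word_label_ids = [1, 2]  # e.g. B-Lead, I-Lead

print(align_labels(word_ids, word_label_ids, label_subtokens=True))
# → [-100, 1, 2, 2, -100, -100]
print(align_labels(word_ids, word_label_ids, label_subtokens=False))
# → [-100, 1, 2, -100, -100, -100]
```

With `Config.label_subtokens = True`, as in this notebook, every sub-token of a word inherits the word's label, so long words contribute more than once to the loss.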

## model

In [None]:
class FeedbackModel(nn.Module):
    def __init__(self):
        super(FeedbackModel, self).__init__()
        model_config = AutoConfig.from_pretrained(Config.model_name)
        self.model_config = model_config
        self.backbone = AutoModel.from_pretrained(Config.model_name, config=model_config)
        self.head = nn.Linear(model_config.hidden_size, Config.num_labels)
    
    def forward(self, input_ids, mask):
        # x[0] is the last hidden state: (batch, seq_len, hidden_size)
        x = self.backbone(input_ids=input_ids, attention_mask=mask)
        logits = self.head(x[0])
        return logits

## utility function

In [None]:
def active_logits(raw_logits, word_ids):
    word_ids = word_ids.view(-1)
    active_mask = word_ids.unsqueeze(1).expand(word_ids.shape[0], Config.num_labels)
    active_mask = active_mask != NON_LABEL
    active_logits = raw_logits.view(-1, Config.num_labels)
    active_logits = torch.masked_select(active_logits, active_mask) # return 1dTensor
    active_logits = active_logits.view(-1, Config.num_labels) 
    return active_logits

def active_labels(labels):
    active_mask = labels.view(-1) != IGNORE_INDEX
    active_labels = torch.masked_select(labels.view(-1), active_mask)
    return active_labels

def active_preds_prob(active_logits):
    active_preds = torch.argmax(active_logits, dim = 1)
    # note: this is the max raw logit per token, not a softmax probability
    active_preds_prob, _ = torch.max(active_logits, dim = 1)
    return active_preds, active_preds_prob
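
The two masking helpers above filter on different markers: `active_logits` drops positions whose word id is NON_LABEL, while `active_labels` drops positions whose label is IGNORE_INDEX. The list-based sketch below (hypothetical helpers `active_rows` and `active_targets`, toy data) suggests that the two selections line up when every sub-token is labeled, and appear to diverge when only the first sub-token of each word is labeled.

```python
NON_LABEL = -1
IGNORE_INDEX = -100


def active_rows(token_logits, word_ids):
    """Keep the logit rows of tokens that belong to a word."""
    return [row for row, w in zip(token_logits, word_ids) if w != NON_LABEL]


def active_targets(labels):
    """Keep the labels that are not marked as ignored."""
    return [y for y in labels if y != IGNORE_INDEX]


# Hypothetical sequence: special token, word 0, word 1 (two sub-tokens), padding.
word_ids = [-1, 0, 1, 1, -1]
token_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
labels_all = [-100, 0, 1, 1, -100]       # as with label_subtokens = True
labels_first = [-100, 0, 1, -100, -100]  # as with label_subtokens = False

print(len(active_rows(token_logits, word_ids)), len(active_targets(labels_all)))
# → 3 3
print(len(active_rows(token_logits, word_ids)), len(active_targets(labels_first)))
# → 3 2
```

Under that reading, switching `Config.label_subtokens` to False would likely require masking the logits by label as well, so that the loss receives tensors of equal length.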

## evaluating function

In [None]:
def calc_overlap(row):
    """
    calculate the overlap between prediction and ground truth
    """
    set_pred = set(row.new_predictionstring_pred.split(' '))
    set_gt = set(row.new_predictionstring_gt.split(' '))
    # sizes of each set and of their intersection
    len_pred = len(set_pred)
    len_gt = len(set_gt)
    intersection = len(set_gt.intersection(set_pred))
    overlap_1 = intersection / len_gt
    overlap_2 = intersection / len_pred
    return [overlap_1, overlap_2]

def score_feedback_comp(pred_df, gt_df):
    """
    A function that scores for the kaggle
        Student Writing Competition
        
    Uses the steps in the evaluation page here:
        https://www.kaggle.com/c/feedback-prize-2021/overview/evaluation
    """
    gt_df = gt_df[['id', 'discourse_type', 'new_predictionstring']].reset_index(drop = True).copy()
    pred_df = pred_df[['id', 'class', 'new_predictionstring']].reset_index(drop = True).copy()
    gt_df['gt_id'] = gt_df.index
    pred_df['pred_id'] = pred_df.index
    joined = pred_df.merge(
        gt_df,
        left_on = ['id', 'class'],
        right_on = ['id', 'discourse_type'],
        how = 'outer',
        suffixes = ['_pred', '_gt']
    )
    joined['new_predictionstring_gt'] =  joined['new_predictionstring_gt'].fillna(' ')
    joined['new_predictionstring_pred'] =  joined['new_predictionstring_pred'].fillna(' ')
    joined['overlaps'] = joined.apply(calc_overlap, axis = 1)
    # A prediction is a potential true positive when both overlaps are >= 0.5.
    # If multiple overlaps exist, the highest is taken.
    joined['overlap1'] = joined['overlaps'].apply(lambda x: x[0])
    joined['overlap2'] = joined['overlaps'].apply(lambda x: x[1])

    joined['potential_TP'] = (joined['overlap1'] >= 0.5) & (joined['overlap2'] >= 0.5)
    joined['max_overlap'] = joined[['overlap1', 'overlap2']].max(axis = 1)
    tp_pred_ids = joined.query('potential_TP').sort_values('max_overlap', ascending = False)\
                  .groupby(['id', 'new_predictionstring_gt']).first()['pred_id'].values
    
    fp_pred_ids = [p for p in joined['pred_id'].unique() if p not in tp_pred_ids]
    matched_gt_ids = joined.query('potential_TP')['gt_id'].unique()
    unmatched_gt_ids = [c for c in joined['gt_id'].unique() if c not in matched_gt_ids]

    TP = len(tp_pred_ids)
    FP = len(fp_pred_ids)
    FN = len(unmatched_gt_ids)
    macro_f1_score = TP / (TP + 1/2 * (FP + FN))
    return macro_f1_score

def oof_score(df_val, oof):
    f1score = []
    classes = ['Lead', 'Position','Claim', 'Counterclaim', 'Rebuttal','Evidence','Concluding Statement']
    for c in classes:
        pred_df = oof.loc[oof['class'] == c].copy()
        gt_df = df_val.loc[df_val['discourse_type'] == c].copy()
        f1 = score_feedback_comp(pred_df, gt_df)
        print(f'{c:<10}: {f1:4f}')
        f1score.append(f1)
    f1avg = np.mean(f1score)
    return f1avg
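
A worked example of the matching rule and of the F1 formula used above may help. The helpers `overlaps`, `is_true_positive` and `f1` are written for this illustration only and assume non-empty prediction strings; the counts passed to `f1` are made up.

```python
def overlaps(pred_string, gt_string):
    """Return (share of ground truth covered, share of prediction covered)."""
    pred = set(pred_string.split())
    gt = set(gt_string.split())
    inter = len(pred & gt)
    return inter / len(gt), inter / len(pred)


def is_true_positive(pred_string, gt_string):
    """A match needs at least 50% overlap in both directions."""
    o_gt, o_pred = overlaps(pred_string, gt_string)
    return o_gt >= 0.5 and o_pred >= 0.5


def f1(tp, fp, fn):
    """F1 for one class, as in score_feedback_comp: TP / (TP + (FP + FN) / 2)."""
    return tp / (tp + 0.5 * (fp + fn))


# Prediction covers words 3-6, ground truth covers words 4-9: 3 shared words.
print(overlaps('3 4 5 6', '4 5 6 7 8 9'))          # → (0.5, 0.75)
print(is_true_positive('3 4 5 6', '4 5 6 7 8 9'))  # → True
print(is_true_positive('3 4', '4 5 6 7 8 9'))      # → False
print(f1(tp=3, fp=1, fn=1))                        # → 0.75
```

`oof_score` and `valid_fn` compute this F1 separately for each of the 7 classes and then take the unweighted mean, which is what makes the final number a macro F1.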

## inferencing function

In [None]:
def inference(model, dl, criterion, valid_flg):
    final_predictions = []
    final_predictions_prob = []
    stream = tqdm(dl)
    model.eval()
    
    valid_loss = 0
    valid_accuracy = 0
    all_logits = None
    for batch_idx, batch in enumerate(stream, start = 1):
        ids = batch['input_ids'].to(device, dtype = torch.long)
        mask = batch['attention_mask'].to(device, dtype = torch.long)
        with torch.no_grad():
            raw_logits = model(input_ids=ids, mask = mask)
        del ids, mask
        
        word_ids = batch['word_ids'].to(device, dtype = torch.long)
        if valid_flg:    
            raw_labels = batch['labels'].to(device, dtype = torch.long)
            logits = active_logits(raw_logits, word_ids)
            labels = active_labels(raw_labels)
            preds, preds_prob = active_preds_prob(logits)
            valid_accuracy += accuracy_score(labels.cpu().numpy(), preds.cpu().numpy())
            loss = criterion(logits, labels)
            valid_loss += loss.item()
        
        if batch_idx == 1:
            all_logits = raw_logits.cpu().numpy()
        else:
            all_logits = np.append(all_logits, raw_logits.cpu().numpy(), axis=0)

    
    if valid_flg:        
        epoch_loss = valid_loss / batch_idx
        epoch_accuracy = valid_accuracy / batch_idx
    else:
        epoch_loss, epoch_accuracy = 0, 0
    return all_logits, epoch_loss, epoch_accuracy


def preds_class_prob(all_logits, dl):
    print("predict target class and its probabilty")
    final_predictions = []
    final_predictions_score = []
    stream = tqdm(dl)
    len_sample = all_logits.shape[0]

    for batch_idx, batch in enumerate(stream, start=0):
        for minibatch_idx in range(Config.valid_batch_size):
            sample_idx = int(batch_idx * Config.valid_batch_size + minibatch_idx)
            if sample_idx > len_sample - 1 : break
            word_ids = batch['word_ids'][minibatch_idx].numpy()
            predictions =[]
            predictions_prob = []
            pred_class_id = np.argmax(all_logits[sample_idx], axis=1)
            pred_score = np.max(all_logits[sample_idx], axis=1)
            pred_class_labels = [IDS_TO_LABELS[i] for i in pred_class_id]
            prev_word_idx = -1
            for idx, word_idx in enumerate(word_ids):
                if word_idx == -1:
                    pass
                elif word_idx != prev_word_idx:
                    predictions.append(pred_class_labels[idx])
                    predictions_prob.append(pred_score[idx])
                    prev_word_idx = word_idx
            final_predictions.append(predictions)
            final_predictions_score.append(predictions_prob)
    return final_predictions, final_predictions_score
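
As written, `preds_class_prob` takes `np.max` over the model output, and `FeedbackModel.forward` returns the raw output of the linear head, so the "probability" compared against PROB_THRESH later appears to be a raw logit. If true probabilities are wanted, a softmax can be applied first. The sketch below shows the difference on one token, using only the standard library; the logit values are made up.

```python
import math


def softmax(logits):
    """Convert raw class scores for one token into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


token_logits = [4.0, 1.0, 0.5]  # hypothetical raw scores for 3 classes
probs = softmax(token_logits)

print(max(token_logits))  # raw score, not bounded by 1
print(max(probs))         # probability of the same class, between 0 and 1
```

The argmax is the same either way, so the predicted classes do not change; only the score that the thresholds are compared against does.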

In [None]:
def get_preds_onefold(model, df, dl, criterion, valid_flg):
    logits, valid_loss, valid_acc = inference(model, dl, criterion, valid_flg)
    all_preds, all_preds_prob = preds_class_prob(logits, dl)
    df_pred = post_process_pred(df, all_preds, all_preds_prob)
    return df_pred, valid_loss, valid_acc

def get_preds_folds(df, dl, criterion, valid_flg=False):
    for i_fold in range(Config.n_fold):
        model_filename = os.path.join(Config.model_dir, f"{Config.model_savename}_{i_fold}.bin")
        print(f"{model_filename} inference")
        model = FeedbackModel()
        model = model.to(device)
        model.load_state_dict(torch.load(model_filename))
        logits, valid_loss, valid_acc = inference(model, dl, criterion, valid_flg)
        if i_fold == 0:
            avg_pred_logits = logits
        else:
            avg_pred_logits += logits
    avg_pred_logits /= Config.n_fold
    all_preds, all_preds_prob = preds_class_prob(avg_pred_logits, dl)
    df_pred = post_process_pred(df, all_preds, all_preds_prob)
    return df_pred

def post_process_pred(df, all_preds, all_preds_prob):
    final_preds = []
    for i in range(len(df)):
        idx = df.id.values[i]
        pred = all_preds[i]
        pred_prob = all_preds_prob[i]
        j = 0
        while j < len(pred):
            cls = pred[j]
            if cls == '0': j += 1
            else: cls = cls.replace('B', 'I')
            end = j + 1
            while end < len(pred) and pred[end] == cls:
                end += 1
            if cls != '0' and cls !='':
                avg_score = np.mean(pred_prob[j:end])
                if end - j > MIN_THRESH[cls] and avg_score > PROB_THRESH[cls]:
                    final_preds.append((idx, cls.replace('I-', ''), ' '.join(map(str, list(range(j, end))))))
            j = end
    # pass the column names up front so that an empty prediction list still works
    df_pred = pd.DataFrame(final_preds, columns=['id', 'class', 'new_predictionstring'])
    return df_pred
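
The span filtering in `post_process_pred` can be illustrated on one short essay. The function below is a simplified variant written for this example, not a line-for-line copy: it skips `'0'` words one at a time and takes the thresholds as arguments. `extract_spans` and the toy predictions are hypothetical; the threshold values are taken from MIN_THRESH and PROB_THRESH.

```python
def extract_spans(pred, score, min_len, min_score):
    """Group consecutive same-class word predictions into spans and filter them."""
    spans = []
    j = 0
    while j < len(pred):
        if pred[j] == '0':
            j += 1
            continue
        cls = pred[j].replace('B-', 'I-')  # treat B- and I- as one class
        end = j + 1
        while end < len(pred) and pred[end] == cls:
            end += 1
        avg_score = sum(score[j:end]) / (end - j)
        # keep the span only if it is long enough and confident enough
        if end - j > min_len[cls] and avg_score > min_score[cls]:
            spans.append((cls[2:], ' '.join(map(str, range(j, end)))))
        j = end
    return spans


pred = ['B-Claim', 'I-Claim', 'I-Claim', 'I-Claim', '0', 'B-Lead', 'I-Lead']
score = [0.9, 0.8, 0.7, 0.6, 0.9, 0.9, 0.9]
min_len = {'I-Claim': 3, 'I-Lead': 9}
min_score = {'I-Claim': 0.55, 'I-Lead': 0.7}

spans = extract_spans(pred, score, min_len, min_score)
print(spans)  # → [('Claim', '0 1 2 3')]
```

The Claim span has 4 words (more than 3) and an average score of 0.75, so it survives; the Lead span is confident but only 2 words long, so it is dropped.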

## training and validating function

In [None]:
def train_fn(model, dl_train, optimizer, epoch, criterion):
    model.train()
    train_loss = 0
    train_accuracy = 0
    stream = tqdm(dl_train)
    scaler = GradScaler()

    for batch_idx, batch in enumerate(stream, start = 1):
        ids = batch['input_ids'].to(device, dtype = torch.long)
        mask = batch['attention_mask'].to(device, dtype = torch.long)
        raw_labels = batch['labels'].to(device, dtype = torch.long)
        word_ids = batch['word_ids'].to(device, dtype = torch.long)
        optimizer.zero_grad()
        with autocast():
            raw_logits = model(input_ids = ids, mask = mask)
        
        logits = active_logits(raw_logits, word_ids)
        labels = active_labels(raw_labels)
        preds, preds_prob = active_preds_prob(logits)
        train_accuracy += accuracy_score(labels.cpu().numpy(), preds.cpu().numpy())
        # use the criterion passed in by the caller
        loss = criterion(logits, labels)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        train_loss += loss.item()
        
        if batch_idx % Config.verbose_steps == 0:
            loss_step = train_loss / batch_idx
            print(f'Training loss after {batch_idx:04d} training steps: {loss_step}')
            
    epoch_loss = train_loss / batch_idx
    epoch_accuracy = train_accuracy / batch_idx
    del dl_train, raw_logits, logits, raw_labels, preds, labels
    torch.cuda.empty_cache()
    gc.collect()
    print(f'epoch {epoch} - training loss: {epoch_loss:.4f}')
    print(f'epoch {epoch} - training accuracy: {epoch_accuracy:.4f}')
    return epoch_loss

In [None]:
def valid_fn(model, df_val, df_val_eval, dl_val, epoch, criterion):
    oof, valid_loss, valid_acc  = get_preds_onefold(model, df_val, dl_val, criterion, valid_flg=True)
    f1score =[]
    # classes = oof['class'].unique()
    classes = ['Lead', 'Position', 'Claim','Counterclaim', 'Rebuttal','Evidence','Concluding Statement']
    print(f"Validation F1 scores")

    for c in classes:
        pred_df = oof.loc[oof['class'] == c].copy()
        gt_df = df_val_eval.loc[df_val_eval['discourse_type'] == c].copy()
        f1 = score_feedback_comp(pred_df, gt_df)
        print(f' * {c:<10}: {f1:4f}')
        f1score.append(f1)
    f1avg = np.mean(f1score)
    print(f'Overall Validation avg F1: {f1avg:.4f} val_loss:{valid_loss:.4f} val_accuracy:{valid_acc:.4f}')
    return valid_loss, oof

## training loop



In [None]:
start_time = time.time()

oof = pd.DataFrame()
for i_fold in range(Config.n_fold):
    print('='*50, f'fold{i_fold} training', '='*50)
    tokenizer = AutoTokenizer.from_pretrained(Config.model_name, add_prefix_space = True)
    model = FeedbackModel()
    model = model.to(device)
    optimizer = torch.optim.Adam(params=model.parameters(), lr=Config.lr)
    
    df_train = alltrain_texts[alltrain_texts['fold'] != i_fold].reset_index(drop = True)
    ds_train = FeedbackPrizeDataset(df_train, tokenizer, Config.max_length, True)
    df_val = alltrain_texts[alltrain_texts['fold'] == i_fold].reset_index(drop = True)
    val_idlist = df_val['id'].unique().tolist()
    df_val_eval = df_alltrain.query('id==@val_idlist').reset_index(drop=True)
    ds_val = FeedbackPrizeDataset(df_val, tokenizer, Config.max_length, True)
    dl_train = DataLoader(ds_train, batch_size=Config.train_batch_size, shuffle=True, num_workers=2, pin_memory=True)
    dl_val = DataLoader(ds_val, batch_size=Config.valid_batch_size, shuffle=False, num_workers=2, pin_memory=True)

    best_val_loss = np.inf
    criterion = nn.CrossEntropyLoss()

    train_loss_history = []
    valid_loss_history = []

    for epoch in range(1, Config.n_epoch + 1):
        train_loss = train_fn(model, dl_train, optimizer, epoch, criterion) # train
        train_loss_history.append(train_loss) # save train loss

        valid_loss, _oof = valid_fn(model, df_val, df_val_eval, dl_val, epoch, criterion) # validation
        valid_loss_history.append(valid_loss) # save valid loss
        if valid_loss < best_val_loss:
            best_val_loss = valid_loss
            _oof_fold_best = _oof
            _oof_fold_best['fold'] = i_fold
            model_filename = f'{Config.model_dir}/{Config.model_savename}_{i_fold}.bin'
            torch.save(model.state_dict(), model_filename)
            print(f'{model_filename} saved')
    
    # plot the loss curves
    fig, ax = plt.subplots(1,1, figsize=(10,6))
    sns.lineplot(data=train_loss_history, label='train loss')
    sns.lineplot(data=valid_loss_history, label='valid loss')
    ax.set_title(f'loss history: fold{i_fold}')
    plt.legend();

    oof = pd.concat([oof, _oof_fold_best])
    del df_train, ds_train, df_val, val_idlist, df_val_eval, ds_val, dl_train, dl_val, tokenizer, model, optimizer
    gc.collect()

print(f'{time.time() - start_time:.1f}s')



Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 1.21882275390625
Training loss after 1000 training steps: 1.404910400390625
Training loss after 1500 training steps: 1.46387060546875
Training loss after 2000 training steps: 1.4925455322265626
Training loss after 2500 training steps: 1.51202236328125
Training loss after 3000 training steps: 1.5281582845052084
epoch 1 - training loss: 1.5308
epoch 1 - training accuracy: 0.5548


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.041264
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0059 val_loss:1.5740 val_accuracy:0.5438
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_0.bin saved


Training loss after 0500 training steps: 1.58816015625
Training loss after 1000 training steps: 1.5810673828125
Training loss after 1500 training steps: 1.5847604166666667
Training loss after 2000 training steps: 1.5852607421875
Training loss after 2500 training steps: 1.583562890625
Training loss after 3000 training steps: 1.5853287760416668
epoch 2 - training loss: 1.5858
epoch 2 - training accuracy: 0.5406


predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.041264
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0059 val_loss:1.5711 val_accuracy:0.5438
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_0.bin saved


Training loss after 0500 training steps: 1.594736328125
Training loss after 1000 training steps: 1.5862490234375
Training loss after 1500 training steps: 1.58411328125
Training loss after 2000 training steps: 1.58579052734375
Training loss after 2500 training steps: 1.58427265625
Training loss after 3000 training steps: 1.5841477864583333
epoch 3 - training loss: 1.5840
epoch 3 - training accuracy: 0.5402


predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.041264
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0059 val_loss:1.5696 val_accuracy:0.5438
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_0.bin saved




Training loss after 0500 training steps: 0.99971142578125
Training loss after 1000 training steps: 0.909994384765625
Training loss after 1500 training steps: 0.8784734700520833
Training loss after 2000 training steps: 0.8553240966796875
Training loss after 2500 training steps: 0.83362724609375
Training loss after 3000 training steps: 0.823921142578125
epoch 1 - training loss: 0.8210
epoch 1 - training accuracy: 0.7299


predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.756420
 * Position  : 0.590922
 * Claim     : 0.495585
 * Counterclaim: 0.386861
 * Rebuttal  : 0.264298
 * Evidence  : 0.553666
 * Concluding Statement: 0.567536
Overall Validation avg F1: 0.5165 val_loss:0.7153 val_accuracy:0.7507
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_1.bin saved


Training loss after 0500 training steps: 0.683986328125
Training loss after 1000 training steps: 0.6868236083984375
Training loss after 1500 training steps: 0.6891913248697916
Training loss after 2000 training steps: 0.6987527465820312
Training loss after 2500 training steps: 0.694784326171875
Training loss after 3000 training steps: 0.6966045735677083
epoch 2 - training loss: 0.6975
epoch 2 - training accuracy: 0.7643


predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.714938
 * Position  : 0.591238
 * Claim     : 0.470955
 * Counterclaim: 0.374156
 * Rebuttal  : 0.258442
 * Evidence  : 0.586865
 * Concluding Statement: 0.539898
Overall Validation avg F1: 0.5052 val_loss:0.7213 val_accuracy:0.7552


Training loss after 0500 training steps: 0.61921142578125
Training loss after 1000 training steps: 0.620364013671875
Training loss after 1500 training steps: 0.6457108561197916
Training loss after 2000 training steps: 0.6444053955078125
Training loss after 2500 training steps: 0.65260263671875
Training loss after 3000 training steps: 0.6542884521484374
epoch 3 - training loss: 0.6569
epoch 3 - training accuracy: 0.7754


predict target class and its probabilty


Validation F1 scores
 * Lead      : 0.761239
 * Position  : 0.595693
 * Claim     : 0.495340
 * Counterclaim: 0.330371
 * Rebuttal  : 0.248481
 * Evidence  : 0.587203
 * Concluding Statement: 0.550887
Overall Validation avg F1: 0.5099 val_loss:0.7200 val_accuracy:0.7590


Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 0.99243896484375
Training loss after 1000 training steps: 0.91845703125
Training loss after 1500 training steps: 0.8844059244791667
Training loss after 2000 training steps: 0.9893203125
Training loss after 2500 training steps: 1.109030078125
Training loss after 3000 training steps: 1.1901858723958334
epoch 1 - training loss: 1.2062
epoch 1 - training accuracy: 0.6356


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.040895
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0058 val_loss:1.5698 val_accuracy:0.5461
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_2.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 1.590615234375
Training loss after 1000 training steps: 1.594962890625
Training loss after 1500 training steps: 1.5917239583333334
Training loss after 2000 training steps: 1.590499267578125
Training loss after 2500 training steps: 1.5913908203125
Training loss after 3000 training steps: 1.5910704752604166
epoch 2 - training loss: 1.5902
epoch 2 - training accuracy: 0.5396


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.040895
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0058 val_loss:1.5697 val_accuracy:0.5461
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_2.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 1.6036494140625
Training loss after 1000 training steps: 1.59152001953125
Training loss after 1500 training steps: 1.5917867838541666
Training loss after 2000 training steps: 1.59049462890625
Training loss after 2500 training steps: 1.589217578125
Training loss after 3000 training steps: 1.5868429361979166
epoch 3 - training loss: 1.5868
epoch 3 - training accuracy: 0.5396


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.040895
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0058 val_loss:1.5681 val_accuracy:0.5461
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_2.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 1.0202509765625
Training loss after 1000 training steps: 0.923643310546875
Training loss after 1500 training steps: 0.8774303385416666
Training loss after 2000 training steps: 0.850814453125
Training loss after 2500 training steps: 0.83759794921875
Training loss after 3000 training steps: 0.8274967447916667
epoch 1 - training loss: 0.8232
epoch 1 - training accuracy: 0.7297


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.760961
 * Position  : 0.626400
 * Claim     : 0.520991
 * Counterclaim: 0.396624
 * Rebuttal  : 0.222054
 * Evidence  : 0.606973
 * Concluding Statement: 0.521222
Overall Validation avg F1: 0.5222 val_loss:0.7075 val_accuracy:0.7610
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_3.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 0.654486572265625
Training loss after 1000 training steps: 0.6647398681640625
Training loss after 1500 training steps: 0.6734212239583334
Training loss after 2000 training steps: 0.6756982421875
Training loss after 2500 training steps: 0.677308447265625
Training loss after 3000 training steps: 0.6804123942057292
epoch 2 - training loss: 0.6812
epoch 2 - training accuracy: 0.7689


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.744388
 * Position  : 0.624111
 * Claim     : 0.537278
 * Counterclaim: 0.380297
 * Rebuttal  : 0.189046
 * Evidence  : 0.598835
 * Concluding Statement: 0.564389
Overall Validation avg F1: 0.5198 val_loss:0.7060 val_accuracy:0.7587
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_3.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 0.627308837890625
Training loss after 1000 training steps: 0.6313026123046875
Training loss after 1500 training steps: 0.6337461751302084
Training loss after 2000 training steps: 0.6380037841796875
Training loss after 2500 training steps: 0.63692451171875
Training loss after 3000 training steps: 0.6372532958984375
epoch 3 - training loss: 0.6364
epoch 3 - training accuracy: 0.7808


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.755792
 * Position  : 0.639179
 * Claim     : 0.533869
 * Counterclaim: 0.401490
 * Rebuttal  : 0.334071
 * Evidence  : 0.606481
 * Concluding Statement: 0.579001
Overall Validation avg F1: 0.5500 val_loss:0.6879 val_accuracy:0.7675
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_3.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 0.99903125
Training loss after 1000 training steps: 0.901011962890625
Training loss after 1500 training steps: 0.8612869466145834
Training loss after 2000 training steps: 0.8759771728515625
Training loss after 2500 training steps: 1.02029697265625
Training loss after 3000 training steps: 1.1159481608072916
epoch 1 - training loss: 1.1330
epoch 1 - training accuracy: 0.6528


  0%|          | 0/780 [00:00<?, ?it/s]

predict target class and its probabilty


  0%|          | 0/780 [00:00<?, ?it/s]

Validation F1 scores
 * Lead      : 0.000000
 * Position  : 0.000000
 * Claim     : 0.000000
 * Counterclaim: 0.000000
 * Rebuttal  : 0.000000
 * Evidence  : 0.038474
 * Concluding Statement: 0.000000
Overall Validation avg F1: 0.0055 val_loss:1.5821 val_accuracy:0.5391
/content/drive/MyDrive/Kaggle_Feedback-Prize-Evaluating-Student-Writing/model/fb_nb011/roberta-large_4.bin saved


  0%|          | 0/3119 [00:00<?, ?it/s]

Training loss after 0500 training steps: 1.5857529296875
Training loss after 1000 training steps: 1.58992041015625
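
The logs above show two runs (checkpoints saved with suffixes `_2` and `_4`) whose validation F1 stays near 0.006, while the runs saved as `_1` and `_3` reach roughly 0.52 to 0.55. In both low-scoring runs the training loss rises during epoch 1 and then plateaus near 1.59, which suggests the model has collapsed rather than converged. A minimal sketch of flagging such runs before they are included in an ensemble is given below. The threshold and the helper name `collapsed_runs` are hypothetical, and treating the checkpoint suffixes as fold indices is an assumption.

```python
# Best validation F1 seen so far for each saved checkpoint, copied from the
# logs above. Keys are the numeric suffixes of the saved .bin files; reading
# them as fold indices is an assumption.
best_f1 = {1: 0.5165, 2: 0.0058, 3: 0.5500, 4: 0.0055}

# Hypothetical cut-off: a run scoring far below the others counts as collapsed.
COLLAPSE_THRESHOLD = 0.1


def collapsed_runs(scores, threshold=COLLAPSE_THRESHOLD):
    """Return the checkpoint suffixes whose best F1 is below the threshold."""
    return sorted(k for k, v in scores.items() if v < threshold)


print(collapsed_runs(best_f1))
```

Runs flagged this way are candidates for retraining, for example with a different seed or a lower learning rate, before their predictions are averaged with the others.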


In [None]:
oof.head()

In [None]:
# Save the out-of-fold predictions so they can be reused without retraining.
oof.to_csv(f'{Config.output_dir}/oof_{Config.name}.csv', index=False)

In [None]:
# Read the saved file back to confirm it was written as expected.
pd.read_csv(f'{Config.output_dir}/oof_{Config.name}.csv').head()

## cv score

In [None]:
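# When debugging, score only the essays whose ids appear in alltrain_texts;
# otherwise score against the full training annotations.
if Config.is_debug:
    idlist = alltrain_texts['id'].unique().tolist()
    df_train = df_alltrain[df_alltrain['id'].isin(idlist)]
else:
    df_train = df_alltrain.copy()
# Compare the ground-truth annotations with the out-of-fold predictions.
print(f'overall cv score: {oof_score(df_train, oof)}')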
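
The training logs report an "Overall Validation avg F1" next to seven class scores. For the validation block that ends with `avg F1: 0.5165`, the overall value matches the unweighted mean of the seven class scores, as the sketch below checks. Whether `oof_score` aggregates the out-of-fold predictions the same way is an assumption, since its definition is not shown in this section. The helper name `macro_f1` is hypothetical.

```python
# Class-wise validation F1 values copied from one validation block in the logs.
class_f1 = {
    "Lead": 0.756420,
    "Position": 0.590922,
    "Claim": 0.495585,
    "Counterclaim": 0.386861,
    "Rebuttal": 0.264298,
    "Evidence": 0.553666,
    "Concluding Statement": 0.567536,
}


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(scores.values()) / len(scores)


overall = macro_f1(class_f1)
print(f"Overall Validation avg F1: {overall:.4f}")
```

Because every class carries equal weight in this average, weak classes such as Rebuttal and Counterclaim lower the overall score as much as frequent classes raise it.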