***Description***
<div> In this notebook, I train argumentation feature detection models (henceforth ArgFeat models) on the public data provided by Al-Khatib et al. (2016). Two ArgFeat models are trained: one with 3 ArgFeat labels (as adopted by Alhindi et al. 2020) and one with the 6 original ArgFeat labels of Al-Khatib et al. (2016).
<div> For each training run, I specify the number of labels and the number of training epochs. The results of each run are reported below.

In [1]:
#!pip install transformers



In [1]:
from datetime import datetime
import glob, os
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import classification_report
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from tqdm import trange
import transformers
from transformers import BertTokenizerFast, BertForSequenceClassification



# Import and process .txt data via path

In [2]:
# JupyterLab directory
path_test= '/data/ArgFeatModel/corpus-webis-editorials-16/annotated-txt/split-for-evaluation-final/test'
path_training= '/data/ArgFeatModel/corpus-webis-editorials-16/annotated-txt/split-for-evaluation-final/training'
path_validation= '/data/ArgFeatModel/corpus-webis-editorials-16/annotated-txt/split-for-evaluation-final/validation'

In [22]:
# Helper functions to process the .txt data
def extract_df(path):
    # read every annotated .txt file in `path`, one row per line
    # (collect the frames first and concatenate once, instead of growing a DataFrame in the loop)
    frames = []
    for filename in glob.glob(os.path.join(path, '*.txt')):
        with open(filename, 'r', encoding='utf-8') as f:
            frames.append(pd.DataFrame(f.readlines(), columns=['unit']))
    main_df = pd.concat(frames, ignore_index=True)
    # split each line into its index, label, text and note fields
    main_df[['index','label','text','note']] = main_df['unit'].str.split('\t',n=3,expand=True)
    main_df = main_df.drop(['index','unit','note'],axis=1).replace('\n','', regex=True)
    main_df = main_df[main_df['label']!='par-sep']
    return main_df

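def arrange_df(main_df):
    # drop non-unit rows (titles, paragraph separators, no-unit) and merge 'continued' segments into one unit
    main_df = main_df[~main_df['label'].isin(['title','par-sep','no-unit'])]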
    result_df = main_df.groupby((~main_df.label.str.match('continued')).shift().cumsum(), as_index=False).sum()
    result_df['label']=result_df['label'].str.replace('continued','')
    return result_df

def coarse_label(main_df):
    # integer codes (not the strings '0'/'1'/'2') so that torch.tensor() accepts them later
    main_df['coarse_label'] = 2 # premise (default for all remaining labels)
    main_df.loc[main_df['label'].str.contains("assumption"),'coarse_label'] = 0 # claim
    main_df.loc[main_df['label'].str.contains("other"),'coarse_label'] = 1 # others
    return main_df

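def fine_label(main_df):
    # encode the fine labels as integers (LabelEncoder numbers classes in sorted order)
    # NOTE: the encoder is fit separately on each split, so the codes agree across
    # splits only if every split contains all six labels
    le = preprocessing.LabelEncoder()
    le.fit(main_df.label)
    main_df['label'] = le.transform(main_df.label)
    # to inverse
    #le.inverse_transform(main_df['label'])
    return main_df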
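The column layout that `extract_df` expects can be illustrated without pandas. This is a simplified sketch, assuming each annotated line has the form `index<TAB>label<TAB>text<TAB>note`; `parse_line`, `is_unit` and the sample lines are hypothetical. Python's `str.split` with `maxsplit=3` stands in for `str.split('\t', n=3, expand=True)`, and short rows are padded with `None`, as pandas does for missing fields.

```python
def parse_line(line):
    """Hypothetical helper: return (label, text) for one annotated line."""
    # At most 3 splits, so any extra tabs stay inside the last (note) field.
    fields = line.rstrip("\n").split("\t", 3)
    # Pad short rows with None, mimicking missing fields after expand=True.
    fields += [None] * (4 - len(fields))
    _index, label, text, _note = fields
    return label, text


def is_unit(label):
    """Mirror the filtering in extract_df/arrange_df: drop layout-only rows."""
    return label not in ("title", "par-sep", "no-unit")


# Hypothetical sample lines (not taken from the corpus).
sample = ["1\ttitle\tA headline\n", "2\tassumption\tTaxes will rise.\n", "3\tpar-sep\n"]
units = [parse_line(l) for l in sample if is_unit(parse_line(l)[0])]
print(units)  # [('assumption', 'Taxes will rise.')]
```

Only `label` and `text` survive; the index and note fields are dropped, just as `extract_df` drops the `index` and `note` columns.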

In [23]:
# Main loop to import and label the data

path_list = [path_training,path_validation,path_test]
df_list = []

# extract .txt data
for path in path_list:
    main_df = extract_df(path)
    final_df = arrange_df(main_df)
    final_df = coarse_label(final_df)
    final_df = fine_label(final_df)
    df_list.append(final_df)
    
# define datasets
train_df = df_list[0]
val_df = df_list[1]
test_df = df_list[2]
all_data_df = pd.concat([train_df,val_df,test_df])

# save datasets (creating the output folder if it does not exist)
os.makedirs('ArgFeatData', exist_ok=True)
train_df.to_csv('ArgFeatData/train.csv',index=False)
val_df.to_csv('ArgFeatData/val.csv',index=False)
test_df.to_csv('ArgFeatData/test.csv',index=False)
all_data_df.to_csv('ArgFeatData/all_data.csv',index=False)
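The two label schemes produced by `coarse_label` and `fine_label` can be checked with plain Python. This is a sketch under assumptions: the six fine label names below are assumed to be the Webis-Editorials-16 unit types (the notebook itself only names `assumption` and `other`), and `coarse_code` and `fine_codes` are hypothetical helpers that mirror the `str.contains` rules and scikit-learn's `LabelEncoder`, which numbers classes in sorted order. Under these assumptions fine class 1 is `assumption` and fine class 3 is `other`, which agrees with the 6-label reports below: their supports (2005 and 31) match coarse classes 0 and 1 in the 3-label reports.

```python
# Assumption: the six Webis-Editorials-16 unit labels (not listed in the notebook).
FINE_LABELS = ["anecdote", "assumption", "common-ground", "other", "statistics", "testimony"]


def coarse_code(label):
    """Mirror coarse_label(): 0 = claim, 1 = other, 2 = premise."""
    if "other" in label:  # assigned after "assumption" in coarse_label(), so it wins
        return 1
    if "assumption" in label:
        return 0
    return 2


def fine_codes(labels):
    """Mirror LabelEncoder.fit(): codes follow the sorted distinct labels."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


print(fine_codes(FINE_LABELS))
# {'anecdote': 0, 'assumption': 1, 'common-ground': 2, 'other': 3, 'statistics': 4, 'testimony': 5}
print([coarse_code(label) for label in FINE_LABELS])  # [2, 0, 2, 1, 2, 2]

# A split lacking "anecdote" would encode "assumption" as 0 instead of 1.
print(fine_codes(["assumption", "other", "testimony"]))
# {'assumption': 0, 'other': 1, 'testimony': 2}
```

Because `fine_label` fits a new encoder on each split, the last call shows how a split that happened to miss one label would shift every code after it. Fitting a single encoder on the labels of all three splits before transforming each split is one way to rule this out.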

In [44]:
# SHORTCUT: load the previously saved .csv data

# load the prepared splits
train_df = pd.read_csv('ArgFeatData/train.csv')
val_df = pd.read_csv('ArgFeatData/val.csv')
test_df = pd.read_csv('ArgFeatData/test.csv')

# concat all data
all_data_df = pd.concat([train_df,val_df,test_df])

### Helper function: BERT Tokenizer and GPU

In [69]:
# Load the BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')

# specify GPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [73]:
# Helper function for model initialization

def initialize_model(NUM_LABELS, learning_rate):
    # Load the BertForSequenceClassification model
    model = BertForSequenceClassification.from_pretrained(
        'bert-base-cased',
        num_labels = NUM_LABELS,
    #    output_attentions = False,
    #    output_hidden_states = False,
    )
    # set optimizer
    optimizer = torch.optim.AdamW(model.parameters(), 
                                  lr = learning_rate,
                                  #eps = 1e-08
                                  )
    return model, optimizer

In [62]:
# Helper functions for data processing

def define_data(NUM_LABELS, train_df, val_df, test_df):
    #text = all_data_df.text.values
    train_text = list(train_df.text.values)
    val_text = list(val_df.text.values)
    test_text = list(test_df.text.values)
    if NUM_LABELS == 3:
        #labels = all_data_df.coarse_label.values
        train_labels = list(train_df.coarse_label.values)
        val_labels = list(val_df.coarse_label.values)
        test_labels = list(test_df.coarse_label.values)
    elif NUM_LABELS == 6:
        #labels = all_data_df.label.values
        train_labels = list(train_df.label.values)
        val_labels = list(val_df.label.values)
        test_labels = list(test_df.label.values)
    else:
        raise ValueError('NUM_LABELS must be 3 or 6, got {}'.format(NUM_LABELS))
    return train_text, val_text, test_text, train_labels, val_labels, test_labels

def process_data(train_text, val_text, test_text, train_labels, val_labels, test_labels, max_seq_len):
    def encode(texts, labels):
        # tokenize and pad/truncate every sequence to max_seq_len
        tokens = tokenizer(
            texts,
            max_length = max_seq_len,
            padding = 'max_length',   # replaces the deprecated pad_to_max_length=True
            truncation = True,
            return_token_type_ids = False
        )
        # wrap input ids, attention mask and labels as tensors
        return TensorDataset(torch.tensor(tokens['input_ids']),
                             torch.tensor(tokens['attention_mask']),
                             torch.tensor(labels))

    # train, validation and test sets
    return [encode(train_text, train_labels),
            encode(val_text, val_labels),
            encode(test_text, test_labels)]

def get_dataloader(for_dataloader, batch_size):
    # unwrap data
    train_data, val_data, test_data = for_dataloader
    return [DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=batch_size),
            DataLoader(val_data, sampler=SequentialSampler(val_data), batch_size=batch_size),
            DataLoader(test_data, sampler=SequentialSampler(test_data), batch_size=batch_size)]
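What padding to `max_seq_len`, truncation and the resulting `attention_mask` look like for a single sequence can be sketched in plain Python. This is a simplification: `pad_and_mask` is a hypothetical helper, the token ids are arbitrary, and the pad id 0 is assumed here (it is the `[PAD]` id in the standard BERT vocabularies). The real tokenizer also truncates the text before adding `[CLS]`/`[SEP]`, so those special tokens survive truncation; the sketch does not model that.

```python
PAD_ID = 0  # assumption: id of the [PAD] token


def pad_and_mask(ids, max_len, pad_id=PAD_ID):
    """Hypothetical helper: truncate/pad token ids to max_len and build the mask."""
    ids = list(ids[:max_len])            # truncation: keep the first max_len ids
    n_pad = max_len - len(ids)
    mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return ids + [pad_id] * n_pad, mask


print(pad_and_mask([101, 1109, 2341, 102], 6))
# ([101, 1109, 2341, 102, 0, 0], [1, 1, 1, 1, 0, 0])
print(pad_and_mask(list(range(1, 10)), 4))
# ([1, 2, 3, 4], [1, 1, 1, 1])
```

Every row ends up exactly `max_seq_len` long, which is what lets `torch.tensor` build the rectangular tensors that `TensorDataset` needs, while the mask tells BERT's attention to ignore the padded positions.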

### Helper functions: Model Training and Validating

In [114]:
def b_tp(preds, labels):
    # batch accuracy: share of rows whose arg-max logit matches the gold label
    preds = np.argmax(preds, axis = 1).flatten()
    labels = labels.flatten()
    return np.mean(preds == labels)

def get_timestamp():
    # current local time as YYYYMMDD_HHMM, used in the saved-weights file name
    return datetime.now().strftime("%Y%m%d_%H%M")
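`b_tp` returns the accuracy of a single batch, and the validation and test loops below average these per-batch values. When the last batch is smaller than `BATCH_SIZE`, that average can differ slightly from the accuracy over all examples, which is what scikit-learn's `classification_report` prints. The sketch below illustrates the difference in plain Python with made-up logits; `argmax_rows`, `mean_of_batch_accuracies` and `pooled_accuracy` are hypothetical helpers.

```python
def argmax_rows(logits):
    """Index of the largest logit in each row (first one on ties, like np.argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def n_correct(logits, labels):
    return sum(p == y for p, y in zip(argmax_rows(logits), labels))


def mean_of_batch_accuracies(batches):
    """Average of per-batch accuracies, as the notebook's loops report it."""
    accs = [n_correct(logits, labels) / len(labels) for logits, labels in batches]
    return sum(accs) / len(accs)


def pooled_accuracy(batches):
    """Accuracy over all examples, regardless of batch sizes."""
    correct = sum(n_correct(logits, labels) for logits, labels in batches)
    total = sum(len(labels) for _, labels in batches)
    return correct / total


# Made-up logits: a full batch of 4 (2 correct) and a last batch of 1 (1 correct).
batches = [
    ([[2.0, 0.1], [0.3, 1.5], [1.0, 0.0], [0.0, 1.0]], [0, 1, 1, 0]),
    ([[0.2, 0.9]], [1]),
]
print(mean_of_batch_accuracies(batches))  # 0.75
print(pooled_accuracy(batches))           # 0.6
```

With hundreds of equally sized batches and only one smaller final batch, as here, the two figures should stay very close.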

In [74]:
def train_val(model, optimizer, train_dataloader, val_dataloader, epochs):

    for _ in trange(epochs, desc = 'Epoch'):
    
        # ========== Training ==========
    
        # Set model to training mode
        model.train()
    
        # Tracking variables
        tr_loss = 0
        nb_tr_examples, nb_tr_steps = 0, 0

        for step, batch in enumerate(train_dataloader):
        
            if step % 50 == 0 and step != 0:
                # Report progress.
                print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        
            batch = tuple(t.to(device) for t in batch)
            b_input_ids, b_input_mask, b_labels = batch
        
            optimizer.zero_grad()
            # Forward pass
            train_output = model(b_input_ids, 
                                 token_type_ids = None, 
                                 attention_mask = b_input_mask, 
                                 labels = b_labels)
            # Backward pass
            train_output.loss.backward()
            optimizer.step()
            # Update tracking variables
            tr_loss += train_output.loss.item()
            nb_tr_examples += b_input_ids.size(0)
            nb_tr_steps += 1

        # ========== Validation ==========

        # Set model to evaluation mode
        model.eval()

        # Tracking variables 
        val_accuracy = []

        for batch in val_dataloader:
            batch = tuple(t.to(device) for t in batch)
            b_input_ids, b_input_mask, b_labels = batch
            with torch.no_grad():
                # Forward pass
                eval_output = model(b_input_ids, 
                                    token_type_ids = None, 
                                    attention_mask = b_input_mask)
            logits = eval_output.logits.detach().cpu().numpy()
            label_ids = b_labels.to('cpu').numpy()
        
            # Calculate validation accuracy
            b_accuracy = b_tp(logits, label_ids)
            val_accuracy.append(b_accuracy)

        print('\n\t - Train loss: {:.4f}'.format(tr_loss / nb_tr_steps))    
        print('\t - Validation Accuracy: {:.4f}'.format(sum(val_accuracy)/len(val_accuracy)))
        
    return model
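`train_val` returns the model from the last epoch, while the validation accuracies logged below move up and down between epochs (for the 3-label, 10-epoch run they peak at epoch 2). If one wanted to keep the best epoch instead, the selection logic could look like the sketch below. `select_epoch` and its `patience` parameter are hypothetical and not part of the notebook; in practice one would also save the model's `state_dict()` whenever a new best is reached. The accuracies are copied from the 3-label, 10-epoch log.

```python
def select_epoch(val_accuracies, patience=None):
    """Return (best_epoch, last_epoch_run), both 1-based.

    Hypothetical helper: with patience=k, training stops once k consecutive
    epochs fail to beat the best validation accuracy so far.
    """
    best_epoch, best_acc, since_best = 0, float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_epoch, best_acc, since_best = epoch, acc, 0
        else:
            since_best += 1
            if patience is not None and since_best >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_accuracies)


# Validation accuracies from the 3-label, 10-epoch run logged below.
val_acc_3x10 = [0.8536, 0.8593, 0.8366, 0.8422, 0.8222,
                0.8499, 0.8526, 0.8279, 0.8519, 0.8372]
print(select_epoch(val_acc_3x10))              # (2, 10)
print(select_epoch(val_acc_3x10, patience=2))  # (2, 4)
```

Without `patience` the helper only identifies the best epoch; with `patience=2` it also shows that this run could have stopped after epoch 4.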

### Helper function: Model Testing

In [86]:
def test_report(model, test_dataloader):
    model.eval()
    test_accuracy = []
    logits_list = []
    labels_list = []

    for batch in test_dataloader:
    
        batch = tuple(t.to(device) for t in batch)
        b_input_ids, b_input_mask, b_labels = batch
        with torch.no_grad():
            # Forward pass
            test_output = model(b_input_ids, 
                                token_type_ids = None, 
                                attention_mask = b_input_mask)
        logits = test_output.logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
    
        logits_list.extend(logits)
        labels_list.extend(label_ids)
    
        # Calculate test accuracy
        b_accuracy = b_tp(logits, label_ids)
        test_accuracy.append(b_accuracy)

    print('\t - Test Accuracy: {:.4f}'.format(sum(test_accuracy)/len(test_accuracy)))
    preds = list(np.argmax(logits_list,axis=1))
    print(classification_report(labels_list, preds))
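The `macro avg` and `weighted avg` rows printed by `classification_report` combine the per-class scores differently. The macro average is the unweighted mean over classes, while the weighted average weights each class by its support. This is why the rare class (31 test units) pulls the macro F1 well below the weighted F1 in the reports below. The plain-Python sketch below uses a hypothetical `f1_summary` helper and made-up labels, and it treats an undefined precision or recall as 0, which matches scikit-learn's default `zero_division` behaviour (scikit-learn also emits a warning).

```python
def f1_summary(y_true, y_pred):
    """Hypothetical helper: per-class F1 plus macro and support-weighted averages."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        n_pred = sum(p == c for p in y_pred)
        support = sum(t == c for t in y_true)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (f1, support)
    macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
    weighted = sum(f1 * s for f1, s in per_class.values()) / len(y_true)
    return per_class, macro, weighted


# Made-up labels: class 1 is rare and never predicted correctly.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]
per_class, macro, weighted = f1_summary(y_true, y_pred)
print(round(macro, 4), round(weighted, 4))  # 0.6296 0.7593
```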

# MAIN function

In [None]:
LEARNING_RATE = 2e-5
MAX_SEQ_LEN = 256
BATCH_SIZE = 16 #32
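With `BATCH_SIZE = 16`, the logs below report 532 training batches per epoch. Since `DataLoader` keeps the final partial batch by default (`drop_last=False`), the number of batches is ceil(N / 16). The training split must therefore contain between 8,497 and 8,512 units. A small sketch of that arithmetic, with hypothetical helper names:

```python
def batches_per_epoch(n_examples, batch_size):
    """Number of batches when the last, partial batch is kept (drop_last=False)."""
    return (n_examples + batch_size - 1) // batch_size  # integer ceiling


def size_range(n_batches, batch_size):
    """Smallest and largest dataset sizes that yield exactly n_batches batches."""
    return (n_batches - 1) * batch_size + 1, n_batches * batch_size


print(size_range(532, 16))  # (8497, 8512)
```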

In [117]:
def main_function(num_labels,epochs):

    #initialize model
    model, optimizer = initialize_model(num_labels, LEARNING_RATE)

    # send model to device
    model.to(device)

    # define_data
    train_text, val_text, test_text, train_labels, val_labels, test_labels = define_data(num_labels,train_df, val_df, test_df)

    # process_data
    for_dataloader = process_data(train_text, val_text, test_text, train_labels, val_labels, test_labels, MAX_SEQ_LEN)

    #get dataloader
    train_dataloader, val_dataloader, test_dataloader = get_dataloader(for_dataloader, BATCH_SIZE)

    #training and validating model
    trained_model = train_val(model, optimizer, train_dataloader, val_dataloader, epochs)
    
    # save model
    str_time = get_timestamp()
    weights_path = '/data/ArgFeatModel/ModelWeights/saved_weights_{}_{}_{}.pt'.format(num_labels, epochs, str_time)
    torch.save(trained_model.state_dict(), weights_path)
    print('Model weights saved')

    #testing model
    test_report(trained_model, test_dataloader)

    # empty cache
    torch.cuda.empty_cache()
    
    print('Done.')

# Training models

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [118]:
main_function(3,5)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.4340
	 - Validation Accuracy: 0.8586


Epoch:  20%|██        | 1/5 [03:35<14:21, 215.32s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.2531
	 - Validation Accuracy: 0.8613


Epoch:  40%|████      | 2/5 [07:11<10:48, 216.05s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.1194
	 - Validation Accuracy: 0.8473


Epoch:  60%|██████    | 3/5 [10:48<07:12, 216.27s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0638
	 - Validation Accuracy: 0.8476


Epoch:  80%|████████  | 4/5 [14:24<03:36, 216.14s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0456
	 - Validation Accuracy: 0.8483


Epoch: 100%|██████████| 5/5 [18:00<00:00, 216.10s/it]

Model weights saves





Model weights saves
	 - Test Accuracy: 0.8466
              precision    recall  f1-score   support

           0       0.90      0.89      0.89      2005
           1       0.43      0.19      0.27        31
           2       0.73      0.76      0.75       810

    accuracy                           0.85      2846
   macro avg       0.69      0.62      0.64      2846
weighted avg       0.85      0.85      0.85      2846

Done.


In [119]:
main_function(3,10)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.4330
	 - Validation Accuracy: 0.8536


Epoch:  10%|█         | 1/10 [03:35<32:21, 215.72s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.2558
	 - Validation Accuracy: 0.8593


Epoch:  20%|██        | 2/10 [07:11<28:47, 215.99s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.1367
	 - Validation Accuracy: 0.8366


Epoch:  30%|███       | 3/10 [10:48<25:13, 216.21s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0741
	 - Validation Accuracy: 0.8422


Epoch:  40%|████      | 4/10 [14:24<21:36, 216.09s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0466
	 - Validation Accuracy: 0.8222


Epoch:  50%|█████     | 5/10 [18:00<18:00, 216.06s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0336
	 - Validation Accuracy: 0.8499


Epoch:  60%|██████    | 6/10 [21:36<14:23, 215.97s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0271
	 - Validation Accuracy: 0.8526


Epoch:  70%|███████   | 7/10 [25:12<10:47, 216.00s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0199
	 - Validation Accuracy: 0.8279


Epoch:  80%|████████  | 8/10 [28:49<07:12, 216.49s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0175
	 - Validation Accuracy: 0.8519


Epoch:  90%|█████████ | 9/10 [32:26<03:36, 216.64s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0130
	 - Validation Accuracy: 0.8372


Epoch: 100%|██████████| 10/10 [36:06<00:00, 216.69s/it]

Model weights saves





Model weights saves
	 - Test Accuracy: 0.8452
              precision    recall  f1-score   support

           0       0.91      0.88      0.89      2005
           1       0.29      0.19      0.23        31
           2       0.72      0.79      0.75       810

    accuracy                           0.85      2846
   macro avg       0.64      0.62      0.63      2846
weighted avg       0.85      0.85      0.85      2846

Done.


In [121]:
main_function(6,3)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.6591
	 - Validation Accuracy: 0.8456


Epoch:  33%|███▎      | 1/3 [03:35<07:10, 215.49s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.3645
	 - Validation Accuracy: 0.8369


Epoch:  67%|██████▋   | 2/3 [07:11<03:35, 215.76s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.1937
	 - Validation Accuracy: 0.8205


Epoch: 100%|██████████| 3/3 [10:46<00:00, 215.66s/it]

Model weights saves





Model weights saves
	 - Test Accuracy: 0.8276
              precision    recall  f1-score   support

           0       0.69      0.55      0.61       486
           1       0.88      0.92      0.90      2005
           2       0.06      0.03      0.04        29
           3       0.33      0.32      0.33        31
           4       0.67      0.73      0.70        81
           5       0.72      0.80      0.76       214

    accuracy                           0.83      2846
   macro avg       0.56      0.56      0.56      2846
weighted avg       0.82      0.83      0.82      2846

Done.


In [120]:
main_function(6,5)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.7149
	 - Validation Accuracy: 0.8302


Epoch:  20%|██        | 1/5 [03:41<14:44, 221.06s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.4021
	 - Validation Accuracy: 0.8382


Epoch:  40%|████      | 2/5 [07:17<10:55, 218.59s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.2346
	 - Validation Accuracy: 0.8165


Epoch:  60%|██████    | 3/5 [10:53<07:14, 217.36s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.1262
	 - Validation Accuracy: 0.8031


Epoch:  80%|████████  | 4/5 [14:29<03:36, 216.89s/it]

Model weights saves
  Batch    50  of    532.
  Batch   100  of    532.
  Batch   150  of    532.
  Batch   200  of    532.
  Batch   250  of    532.
  Batch   300  of    532.
  Batch   350  of    532.
  Batch   400  of    532.
  Batch   450  of    532.
  Batch   500  of    532.

	 - Train loss: 0.0736
	 - Validation Accuracy: 0.8061


Epoch: 100%|██████████| 5/5 [18:05<00:00, 217.20s/it]

Model weights saves





Model weights saves
	 - Test Accuracy: 0.8093
              precision    recall  f1-score   support

           0       0.65      0.57      0.61       486
           1       0.89      0.89      0.89      2005
           2       0.05      0.10      0.07        29
           3       0.36      0.29      0.32        31
           4       0.77      0.57      0.65        81
           5       0.66      0.82      0.73       214

    accuracy                           0.81      2846
   macro avg       0.56      0.54      0.55      2846
weighted avg       0.81      0.81      0.81      2846

Done.
