## FRISS with MFC

Implementation of the FRISS using the Media Frames Corpus (MFC) from Card et al. (2015). 

In [1]:
!pip install nltk

[0m

In [2]:
import nltk
nltk.download("all")

[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/abc.zip.
[nltk_data]    | Downloading package alpino to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/alpino.zip.
[nltk_data]    | Downloading package averaged_perceptron_tagger to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data]    | Downloading package averaged_perceptron_tagger_ru to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping
[nltk_data]    |       taggers/averaged_perceptron_tagger_ru.zip.
[nltk_data]    | Downloading package basque_grammars to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping grammars/basque_grammars.zip.
[nltk_data]    | Downloading package bcp47 to /root/nltk_data...
[nltk_data]    | Downloading package biocreative_ppi to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   U

True

In [3]:
import os

os.listdir(os.getcwd())

['FRISS_srl.pkl',
 'training_metrics.csv',
 'README.md',
 'notebooks',
 'grid_search_metrics.csv',
 '.git',
 'assets',
 'test.csv',
 'friss',
 'models',
 '.ipynb_checkpoints',
 'data',
 '.gitignore',
 'frameaxis']

In [4]:
labels_path = "data/mfc/immigration_labeled.json"
codes_path = "data/mfc/codes.json"

In [5]:
# load data from path 
import json

with open(labels_path) as f:
    labels = json.load(f)

with open(codes_path) as f:
    codes = json.load(f)

In [6]:
import pandas as pd
from nltk.tokenize import sent_tokenize

# articles list
articles_list = []

# Iterate through the data to fill the DataFrame
for article_id, article_data in labels.items():
    annotations_data = article_data['annotations']

    irrelevant_dict = annotations_data['irrelevant']

    text = article_data['text']
    irrelevant = article_data['irrelevant']

    # if primary_frame is none set to 15.0
    if article_data['primary_frame'] is not None:
        primary_frame = str(article_data['primary_frame']).split(".")[0] + ".0"
    else:
        primary_frame = "15.0"

    # get primary frame from code
    primary_frame = str(codes[primary_frame])

    # split text into sentences using nltk library
    sentences = sent_tokenize(text)

    # iterate through sentences
    for sentence in sentences:
        article = {
            'article_id': article_id,
            'irrelevant': irrelevant,
            'text': sentence,
            'document_frame': primary_frame
        }

        articles_list.append(article)

# Create a DataFrame to store the results
df = pd.DataFrame(articles_list, columns=['article_id', 'irrelevant', 'text', 'document_frame'])


In [7]:
df

Unnamed: 0,article_id,irrelevant,text,document_frame
0,Immigration1.0-10005,0.0,IMM-10005\n\nPRIMARY\n\nImmigrants without HOP...,Quality of Life
1,Immigration1.0-10005,0.0,It mounted as students went around the room te...,Quality of Life
2,Immigration1.0-10005,0.0,Georgia Tech.,Quality of Life
3,Immigration1.0-10005,0.0,University of Georgia.,Quality of Life
4,Immigration1.0-10005,0.0,"""All I could say was, 'I'm planning to see if ...",Quality of Life
...,...,...,...,...
74463,Immigration1.0-9998,0.0,"Sue Brown, spokeswoman for the INS, said it's ...",Crime and Punishment
74464,Immigration1.0-9998,0.0,"""They love it,"" she said.",Crime and Punishment
74465,Immigration1.0-9998,0.0,"""They use these units to interview the people,...",Crime and Punishment
74466,Immigration1.0-9998,0.0,"""We do about 15 interviews a day,"" Brown said.",Crime and Punishment


In [8]:
df = df[df["irrelevant"] == False][["article_id", "text", "document_frame"]]

In [9]:
df.head()

Unnamed: 0,article_id,text,document_frame
0,Immigration1.0-10005,IMM-10005\n\nPRIMARY\n\nImmigrants without HOP...,Quality of Life
1,Immigration1.0-10005,It mounted as students went around the room te...,Quality of Life
2,Immigration1.0-10005,Georgia Tech.,Quality of Life
3,Immigration1.0-10005,University of Georgia.,Quality of Life
4,Immigration1.0-10005,"""All I could say was, 'I'm planning to see if ...",Quality of Life


In [10]:
# create for each code a col and fill with 1 if code is in code col
df = pd.concat([df, pd.get_dummies(df['document_frame'])], axis=1)

In [11]:
df.head()

Unnamed: 0,article_id,text,document_frame,Capacity and Resources,Crime and Punishment,Cultural Identity,Economic,External Regulation and Reputation,Fairness and Equality,Health and Safety,"Legality, Constitutionality, Jurisdiction",Morality,Other,Policy Prescription and Evaluation,Political,Public Sentiment,Quality of Life,Security and Defense
0,Immigration1.0-10005,IMM-10005\n\nPRIMARY\n\nImmigrants without HOP...,Quality of Life,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,Immigration1.0-10005,It mounted as students went around the room te...,Quality of Life,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,Immigration1.0-10005,Georgia Tech.,Quality of Life,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,Immigration1.0-10005,University of Georgia.,Quality of Life,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,Immigration1.0-10005,"""All I could say was, 'I'm planning to see if ...",Quality of Life,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0


In [12]:
df.shape

(67480, 18)

### Create Dataset

In [13]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='transformers')

### Extract SRL Embeddings from articles

In [14]:
!pip install pycuda
!pip install allennlp allennlp-models

Collecting pycuda
  Downloading pycuda-2023.1.tar.gz (1.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m75.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
[?25hCollecting pytools>=2011.2
  Downloading pytools-2023.1.1-py2.py3-none-any.whl (70 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m70.6/70.6 kB[0m [31m24.9 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting mako
  Downloading Mako-1.3.0-py3-none-any.whl (78 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m78.6/78.6 kB[0m [31m30.5 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting appdirs>=1.4.0
  Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Building wheels for collected packages: pycuda
  Building wheel for pycuda (pyproject.toml) ... [?25ldone
[?25h  Created wheel for pycuda: filename=p

In [15]:
from allennlp.predictors.predictor import Predictor
import pandas as pd

In [16]:
# get name / id of cuda device
import pycuda.driver as cuda

cuda.init()
device = cuda.Device(0)
print(device.name())

NVIDIA RTX A6000


In [17]:
from tqdm.notebook import tqdm
import pandas as pd

def batched_extract_srl_components(batched_sentences, predictor):
    # Convert each sentence into the required format for the predictor
    batched_sentences = [{'sentence': sentence} for sentence in batched_sentences]

    # Prepare the batched input for the predictor
    batched_srl = predictor.predict_batch_json(batched_sentences)

    # Extract SRL components from the batched predictions
    results = []
    for index, srl in enumerate(batched_srl):
        sentence_results = []
        for verb_entry in srl['verbs']:
            arg_components = {'ARG0': [], 'ARG1': []}
            for i, tag in enumerate(verb_entry['tags']):
                if 'ARG0' in tag:
                    arg_components['ARG0'].append(srl['words'][i])
                elif 'ARG1' in tag:
                    arg_components['ARG1'].append(srl['words'][i])

            if arg_components['ARG0'] or arg_components['ARG1']:
                sentence_results.append({
                    'predicate': verb_entry['verb'],
                    'ARG0': ' '.join(arg_components['ARG0']),
                    'ARG1': ' '.join(arg_components['ARG1'])
                })

        if sentence_results:
            # add empty dict if predicate, arg0 or arg1 is empty
            if not sentence_results[0]['predicate']:
                results.append({'predicate': '', 'ARG0': '', 'ARG1': ''})
            elif not sentence_results[0]['ARG0']:
                results.append({'predicate': '', 'ARG0': '', 'ARG1': ''})
            elif not sentence_results[0]['ARG1']:
                results.append({'predicate': '', 'ARG0': '', 'ARG1': ''})
            else:
                results.append(sentence_results)    
        else:
            results.append([{'predicate': '', 'ARG0': '', 'ARG1': ''}])

    return results

def optimized_extract_srl(X, predictor, batch_size=32):
    all_results = []

    # Process sentences in batches
    for i in tqdm(range(0, len(X), batch_size), desc="Processing Batches"):
        batched_sentences = X[i:i+batch_size]

        batch_results = batched_extract_srl_components(batched_sentences, predictor)

        all_results.extend(batch_results)

    return pd.Series(all_results)

In [18]:
import pickle

def get_X_srl(X, recalculate=False, pickle_path="../notebooks/classifier/X_srl_filtered.pkl"):
    """
    Returns the X_srl either by loading from a pickled file or recalculating.
    """
    if recalculate or not os.path.exists(pickle_path):
        print("Recalculate SRL")
        # Load predictor
        predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)

        # make sentences max 480 chars long
        X = X.apply(lambda x: x[:480])

        X_srl = optimized_extract_srl(X, predictor, batch_size=32)
        with open(pickle_path, 'wb') as f:
            pickle.dump(X_srl, f)
    else:
        print("Load SRL from Pickle")
        with tqdm(total=os.path.getsize(pickle_path)) as pbar:
            with open(pickle_path, 'rb') as f:
                X_srl = pickle.load(f)
                pbar.update(os.path.getsize(pickle_path))
                
    return X_srl

In [19]:
# get_X_srl(df["text"], recalculate=False, pickle_path="../notebooks/FRISS_srl.pkl")

# GPU

In [20]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

def free_gpu():
    print(torch.cuda.mem_get_info())
    print(torch.cuda.memory_summary())

Using device: cuda


In [21]:
import torch
import gc

def list_gpu_tensors():
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if obj.is_cuda:
                    obj = obj.cpu()
                    obj = obj.to("cpu")
                    print(type(obj), obj.size())
        except:
            pass

        
list_gpu_tensors()



# Dataset

In [89]:
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizer
import pandas as pd

class ArticleDataset(Dataset):
    def __init__(self, X, X_srl, tokenizer, labels=None, max_sentences_per_article=32, max_sentence_length=32, max_args_per_sentence=10, max_arg_length=16):
        self.X = X  # DataFrame where each row has multiple sentences
        self.X_srl = X_srl  # DataFrame where each row has multiple dictionaries for SRL
        self.labels = labels  # DataFrame where each row has a list of lists of integers
        
        self.tokenizer = tokenizer
        self.max_sentences_per_article = max_sentences_per_article
        self.max_sentence_length = max_sentence_length
        self.max_args_per_sentence = max_args_per_sentence
        self.max_arg_length = max_arg_length

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        sentences = self.X.iloc[idx]
        srl_data = self.X_srl.iloc[idx]
        labels = self.labels.iloc[idx]

        # Tokenize sentences
        sentence_ids = [self.tokenizer.encode(sentence, add_special_tokens=True, max_length=self.max_sentence_length, truncation=True, padding='max_length') for sentence in sentences]
        
        sentence_ids += [[0] * self.max_sentence_length] * (self.max_sentences_per_article - len(sentence_ids))
        sentence_ids = sentence_ids[:self.max_sentences_per_article]
        
        # Process SRL data
        predicates, arg0s, arg1s = [], [], []
        for srl_items in srl_data:

            sentence_predicates, sentence_arg0s, sentence_arg1s = [], [], []

            # if srl_items is not list pack it into a list
            if not isinstance(srl_items, list):
                srl_items = [srl_items]

            for item in srl_items:
                p = self.tokenizer.encode(item["predicate"], add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length')
                a0 = self.tokenizer.encode(item["ARG0"], add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length')
                a1 = self.tokenizer.encode(item["ARG1"], add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length')
                
                sentence_predicates.append(p)
                sentence_arg0s.append(a0)
                sentence_arg1s.append(a1)

            # Padding
            for _ in range(self.max_args_per_sentence - len(srl_items)):
                sentence_predicates.append([0] * self.max_arg_length)
                sentence_arg0s.append([0] * self.max_arg_length)
                sentence_arg1s.append([0] * self.max_arg_length)

            # pad to max_args_per_sentence
            sentence_predicates = sentence_predicates[:self.max_args_per_sentence]
            sentence_arg0s = sentence_arg0s[:self.max_args_per_sentence]
            sentence_arg1s = sentence_arg1s[:self.max_args_per_sentence]

            predicates.append(sentence_predicates)
            arg0s.append(sentence_arg0s)
            arg1s.append(sentence_arg1s)

        # Truncate or pad SRL items to max_sentences_per_article
        srl_padding = [[0] * self.max_arg_length] * self.max_args_per_sentence
        predicates = (predicates + [srl_padding] * self.max_sentences_per_article)[:self.max_sentences_per_article]
        arg0s = (arg0s + [srl_padding] * self.max_sentences_per_article)[:self.max_sentences_per_article]
        arg1s = (arg1s + [srl_padding] * self.max_sentences_per_article)[:self.max_sentences_per_article]

        data = {
            'sentence_ids': torch.tensor(sentence_ids, dtype=torch.long),
            'predicate_ids': torch.tensor(predicates, dtype=torch.long),
            'arg0_ids': torch.tensor(arg0s, dtype=torch.long),
            'arg1_ids': torch.tensor(arg1s, dtype=torch.long),
            'labels': torch.tensor(labels[0], dtype=torch.long)
        }

        return data


In [90]:
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split



In [91]:
def custom_collate_fn(batch):
    # Extract individual lists from the batch
    sentence_ids = [item['sentence_ids'] for item in batch]
    predicate_ids = [item['predicate_ids'] for item in batch]
    arg0_ids = [item['arg0_ids'] for item in batch]
    arg1_ids = [item['arg1_ids'] for item in batch]
    labels = [item['labels'] for item in batch]
    
    # Pad each list
    sentence_ids = torch.nn.utils.rnn.pad_sequence(sentence_ids, batch_first=True, padding_value=0)
    predicate_ids = torch.nn.utils.rnn.pad_sequence(predicate_ids, batch_first=True, padding_value=0)
    arg0_ids = torch.nn.utils.rnn.pad_sequence(arg0_ids, batch_first=True, padding_value=0)
    arg1_ids = torch.nn.utils.rnn.pad_sequence(arg1_ids, batch_first=True, padding_value=0)
    labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=0)

    # Conditionally extract and add labels
    output_dict = {
        'sentence_ids': sentence_ids,
        'predicate_ids': predicate_ids,
        'arg0_ids': arg0_ids,
        'arg1_ids': arg1_ids,
        'labels': labels
    }

    return output_dict

def preprocess_df(df, recalculate_srl=False, pickle_path="../notebooks/FRISS_srl.pkl"):
    # reset index of df
    df = df.reset_index(drop=True)

    # Get X_srl
    X_srl = get_X_srl(df["text"], recalculate=recalculate_srl, pickle_path=pickle_path)

    # reset index of X_srl
    X_srl = X_srl.reset_index(drop=True)

    # Columns to be one-hot encoded in y_subset
    y_cols = ['Capacity and Resources', 'Crime and Punishment', 'Cultural Identity', 
            'Economic', 'External Regulation and Reputation', 'Fairness and Equality', 
            'Health and Safety', 'Legality, Constitutionality, Jurisdiction', 
            'Morality', 'Other', 'Policy Prescription and Evaluation', 'Political', 
            'Public Sentiment', 'Quality of Life', 'Security and Defense']

    # Creating y_subset
    y_subset = df.groupby('article_id')[y_cols].apply(lambda x: x.values.tolist()).reset_index(name='encoded_values')
    y_subset = y_subset['encoded_values']

    # Aggregating 'text' column in df into a list of strings for each article_id
    X_subset = df.groupby('article_id')['text'].apply(list).reset_index(name='text')
    X_subset = X_subset['text']

    # Assuming X_srl follows the same index order as df
    X_srl_subset = X_srl.groupby(df['article_id']).apply(lambda x: x.values.tolist()).reset_index(name='srl_values')
    X_srl_subset = X_srl_subset['srl_values']

    return X_subset, X_srl_subset, y_subset

def get_datasets_dataloaders(df, tokenizer, recalculate_srl=False, pickle_path="../notebooks/FRISS_srl.pkl", batch_size=16, max_sentences_per_article=32, max_sentence_length=32, max_arg_length=16):
    
    X_subset, X_srl_subset, y_subset = preprocess_df(df, recalculate_srl=recalculate_srl, pickle_path=pickle_path)

    # Len
    print("X:", len(X_subset))
    print("X_srl:", len(X_srl_subset))
    print("y:", len(y_subset))

    print("CREATING DATASETS")
    test_size = 0.1
    
    # Assuming X, X_srl, and y are already defined and have the same number of samples
    X_train, X_test, y_train, y_test = train_test_split(X_subset, y_subset, test_size=test_size, random_state=42)
    
    print("TRAIN TEST SPLIT DONE")
    
    X_srl_train, X_srl_test, _, _ = train_test_split(X_srl_subset, y_subset, test_size=test_size, random_state=42)

    # Create the dataset
    train_dataset = ArticleDataset(X_train, X_srl_train, tokenizer, y_train, max_sentences_per_article, max_sentence_length, max_arg_length)
    test_dataset = ArticleDataset(X_test, X_srl_test, tokenizer, y_test, max_sentences_per_article, max_sentence_length, max_arg_length)

    # Create dataloaders
    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=custom_collate_fn, drop_last=True)
    test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, collate_fn=custom_collate_fn, drop_last=True)
    
    print("CREATION DONE")
    return train_dataset, test_dataset , train_dataloader, test_dataloader

In [93]:
def get_article_dataloader(article, tokenizer, batch_size=1):
    X = pd.Series([article])
    y = None  # No labels for this single article
    
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)
    # Directly use the optimized_extract_srl function since we don't need to cache for single articles
    X_srl = optimized_extract_srl(X, predictor)
    
    # Create the dataset
    dataset = ArticleDataset(X, X_srl, tokenizer, y)
    
    # Create dataloader
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn)
    
    return dataloader

In [94]:
def get_test_dataloader(X, tokenizer, batch_size=4):
    y = None
    
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)
    # Directly use the optimized_extract_srl function since we don't need to cache for single articles
    X_srl = optimized_extract_srl(X, predictor)
    
    # Create the dataset
    dataset = ArticleDataset(X, X_srl, tokenizer, y)
    
    # Create dataloader
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn)
    
    return dataloader

# PyTorch Model
The Model consist out of various Layers.

1. SRL_Embedding
2. Autoencoder
3. FRISSLoss
4. Unsupervised
5. Supervised
6. FRISS

In [95]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel

## 1. SRL_Embeddings

The layer takes tensors of token IDs with the shape [batch_size, max_num_sentences, max_num_tokens] for the sentence, predicates, arg0 and arg1 and returns for each sentence an embedding with shape [batch_size, embedding_dim] for the sentence, predicate, arg0 and arg1. 

The single embedding for the sentence is extracted by taking the [CLS] token embedding. For the predicate, arg0 and arg1 by taking the mean over all word embeddings in this list of tokens. 

> Possible improvements: Better way of extracting the single embedding for predicate, arg0 and arg1.

In [96]:
from transformers import BertModel
import torch.nn as nn
import torch

class SRL_Embeddings(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased"):
        super(SRL_Embeddings, self).__init__()
        self.bert_model = BertModel.from_pretrained(bert_model_name)
        self.embedding_dim = 768  # for bert-base-uncased

    def forward(self, sentence_ids, predicate_ids, arg0_ids, arg1_ids):
        with torch.no_grad():
            # Sentence embeddings
            sentence_embeddings = self.bert_model(sentence_ids.view(-1, sentence_ids.size(-1)))[0]
            sentence_embeddings = sentence_embeddings.view(sentence_ids.size(0), sentence_ids.size(1), -1, self.embedding_dim)
            sentence_embeddings = sentence_embeddings.mean(dim=2)

            # Predicate embeddings
            predicate_embeddings = self.bert_model(predicate_ids.view(-1, predicate_ids.size(-1)))[0]
            predicate_embeddings = predicate_embeddings.view(predicate_ids.size(0), predicate_ids.size(1), predicate_ids.size(2), -1, self.embedding_dim)
            predicate_embeddings = predicate_embeddings.mean(dim=3)

            # ARG0 embeddings
            arg0_embeddings = self.bert_model(arg0_ids.view(-1, arg0_ids.size(-1)))[0]
            arg0_embeddings = arg0_embeddings.view(arg0_ids.size(0), arg0_ids.size(1), arg0_ids.size(2), -1, self.embedding_dim)
            arg0_embeddings = arg0_embeddings.mean(dim=3)

            # ARG1 embeddings
            arg1_embeddings = self.bert_model(arg1_ids.view(-1, arg1_ids.size(-1)))[0]
            arg1_embeddings = arg1_embeddings.view(arg1_ids.size(0), arg1_ids.size(1), arg1_ids.size(2), -1, self.embedding_dim)
            arg1_embeddings = arg1_embeddings.mean(dim=3)

        return sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings
    
# Generate dummy data for the SRL_Embeddings
batch_size = 2
num_sentences = 12
sentence_length = 8
num_args = 9
predicate_length = 8
arg0_length = 8
arg1_length = 8

# Dummy data for sentences, predicates, arg0, and arg1
sentence_ids = torch.randint(0, 10000, (batch_size, num_sentences, sentence_length))
predicate_ids = torch.randint(0, 10000, (batch_size, num_sentences, num_args, predicate_length))
arg0_ids = torch.randint(0, 10000, (batch_size, num_sentences, num_args, arg0_length))
arg1_ids = torch.randint(0, 10000, (batch_size, num_sentences, num_args, arg1_length))

srl_embeddings = SRL_Embeddings()

sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = srl_embeddings(sentence_ids, predicate_ids, arg0_ids, arg1_ids)

print("Inputs shapes: ", sentence_ids.shape, predicate_ids.shape, arg0_ids.shape, arg1_ids.shape)
print("Outputs shapes: ", sentence_embeddings.shape, predicate_embeddings.shape, arg0_embeddings.shape, arg1_embeddings.shape)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Inputs shapes:  torch.Size([2, 12, 8]) torch.Size([2, 12, 9, 8]) torch.Size([2, 12, 9, 8]) torch.Size([2, 12, 9, 8])
Outputs shapes:  torch.Size([2, 12, 768]) torch.Size([2, 12, 9, 768]) torch.Size([2, 12, 9, 768]) torch.Size([2, 12, 9, 768])


## 2. Autoencoder

The layer takes tensors for `v` (size: [batch_size, embedding_dim]), `v_sentence` (size: [batch_size, embedding_dim]), `tau` (type: _float_), and `identifier` (type: _str_). Where `v` is the embedding of either predicate, arg0 or arg1 identified by the `identifier` parameter. The `v_sentence` is the sentence embedding and `tau` defined the tau for annealing the gumpel softmax.

The forward function returns `vhat` (size: [batch_size, embedding_dim]), `dz` (size: [batch_size, embedding_dim]), `gz` (size: [batch_size, embedding_dim]) and `F` (size: [K, embedding_dim]).

- `vhat`: Reconstructed embedding of SRL
- `dz`: Descriptor weights
- `gz`: Gumbel softmax from logits
- `F`: Dictionary

In [97]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedAutoencoder(nn.Module):
    def __init__(self, D_w, D_h, K, dropout_prob=0.3):
        super(CombinedAutoencoder, self).__init__()
        
        self.D_h = D_h
        self.K = K
        
        # Shared feed-forward layer for all views
        self.feed_forward_shared = nn.Linear(2 * D_w, D_h)
        
        # Unique feed-forward layers for each view
        self.feed_forward_unique = nn.ModuleDict({
            'a0': nn.Linear(D_h, K),
            'p': nn.Linear(D_h, K),
            'a1': nn.Linear(D_h, K),
        })

        # Initializing F matrices for each view
        self.F_matrices = nn.ParameterDict({
            'a0': nn.Parameter(torch.randn(K, D_w)),
            'p': nn.Parameter(torch.randn(K, D_w)),
            'a1': nn.Parameter(torch.randn(K, D_w)),
        })

        # Additional layers and parameters
        self.dropout = nn.Dropout(dropout_prob)
        self.batch_norm = nn.BatchNorm1d(D_h)
        self.activation = nn.ReLU()
        self.activation2 = nn.Sigmoid()
    
    # try softmax
    def gumbel_softmax(self, logits, tau: float = 1, hard: bool = False, eps: float = 1e-10, threshold = 0.5, dim: int = 1):
        gumbels = (
            -torch.empty_like(logits, memory_format=torch.legacy_contiguous_format).exponential_().log()
        )  # ~Gumbel(0,1)

        #gumbels = (torch.log(logits) + (gumbels/tau))  # ~Gumbel(logits,tau)
        gumbels = (logits + gumbels) / tau
        y_soft = gumbels.softmax(dim)

        if hard:
            indices = (y_soft > threshold).nonzero(as_tuple=True)
            y_hard = torch.zeros_like(logits, memory_format=torch.legacy_contiguous_format)
            y_hard[indices[0], indices[1]] = 1.0
            ret = y_hard - y_soft.detach() + y_soft
        else:
            ret = y_soft

        return ret

    def forward(self, v_p, v_a0, v_a1, v_sentence, tau):
        h_p, h_a0, h_a1 = self.process_through_shared(v_p, v_a0, v_a1, v_sentence)

        logits_p = self.feed_forward_unique['p'](h_p)
        logits_a0 = self.feed_forward_unique['a0'](h_a0)
        logits_a1 = self.feed_forward_unique['a1'](h_a1)
        
        dz_p = torch.softmax(logits_p, dim=1)
        dz_a0 = torch.softmax(logits_a0, dim=1)
        dz_a1 = torch.softmax(logits_a1, dim=1)
        
        gz_p = self.gumbel_softmax(dz_p, tau=tau, hard=False)
        gz_a0 = self.gumbel_softmax(dz_a0, tau=tau, hard=False)
        gz_a1 = self.gumbel_softmax(dz_a1, tau=tau, hard=False)

        vhat_p = torch.matmul(gz_p, self.F_matrices['p'])
        vhat_a0 = torch.matmul(gz_a0, self.F_matrices['a0'])
        vhat_a1 = torch.matmul(gz_a1, self.F_matrices['a1'])

        return {
            "p": {"vhat": vhat_p, "d": dz_p, "g": gz_p, "F": self.F_matrices['p']},
            "a0": {"vhat": vhat_a0, "d": dz_a0, "g": gz_a0, "F": self.F_matrices['a0']},
            "a1": {"vhat": vhat_a1, "d": dz_a1, "g": gz_a1, "F": self.F_matrices['a1']}
        }
        
    def process_through_shared(self, v_p, v_a0, v_a1, v_sentence):
        concatenated_p = torch.cat((v_p, v_sentence), dim=-1)
        concatenated_a0 = torch.cat((v_a0, v_sentence), dim=-1)
        concatenated_a1 = torch.cat((v_a1, v_sentence), dim=-1)
        
        # Concatenate them along the batch dimension for a single pass through the shared layer
        stacked_embeddings = torch.cat([concatenated_p, concatenated_a0, concatenated_a1], dim=0)
        
        #h_shared = self.dropout(stacked_embeddings)
        h_shared = self.feed_forward_shared(stacked_embeddings)
        
        # Splitting them back to individual embeddings
        batch_size = v_p.shape[0]
        h_shared = h_shared.view(3, batch_size, self.D_h)
        
        h_p, h_a0, h_a1 = h_shared[0], h_shared[1], h_shared[2]
        return h_p, h_a0, h_a1

# Mock Data Preparation
D_h = 768
batch_size = 2
embedding_dim = 768
K = 20
tau = 0.9

# Generating mock embeddings for article, predicate, ARG0, ARG1, and their corresponding sentence embeddings
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
v_a0 = torch.randn(batch_size, embedding_dim)
v_a1 = torch.randn(batch_size, embedding_dim)

# Testing CombinedAutoencoder
autoencoder = CombinedAutoencoder(embedding_dim, D_h, K)
outputs = autoencoder(v_p, v_a0, v_a1, article_embedding, tau)

# Check shapes of the outputs
print("Output shapes:")
for key, value in outputs.items():
    print(f"{key} -> vhat: {value['vhat'].shape}, d: {value['d'].shape}, g: {value['g'].shape}, F: {value['F'].shape}")

# check if tensor have nan values
def check_nan(tensor):
    # if tensor has any nan values, return True
    if torch.isnan(tensor).any():
        return True
    else:
        return False

# Check if any of the outputs have NaN values
print("NaN values:")
for key, value in outputs.items():
    print(f"{key} -> vhat: {check_nan(value['vhat'])}, d: {check_nan(value['d'])}, g: {check_nan(value['g'])}, F: {check_nan(value['F'])}")

Output shapes:
p -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a0 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a1 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
NaN values:
p -> vhat: False, d: False, g: False, F: False
a0 -> vhat: False, d: False, g: False, F: False
a1 -> vhat: False, d: False, g: False, F: False


## 3. FRISSLoss

The layer calculates the unsupervised loss for predicate, arg0 and arg1. 

The forward function takes as input 3 dicts with the parameters `v`, `v_hat`, `g` and `F`. Where `v` is the embedding of the predicate, arg0 or arg1. The `v_hat` (size: [batch_size, embedding_dim]) is the reconstructed embedding for the predicate, arg0 and arg1. The `g` is the gumbel softmax result (size: [batch_size, embedding_dim]). The `F` (size: [K, embedding_dim]) which is the descriptor dictionary.

The layer returns the loss for each batch. So the output is [batch_size].

In [98]:
class FRISSLoss(nn.Module):
    def __init__(self, lambda_orthogonality, M, t):
        super(FRISSLoss, self).__init__()
        
        self.lambda_orthogonality = lambda_orthogonality
        self.M = M
        self.t = t
        self.triplet_loss = nn.TripletMarginLoss(margin=M)

    def contrastive_loss(self, v, vhat, negatives):
        batch_size = vhat.size(0)
        N = negatives.size(0)
        loss = torch.zeros(batch_size, device=v.device)

        # Calculate true distance between reconstructed and real embeddings
        true_distance = self.l2(vhat, v)

        for i in range(N):  # loop over each element in "negatives"
            
            # Tranform negative from [embedding dim] to [batch size, embedding_dim] 
            negative = negatives[i, :].expand(v.size(0), -1)

            # Calculate negative distance for current negative embedding
            negative_distance = self.l2(vhat, negative)

            # Compute loss based on the provided logic: l2(vhat, v) + 1 + l2(vhat, negative) and clamp to 0 if below 0
            current_loss = 1 + true_distance - negative_distance
            loss += torch.clamp(current_loss, min=0.0)

        # Normalize the total loss by N
        return loss / N

    
    def l2(self, u, v):
        return torch.sqrt(torch.sum((u - v) ** 2, dim=1))
    
    def focal_triplet_loss_WRONG(self, v, vhat_z, g, F):
        losses = []
        for i in range(F.size(0)):  # Iterate over each negative example
            # For each negative, compute the loss against the anchor and positive
            loss = self.triplet_loss(vhat_z, v, F[i].unsqueeze(0).expand(v.size(0), -1))
            losses.append(loss)

        loss_tensor = torch.stack(losses) 
        loss = loss_tensor.mean(dim=0).mean()
        return loss
    
    def focal_triplet_loss(self, v, vhat_z, g, F):
        _, indices = torch.topk(g, self.t, largest=False, dim=1)

        F_t = torch.stack([F[indices[i]] for i in range(g.size(0))])
        
        g_tz = torch.stack([g[i, indices[i]] for i in range(g.size(0))])
                    
        g_t = g_tz / g_tz.sum(dim=1, keepdim=True)
        
        # if division by zero set all nan values to 0
        g_t[torch.isnan(g_t)] = 0
        
        m_t = self.M * ((1 - g_t)**2)

        # Initializing loss
        loss = torch.zeros_like(v[:, 0])
        
        # Iteratively adding to the loss for each negative embedding
        for i in range(self.t):
            current_v_t = F_t[:, i]
            current_m_t = m_t[:, i]
            
            current_loss = current_m_t + self.l2(vhat_z, v) - self.l2(vhat_z, current_v_t)
            
            loss += torch.max(torch.zeros_like(current_loss), current_loss)
             
        # Normalizing
        loss = loss / self.t
        return loss

    def orthogonality_term(self, F, reg=1e-4):
        gram_matrix = torch.mm(F, F.T)  # Compute the Gram matrix F * F^T
        identity_matrix = torch.eye(gram_matrix.size(0), device=gram_matrix.device)  # Create an identity matrix
        ortho_loss = (gram_matrix - identity_matrix).abs().sum()
        return ortho_loss


    def forward(self, p, a0, a1, p_negatives, a0_negatives, a1_negatives):
        # Extract components from dictionary for predicate p
        v_p, vhat_p, d_p, g_p, F_p = p["v"], p["vhat"], p["d"], p["g"], p["F"]
        
        # Extract components from dictionary for ARG0
        v_a0, vhat_a0, d_a0, g_a0, F_a0 = a0["v"], a0["vhat"], a0["d"], a0["g"], a0["F"]

        # Extract components from dictionary for ARG1
        v_a1, vhat_a1, d_a1, g_a1, F_a1 = a1["v"], a1["vhat"], a1["d"], a1["g"], a1["F"]
        
         # Calculate losses for predicate
        Ju_p = self.contrastive_loss(v_p, vhat_p, p_negatives)        
        Jt_p = self.focal_triplet_loss(v_p, vhat_p, g_p, F_p)
        
        Jz_p = Ju_p + Jt_p + self.lambda_orthogonality * self.orthogonality_term(F_p) ** 2
        #print(Ju_p, Jt_p, self.orthogonality_term(F_p))
        # Calculate losses for ARG0
        Ju_a0 = self.contrastive_loss(v_a0, vhat_a0, a0_negatives)
        Jt_a0 = self.focal_triplet_loss(v_a0, vhat_a0, g_a0, F_a0)
        Jz_a0 = Ju_a0 + Jt_a0 + self.lambda_orthogonality * self.orthogonality_term(F_a0) ** 2
        
        # Calculate losses for ARG1
        Ju_a1 = self.contrastive_loss(v_a1, vhat_a1, a1_negatives)
        Jt_a1 = self.focal_triplet_loss(v_a1, vhat_a1, g_a1, F_a1)
        Jz_a1 = Ju_a1 + Jt_a1 + self.lambda_orthogonality * self.orthogonality_term(F_a1) ** 2
        
        if torch.isnan(Jz_p).any():
            print("Jz_p has nan")
            
        if torch.isnan(Jz_a0).any():
            print("Jz_a0 has nan")
            
        if torch.isnan(Jz_a1).any():
            print("Jz_a1 has nan")
        
        # Aggregate the losses
        loss = Jz_p + Jz_a0 + Jz_a1
        
        return loss


# Mock Data Preparation
batch_size = 2
embedding_dim = 768
K = 15  # Number of frames/descriptors

# Generating mock embeddings for article, predicate, ARG0, ARG1 and their reconstructions
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
vhat_p = torch.randn(batch_size, embedding_dim)

v_a0 = torch.randn(batch_size, embedding_dim)
vhat_a0 = torch.randn(batch_size, embedding_dim)

v_a1 = torch.randn(batch_size, embedding_dim)
vhat_a1 = torch.randn(batch_size, embedding_dim)

# Generating mock descriptor weights and descriptor matrices for predicate, ARG0, ARG1
d_p = torch.randn(batch_size, K)
d_a0 = torch.randn(batch_size, K)
d_a1 = torch.randn(batch_size, K)

F_p = torch.randn(K, embedding_dim)
F_a0 = torch.randn(K, embedding_dim)
F_a1 = torch.randn(K, embedding_dim)

g_p = torch.randn(batch_size, K)
g_a0 = torch.randn(batch_size, K)
g_a1 = torch.randn(batch_size, K)

# Generating some negative samples (let's assume 5 negative samples per batch entry)
num_negatives = 8
negatives_p = torch.randn(num_negatives, embedding_dim)
negatives_a0 = torch.randn(num_negatives, embedding_dim)
negatives_a1 = torch.randn(num_negatives, embedding_dim)

# Initialize loss function
lambda_orthogonality = 1e-3

t = 8  # Number of descriptors with smallest weights for negative samples
M = t

loss_fn = FRISSLoss(lambda_orthogonality, M, t)

# Organizing inputs into dictionaries
p = {"v": v_p, "vhat": vhat_p, "d": d_p, "g": g_p, "F": F_p}
a0 = {"v": v_a0, "vhat": vhat_a0, "d": d_a0, "g": g_a0, "F": F_a0}
a1 = {"v": v_a1, "vhat": vhat_a1, "d": d_a1, "g": g_a1, "F": F_a1}

loss_fn = FRISSLoss(lambda_orthogonality, M, t)
loss = loss_fn(p, a0, a1, negatives_p, negatives_a0, negatives_a1)
print("FRiSSLoss output:", loss)

FRiSSLoss output: tensor([822901.6250, 822901.3750])


## 4. FRISSUnsupervised

The `FRISSUnsupervised` layer integrates multiple autoencoders and the previously described `FRISSLoss` layer to achieve an unsupervised learning process over the predicates and their arguments.

### Forward Method:

**Inputs**:
1. **v_p**: Embedding of the predicate with size: [batch_size, D_w].
2. **v_a0**: Embedding of the ARG0 (first argument) with size: [batch_size, D_w].
3. **v_a1**: Embedding of the ARG1 (second argument) with size: [batch_size, D_w].
4. **v_article**: Embedding of the article with size: [batch_size, D_w].
5. **negatives**: Tensor containing negative samples with size: [batch_size, num_negatives, D_w].
6. **tau**: A scalar parameter for the Gumbel softmax in the autoencoder.

**Outputs**:
- A dictionary `results` containing:
    - **loss**: A tensor representing the combined unsupervised loss over the batch with size: [batch_size].
    - **p**: Dictionary containing components for the predicate, including reconstructed embedding (`vhat`), descriptor weights (`d`), Gumbel softmax result (`g`), and the descriptor matrix (`F`).
    - **a0**: Same as `p` but for ARG0.
    - **a1**: Same as `p` but for ARG1.

In [99]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assuming you have already defined CombinedAutoencoder and its methods as provided earlier.

class FRISSUnsupervised(nn.Module):
    def __init__(self, D_w, D_h, K, num_frames, lambda_orthogonality, M, t, dropout_prob=0.3):
        super(FRISSUnsupervised, self).__init__()
        
        self.loss_fn = FRISSLoss(lambda_orthogonality, M, t)      
        
        # Using the CombinedAutoencoder instead of individual Autoencoders
        self.combined_autoencoder = CombinedAutoencoder(D_w, D_h, K, dropout_prob=dropout_prob)

    def forward(self, v_p, v_a0, v_a1, v_sentence, p_negatives, a0_negatives, a1_negatives, tau):
        outputs = self.combined_autoencoder(v_p, v_a0, v_a1, v_sentence, tau)

        outputs_p = outputs["p"]
        outputs_p["v"] = v_p
        
        outputs_a0 = outputs["a0"]
        outputs_a0["v"] = v_a0
        
        outputs_a1 = outputs["a1"]
        outputs_a1["v"] = v_a1
        
        loss = self.loss_fn(
            outputs_p,
            outputs_a0, 
            outputs_a1, 
            p_negatives, a0_negatives, a1_negatives
        )

        results = {
            "loss": loss,
            "p": outputs["p"],
            "a0": outputs["a0"],
            "a1": outputs["a1"]
        }
        
        return results

# Mock Data Preparation
D_h = 768
batch_size = 2
embedding_dim = 768
K = 20
num_frames = 15
tau = 0.9
lambda_orthogonality = 0.1  # Placeholder value, please replace with your actual value
M = 7  # Placeholder value, please replace with your actual value
t = 7  # Placeholder value, please replace with your actual value

# Generating mock embeddings for article, predicate, ARG0, ARG1, and their corresponding sentence embeddings
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
v_a0 = torch.randn(batch_size, embedding_dim)
v_a1 = torch.randn(batch_size, embedding_dim)

# Generating some negative samples (let's assume 5 negative samples per batch entry)
num_negatives = 10
negatives_p = torch.randn(num_negatives, embedding_dim)
negatives_a0 = torch.randn(num_negatives, embedding_dim)
negatives_a1 = torch.randn(num_negatives, embedding_dim)

# Testing FRISSUnsupervised
unsupervised_module = FRISSUnsupervised(embedding_dim, D_h, K, num_frames, lambda_orthogonality, M, t)
results = unsupervised_module(v_p, v_a0, v_a1, article_embedding, negatives_p, negatives_a0, negatives_a1, tau)

# Print the results' shapes for verification
print("Results' Shapes:")
for key, value in results.items():
    if key == "loss":
        print(f"{key}: {value}")
    else:
        print(f"{key} -> vhat: {value['vhat'].shape}, d: {value['d'].shape}, g: {value['g'].shape}, F: {value['F'].shape}")


Results' Shapes:
loss: tensor([1.7158e+08, 1.7158e+08], grad_fn=<AddBackward0>)
p -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a0 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a1 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])


## 5. FRISSSupervised

The layer takes the embeddings from the args and the sentence and predicts frames. 

The embeddings for the args are averaged for each arg individually and then averaged on args level. The final embedding is feed into a linear layer and passed through a sigmoid function. 

The sentence embedding is feed into a linear layer and then into a relu function. After again in a linear function and then averaged. The average embeddung is again feed into a linear layer and lastly in a signoid function. 

It returns a span and sentence based prediction of shape [batch_size, num_frames].

In [100]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FRISSSupervised(nn.Module):
    def __init__(self, D_w, K, num_frames, dropout_prob=0.3):
        super(FRISSSupervised, self).__init__()

        self.D_w = D_w
                
        self.softmax = nn.Softmax(dim=1)

        self.feed_forward_sentence1 = nn.Linear(D_w, D_w)
        self.feed_forward_sentence2 = nn.Linear(D_w, num_frames)

        self.relu = nn.ReLU()
        
    def forward(self, d_p, d_a0, d_a1, vs):
        # Span-based Classification   

        # aggregate the SRL descriptors to have one descriptor per sentence
        d_p = d_p.mean(dim=2)
        d_a0 = d_a0.mean(dim=2)
        d_a1 = d_a1.mean(dim=2)

        # take the mean over descriptors
        d_v = (d_p + d_a0 + d_a1) / 3

        # feed in softmax
        # yu_hat = self.softmax(d_v) # do not use as we use crossentropyloss

        # Sentence-based Classification

        ws = self.relu(self.feed_forward_sentence1(vs))

        ws = self.feed_forward_sentence2(ws)

        # mean over sentences
        ws = ws.mean(dim=1)

        # softmax
        # ys_hat = self.softmax(ws) # do not use as we use crossentropyloss

        return d_v, ws


# Mock Data Preparation

batch_size = 2
embedding_dim = 768
num_frames = 15  # Assuming the number of frames is equal to K for simplicity
num_sentences = 32
K = 15
num_args = 9

# Generating mock dsz representations for predicate, ARG0, ARG1
d_p = torch.randn(batch_size, num_sentences, num_args, K)
d_a0 = torch.randn(batch_size, num_sentences, num_args, K)
d_a1 = torch.randn(batch_size, num_sentences, num_args, K) 

# Adjusting the num_heads parameter
srl_heads = 4
sentence_heads = 8

# Adjust the mock sentence embeddings shape
vs = torch.randn(batch_size, num_sentences, embedding_dim)

# Initialize and test the supervised module
supervised_module = FRISSSupervised(embedding_dim, K, num_frames)

# Forward pass the mock data
yu_hat, ys_hat = supervised_module(d_p, d_a0, d_a1, vs)
yu_hat.shape, ys_hat.shape

(torch.Size([2, 32, 15]), torch.Size([2, 15]))

## 6. FRISS

In [101]:
import torch.nn as nn

class FRISS(nn.Module):
    def __init__(self, embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=0.3, bert_model_name="bert-base-uncased"):
        super(FRISS, self).__init__()
        
        # Aggregation layer replaced with SRL_Embeddings
        self.aggregation = SRL_Embeddings(bert_model_name)
        
        # Unsupervised training module
        self.unsupervised = FRISSUnsupervised(embedding_dim, D_h, K, num_frames, lambda_orthogonality, M, t, dropout_prob=dropout_prob)
        
        # Supervised training module
        self.supervised = FRISSSupervised(embedding_dim, K, num_frames, dropout_prob=dropout_prob)
        
    def negative_sampling(self, embeddings, num_negatives=8):
        batch_size, num_sentences, num_args, embedding_dim = embeddings.size()
        all_negatives = []

        for i in range(batch_size):
            for j in range(num_sentences):
                # Flatten the arguments dimension to sample across all arguments in the sentence
                flattened_embeddings = embeddings[i, j].view(-1, embedding_dim)
                
                # Get indices of non-padded embeddings (assuming padding is represented by all-zero vectors)
                non_padded_indices = torch.where(torch.any(flattened_embeddings != 0, dim=1))[0]

                # Randomly sample negative indices from non-padded embeddings
                if len(non_padded_indices) > 0:
                    negative_indices = non_padded_indices[torch.randint(0, len(non_padded_indices), (num_negatives,))]
                else:
                    # If no non-padded embeddings, use zeros
                    negative_indices = torch.zeros(num_negatives, dtype=torch.long)

                negative_samples = flattened_embeddings[negative_indices, :]
                all_negatives.append(negative_samples)

        # Concatenate all negative samples into a single tensor
        all_negatives = torch.cat(all_negatives, dim=0)

        # If more samples than required, randomly select 'num_negatives' samples
        if all_negatives.size(0) > num_negatives:
            indices = torch.randperm(all_negatives.size(0))[:num_negatives]
            all_negatives = all_negatives[indices]

        return all_negatives
    
    def forward(self, sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau):
        # Convert input IDs to embeddings
        sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = self.aggregation(sentence_ids, predicate_ids, arg0_ids, arg1_ids)
        
        # Handle multiple spans by averaging predictions
        unsupervised_losses = torch.zeros((sentence_embeddings.size(0),), device=sentence_embeddings.device)
        
        # Creating storage for aggregated d tensors
        d_p_list, d_a0_list, d_a1_list = [], [], []
        
        negatives_p = self.negative_sampling(predicate_embeddings)
        negatives_a0 = self.negative_sampling(arg0_embeddings)
        negatives_a1 = self.negative_sampling(arg1_embeddings)

        # Process each sentence 
        for sentence_idx in range(sentence_embeddings.size(1)):
            s_sentence_span = sentence_embeddings[:, sentence_idx, :]

            d_p_sentence_list = []
            d_a0_sentence_list = []
            d_a1_sentence_list = []

            # Process each span
            for span_idx in range(predicate_embeddings.size(2)):                
                v_p_span = predicate_embeddings[:, sentence_idx, span_idx, :]
                v_a0_span = arg0_embeddings[:, sentence_idx, span_idx, :]
                v_a1_span = arg1_embeddings[:, sentence_idx, span_idx, :]

                # Feed the embeddings to the unsupervised module
                unsupervised_results = self.unsupervised(v_p_span, v_a0_span, v_a1_span, s_sentence_span, negatives_p, negatives_a0, negatives_a1, tau)                
                unsupervised_losses += unsupervised_results["loss"]
                
                if torch.isnan(unsupervised_results["loss"]).any():
                    print("loss is nan")
                
                # Use the vhat (reconstructed embeddings) for supervised predictions
                d_p_sentence_list.append(unsupervised_results['p']['d'])
                d_a0_sentence_list.append(unsupervised_results['a0']['d'])
                d_a1_sentence_list.append(unsupervised_results['a1']['d'])        


            # Aggregating across all spans
            d_p_sentence = torch.stack(d_p_sentence_list, dim=1)
            d_a0_sentence = torch.stack(d_a0_sentence_list, dim=1)
            d_a1_sentence = torch.stack(d_a1_sentence_list, dim=1)

            d_p_list.append(d_p_sentence)
            d_a0_list.append(d_a0_sentence)
            d_a1_list.append(d_a1_sentence)

        # Aggregating across all spans
        d_p_aggregated = torch.stack(d_p_list, dim=1)
        d_a0_aggregated = torch.stack(d_a0_list, dim=1)
        d_a1_aggregated = torch.stack(d_a1_list, dim=1)
        
        # Supervised predictions
        span_pred, sentence_pred = self.supervised(d_p_aggregated, d_a0_aggregated, d_a1_aggregated, sentence_embeddings)
    
        # Identify valid (non-nan) losses
        valid_losses = ~torch.isnan(unsupervised_losses)

        # Sum only the valid losses
        #unsupervised_loss = unsupervised_losses[valid_losses].sum()
        
        # Take average by summing the valid losses and dividing by num sentences so that padded sentences are also taken in equation
        unsupervised_loss = unsupervised_losses[valid_losses].sum() / sentence_embeddings.shape[1]
        
        return unsupervised_loss, span_pred, sentence_pred


# Set the necessary parameters
batch_size = 2
embedding_dim = 768
K = 14  # Number of frames/descriptors
num_frames = 14  # Assuming the number of frames is equal to K for simplicity
D_h = 512  # Dimension of the hidden representation
lambda_orthogonality = 0.1
M = 8
t = 8
tau = 1.0

# Define some mock token IDs data parameters
max_sentences_per_article = 8
max_sentence_length = 10
num_sentences = max_sentences_per_article
max_args_per_sentence = 3

# Generating mock token IDs for predicate, ARG0, ARG1, and their corresponding sentences
# We assume a vocab size of 30522 (standard BERT vocab size) for simplicity.
vocab_size = 30522

sentence_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_sentence_length))
predicate_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_args_per_sentence, max_sentence_length))
arg0_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_args_per_sentence, max_sentence_length))
arg1_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_args_per_sentence, max_sentence_length))

sentence_embeddings = torch.randn(batch_size, max_sentences_per_article, embedding_dim)
predicate_embeddings = torch.randn(batch_size, max_sentences_per_article, max_args_per_sentence, embedding_dim)
arg0_embeddings = torch.randn(batch_size, max_sentences_per_article, max_args_per_sentence, embedding_dim)
arg1_embeddings = torch.randn(batch_size, max_sentences_per_article, max_args_per_sentence, embedding_dim)

# Initialize the FRISS model
friss_model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K=K, num_frames=num_frames)

# Forward pass the mock data
unsupervised_loss, span_pred, sentence_pred = friss_model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 1)
unsupervised_loss, span_pred.shape, sentence_pred.shape

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


(tensor(3.7889e+08, grad_fn=<DivBackward0>),
 torch.Size([2, 8, 14]),
 torch.Size([2, 14]))

# Train Model

The F1-Score (micro-averaged) and Average Precision Score are chosen as primary metrics for evaluating the multi-label classification task due to the following reasons:

1. **F1-Score (Micro)**:
    - The micro-averaged F1-score computes global counts of true positives, false negatives, and false positives. 
    - It provides a balance between precision (the number of correct positive results divided by the number of all positive results) and recall (the number of correct positive results divided by the number of positive results that should have been returned).
    - Given the imbalance in the label distribution observed in the dataset, the micro-averaged F1-score is robust against this imbalance, making it a suitable metric for optimization.

2. **Average Precision Score**:
    - This metric summarizes the precision-recall curve, giving a single value that represents the average of precision values at different recall levels.
    - It's especially valuable when class imbalances exist, as it gives more weight to the positive class (the rarer class in an imbalanced dataset).

Using these metrics will ensure that the model is optimized for a balanced performance across all labels, even if some labels are rarer than others.

In [102]:
import numpy as np
from sklearn.metrics import f1_score, average_precision_score
from math import exp
import json 
import csv

from torch.optim.lr_scheduler import StepLR

def train(model, train_dataloader, test_dataloader, optimizer, loss_function, alpha=0.5, num_epochs=10, tau_min=1, tau_decay=0.95, device='cuda', save_path='../notebooks/', save=False):
    tau = 1
    
    metrics = {
        'f1_span_micro': [],
        'f1_sentence_micro': [],
        'f1_span_macro': [],
        'f1_sentence_macro': []
    }
    
    scheduler = StepLR(optimizer, step_size=2, gamma=0.1)
    
    iteration = 0
    
    for epoch in tqdm(range(num_epochs), desc="Epochs"):
        model.train()
        
        total_loss = 0
        supervised_total_loss = 0
        unsupervised_total_loss = 0
        
        batch_progress = tqdm(enumerate(train_dataloader), total=len(train_dataloader), desc="Batches", leave=False)
        for batch_idx, batch in batch_progress:            
            iteration = iteration + 1
            
            optimizer.zero_grad()

            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            labels = batch['labels'].to(device)

            unsupervised_loss, span_logits, sentence_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau)
                    
            span_loss = 0
            sentence_loss = 0

            # loop over spans
            for i in range(span_logits.size(1)):
                span_loss += loss_function(span_logits[:, i, :], labels.float())
            
            # span_loss = loss_function(span_logits, labels.float())            
            sentence_loss = loss_function(sentence_logits, labels.float())
            
            supervised_loss = span_loss + sentence_loss
            
            combined_loss = alpha * supervised_loss + (1-alpha) * unsupervised_loss
            
            if torch.isnan(combined_loss):
                print(f"NaN loss detected at epoch {epoch+1}, batch {batch_idx+1}. Stopping...")
                return
        
            combined_loss.backward()
            
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            # After the backward pass
            if any(p.grad is not None and torch.isnan(p.grad).any() for p in model.parameters()):
                print(f"NaN gradients detected at epoch {epoch+1}, batch {batch_idx+1}. Stopping...")
                return
            
            optimizer.step()

            total_loss += combined_loss.item()
            supervised_total_loss += supervised_loss.item()
            unsupervised_total_loss += unsupervised_loss.item()

            batch_progress.set_description(f"Epoch {epoch+1} ({iteration}) Total Loss: {combined_loss.item():.3f}, SRLs: {span_loss:.3f}, Sentence: {sentence_loss:.3f}, CombinedS: {supervised_loss.item():.3f}, Unsupervised: {unsupervised_loss.item():.3f}")
                        
            if save:
                # Log metrics to CSV
                with open(save_path + 'training_metrics.csv', 'a') as f:
                    writer = csv.writer(f)
                    writer.writerow([batch_idx, epoch+1, combined_loss.item(), supervised_loss.item(), unsupervised_loss.item()])

            # Explicitly delete tensors to free up memory
            del sentence_ids, predicate_ids, arg0_ids, arg1_ids, labels, unsupervised_loss
            torch.cuda.empty_cache()

        print(f"Epoch {epoch+1}/{num_epochs}, Combined Loss: {total_loss/len(train_dataloader)}, Supervised Loss: {supervised_total_loss/len(train_dataloader)}, Unsupervised Loss: {unsupervised_total_loss/len(train_dataloader)}")
        
        model.eval()
        
        span_preds = []
        sentence_preds = []
        combined_preds = []
        all_labels = []

        with torch.no_grad():
            for batch in test_dataloader:
                sentence_ids = batch['sentence_ids'].to(device)
                predicate_ids = batch['predicate_ids'].to(device)
                arg0_ids = batch['arg0_ids'].to(device)
                arg1_ids = batch['arg1_ids'].to(device)
                labels = batch['labels'].to(device)
                
                _, span_logits, sentence_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau)

                span_pred = []

                # loop over span and apply softmax
                for i in range(span_logits.size(1)):
                    span_pred.append((torch.softmax(span_logits[:, i, :], dim=1) > 0.5).float().cpu().numpy())

                sentence_pred = (torch.sigmoid(sentence_logits) > 0.5).float()
                
                span_preds.append(span_pred)
                sentence_preds.append(sentence_pred.cpu().numpy())
                combined_preds.append(combined_pred.cpu().numpy())
                all_labels.append(labels.cpu().numpy())

                # Explicitly delete tensors to free up memory
                del sentence_ids, predicate_ids, arg0_ids, arg1_ids, labels, span_logits, sentence_logits, span_pred, sentence_pred, combined_pred
                torch.cuda.empty_cache()

        all_span_preds = np.vstack(span_preds)
        all_sentence_preds = np.vstack(sentence_preds)
        all_combined_preds = np.vstack(combined_preds)
        all_labels = np.vstack(all_labels)

        f1_span_micro = f1_score(all_labels, all_span_preds, average='micro')
        f1_sentence_micro = f1_score(all_labels, all_sentence_preds, average='micro')
        
        f1_span_macro = f1_score(all_labels, all_span_preds, average='macro')
        f1_sentence_macro = f1_score(all_labels, all_sentence_preds, average='macro')

        metrics['f1_span_micro'].append(f1_span_micro)
        metrics['f1_sentence_micro'].append(f1_sentence_micro)
        
        metrics['f1_span_macro'].append(f1_span_macro)
        metrics['f1_sentence_macro'].append(f1_sentence_macro)

        print(f"Validation Metrics - micro F1 - Span/Sentence: {f1_span_micro:.4f}/{f1_sentence_micro:.4f}, macro F1 - Span/Sentence: {f1_span_macro:.4f}/{f1_sentence_macro:.4f}")
        
        # Anneal tau at the end of the epoch
        tau = max(tau_min, exp(-tau_decay * iteration))
        
        scheduler.step()
        
    if save:
        model_save_path = os.path.join(save_path, 'model1.pth')
        torch.save(model.state_dict(), model_save_path)
        print(f"Model saved to {model_save_path}")
    
        with open(os.path.join(save_path, 'metrics.json'), 'w') as f:
            json.dump(metrics, f)

    return metrics

# Dataset

In [103]:
# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

num_sentences = 32
batch_size = 48
max_sentence_length = 64
max_arg_length = 12

train_dataset, test_dataset, train_dataloader, test_dataloader = get_datasets_dataloaders(df, tokenizer, recalculate_srl=False, batch_size=batch_size, max_sentences_per_article=num_sentences, max_sentence_length=max_sentence_length, max_arg_length=max_arg_length, pickle_path="../notebooks/FRISS_srl.pkl")

Load SRL from Pickle


  0%|          | 0/10211714 [00:00<?, ?it/s]

X: 6097
X_srl: 6097
y: 6097
CREATING DATASETS
TRAIN TEST SPLIT DONE
CREATION DONE


# Train

In [105]:
def get_friss_model(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, bert_model_name="bert-base-uncased", load=True, path="", device='cuda'):
    """
    Loads the weights into an instance of the model class from the given path.
    
    Args:
    - model_class (torch.nn.Module): The class of the model (uninitialized).
    - path (str): Path to the saved weights.
    - device (str): Device to load the model on ('cpu' or 'cuda').
    
    Returns:
    - model (torch.nn.Module): Model with weights loaded.
    """

    # Model instantiation
    model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=dropout_prob, bert_model_name=bert_model_name)
    model = model.to(device)
    
    if load:
        assert path != ""
        model.load_state_dict(torch.load(path, map_location=device))
    
    #model.eval()
    return model

In [106]:
torch.set_printoptions(profile="full")

import torch.optim as optim
import json
import csv

from tqdm.notebook import tqdm

# Hyperparameters
embedding_dim = 768
num_frames = 15

D_h = 768
lambda_orthogonality = 1e-3

K = num_frames
t = 8
M = 8
tau_min = 0.5
tau_decay = 5e-4

dropout_prob = 0.3

friss_model_path = "bert-base-uncased"
bert_model_path = "bert-base-uncased"

# Model instantiation
model = get_friss_model(embedding_dim, 
                        D_h, 
                        lambda_orthogonality, 
                        M, 
                        t, 
                        num_sentences, 
                        K, 
                        num_frames, 
                        dropout_prob=dropout_prob,
                        bert_model_name=bert_model_path,
                        load=False,
                        path=friss_model_path)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# LOSS

loss_function = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=5e-4)#, weight_decay=1e-5)

# Train the model
alpha_value = 0.5
num_epochs_value = 10

save_path = "models/"

metrics = train(model, train_dataloader, test_dataloader, optimizer, loss_function, tau_min=tau_min, tau_decay=tau_decay, alpha=alpha_value, num_epochs=num_epochs_value, device=device, save=True, save_path=save_path)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/114 [00:00<?, ?it/s]

# Grid Search

In [38]:
from itertools import product
import torch.optim as optim
import csv

from tqdm.notebook import tqdm

# Hyperparameters
embedding_dim = 768

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def grid_search(train_dataloader, test_dataloader, search_space, num_epochs=10):
    # Store the results for each hyperparameter combination
    results = {}

    # Fixed values for K and num_frames
    K = 14
    num_frames = 14

    # Fixed values for dropout_prob and bert_model_name (adjust if necessary)
    bert_model_name = "../notebooks/models/fine-tuned-model/"

    # Initialize the file to write metrics
    with open("../notebooks/grid_search_metrics.csv", "w", newline='') as csvfile:
        fieldnames = ['combination', 'alpha', 'lr', 'D_h', 'lambda_orthogonality', 'M', 't', 'tau_min', 'tau_decay', 'dropout_prob', 'epoch', 'f1_span_micro', 'f1_span_macro', 'f1_sentence_micro', 'f1_sentence_macro', 'f1_combined_micro', 'f1_combined_macro']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # Calculate the total number of combinations
        total_combinations = 1
        for key, values in search_space.items():
            total_combinations *= len(values)

        # Loop through all combinations
        for idx, combination in enumerate(product(*search_space.values())):
            print(f"Training combination {idx + 1}/{total_combinations}: {combination}")

            # Extract hyperparameters from the current combination
            alpha, lr, tau_min, tau_decay, t, D_h, lambda_orthogonality, M, dropout_prob = combination

            # Initialize the model with current hyperparameters
            model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, bert_model_name)
            model.to(device)
        
                
            # Compute the `weight` parameter for each label
            label_frequencies = y.mean()
            weights = 1 / (label_frequencies + 1e-10)  # Adding a small value to avoid division by zero

            # Compute the `pos_weight` parameter
            pos_weights = (1 - label_frequencies) / (label_frequencies + 1e-10)

            # Convert the computed weights and pos_weights to PyTorch tensors
            weights_tensor = torch.tensor(weights.values, dtype=torch.float32).to(device)
            pos_weights_tensor = torch.tensor(pos_weights.values, dtype=torch.float32).to(device)

            loss_function = nn.BCEWithLogitsLoss(weight=weights_tensor, pos_weight=pos_weights_tensor, reduction="mean")
        
            # Define the optimizer
            optimizer = optim.AdamW(model.parameters(), lr=lr)

            # Define loss_function if needed (add this if your train function requires it)

            # Train the model with the current hyperparameters
            epoch_metrics = train(model, train_dataloader, test_dataloader, optimizer, loss_function, alpha=alpha, num_epochs=num_epochs, tau_min=tau_min, tau_decay=tau_decay, device=device, save=False)

            # Write the metrics to the CSV file
            for epoch in range(num_epochs):
                f1_span_micro = epoch_metrics['f1_span_micro'][epoch]
                f1_span_macro = epoch_metrics['f1_span_macro'][epoch]
                f1_sentence_micro = epoch_metrics['f1_sentence_micro'][epoch]
                f1_sentence_macro = epoch_metrics['f1_sentence_macro'][epoch]
                f1_combined_micro = epoch_metrics['f1_combined_micro'][epoch]
                f1_combined_macro = epoch_metrics['f1_combined_macro'][epoch]
                row = {
                    'combination': idx,
                    'alpha': alpha,
                    'lr': lr,
                    'D_h': D_h,
                    'lambda_orthogonality': lambda_orthogonality,
                    'M': M,
                    't': t,
                    'tau_min': tau_min,
                    'tau_decay': tau_decay,
                    'dropout_prob': dropout_prob,
                    'epoch': epoch + 1,
                    'f1_span_micro': f1_span_micro,
                    'f1_span_macro': f1_span_macro,
                    'f1_sentence_micro': f1_sentence_micro,
                    'f1_sentence_macro': f1_sentence_macro,
                    'f1_combined_micro': f1_combined_micro,
                    'f1_combined_macro': f1_combined_macro
                }
                writer.writerow(row)
                csvfile.flush()

    return results

search_space = {
    'alpha': [0.5, 0.2, 0.8],
    'lr': [1e-5, 2e-5, 5e-4, 1e-3],
    'tau_min': [0.5],
    'tau_decay': [5e-4],
    't': [5, 8, 10, 20],
    'D_h': [768, 768 * 2, 768 // 2, 768 * 3],
    'lambda_orthogonality': [1e-6, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    'M': [5, 8, 10, 20],
    'dropout_rate': [0.1, 0.2, 0.3, 0.5]
}

# Call the grid search function
results = grid_search(train_dataloader, test_dataloader, search_space, 10)
results

Training combination 1/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 5, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10742.722412109375, Supervised Loss: 8.291224559148153, Unsupervised Loss: 21477.15380859375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2967/0.3478/0.2987, macro F1 - Span/Sentence/Combined: 0.1632/0.2973/0.2817


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10730.689615885416, Supervised Loss: 6.861002524693807, Unsupervised Loss: 21454.51806640625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3200/0.3226/0.3306, macro F1 - Span/Sentence/Combined: 0.2137/0.2673/0.3265


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10719.414794921875, Supervised Loss: 6.123968799908956, Unsupervised Loss: 21432.705729166668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3033/0.3279/0.2849, macro F1 - Span/Sentence/Combined: 0.2183/0.2714/0.2820


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10707.821126302084, Supervised Loss: 5.65417758623759, Unsupervised Loss: 21409.987955729168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3069/0.3443/0.3448, macro F1 - Span/Sentence/Combined: 0.2677/0.2820/0.3463


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10696.410237630209, Supervised Loss: 5.399417002995809, Unsupervised Loss: 21387.4208984375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3433/0.3297/0.3673, macro F1 - Span/Sentence/Combined: 0.2962/0.2721/0.3546


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10685.546630859375, Supervised Loss: 5.135079741477966, Unsupervised Loss: 21365.958170572918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3351/0.3425/0.3916, macro F1 - Span/Sentence/Combined: 0.2871/0.2778/0.3657


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10674.185953776041, Supervised Loss: 4.990791877110799, Unsupervised Loss: 21343.381184895832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3529/0.3416/0.3988, macro F1 - Span/Sentence/Combined: 0.2929/0.2797/0.3600


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10662.970296223959, Supervised Loss: 4.8219631512959795, Unsupervised Loss: 21321.118326822918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3381/0.3370/0.4337, macro F1 - Span/Sentence/Combined: 0.2792/0.2779/0.3708


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10652.023518880209, Supervised Loss: 4.621726155281067, Unsupervised Loss: 21299.42529296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3615/0.3370/0.4286, macro F1 - Span/Sentence/Combined: 0.3024/0.2785/0.3797


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10641.364908854166, Supervised Loss: 4.492606043815613, Unsupervised Loss: 21278.2373046875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3536/0.3370/0.4343, macro F1 - Span/Sentence/Combined: 0.2888/0.2786/0.3809
Training combination 2/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 5, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10190.951822916666, Supervised Loss: 8.75970216592153, Unsupervised Loss: 20373.14404296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3902/0.2838/0.2865, macro F1 - Span/Sentence/Combined: 0.2312/0.2487/0.2812


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10179.213541666666, Supervised Loss: 6.869309186935425, Unsupervised Loss: 20351.55810546875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3940/0.2745/0.3178, macro F1 - Span/Sentence/Combined: 0.2569/0.2496/0.3118


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10168.390869140625, Supervised Loss: 6.330201546351115, Unsupervised Loss: 20330.45166015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3868/0.2617/0.2889, macro F1 - Span/Sentence/Combined: 0.2536/0.2347/0.2878


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10157.849283854166, Supervised Loss: 5.987180471420288, Unsupervised Loss: 20309.71142578125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3780/0.2658/0.2927, macro F1 - Span/Sentence/Combined: 0.2472/0.2397/0.2832


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10146.744303385416, Supervised Loss: 5.644481778144836, Unsupervised Loss: 20287.844075520832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3760/0.2609/0.3149, macro F1 - Span/Sentence/Combined: 0.2945/0.2355/0.3021


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10135.916585286459, Supervised Loss: 5.388167937596639, Unsupervised Loss: 20266.444661458332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3683/0.2626/0.3519, macro F1 - Span/Sentence/Combined: 0.3095/0.2367/0.3284


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10125.211181640625, Supervised Loss: 5.17006532351176, Unsupervised Loss: 20245.25244140625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3353/0.2585/0.3522, macro F1 - Span/Sentence/Combined: 0.2972/0.2325/0.3221


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10114.843831380209, Supervised Loss: 5.048325300216675, Unsupervised Loss: 20224.63916015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3205/0.2761/0.3412, macro F1 - Span/Sentence/Combined: 0.2823/0.2523/0.3168


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10104.213785807291, Supervised Loss: 4.93995726108551, Unsupervised Loss: 20203.48779296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3167/0.2761/0.3512, macro F1 - Span/Sentence/Combined: 0.2775/0.2538/0.3282


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10094.11669921875, Supervised Loss: 4.81717864672343, Unsupervised Loss: 20183.416015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3158/0.2667/0.3578, macro F1 - Span/Sentence/Combined: 0.2758/0.2406/0.3365
Training combination 3/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 5, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10889.759033203125, Supervised Loss: 8.680102030436197, Unsupervised Loss: 21770.837890625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3882/0.3393/0.3228, macro F1 - Span/Sentence/Combined: 0.1941/0.2768/0.3199


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10877.263509114584, Supervised Loss: 7.137778798739116, Unsupervised Loss: 21747.389322916668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3776/0.3483/0.3271, macro F1 - Span/Sentence/Combined: 0.2255/0.2932/0.3226


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10865.6943359375, Supervised Loss: 6.487258434295654, Unsupervised Loss: 21724.9013671875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4022/0.3401/0.3421, macro F1 - Span/Sentence/Combined: 0.2451/0.3035/0.3481


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10854.372802734375, Supervised Loss: 6.104423880577087, Unsupervised Loss: 21702.641276041668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4183/0.3550/0.3280, macro F1 - Span/Sentence/Combined: 0.2585/0.3129/0.3239


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10842.699381510416, Supervised Loss: 5.8525660037994385, Unsupervised Loss: 21679.5458984375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3646/0.3491/0.3520, macro F1 - Span/Sentence/Combined: 0.2960/0.3082/0.3352


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10831.598388671875, Supervised Loss: 5.639658451080322, Unsupervised Loss: 21657.55712890625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3446/0.3529/0.3736, macro F1 - Span/Sentence/Combined: 0.2847/0.3129/0.3484


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10819.965494791666, Supervised Loss: 5.351883252461751, Unsupervised Loss: 21634.579264322918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3295/0.3488/0.3722, macro F1 - Span/Sentence/Combined: 0.2863/0.3093/0.3438


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10809.4560546875, Supervised Loss: 5.23209011554718, Unsupervised Loss: 21613.679850260418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3226/0.3509/0.3859, macro F1 - Span/Sentence/Combined: 0.2821/0.3122/0.3574


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10797.915445963541, Supervised Loss: 5.12457013130188, Unsupervised Loss: 21590.7060546875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3186/0.3526/0.4056, macro F1 - Span/Sentence/Combined: 0.2726/0.3132/0.3706


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10786.681884765625, Supervised Loss: 5.110768675804138, Unsupervised Loss: 21568.2529296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3198/0.3499/0.3810, macro F1 - Span/Sentence/Combined: 0.2779/0.3103/0.3472
Training combination 4/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 5, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 9974.148681640625, Supervised Loss: 8.798275073369345, Unsupervised Loss: 19939.499186197918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1446/0.3211/0.3152, macro F1 - Span/Sentence/Combined: 0.0639/0.2597/0.2892


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 9963.288004557291, Supervised Loss: 7.577118873596191, Unsupervised Loss: 19918.998860677082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1455/0.3155/0.3402, macro F1 - Span/Sentence/Combined: 0.0723/0.2744/0.3515


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 9952.843098958334, Supervised Loss: 7.189167340596517, Unsupervised Loss: 19898.496907552082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1372/0.3333/0.3246, macro F1 - Span/Sentence/Combined: 0.0805/0.2882/0.3236


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 9942.1005859375, Supervised Loss: 6.78916863600413, Unsupervised Loss: 19877.412109375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1352/0.3353/0.3295, macro F1 - Span/Sentence/Combined: 0.0916/0.2884/0.3221


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 9931.013997395834, Supervised Loss: 6.574899474779765, Unsupervised Loss: 19855.453125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1924/0.3155/0.3398, macro F1 - Span/Sentence/Combined: 0.1526/0.2747/0.3204


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 9920.526285807291, Supervised Loss: 6.34416921933492, Unsupervised Loss: 19834.708333333332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2918/0.3274/0.3473, macro F1 - Span/Sentence/Combined: 0.2717/0.2830/0.3259


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 9910.022705078125, Supervised Loss: 6.203058997790019, Unsupervised Loss: 19813.84228515625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3606/0.3186/0.3687, macro F1 - Span/Sentence/Combined: 0.3271/0.2758/0.3420


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 9899.96484375, Supervised Loss: 6.0096962451934814, Unsupervised Loss: 19793.920247395832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3526/0.3274/0.3581, macro F1 - Span/Sentence/Combined: 0.3084/0.2839/0.3416


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 9888.709716796875, Supervised Loss: 5.851763089497884, Unsupervised Loss: 19771.56787109375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3677/0.3333/0.3621, macro F1 - Span/Sentence/Combined: 0.3246/0.2888/0.3361


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 9878.758707682291, Supervised Loss: 5.774728059768677, Unsupervised Loss: 19751.743001302082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3410/0.3195/0.3491, macro F1 - Span/Sentence/Combined: 0.3032/0.2794/0.3231
Training combination 5/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 8, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10472.898600260416, Supervised Loss: 8.675973693529764, Unsupervised Loss: 20937.120930989582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3221/0.3909/0.3262, macro F1 - Span/Sentence/Combined: 0.2134/0.2905/0.3136


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10461.129801432291, Supervised Loss: 6.713871479034424, Unsupervised Loss: 20915.5458984375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3444/0.3833/0.3298, macro F1 - Span/Sentence/Combined: 0.2582/0.3262/0.3041


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10449.904947916666, Supervised Loss: 6.133942723274231, Unsupervised Loss: 20893.675944010418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3569/0.3761/0.3438, macro F1 - Span/Sentence/Combined: 0.2177/0.3206/0.3153


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10438.66845703125, Supervised Loss: 5.636502305666606, Unsupervised Loss: 20871.7001953125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3417/0.3782/0.3491, macro F1 - Span/Sentence/Combined: 0.2327/0.3282/0.3232


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10427.66650390625, Supervised Loss: 5.401580254236857, Unsupervised Loss: 20849.931315104168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2841/0.3804/0.3609, macro F1 - Span/Sentence/Combined: 0.1995/0.3294/0.3327


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10416.75, Supervised Loss: 5.191921949386597, Unsupervised Loss: 20828.307942708332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3342/0.3725/0.3701, macro F1 - Span/Sentence/Combined: 0.3036/0.3239/0.3460


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10405.425618489584, Supervised Loss: 5.0025105476379395, Unsupervised Loss: 20805.848958333332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2963/0.3793/0.3551, macro F1 - Span/Sentence/Combined: 0.2624/0.3300/0.3341


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10395.133219401041, Supervised Loss: 4.788841724395752, Unsupervised Loss: 20785.4775390625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3118/0.3790/0.3836, macro F1 - Span/Sentence/Combined: 0.2724/0.3303/0.3442


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10384.02587890625, Supervised Loss: 4.654747128486633, Unsupervised Loss: 20763.39697265625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3181/0.3768/0.4174, macro F1 - Span/Sentence/Combined: 0.2770/0.3290/0.3778


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10373.810628255209, Supervised Loss: 4.545279900232951, Unsupervised Loss: 20743.076009114582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3298/0.3779/0.4142, macro F1 - Span/Sentence/Combined: 0.2922/0.3301/0.3807
Training combination 6/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 8, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10980.270670572916, Supervised Loss: 8.815333366394043, Unsupervised Loss: 21951.725748697918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3957/0.3537/0.3190, macro F1 - Span/Sentence/Combined: 0.3047/0.2801/0.3049


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10967.957682291666, Supervised Loss: 6.867220759391785, Unsupervised Loss: 21929.04833984375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4101/0.3176/0.3299, macro F1 - Span/Sentence/Combined: 0.3122/0.2788/0.3162


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10956.1005859375, Supervised Loss: 6.361283143361409, Unsupervised Loss: 21905.840169270832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4141/0.3198/0.3128, macro F1 - Span/Sentence/Combined: 0.3354/0.2866/0.3051


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10945.05810546875, Supervised Loss: 5.910351792971293, Unsupervised Loss: 21884.205891927082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3981/0.3104/0.3395, macro F1 - Span/Sentence/Combined: 0.3138/0.2728/0.3264


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10933.777994791666, Supervised Loss: 5.604281187057495, Unsupervised Loss: 21861.951985677082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3844/0.3036/0.3641, macro F1 - Span/Sentence/Combined: 0.3142/0.2718/0.3388


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10922.520100911459, Supervised Loss: 5.306490778923035, Unsupervised Loss: 21839.733561197918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2769/0.3072/0.3743, macro F1 - Span/Sentence/Combined: 0.2451/0.2732/0.3502


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10911.166015625, Supervised Loss: 5.182777166366577, Unsupervised Loss: 21817.1494140625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2597/0.3086/0.3807, macro F1 - Span/Sentence/Combined: 0.2158/0.2759/0.3462


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10899.875244140625, Supervised Loss: 5.044976313908895, Unsupervised Loss: 21794.70556640625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2759/0.3086/0.4046, macro F1 - Span/Sentence/Combined: 0.2505/0.2753/0.3577


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10888.578206380209, Supervised Loss: 4.870901902516683, Unsupervised Loss: 21772.285319010418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2875/0.3003/0.4198, macro F1 - Span/Sentence/Combined: 0.2550/0.2669/0.3803


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10877.590738932291, Supervised Loss: 4.7318166097005205, Unsupervised Loss: 21750.449544270832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3123/0.3054/0.4343, macro F1 - Span/Sentence/Combined: 0.2718/0.2721/0.3845
Training combination 7/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 8, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10283.68115234375, Supervised Loss: 8.682020227114359, Unsupervised Loss: 20558.680013020832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3392/0.3657/0.3239, macro F1 - Span/Sentence/Combined: 0.2222/0.3343/0.3044


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10272.524251302084, Supervised Loss: 7.281842033068339, Unsupervised Loss: 20537.7666015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3577/0.3776/0.3222, macro F1 - Span/Sentence/Combined: 0.2138/0.3409/0.2984


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10261.26220703125, Supervised Loss: 6.619966904322307, Unsupervised Loss: 20515.904296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3577/0.3631/0.3370, macro F1 - Span/Sentence/Combined: 0.2138/0.3249/0.3243


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10250.696126302084, Supervised Loss: 6.141745766003926, Unsupervised Loss: 20495.25048828125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3607/0.3460/0.3464, macro F1 - Span/Sentence/Combined: 0.2345/0.3045/0.3317


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10239.940673828125, Supervised Loss: 5.854116280873616, Unsupervised Loss: 20474.027506510418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3727/0.3540/0.3533, macro F1 - Span/Sentence/Combined: 0.3065/0.3096/0.3395


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10229.525146484375, Supervised Loss: 5.690005381902059, Unsupervised Loss: 20453.360514322918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3642/0.3588/0.3815, macro F1 - Span/Sentence/Combined: 0.3126/0.3167/0.3610


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10217.711507161459, Supervised Loss: 5.514790058135986, Unsupervised Loss: 20429.908203125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3526/0.3473/0.3942, macro F1 - Span/Sentence/Combined: 0.2871/0.3061/0.3703


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10207.613606770834, Supervised Loss: 5.3506219784418745, Unsupervised Loss: 20409.876790364582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3408/0.3522/0.3965, macro F1 - Span/Sentence/Combined: 0.2717/0.3104/0.3611


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10196.924397786459, Supervised Loss: 5.141824126243591, Unsupervised Loss: 20388.70703125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3548/0.3582/0.4142, macro F1 - Span/Sentence/Combined: 0.2857/0.3137/0.3673


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10186.587809244791, Supervised Loss: 5.079385995864868, Unsupervised Loss: 20368.096028645832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3571/0.3571/0.4073, macro F1 - Span/Sentence/Combined: 0.2944/0.3115/0.3590
Training combination 8/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 8, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10661.678548177084, Supervised Loss: 8.744121392567953, Unsupervised Loss: 21314.61279296875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2689/0.3158/0.3059, macro F1 - Span/Sentence/Combined: 0.1409/0.2694/0.2905


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10650.69970703125, Supervised Loss: 7.643310546875, Unsupervised Loss: 21293.75634765625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3272/0.3386/0.2825, macro F1 - Span/Sentence/Combined: 0.1871/0.2818/0.2847


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10639.35595703125, Supervised Loss: 6.981243133544922, Unsupervised Loss: 21271.730794270832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3121/0.3675/0.3137, macro F1 - Span/Sentence/Combined: 0.1964/0.3014/0.3205


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10627.574788411459, Supervised Loss: 6.722132166226705, Unsupervised Loss: 21248.427571614582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2974/0.3780/0.3390, macro F1 - Span/Sentence/Combined: 0.2155/0.3077/0.3350


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10617.595703125, Supervised Loss: 6.502209305763245, Unsupervised Loss: 21228.689453125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3269/0.3731/0.3398, macro F1 - Span/Sentence/Combined: 0.2920/0.3048/0.3297


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10605.669514973959, Supervised Loss: 6.18706750869751, Unsupervised Loss: 21205.15234375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3307/0.3853/0.3343, macro F1 - Span/Sentence/Combined: 0.3075/0.3178/0.3231


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10594.732096354166, Supervised Loss: 6.050868272781372, Unsupervised Loss: 21183.41357421875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3536/0.3769/0.3546, macro F1 - Span/Sentence/Combined: 0.3236/0.3112/0.3402


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10583.742106119791, Supervised Loss: 5.899555683135986, Unsupervised Loss: 21161.584635416668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3585/0.3804/0.3642, macro F1 - Span/Sentence/Combined: 0.3234/0.3138/0.3500


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10572.910725911459, Supervised Loss: 5.73435652256012, Unsupervised Loss: 21140.08740234375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3616/0.3784/0.3615, macro F1 - Span/Sentence/Combined: 0.3292/0.3121/0.3457


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10562.75537109375, Supervised Loss: 5.710389455159505, Unsupervised Loss: 21119.800130208332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3533/0.3724/0.3626, macro F1 - Span/Sentence/Combined: 0.3255/0.3081/0.3522
Training combination 9/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 10, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10482.662679036459, Supervised Loss: 8.592813889185587, Unsupervised Loss: 20956.732259114582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3494/0.3179/0.3300, macro F1 - Span/Sentence/Combined: 0.2269/0.2579/0.3002


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10471.0810546875, Supervised Loss: 6.837433973948161, Unsupervised Loss: 20935.32470703125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2826/0.3333/0.3500, macro F1 - Span/Sentence/Combined: 0.1769/0.2904/0.3207


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10461.225423177084, Supervised Loss: 6.129064321517944, Unsupervised Loss: 20916.321614583332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2857/0.3508/0.3868, macro F1 - Span/Sentence/Combined: 0.1777/0.3113/0.3588


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10449.521647135416, Supervised Loss: 5.768710215886434, Unsupervised Loss: 20893.274576822918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2857/0.3519/0.3651, macro F1 - Span/Sentence/Combined: 0.1763/0.3180/0.3302


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10438.036946614584, Supervised Loss: 5.458353281021118, Unsupervised Loss: 20870.615397135418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3263/0.3415/0.3802, macro F1 - Span/Sentence/Combined: 0.2594/0.3092/0.3368


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10427.958984375, Supervised Loss: 5.199507474899292, Unsupervised Loss: 20850.718424479168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3333/0.3333/0.3820, macro F1 - Span/Sentence/Combined: 0.2936/0.3024/0.3407


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10416.8671875, Supervised Loss: 4.971852461496989, Unsupervised Loss: 20828.762532552082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4080/0.3415/0.4012, macro F1 - Span/Sentence/Combined: 0.3675/0.3110/0.3547


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10407.068033854166, Supervised Loss: 4.8070288101832075, Unsupervised Loss: 20809.3291015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4034/0.3425/0.4189, macro F1 - Span/Sentence/Combined: 0.3465/0.3107/0.3699


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10396.803629557291, Supervised Loss: 4.7101595004399615, Unsupervised Loss: 20788.897135416668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4045/0.3364/0.4493, macro F1 - Span/Sentence/Combined: 0.3454/0.3057/0.3891


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10386.396158854166, Supervised Loss: 4.5806015729904175, Unsupervised Loss: 20768.211751302082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4104/0.3323/0.4716, macro F1 - Span/Sentence/Combined: 0.3457/0.3053/0.3981
Training combination 10/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 10, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10661.548014322916, Supervised Loss: 8.51769240697225, Unsupervised Loss: 21314.578287760418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3848/0.3081/0.3740, macro F1 - Span/Sentence/Combined: 0.2319/0.2634/0.3511


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10649.832845052084, Supervised Loss: 6.975442926088969, Unsupervised Loss: 21292.690266927082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3536/0.3236/0.3929, macro F1 - Span/Sentence/Combined: 0.2072/0.2759/0.3646


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10638.334879557291, Supervised Loss: 6.255524317423503, Unsupervised Loss: 21270.414225260418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3494/0.3298/0.4152, macro F1 - Span/Sentence/Combined: 0.2028/0.2812/0.3711


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10628.563313802084, Supervised Loss: 5.846243659655253, Unsupervised Loss: 21251.2802734375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3420/0.3395/0.4239, macro F1 - Span/Sentence/Combined: 0.2287/0.2881/0.3794


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10616.934814453125, Supervised Loss: 5.595973213513692, Unsupervised Loss: 21228.273600260418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3636/0.3360/0.4074, macro F1 - Span/Sentence/Combined: 0.2958/0.2852/0.3487


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10606.157063802084, Supervised Loss: 5.349491198857625, Unsupervised Loss: 21206.964680989582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3354/0.3439/0.4196, macro F1 - Span/Sentence/Combined: 0.2770/0.2925/0.3611


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10595.854085286459, Supervised Loss: 5.191064119338989, Unsupervised Loss: 21186.517415364582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3448/0.3429/0.4266, macro F1 - Span/Sentence/Combined: 0.2728/0.2912/0.3637


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10584.482096354166, Supervised Loss: 4.964810808499654, Unsupervised Loss: 21163.999348958332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3450/0.3360/0.4298, macro F1 - Span/Sentence/Combined: 0.2714/0.2886/0.3731


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10574.242106119791, Supervised Loss: 4.889966408411662, Unsupervised Loss: 21143.59423828125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3503/0.3386/0.4350, macro F1 - Span/Sentence/Combined: 0.2746/0.2894/0.3735


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10562.73046875, Supervised Loss: 4.746785998344421, Unsupervised Loss: 21120.714029947918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3473/0.3395/0.4505, macro F1 - Span/Sentence/Combined: 0.2678/0.2911/0.3884
Training combination 11/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 10, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10688.896484375, Supervised Loss: 8.475899457931519, Unsupervised Loss: 21369.316731770832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3608/0.3193/0.3646, macro F1 - Span/Sentence/Combined: 0.2339/0.2521/0.3376


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10678.33056640625, Supervised Loss: 7.021985411643982, Unsupervised Loss: 21349.63916015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3740/0.3533/0.3632, macro F1 - Span/Sentence/Combined: 0.2268/0.3122/0.3462


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10666.422281901041, Supervised Loss: 6.47585932413737, Unsupervised Loss: 21326.36865234375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3641/0.3440/0.3820, macro F1 - Span/Sentence/Combined: 0.2282/0.2948/0.3584


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10655.522379557291, Supervised Loss: 6.12101682027181, Unsupervised Loss: 21304.923665364582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3665/0.3536/0.4032, macro F1 - Span/Sentence/Combined: 0.2655/0.3022/0.3684


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10644.155192057291, Supervised Loss: 5.876736640930176, Unsupervised Loss: 21282.433756510418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3325/0.3440/0.4164, macro F1 - Span/Sentence/Combined: 0.2571/0.2927/0.3767


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10633.530110677084, Supervised Loss: 5.644309560457866, Unsupervised Loss: 21261.416015625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3791/0.3567/0.4254, macro F1 - Span/Sentence/Combined: 0.3380/0.3048/0.3828


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10622.7236328125, Supervised Loss: 5.4167799949646, Unsupervised Loss: 21240.030598958332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3438/0.3567/0.4225, macro F1 - Span/Sentence/Combined: 0.3146/0.3046/0.3712


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10611.48876953125, Supervised Loss: 5.286987543106079, Unsupervised Loss: 21217.690592447918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3064/0.3460/0.4401, macro F1 - Span/Sentence/Combined: 0.2633/0.2903/0.3821


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10600.439371744791, Supervised Loss: 5.1128023862838745, Unsupervised Loss: 21195.765787760418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3116/0.3563/0.4496, macro F1 - Span/Sentence/Combined: 0.2563/0.3057/0.3901


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10590.037109375, Supervised Loss: 5.0074706474939985, Unsupervised Loss: 21175.06689453125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3116/0.3557/0.4457, macro F1 - Span/Sentence/Combined: 0.2595/0.3059/0.3764
Training combination 12/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 10, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10853.129720052084, Supervised Loss: 8.587827205657959, Unsupervised Loss: 21697.671549479168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2614/0.3343/0.3452, macro F1 - Span/Sentence/Combined: 0.1379/0.2703/0.3282


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10842.634114583334, Supervised Loss: 7.759606242179871, Unsupervised Loss: 21677.508463541668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2278/0.3446/0.3200, macro F1 - Span/Sentence/Combined: 0.1317/0.2981/0.2881


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10830.299967447916, Supervised Loss: 7.236850261688232, Unsupervised Loss: 21653.362955729168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2222/0.3228/0.3270, macro F1 - Span/Sentence/Combined: 0.1402/0.2758/0.3075


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10818.595540364584, Supervised Loss: 6.867072065671285, Unsupervised Loss: 21630.324055989582
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2326/0.3269/0.3361, macro F1 - Span/Sentence/Combined: 0.1560/0.2777/0.3155


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10808.556396484375, Supervised Loss: 6.509990056355794, Unsupervised Loss: 21610.602864583332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2424/0.3141/0.3660, macro F1 - Span/Sentence/Combined: 0.1809/0.2710/0.3338


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10797.155924479166, Supervised Loss: 6.36370583375295, Unsupervised Loss: 21587.948079427082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2706/0.3218/0.3551, macro F1 - Span/Sentence/Combined: 0.2397/0.2754/0.3241


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10785.569010416666, Supervised Loss: 6.124614755312602, Unsupervised Loss: 21565.013346354168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2690/0.3121/0.3676, macro F1 - Span/Sentence/Combined: 0.2427/0.2700/0.3352


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10775.020670572916, Supervised Loss: 5.955784320831299, Unsupervised Loss: 21544.085611979168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2485/0.3121/0.3657, macro F1 - Span/Sentence/Combined: 0.2216/0.2687/0.3280


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10765.179606119791, Supervised Loss: 5.870326280593872, Unsupervised Loss: 21524.488606770832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2622/0.3175/0.3799, macro F1 - Span/Sentence/Combined: 0.2326/0.2741/0.3348


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10753.415120442709, Supervised Loss: 5.715578158696492, Unsupervised Loss: 21501.11474609375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2831/0.3155/0.4146, macro F1 - Span/Sentence/Combined: 0.2500/0.2750/0.3776
Training combination 13/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 20, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10862.632893880209, Supervised Loss: 8.767576416333517, Unsupervised Loss: 21716.498372395832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2656/0.3653/0.3099, macro F1 - Span/Sentence/Combined: 0.1682/0.2997/0.2974


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10852.928548177084, Supervised Loss: 6.791433771451314, Unsupervised Loss: 21699.065755208332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2656/0.3394/0.3255, macro F1 - Span/Sentence/Combined: 0.1682/0.2938/0.3018


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10844.151448567709, Supervised Loss: 6.2412029504776, Unsupervised Loss: 21682.061848958332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2735/0.3701/0.3413, macro F1 - Span/Sentence/Combined: 0.1850/0.3229/0.3244


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10833.110921223959, Supervised Loss: 5.7721532980601, Unsupervised Loss: 21660.449544270832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3046/0.3680/0.3352, macro F1 - Span/Sentence/Combined: 0.2295/0.3223/0.3265


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10825.307535807291, Supervised Loss: 5.457232117652893, Unsupervised Loss: 21645.15771484375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3646/0.3721/0.3988, macro F1 - Span/Sentence/Combined: 0.3163/0.3300/0.3774


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10815.844645182291, Supervised Loss: 5.204054395357768, Unsupervised Loss: 21626.48486328125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4254/0.3728/0.4373, macro F1 - Span/Sentence/Combined: 0.3530/0.3316/0.3971


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10803.450927734375, Supervised Loss: 5.01534100373586, Unsupervised Loss: 21601.88671875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4298/0.3561/0.4490, macro F1 - Span/Sentence/Combined: 0.3542/0.3092/0.3951


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10793.752766927084, Supervised Loss: 4.84704593817393, Unsupervised Loss: 21582.658528645832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4162/0.3680/0.4611, macro F1 - Span/Sentence/Combined: 0.3350/0.3237/0.4036


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10787.490397135416, Supervised Loss: 4.665523727734883, Unsupervised Loss: 21570.315104166668
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3989/0.3631/0.4324, macro F1 - Span/Sentence/Combined: 0.3248/0.3144/0.3826


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10776.193522135416, Supervised Loss: 4.528242707252502, Unsupervised Loss: 21547.858561197918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3897/0.3582/0.4169, macro F1 - Span/Sentence/Combined: 0.3462/0.3103/0.3608
Training combination 14/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 20, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 11072.079182942709, Supervised Loss: 8.750488917032877, Unsupervised Loss: 22135.407877604168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.5088/0.3382/0.2989, macro F1 - Span/Sentence/Combined: 0.2721/0.2705/0.2736


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 11060.159016927084, Supervised Loss: 7.0830899477005005, Unsupervised Loss: 22113.23486328125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3923/0.3294/0.2744, macro F1 - Span/Sentence/Combined: 0.1278/0.2759/0.2676


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 11049.218831380209, Supervised Loss: 6.455702106157939, Unsupervised Loss: 22091.982096354168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3905/0.3257/0.3000, macro F1 - Span/Sentence/Combined: 0.1278/0.2756/0.2884


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 11041.113850911459, Supervised Loss: 6.041161100069682, Unsupervised Loss: 22076.1865234375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3504/0.3152/0.3268, macro F1 - Span/Sentence/Combined: 0.1792/0.2655/0.3071


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 11029.133138020834, Supervised Loss: 5.634960055351257, Unsupervised Loss: 22052.631184895832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2517/0.3152/0.3041, macro F1 - Span/Sentence/Combined: 0.2171/0.2685/0.2762


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 11021.022379557291, Supervised Loss: 5.463170011838277, Unsupervised Loss: 22036.581705729168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2372/0.3314/0.3425, macro F1 - Span/Sentence/Combined: 0.2117/0.2794/0.3127


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 11013.899658203125, Supervised Loss: 5.227872967720032, Unsupervised Loss: 22022.5712890625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2545/0.3191/0.3571, macro F1 - Span/Sentence/Combined: 0.2227/0.2714/0.3269


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 11002.421223958334, Supervised Loss: 5.091001311937968, Unsupervised Loss: 21999.75146484375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2590/0.3121/0.3450, macro F1 - Span/Sentence/Combined: 0.2254/0.2661/0.3183


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10993.06884765625, Supervised Loss: 4.91452431678772, Unsupervised Loss: 21981.22314453125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2950/0.3209/0.3591, macro F1 - Span/Sentence/Combined: 0.2579/0.2777/0.3318


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10984.048258463541, Supervised Loss: 4.7844158411026, Unsupervised Loss: 21963.312174479168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2939/0.3121/0.3556, macro F1 - Span/Sentence/Combined: 0.2549/0.2690/0.3216
Training combination 15/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 20, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10807.234212239584, Supervised Loss: 8.557300686836243, Unsupervised Loss: 21605.911295572918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3315/0.3616/0.3255, macro F1 - Span/Sentence/Combined: 0.2064/0.3084/0.2813


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10794.1396484375, Supervised Loss: 7.133374849955241, Unsupervised Loss: 21581.14599609375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2927/0.3429/0.3362, macro F1 - Span/Sentence/Combined: 0.1852/0.2924/0.3106


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10786.63330078125, Supervised Loss: 6.581046064694722, Unsupervised Loss: 21566.685546875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2957/0.3641/0.3072, macro F1 - Span/Sentence/Combined: 0.1936/0.3110/0.2969


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10778.104817708334, Supervised Loss: 6.0083597501118975, Unsupervised Loss: 21550.201171875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2949/0.3606/0.3304, macro F1 - Span/Sentence/Combined: 0.2097/0.3073/0.3272


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10767.636881510416, Supervised Loss: 5.83240818977356, Unsupervised Loss: 21529.441080729168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3032/0.3616/0.3642, macro F1 - Span/Sentence/Combined: 0.2333/0.3052/0.3464


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10756.294352213541, Supervised Loss: 5.554507692654927, Unsupervised Loss: 21507.034016927082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3175/0.3596/0.3977, macro F1 - Span/Sentence/Combined: 0.2518/0.3056/0.3757


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10747.184000651041, Supervised Loss: 5.396040439605713, Unsupervised Loss: 21488.972330729168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3260/0.3672/0.4146, macro F1 - Span/Sentence/Combined: 0.2432/0.3124/0.3824


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10738.750569661459, Supervised Loss: 5.323269208272298, Unsupervised Loss: 21472.177734375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3135/0.3657/0.4251, macro F1 - Span/Sentence/Combined: 0.2325/0.3123/0.3898


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10730.197428385416, Supervised Loss: 5.101106246312459, Unsupervised Loss: 21455.293782552082
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3117/0.3712/0.3902, macro F1 - Span/Sentence/Combined: 0.2306/0.3139/0.3563


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10719.26904296875, Supervised Loss: 4.999478340148926, Unsupervised Loss: 21433.538411458332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2876/0.3590/0.3902, macro F1 - Span/Sentence/Combined: 0.2073/0.3027/0.3576
Training combination 16/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 1e-06, 20, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 10772.437744140625, Supervised Loss: 8.799370447794596, Unsupervised Loss: 21536.076334635418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4221/0.3944/0.3029, macro F1 - Span/Sentence/Combined: 0.2826/0.3173/0.2734


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 10762.497884114584, Supervised Loss: 7.6548612515131635, Unsupervised Loss: 21517.340983072918
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4450/0.3875/0.2795, macro F1 - Span/Sentence/Combined: 0.2792/0.3297/0.2713


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 10750.782877604166, Supervised Loss: 7.217889865239461, Unsupervised Loss: 21494.347819010418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4499/0.3871/0.3439, macro F1 - Span/Sentence/Combined: 0.2633/0.3242/0.3212


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 10745.469319661459, Supervised Loss: 6.825592756271362, Unsupervised Loss: 21484.113444010418
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4450/0.3919/0.3245, macro F1 - Span/Sentence/Combined: 0.2633/0.3293/0.3062


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 10733.807779947916, Supervised Loss: 6.469754099845886, Unsupervised Loss: 21461.145833333332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4548/0.3873/0.3324, macro F1 - Span/Sentence/Combined: 0.2842/0.3201/0.3195


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 10725.038167317709, Supervised Loss: 6.359177549680074, Unsupervised Loss: 21443.717122395832
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4066/0.3848/0.3548, macro F1 - Span/Sentence/Combined: 0.2965/0.3154/0.3251


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 10715.6357421875, Supervised Loss: 6.1142875750859575, Unsupervised Loss: 21425.1572265625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3736/0.3884/0.3454, macro F1 - Span/Sentence/Combined: 0.3067/0.3190/0.3180


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 10706.145751953125, Supervised Loss: 5.882074912389119, Unsupervised Loss: 21406.40966796875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3471/0.3862/0.3416, macro F1 - Span/Sentence/Combined: 0.2898/0.3189/0.3174


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 10698.200032552084, Supervised Loss: 5.691256244977315, Unsupervised Loss: 21390.708658854168
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3324/0.3862/0.3757, macro F1 - Span/Sentence/Combined: 0.2897/0.3178/0.3430


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 10687.199625651041, Supervised Loss: 5.73581596215566, Unsupervised Loss: 21368.663411458332
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3175/0.3918/0.3844, macro F1 - Span/Sentence/Combined: 0.2742/0.3207/0.3458
Training combination 17/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 5, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1022939.1979166666, Supervised Loss: 8.646355708440145, Unsupervised Loss: 2045869.7708333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3142/0.3215/0.3192, macro F1 - Span/Sentence/Combined: 0.2129/0.2651/0.2861


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1021821.9322916666, Supervised Loss: 6.834060192108154, Unsupervised Loss: 2043637.0208333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3141/0.3526/0.3333, macro F1 - Span/Sentence/Combined: 0.2322/0.3121/0.3093


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1020706.21875, Supervised Loss: 6.24955689907074, Unsupervised Loss: 2041406.1875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3141/0.3523/0.3631, macro F1 - Span/Sentence/Combined: 0.2322/0.3147/0.3250


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1019595.0520833334, Supervised Loss: 5.8865586916605634, Unsupervised Loss: 2039184.2291666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3055/0.3536/0.3661, macro F1 - Span/Sentence/Combined: 0.2218/0.3168/0.3347


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1018488.5885416666, Supervised Loss: 5.605002681414287, Unsupervised Loss: 2036971.5729166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3293/0.3594/0.3573, macro F1 - Span/Sentence/Combined: 0.2602/0.3215/0.3120


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1017385.3125, Supervised Loss: 5.438571214675903, Unsupervised Loss: 2034765.1666666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3919/0.3663/0.3673, macro F1 - Span/Sentence/Combined: 0.3404/0.3258/0.3213


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1016282.984375, Supervised Loss: 5.240002512931824, Unsupervised Loss: 2032560.71875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4021/0.3605/0.3372, macro F1 - Span/Sentence/Combined: 0.3495/0.3219/0.2973


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1015183.0572916666, Supervised Loss: 5.032978296279907, Unsupervised Loss: 2030361.0833333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3807/0.3460/0.3642, macro F1 - Span/Sentence/Combined: 0.3321/0.3096/0.3225


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1014083.84375, Supervised Loss: 4.870824376742045, Unsupervised Loss: 2028162.8229166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3842/0.3557/0.3927, macro F1 - Span/Sentence/Combined: 0.3436/0.3150/0.3469


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1012985.796875, Supervised Loss: 4.795763770739238, Unsupervised Loss: 2025966.8125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3875/0.3432/0.3757, macro F1 - Span/Sentence/Combined: 0.3522/0.3077/0.3360
Training combination 18/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 5, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1008809.8177083334, Supervised Loss: 8.572818438212076, Unsupervised Loss: 2017611.0520833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2908/0.3145/0.3684, macro F1 - Span/Sentence/Combined: 0.1667/0.2726/0.3388


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1007710.21875, Supervised Loss: 7.078641772270203, Unsupervised Loss: 2015413.375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2908/0.3226/0.3239, macro F1 - Span/Sentence/Combined: 0.1667/0.2920/0.3134


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1006611.5416666666, Supervised Loss: 6.457607626914978, Unsupervised Loss: 2013216.625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2908/0.3254/0.3253, macro F1 - Span/Sentence/Combined: 0.1667/0.2971/0.3119


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1005513.8958333334, Supervised Loss: 6.044237931569417, Unsupervised Loss: 2011021.7604166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2915/0.3353/0.3529, macro F1 - Span/Sentence/Combined: 0.1810/0.3083/0.3189


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1004416.5885416666, Supervised Loss: 5.8351731300354, Unsupervised Loss: 2008827.34375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2874/0.3294/0.3673, macro F1 - Span/Sentence/Combined: 0.2100/0.3015/0.3330


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1003319.9895833334, Supervised Loss: 5.631080508232117, Unsupervised Loss: 2006634.34375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3051/0.3372/0.3578, macro F1 - Span/Sentence/Combined: 0.2553/0.3096/0.3245


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1002226.9791666666, Supervised Loss: 5.448183139165242, Unsupervised Loss: 2004448.5104166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3500/0.3265/0.3689, macro F1 - Span/Sentence/Combined: 0.3122/0.2969/0.3274


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1001139.09375, Supervised Loss: 5.2573643525441485, Unsupervised Loss: 2002272.9270833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3677/0.3294/0.3642, macro F1 - Span/Sentence/Combined: 0.3288/0.3027/0.3206


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1000052.5, Supervised Loss: 5.168694893519084, Unsupervised Loss: 2000099.8333333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3686/0.3284/0.3776, macro F1 - Span/Sentence/Combined: 0.3210/0.2966/0.3317


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 998967.71875, Supervised Loss: 5.031277656555176, Unsupervised Loss: 1997930.4166666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3602/0.3323/0.3988, macro F1 - Span/Sentence/Combined: 0.3093/0.3026/0.3532
Training combination 19/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 5, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1035247.6666666666, Supervised Loss: 8.7121022939682, Unsupervised Loss: 2070486.6145833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3648/0.2786/0.3612, macro F1 - Span/Sentence/Combined: 0.1786/0.2194/0.3502


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1034124.75, Supervised Loss: 7.218478957811992, Unsupervised Loss: 2068242.28125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2828/0.2722/0.3448, macro F1 - Span/Sentence/Combined: 0.1389/0.2364/0.3212


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1033003.9322916666, Supervised Loss: 6.622124075889587, Unsupervised Loss: 2066001.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2928/0.2628/0.3896, macro F1 - Span/Sentence/Combined: 0.1743/0.2258/0.3694


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1031883.171875, Supervised Loss: 6.206777930259705, Unsupervised Loss: 2063760.1354166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2938/0.2795/0.3948, macro F1 - Span/Sentence/Combined: 0.1714/0.2458/0.3657


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1030764.9583333334, Supervised Loss: 5.897368709246318, Unsupervised Loss: 2061524.0208333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3254/0.2761/0.3763, macro F1 - Span/Sentence/Combined: 0.2217/0.2421/0.3403


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1029649.4270833334, Supervised Loss: 5.769960721333821, Unsupervised Loss: 2059293.0729166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3425/0.2716/0.3892, macro F1 - Span/Sentence/Combined: 0.2894/0.2386/0.3535


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1028535.2708333334, Supervised Loss: 5.618027528127034, Unsupervised Loss: 2057064.9166666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3155/0.2595/0.3871, macro F1 - Span/Sentence/Combined: 0.2642/0.2277/0.3465


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1027421.453125, Supervised Loss: 5.42593518892924, Unsupervised Loss: 2054837.4791666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3218/0.2795/0.3812, macro F1 - Span/Sentence/Combined: 0.2684/0.2456/0.3274


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1026310.7604166666, Supervised Loss: 5.300283034642537, Unsupervised Loss: 2052616.21875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3121/0.2679/0.3866, macro F1 - Span/Sentence/Combined: 0.2699/0.2357/0.3364


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1025200.15625, Supervised Loss: 5.198058366775513, Unsupervised Loss: 2050395.1145833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3503/0.2741/0.4088, macro F1 - Span/Sentence/Combined: 0.2984/0.2413/0.3650
Training combination 20/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 5, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1027629.6302083334, Supervised Loss: 8.692203481992086, Unsupervised Loss: 2055250.5625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2105/0.3284/0.2899, macro F1 - Span/Sentence/Combined: 0.0721/0.2606/0.2786


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1026515.6875, Supervised Loss: 7.3586892286936445, Unsupervised Loss: 2053024.0104166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2241/0.3000/0.3187, macro F1 - Span/Sentence/Combined: 0.0914/0.2395/0.3158


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1025403.65625, Supervised Loss: 7.135681827863057, Unsupervised Loss: 2050800.1666666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1429/0.2830/0.3324, macro F1 - Span/Sentence/Combined: 0.0755/0.2292/0.3253


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1024291.734375, Supervised Loss: 6.703427155812581, Unsupervised Loss: 2048576.75
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1780/0.2975/0.3652, macro F1 - Span/Sentence/Combined: 0.0919/0.2387/0.3521


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1023180.171875, Supervised Loss: 6.554903030395508, Unsupervised Loss: 2046353.7916666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2000/0.3165/0.3621, macro F1 - Span/Sentence/Combined: 0.1163/0.2548/0.3526


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1022069.7395833334, Supervised Loss: 6.40088677406311, Unsupervised Loss: 2044133.0833333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2583/0.3135/0.3590, macro F1 - Span/Sentence/Combined: 0.1832/0.2492/0.3461


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1020960.6197916666, Supervised Loss: 6.196600000063579, Unsupervised Loss: 2041915.0416666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3127/0.3028/0.3684, macro F1 - Span/Sentence/Combined: 0.2616/0.2492/0.3521


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1019852.0625, Supervised Loss: 6.096534649531047, Unsupervised Loss: 2039698.03125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2896/0.3178/0.3557, macro F1 - Span/Sentence/Combined: 0.2817/0.2503/0.3392


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1018744.0104166666, Supervised Loss: 5.934183120727539, Unsupervised Loss: 2037482.09375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2706/0.3171/0.3686, macro F1 - Span/Sentence/Combined: 0.2706/0.2517/0.3512


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1017636.90625, Supervised Loss: 5.8237336079279585, Unsupervised Loss: 2035267.9895833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2640/0.3178/0.3540, macro F1 - Span/Sentence/Combined: 0.2727/0.2514/0.3365
Training combination 21/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 8, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1075454.7083333333, Supervised Loss: 8.655739585558573, Unsupervised Loss: 2150900.75
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3984/0.3818/0.3874, macro F1 - Span/Sentence/Combined: 0.3256/0.3260/0.3714


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1074303.84375, Supervised Loss: 6.870443383852641, Unsupervised Loss: 2148600.8333333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4215/0.3647/0.4041, macro F1 - Span/Sentence/Combined: 0.3213/0.3119/0.3870


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1073156.1145833333, Supervised Loss: 6.214135368665059, Unsupervised Loss: 2146306.0208333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3801/0.3491/0.3805, macro F1 - Span/Sentence/Combined: 0.2867/0.2965/0.3548


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1072010.0416666667, Supervised Loss: 5.95727260907491, Unsupervised Loss: 2144014.1666666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3865/0.3363/0.3861, macro F1 - Span/Sentence/Combined: 0.2964/0.2877/0.3591


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1070868.8541666667, Supervised Loss: 5.599584420522054, Unsupervised Loss: 2141732.0833333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3823/0.3323/0.4069, macro F1 - Span/Sentence/Combined: 0.2895/0.2809/0.3694


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1069728.6770833333, Supervised Loss: 5.36693263053894, Unsupervised Loss: 2139452.0
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3595/0.3293/0.4162, macro F1 - Span/Sentence/Combined: 0.2903/0.2803/0.3754


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1068590.53125, Supervised Loss: 5.187165816624959, Unsupervised Loss: 2137175.875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3513/0.3304/0.4235, macro F1 - Span/Sentence/Combined: 0.2871/0.2795/0.3752


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1067453.09375, Supervised Loss: 5.059086124102275, Unsupervised Loss: 2134901.1041666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3253/0.3303/0.4204, macro F1 - Span/Sentence/Combined: 0.2724/0.2795/0.3747


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1066317.8541666667, Supervised Loss: 4.882219115893046, Unsupervised Loss: 2132630.8541666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2943/0.3294/0.4464, macro F1 - Span/Sentence/Combined: 0.2487/0.2783/0.3924


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1065184.5104166667, Supervised Loss: 4.810760299364726, Unsupervised Loss: 2130364.2291666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3235/0.3333/0.4548, macro F1 - Span/Sentence/Combined: 0.2696/0.2835/0.3991
Training combination 22/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 8, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1052611.5729166667, Supervised Loss: 8.657408237457275, Unsupervised Loss: 2105214.4791666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3310/0.3807/0.3125, macro F1 - Span/Sentence/Combined: 0.2538/0.3160/0.2742


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1051468.9895833333, Supervised Loss: 7.008389155069987, Unsupervised Loss: 2102930.9583333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3311/0.3673/0.3022, macro F1 - Span/Sentence/Combined: 0.2462/0.3278/0.2927


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1050329.4479166667, Supervised Loss: 6.389734546343486, Unsupervised Loss: 2100652.5208333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3349/0.3519/0.3497, macro F1 - Span/Sentence/Combined: 0.2478/0.3152/0.3294


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1049192.8541666667, Supervised Loss: 6.061456481615703, Unsupervised Loss: 2098379.625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3261/0.3519/0.3371, macro F1 - Span/Sentence/Combined: 0.2325/0.3151/0.3124


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1048056.8229166666, Supervised Loss: 5.844711820284526, Unsupervised Loss: 2096107.78125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3342/0.3626/0.3506, macro F1 - Span/Sentence/Combined: 0.2473/0.3192/0.3281


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1046922.6510416666, Supervised Loss: 5.652663230895996, Unsupervised Loss: 2093839.6458333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3395/0.3573/0.3476, macro F1 - Span/Sentence/Combined: 0.2687/0.3169/0.3059


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1045788.2395833334, Supervised Loss: 5.441960692405701, Unsupervised Loss: 2091571.03125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3610/0.3547/0.3663, macro F1 - Span/Sentence/Combined: 0.3069/0.3195/0.3345


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1044654.8541666666, Supervised Loss: 5.298833290735881, Unsupervised Loss: 2089304.40625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2988/0.3353/0.3804, macro F1 - Span/Sentence/Combined: 0.2738/0.3029/0.3529


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1043524.6510416666, Supervised Loss: 5.126968622207642, Unsupervised Loss: 2087044.1770833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3063/0.3468/0.3944, macro F1 - Span/Sentence/Combined: 0.2808/0.3106/0.3562


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1042396.203125, Supervised Loss: 5.077034989992778, Unsupervised Loss: 2084787.3333333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2994/0.3440/0.3826, macro F1 - Span/Sentence/Combined: 0.2748/0.3084/0.3516
Training combination 23/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 8, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1017622.4010416666, Supervised Loss: 9.054109414418539, Unsupervised Loss: 2035235.75
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3974/0.3272/0.3594, macro F1 - Span/Sentence/Combined: 0.2070/0.2893/0.3545


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1016510.890625, Supervised Loss: 7.268301645914714, Unsupervised Loss: 2033014.5208333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4290/0.3190/0.3679, macro F1 - Span/Sentence/Combined: 0.2505/0.2834/0.3449


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1015399.8020833334, Supervised Loss: 6.704127152760823, Unsupervised Loss: 2030792.9166666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4278/0.3333/0.3568, macro F1 - Span/Sentence/Combined: 0.2503/0.2980/0.3403


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1014290.1875, Supervised Loss: 6.344834526379903, Unsupervised Loss: 2028574.0208333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4270/0.3234/0.3747, macro F1 - Span/Sentence/Combined: 0.2525/0.2890/0.3571


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1013184.3697916666, Supervised Loss: 6.101673404375712, Unsupervised Loss: 2026362.625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4360/0.3373/0.3740, macro F1 - Span/Sentence/Combined: 0.3163/0.2985/0.3485


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1012081.1197916666, Supervised Loss: 5.864035884539287, Unsupervised Loss: 2024156.375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3920/0.3393/0.3733, macro F1 - Span/Sentence/Combined: 0.3246/0.2959/0.3459


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1010979.9375, Supervised Loss: 5.653890053431193, Unsupervised Loss: 2021954.2291666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3363/0.3273/0.3458, macro F1 - Span/Sentence/Combined: 0.2865/0.2853/0.3142


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1009879.6770833334, Supervised Loss: 5.569001793861389, Unsupervised Loss: 2019753.78125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2972/0.3373/0.3652, macro F1 - Span/Sentence/Combined: 0.2699/0.2995/0.3241


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1008779.734375, Supervised Loss: 5.434290766716003, Unsupervised Loss: 2017554.03125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2961/0.3343/0.3588, macro F1 - Span/Sentence/Combined: 0.2618/0.2971/0.3260


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1007682.4791666666, Supervised Loss: 5.332916935284932, Unsupervised Loss: 2015359.6354166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3032/0.3263/0.3594, macro F1 - Span/Sentence/Combined: 0.2667/0.2923/0.3236
Training combination 24/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 8, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1052317.0833333333, Supervised Loss: 8.93797500928243, Unsupervised Loss: 2104625.2083333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1723/0.3450/0.3362, macro F1 - Span/Sentence/Combined: 0.0882/0.2544/0.2955


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1051180.625, Supervised Loss: 7.688993612925212, Unsupervised Loss: 2102353.5416666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2492/0.3382/0.2747, macro F1 - Span/Sentence/Combined: 0.1308/0.2772/0.2492


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1050049.6979166667, Supervised Loss: 7.095403750737508, Unsupervised Loss: 2100092.2916666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2500/0.3343/0.3112, macro F1 - Span/Sentence/Combined: 0.1312/0.2824/0.2838


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1048921.5520833333, Supervised Loss: 6.872310121854146, Unsupervised Loss: 2097836.2083333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2260/0.3343/0.2982, macro F1 - Span/Sentence/Combined: 0.1225/0.2834/0.2803


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1047794.0833333334, Supervised Loss: 6.546778281529744, Unsupervised Loss: 2095581.625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2143/0.3450/0.3401, macro F1 - Span/Sentence/Combined: 0.1254/0.2958/0.3114


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1046667.171875, Supervised Loss: 6.387494802474976, Unsupervised Loss: 2093327.9583333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1846/0.3314/0.3743, macro F1 - Span/Sentence/Combined: 0.1246/0.2813/0.3407


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1045541.640625, Supervised Loss: 6.215101877848308, Unsupervised Loss: 2091077.0625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2279/0.3362/0.3704, macro F1 - Span/Sentence/Combined: 0.2044/0.2820/0.3313


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1044416.4583333334, Supervised Loss: 5.9350398778915405, Unsupervised Loss: 2088826.9583333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2517/0.3372/0.3837, macro F1 - Span/Sentence/Combined: 0.2266/0.2888/0.3440


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1043291.5989583334, Supervised Loss: 5.990215500195821, Unsupervised Loss: 2086577.1979166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2812/0.3392/0.3732, macro F1 - Span/Sentence/Combined: 0.2508/0.2865/0.3358


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1042168.5677083334, Supervised Loss: 5.821558554967244, Unsupervised Loss: 2084331.3229166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2938/0.3372/0.3882, macro F1 - Span/Sentence/Combined: 0.2604/0.2846/0.3513
Training combination 25/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 10, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1059161.71875, Supervised Loss: 8.810429533322653, Unsupervised Loss: 2118314.625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2198/0.3562/0.3509, macro F1 - Span/Sentence/Combined: 0.1060/0.3158/0.3276


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1058008.1770833333, Supervised Loss: 7.031955242156982, Unsupervised Loss: 2116009.2916666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2374/0.3916/0.3782, macro F1 - Span/Sentence/Combined: 0.1439/0.3385/0.3482


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1056855.5208333333, Supervised Loss: 6.332887848218282, Unsupervised Loss: 2113704.7083333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2374/0.3743/0.3753, macro F1 - Span/Sentence/Combined: 0.1439/0.3092/0.3432


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1055704.7395833333, Supervised Loss: 5.820229887962341, Unsupervised Loss: 2111403.6458333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2335/0.3818/0.3514, macro F1 - Span/Sentence/Combined: 0.1427/0.3135/0.3232


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1054554.8645833333, Supervised Loss: 5.653546094894409, Unsupervised Loss: 2109104.1041666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.1966/0.3793/0.3516, macro F1 - Span/Sentence/Combined: 0.1235/0.3087/0.3176


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1053405.3541666667, Supervised Loss: 5.402155876159668, Unsupervised Loss: 2106805.3125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2910/0.3647/0.3567, macro F1 - Span/Sentence/Combined: 0.2452/0.2972/0.3199


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1052256.8645833333, Supervised Loss: 5.20716651280721, Unsupervised Loss: 2104508.5208333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3478/0.3714/0.3977, macro F1 - Span/Sentence/Combined: 0.2851/0.3016/0.3545


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1051109.65625, Supervised Loss: 5.058372616767883, Unsupervised Loss: 2102214.2708333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3715/0.3746/0.4071, macro F1 - Span/Sentence/Combined: 0.2923/0.2989/0.3606


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1049962.0520833333, Supervised Loss: 4.939073920249939, Unsupervised Loss: 2099919.1458333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3827/0.3710/0.4023, macro F1 - Span/Sentence/Combined: 0.2897/0.3051/0.3613


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1048816.3697916667, Supervised Loss: 4.784777760505676, Unsupervised Loss: 2097627.9375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3812/0.3668/0.4195, macro F1 - Span/Sentence/Combined: 0.3024/0.2969/0.3669
Training combination 26/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 10, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1040862.3802083334, Supervised Loss: 8.827716708183289, Unsupervised Loss: 2081715.9270833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2849/0.3679/0.3149, macro F1 - Span/Sentence/Combined: 0.1618/0.2830/0.3179


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1039737.4479166666, Supervised Loss: 7.102806965510051, Unsupervised Loss: 2079467.78125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3082/0.3366/0.3351, macro F1 - Span/Sentence/Combined: 0.1575/0.2576/0.3227


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1038616.4895833334, Supervised Loss: 6.519251545270284, Unsupervised Loss: 2077226.4479166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3072/0.3438/0.3307, macro F1 - Span/Sentence/Combined: 0.1575/0.2797/0.3261


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1037498.0052083334, Supervised Loss: 6.109185655911763, Unsupervised Loss: 2074989.90625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2994/0.3457/0.3802, macro F1 - Span/Sentence/Combined: 0.1575/0.2817/0.3536


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1036380.9895833334, Supervised Loss: 5.883611003557841, Unsupervised Loss: 2072756.1041666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2857/0.3467/0.3695, macro F1 - Span/Sentence/Combined: 0.1648/0.2873/0.3394


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1035265.6614583334, Supervised Loss: 5.549590508143107, Unsupervised Loss: 2070525.7708333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3323/0.3375/0.3672, macro F1 - Span/Sentence/Combined: 0.2617/0.2828/0.3475


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1034152.71875, Supervised Loss: 5.5172576904296875, Unsupervised Loss: 2068299.90625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3512/0.3465/0.3683, macro F1 - Span/Sentence/Combined: 0.3256/0.2870/0.3391


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1033039.1354166666, Supervised Loss: 5.29772945245107, Unsupervised Loss: 2066072.9895833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3604/0.3446/0.3594, macro F1 - Span/Sentence/Combined: 0.3365/0.2853/0.3268


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1031927.9947916666, Supervised Loss: 5.050878365834554, Unsupervised Loss: 2063850.9479166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3602/0.3364/0.4195, macro F1 - Span/Sentence/Combined: 0.3225/0.2806/0.3619


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1030818.9791666666, Supervised Loss: 4.9852661689122515, Unsupervised Loss: 2061632.96875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3591/0.3476/0.4201, macro F1 - Span/Sentence/Combined: 0.3206/0.2902/0.3725
Training combination 27/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 10, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1037883.0052083334, Supervised Loss: 8.60640835762024, Unsupervised Loss: 2075757.3854166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3514/0.3834/0.3300, macro F1 - Span/Sentence/Combined: 0.2234/0.2859/0.3045


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1036758.203125, Supervised Loss: 7.156996965408325, Unsupervised Loss: 2073509.2395833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4157/0.3653/0.3385, macro F1 - Span/Sentence/Combined: 0.2908/0.3012/0.3113


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1035635.2083333334, Supervised Loss: 6.612960974375407, Unsupervised Loss: 2071263.8020833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4157/0.3529/0.3433, macro F1 - Span/Sentence/Combined: 0.2908/0.2916/0.3131


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1034514.2916666666, Supervised Loss: 6.32066547870636, Unsupervised Loss: 2069022.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4235/0.3591/0.3681, macro F1 - Span/Sentence/Combined: 0.2944/0.2965/0.3412


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1033392.7760416666, Supervised Loss: 6.0085997978846235, Unsupervised Loss: 2066779.5416666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4269/0.3478/0.3723, macro F1 - Span/Sentence/Combined: 0.3003/0.2825/0.3414


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1032271.9791666666, Supervised Loss: 5.713930765787761, Unsupervised Loss: 2064538.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4119/0.3344/0.4151, macro F1 - Span/Sentence/Combined: 0.3128/0.2739/0.3743


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1031152.8854166666, Supervised Loss: 5.588889757792155, Unsupervised Loss: 2062300.1875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4000/0.3489/0.4191, macro F1 - Span/Sentence/Combined: 0.3255/0.2856/0.3820


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1030034.9010416666, Supervised Loss: 5.4408061901728315, Unsupervised Loss: 2060064.3854166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3953/0.3508/0.3956, macro F1 - Span/Sentence/Combined: 0.3341/0.2860/0.3538


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1028919.015625, Supervised Loss: 5.317253073056539, Unsupervised Loss: 2057832.6979166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3821/0.3529/0.4167, macro F1 - Span/Sentence/Combined: 0.3311/0.2878/0.3777


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1027805.4322916666, Supervised Loss: 5.303211212158203, Unsupervised Loss: 2055605.5416666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3939/0.3489/0.4066, macro F1 - Span/Sentence/Combined: 0.3391/0.2849/0.3756
Training combination 28/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 10, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1032878.375, Supervised Loss: 8.893491824467977, Unsupervised Loss: 2065747.875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2136/0.2583/0.3684, macro F1 - Span/Sentence/Combined: 0.1314/0.2218/0.3405


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1031763.0260416666, Supervised Loss: 7.710412263870239, Unsupervised Loss: 2063518.34375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3292/0.2644/0.3529, macro F1 - Span/Sentence/Combined: 0.2225/0.2404/0.3358


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1030651.8177083334, Supervised Loss: 7.361545443534851, Unsupervised Loss: 2061296.28125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3292/0.2922/0.3785, macro F1 - Span/Sentence/Combined: 0.2225/0.2622/0.3581


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1029545.5104166666, Supervised Loss: 6.903848926226298, Unsupervised Loss: 2059084.125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3249/0.2913/0.4031, macro F1 - Span/Sentence/Combined: 0.2170/0.2561/0.3810


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1028442.9010416666, Supervised Loss: 6.686878522237142, Unsupervised Loss: 2056879.1145833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3209/0.2949/0.4149, macro F1 - Span/Sentence/Combined: 0.2136/0.2598/0.3815


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1027342.8645833334, Supervised Loss: 6.484445095062256, Unsupervised Loss: 2054679.2604166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2761/0.3028/0.4062, macro F1 - Span/Sentence/Combined: 0.2193/0.2654/0.3774


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1026243.78125, Supervised Loss: 6.252323945363362, Unsupervised Loss: 2052481.3020833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2824/0.2930/0.4011, macro F1 - Span/Sentence/Combined: 0.2542/0.2528/0.3633


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1025150.9479166666, Supervised Loss: 6.041106383005778, Unsupervised Loss: 2050295.8645833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3006/0.2885/0.4169, macro F1 - Span/Sentence/Combined: 0.2559/0.2498/0.3818


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1024063.5208333334, Supervised Loss: 5.999568343162537, Unsupervised Loss: 2048121.0625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2765/0.2939/0.4171, macro F1 - Span/Sentence/Combined: 0.2298/0.2529/0.3825


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1022980.9635416666, Supervised Loss: 5.786456147829692, Unsupervised Loss: 2045956.1458333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2838/0.2921/0.4213, macro F1 - Span/Sentence/Combined: 0.2264/0.2521/0.3858
Training combination 29/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 20, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1011525.5729166666, Supervised Loss: 8.894418875376383, Unsupervised Loss: 2023042.2395833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3680/0.3626/0.2888, macro F1 - Span/Sentence/Combined: 0.2038/0.2980/0.2530


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1010411.7239583334, Supervised Loss: 6.92743456363678, Unsupervised Loss: 2020816.5416666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3944/0.3567/0.3246, macro F1 - Span/Sentence/Combined: 0.1802/0.2875/0.3167


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1009301.2708333334, Supervised Loss: 6.303691466649373, Unsupervised Loss: 2018596.2395833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4053/0.3494/0.3333, macro F1 - Span/Sentence/Combined: 0.2072/0.2796/0.3143


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1008188.4583333334, Supervised Loss: 5.849680145581563, Unsupervised Loss: 2016371.0625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4203/0.3481/0.3081, macro F1 - Span/Sentence/Combined: 0.2245/0.2812/0.2932


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1007083.7083333334, Supervised Loss: 5.697693228721619, Unsupervised Loss: 2014161.71875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4606/0.3578/0.3478, macro F1 - Span/Sentence/Combined: 0.3172/0.2865/0.3181


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1005981.7083333334, Supervised Loss: 5.4366774161656695, Unsupervised Loss: 2011957.96875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4665/0.3501/0.3878, macro F1 - Span/Sentence/Combined: 0.3901/0.2810/0.3605


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1004882.5052083334, Supervised Loss: 5.276327768961589, Unsupervised Loss: 2009759.7395833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4350/0.3567/0.3851, macro F1 - Span/Sentence/Combined: 0.3846/0.2874/0.3555


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1003789.0520833334, Supervised Loss: 5.10604190826416, Unsupervised Loss: 2007573.0
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4108/0.3647/0.3966, macro F1 - Span/Sentence/Combined: 0.3534/0.2904/0.3582


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1002696.046875, Supervised Loss: 4.97578227519989, Unsupervised Loss: 2005387.1145833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4203/0.3588/0.3895, macro F1 - Span/Sentence/Combined: 0.3477/0.2895/0.3552


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1001605.3385416666, Supervised Loss: 4.815107663472493, Unsupervised Loss: 2003205.8541666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4062/0.3588/0.3798, macro F1 - Span/Sentence/Combined: 0.3405/0.2883/0.3416
Training combination 30/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 20, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1026183.453125, Supervised Loss: 8.449546376864115, Unsupervised Loss: 2052358.4583333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3370/0.3333/0.3805, macro F1 - Span/Sentence/Combined: 0.1466/0.2717/0.3648


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1025065.2864583334, Supervised Loss: 7.026081562042236, Unsupervised Loss: 2050123.5729166667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3023/0.3280/0.3979, macro F1 - Span/Sentence/Combined: 0.0833/0.2778/0.3741


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1023948.671875, Supervised Loss: 6.475104967753093, Unsupervised Loss: 2047890.84375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3295/0.3291/0.3874, macro F1 - Span/Sentence/Combined: 0.0891/0.2807/0.3651


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1022831.9791666666, Supervised Loss: 6.138879776000977, Unsupervised Loss: 2045657.8125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2824/0.3333/0.4000, macro F1 - Span/Sentence/Combined: 0.1098/0.2917/0.3765


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1021722.2291666666, Supervised Loss: 5.815087596575419, Unsupervised Loss: 2043438.6458333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2769/0.3396/0.3900, macro F1 - Span/Sentence/Combined: 0.1787/0.2951/0.3611


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1020616.0104166666, Supervised Loss: 5.567280848821004, Unsupervised Loss: 2041226.4583333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3142/0.3375/0.4047, macro F1 - Span/Sentence/Combined: 0.2349/0.2944/0.3615


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1019512.90625, Supervised Loss: 5.403448422749837, Unsupervised Loss: 2039020.40625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2997/0.3427/0.4118, macro F1 - Span/Sentence/Combined: 0.2504/0.2974/0.3665


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1018414.4947916666, Supervised Loss: 5.276472290356954, Unsupervised Loss: 2036823.7083333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3344/0.3396/0.3977, macro F1 - Span/Sentence/Combined: 0.2902/0.2938/0.3573


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1017314.4739583334, Supervised Loss: 5.109228849411011, Unsupervised Loss: 2034623.8333333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3486/0.3333/0.4069, macro F1 - Span/Sentence/Combined: 0.3063/0.2898/0.3727


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1016213.4270833334, Supervised Loss: 5.009296735127767, Unsupervised Loss: 2032421.8541666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3515/0.3323/0.4290, macro F1 - Span/Sentence/Combined: 0.3083/0.2898/0.3851
Training combination 31/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 20, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1032078.8802083334, Supervised Loss: 8.917195796966553, Unsupervised Loss: 2064148.8541666667
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2820/0.2899/0.2684, macro F1 - Span/Sentence/Combined: 0.1424/0.2676/0.2679


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1030966.6510416666, Supervised Loss: 7.405276894569397, Unsupervised Loss: 2061925.8958333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3264/0.2948/0.2961, macro F1 - Span/Sentence/Combined: 0.1814/0.2768/0.3021


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1029850.3020833334, Supervised Loss: 6.705034891764323, Unsupervised Loss: 2059693.90625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3264/0.3140/0.3226, macro F1 - Span/Sentence/Combined: 0.1814/0.2850/0.3281


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1028739.8541666666, Supervised Loss: 6.274211883544922, Unsupervised Loss: 2057473.4270833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3333/0.3205/0.3333, macro F1 - Span/Sentence/Combined: 0.1993/0.2871/0.3346


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1027628.7916666666, Supervised Loss: 6.0585817495981855, Unsupervised Loss: 2055251.53125
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3549/0.3149/0.3642, macro F1 - Span/Sentence/Combined: 0.2606/0.2825/0.3534


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1026517.8958333334, Supervised Loss: 5.873836835225423, Unsupervised Loss: 2053029.90625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3458/0.3090/0.3519, macro F1 - Span/Sentence/Combined: 0.3163/0.2758/0.3403


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1025410.953125, Supervised Loss: 5.649048368136088, Unsupervised Loss: 2050816.2395833333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3392/0.3059/0.3769, macro F1 - Span/Sentence/Combined: 0.3122/0.2729/0.3493


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1024304.109375, Supervised Loss: 5.510397434234619, Unsupervised Loss: 2048602.71875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3699/0.3149/0.3916, macro F1 - Span/Sentence/Combined: 0.3358/0.2795/0.3629


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1023202.6354166666, Supervised Loss: 5.367011706034343, Unsupervised Loss: 2046399.90625
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3771/0.3149/0.3927, macro F1 - Span/Sentence/Combined: 0.3488/0.2806/0.3611


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1022095.75, Supervised Loss: 5.281500021616618, Unsupervised Loss: 2044186.2083333333
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3837/0.3099/0.3939, macro F1 - Span/Sentence/Combined: 0.3534/0.2785/0.3595
Training combination 32/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0001, 20, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 1065420.2604166667, Supervised Loss: 8.761080702145895, Unsupervised Loss: 2130831.7708333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3230/0.3373/0.3626, macro F1 - Span/Sentence/Combined: 0.1877/0.2684/0.3453


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 1064296.25, Supervised Loss: 7.553808927536011, Unsupervised Loss: 2128584.9583333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2564/0.3471/0.3459, macro F1 - Span/Sentence/Combined: 0.1232/0.2900/0.3317


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 1063169.5729166667, Supervised Loss: 7.058484315872192, Unsupervised Loss: 2126332.0833333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2564/0.3526/0.3416, macro F1 - Span/Sentence/Combined: 0.1232/0.3038/0.3234


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 1062049.8020833333, Supervised Loss: 6.996112942695618, Unsupervised Loss: 2124092.6041666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2976/0.3509/0.3520, macro F1 - Span/Sentence/Combined: 0.1668/0.3021/0.3167


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 1060925.1041666667, Supervised Loss: 6.484516501426697, Unsupervised Loss: 2121843.7083333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3622/0.3509/0.3722, macro F1 - Span/Sentence/Combined: 0.2606/0.3026/0.3273


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 1059805.28125, Supervised Loss: 6.484723409016927, Unsupervised Loss: 2119604.0833333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3925/0.3567/0.3726, macro F1 - Span/Sentence/Combined: 0.3493/0.3066/0.3397


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 1058687.3541666667, Supervised Loss: 6.188777009646098, Unsupervised Loss: 2117368.5208333335
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3883/0.3460/0.3842, macro F1 - Span/Sentence/Combined: 0.3352/0.2974/0.3429


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 1057568.9479166667, Supervised Loss: 6.0745006402333575, Unsupervised Loss: 2115131.8541666665
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3900/0.3481/0.3831, macro F1 - Span/Sentence/Combined: 0.3403/0.2974/0.3464


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 1056454.125, Supervised Loss: 5.843164443969727, Unsupervised Loss: 2112902.375
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3827/0.3430/0.3782, macro F1 - Span/Sentence/Combined: 0.3258/0.2968/0.3395


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 1055341.875, Supervised Loss: 5.819536050160726, Unsupervised Loss: 2110677.875
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3886/0.3402/0.3944, macro F1 - Span/Sentence/Combined: 0.3299/0.2933/0.3491
Training combination 33/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0005, 5, 0.1)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 5402971.5, Supervised Loss: 8.543102542559305, Unsupervised Loss: 10805934.416666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3741/0.3771/0.3978, macro F1 - Span/Sentence/Combined: 0.2668/0.3119/0.3834


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 5397180.833333333, Supervised Loss: 7.335523049036662, Unsupervised Loss: 10794354.333333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3741/0.3866/0.3945, macro F1 - Span/Sentence/Combined: 0.2668/0.3323/0.3809


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 5391399.333333333, Supervised Loss: 6.714542786280314, Unsupervised Loss: 10782792.083333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3741/0.3908/0.4031, macro F1 - Span/Sentence/Combined: 0.2668/0.3432/0.3835


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 5385622.666666667, Supervised Loss: 6.207761168479919, Unsupervised Loss: 10771239.166666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3776/0.3862/0.4225, macro F1 - Span/Sentence/Combined: 0.2708/0.3360/0.3940


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 5379860.0, Supervised Loss: 5.992459972699483, Unsupervised Loss: 10759714.0
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3676/0.3908/0.4199, macro F1 - Span/Sentence/Combined: 0.2699/0.3407/0.3830


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 5374112.833333333, Supervised Loss: 5.770439147949219, Unsupervised Loss: 10748219.833333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3784/0.3919/0.4068, macro F1 - Span/Sentence/Combined: 0.3138/0.3438/0.3674


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 5368372.333333333, Supervised Loss: 5.678651809692383, Unsupervised Loss: 10736738.833333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3667/0.4035/0.4124, macro F1 - Span/Sentence/Combined: 0.3289/0.3524/0.3788


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 5362636.083333333, Supervised Loss: 5.492526690165202, Unsupervised Loss: 10725266.583333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3290/0.4000/0.3977, macro F1 - Span/Sentence/Combined: 0.2973/0.3518/0.3624


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 5356906.041666667, Supervised Loss: 5.388827006022136, Unsupervised Loss: 10713806.916666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3093/0.4012/0.3790, macro F1 - Span/Sentence/Combined: 0.2798/0.3488/0.3418


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 5351181.791666667, Supervised Loss: 5.294974128405253, Unsupervised Loss: 10702358.416666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3207/0.3942/0.4126, macro F1 - Span/Sentence/Combined: 0.2881/0.3447/0.3606
Training combination 34/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0005, 5, 0.2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 5094787.416666667, Supervised Loss: 8.72429366906484, Unsupervised Loss: 10189566.166666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3774/0.3732/0.4214, macro F1 - Span/Sentence/Combined: 0.2134/0.3169/0.4074


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 5089200.958333333, Supervised Loss: 7.292556802431743, Unsupervised Loss: 10178394.583333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3672/0.3499/0.4054, macro F1 - Span/Sentence/Combined: 0.1833/0.3055/0.3842


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 5083628.416666667, Supervised Loss: 6.914999167124431, Unsupervised Loss: 10167249.833333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3672/0.3303/0.3830, macro F1 - Span/Sentence/Combined: 0.1833/0.2913/0.3670


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 5078061.208333333, Supervised Loss: 6.396375775337219, Unsupervised Loss: 10156116.083333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3672/0.3436/0.3571, macro F1 - Span/Sentence/Combined: 0.1833/0.3041/0.3432


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 5072500.416666667, Supervised Loss: 6.228524843851726, Unsupervised Loss: 10144994.583333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3630/0.3515/0.4044, macro F1 - Span/Sentence/Combined: 0.1827/0.3147/0.3786


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 5066950.0, Supervised Loss: 5.98348331451416, Unsupervised Loss: 10133893.916666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3590/0.3353/0.3857, macro F1 - Span/Sentence/Combined: 0.2251/0.2980/0.3585


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 5061406.166666667, Supervised Loss: 5.840917785962422, Unsupervised Loss: 10122806.416666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3386/0.3423/0.3764, macro F1 - Span/Sentence/Combined: 0.2883/0.3181/0.3494


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 5055873.208333333, Supervised Loss: 5.649261037508647, Unsupervised Loss: 10111740.583333334
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3519/0.3373/0.3764, macro F1 - Span/Sentence/Combined: 0.3152/0.3003/0.3537


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 5050352.833333333, Supervised Loss: 5.548040787378947, Unsupervised Loss: 10100700.0
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3486/0.3404/0.3743, macro F1 - Span/Sentence/Combined: 0.2876/0.2999/0.3494


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 5044844.416666667, Supervised Loss: 5.400491992632548, Unsupervised Loss: 10089683.666666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3526/0.3415/0.3718, macro F1 - Span/Sentence/Combined: 0.2914/0.3042/0.3455
Training combination 35/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0005, 5, 0.3)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 5039601.958333333, Supervised Loss: 8.692140022913614, Unsupervised Loss: 10079195.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3917/0.2601/0.2804, macro F1 - Span/Sentence/Combined: 0.2145/0.2061/0.2645


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 5034062.208333333, Supervised Loss: 7.503797729810079, Unsupervised Loss: 10068116.916666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2784/0.2593/0.2967, macro F1 - Span/Sentence/Combined: 0.1280/0.2190/0.2897


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 5028525.708333333, Supervised Loss: 6.938533306121826, Unsupervised Loss: 10057044.416666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.2774/0.2654/0.3278, macro F1 - Span/Sentence/Combined: 0.1280/0.2245/0.3124


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 5022994.333333333, Supervised Loss: 6.691024343172709, Unsupervised Loss: 10045981.916666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3344/0.2622/0.3273, macro F1 - Span/Sentence/Combined: 0.1678/0.2233/0.2999


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 5017466.291666667, Supervised Loss: 6.381245454152425, Unsupervised Loss: 10034926.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3310/0.2346/0.3226, macro F1 - Span/Sentence/Combined: 0.1751/0.2060/0.3073


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 6/10, Combined Loss: 5011941.75, Supervised Loss: 6.205052216847737, Unsupervised Loss: 10023877.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3333/0.2407/0.3121, macro F1 - Span/Sentence/Combined: 0.2197/0.2069/0.2968


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 7/10, Combined Loss: 5006421.875, Supervised Loss: 6.081241329511006, Unsupervised Loss: 10012837.5
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3333/0.2439/0.3314, macro F1 - Span/Sentence/Combined: 0.2218/0.2114/0.3147


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 8/10, Combined Loss: 5000905.5, Supervised Loss: 5.891339619954427, Unsupervised Loss: 10001805.0
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3344/0.2553/0.3353, macro F1 - Span/Sentence/Combined: 0.2719/0.2184/0.3198


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 9/10, Combined Loss: 4995392.833333333, Supervised Loss: 5.728643417358398, Unsupervised Loss: 9990779.75
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3699/0.2462/0.3423, macro F1 - Span/Sentence/Combined: 0.3158/0.2132/0.3250


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 10/10, Combined Loss: 4989884.083333333, Supervised Loss: 5.655168056488037, Unsupervised Loss: 9979762.5
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.3590/0.2553/0.3576, macro F1 - Span/Sentence/Combined: 0.3159/0.2200/0.3326
Training combination 36/18432: (0.5, 1e-05, 0.5, 0.0005, 5, 768, 0.0005, 5, 0.5)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epochs:   0%|          | 0/10 [00:00<?, ?it/s]

Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 1/10, Combined Loss: 5072247.708333333, Supervised Loss: 8.759739518165588, Unsupervised Loss: 10144486.5
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4190/0.2951/0.3577, macro F1 - Span/Sentence/Combined: 0.2732/0.2527/0.3356


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 2/10, Combined Loss: 5066678.833333333, Supervised Loss: 7.9084471464157104, Unsupervised Loss: 10133349.75
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4092/0.3048/0.3601, macro F1 - Span/Sentence/Combined: 0.2678/0.2643/0.3427


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 3/10, Combined Loss: 5061117.541666667, Supervised Loss: 7.469520489374797, Unsupervised Loss: 10122227.666666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4052/0.3197/0.3799, macro F1 - Span/Sentence/Combined: 0.2648/0.2669/0.3624


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 4/10, Combined Loss: 5055575.625, Supervised Loss: 7.190291881561279, Unsupervised Loss: 10111144.166666666
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4145/0.3230/0.3881, macro F1 - Span/Sentence/Combined: 0.2736/0.2711/0.3644


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

Epoch 5/10, Combined Loss: 5050053.583333333, Supervised Loss: 6.922079046567281, Unsupervised Loss: 10100100.25
Validation Metrics - micro F1 - Span/Sentence/Combined: 0.4085/0.3178/0.3810, macro F1 - Span/Sentence/Combined: 0.2740/0.2643/0.3578


Batches:   0%|          | 0/12 [00:00<?, ?it/s]

KeyboardInterrupt: 

# Test model

In [42]:
def load_model_from_path(path, embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, sentence_heads, srl_heads, device='cuda'):
    """
    Loads the weights into an instance of the model class from the given path.
    
    Args:
    - model_class (torch.nn.Module): The class of the model (uninitialized).
    - path (str): Path to the saved weights.
    - device (str): Device to load the model on ('cpu' or 'cuda').
    
    Returns:
    - model (torch.nn.Module): Model with weights loaded.
    """

    # Model instantiation
    model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=dropout_prob, sentence_heads=sentence_heads, srl_heads=srl_heads)
    model = model.to(device)
    
    model.load_state_dict(torch.load(path, map_location=device))
    
    #model.eval()
    return model


In [43]:
# Hyperparameters
embedding_dim = 768
num_frames = 14

D_h = 768
lambda_orthogonality = 0.000001

K = 14
t = 5
M = 10
tau_min = 0.5
tau_decay = 5e-4

dropout_prob = 0.1

sentence_heads = 8
srl_heads = 7


model = load_model_from_path('models/model1.pth', embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, sentence_heads, srl_heads)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [44]:
def predict(model, dataloader, y_columns, device='cuda'):
    """
    Make predictions with the given model and dataloader.
    
    Args:
    - model (torch.nn.Module): The model to make predictions with.
    - dataloader (DataLoader): DataLoader for the dataset to predict on.
    - y_columns (pandas.Index): Column names from the y dataframe which correspond to labels.
    - device (str): Device to make predictions on ('cpu' or 'cuda').
    
    Returns:
    - predicted_labels (list of lists): List containing the predicted labels for each instance.
    """
    model.eval()
    all_preds_span = []
    
    with torch.no_grad():
        for batch in dataloader:
            # Move data to device
            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            
            # Forward pass
            _, span_logits, sentence_logits, combined_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 0.6)
            combined_pred = (torch.sigmoid(combined_logits) > 0.5).float()

            all_preds_span.append(combined_pred.cpu().numpy())
                
            torch.cuda.empty_cache()

    predictions = np.vstack(all_preds_span)
    
    # Convert boolean predictions to labels
    predicted_labels = []
    for pred in predictions:
        labels = list(y_columns[pred.astype(bool)])
        predicted_labels.append(labels)
    
    return predicted_labels


In [46]:
import numpy as np

# article813452859
article = """Sadiq Khan Slammed for Pro-EU 'Message of Support' During Firework Display

The spectacular fireworks that lit up the London sky on Monday night caused a stir on social media over the display's pro-EU message, at a time when the nation is divided over its looming withdrawal from the bloc.
London Mayor Sadiq Khan faced mounting criticism after the capital's New Year's Eve fireworks display, which celebrated ties with the European Union, left a bad taste in the mouths of some Brits.
The 135-metre-high London Eye was lit up in blue while its tubs turned yellow, with the giant Ferris wheel resembling the star-studded flag of the European Union.Sadiq Khan called his fireworks display a "message of support" to EU citizens living in London.
"Our one million EU citizens are Londoners, they make a huge contribution, and no matter the outcome of Brexit — they will always be welcome", he said.
To the one million EU citizens who have made our city your home: you are Londoners, you make a huge contribution and you are welcome here.
I'm proud that tonight we will welcome in the new year with a message of support to you.
#LondonNYE #LondonIsOpen https://t.co/XctrgfXXaM — Sadiq Khan (@SadiqKhan) 31 декабря 2018 г.
However, a host of Londoners rushed to Twitter to accuse their mayor of "politicising" the celebrations — with some are even calling for his resignation.
I cannot believe this event has been politicised.
This man has no shame.
Just resign.
— wayne campbell (@campbs177) 31 декабря 2018 г.
Thanks a lot Sadiq Khan you ruined the fireworks display by talking about Europe, need I remind you about Brexit.
You have started of the new year by talking about relationships with the European Union.
Well done.
We need Boris Johnson back.
— Mitchell T Cannon (@MitchellTCanno1) 1 января 2019 г.
Another shameless attempt at using party politics on what is supposed to be a happy occasion — droneguy (@shelbyguitars) 1 января 2019 г.
Politicising another innocent event that should be no different to anyone no matter who they are or where they are from!
Shameful!
!
— Mike Dyer (@Miked2372Mike) 31 декабря 2018 г.
Someone was stabbed down the road from me last night.
How about sorting that stuff out instead of politicizing something that should be fun for everyone?How many times does it have to be said.
Commenting on Brexit isn't your job.
— Peter Rockett (@rockettp) 31 декабря 2018 г.
The UK voted to leave the EU in June 2016 via a nationwide referendum, with 51.9 per cent voting in favour of pulling out of the bloc, while 48.1 per cent wanted to remain.
The withdrawal is scheduled for the end of March; the Article 50 deadline.
The Remain sentiment dominated London, with nearly 60 percent of voters wanting Britain to stay in the European Union.
Sadiq Khan, an outspoken Remainer himself, earlier called for a second referendum on Brexit.
"The government's abject failure — and the huge risk we face of a bad deal or a 'no deal' Brexit — means that giving people a fresh say is now the right — and only — approach left for our country," he said in September.
"""

test_article = get_article_dataloader(article, tokenizer)
predict(model, test_article, y.columns)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Processed article 1/1


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


[['Morality',
  'Security_and_defense',
  'Legality_Constitutionality_and_jurisprudence',
  'Political',
  'External_regulation_and_reputation']]

# Run test for validation

In [47]:
test_dataloader = get_test_dataloader(df_test["content"], tokenizer, batch_size=1)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Processed article 1/53
Processed article 2/53
Processed article 3/53
Processed article 4/53
Processed article 5/53
Processed article 6/53
Processed article 7/53
Processed article 8/53
Processed article 9/53
Processed article 10/53
Processed article 11/53
Processed article 12/53
Processed article 13/53
Processed article 14/53
Processed article 15/53
Processed article 16/53
Processed article 17/53
Processed article 18/53
Processed article 19/53
Processed article 20/53
Processed article 21/53
Processed article 22/53
Processed article 23/53
Processed article 24/53
Processed article 25/53
Processed article 26/53
Processed article 27/53
Processed article 28/53
Processed article 29/53
Processed article 30/53
Processed article 31/53
Processed article 32/53
Processed article 33/53
Processed article 34/53
Processed article 35/53
Processed article 36/53
Processed article 37/53
Processed article 38/53
Processed article 39/53
Processed article 40/53
Processed article 41/53
Processed article 42/53
P

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [48]:
predictions = predict(model, test_dataloader, y.columns)

In [49]:
df_preds = pd.DataFrame(predictions)

In [50]:
df_preds = pd.concat([df_test, df_preds], axis=1)

In [52]:
df_preds["pred_frames"] = df_preds.apply(lambda l: list([l[0], l[1], l[2], l[3], l[4], l[5]]), axis=1)

df_preds["pred_frames"] = df_preds["pred_frames"].apply(lambda l: ",".join([ f for f in l if f is not None]))

In [53]:
df_preds.to_csv("../notebooks/test.csv", sep="\t", index=False, columns=["article_id", "pred_frames"])

# Inspect dict

In [54]:
def inspect(model, dataloader, y_columns, device='cuda'):
    """
    Make predictions with the given model and dataloader.
    
    Args:
    - model (torch.nn.Module): The model to make predictions with.
    - dataloader (DataLoader): DataLoader for the dataset to predict on.
    - y_columns (pandas.Index): Column names from the y dataframe which correspond to labels.
    - device (str): Device to make predictions on ('cpu' or 'cuda').
    
    Returns:
    - predicted_labels (list of lists): List containing the predicted labels for each instance.
    """
    model.eval()
    
    all_preds_span = []
    
    # Initialize usage lists for each label
    num_labels = len(y_columns)
    all_used_labels_p = []
    all_used_labels_a0 = []
    all_used_labels_a1 = []
    
    with torch.no_grad():
        for batch in dataloader:
            used_labels_p = []
            used_labels_a0 = []
            used_labels_a1 = []
    
            # Move data to device
            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            
            sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = model.aggregation(sentence_ids, predicate_ids, arg0_ids, arg1_ids)
            
            # Process each span
            for span_idx in range(sentence_embeddings.size(1)):
                s_sentence_span = sentence_embeddings[:, span_idx, :]
                v_p_span = predicate_embeddings[:, span_idx, :]
                v_a0_span = arg0_embeddings[:, span_idx, :]
                v_a1_span = arg1_embeddings[:, span_idx, :]
            
                #unsupervised.combined_autoencoder v_p, v_a0, v_a1, v_sentence, tau
                output = model.unsupervised.combined_autoencoder(v_p_span, v_a0_span, v_a1_span, s_sentence_span, 0.6)
                
                #print(output["p"]["g"].cpu().numpy())
                used_labels_p.append(output["p"]["g"].cpu().numpy())
                used_labels_a0.append(output["a0"]["g"].cpu().numpy())
                used_labels_a1.append(output["a1"]["g"].cpu().numpy())

            
            # Forward pass
            _, span_logits, sentence_logits, combined_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 0.6)
            combined_pred = (torch.sigmoid(combined_logits) > 0.5).float()

            all_preds_span.append(combined_pred.cpu().numpy())
                
            torch.cuda.empty_cache()
            
            all_used_labels_p.append(used_labels_p)
            all_used_labels_a0.append(used_labels_a0)
            all_used_labels_a1.append(used_labels_a1)

    predictions = np.vstack(all_preds_span)
    
    # Convert boolean predictions to labels
    predicted_labels = []
    for pred in predictions:
        labels = list(y_columns[pred.astype(bool)])
        predicted_labels.append(labels)
    
    return predicted_labels, all_used_labels_p, all_used_labels_a0, all_used_labels_a1

In [64]:
num_sentences = 32
batch_size = 1

train_dataset, test_dataset, train_dataloader, test_dataloader = get_datasets_dataloaders(X, y, tokenizer, recalculate_srl=False, batch_size=batch_size, max_sentences_per_article=num_sentences, max_sentence_length=64, max_arg_length=12, pickle_path="notebooks/X_srl_full.pkl")

Load SRL from Pickle
                                           Class  Train Distribution (%)  \
0                                       Morality               47.938144   
1                           Security_and_defense               42.783505   
2             Policy_prescription_and_evaluation               14.690722   
3   Legality_Constitutionality_and_jurisprudence               46.391753   
4                                       Economic                6.185567   
5                                      Political               54.381443   
6                           Crime_and_punishment               53.350515   
7             External_regulation_and_reputation               26.804124   
8                                 Public_opinion                5.670103   
9                          Fairness_and_equality               27.577320   
10                        Capacity_and_resources                6.443299   
11                               Quality_of_life               20.3

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [65]:
import numpy as np

predicted_labels, used_labels_p, used_labels_a0, used_labels_a1 = inspect(model, train_dataloader, y.columns)

In [66]:
len(used_labels_p)

388

In [73]:
categories = list(y.columns)

category_lists_p = {category: [] for category in categories}
category_lists_a1 = {category: [] for category in categories}
category_lists_a0 = {category: [] for category in categories}

loader = test_dataloader

for batch_idx in range(len(loader.dataset)):
    # Iterate over each sentence
    ds = loader.dataset[batch_idx]

    for sentence_idx in range(len(used_labels_p[batch_idx])):

        # Update the lists for each category
        for cat_idx, category in enumerate(categories):
            
            if used_labels_p[batch_idx][cat_idx][0][cat_idx] > 0.8:
                category_lists_p[category].append(ds["predicate_ids"][sentence_idx].numpy())
            
            if used_labels_a0[batch_idx][cat_idx][0][cat_idx] > 0.8:
                category_lists_a0[category].append(ds["arg0_ids"][sentence_idx].numpy())
                
            if used_labels_a1[batch_idx][cat_idx][0][cat_idx] > 0.8:
                category_lists_a1[category].append(ds["arg1_ids"][sentence_idx].numpy())

In [74]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

from nltk.tokenize import word_tokenize

def decode_tokens(token_dict, stop_words):
    decoded_data = {}
    for category, token_lists in token_dict.items():
        decoded_data[category] = []
        for tokens in token_lists:
            if np.any(tokens > 0):
                # Decode the tokens
                decoded_text = tokenizer.decode(tokens, skip_special_tokens=True).strip()
                # Tokenize and remove stop words
                words = word_tokenize(decoded_text)
                filtered_words = [word for word in words if word.lower() not in stop_words]
                # Join the words back into a string
                decoded_data[category].append(' '.join(filtered_words))
    return decoded_data

stop_words = set(stopwords.words('english'))  # Assuming your text is in English

# Decode the token IDs for each ARG
decoded_predicate = decode_tokens(category_lists_p, stop_words)
decoded_arg0 = decode_tokens(category_lists_a0, stop_words)
decoded_arg1 = decode_tokens(category_lists_a1, stop_words)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [75]:
# Initialize a list to collect DataFrame rows
rows = []

# Populate the list with rows
for frame in set(decoded_predicate) | set(decoded_arg0) | set(decoded_arg1):
    # Get the lists, joining multiple words with a comma
    pred_words = ', '.join([ s.strip() for s in list(set(decoded_predicate.get(frame, []))) if s is not None or l != ""])
    arg0_words = ', '.join(list(set(decoded_arg0.get(frame, []))))
    arg1_words = ', '.join(list(set(decoded_arg1.get(frame, []))))

    # Create a dictionary for the row
    row = {
        "Frame": frame,
        "Predicate": pred_words,
        "ARG0": arg0_words,
        "ARG1": arg1_words
    }
    
    # Append the row dictionary to the rows list
    rows.append(row)

# Convert the list of rows to a DataFrame
df_full_table = pd.DataFrame(rows)

# Display the DataFrame
df_full_table.style.hide_index()

  df_full_table.style.hide_index()


Frame,Predicate,ARG0,ARG1
Cultural_identity,", grants, prohibit, keep, used, back, stop, reports, occur, asked, breaking, bite, safeguard, agree, sending, said, opt",", individual, author, biblical illiterates, creator, god, picture inside seal, john hancock , first signer declaration, american history found document, speaker -, person, preachers, completing poll, caesar, whosoever",
Public_opinion,,"us, american government employee stationed southern city, mike pompeo , us secretary state,","“ would raise serious questions whether department, matter “ handled properly ”, justice department release special counsel robert mueller ’"
Policy_prescription_and_evaluation,,", nominee, ramirez, russians , chinese , iranians ,, cia ’ front corporations narcotics business, ambellas intellihub, laura loomer , noxious far -, haig 's attorney marc victor, clinton regime, gop, ’, people, haig, [ senate majority leader ] mitch mcconnell chuck grass, federalist society, senate judiciary committee chairman chuck grassley ( r -, search warrant records, house democratic leader nancy pelosi ( ca ), paddock, trump, american public, two women, ford, waste - filled us military / security complex, one western media source, us democratic party, president eisenhower, intellihub, douglas haig , 55 - year - old, democratic senator chuck schumer ( ny ), high court, insane neoconservatives ,, sum, president trump, bonkers conspiracy site intellihub, cia cut -, corrupt filth, new yorker magazine, douglas haig ’ @ linkedin account, attorneys, washington insiders, presstitute media, ambellas, leaders democratic party , nancy pe, george soros ’ money, customer, democrats, haig 's linkedin profile, democratic minority leader us house representatives, media - savvy sleazeball lawyer, mesa man, left",", man ’ actions, incident, access freedom outpost updates free charge, hat, poll - story, employer, questions concerns, jimenez cut party, entire event, meal, brett kavanaugh, america, jimenez worked “ part - time door, teenager wearing # maga hat, different opinions president, site 's privacy policy terms, aggression, bar, ugly head, altercation, boy ’ hat, field, since terminated employee , actions, rumble, business"
Crime_and_punishment,", mean, resumed, attack, preferred, run, launched, understand, ate, overthrew, need, vaporized, include, suspended, accelerate, take, ', called, travel, gotten, act, boasts, opt, shows, spending, uses, using, resumes, avoided, spread, moved, grants, made, means, add, relishing, fired, cajole, signed, talking, agree, threaten, mastered, deployed, skimped",", us south korea, pakistan ‘, us power, washington, administration ’ amateur foreign policymakers, north korea test, north koreans , eccentric, us military, president trump, israel, north korea, nuclear explosions, shatter japan cripple south korea, north korea ’ kim jong - un, “",
Fairness_and_equality,,", intellihub, douglas haig , 55 - year - old, ambellas, bonkers conspiracy site intellihub, paddock, cia cut -, ambellas intellihub, customer, laura loomer , noxious far -, haig 's attorney marc victor, haig 's linkedin profile, ’, mesa man, people, haig, douglas haig ’ @ linkedin account, search warrant records",
Economic,", resumed, need, suspended, ', travel, gotten, act, opt, shows, spending, using, resumes, grants, means, signed, agree, made, deployed",", two, clinton, us, jim acosta, completing poll, trump, order, us forces, military, swisher, others, military force, hillary clinton, migrant caravan, president trump, caravan , made mostly male hondur, ruptly video, president donald trump, migrants, former secretary state hillary clinton, hillary, democratic party",", image, attempt coup president, 1 , 000 billion dollar budget military, pelosi ’ false accusation, multiple shooters, trump ’ comments, trump putin, new information, junk, us, countless hours, inside @ mandalaybay valet center, donald trump, many unanswered questions dr, license plate numbers given police, large 1 , 000 billion, - called evidence, way, representation , lies justify war conflict, first step president trump taken reduce, entire narrative, notice extremely hostile reaction peace, nothing, fear undefined retribution, effect, series bizarre interviews led even, isis connection, president trump, russia, handwritten note reportedly written, definitely footage, stephenpaddock ‘ van, world, las vegas shooter ’ hotel check -, donations - election campaigns, fbi , along state local police, many, western media opposed peace, peace russia, explanation, recipients, outcry blatantly false, democratic party, public, cia bought - - paid -"
Capacity_and_resources,", transmitted, highlights, gather, states, affects, confirmed, work",", nominee, ramirez, gop, people, [ senate majority leader ] mitch mcconnell chuck grass, federalist society, senate judiciary committee chairman chuck grassley ( r -, two women, ford, high court, president trump, new yorker magazine, attorneys, washington insiders, george soros ’ money, democrats, media - savvy sleazeball lawyer, left",", group migrants, three lanes shut san, reinforcing positions, us, photos, three nb vehicle lanes, initial smaller group around 85 people , mostly, main ‘ caravan ’ thousands, press pass, take pacific coast route reach us, drones , helicopters night - vision capabilities ,, would close multiple entry lanes u ., barriers razor wire, seven us army members cbp, like child, lane closures san ysidro, site 's privacy policy terms, barricades razor wire, access truth uncensored updates free charge, san diego, 5 , 200 troops, put razor wire border san, field, concertina wire , pre - positioning jersey barriers"
Health_and_safety,"apprehended, indicated, processed, patrolling",,", plague, one major causes spread, infection, whoever handles body, whole plague kind government conspiracy, country ’ cash - strapped government, least 15 famadihana ceremonies, turning bones ancestors — plague, plague lie, least 124 people madagascar, ’ coincidence outbreak coincides"
External_regulation_and_reputation,,", president, president trump ’ policies, former speaker house , newt gingrich, conversations, president trump, people, completing poll",", defy feds release mexican, today day vindication rights, garcia zarate ’ background nationality played, anybody get away violating u., case, detain deportation, prosecution defense got outcome wanted, sentence 16 months 3 years, sanctuary state california , false -, mexican national guilty murder mans, sig sauer . 40 caliber pistol ,, felon, evidence california, mate, investigation special prosecutor, verdict, people country, rights american legal system ,, jose ines garcia zarate , also known, ’ shocked — saddened shocked, career criminal even supposed, sentence"
Morality,,"lawmakers, democrats six house committees — including judiciary chairman jerry, giuliani",", guns & ammo confiscation &, people safe, armor, access freedom outpost updates free charge, pull dick 's sporting goods demand, policy ama support : establishing laws allowing, magic age 21, drums hold 50 - 100 rounds, 18 - year - olds carry firearms ,, military, exactly mean high - capacity magazines, domestic violence, rights n't, protocols processes, government get determine gets, nothing medicine health, sorts government overreach unconstitutional, site 's privacy policy terms, political position, corrupt democrat politician, teachers students arrested tried dealt, huge laundry list things ama supports, field, 20 30 round magazines handguns, bypass fifth amendment protections concerning someone actually"
