## Classifier - Try 12

Classify articles frames using aggregated SRL and sentence embeddings.

1. Try multi attention header for better identifying how each sentence corresponds to the full article frame

In [1]:
import os

os.listdir(os.getcwd())

['FRISS_srl.pkl',
 'training_metrics.csv',
 'README.md',
 'notebooks',
 'grid_search_metrics.csv',
 '.git',
 'assets',
 'test.csv',
 'friss',
 'models',
 '.ipynb_checkpoints',
 'data',
 '.gitignore',
 'frameaxis']

In [2]:
labels_path = "data/data/en/train-labels-subtask-2.txt"
articles_path = "data/data/en/train-articles-subtask-2/"

In [3]:
import pandas as pd

# Read the dev-labels-subtask-2.txt file
labels_df = pd.read_csv(labels_path, sep="\t")

# Rename the columns for easier processing
labels_df.columns = ["article_id", "frames"]

labels_df.head()

Unnamed: 0,article_id,frames
0,832959523,"Morality,Security_and_defense,Policy_prescript..."
1,833039623,"Political,Crime_and_punishment,External_regula..."
2,833032367,"Political,Crime_and_punishment,Fairness_and_eq..."
3,814777937,"Political,Morality,Fairness_and_equality,Exter..."
4,821744708,"Policy_prescription_and_evaluation,Political,L..."


In [4]:
# A function to read the article text given its ID
def get_article_content(article_id):
    try:
        with open(f"{articles_path}/article{article_id}.txt", "r") as f:
            return f.read()
    except FileNotFoundError:
        return None

df = labels_df

# Apply the function to get the article content
df["content"] = df["article_id"].apply(get_article_content)

# Drop rows where content could not be found
df.dropna(subset=["content"], inplace=True)

df.head()


Unnamed: 0,article_id,frames,content
0,832959523,"Morality,Security_and_defense,Policy_prescript...",How Theresa May Botched\n\nThose were the time...
1,833039623,"Political,Crime_and_punishment,External_regula...",Robert Mueller III Rests His Case—Dems NEVER W...
2,833032367,"Political,Crime_and_punishment,Fairness_and_eq...",Robert Mueller Not Recommending Any More Indic...
3,814777937,"Political,Morality,Fairness_and_equality,Exter...",The Far Right Is Trying to Co-opt the Yellow V...
4,821744708,"Policy_prescription_and_evaluation,Political,L...",‘Special place in hell’ for those who promoted...


In [5]:
# Split the frames column into a list of frames
df["frames_list"] = df["frames"].str.split(",")

# create for each frame a new column with the frame as name and 1 if the frame is present in the article and 0 if not
for frame in df["frames_list"].explode().unique():
    df[frame] = df["frames_list"].apply(lambda x: 1 if frame in x else 0)

df.head()

Unnamed: 0,article_id,frames,content,frames_list,Morality,Security_and_defense,Policy_prescription_and_evaluation,Legality_Constitutionality_and_jurisprudence,Economic,Political,Crime_and_punishment,External_regulation_and_reputation,Public_opinion,Fairness_and_equality,Capacity_and_resources,Quality_of_life,Cultural_identity,Health_and_safety
0,832959523,"Morality,Security_and_defense,Policy_prescript...",How Theresa May Botched\n\nThose were the time...,"[Morality, Security_and_defense, Policy_prescr...",1,1,1,1,1,0,0,0,0,0,0,0,0,0
1,833039623,"Political,Crime_and_punishment,External_regula...",Robert Mueller III Rests His Case—Dems NEVER W...,"[Political, Crime_and_punishment, External_reg...",0,0,1,1,0,1,1,1,1,0,0,0,0,0
2,833032367,"Political,Crime_and_punishment,Fairness_and_eq...",Robert Mueller Not Recommending Any More Indic...,"[Political, Crime_and_punishment, Fairness_and...",0,0,0,1,0,1,1,1,0,1,0,0,0,0
3,814777937,"Political,Morality,Fairness_and_equality,Exter...",The Far Right Is Trying to Co-opt the Yellow V...,"[Political, Morality, Fairness_and_equality, E...",1,1,0,0,1,1,0,1,1,1,0,0,0,0
4,821744708,"Policy_prescription_and_evaluation,Political,L...",‘Special place in hell’ for those who promoted...,"[Policy_prescription_and_evaluation, Political...",0,0,1,1,0,1,0,1,0,0,0,0,0,0


In [6]:
filtered_data = df[(df['Political'] == 1) | (df['Crime_and_punishment'] == 1)]

filtered_data.head()

Unnamed: 0,article_id,frames,content,frames_list,Morality,Security_and_defense,Policy_prescription_and_evaluation,Legality_Constitutionality_and_jurisprudence,Economic,Political,Crime_and_punishment,External_regulation_and_reputation,Public_opinion,Fairness_and_equality,Capacity_and_resources,Quality_of_life,Cultural_identity,Health_and_safety
1,833039623,"Political,Crime_and_punishment,External_regula...",Robert Mueller III Rests His Case—Dems NEVER W...,"[Political, Crime_and_punishment, External_reg...",0,0,1,1,0,1,1,1,1,0,0,0,0,0
2,833032367,"Political,Crime_and_punishment,Fairness_and_eq...",Robert Mueller Not Recommending Any More Indic...,"[Political, Crime_and_punishment, Fairness_and...",0,0,0,1,0,1,1,1,0,1,0,0,0,0
3,814777937,"Political,Morality,Fairness_and_equality,Exter...",The Far Right Is Trying to Co-opt the Yellow V...,"[Political, Morality, Fairness_and_equality, E...",1,1,0,0,1,1,0,1,1,1,0,0,0,0
4,821744708,"Policy_prescription_and_evaluation,Political,L...",‘Special place in hell’ for those who promoted...,"[Policy_prescription_and_evaluation, Political...",0,0,1,1,0,1,0,1,0,0,0,0,0,0
5,833036489,"Political,External_regulation_and_reputation,P...",Bill Maher says he doesn't need Mueller report...,"[Political, External_regulation_and_reputation...",0,0,1,1,0,1,0,1,1,0,0,0,0,0


In [7]:
X = df["content"]
y = df.drop(columns=["article_id", "frames", "frames_list", "content"])

In [8]:
X.head()

0    How Theresa May Botched\n\nThose were the time...
1    Robert Mueller III Rests His Case—Dems NEVER W...
2    Robert Mueller Not Recommending Any More Indic...
3    The Far Right Is Trying to Co-opt the Yellow V...
4    ‘Special place in hell’ for those who promoted...
Name: content, dtype: object

In [9]:
y.head()

Unnamed: 0,Morality,Security_and_defense,Policy_prescription_and_evaluation,Legality_Constitutionality_and_jurisprudence,Economic,Political,Crime_and_punishment,External_regulation_and_reputation,Public_opinion,Fairness_and_equality,Capacity_and_resources,Quality_of_life,Cultural_identity,Health_and_safety
0,1,1,1,1,1,0,0,0,0,0,0,0,0,0
1,0,0,1,1,0,1,1,1,1,0,0,0,0,0
2,0,0,0,1,0,1,1,1,0,1,0,0,0,0
3,1,1,0,0,1,1,0,1,1,1,0,0,0,0
4,0,0,1,1,0,1,0,1,0,0,0,0,0,0


In [10]:
len(y.columns)

14

In [11]:
len(X), len(y)

(432, 432)

# Load Test Data

In [12]:
df_test = pd.read_csv("data/baselines/baseline-output-subtask2-test-en.txt", sep="\t")

df_test.columns = ["article_id", "frames"]

df_test.head()

Unnamed: 0,article_id,frames
0,311,"External_regulation_and_reputation,Morality,Po..."
1,3132,"External_regulation_and_reputation,Security_an..."
2,3138,"Legality_Constitutionality_and_jurisprudence,P..."
3,3154,Legality_Constitutionality_and_jurisprudence
4,3126,"Political,Security_and_defense"


In [13]:
def get_test_article_content(article_id):
    try:
        with open(f"data/data/en/test-articles-subtask-2/article{article_id}.txt", "r") as f:
            return f.read()
    except FileNotFoundError:
        return None

# Apply the function to get the article content
df_test["content"] = df_test["article_id"].apply(get_test_article_content)

# Drop rows where content could not be found
df_test.dropna(subset=["content"], inplace=True)

df_test = df_test.drop(columns=["frames"])

df_test.head()

Unnamed: 0,article_id,content
0,311,Journalist names obstacle to peace between Ukr...
1,3132,Putin ally admits Kyiv could soon bomb Moscow ...
2,3138,US debt surpasses $31 trillion\n\nThe US natio...
3,3154,"Oh Thank God, Banksy Is In Ukraine\n\nThe war ..."
4,3126,"U.S., European health officials admit Microsof..."


### Create Dataset

In [14]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning, module='transformers')

### Extract SRL Embeddings from articles

In [15]:
!pip install pycuda
!pip install allennlp allennlp-models

[0m

In [16]:
from allennlp.predictors.predictor import Predictor
from allennlp_models.structured_prediction.models import srl_bert
from nltk.tokenize import sent_tokenize
import pandas as pd

In [17]:
from tqdm.notebook import tqdm

def batched_extract_srl_components(sentences, predictor):
    # Prepare the batched input for the predictor
    batched_input = [{'sentence': sentence} for sentence in sentences]
    batched_srl = predictor.predict_batch_json(batched_input)
    
    # Extract SRL components from the batched predictions
    results = []
    for srl in batched_srl:
        best_extracted_data = None
        second_best_extracted_data = None
        for verb_entry in srl['verbs']:
            tags = verb_entry['tags']
            arg0_indices = [i for i, tag in enumerate(tags) if tag in ['B-ARG0', 'I-ARG0']]
            arg1_indices = [i for i, tag in enumerate(tags) if tag in ['B-ARG1', 'I-ARG1']]

            if arg0_indices and arg1_indices:
                best_extracted_data = {
                    'predicate': verb_entry['verb'],
                    'ARG0': ' '.join([srl['words'][i] for i in arg0_indices]),
                    'ARG1': ' '.join([srl['words'][i] for i in arg1_indices])
                }
                break
            elif (arg0_indices or arg1_indices) and not second_best_extracted_data:
                second_best_extracted_data = {
                    'predicate': verb_entry['verb'],
                    'ARG0': ' '.join([srl['words'][i] for i in arg0_indices]) if arg0_indices else '',
                    'ARG1': ' '.join([srl['words'][i] for i in arg1_indices]) if arg1_indices else ''
                }

        if best_extracted_data:
            results.append(best_extracted_data)
        elif second_best_extracted_data:
            results.append(second_best_extracted_data)
            
    return results

def optimized_extract_srl(X, predictor, batch_size=32):
    total_articles = len(X)
    processed_articles = 0

    all_results = []

    for article in X:
        sentences = sent_tokenize(article)
        article_srls = []

        for i in range(0, len(sentences), batch_size):
            batched_sentences = sentences[i:i+batch_size]
            article_srls.extend(batched_extract_srl_components(batched_sentences, predictor))

        all_results.append(article_srls)
        processed_articles += 1
        print(f"Processed article {processed_articles}/{total_articles}")

    return pd.Series(all_results)

In [18]:
import pickle

def get_X_srl(X, recalculate=False, pickle_path="../notebooks/classifier/X_srl_filtered.pkl"):
    """
    Returns the X_srl either by loading from a pickled file or recalculating.
    """
    if recalculate or not os.path.exists(pickle_path):
        print("Recalculate SRL")
        # Load predictor
        predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)
        X_srl = optimized_extract_srl(X, predictor)
        with open(pickle_path, 'wb') as f:
            pickle.dump(X_srl, f)
    else:
        print("Load SRL from Pickle")
        with open(pickle_path, 'rb') as f:
            X_srl = pickle.load(f)
    return X_srl

# GPU

In [19]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

def free_gpu():
    print(torch.cuda.mem_get_info())
    print(torch.cuda.memory_summary())

Using device: cuda


In [20]:
import torch
import gc

def list_gpu_tensors():
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if obj.is_cuda:
                    obj = obj.cpu()
                    obj = obj.to("cpu")
                    print(type(obj), obj.size())
        except:
            pass

        
list_gpu_tensors()



# Dataset

In [21]:
from torch.utils.data import Dataset
from transformers import BertTokenizer
import pandas as pd
import nltk

class ArticleDataset(Dataset):
    def __init__(self, X, X_srl, tokenizer, labels=None, max_sentences_per_article=32, max_sentence_length=32, max_arg_length=16):
        self.X = X
        self.X_srl = X_srl
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_sentences_per_article = max_sentences_per_article
        self.max_sentence_length = max_sentence_length
        self.max_arg_length = max_arg_length
        nltk.download('punkt')  # Download the Punkt tokenizer model for sentence splitting
        
    def __len__(self):
        return len(self.X)
    
    def _truncate_or_pad(self, lst, target_length, pad_value=0):
        """
        Truncate or pad the input list to match the target length.
        """
        if len(lst) > target_length:
            return lst[:target_length]
        else:
            return lst + [pad_value] * (target_length - len(lst))
    
    def __getitem__(self, idx):
        article = self.X.iloc[idx]
        srl = self.X_srl.iloc[idx]

        # Split the article into sentences
        sentences = nltk.sent_tokenize(article)
        sentences = sentences[:self.max_sentences_per_article]  # Limit the number of sentences

        # Tokenize and pad/truncate the sentences
        sentence_ids = [self.tokenizer.encode(sentence, add_special_tokens=True, max_length=self.max_sentence_length, truncation=True, padding='max_length') for sentence in sentences]
        while len(sentence_ids) < self.max_sentences_per_article:
            sentence_ids.append([0] * self.max_sentence_length)

        # Tokenize and pad/truncate the SRL items
        predicate_ids = [self.tokenizer.encode(predicate, add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length') for predicate in [item['predicate'] for item in srl]]
        arg0_ids = [self.tokenizer.encode(arg0, add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length') for arg0 in [item.get('ARG0', '') for item in srl]]
        arg1_ids = [self.tokenizer.encode(arg1, add_special_tokens=True, max_length=self.max_arg_length, truncation=True, padding='max_length') for arg1 in [item.get('ARG1', '') for item in srl]]
        
        predicate_ids = predicate_ids[:self.max_sentences_per_article]
        arg0_ids = arg0_ids[:self.max_sentences_per_article]
        arg1_ids = arg1_ids[:self.max_sentences_per_article]  
        
        while len(predicate_ids) < self.max_sentences_per_article:
            predicate_ids.append([0] * self.max_arg_length)
        while len(arg0_ids) < self.max_sentences_per_article:
            arg0_ids.append([0] * self.max_arg_length)
        while len(arg1_ids) < self.max_sentences_per_article:
            arg1_ids.append([0] * self.max_arg_length)

        data = {
            'sentence_ids': torch.tensor(sentence_ids, dtype=torch.long),
            'predicate_ids': torch.tensor(predicate_ids, dtype=torch.long),
            'arg0_ids': torch.tensor(arg0_ids, dtype=torch.long),
            'arg1_ids': torch.tensor(arg1_ids, dtype=torch.long)
        }
        
        if self.labels is not None:
            data['labels'] = self.labels.iloc[idx]
        
        return data


In [22]:
from torch.utils.data import DataLoader, random_split
from sklearn.model_selection import train_test_split

# Initialize the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def custom_collate_fn(batch):
    # Extract individual lists from the batch
    sentence_ids = [item['sentence_ids'] for item in batch]
    predicate_ids = [item['predicate_ids'] for item in batch]
    arg0_ids = [item['arg0_ids'] for item in batch]
    arg1_ids = [item['arg1_ids'] for item in batch]
    
    # Pad each list
    sentence_ids = torch.nn.utils.rnn.pad_sequence(sentence_ids, batch_first=True, padding_value=0)
    predicate_ids = torch.nn.utils.rnn.pad_sequence(predicate_ids, batch_first=True, padding_value=0)
    arg0_ids = torch.nn.utils.rnn.pad_sequence(arg0_ids, batch_first=True, padding_value=0)
    arg1_ids = torch.nn.utils.rnn.pad_sequence(arg1_ids, batch_first=True, padding_value=0)

    # Conditionally extract and add labels
    output_dict = {
        'sentence_ids': sentence_ids,
        'predicate_ids': predicate_ids,
        'arg0_ids': arg0_ids,
        'arg1_ids': arg1_ids
    }
    
    if 'labels' in batch[0]:
        labels = [item['labels'] for item in batch]
        output_dict['labels'] = torch.Tensor(labels)

    return output_dict


def get_datasets_dataloaders(X, y, tokenizer, recalculate_srl=False, pickle_path="../notebooks/X_srl_filtered.pkl", batch_size=16, max_sentences_per_article=32, max_sentence_length=32, max_arg_length=16):
    # Get X_srl
    X_srl = get_X_srl(X, recalculate=recalculate_srl, pickle_path=pickle_path)
    
    test_size = 0.1
    
    # Assuming X, X_srl, and y are already defined and have the same number of samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)
    
    # Calculate class distributions for y_train and y_test in percentage
    train_dist_percent = (y_train.sum() / y_train.shape[0]) * 100
    test_dist_percent = (y_test.sum() / y_test.shape[0]) * 100

    # Create a DataFrame to display them side by side
    dist_comparison = pd.DataFrame({
        'Class': train_dist_percent.index,
        'Train Distribution (%)': train_dist_percent.values,
        'Test Distribution (%)': test_dist_percent.values
    })
    
    print(dist_comparison)
    
    X_srl_train, X_srl_test, _, _ = train_test_split(X_srl, y, test_size=test_size, random_state=42)
    
    # Create the dataset
    train_dataset = ArticleDataset(X_train, X_srl_train, tokenizer, y_train, max_sentences_per_article, max_sentence_length, max_arg_length)
    test_dataset = ArticleDataset(X_test, X_srl_test, tokenizer, y_test, max_sentences_per_article, max_sentence_length, max_arg_length)

    # Create dataloaders
    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, collate_fn=custom_collate_fn, drop_last=True)
    test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn, drop_last=True)
    
    print("CREATION DONE")
    return train_dataset, test_dataset, train_dataloader, test_dataloader



In [23]:
def get_article_dataloader(article, tokenizer, batch_size=1):
    X = pd.Series([article])
    y = None  # No labels for this single article
    
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)
    # Directly use the optimized_extract_srl function since we don't need to cache for single articles
    X_srl = optimized_extract_srl(X, predictor)
    
    # Create the dataset
    dataset = ArticleDataset(X, X_srl, tokenizer, y)
    
    # Create dataloader
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn)
    
    return dataloader

In [24]:
def get_test_dataloader(X, tokenizer, batch_size=4):
    y = None
    
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz", cuda_device=0)
    # Directly use the optimized_extract_srl function since we don't need to cache for single articles
    X_srl = optimized_extract_srl(X, predictor)
    
    # Create the dataset
    dataset = ArticleDataset(X, X_srl, tokenizer, y)
    
    # Create dataloader
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=custom_collate_fn)
    
    return dataloader

# PyTorch Model
The Model consist out of various Layers.

1. SRL_Embedding
2. Autoencoder
3. FRISSLoss
4. Unsupervised
5. Supervised
6. FRISS

In [25]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel

## 1. SRL_Embeddings

The layer takes tensors of token IDs with the shape [batch_size, max_num_sentences, max_num_tokens] for the sentence, predicates, arg0 and arg1 and returns for each sentence an embedding with shape [batch_size, embedding_dim] for the sentence, predicate, arg0 and arg1. 

The single embedding for the sentence is extracted by taking the [CLS] token embedding. For the predicate, arg0 and arg1 by taking the mean over all word embeddings in this list of tokens. 

> Possible improvements: Better way of extracting the single embedding for predicate, arg0 and arg1.

In [39]:
from transformers import BertModel
import torch.nn as nn
import torch

class SRL_Embeddings(nn.Module):
    def __init__(self, bert_model_name="bert-base-uncased"):
        super(SRL_Embeddings, self).__init__()
        self.bert_model = BertModel.from_pretrained(bert_model_name)
        self.embedding_dim = 768  # for bert-base-uncased

    def forward(self, sentence_ids, predicate_ids, arg0_ids, arg1_ids):
        with torch.no_grad():
            # Extract embeddings directly using BERT
            # Adjust dimensions to 2D for BERT input, then reshape back to 3D
            sentence_embeddings_3d = self.bert_model(sentence_ids.view(-1, sentence_ids.size(-1)))[0].view(sentence_ids.size(0), sentence_ids.size(1), -1, self.embedding_dim)
            predicate_embeddings_3d = self.bert_model(predicate_ids.view(-1, predicate_ids.size(-1)))[0].view(predicate_ids.size(0), predicate_ids.size(1), -1, self.embedding_dim)
            arg0_embeddings_3d = self.bert_model(arg0_ids.view(-1, arg0_ids.size(-1)))[0].view(arg0_ids.size(0), arg0_ids.size(1), -1, self.embedding_dim)
            arg1_embeddings_3d = self.bert_model(arg1_ids.view(-1, arg1_ids.size(-1)))[0].view(arg1_ids.size(0), arg1_ids.size(1), -1, self.embedding_dim)

        sentence_embeddings = sentence_embeddings_3d.mean(dim=2)

        # Average token embeddings for predicates, ARG0, and ARG1
        predicate_embeddings = predicate_embeddings_3d.mean(dim=2)
        arg0_embeddings = arg0_embeddings_3d.mean(dim=2)
        arg1_embeddings = arg1_embeddings_3d.mean(dim=2)
        
        return sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings

# Generate dummy data for the SRL_Embeddings
batch_size = 2
num_sentences = 12
sentence_length = 8
predicate_length = 8
arg0_length = 8
arg1_length = 8

# Dummy data for sentences, predicates, arg0, and arg1
sentence_ids = torch.randint(0, 10000, (batch_size, num_sentences, sentence_length))
predicate_ids = torch.randint(0, 10000, (batch_size, num_sentences, predicate_length))
arg0_ids = torch.randint(0, 10000, (batch_size, num_sentences, arg0_length))
arg1_ids = torch.randint(0, 10000, (batch_size, num_sentences, arg1_length))

srl_embeddings = SRL_Embeddings()

sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = srl_embeddings(sentence_ids, predicate_ids, arg0_ids, arg1_ids)

print(sentence_embeddings.shape, predicate_embeddings.shape, arg0_embeddings.shape, arg1_embeddings.shape)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


torch.Size([2, 12, 768]) torch.Size([2, 12, 768]) torch.Size([2, 12, 768]) torch.Size([2, 12, 768])


## 2. Autoencoder

In [58]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.functional import log_softmax, softmax, sigmoid, logsigmoid

class CombinedAutoencoder(nn.Module):
    def __init__(self, D_w, D_h, K, dropout_prob=0.3):
        super(CombinedAutoencoder, self).__init__()
        
        self.D_h = D_h
        self.K = K
        
        # Shared feed-forward layer for all views
        self.feed_forward_shared = nn.Linear(2 * D_w, D_h)
        
        # Unique feed-forward layers for each view
        self.feed_forward_unique = nn.ModuleDict({
            'a0': nn.Linear(D_h, K),
            'p': nn.Linear(D_h, K),
            'a1': nn.Linear(D_h, K),
        })

        # Initializing F matrices for each view
        self.F_matrices = nn.ParameterDict({
            'a0': nn.Parameter(torch.Tensor(K, D_w)),
            'p': nn.Parameter(torch.Tensor(K, D_w)),
            'a1': nn.Parameter(torch.Tensor(K, D_w)),
        })

        # init F matrices with xavier_uniform and nn.init.calculate_gain('relu')
        for _, value in self.F_matrices.items():
            nn.init.xavier_uniform_(value.data, gain=nn.init.calculate_gain('relu'))

        # Additional layers and parameters
        self.dropout1 = nn.Dropout(dropout_prob)
        self.dropout2 = nn.Dropout(dropout_prob)
        self.batch_norm = nn.BatchNorm1d(D_h)
        self.activation = nn.ReLU()
        self.activation2 = nn.Sigmoid()

    def sample_gumbel(self, shape, eps=1e-20, device='cpu'):
        """Sample from Gumbel(0, 1)"""
        U = torch.rand(shape, device=device)
        return -torch.log(-torch.log(U + eps) + eps)
    
    # softmax

    def gumbel_softmax_sample(self, logits, t):
        """ Draw a sample from the Gumbel-Softmax distribution"""
        y = logits + self.sample_gumbel(logits.size(), device=logits.device)
        return softmax(y / t, dim=-1)
    
    def gumbel_logsoftmax_sample(self, logits, t):
        """ Draw a sample from the Gumbel-Softmax distribution"""
        y = logits + self.sample_gumbel(logits.size(), device=logits.device)
        return log_softmax(y / t, dim=-1)


    def custom_gumbel_softmax(self, logits, tau, hard=False, log=False):
        """Sample from the Gumbel-Softmax distribution and optionally discretize.
        """
        if log:
            y = self.gumbel_logsoftmax_sample(logits, tau)
        else:
            y = self.gumbel_softmax_sample(logits, tau)
        if hard:
            shape = y.size()
            _, ind = y.max(dim=-1)
            y_hard = torch.zeros_like(y).view(-1, shape[-1])
            y_hard.scatter_(1, ind.view(-1, 1), 1)
            y_hard = y_hard.view(*shape)
            # Set gradients w.r.t. y_hard gradients w.r.t. y
            y_hard = (y_hard - y).detach() + y
            return y_hard
        return y
    
    # sigmoid

    def gumbel_sigmoid_sample(self, logits, t):
        """ Draw a sample from the Gumbel-Sigmoid distribution"""
        y = logits + self.sample_gumbel(logits.size(), device=logits.device)
        return sigmoid(y / t)

    def gumbel_logsigmoid_sample(self, logits, t):
        """ Draw a sample from the Gumbel-Sigmoid distribution"""
        y = logits + self.sample_gumbel(logits.size(), device=logits.device)
        return logsigmoid(y / t)
    
    def custom_gumbel_sigmoid(self, logits, tau, hard=False, log=False):
        """Sample from the Gumbel-Sigmoid distribution and optionally discretize."""
        if log:
            y = self.gumbel_logsigmoid_sample(logits, tau)
        else:
            y = self.gumbel_sigmoid_sample(logits, tau)
        if hard:
            shape = y.size()
            _, ind = y.max(dim=-1)
            y_hard = torch.zeros_like(y).view(-1, shape[-1])
            y_hard.scatter_(1, ind.view(-1, 1), 1)
            y_hard = y_hard.view(*shape)
            # Set gradients w.r.t. y_hard gradients w.r.t. y
            y_hard = (y_hard - y).detach() + y
            return y_hard
        return y

    def forward(self, v_p, v_a0, v_a1, v_sentence, tau, use_softmax=False):
        h_p = self.process_through_shared(v_p, v_sentence)
        h_a0 = self.process_through_shared(v_a0, v_sentence)
        h_a1 = self.process_through_shared(v_a1, v_sentence)

        logits_p = self.feed_forward_unique['p'](h_p)
        logits_a0 = self.feed_forward_unique['a0'](h_a0)
        logits_a1 = self.feed_forward_unique['a1'](h_a1) 

        if use_softmax:
            d_p = torch.softmax(logits_p, dim=1)
            d_a0 = torch.softmax(logits_a0, dim=1)
            d_a1 = torch.softmax(logits_a1, dim=1)
            
            # TODO - Paper said we pass the output of softmax into the Gumbel-Softmax but code passes the logits

            # gz_p = self.custom_gumbel_softmax(dz_p, tau=tau, hard=False, log=True)
            # gz_a0 = self.custom_gumbel_softmax(dz_a0, tau=tau, hard=False, log=True)
            # gz_a1 = self.custom_gumbel_softmax(dz_a1, tau=tau, hard=False, log=True)

            g_p = self.custom_gumbel_softmax(logits_p, tau=tau, hard=False, log=True)
            g_a0 = self.custom_gumbel_softmax(logits_a0, tau=tau, hard=False, log=True)
            g_a1 = self.custom_gumbel_softmax(logits_a1, tau=tau, hard=False, log=True)
        else:
            d_p = torch.sigmoid(logits_p)
            d_a0 = torch.sigmoid(logits_a0)
            d_a1 = torch.sigmoid(logits_a1)

            g_p = self.custom_gumbel_sigmoid(logits_p, tau=tau, hard=False, log=True)
            g_a0 = self.custom_gumbel_sigmoid(logits_a0, tau=tau, hard=False, log=True)
            g_a1 = self.custom_gumbel_sigmoid(logits_a1, tau=tau, hard=False, log=True)

        vhat_p = torch.matmul(g_p, self.F_matrices['p'])
        vhat_a0 = torch.matmul(g_a0, self.F_matrices['a0'])
        vhat_a1 = torch.matmul(g_a1, self.F_matrices['a1'])

        return {
            "p": {"vhat": vhat_p, "d": d_p, "g": g_p, "F": self.F_matrices['p']},
            "a0": {"vhat": vhat_a0, "d": d_a0, "g": g_a0, "F": self.F_matrices['a0']},
            "a1": {"vhat": vhat_a1, "d": d_a1, "g": g_a1, "F": self.F_matrices['a1']}
        }
        
    def process_through_shared(self, v_z, v_sentence):
        # Concatenating v_z with the sentence embedding
        concatenated = torch.cat((v_z, v_sentence), dim=-1)
        
        # Applying dropout
        dropped = self.dropout1(concatenated)

        # Passing through the shared linear layer
        h_shared = self.feed_forward_shared(dropped)

        # Applying batch normalization and ReLU activation
        h_shared = self.batch_norm(h_shared)
        h_shared = self.activation(h_shared)

        # Applying dropout again
        h_shared = self.dropout2(h_shared)

        return h_shared

# Mock Data Preparation
D_h = 768
batch_size = 2
embedding_dim = 768
K = 20
tau = 0.9

# Generating mock embeddings for article, predicate, ARG0, ARG1, and their corresponding sentence embeddings
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
v_a0 = torch.randn(batch_size, embedding_dim)
v_a1 = torch.randn(batch_size, embedding_dim)

# Testing CombinedAutoencoder
autoencoder = CombinedAutoencoder(embedding_dim, D_h, K)
outputs = autoencoder(v_p, v_a0, v_a1, article_embedding, tau)

# Check shapes of the outputs
print("Output shapes:")
for key, value in outputs.items():
    print(f"{key} -> vhat: {value['vhat'].shape}, d: {value['d'].shape}, g: {value['g'].shape}, F: {value['F'].shape}")

# check if tensor have nan values
def check_nan(tensor):
    # if tensor has any nan values, return True
    if torch.isnan(tensor).any():
        return True
    else:
        return False

# Check if any of the outputs have NaN values
print("NaN values:")
for key, value in outputs.items():
    print(f"{key} -> vhat: {check_nan(value['vhat'])}, d: {check_nan(value['d'])}, g: {check_nan(value['g'])}, F: {check_nan(value['F'])}")

Output shapes:
p -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a0 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a1 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
NaN values:
p -> vhat: False, d: False, g: False, F: False
a0 -> vhat: False, d: False, g: False, F: False
a1 -> vhat: False, d: False, g: False, F: False


## 3. FRISSLoss

In [59]:
class FRISSLoss(nn.Module):
    def __init__(self, lambda_orthogonality, M, t):
        super(FRISSLoss, self).__init__()
        
        self.lambda_orthogonality = lambda_orthogonality
        self.M = M
        self.t = t
        self.triplet_loss = nn.TripletMarginLoss(margin=M)

    def contrastive_loss(self, v, vhat, negatives):
        batch_size = vhat.size(0)
        N = negatives.size(0)
        loss = torch.zeros(batch_size, device=v.device)

        # Calculate true distance between reconstructed and real embeddings
        true_distance = self.l2(vhat, v)

        for i in range(N):  # loop over each element in "negatives"
            
            # Tranform negative from [embedding dim] to [batch size, embedding_dim] 
            negative = negatives[i, :].expand(v.size(0), -1)

            # Calculate negative distance for current negative embedding
            negative_distance = self.l2(vhat, negative)

            # Compute loss based on the provided logic: l2(vhat, v) + 1 + l2(vhat, negative) and clamp to 0 if below 0
            current_loss = 1 + true_distance - negative_distance
            loss += torch.clamp(current_loss, min=0.0)

        # Normalize the total loss by N
        return loss / N

    
    def l2(self, u, v):
        return torch.sqrt(torch.sum((u - v) ** 2, dim=1))
    
    def focal_triplet_loss_WRONG(self, v, vhat_z, g, F):
        losses = []
        for i in range(F.size(0)):  # Iterate over each negative example
            # For each negative, compute the loss against the anchor and positive
            loss = self.triplet_loss(vhat_z, v, F[i].unsqueeze(0).expand(v.size(0), -1))
            losses.append(loss)

        loss_tensor = torch.stack(losses) 
        loss = loss_tensor.mean(dim=0).mean()
        return loss
    
    def focal_triplet_loss(self, v, vhat_z, g, F):
        _, indices = torch.topk(g, self.t, largest=False, dim=1)

        F_t = torch.stack([F[indices[i]] for i in range(g.size(0))])

        g_tz = torch.stack([g[i, indices[i]] for i in range(g.size(0))])
                    
        g_t = g_tz / g_tz.sum(dim=1, keepdim=True)
        
        # if division by zero set all nan values to 0
        g_t[torch.isnan(g_t)] = 0
        
        m_t = self.M * ((1 - g_t)**2)

        # Initializing loss
        loss = torch.zeros_like(v[:, 0])
        
        # Iteratively adding to the loss for each negative embedding
        for i in range(self.t):
            current_v_t = F_t[:, i]
            current_m_t = m_t[:, i]
            
            current_loss = current_m_t + self.l2(vhat_z, v) - self.l2(vhat_z, current_v_t)
            
            loss += torch.max(torch.zeros_like(current_loss), current_loss)
             
        # Normalizing
        loss = loss / self.t
        return loss

    def orthogonality_term(self, F, reg=1e-4):
        gram_matrix = torch.mm(F, F.T)  # Compute the Gram matrix F * F^T
        identity_matrix = torch.eye(gram_matrix.size(0), device=gram_matrix.device)  # Create an identity matrix
        ortho_loss = (gram_matrix - identity_matrix).abs().sum()
        return ortho_loss


    def forward(self, p, a0, a1, p_negatives, a0_negatives, a1_negatives):
        # Extract components from dictionary for predicate p
        v_p, vhat_p, d_p, g_p, F_p = p["v"], p["vhat"], p["d"], p["g"], p["F"]
        
        # Extract components from dictionary for ARG0
        v_a0, vhat_a0, d_a0, g_a0, F_a0 = a0["v"], a0["vhat"], a0["d"], a0["g"], a0["F"]

        # Extract components from dictionary for ARG1
        v_a1, vhat_a1, d_a1, g_a1, F_a1 = a1["v"], a1["vhat"], a1["d"], a1["g"], a1["F"]
        
         # Calculate losses for predicate
        Ju_p = self.contrastive_loss(v_p, vhat_p, p_negatives)        
        Jt_p = self.focal_triplet_loss(v_p, vhat_p, g_p, F_p)        
        Jz_p = Ju_p + Jt_p + self.lambda_orthogonality * self.orthogonality_term(F_p) ** 2
        
        # Calculate losses for ARG0
        Ju_a0 = self.contrastive_loss(v_a0, vhat_a0, a0_negatives)
        Jt_a0 = self.focal_triplet_loss(v_a0, vhat_a0, g_a0, F_a0)
        Jz_a0 = Ju_a0 + Jt_a0 + self.lambda_orthogonality * self.orthogonality_term(F_a0) ** 2
        
        # Calculate losses for ARG1
        Ju_a1 = self.contrastive_loss(v_a1, vhat_a1, a1_negatives)
        Jt_a1 = self.focal_triplet_loss(v_a1, vhat_a1, g_a1, F_a1)
        Jz_a1 = Ju_a1 + Jt_a1 + self.lambda_orthogonality * self.orthogonality_term(F_a1) ** 2
        
        if torch.isnan(Jz_p).any():
            print("Jz_p has nan")
            
        if torch.isnan(Jz_a0).any():
            print("Jz_a0 has nan")
            
        if torch.isnan(Jz_a1).any():
            print("Jz_a1 has nan")
        
        # Aggregate the losses
        loss = Jz_p + Jz_a0 + Jz_a1
        
        return loss


# Mock Data Preparation
batch_size = 2
embedding_dim = 768
K = 15  # Number of frames/descriptors

# Generating mock embeddings for article, predicate, ARG0, ARG1 and their reconstructions
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
vhat_p = torch.randn(batch_size, embedding_dim)

v_a0 = torch.randn(batch_size, embedding_dim)
vhat_a0 = torch.randn(batch_size, embedding_dim)

v_a1 = torch.randn(batch_size, embedding_dim)
vhat_a1 = torch.randn(batch_size, embedding_dim)

# Generating mock descriptor weights and descriptor matrices for predicate, ARG0, ARG1
d_p = torch.randn(batch_size, K)
d_a0 = torch.randn(batch_size, K)
d_a1 = torch.randn(batch_size, K)

F_p = torch.randn(K, embedding_dim)
F_a0 = torch.randn(K, embedding_dim)
F_a1 = torch.randn(K, embedding_dim)

g_p = torch.randn(batch_size, K)
g_a0 = torch.randn(batch_size, K)
g_a1 = torch.randn(batch_size, K)

# Generating some negative samples (let's assume 5 negative samples per batch entry)
num_negatives = 8
negatives_p = torch.randn(num_negatives, embedding_dim)
negatives_a0 = torch.randn(num_negatives, embedding_dim)
negatives_a1 = torch.randn(num_negatives, embedding_dim)

# Initialize loss function
lambda_orthogonality = 1e-3

t = 8  # Number of descriptors with smallest weights for negative samples
M = t

loss_fn = FRISSLoss(lambda_orthogonality, M, t)

# Organizing inputs into dictionaries
p = {"v": v_p, "vhat": vhat_p, "d": d_p, "g": g_p, "F": F_p}
a0 = {"v": v_a0, "vhat": vhat_a0, "d": d_a0, "g": g_a0, "F": F_a0}
a1 = {"v": v_a1, "vhat": vhat_a1, "d": d_a1, "g": g_a1, "F": F_a1}

loss_fn = FRISSLoss(lambda_orthogonality, M, t)
loss = loss_fn(p, a0, a1, negatives_p, negatives_a0, negatives_a1)
print("FRiSSLoss output:", loss)

FRiSSLoss output: tensor([762557.6250, 762580.1875])


## 4. FRISSUnsupervised

The `FRISSUnsupervised` layer integrates multiple autoencoders and the previously described `FRISSLoss` layer to achieve an unsupervised learning process over the predicates and their arguments.

### Forward Method:

**Inputs**:
1. **v_p**: Embedding of the predicate with size: [batch_size, D_w].
2. **v_a0**: Embedding of the ARG0 (first argument) with size: [batch_size, D_w].
3. **v_a1**: Embedding of the ARG1 (second argument) with size: [batch_size, D_w].
4. **v_article**: Embedding of the article with size: [batch_size, D_w].
5. **negatives**: Tensor containing negative samples with size: [batch_size, num_negatives, D_w].
6. **tau**: A scalar parameter for the Gumbel softmax in the autoencoder.

**Outputs**:
- A dictionary `results` containing:
    - **loss**: A tensor representing the combined unsupervised loss over the batch with size: [batch_size].
    - **p**: Dictionary containing components for the predicate, including reconstructed embedding (`vhat`), descriptor weights (`d`), Gumbel softmax result (`g`), and the descriptor matrix (`F`).
    - **a0**: Same as `p` but for ARG0.
    - **a1**: Same as `p` but for ARG1.

In [60]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FRISSUnsupervised(nn.Module):
    def __init__(self, D_w, D_h, K, num_frames, lambda_orthogonality, M, t, dropout_prob=0.3):
        super(FRISSUnsupervised, self).__init__()
        
        self.loss_fn = FRISSLoss(lambda_orthogonality, M, t)      
        
        # Using the CombinedAutoencoder instead of individual Autoencoders
        self.combined_autoencoder = CombinedAutoencoder(D_w, D_h, K, dropout_prob=dropout_prob)

    def forward(self, v_p, v_a0, v_a1, v_sentence, p_negatives, a0_negatives, a1_negatives, tau):
        outputs = self.combined_autoencoder(v_p, v_a0, v_a1, v_sentence, tau)
        
        outputs_p = outputs["p"]
        outputs_p["v"] = v_p
        
        outputs_a0 = outputs["a0"]
        outputs_a0["v"] = v_a0
        
        outputs_a1 = outputs["a1"]
        outputs_a1["v"] = v_a1
        
        loss = self.loss_fn(
            outputs_p,
            outputs_a0, 
            outputs_a1, 
            p_negatives, a0_negatives, a1_negatives
        )

        results = {
            "loss": loss,
            "p": outputs["p"],
            "a0": outputs["a0"],
            "a1": outputs["a1"]
        }
        
        return results

# Mock Data Preparation
D_h = 768
batch_size = 2
embedding_dim = 768
K = 20
num_frames = 15
tau = 0.9
lambda_orthogonality = 0.1  # Placeholder value, please replace with your actual value
M = 7  # Placeholder value, please replace with your actual value
t = 7  # Placeholder value, please replace with your actual value

# Generating mock embeddings for article, predicate, ARG0, ARG1, and their corresponding sentence embeddings
article_embedding = torch.randn(batch_size, embedding_dim)
v_p = torch.randn(batch_size, embedding_dim)
v_a0 = torch.randn(batch_size, embedding_dim)
v_a1 = torch.randn(batch_size, embedding_dim)

# Generating some negative samples (let's assume 5 negative samples per batch entry)
num_negatives = 10
negatives_p = torch.randn(num_negatives, embedding_dim)
negatives_a0 = torch.randn(num_negatives, embedding_dim)
negatives_a1 = torch.randn(num_negatives, embedding_dim)

# Testing FRISSUnsupervised
unsupervised_module = FRISSUnsupervised(embedding_dim, D_h, K, num_frames, lambda_orthogonality, M, t)
results = unsupervised_module(v_p, v_a0, v_a1, article_embedding, negatives_p, negatives_a0, negatives_a1, tau)

# Print the results' shapes for verification
print("Results' Shapes:")
for key, value in results.items():
    if key == "loss":
        print(f"{key}: {value}")
    else:
        print(f"{key} -> vhat: {value['vhat'].shape}, d: {value['d'].shape}, g: {value['g'].shape}, F: {value['F'].shape}")


Results' Shapes:
loss: tensor([3156.4802, 3159.9739], grad_fn=<AddBackward0>)
p -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a0 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])
a1 -> vhat: torch.Size([2, 768]), d: torch.Size([2, 20]), g: torch.Size([2, 20]), F: torch.Size([20, 768])


## 5. FRISSSupervised

The layer takes the embeddings from the args and the sentence and predicts frames. 

The embeddings for the args are averaged for each arg individually and then averaged on args level. The final embedding is feed into a linear layer and passed through a sigmoid function. 

The sentence embedding is feed into a linear layer and then into a relu function. After again in a linear function and then averaged. The average embeddung is again feed into a linear layer and lastly in a signoid function. 

It returns a span and sentence based prediction of shape [batch_size, num_frames].

In [61]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FRISSSupervised(nn.Module):
    def __init__(self, D_w, K, num_frames, dropout_prob=0.3):
        super(FRISSSupervised, self).__init__()

        self.D_w = D_w
                
        # self.softmax = nn.Softmax(dim=1)

        self.feed_forward_sentence1 = nn.Linear(D_w, D_w)
        self.feed_forward_sentence2 = nn.Linear(D_w, num_frames)

        self.relu = nn.ReLU()

        # Adding two dropout layers
        self.dropout1 = nn.Dropout(dropout_prob)
        self.dropout2 = nn.Dropout(dropout_prob)
        
    def forward(self, d_p, d_a0, d_a1, vs):
        # Span-based Classification   

        # Aggregate the SRL descriptors to have one descriptor per sentence
        d_p = d_p.mean(dim=1)
        d_a0 = d_a0.mean(dim=1)
        d_a1 = d_a1.mean(dim=1)

        # Take the mean over descriptors
        w_u = (d_p + d_a0 + d_a1) / 3

        # Sentence-based Classification

        # Apply the first dropout to vs
        vs = self.dropout1(vs)

        ws = self.relu(self.feed_forward_sentence1(vs))

        # Mean over sentences and apply the second dropout
        ws = self.dropout2(ws.mean(dim=1))

        # Pass through the second feed forward network
        ws = self.feed_forward_sentence2(ws)

        # The softmax layer is commented out as it is not used with CrossEntropyLoss
        # ys_hat = self.softmax(ws)

        # combined pred = sum of span-based and sentence-based predictions
        combined = w_u + ws

        return w_u, ws, combined


# Mock Data Preparation

batch_size = 2
embedding_dim = 768
num_frames = 15  # Assuming the number of frames is equal to K for simplicity
num_sentences = 32
K = 15
num_args = 9

# Generating mock dsz representations for predicate, ARG0, ARG1
d_p = torch.randn(batch_size, num_sentences, K)
d_a0 = torch.randn(batch_size, num_sentences, K)
d_a1 = torch.randn(batch_size, num_sentences, K) 

# Adjusting the num_heads parameter
srl_heads = 4
sentence_heads = 8

# Adjust the mock sentence embeddings shape
vs = torch.randn(batch_size, num_sentences, embedding_dim)

# Initialize and test the supervised module
supervised_module = FRISSSupervised(embedding_dim, K, num_frames)

# Forward pass the mock data
yu_hat, ys_hat, combined_pred = supervised_module(d_p, d_a0, d_a1, vs)
yu_hat.shape, ys_hat.shape, combined_pred.shape

(torch.Size([2, 15]), torch.Size([2, 15]), torch.Size([2, 15]))

## 6. FRISS

In [62]:
import torch.nn as nn

class FRISS(nn.Module):
    def __init__(self, embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=0.3, bert_model_name="bert-base-uncased"):
        super(FRISS, self).__init__()
        
        # Aggregation layer replaced with SRL_Embeddings
        self.aggregation = SRL_Embeddings(bert_model_name)
        
        # Unsupervised training module
        self.unsupervised = FRISSUnsupervised(embedding_dim, D_h, K, num_frames, lambda_orthogonality, M, t, dropout_prob=dropout_prob)
        
        # Supervised training module
        self.supervised = FRISSSupervised(embedding_dim, K, num_frames, dropout_prob=dropout_prob)
        
    def negative_sampling(self, embeddings, num_negatives=8):
        batch_size, num_sentences, embedding_dim = embeddings.size()
        all_negatives = []

        for i in range(batch_size):
            # Flatten the arguments dimension to sample across all arguments in the sentence
            flattened_embeddings = embeddings[i].view(-1, embedding_dim)
            
            # Get indices of non-padded embeddings (assuming padding is represented by all-zero vectors)
            non_padded_indices = torch.where(torch.any(flattened_embeddings != 0, dim=1))[0]

            # Randomly sample negative indices from non-padded embeddings
            if len(non_padded_indices) > 0:
                negative_indices = non_padded_indices[torch.randint(0, len(non_padded_indices), (num_negatives,))]
            else:
                # If no non-padded embeddings, use zeros
                negative_indices = torch.zeros(num_negatives, dtype=torch.long)

            negative_samples = flattened_embeddings[negative_indices, :]
            all_negatives.append(negative_samples)

        # Concatenate all negative samples into a single tensor
        all_negatives = torch.cat(all_negatives, dim=0)

        # If more samples than required, randomly select 'num_negatives' samples
        if all_negatives.size(0) > num_negatives:
            indices = torch.randperm(all_negatives.size(0))[:num_negatives]
            all_negatives = all_negatives[indices]

        return all_negatives
    
    def forward(self, sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau):
        # Convert input IDs to embeddings
        sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = self.aggregation(sentence_ids, predicate_ids, arg0_ids, arg1_ids)
        
        # Handle multiple spans by averaging predictions
        unsupervised_losses = torch.zeros((sentence_embeddings.size(0),), device=sentence_embeddings.device)
        
        # Creating storage for aggregated d tensors
        d_p_list, d_a0_list, d_a1_list = [], [], []
        
        # Process each span
        for span_idx in range(sentence_embeddings.size(1)):
            s_sentence_span = sentence_embeddings[:, span_idx, :]
            v_p_span = predicate_embeddings[:, span_idx, :]
            v_a0_span = arg0_embeddings[:, span_idx, :]
            v_a1_span = arg1_embeddings[:, span_idx, :]
            
            negatives_p = self.negative_sampling(predicate_embeddings)
            negatives_a0 = self.negative_sampling(arg0_embeddings)
            negatives_a1 = self.negative_sampling(arg1_embeddings)
 
            # Feed the embeddings to the unsupervised module
            unsupervised_results = self.unsupervised(v_p_span, v_a0_span, v_a1_span, s_sentence_span, negatives_p, negatives_a0, negatives_a1, tau)                
            unsupervised_losses += unsupervised_results["loss"]
            
            if torch.isnan(unsupervised_results["loss"]).any():
                print("loss is nan")
            
            # Use the vhat (reconstructed embeddings) for supervised predictions
            d_p_list.append(unsupervised_results['p']['d'])
            d_a0_list.append(unsupervised_results['a0']['d'])
            d_a1_list.append(unsupervised_results['a1']['d'])        
        
        # Aggregating across all spans
        d_p_aggregated = torch.stack(d_p_list, dim=1)
        d_a0_aggregated = torch.stack(d_a0_list, dim=1)
        d_a1_aggregated = torch.stack(d_a1_list, dim=1)
        
        span_pred, sentence_pred, combined_pred = self.supervised(d_p_aggregated, d_a0_aggregated, d_a1_aggregated, sentence_embeddings)
        
        if torch.isnan(span_pred).any():
            print("span_pred has nan:", span_pred)
        
        if torch.isnan(sentence_pred).any():
            print("sentence_pred has nan:", sentence_pred)
        
        # Identify valid (non-nan) losses
        valid_losses = ~torch.isnan(unsupervised_losses)

        # Sum only the valid losses
        #unsupervised_loss = unsupervised_losses[valid_losses].sum()
        
        # Take average by summing the valid losses and dividing by num sentences so that padded sentences are also taken in equation
        unsupervised_loss = unsupervised_losses[valid_losses].sum() / sentence_embeddings.shape[1]
        
        return unsupervised_loss, span_pred, sentence_pred, combined_pred


# Set the necessary parameters
batch_size = 2
embedding_dim = 768
K = 14  # Number of frames/descriptors
num_frames = 14  # Assuming the number of frames is equal to K for simplicity
D_h = 512  # Dimension of the hidden representation
lambda_orthogonality = 0.1
M = 8
t = 8
tau = 1.0

# Define some mock token IDs data parameters
max_sentences_per_article = 8
max_sentence_length = 10
num_sentences = max_sentences_per_article

# Generating mock token IDs for predicate, ARG0, ARG1, and their corresponding sentences
# We assume a vocab size of 30522 (standard BERT vocab size) for simplicity.
vocab_size = 30522

sentence_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_sentence_length))
predicate_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_sentence_length))
arg0_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_sentence_length))
arg1_ids = torch.randint(0, vocab_size, (batch_size, max_sentences_per_article, max_sentence_length))

sentence_embeddings = torch.randn(batch_size, max_sentences_per_article, embedding_dim)
predicate_embeddings = torch.randn(batch_size, max_sentences_per_article, embedding_dim)
arg0_embeddings = torch.randn(batch_size, max_sentences_per_article, embedding_dim)
arg1_embeddings = torch.randn(batch_size, max_sentences_per_article, embedding_dim)

srl_heads = 7
sentence_heads = 8

# Initialize the FRISS model
friss_model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K=K, num_frames=num_frames)

# Forward pass the mock data
unsupervised_loss, span_pred, sentence_pred, combined_pred = friss_model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 1)
unsupervised_loss, span_pred.shape, sentence_pred.shape, combined_pred.shape

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


(tensor(2252.9429, grad_fn=<DivBackward0>),
 torch.Size([2, 14]),
 torch.Size([2, 14]),
 torch.Size([2, 14]))

# Train Model

In [68]:
import os
import numpy as np
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR
from sklearn.metrics import f1_score, accuracy_score
import json
import csv
from tqdm import tqdm
import datetime
import math

def train(model, train_dataloader, test_dataloader, optimizer, loss_function, alpha=0.5, num_epochs=10, tau_min=1, tau_decay=0.95, device='cuda', save_path='../notebooks/'):
    # Create a unique directory for this training session
    timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
    save_dir = os.path.join(save_path, f'custom_training_session_{timestamp}')
    os.makedirs(save_dir, exist_ok=True)
    
    print(f"Create save directory: {save_dir}")

    # Save model settings
    settings_path = os.path.join(save_dir, 'model_settings.json')
    with open(settings_path, 'w') as f:
        json.dump({
            'alpha': alpha,
            'num_epochs': num_epochs,
            'tau_min': tau_min,
            'tau_decay': tau_decay,
        }, f, indent=4)
    
    tau = 1
    scheduler = StepLR(optimizer, step_size=2, gamma=0.1)
    global_steps = 0

    metrics = {
        'epoch': [],
        'f1_span_micro': [],
        'f1_sentence_micro': [],
        'f1_combined_micro': [],
        'f1_span_macro': [],
        'f1_sentence_macro': [],
        'f1_combined_macro': [],
        'tau': [],
        'lr': []
    }



    for epoch in tqdm(range(num_epochs), desc="Epochs"):
        model.train()

        # init loss
        total_loss = 0
        supervised_total_loss = 0
        unsupervised_total_loss = 0

        local_steps = 0
        
        batch_progress = tqdm(enumerate(train_dataloader), total=len(train_dataloader), desc="Batches", leave=False)
        for batch_idx, batch in batch_progress:   
            global_steps += 1
            if global_steps % 50 == 0:
                tau = max(tau_min, math.exp(-tau_decay * global_steps))

            local_steps += 1

            optimizer.zero_grad()

            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            labels = batch['labels'].to(device)

            unsupervised_loss, span_logits, sentence_logits, _ = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau)
                    
            span_loss = 0.0
            sentence_loss = 0.0

            span_loss = loss_function(span_logits, labels.float())       
            sentence_loss = loss_function(sentence_logits, labels.float())
            
            supervised_loss = span_loss + sentence_loss
            
            combined_loss = alpha * supervised_loss + (1-alpha) * unsupervised_loss
            
            if torch.isnan(combined_loss):
                print(f"NaN loss detected at epoch {epoch+1}, batch {batch_idx+1}. Stopping...")
                return
        
            combined_loss.backward()
            
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            # After the backward pass
            if any(p.grad is not None and torch.isnan(p.grad).any() for p in model.parameters()):
                print(f"NaN gradients detected at epoch {epoch+1}, batch {batch_idx+1}. Stopping...")
                return
            
            optimizer.step()

            total_loss += combined_loss.item()
            supervised_total_loss += supervised_loss.item()
            unsupervised_total_loss += unsupervised_loss.item()

            batch_progress.set_description(f"Epoch {epoch+1} ({local_steps}) Total Loss: {combined_loss.item():.3f}, Span: {span_loss:.3f}, Sentence: {sentence_loss:.3f}, Supervised: {supervised_loss.item():.3f}, Unsupervised: {unsupervised_loss.item():.3f}")
                        
            # Explicitly delete tensors to free up memory
            del sentence_ids, predicate_ids, arg0_ids, arg1_ids, labels, unsupervised_loss
            torch.cuda.empty_cache()

        print(f"Epoch {epoch+1}/{num_epochs}, Combined Loss: {total_loss/len(train_dataloader)}, Supervised Loss: {supervised_total_loss/len(train_dataloader)}, Unsupervised Loss: {unsupervised_total_loss/len(train_dataloader)}")
        
        model.eval()
        
        span_preds = []
        sentence_preds = []
        combined_preds = []
        all_labels = []

        with torch.no_grad():
            for batch in test_dataloader:
                sentence_ids = batch['sentence_ids'].to(device)
                predicate_ids = batch['predicate_ids'].to(device)
                arg0_ids = batch['arg0_ids'].to(device)
                arg1_ids = batch['arg1_ids'].to(device)
                labels = batch['labels'].to(device)
                
                _, span_logits, sentence_logits, combined_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, tau)

                span_pred = (torch.sigmoid(span_logits) > 0.5).int()
                sentence_pred = (torch.sigmoid(sentence_logits) > 0.5).int()
                combined_pred = (torch.sigmoid(combined_logits) > 0.5).int()
                
                span_preds.append(span_pred.cpu().numpy())
                sentence_preds.append(sentence_pred.cpu().numpy())
                combined_preds.append(combined_pred.cpu().numpy())

                all_labels.append(labels.cpu().numpy())

                # Explicitly delete tensors to free up memory
                del sentence_ids, predicate_ids, arg0_ids, arg1_ids, labels, span_logits, sentence_logits, sentence_pred
                torch.cuda.empty_cache()

        all_span_preds = np.vstack(span_preds)
        all_sentence_preds = np.vstack(sentence_preds)
        all_combined_preds = np.vstack(combined_preds)
        all_labels = np.vstack(all_labels)

        # Compute metrics
        f1_span_micro = f1_score(all_labels, all_span_preds, average='micro', zero_division=0)
        f1_sentence_micro = f1_score(all_labels, all_sentence_preds, average='micro', zero_division=0)
        f1_combined_micro = f1_score(all_labels, all_combined_preds, average='micro', zero_division=0)

        f1_span_macro = f1_score(all_labels, all_span_preds, average='macro', zero_division=0)
        f1_sentence_macro = f1_score(all_labels, all_sentence_preds, average='macro', zero_division=0)
        f1_combined_macro = f1_score(all_labels, all_combined_preds, average='macro', zero_division=0)

        print(f"Validation Metrics - micro F1 - Span: {f1_span_micro:.2f}, Sentence: {f1_sentence_micro:.2f}, Combined: {f1_combined_micro:.2f}, macro F1 - Span: {f1_span_macro:.2f}, Sentence: {f1_sentence_macro:.2f}, Combined: {f1_combined_macro:.2f}")

        # Update metrics dictionary
        metrics['epoch'].append(epoch + 1)
        metrics['f1_span_micro'].append(f1_span_micro)
        metrics['f1_sentence_micro'].append(f1_sentence_micro)
        metrics['f1_combined_micro'].append(f1_combined_micro)
        metrics['f1_span_macro'].append(f1_span_macro)
        metrics['f1_sentence_macro'].append(f1_sentence_macro)
        metrics['f1_combined_macro'].append(f1_combined_macro)
        metrics['tau'].append(tau)
        metrics['lr'].append(optimizer.param_groups[0]['lr'])

        # Save metrics after each validation run
        metrics_save_path = os.path.join(save_dir, 'metrics.json')
        with open(metrics_save_path, 'w') as f:
            json.dump(metrics, f, indent=4)
        
        # Save the model every 5 epochs
        if (epoch + 1) % 5 == 0:
            model_checkpoint_path = os.path.join(save_dir, f'model_checkpoint_epoch_{epoch + 1}.pth')
            torch.save(model.state_dict(), model_checkpoint_path)
            print(f"Model checkpoint saved to {model_checkpoint_path}")

        
        scheduler.step()

    return metrics

# Dataset

In [72]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

batch_size = 8

max_sentences_per_article=32
max_sentence_length=64
max_arg_length=12

train_dataset, test_dataset, train_dataloader, test_dataloader = get_datasets_dataloaders(X, y, tokenizer, recalculate_srl=False, batch_size=batch_size, max_sentences_per_article=max_sentences_per_article, max_sentence_length=max_sentence_length, max_arg_length=max_arg_length, pickle_path="notebooks/X_srl_full.pkl")

Load SRL from Pickle
                                           Class  Train Distribution (%)  \
0                                       Morality               47.938144   
1                           Security_and_defense               42.783505   
2             Policy_prescription_and_evaluation               14.690722   
3   Legality_Constitutionality_and_jurisprudence               46.391753   
4                                       Economic                6.185567   
5                                      Political               54.381443   
6                           Crime_and_punishment               53.350515   
7             External_regulation_and_reputation               26.804124   
8                                 Public_opinion                5.670103   
9                          Fairness_and_equality               27.577320   
10                        Capacity_and_resources                6.443299   
11                               Quality_of_life               20.3

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


# Train

In [73]:
def get_friss_model(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, bert_model_name="bert-base-uncased", load=True, path="", device='cuda'):
    """
    Loads the weights into an instance of the model class from the given path.
    
    Args:
    - model_class (torch.nn.Module): The class of the model (uninitialized).
    - path (str): Path to the saved weights.
    - device (str): Device to load the model on ('cpu' or 'cuda').
    
    Returns:
    - model (torch.nn.Module): Model with weights loaded.
    """

    # Model instantiation
    model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=dropout_prob, bert_model_name=bert_model_name)
    model = model.to(device)
    
    if load:
        assert path != ""
        model.load_state_dict(torch.load(path, map_location=device))
    
    #model.eval()
    return model

In [74]:
torch.set_printoptions(profile="full")

import torch.optim as optim

# Hyperparameters
embedding_dim = 768
num_frames = 14

D_h = 768
lambda_orthogonality = 1e-3

K = 14
t = 14
M = 14
tau_min = 0.5
tau_decay = 5e-4

dropout_prob = 0.3

friss_model_path = "models/model1.pth"
bert_model_path = "bert-base-uncased"

# Model instantiation
model = get_friss_model(embedding_dim, 
                        D_h, 
                        lambda_orthogonality, 
                        M, 
                        t, 
                        num_sentences, 
                        K, 
                        num_frames, 
                        dropout_prob=dropout_prob,
                        bert_model_name=bert_model_path,
                        load=False,
                        path=friss_model_path)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# LOSS

# Compute the `weight` parameter for each label
label_frequencies = y.mean()
weights = 1 / (label_frequencies + 1e-10)  # Adding a small value to avoid division by zero

# Compute the `pos_weight` parameter
pos_weights = (1 - label_frequencies) / (label_frequencies + 1e-10)

# Convert the computed weights and pos_weights to PyTorch tensors
weights_tensor = torch.tensor(weights.values, dtype=torch.float32).to(device)
pos_weights_tensor = torch.tensor(pos_weights.values, dtype=torch.float32).to(device)

loss_function = nn.BCEWithLogitsLoss(weight=weights_tensor, pos_weight=pos_weights_tensor, reduction="mean")
optimizer = optim.AdamW(model.parameters(), lr=1e-5)

# Train the model
alpha_value = 0.5
num_epochs_value = 50

save_path = "models/"

metrics = train(model, 
                train_dataloader, 
                test_dataloader, 
                optimizer, 
                loss_function, 
                tau_min=tau_min,
                tau_decay=tau_decay,
                alpha=alpha_value, 
                num_epochs=num_epochs_value, 
                device=device, 
                save_path=save_path)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Create save directory: models/custom_training_session_20231128_151337


Epochs:   0%|          | 0/50 [00:00<?, ?it/s]

Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 1/50, Combined Loss: 293.4441833496094, Supervised Loss: 17.337329347928364, Unsupervised Loss: 569.5510393778483
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.38, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.26, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 2/50, Combined Loss: 284.18413670857746, Supervised Loss: 17.281923433144886, Unsupervised Loss: 551.0863571166992
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 3/50, Combined Loss: 278.5714111328125, Supervised Loss: 17.156801740328472, Unsupervised Loss: 539.986021677653
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.40, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 4/50, Combined Loss: 277.0980110168457, Supervised Loss: 17.108194013436634, Unsupervised Loss: 537.087828318278
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 5/50, Combined Loss: 276.274907430013, Supervised Loss: 17.214106281598408, Unsupervised Loss: 535.3357098897299
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38
Model checkpoint saved to models/custom_training_session_20231128_151337/model_checkpoint_epoch_5.pth


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 6/50, Combined Loss: 275.5844046274821, Supervised Loss: 17.19665066401164, Unsupervised Loss: 533.9721539815267
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 7/50, Combined Loss: 275.09033457438153, Supervised Loss: 17.186425030231476, Unsupervised Loss: 532.9942468007406
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 8/50, Combined Loss: 274.2503267923991, Supervised Loss: 17.185712893803913, Unsupervised Loss: 531.31494140625
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 9/50, Combined Loss: 273.3098335266113, Supervised Loss: 17.061111728350323, Unsupervised Loss: 529.5585517883301
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 10/50, Combined Loss: 273.07144991556805, Supervised Loss: 17.162190198898315, Unsupervised Loss: 528.9807084401449
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38
Model checkpoint saved to models/custom_training_session_20231128_151337/model_checkpoint_epoch_10.pth


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 11/50, Combined Loss: 272.5150311787923, Supervised Loss: 17.169181009133656, Unsupervised Loss: 527.8608779907227
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

Epoch 12/50, Combined Loss: 271.87237040201825, Supervised Loss: 17.200644274552662, Unsupervised Loss: 526.5440979003906
Validation Metrics - micro F1 - Span: 0.41, Sentence: 0.41, Combined: 0.41, macro F1 - Span: 0.38, Sentence: 0.32, Combined: 0.38


Batches:   0%|          | 0/48 [00:00<?, ?it/s]

KeyboardInterrupt: 

# Grid Search

In [None]:
from itertools import product
import torch.optim as optim
import csv

from tqdm.notebook import tqdm

# Hyperparameters
embedding_dim = 768

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def grid_search(train_dataloader, test_dataloader, search_space, num_epochs=10):
    # Store the results for each hyperparameter combination
    results = {}

    # Fixed values for K and num_frames
    K = 14
    num_frames = 14

    # Fixed values for dropout_prob and bert_model_name (adjust if necessary)
    bert_model_name = "../notebooks/models/fine-tuned-model/"

    # Initialize the file to write metrics
    with open("../notebooks/grid_search_metrics.csv", "w", newline='') as csvfile:
        fieldnames = ['combination', 'alpha', 'lr', 'D_h', 'lambda_orthogonality', 'M', 't', 'tau_min', 'tau_decay', 'dropout_prob', 'epoch', 'f1_span_micro', 'f1_span_macro', 'f1_sentence_micro', 'f1_sentence_macro', 'f1_combined_micro', 'f1_combined_macro']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # Calculate the total number of combinations
        total_combinations = 1
        for key, values in search_space.items():
            total_combinations *= len(values)

        # Loop through all combinations
        for idx, combination in enumerate(product(*search_space.values())):
            print(f"Training combination {idx + 1}/{total_combinations}: {combination}")

            # Extract hyperparameters from the current combination
            alpha, lr, tau_min, tau_decay, t, D_h, lambda_orthogonality, M, dropout_prob = combination

            # Initialize the model with current hyperparameters
            model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, bert_model_name)
            model.to(device)
        
                
            # Compute the `weight` parameter for each label
            label_frequencies = y.mean()
            weights = 1 / (label_frequencies + 1e-10)  # Adding a small value to avoid division by zero

            # Compute the `pos_weight` parameter
            pos_weights = (1 - label_frequencies) / (label_frequencies + 1e-10)

            # Convert the computed weights and pos_weights to PyTorch tensors
            weights_tensor = torch.tensor(weights.values, dtype=torch.float32).to(device)
            pos_weights_tensor = torch.tensor(pos_weights.values, dtype=torch.float32).to(device)

            loss_function = nn.BCEWithLogitsLoss(weight=weights_tensor, pos_weight=pos_weights_tensor, reduction="mean")
        
            # Define the optimizer
            optimizer = optim.AdamW(model.parameters(), lr=lr)

            # Define loss_function if needed (add this if your train function requires it)

            # Train the model with the current hyperparameters
            epoch_metrics = train(model, train_dataloader, test_dataloader, optimizer, loss_function, alpha=alpha, num_epochs=num_epochs, tau_min=tau_min, tau_decay=tau_decay, device=device, save=False)

            # Write the metrics to the CSV file
            for epoch in range(num_epochs):
                f1_span_micro = epoch_metrics['f1_span_micro'][epoch]
                f1_span_macro = epoch_metrics['f1_span_macro'][epoch]
                f1_sentence_micro = epoch_metrics['f1_sentence_micro'][epoch]
                f1_sentence_macro = epoch_metrics['f1_sentence_macro'][epoch]
                f1_combined_micro = epoch_metrics['f1_combined_micro'][epoch]
                f1_combined_macro = epoch_metrics['f1_combined_macro'][epoch]
                row = {
                    'combination': idx,
                    'alpha': alpha,
                    'lr': lr,
                    'D_h': D_h,
                    'lambda_orthogonality': lambda_orthogonality,
                    'M': M,
                    't': t,
                    'tau_min': tau_min,
                    'tau_decay': tau_decay,
                    'dropout_prob': dropout_prob,
                    'epoch': epoch + 1,
                    'f1_span_micro': f1_span_micro,
                    'f1_span_macro': f1_span_macro,
                    'f1_sentence_micro': f1_sentence_micro,
                    'f1_sentence_macro': f1_sentence_macro,
                    'f1_combined_micro': f1_combined_micro,
                    'f1_combined_macro': f1_combined_macro
                }
                writer.writerow(row)
                csvfile.flush()

    return results

search_space = {
    'alpha': [0.5, 0.2, 0.8],
    'lr': [1e-5, 2e-5, 5e-4, 1e-3],
    'tau_min': [0.5],
    'tau_decay': [5e-4],
    't': [5, 8, 10, 20],
    'D_h': [768, 768 * 2, 768 // 2, 768 * 3],
    'lambda_orthogonality': [1e-6, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2],
    'M': [5, 8, 10, 20],
    'dropout_rate': [0.1, 0.2, 0.3, 0.5]
}

# Call the grid search function
results = grid_search(train_dataloader, test_dataloader, search_space, 10)
results

# Test model

In [36]:
def load_model_from_path(path, embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, sentence_heads, srl_heads, device='cuda'):
    """
    Loads the weights into an instance of the model class from the given path.
    
    Args:
    - model_class (torch.nn.Module): The class of the model (uninitialized).
    - path (str): Path to the saved weights.
    - device (str): Device to load the model on ('cpu' or 'cuda').
    
    Returns:
    - model (torch.nn.Module): Model with weights loaded.
    """

    # Model instantiation
    model = FRISS(embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob=dropout_prob, sentence_heads=sentence_heads, srl_heads=srl_heads)
    model = model.to(device)
    
    model.load_state_dict(torch.load(path, map_location=device))
    
    #model.eval()
    return model


In [37]:
# Hyperparameters
embedding_dim = 768
num_frames = 14

D_h = 768
lambda_orthogonality = 0.000001

K = 14
t = 5
M = 10
tau_min = 0.5
tau_decay = 5e-4

dropout_prob = 0.1

sentence_heads = 8
srl_heads = 7


model = load_model_from_path('models/model1.pth', embedding_dim, D_h, lambda_orthogonality, M, t, num_sentences, K, num_frames, dropout_prob, sentence_heads, srl_heads)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [38]:
def predict(model, dataloader, y_columns, device='cuda'):
    """
    Make predictions with the given model and dataloader.
    
    Args:
    - model (torch.nn.Module): The model to make predictions with.
    - dataloader (DataLoader): DataLoader for the dataset to predict on.
    - y_columns (pandas.Index): Column names from the y dataframe which correspond to labels.
    - device (str): Device to make predictions on ('cpu' or 'cuda').
    
    Returns:
    - predicted_labels (list of lists): List containing the predicted labels for each instance.
    """
    model.eval()
    all_preds_span = []
    
    with torch.no_grad():
        for batch in dataloader:
            # Move data to device
            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            
            # Forward pass
            _, span_logits, sentence_logits, combined_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 0.6)
            combined_pred = (torch.sigmoid(combined_logits) > 0.5).float()

            all_preds_span.append(combined_pred.cpu().numpy())
                
            torch.cuda.empty_cache()

    predictions = np.vstack(all_preds_span)
    
    # Convert boolean predictions to labels
    predicted_labels = []
    for pred in predictions:
        labels = list(y_columns[pred.astype(bool)])
        predicted_labels.append(labels)
    
    return predicted_labels


In [39]:
import numpy as np

# article813452859
article = """Sadiq Khan Slammed for Pro-EU 'Message of Support' During Firework Display

The spectacular fireworks that lit up the London sky on Monday night caused a stir on social media over the display's pro-EU message, at a time when the nation is divided over its looming withdrawal from the bloc.
London Mayor Sadiq Khan faced mounting criticism after the capital's New Year's Eve fireworks display, which celebrated ties with the European Union, left a bad taste in the mouths of some Brits.
The 135-metre-high London Eye was lit up in blue while its tubs turned yellow, with the giant Ferris wheel resembling the star-studded flag of the European Union.Sadiq Khan called his fireworks display a "message of support" to EU citizens living in London.
"Our one million EU citizens are Londoners, they make a huge contribution, and no matter the outcome of Brexit — they will always be welcome", he said.
To the one million EU citizens who have made our city your home: you are Londoners, you make a huge contribution and you are welcome here.
I'm proud that tonight we will welcome in the new year with a message of support to you.
#LondonNYE #LondonIsOpen https://t.co/XctrgfXXaM — Sadiq Khan (@SadiqKhan) 31 декабря 2018 г.
However, a host of Londoners rushed to Twitter to accuse their mayor of "politicising" the celebrations — with some are even calling for his resignation.
I cannot believe this event has been politicised.
This man has no shame.
Just resign.
— wayne campbell (@campbs177) 31 декабря 2018 г.
Thanks a lot Sadiq Khan you ruined the fireworks display by talking about Europe, need I remind you about Brexit.
You have started of the new year by talking about relationships with the European Union.
Well done.
We need Boris Johnson back.
— Mitchell T Cannon (@MitchellTCanno1) 1 января 2019 г.
Another shameless attempt at using party politics on what is supposed to be a happy occasion — droneguy (@shelbyguitars) 1 января 2019 г.
Politicising another innocent event that should be no different to anyone no matter who they are or where they are from!
Shameful!
!
— Mike Dyer (@Miked2372Mike) 31 декабря 2018 г.
Someone was stabbed down the road from me last night.
How about sorting that stuff out instead of politicizing something that should be fun for everyone?How many times does it have to be said.
Commenting on Brexit isn't your job.
— Peter Rockett (@rockettp) 31 декабря 2018 г.
The UK voted to leave the EU in June 2016 via a nationwide referendum, with 51.9 per cent voting in favour of pulling out of the bloc, while 48.1 per cent wanted to remain.
The withdrawal is scheduled for the end of March; the Article 50 deadline.
The Remain sentiment dominated London, with nearly 60 percent of voters wanting Britain to stay in the European Union.
Sadiq Khan, an outspoken Remainer himself, earlier called for a second referendum on Brexit.
"The government's abject failure — and the huge risk we face of a bad deal or a 'no deal' Brexit — means that giving people a fresh say is now the right — and only — approach left for our country," he said in September.
"""

test_article = get_article_dataloader(article, tokenizer)
predict(model, test_article, y.columns)

Output()

Downloading:   0%|          | 0.00/455k [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Spacy models 'en_core_web_sm' not found.  Downloading and installing.


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting en-core-web-sm==3.3.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 75.2 MB/s eta 0:00:00
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.3.0




[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
Processed article 1/1


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


[['Morality',
  'Security_and_defense',
  'Policy_prescription_and_evaluation',
  'Legality_Constitutionality_and_jurisprudence',
  'Political',
  'External_regulation_and_reputation']]

# Run test for validation

In [40]:
test_dataloader = get_test_dataloader(df_test["content"], tokenizer, batch_size=1)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Processed article 1/53
Processed article 2/53
Processed article 3/53
Processed article 4/53
Processed article 5/53
Processed article 6/53
Processed article 7/53
Processed article 8/53
Processed article 9/53
Processed article 10/53
Processed article 11/53
Processed article 12/53
Processed article 13/53
Processed article 14/53
Processed article 15/53
Processed article 16/53
Processed article 17/53
Processed article 18/53
Processed article 19/53
Processed article 20/53
Processed article 21/53
Processed article 22/53
Processed article 23/53
Processed article 24/53
Processed article 25/53
Processed article 26/53
Processed article 27/53
Processed article 28/53
Processed article 29/53
Processed article 30/53
Processed article 31/53
Processed article 32/53
Processed article 33/53
Processed article 34/53
Processed article 35/53
Processed article 36/53
Processed article 37/53
Processed article 38/53
Processed article 39/53
Processed article 40/53
Processed article 41/53
Processed article 42/53
P

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [41]:
predictions = predict(model, test_dataloader, y.columns)

In [42]:
df_preds = pd.DataFrame(predictions)

In [43]:
df_preds = pd.concat([df_test, df_preds], axis=1)

In [44]:
df_preds["pred_frames"] = df_preds.apply(lambda l: list([l[0], l[1], l[2], l[3], l[4], l[5]]), axis=1)

df_preds["pred_frames"] = df_preds["pred_frames"].apply(lambda l: ",".join([ f for f in l if f is not None]))

In [45]:
df_preds.to_csv("../notebooks/test.csv", sep="\t", index=False, columns=["article_id", "pred_frames"])

# Inspect dict

In [88]:
def inspect(model, dataloader, y_columns, device='cuda'):
    """
    Make predictions with the given model and dataloader.
    
    Args:
    - model (torch.nn.Module): The model to make predictions with.
    - dataloader (DataLoader): DataLoader for the dataset to predict on.
    - y_columns (pandas.Index): Column names from the y dataframe which correspond to labels.
    - device (str): Device to make predictions on ('cpu' or 'cuda').
    
    Returns:
    - predicted_labels (list of lists): List containing the predicted labels for each instance.
    """
    model.eval()
    
    all_preds_span = []
    
    # Initialize usage lists for each label
    num_labels = len(y_columns)
    all_used_labels_p = []
    all_used_labels_a0 = []
    all_used_labels_a1 = []
    
    with torch.no_grad():
        # use tqdm
        for batch in tqdm(dataloader, total=len(dataloader), desc="Batches", leave=False):
            used_labels_p = []
            used_labels_a0 = []
            used_labels_a1 = []
    
            # Move data to device
            sentence_ids = batch['sentence_ids'].to(device)
            predicate_ids = batch['predicate_ids'].to(device)
            arg0_ids = batch['arg0_ids'].to(device)
            arg1_ids = batch['arg1_ids'].to(device)
            
            sentence_embeddings, predicate_embeddings, arg0_embeddings, arg1_embeddings = model.aggregation(sentence_ids, predicate_ids, arg0_ids, arg1_ids)
            
            # Process each span
            for span_idx in range(sentence_embeddings.size(1)):
                s_sentence_span = sentence_embeddings[:, span_idx, :]
                v_p_span = predicate_embeddings[:, span_idx, :]
                v_a0_span = arg0_embeddings[:, span_idx, :]
                v_a1_span = arg1_embeddings[:, span_idx, :]
            
                #unsupervised.combined_autoencoder v_p, v_a0, v_a1, v_sentence, tau
                output = model.unsupervised.combined_autoencoder(v_p_span, v_a0_span, v_a1_span, s_sentence_span, 0.6)
                
                #print(output["p"]["g"].cpu().numpy())
                used_labels_p.append(output["p"]["g"].cpu().numpy())
                used_labels_a0.append(output["a0"]["g"].cpu().numpy())
                used_labels_a1.append(output["a1"]["g"].cpu().numpy())

            
            # Forward pass
            _, span_logits, sentence_logits, combined_logits = model(sentence_ids, predicate_ids, arg0_ids, arg1_ids, 0.6)
            combined_pred = (torch.sigmoid(combined_logits) > 0.5).float()

            all_preds_span.append(combined_pred.cpu().numpy())
                
            torch.cuda.empty_cache()
            
            all_used_labels_p.append(used_labels_p)
            all_used_labels_a0.append(used_labels_a0)
            all_used_labels_a1.append(used_labels_a1)

    predictions = np.vstack(all_preds_span)
    
    # Convert boolean predictions to labels
    predicted_labels = []
    for pred in predictions:
        labels = list(y_columns[pred.astype(bool)])
        predicted_labels.append(labels)
    
    return predicted_labels, all_used_labels_p, all_used_labels_a0, all_used_labels_a1

In [89]:
num_sentences = 32
batch_size = 1

train_dataset, test_dataset, train_dataloader, test_dataloader = get_datasets_dataloaders(X, y, tokenizer, recalculate_srl=False, batch_size=batch_size, max_sentences_per_article=num_sentences, max_sentence_length=64, max_arg_length=12, pickle_path="notebooks/X_srl_full.pkl")

Load SRL from Pickle
                                           Class  Train Distribution (%)  \
0                                       Morality               47.938144   
1                           Security_and_defense               42.783505   
2             Policy_prescription_and_evaluation               14.690722   
3   Legality_Constitutionality_and_jurisprudence               46.391753   
4                                       Economic                6.185567   
5                                      Political               54.381443   
6                           Crime_and_punishment               53.350515   
7             External_regulation_and_reputation               26.804124   
8                                 Public_opinion                5.670103   
9                          Fairness_and_equality               27.577320   
10                        Capacity_and_resources                6.443299   
11                               Quality_of_life               20.3

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [90]:
import numpy as np

loader = train_dataloader

predicted_labels, used_labels_p, used_labels_a0, used_labels_a1 = inspect(model, loader, y.columns)

Batches:   0%|          | 0/388 [00:00<?, ?it/s]

In [100]:
len(predicted_labels), len(used_labels_p), len(used_labels_a0), len(used_labels_a1)

(388, 388, 388, 388)

In [101]:
categories = list(y.columns)

category_lists_p = {category: {} for category in categories}
category_lists_a1 = {category: {} for category in categories}
category_lists_a0 = {category: {} for category in categories}

for batch_idx in range(len(loader.dataset)):
    # Iterate over each sentence
    ds = loader.dataset[batch_idx]

    for sentence_idx in range(len(used_labels_p[batch_idx])):

        # Update the lists for each category
        for cat_idx, category in enumerate(categories):
            
            # append the srls to the category lists using {probability: srl}

            category_lists_p[category][used_labels_p[batch_idx][cat_idx][0][cat_idx]] = ds["predicate_ids"][sentence_idx].numpy()
            category_lists_a0[category][used_labels_a0[batch_idx][cat_idx][0][cat_idx]] = ds["arg0_ids"][sentence_idx].numpy()
            category_lists_a1[category][used_labels_a1[batch_idx][cat_idx][0][cat_idx]] = ds["arg1_ids"][sentence_idx].numpy()

In [102]:
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

from nltk.tokenize import word_tokenize

def decode_tokens(token_dict, stop_words):
    decoded_data = {}
    for category, token_tuple_lists in token_dict.items():
        decoded_data[category] = []

        # take top 10 srls for each category sorted by float key (probability)
        sorted_srls = {k: v for k, v in sorted(token_tuple_lists.items(), reverse=True)}

        top_srls = list(sorted_srls.values())[:20]

        for tokens in top_srls:
            # ensure that the tokens are not all 0 and not only 101 or 102
            if not np.all(np.isin(tokens, [0, 101, 102])):
                # Decode the tokens
                decoded_text = tokenizer.decode(tokens, skip_special_tokens=True).strip()

                # Tokenize and remove stop words
                words = word_tokenize(decoded_text)
                filtered_words = [word for word in words if word.lower() not in stop_words]

                # ensure that there are words left after removing stop words
                if len(filtered_words) > 0:
                    # Join the words back into a string
                    decoded_data[category].append(' '.join(filtered_words))

    return decoded_data

stop_words = set(stopwords.words('english'))  # Assuming your text is in English

# Decode the token IDs for each ARG
decoded_predicate = decode_tokens(category_lists_p, stop_words)
decoded_arg0 = decode_tokens(category_lists_a0, stop_words)
decoded_arg1 = decode_tokens(category_lists_a1, stop_words)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [103]:
# Initialize a list to collect DataFrame rows
rows = []

# Populate the list with rows
for frame in set(decoded_predicate) | set(decoded_arg0) | set(decoded_arg1):
    # Get the lists, joining multiple words with a comma
    pred_words = ', '.join([ s.strip() for s in list(set(decoded_predicate.get(frame, []))) if s is not None or s != ""])
    arg0_words = ', '.join(list(set(decoded_arg0.get(frame, []))))
    arg1_words = ', '.join(list(set(decoded_arg1.get(frame, []))))

    # Create a dictionary for the row
    row = {
        "Frame": frame,
        "Predicate": pred_words,
        "ARG0": arg0_words,
        "ARG1": arg1_words
    }
    
    # Append the row dictionary to the rows list
    rows.append(row)

# Convert the list of rows to a DataFrame
df_full_table = pd.DataFrame(rows)

# Display the DataFrame
df_full_table.style.hide_index()

  df_full_table.style.hide_index()


Frame,Predicate,ARG0,ARG1
Fairness_and_equality,"said, wants, weasel, get, known, launched, born","gop investigators, situation, iran","police officers shortly shooting began ,, changes, `` find `` count votes remember, voice, relationship, fugitive, many potential nominees conservative record abortion ,, see u. s. power deployed alongside, innocence"
External_regulation_and_reputation,"wants, wore, shot, called, extradited","david horowitz freedom center, vatican secret police, hillary clinton project vital voices ’ , another women, leaders washington","synodal process, replace sessions attorney general someone, suspension press credentials stemmed “, attending saviour ’, dim view proposed korean nuclear accords, 100 russian individuals companies, every single piece information dr. christine b, election"
Political,"complain, go, try, decided, fired, keeping","desire give children something better, gorka ’, accuser, gates, rep. bob goodlatte, one source close white house, ruiz","class spent week half christianity, visits manafort, voice, see britain , cradle western democracy, trump falls , bring"
Crime_and_punishment,"trying, says, share, speak","oswald, powell, francis, us","see u. s. power deployed alongside, mastercard, information time working trump, kind corruption, spirit aggiornamento, examining military sites knows exists, campbell 's work, men women cover, election"
Legality_Constitutionality_and_jurisprudence,"activate, consider, spread, touts, known, taking","mcgill ’ bds campaigners, cia, bypassing class, people, trump, desire give children something better, leaders washington","story true, single witness could put kavanaugh, publicly available cure affidavits, way spin, every single piece information dr. christine b, many parents, classified e - mails"
Quality_of_life,"refused, declined, picking, dispensed, decided","ms. ford “, francis, political parties — anyone else —","like whoever sent cure affidavits, alleged assault, article, effect, outrage behavior gangs"
Security_and_defense,"lays, says, learn, refused, activate, tweeted, see, requiring, posted","cia, oswald","fence - sitting senators, changes, whatever criticism, single witness could put kavanaugh, thrown window dealing, surreality day, right friends, issuing uniforms decorated skulls lightning bolts, testify, initial assertions kavanaugh"
Morality,"said, tweeted, committed, gave, think, dispensed, see","claiming happened jews alone, us, ford “, bradley manning, gates, judges, trump","fence - sitting senators, monitoring page, slander man 's character something, whine new york times, vest qanon slogan, initial assertions kavanaugh, convert $ 5 . 7 billion omai, penalty, outrage behavior gangs"
Capacity_and_resources,"said, born, banned, listening, activate, spread, published, left","wikileaks, white house press secretary sarah sanders, us, political correctness, organizations","police officers shortly shooting began ,, nothing bureaucratic responses : shared “ formation, issuing uniforms decorated skulls lightning bolts, new air strikes gun positions syria, every single piece information dr. christine b, outrage behavior gangs, men women cover"
Policy_prescription_and_evaluation,"said, make, wore, reading, posted","religious houses, new yorker, organizer weekend ’ london yellow vest","life catholics, plan, account inventory , including one server, exemplary measure taken cardinal, like whoever sent cure affidavits"
