# SemEval 2025 Task 10
### Subtask 2: Narrative Classification

Given a news article and a [two-level taxonomy of narrative labels](https://propaganda.math.unipd.it/semeval2025task10/NARRATIVE-TAXONOMIES.pdf) (where each narrative is subdivided into subnarratives) from a particular domain, assign to the article all the appropriate subnarrative labels. This is a multi-label multi-class document classification task.
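
To make the multi-label format concrete, the sketch below encodes the labels of one article as a multi-hot vector. It is only an illustration: the shortened label list and the helper `multi_hot` are hypothetical and stand in for the label encoders loaded later in the notebook.

```python
# Hypothetical, shortened label list; the real taxonomy is much larger.
labels = [
    "URW: Discrediting Ukraine",
    "URW: Praise of Russia",
    "Other",
]

def multi_hot(active, label_list):
    """Return a 0/1 vector with a 1 for every label present in `active`."""
    active = set(active)
    return [1 if label in active else 0 for label in label_list]

# One article can carry several labels at once (multi-label).
article_labels = ["URW: Praise of Russia", "URW: Discrediting Ukraine"]
encoded = multi_hot(article_labels, labels)
print(encoded)  # [1, 1, 0]
```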

In [1]:
random_state=None

In [2]:
import torch
import numpy as np
import random

if random_state is not None:
    print('[WARNING] Setting random state')
    torch.manual_seed(random_state)
    np.random.seed(random_state) 
    random.seed(random_state)

## Continual Learning

Until now, we trained on all multilingual data (Russian, Bulgarian, Portuguese, Hindi, and English) at once, mixing the languages together, and then evaluated specifically on the English validation data. However, since the final evaluation targets a single language, we can instead train on the languages sequentially (Russian -> Bulgarian -> Portuguese -> Hindi -> English), finishing with the target language, in the hope of better results.

This is also similar to how humans might learn languages: we start with one, then move on to another while retaining knowledge of the earlier ones.

Sequential training might help the model pick up useful patterns from each language that later benefit classification in a specific target language. For example:
* Russian articles might help the model learn certain propaganda patterns.
* Bulgarian articles might contribute different narrative structures.
* Each language adds its own perspective to the model's understanding, and the model acquires this knowledge sequentially.
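
The sequential schedule can be sketched as follows. This is a minimal illustration, assuming the data is a list of `(language, item)` records; the helper `split_by_language` and the toy records are hypothetical, and the actual training code appears later in the notebook.

```python
from collections import defaultdict

def split_by_language(records, language_order):
    """Group records by language and return the groups in training order."""
    groups = defaultdict(list)
    for language, item in records:
        groups[language].append(item)
    # Languages missing from the data are skipped rather than raising.
    return [(lang, groups[lang]) for lang in language_order if lang in groups]

# Toy records (hypothetical); the target language comes last in the order.
records = [("EN", "a1"), ("RU", "a2"), ("BG", "a3"), ("RU", "a4")]
stages = split_by_language(records, ["RU", "BG", "PT", "HI", "EN"])

for language, items in stages:
    # One training stage per language; the same model would be updated in turn.
    print(language, items)
```

Keeping the order as an explicit list makes it easy to experiment with different sequences while always placing the target language last.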

In [3]:
import pickle
import os
import pandas as pd

root_dir = "../../"
base_save_folder_dir = '../saved/'
dataset_folder = os.path.join(base_save_folder_dir, 'Dataset')

with open(os.path.join(dataset_folder, 'dataset_train_cleaned.pkl'), 'rb') as f:
    dataset_train = pickle.load(f)

In [4]:
dataset_train.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives,narratives_encoded,subnarratives_encoded,aggregated_subnarratives
0,RU,RU-URW-1161.txt,<PARA>в ближайшие два месяца сша будут стремит...,[URW: Blaming the war on others rather than th...,"[The West are the aggressors, Other, The West ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,..."
1,RU,RU-URW-1175.txt,<PARA>в ес испугались последствий популярности...,"[URW: Discrediting the West, Diplomacy, URW: D...","[The West is weak, Other, The EU is divided]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,..."
2,RU,RU-URW-1149.txt,<PARA>возможность признания аллы пугачевой ино...,[URW: Distrust towards Media],[Western media is an instrument of propaganda],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."
3,RU,RU-URW-1015.txt,<PARA>азаров рассказал о смене риторики киева ...,"[URW: Discrediting Ukraine, URW: Discrediting ...","[Ukraine is a puppet of the West, Discrediting...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."
4,RU,RU-URW-1001.txt,<PARA>в россиянах проснулась массовая любовь к...,[URW: Praise of Russia],[Russia is a guarantor of peace and prosperity],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."


In [5]:
misc_folder = os.path.join(base_save_folder_dir, 'Misc')

with open(os.path.join(misc_folder, 'narrative_to_subnarratives.pkl'), 'rb') as f:
    narrative_to_subnarratives = pickle.load(f)

In [6]:
with open(os.path.join(misc_folder, 'narrative_to_subnarratives_map.pkl'), 'rb') as f:
    narrative_to_sub_map = pickle.load(f)

In [7]:
with open(os.path.join(misc_folder, 'coarse_classes.pkl'), 'rb') as f:
    coarse_classes = pickle.load(f)

with open(os.path.join(misc_folder, 'fine_classes.pkl'), 'rb') as f:
    fine_classes = pickle.load(f)

with open(os.path.join(misc_folder, 'narrative_order.pkl'), 'rb') as f:
    narrative_order = pickle.load(f)

In [8]:
dataset_train.shape

(1781, 8)

In [9]:
label_encoder_folder = os.path.join(base_save_folder_dir, 'LabelEncoders')

with open(os.path.join(label_encoder_folder, 'mlb_narratives.pkl'), 'rb') as f:
    mlb_narratives = pickle.load(f)

with open(os.path.join(label_encoder_folder, 'mlb_subnarratives.pkl'), 'rb') as f:
    mlb_subnarratives = pickle.load(f)

In [10]:
import numpy as np

embeddings_folder = os.path.join(base_save_folder_dir, 'Embeddings/embeddings_train_stella.npy')

def load_embeddings(filename):
    return np.load(filename)

train_embeddings = load_embeddings(embeddings_folder)

In [11]:
train_embeddings.shape

(1781, 1024)

In [12]:
with open(os.path.join(dataset_folder, 'dataset_val_cleaned.pkl'), 'rb') as f:
    dataset_val = pickle.load(f)

In [13]:
dataset_val.shape

(178, 8)

In [14]:
dataset_val.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives,narratives_encoded,subnarratives_encoded,aggregated_subnarratives
0,RU,RU-URW-1014.txt,<PARA>алаудинов: российские силы растянули и р...,[URW: Praise of Russia],[Praise of Russian military might],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."
1,RU,RU-URW-1174.txt,<PARA>других сценариев нет. никаких переговоро...,"[URW: Speculating war outcomes, URW: Discredit...","[Ukrainian army is collapsing, Discrediting Uk...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,..."
2,RU,RU-URW-1166.txt,<PARA>попытка запада изолировать путина провал...,"[URW: Praise of Russia, URW: Distrust towards ...","[Praise of Russian President Vladimir Putin, W...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."
3,RU,RU-URW-1170.txt,<PARA>часть территории украины войдет в состав...,"[URW: Discrediting Ukraine, URW: Speculating w...",[Discrediting Ukrainian government and officia...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,..."
4,RU,RU-URW-1004.txt,<PARA>зеленскому не очень понравилась идея о в...,"[URW: Discrediting Ukraine, URW: Discrediting ...",[Discrediting Ukrainian government and officia...,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,..."


In [15]:
embeddings_folder = os.path.join(base_save_folder_dir, 'Embeddings/embeddings_dev_stella.npy')

val_embeddings = load_embeddings(embeddings_folder)

In [16]:
def filter_dataset_and_embeddings(dataset, embeddings, condition_fn):
    # A positional boolean mask keeps rows and embeddings aligned
    # even if the DataFrame index is not a default RangeIndex.
    mask = dataset.apply(condition_fn, axis=1).to_numpy(dtype=bool)

    filtered_dataset = dataset[mask]
    filtered_embeddings = embeddings[mask]

    return filtered_dataset, filtered_embeddings

In [17]:
dataset_val, val_embeddings = filter_dataset_and_embeddings(
        dataset_val, val_embeddings, lambda row: row["language"] == "EN"
    )

In [18]:
dataset_val.shape

(41, 8)

In [19]:
val_embeddings.shape

(41, 1024)

In [20]:
dataset_train.shape

(1781, 8)

In [21]:
train_embeddings.shape

(1781, 1024)

In [22]:
dataset_train['aggregated_subnarratives']

0       [[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,...
1       [[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,...
2       [[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
3       [[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
4       [[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
                              ...                        
1776    [[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
1777    [[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,...
1778    [[0, 1, 0, 0, 0], [1, 0, 0], [1, 0, 0, 0], [1,...
1779    [[0, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
1780    [[1, 0, 0, 0, 0], [0, 0, 0], [0, 0, 0, 0], [0,...
Name: aggregated_subnarratives, Length: 1781, dtype: object

In [23]:
import torch

prefer_cpu=True

# Get cpu, gpu or mps device for training.
# Note: prefer_cpu only disables MPS; CUDA is still used whenever it is available.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available() and not prefer_cpu
    else "cpu"
)
print(f"Using {device} device")

Using cpu device


In [24]:
y_train_sub_heads = dataset_train['aggregated_subnarratives'].to_numpy()
y_val_sub_heads = dataset_val['aggregated_subnarratives'].to_numpy()

In [25]:
dataset_train['language'].unique()

array(['RU', 'PT', 'BG', 'HI', 'EN'], dtype=object)

In [26]:
import numpy as np

def custom_shuffling(data, embeddings, random_state=None):
    if random_state is not None:
        np.random.seed(random_state)
    
    shuffled_indices = np.arange(len(data))
    np.random.shuffle(shuffled_indices)
    
    data = data.iloc[shuffled_indices].reset_index(drop=True)
    embeddings = embeddings[shuffled_indices]

    return data, embeddings

In [27]:
narrative_order

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]

In [28]:
y_train_sub_heads = dataset_train['aggregated_subnarratives'].to_numpy()
y_val_sub_heads = dataset_val['aggregated_subnarratives'].to_numpy()

In [29]:
dataset_train['language'].unique()

array(['RU', 'PT', 'BG', 'HI', 'EN'], dtype=object)

In [30]:

import torch.nn as nn

class BaseClassifier(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.input_size = input_size
        self.input_shape = (1, input_size)
        self.model_name = "base_model" 
        
    def visualize(self, filepath=None):
        if filepath is None:
            filepath = f'./visualizations/{self.model_name}.onnx'
        
        dummy_input = torch.randn(self.input_shape)
        torch.onnx.export(
            self,
            dummy_input,
            filepath,
            export_params=True,
            opset_version=11,
            do_constant_folding=True,
            input_names=['input'],
            output_names=['narrative_output', 'subnarrative_outputs'],
        )
        
        print(f"Model exported to {filepath}")
        print("You can visualize this using Netron: https://netron.app/")

    def forward(self, x):
        raise NotImplementedError("Forward method must be implemented by subclasses")

In [31]:
import torch.nn.functional as F

class MultiTaskClassifierMultiHead(BaseClassifier):
    def __init__(
        self,
        input_size,
        hidden_size=1024,
        num_narratives=len(mlb_narratives.classes_),
        narrative_to_sub_map=narrative_to_sub_map,
        dropout_rate=0.4,
        model_name="MultiTaskClassifierMultiHead" 
    ):
        super().__init__(input_size)
        self.model_name = model_name 
        
        self.shared_layer = nn.Sequential(
            nn.Linear(input_size, hidden_size * 2),
            nn.BatchNorm1d(hidden_size * 2),
            nn.ReLU(),
            nn.Dropout(dropout_rate)
        )

        self.narrative_head = nn.Sequential(
            nn.Linear(hidden_size * 2, num_narratives),
            nn.Sigmoid()
        )

        self.subnarrative_heads = nn.ModuleDict()
        for narr_idx, sub_indices in narrative_to_sub_map.items():
            num_subs_for_this_narr = len(sub_indices)
            self.subnarrative_heads[str(narr_idx)] = nn.Sequential(
                nn.Linear(hidden_size * 2, num_subs_for_this_narr),
                nn.Sigmoid()
            )

    def forward(self, x):
        shared_out = self.shared_layer(x)
        narr_probs = self.narrative_head(shared_out)

        sub_probs_dict = {}
        for narr_idx, head in self.subnarrative_heads.items():
            sub_probs_dict[narr_idx] = head(shared_out)

        return narr_probs, sub_probs_dict


In [32]:
network_params = {
    'lr': 0.001,
    'hidden_size': 2048,
    'dropout': 0.4,
    'patience': 10
}

In [33]:
y_train_nar = dataset_train['narratives_encoded'].tolist()
y_val_nar = dataset_val['narratives_encoded'].tolist()

y_train_sub_nar = dataset_train['subnarratives_encoded'].tolist()
y_val_sub_nar = dataset_val['subnarratives_encoded'].tolist()

In [34]:
y_train_nar = torch.tensor(y_train_nar, dtype=torch.float32).to(device)
y_train_sub_nar = torch.tensor(y_train_sub_nar, dtype=torch.float32).to(device)

y_val_nar = torch.tensor(y_val_nar, dtype=torch.float32).to(device)
y_val_sub_nar = torch.tensor(y_val_sub_nar, dtype=torch.float32).to(device)

In [35]:
train_embeddings_tensor = torch.tensor(train_embeddings, dtype=torch.float32).to(device)
val_embeddings_tensor = torch.tensor(val_embeddings, dtype=torch.float32).to(device)

In [36]:
input_size = train_embeddings_tensor.shape[1]
print(input_size)

1024


In [37]:
model_multi_head = MultiTaskClassifierMultiHead(
    input_size=input_size,
    hidden_size=network_params['hidden_size'],
).to(device)

In [38]:

def compute_class_weights(y_train):
    total_samples = y_train.shape[0]
    class_weights = []
    for label in range(y_train.shape[1]):
        pos_count = y_train[:, label].sum().item()
        neg_count = total_samples - pos_count
        pos_weight = total_samples / (2 * pos_count) if pos_count > 0 else 0
        neg_weight = total_samples / (2 * neg_count) if neg_count > 0 else 0
        class_weights.append((pos_weight, neg_weight))
    return class_weights

class WeightedBCELoss(nn.Module):
    def __init__(self, class_weights):
        super().__init__()
        self.class_weights = class_weights

    def forward(self, probs, targets):
        bce_loss = 0
        epsilon = 1e-7
        for i, (pos_weight, neg_weight) in enumerate(self.class_weights):
            prob = probs[:, i]
            bce = -pos_weight * targets[:, i] * torch.log(prob + epsilon) - \
                  neg_weight * (1 - targets[:, i]) * torch.log(1 - prob + epsilon)
            bce_loss += bce.mean()
        return bce_loss / len(self.class_weights)
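
For reference, the weighting scheme and loss implemented above can be written out as follows. Here $N$ is the number of training samples, $n_j^{+}$ the number of positives for label $j$, $B$ the batch size, $C$ the number of labels, and $\epsilon = 10^{-7}$. The symbols are introduced here for illustration and do not appear in the code.

```latex
w_j^{+} =
\begin{cases}
  \dfrac{N}{2\,n_j^{+}} & n_j^{+} > 0 \\
  0 & \text{otherwise}
\end{cases}
\qquad
w_j^{-} =
\begin{cases}
  \dfrac{N}{2\,(N - n_j^{+})} & N - n_j^{+} > 0 \\
  0 & \text{otherwise}
\end{cases}

\mathcal{L} = \frac{1}{C} \sum_{j=1}^{C} \frac{1}{B} \sum_{i=1}^{B}
\Big[ -\,w_j^{+}\, y_{ij} \log(p_{ij} + \epsilon)
      \;-\; w_j^{-}\,(1 - y_{ij}) \log(1 - p_{ij} + \epsilon) \Big]
```

For example, with $N = 100$ and $n_j^{+} = 10$, the positive weight is $100 / 20 = 5$ and the negative weight is $100 / 180 \approx 0.56$, so an error on the rare positive class counts roughly nine times as much as an error on the negative class.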

In [39]:
loss_params = {
    'sub_weight': 0.3,
    'condition_weight': 0.3
}

In [40]:
class MultiHeadLoss(nn.Module):
    def __init__(self, narrative_criterion, sub_criterion_dict, 
                 condition_weight=loss_params['condition_weight'],
                 sub_weight=loss_params['sub_weight']):
        
        super().__init__()
        self.narrative_criterion = narrative_criterion
        self.sub_criterion_dict = sub_criterion_dict
        self.condition_weight = condition_weight
        self.sub_weight = sub_weight
        
    def forward(self, narr_probs, sub_probs_dict, y_narr, y_sub_heads):
        narr_loss = self.narrative_criterion(narr_probs, y_narr)
        sub_loss = 0.0
        condition_loss = 0.0
        
        for narr_idx_str, sub_probs in sub_probs_dict.items():
            narr_idx = int(narr_idx_str)
            y_sub = [row[narr_idx] for row in y_sub_heads]
            y_sub_tensor = torch.tensor(y_sub, dtype=torch.float32, device=sub_probs.device)
            
            sub_loss_func = self.sub_criterion_dict[narr_idx_str]
            sub_loss += sub_loss_func(sub_probs, y_sub_tensor)
            
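            narr_pred = narr_probs[:, narr_idx].unsqueeze(1)
            # y_sub_tensor already has shape (batch, num_subs), matching sub_probs;
            # an extra unsqueeze would broadcast the difference to (batch, batch, num_subs).
            condition_term = torch.mean(
                torch.abs(sub_probs * (1 - narr_pred)) + 
                narr_pred * torch.abs(sub_probs - y_sub_tensor)
            )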
            condition_loss += condition_term
            
        sub_loss = sub_loss / len(sub_probs_dict)
        condition_loss = condition_loss / len(sub_probs_dict)
        
        total_loss = (1 - self.sub_weight) * narr_loss + \
                    self.sub_weight * sub_loss + \
                    self.condition_weight * condition_loss
        
        return total_loss
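
To show what the conditioning term is meant to encourage, here is a plain-Python reading of its element-wise form for a single article and a single narrative head, together with the final weighting. The helpers `condition_term` and `total_loss` are illustrative names and not part of the notebook; the default weights mirror `loss_params`.

```python
def condition_term(narr_prob, sub_probs, sub_targets):
    """Element-wise conditioning penalty for one article and one narrative head."""
    terms = [
        # First part: subnarrative mass while the parent narrative looks unlikely.
        # Second part: distance to the target, scaled by the parent probability.
        abs(s * (1 - narr_prob)) + narr_prob * abs(s - y)
        for s, y in zip(sub_probs, sub_targets)
    ]
    return sum(terms) / len(terms)

def total_loss(narr_loss, sub_loss, condition_loss,
               sub_weight=0.3, condition_weight=0.3):
    """Combine the three parts with the same weighting as MultiHeadLoss."""
    return ((1 - sub_weight) * narr_loss
            + sub_weight * sub_loss
            + condition_weight * condition_loss)

# Parent narrative judged unlikely, yet a subnarrative is predicted strongly.
inconsistent = condition_term(0.1, [0.9, 0.1], [0, 0])
# Parent narrative judged likely and the subnarratives match the targets.
consistent = condition_term(0.9, [0.9, 0.1], [1, 0])
print(inconsistent > consistent)
```

Note that with these defaults the three coefficients are 0.7, 0.3, and 0.3, so they do not sum to one; the conditioning term acts as an additional regularizer on top of the two classification losses.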

In [41]:
coarse_classes

['CC: Amplifying Climate Fears',
 'CC: Climate change is beneficial',
 'CC: Controversy about green technologies',
 'CC: Criticism of climate movement',
 'CC: Criticism of climate policies',
 'CC: Criticism of institutions and authorities',
 'CC: Downplaying climate change',
 'CC: Green policies are geopolitical instruments',
 'CC: Hidden plots by secret schemes of powerful groups',
 'CC: Questioning the measurements and science',
 'Other',
 'URW: Amplifying war-related fears',
 'URW: Blaming the war on others rather than the invader',
 'URW: Discrediting Ukraine',
 'URW: Discrediting the West, Diplomacy',
 'URW: Distrust towards Media',
 'URW: Hidden plots by secret schemes of powerful groups',
 'URW: Negative Consequences for the West',
 'URW: Overpraising the West',
 'URW: Praise of Russia',
 'URW: Russia is the Victim',
 'URW: Speculating war outcomes']

In [42]:
fine_classes[:15]

['CC: Amplifying Climate Fears: Amplifying existing fears of global warming',
 'CC: Amplifying Climate Fears: Doomsday scenarios for humans',
 'CC: Amplifying Climate Fears: Earth will be uninhabitable soon',
 'CC: Amplifying Climate Fears: Other',
 'CC: Amplifying Climate Fears: Whatever we do it is already too late',
 'CC: Climate change is beneficial: CO2 is beneficial',
 'CC: Climate change is beneficial: Other',
 'CC: Climate change is beneficial: Temperature increase is beneficial',
 'CC: Controversy about green technologies: Other',
 'CC: Controversy about green technologies: Renewable energy is costly',
 'CC: Controversy about green technologies: Renewable energy is dangerous',
 'CC: Controversy about green technologies: Renewable energy is unreliable',
 'CC: Criticism of climate movement: Ad hominem attacks on key activists',
 'CC: Criticism of climate movement: Climate movement is alarmist',
 'CC: Criticism of climate movement: Climate movement is corrupt']

In [43]:
narrative_order

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]

In [44]:
from sklearn import metrics

class MultiHeadEvaluator:
    def __init__(
        self,
        classes_coarse=coarse_classes,
        classes_fine=fine_classes,
        narrative_to_sub_map=narrative_to_sub_map,
        narrative_order=narrative_order,
        narrative_classes=mlb_narratives.classes_,
        subnarrative_classes=mlb_subnarratives.classes_,
        device='cpu',
        output_dir='../../../submissions',
    ):
        self.narrative_to_sub_map = narrative_to_sub_map
        self.narrative_order = narrative_order
        self.narrative_classes = list(narrative_classes)
        self.subnarrative_classes = list(subnarrative_classes)
        
        self.classes_coarse = classes_coarse
        self.classes_fine = classes_fine

        self.device = device
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)
    
    def evaluate(
        self,
        model,
        embeddings,
        dataset,
        thresholds=None,
        save=False,
        std_weight=0.6,
        lower_thres=0.1,
        upper_thres=0.6
    ):
        if thresholds is None:
            thresholds = np.arange(lower_thres, upper_thres, 0.05)
        
        dataset = dataset.reset_index(drop=True)
        embeddings = embeddings.to(self.device)
    
        best_results = {
            'best_coarse_f1': -1,
            'best_coarse_std': float('inf'),
            'best_fine_f1': -1,
            'best_fine_std': float('inf'),
            'narr_threshold': 0,
            'sub_threshold': 0,
            'predictions': None,
            'best_combined_score': -float('inf'),
            'coarse_classification_report': None,
            'fine_precision': None,
            'fine_recall': None,
            'samples_f1_fine': None,
        }
    
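        # Run inference in eval mode (dropout off, BatchNorm uses running statistics),
        # then restore the previous mode so training can continue afterwards.
        was_training = model.training
        model.eval()
        with torch.no_grad():
            narr_probs, sub_probs_dict = model(embeddings)
            narr_probs = narr_probs.cpu().numpy()
            sub_probs_dict = {k: v.cpu().numpy() for k, v in sub_probs_dict.items()}
        model.train(was_training)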
    
        for narr_threshold in thresholds:
            for sub_threshold in thresholds:
                predictions = []
                try:
                    for sample_idx, row in dataset.iterrows():
                        pred = self._make_prediction(
                            row['article_id'],
                            sample_idx,
                            narr_probs,
                            sub_probs_dict,
                            narr_threshold,
                            sub_threshold
                        )
                        predictions.append(pred)
                    
                    metrics_result = self._compute_metrics_coarse_fine(predictions, dataset)
                    f1_coarse_mean, coarse_std, f1_fine_mean, fine_std, report_coarse, precision_fine, recall_fine, \
                    samples_f1_fine = metrics_result
                    
                    combined_score = f1_fine_mean - (std_weight * coarse_std)
                    
                    if combined_score > best_results['best_combined_score']:
                        best_results.update({
                            'best_coarse_f1': f1_coarse_mean,
                            'best_coarse_std': coarse_std,
                            'best_fine_f1': f1_fine_mean,
                            'best_fine_std': fine_std,
                            'narr_threshold': narr_threshold,
                            'sub_threshold': sub_threshold,
                            'predictions': predictions,
                            'best_combined_score': combined_score,
                            'coarse_classification_report': report_coarse,
                            'fine_precision': precision_fine,
                            'fine_recall': recall_fine,
                            'samples_f1_fine': samples_f1_fine,
                        })
                except Exception as e:
                    print(f"Error during evaluation with thresholds {narr_threshold:.2f}, {sub_threshold:.2f}: {str(e)}")
                    continue
    
        print("\nBest thresholds found:")
        print(f"Narrative threshold: {best_results['narr_threshold']:.2f}")
        print(f"Subnarrative threshold: {best_results['sub_threshold']:.2f}")
        print('\nCompetition Values')
        print(f"Coarse-F1: {best_results['best_coarse_f1']:.3f}")
        print(f"F1 st. dev. coarse: {best_results['best_coarse_std']:.3f}")
        print(f"Fine-F1: {best_results['best_fine_f1']:.3f}")
        print(f"F1 st. dev. fine: {best_results['best_fine_std']:.3f}")
        print("\nFine Metrics:")
        print("Precision: {:.3f}".format(best_results['fine_precision']))
        print("Recall: {:.3f}".format(best_results['fine_recall']))
        print("F1 Samples: {:.3f}".format(best_results['samples_f1_fine']))

        if save:
            self._save_predictions(best_results, os.path.join(self.output_dir, 'submission.txt'))
        
        return best_results

    def _make_prediction(self, article_id, sample_idx, narr_probs, sub_probs_dict, narr_threshold, sub_threshold):       
        other_idx = self.narrative_classes.index("Other")  
        active_narratives = [
            (n_idx, prob)
            for n_idx, prob in enumerate(narr_probs[sample_idx])
            if n_idx != other_idx and prob >= narr_threshold
        ]

        if not active_narratives:
            return {
                'article_id': article_id,
                'narratives': ["Other"],
                'pairs': ["Other"]
            }
        
        narratives = []
        pairs = []
        seen_pairs = set()
        
        active_narratives.sort(key=lambda x: x[1], reverse=True)
        for narr_idx, _ in active_narratives:
            narr_name = self.narrative_classes[narr_idx]
                
            sub_probs = sub_probs_dict[str(narr_idx)][sample_idx]
            active_subnarratives = [
                (local_idx, s_prob)
                for local_idx, s_prob in enumerate(sub_probs)
                if s_prob >= sub_threshold
            ]
            
            active_subnarratives.sort(key=lambda x: x[1], reverse=True)
            if not active_subnarratives:
                pairs.append(f"{narr_name}: Other")
            else:
                for local_idx, _ in active_subnarratives:   
                    global_sub_idx = self.narrative_to_sub_map[narr_idx][local_idx]
                    sub_name = self.subnarrative_classes[global_sub_idx]
                    pair = f"{narr_name}: {sub_name}"
                    if pair not in seen_pairs:
                        pairs.append(pair)
                        seen_pairs.add(pair)
            narratives.append(narr_name)
        
        return {
            'article_id': article_id,
            'narratives': narratives,
            'pairs': pairs
        }

    def _compute_metrics_coarse_fine(self, predictions, dataset):
        gold_coarse_all = []
        gold_fine_all = []
        pred_coarse_all = []
        pred_fine_all = []

        for pred, (_, row) in zip(predictions, dataset.iterrows()):
            gold_coarse = row['narratives']
            gold_subnarratives = row['subnarratives']
            
            pred_coarse = pred['narratives']
            # Predicted pairs are already in the fine-grained label format ("Other" included).
            pred_fine = list(pred['pairs'])

            gold_fine = []
            for gold_nar, gold_sub in zip(gold_coarse, gold_subnarratives):
                if gold_nar == "Other":
                    gold_fine.append("Other")
                else:
                    gold_fine.append(f"{gold_nar}: {gold_sub}")
            
            gold_coarse_all.append(gold_coarse)
            gold_fine_all.append(gold_fine)
            pred_coarse_all.append(pred_coarse)
            pred_fine_all.append(pred_fine)

        f1_coarse_mean, coarse_std = self._evaluate_multi_label(gold_coarse_all, pred_coarse_all, self.classes_coarse)
        f1_fine_mean, fine_std = self._evaluate_multi_label(gold_fine_all, pred_fine_all, self.classes_fine)
        
        gold_coarse_flat = []
        pred_coarse_flat = []
        for g_labels, p_labels in zip(gold_coarse_all, pred_coarse_all):
            g_onehot = np.zeros(len(self.classes_coarse), dtype=int)
            p_onehot = np.zeros(len(self.classes_coarse), dtype=int)
            
            for lab in g_labels:
                if lab in self.classes_coarse:
                    g_onehot[self.classes_coarse.index(lab)] = 1
            for lab in p_labels:
                if lab in self.classes_coarse:
                    p_onehot[self.classes_coarse.index(lab)] = 1
                    
            gold_coarse_flat.append(g_onehot)
            pred_coarse_flat.append(p_onehot)
            
        gold_coarse_flat = np.array(gold_coarse_flat)
        pred_coarse_flat = np.array(pred_coarse_flat)
        
        report_coarse = metrics.classification_report(
                gold_coarse_flat, pred_coarse_flat, 
                target_names=self.classes_coarse, 
                zero_division=0
        )
        
        gold_fine_flat = []
        pred_fine_flat = []
        for g_labels, p_labels in zip(gold_fine_all, pred_fine_all):
            g_onehot = np.zeros(len(self.classes_fine), dtype=int)
            p_onehot = np.zeros(len(self.classes_fine), dtype=int)
            
            for lab in g_labels:
                if lab in self.classes_fine:
                    g_onehot[self.classes_fine.index(lab)] = 1
            for lab in p_labels:
                if lab in self.classes_fine:
                    p_onehot[self.classes_fine.index(lab)] = 1
                    
            gold_fine_flat.append(g_onehot)
            pred_fine_flat.append(p_onehot)
            
        gold_fine_flat = np.array(gold_fine_flat)
        pred_fine_flat = np.array(pred_fine_flat)
        
        precision_fine = metrics.precision_score(gold_fine_flat, pred_fine_flat, average='macro', zero_division=0)
        recall_fine = metrics.recall_score(gold_fine_flat, pred_fine_flat, average='macro', zero_division=0)
        samples_f1_fine = metrics.f1_score(gold_fine_flat, pred_fine_flat, average='samples', zero_division=0)
        
        return f1_coarse_mean, coarse_std, f1_fine_mean, fine_std, report_coarse, precision_fine, recall_fine, samples_f1_fine

    def _evaluate_multi_label(self, gold, predicted, class_list):
        f1_scores = []
        for g_labels, p_labels in zip(gold, predicted):
            g_onehot = np.zeros(len(class_list), dtype=int)
            p_onehot = np.zeros(len(class_list), dtype=int)
            
            for lab in g_labels:
                if lab in class_list:
                    g_onehot[class_list.index(lab)] = 1
            for lab in p_labels:
                if lab in class_list:
                    p_onehot[class_list.index(lab)] = 1
                    
            f1_doc = metrics.f1_score(g_onehot, p_onehot, zero_division=0)
            f1_scores.append(f1_doc)
            
        return float(np.mean(f1_scores)), float(np.std(f1_scores))

    def _save_predictions(self, best_results, filepath):
        predictions = best_results['predictions']
        if os.path.exists(filepath):
            os.remove(filepath)
        
        with open(filepath, 'w', encoding='utf-8') as f:
            for pred in predictions:
                line = (f"{pred['article_id']}\t"
                       f"{';'.join(pred['narratives'])}\t"
                       f"{';'.join(pred['pairs'])}\n")
                f.write(line)
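
The threshold search inside `evaluate` boils down to a grid search that maximizes the fine-grained F1 minus a penalty on the coarse F1 standard deviation. A stripped-down sketch of that selection rule is given below; `select_thresholds` and `toy_score` are hypothetical helpers, and the toy scoring function merely stands in for running the model and computing metrics.

```python
from itertools import product

def select_thresholds(thresholds, score_fn, std_weight=0.6):
    """Return (combined_score, narr_threshold, sub_threshold) with the best score."""
    best = None
    for narr_t, sub_t in product(thresholds, repeat=2):
        fine_f1, coarse_std = score_fn(narr_t, sub_t)
        # Same trade-off as in evaluate(): reward fine F1, penalize unstable coarse F1.
        combined = fine_f1 - std_weight * coarse_std
        if best is None or combined > best[0]:
            best = (combined, narr_t, sub_t)
    return best

def toy_score(narr_t, sub_t):
    # Hypothetical metric surface: fine F1 peaks at (0.3, 0.2), constant std.
    fine_f1 = 1.0 - abs(narr_t - 0.3) - abs(sub_t - 0.2)
    return fine_f1, 0.1

grid = [0.1, 0.2, 0.3, 0.4]
best_score, best_narr_t, best_sub_t = select_thresholds(grid, toy_score)
print(best_narr_t, best_sub_t)
```

Because the thresholds are tuned on the same validation set that is used for reporting, the resulting scores are likely to be somewhat optimistic.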

In [45]:
import torch

train_embeddings_tensor = torch.tensor(train_embeddings, dtype=torch.float32).to(device)
val_embeddings_tensor = torch.tensor(val_embeddings, dtype=torch.float32).to(device)

Because we train on multiple languages in sequence, the last language (our target) needs special handling.
The goal is for the model to perform best on the target language without losing what it learned from the previous ones. When training reaches the target language, we make two changes:
- We double the patience parameter so that the model trains more carefully on the target language.
- We lower the learning rate (to 20% of the base value): the model has already learned patterns from the other languages, so we want smaller, more precise updates for the target language.
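
These two adjustments can be isolated in a small helper, sketched below. The function `stage_hyperparameters` is a hypothetical name used only for illustration; the factors (learning rate times 0.2, patience times 2) mirror the values used in the training code that follows.

```python
def stage_hyperparameters(language, target, base_lr=0.001, base_patience=10):
    """Return (learning_rate, patience) for one training stage."""
    if language == target:
        # Target stage: smaller steps and more tolerance before giving up.
        return base_lr * 0.2, base_patience * 2
    return base_lr, base_patience

for language in ["RU", "BG", "HI", "PT", "EN"]:
    lr, patience = stage_hyperparameters(language, target="EN")
    print(f"{language}: lr={lr:g}, patience={patience}")
```

Computing the values per stage from fixed base values, rather than updating them in place, keeps the schedule correct even if the target language is not the last entry in the order.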

In [46]:
class ContinualLearningModel:
    def __init__(
        self,
        model_class,
        model_params,
        dataset_train=dataset_train,
        train_embeddings=train_embeddings,
        dataset_val=dataset_val,
        val_embeddings=val_embeddings,
        language_order=['RU', 'BG', 'HI', 'PT', 'EN'],
        learning_rate=0.001,
        target="EN",
        device='cuda' if torch.cuda.is_available() else 'cpu'
    ):
        self.model_class = model_class
        self.model_params = model_params
        self.dataset_train = dataset_train
        self.train_embeddings = train_embeddings
        self.dataset_val = dataset_val
        self.val_embeddings = val_embeddings
        self.language_order = language_order
        self.learning_rate = learning_rate
        self.device = device
        self.target = target
        self.y_val_nar = self.dataset_val['narratives_encoded'].tolist()
        self.y_val_sub_heads = self.dataset_val['aggregated_subnarratives'].tolist()
        

    def _prepare_language_data(self, language):
        language_mask = self.dataset_train["language"] == language
        train_data = self.dataset_train[language_mask].copy()
        train_emb = self.train_embeddings[language_mask]
        y_train_nar = torch.tensor(train_data['narratives_encoded'].tolist(), dtype=torch.float32).to(self.device)
        y_train_sub_heads = train_data['aggregated_subnarratives'].tolist()
        train_emb = torch.tensor(train_emb, dtype=torch.float32).to(self.device)
        return train_data, train_emb, y_train_nar, y_train_sub_heads

    def _setup_loss_function(self, y_train_nar, y_train_sub_heads, language):
        class_weights_nar = compute_class_weights(y_train_nar)
        narrative_criterion = WeightedBCELoss(class_weights_nar)
        
        sub_criterion_dict = {}
        for narr_idx, sub_indices in narrative_to_sub_map.items():
            local_weights = compute_class_weights(torch.tensor([h[narr_idx] for h in y_train_sub_heads]))
            sub_criterion = WeightedBCELoss(local_weights)
            sub_criterion_dict[str(narr_idx)] = sub_criterion
            
        return MultiHeadLoss(narrative_criterion, sub_criterion_dict)

    def train(self, epochs_per_language=100, patience=10):
        self.model = self.model_class(**self.model_params).to(self.device)

        for lang_idx, language in enumerate(self.language_order):
            print(f"\nTraining on {language} data...")
            
            if language == self.target:
                patience = patience * 2
                learning_rate = self.learning_rate * 0.2
                optimizer = torch.optim.Adam(self.model.parameters(), lr=learning_rate)
                scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
                    optimizer, 
                    mode='min', 
                    factor=0.5,
                    patience=8,
                    min_lr=2e-5,
                    threshold=1e-4
                )
            else:
                optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate)
                scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
                    optimizer, mode='min', factor=0.5, patience=5
                )
            
            train_data, train_emb, y_train_nar, y_train_sub_heads = self._prepare_language_data(language)
            loss_fn = self._setup_loss_function(y_train_nar, y_train_sub_heads, language)
            val_emb_tensor = torch.tensor(self.val_embeddings, dtype=torch.float32).to(self.device)
            best_val_loss = float('inf')
            patience_counter = 0
            best_model_state = None

            for epoch in range(epochs_per_language):
                self.model.train()
                train_narr_probs, train_sub_probs_dict = self.model(train_emb)
                train_loss = loss_fn(
                    train_narr_probs,
                    train_sub_probs_dict,
                    y_train_nar,
                    y_train_sub_heads
                )

                optimizer.zero_grad()
                train_loss.backward()
                                    
                optimizer.step()

                self.model.eval()
                with torch.no_grad():
                    val_narr_probs, val_sub_probs_dict = self.model(val_emb_tensor)
                    val_loss = loss_fn(
                        val_narr_probs,
                        val_sub_probs_dict,
                        torch.tensor(self.y_val_nar, dtype=torch.float32).to(self.device),
                        self.y_val_sub_heads
                    )

                print(f"Epoch {epoch+1}/{epochs_per_language}, "
                      f"Train Loss: {train_loss.item():.4f}, "
                      f"Val Loss: {val_loss.item():.4f}")

                if scheduler:
                    scheduler.step(val_loss)
                    current_lr = scheduler.optimizer.param_groups[0]['lr']
                    print(f"Current Learning Rate: {current_lr:.6f}")

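                if val_loss < best_val_loss:
                    best_val_loss = val_loss.item()
                    patience_counter = 0
                    # Clone the tensors: state_dict().copy() is shallow, so its
                    # tensors would keep changing as training continues.
                    best_model_state = {
                        k: v.detach().clone()
                        for k, v in self.model.state_dict().items()
                    }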
                else:
                    patience_counter += 1
                    
                if patience_counter >= patience:
                    print(f"Early stopping triggered for {language}")
                    break

            if best_model_state:
                self.model.load_state_dict(best_model_state)

        return self.model

    def evaluate_final(self, save_predictions=True):
        evaluator = MultiHeadEvaluator(device=self.device)
        val_emb_tensor = torch.tensor(self.val_embeddings, dtype=torch.float32).to(self.device)
        _ = evaluator.evaluate(
            self.model,
            val_emb_tensor,
            self.dataset_val,
            save=save_predictions
        )
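
The early-stopping bookkeeping inside `train` can be sketched on its own. The `EarlyStopper` class below is a hypothetical illustration, not part of the notebook: it uses a plain dictionary as a stand-in for the model weights and a deliberately small patience of 3. The validation losses are the first RU values logged by the first training run in this notebook. The snapshot is a deep copy, so later in-place weight updates cannot alter the stored best state.

```python
import copy


class EarlyStopper:
    """Tracks the best validation loss and a frozen snapshot of the weights."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def update(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
            # Deep copy: later in-place updates must not touch the snapshot.
            self.best_state = copy.deepcopy(state)
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# First RU validation losses from the first run's training log.
val_losses = [0.8710, 0.8672, 0.8689, 0.8752, 0.8844, 0.8951]
weights = {"w": [0.0]}              # stand-in for a model state dict
stopper = EarlyStopper(patience=3)  # small hypothetical patience for the demo

stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    weights["w"][0] += 1.0          # pretend an optimizer step changed the weights in place
    if stopper.update(loss, weights):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss, stopper.best_state)
```

With these values the best loss is reached at epoch 2, training stops at epoch 5, and the snapshot still holds the epoch-2 weights even though `weights` kept changing afterwards.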

In [47]:
# Training order per target language; the target always comes last.
lang = {
    'PT': ['RU', 'HI', 'BG', 'EN', 'PT'],
    'EN': ['RU', 'BG', 'PT', 'HI', 'EN']
}

In [48]:
language_order=['RU', 'BG', 'PT', 'HI', 'EN']
target=language_order[-1]

In [49]:
model_params = {
    'input_size': train_embeddings.shape[1],
    'hidden_size': 2048,
    'dropout_rate': 0.4
}

cl_model = ContinualLearningModel(
    model_class=MultiTaskClassifierMultiHead,
    model_params=model_params,
    language_order=language_order,
    target=target
)

In [50]:
model = cl_model.train(epochs_per_language=100, patience=15)


Training on RU data...
Epoch 1/100, Train Loss: 0.7067, Val Loss: 0.8710
Current Learning Rate: 0.001000
Epoch 2/100, Train Loss: 0.4126, Val Loss: 0.8672
Current Learning Rate: 0.001000
Epoch 3/100, Train Loss: 0.2842, Val Loss: 0.8689
Current Learning Rate: 0.001000
Epoch 4/100, Train Loss: 0.2238, Val Loss: 0.8752
Current Learning Rate: 0.001000
Epoch 5/100, Train Loss: 0.1906, Val Loss: 0.8844
Current Learning Rate: 0.001000
Epoch 6/100, Train Loss: 0.1682, Val Loss: 0.8951
Current Learning Rate: 0.001000
Epoch 7/100, Train Loss: 0.1495, Val Loss: 0.9069
Current Learning Rate: 0.001000
Epoch 8/100, Train Loss: 0.1330, Val Loss: 0.9182
Current Learning Rate: 0.000500
Epoch 9/100, Train Loss: 0.1184, Val Loss: 0.9241
Current Learning Rate: 0.000500
Epoch 10/100, Train Loss: 0.1144, Val Loss: 0.9300
Current Learning Rate: 0.000500
Epoch 11/100, Train Loss: 0.1088, Val Loss: 0.9362
Current Learning Rate: 0.000500
Epoch 12/100, Train Loss: 0.1017, Val Loss: 0.9435
Current Learning Rate
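
In the log above, the validation loss is lowest at epoch 2 and the learning rate halves at epoch 8. The sketch below reproduces that timing with a simplified re-implementation of the plateau rule, assuming it works as `ReduceLROnPlateau` does in `min` mode with a relative threshold: the rate is cut once more than `patience` consecutive epochs pass without improvement. It is not the library's code, and cooldown and `min_lr` are left out. The function name `plateau_schedule` is hypothetical.

```python
def plateau_schedule(val_losses, lr=0.001, factor=0.5, patience=5, threshold=1e-4):
    """Simplified plateau rule: cut lr after more than `patience` epochs without improvement."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        # An epoch counts as an improvement only if it beats the best loss
        # by a relative margin of `threshold`.
        if loss < best * (1 - threshold):
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# RU validation losses copied from the log above (epochs 1 to 12).
ru_val_losses = [0.8710, 0.8672, 0.8689, 0.8752, 0.8844, 0.8951,
                 0.9069, 0.9182, 0.9241, 0.9300, 0.9362, 0.9435]
lrs = plateau_schedule(ru_val_losses)
for epoch, lr in enumerate(lrs, start=1):
    print(epoch, lr)
```

Epochs 3 to 8 are six consecutive epochs without improvement, one more than the patience of 5 used for non-target languages, which is why the rate drops from 0.001 to 0.0005 at epoch 8.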

In [51]:
cl_model.evaluate_final()


Best thresholds found:
Narrative threshold: 0.55
Subnarrative threshold: 0.45

Competition Values
Coarse-F1: 0.524
F1 st. dev. coarse: 0.352
Fine-F1: 0.375
F1 st. dev. fine: 0.352

Fine Metrics:
Precision: 0.114
Recall: 0.263
F1 Samples: 0.375


Next, we change the order in which the languages are trained (RU -> HI -> PT -> BG -> EN):

In [52]:
language_order=['RU', 'HI', 'PT', 'BG', 'EN']
target=language_order[-1]

In [53]:
cl_model = ContinualLearningModel(
    model_class=MultiTaskClassifierMultiHead,
    model_params=model_params,
    language_order=language_order,
    target=target
)

In [54]:
model = cl_model.train(epochs_per_language=100, patience=15)


Training on RU data...
Epoch 1/100, Train Loss: 0.7104, Val Loss: 0.8775
Current Learning Rate: 0.001000
Epoch 2/100, Train Loss: 0.4176, Val Loss: 0.8726
Current Learning Rate: 0.001000
Epoch 3/100, Train Loss: 0.2896, Val Loss: 0.8728
Current Learning Rate: 0.001000
Epoch 4/100, Train Loss: 0.2272, Val Loss: 0.8775
Current Learning Rate: 0.001000
Epoch 5/100, Train Loss: 0.1906, Val Loss: 0.8856
Current Learning Rate: 0.001000
Epoch 6/100, Train Loss: 0.1708, Val Loss: 0.8956
Current Learning Rate: 0.001000
Epoch 7/100, Train Loss: 0.1517, Val Loss: 0.9060
Current Learning Rate: 0.001000
Epoch 8/100, Train Loss: 0.1356, Val Loss: 0.9174
Current Learning Rate: 0.000500
Epoch 9/100, Train Loss: 0.1229, Val Loss: 0.9244
Current Learning Rate: 0.000500
Epoch 10/100, Train Loss: 0.1169, Val Loss: 0.9318
Current Learning Rate: 0.000500
Epoch 11/100, Train Loss: 0.1112, Val Loss: 0.9398
Current Learning Rate: 0.000500
Epoch 12/100, Train Loss: 0.1035, Val Loss: 0.9489
Current Learning Rate

The results are worse than with the first language order (Coarse-F1 0.432 vs. 0.524, Fine-F1 0.260 vs. 0.375).

A possible explanation is that the jump from Russian directly to Hindi is too drastic: the two languages have very different structures, so the model is pushed to fit Hindi without having the right "prerequisites".

In the first order, by contrast, the transition is smoother: Russian and Bulgarian are both Slavic languages, after which the model sees different patterns from Portuguese and possibly harder ones from Hindi before moving on to the final target language.
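
Two orders are a small sample, and the notebook leaves `random_state` unset, so part of the gap may be run-to-run variation. One way to test the ordering hypothesis more systematically is to enumerate every order that ends with the target language and train each one. The helper below is a hypothetical sketch (`candidate_orders` is not part of the notebook) that only generates the orders; the training itself would still go through `ContinualLearningModel`.

```python
from itertools import permutations


def candidate_orders(languages, target):
    """Return every training order that ends with the target language."""
    others = [code for code in languages if code != target]
    return [list(prefix) + [target] for prefix in permutations(others)]


orders = candidate_orders(["RU", "BG", "PT", "HI", "EN"], target="EN")
print(len(orders))  # 4! = 24 candidate orders
print(orders[0])
```

Both orders tried in this notebook appear in the list, so the same loop could rerun them alongside the other 22 under a fixed seed.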

In [55]:
cl_model.evaluate_final()


Best thresholds found:
Narrative threshold: 0.35
Subnarrative threshold: 0.45

Competition Values
Coarse-F1: 0.432
F1 st. dev. coarse: 0.314
Fine-F1: 0.260
F1 st. dev. fine: 0.283

Fine Metrics:
Precision: 0.114
Recall: 0.332
F1 Samples: 0.260
