# SemEval 2025 Task 10
### Subtask 2: Narrative Baseline Classification -- Multilingual

Given a news article and a [two-level taxonomy of narrative labels](https://propaganda.math.unipd.it/semeval2025task10/NARRATIVE-TAXONOMIES.pdf) (where each narrative is subdivided into subnarratives) from a particular domain, assign to the article all the appropriate subnarrative labels. This is a multi-label multi-class document classification task.
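
To make the multi-label setup concrete, here is a minimal sketch of how label sets can be turned into binary indicator vectors with scikit-learn's `MultiLabelBinarizer`. The three label sets are made up for illustration, and it is only an assumption that the notebook's own encoders (`mlb_narratives`, `mlb_subnarratives`, loaded from disk below) were built this way.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy label sets, one per article (illustrative only, not taken from the dataset).
toy_labels = [
    ["The West is weak", "The EU is divided"],
    ["Other"],
    ["The EU is divided"],
]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(toy_labels)

print(mlb.classes_)  # label names, sorted alphabetically
print(encoded)       # one row per article, one column per label
```

Each row can contain several ones, which is what distinguishes this setting from ordinary multi-class classification.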

## 1. Baselines

### 1.1 Loading pre-saved variables

We start by loading our pre-saved variables:

In [1]:
import pickle
import os
import numpy as np

base_save_folder_dir = '../saved/'
dataset_folder = os.path.join(base_save_folder_dir, 'Dataset')

with open(os.path.join(dataset_folder, 'dataset.pkl'), 'rb') as f:
    dataset = pickle.load(f)

In [2]:
dataset.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives,narratives_encoded,subnarratives_encoded
0,RU,RU-URW-1161.txt,<PARA>в ближайшие два месяца сша будут стремит...,[Blaming the war on others rather than the inv...,"[The West are the aggressors, Other, The West ...","[0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,RU,RU-URW-1175.txt,<PARA>в ес испугались последствий популярности...,"[Discrediting the West, Diplomacy, Discreditin...","[The West is weak, Other, The EU is divided]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,RU,RU-URW-1149.txt,<PARA>возможность признания аллы пугачевой ино...,[Distrust towards Media],[Western media is an instrument of propaganda],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,RU,RU-URW-1015.txt,<PARA>азаров рассказал о смене риторики киева ...,"[Discrediting Ukraine, Discrediting Ukraine]","[Ukraine is a puppet of the West, Discrediting...","[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,RU,RU-URW-1001.txt,<PARA>в россиянах проснулась массовая любовь к...,[Praise of Russia],[Russia is a guarantor of peace and prosperity],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [3]:
embeddings_folder = os.path.join(base_save_folder_dir, 'Embeddings/all_embeddings.npy')

def load_embeddings(filename):
    return np.load(filename)

all_embeddings = load_embeddings(embeddings_folder)

In [4]:
all_embeddings.shape

(1699, 896)

In [5]:
misc_folder = os.path.join(base_save_folder_dir, 'Misc')

with open(os.path.join(misc_folder, 'narrative_to_subnarratives.pkl'), 'rb') as f:
    narrative_to_subnarratives = pickle.load(f)

In [6]:
narrative_to_subnarratives

{'Discrediting Ukraine': ['Discrediting Ukrainian nation and society',
  'Ukraine is associated with nazism',
  'Ukraine is a hub for criminal activities',
  'Other',
  'Situation in Ukraine is hopeless',
  'Discrediting Ukrainian military',
  'Rewriting Ukraine’s history',
  'Ukraine is a puppet of the West',
  'Discrediting Ukrainian government and officials and policies'],
 'Discrediting the West, Diplomacy': ['The West is overreacting',
  'The West does not care about Ukraine, only about its interests',
  'The EU is divided',
  'Other',
  'Diplomacy does/will not work',
  'The West is weak',
  'West is tired of Ukraine'],
 'Praise of Russia': ['Praise of Russian military might',
  'Russia is a guarantor of peace and prosperity',
  'Russian invasion has strong national support',
  'Other',
  'Russia has international support from a number of countries and people',
  'Praise of Russian President Vladimir Putin'],
 'Russia is the Victim': ['Russia actions in Ukraine are only self-defe

In [7]:
label_encoder_folder = os.path.join(base_save_folder_dir, 'LabelEncoders')

with open(os.path.join(label_encoder_folder, 'mlb_narratives.pkl'), 'rb') as f:
    mlb_narratives = pickle.load(f)

with open(os.path.join(label_encoder_folder, 'mlb_subnarratives.pkl'), 'rb') as f:
    mlb_subnarratives = pickle.load(f)

In [8]:
mlb_subnarratives

In [9]:
narrative_to_sub_map = {}
narrative_classes = list(mlb_narratives.classes_)
subnarrative_classes = list(mlb_subnarratives.classes_)

for narrative, subnarratives in narrative_to_subnarratives.items():
    narrative_idx = narrative_classes.index(narrative)
    subnarrative_indices = [subnarrative_classes.index(sub) for sub in subnarratives]
    narrative_to_sub_map[narrative_idx] = subnarrative_indices

print(narrative_to_sub_map)

{8: [22, 66, 64, 33, 50, 21, 39, 65, 20], 9: [58, 56, 53, 33, 19, 60, 71], 17: [35, 42, 46, 33, 41, 34], 19: [40, 33, 63, 59], 10: [72, 33, 69], 1: [43, 3, 62, 33, 31], 16: [32, 33, 57, 55], 2: [67, 33, 54], 20: [33, 44, 45, 68], 13: [2, 33, 6], 14: [47, 33, 61], 0: [73, 24, 23, 1, 33], 15: [33], 7: [14, 33, 17, 16, 15], 5: [8, 9, 33, 0], 11: [29, 51, 7, 33, 28, 27, 4, 70, 49], 6: [33, 11, 12, 10], 18: [48, 18, 33, 26, 30], 3: [33, 52, 5], 4: [33, 36, 37, 38], 12: [25, 13, 33]}


* We split our data to ensure that the train and validation datasets, along with their corresponding embeddings, are perfectly aligned.
* We use stratified splitting to maintain the distribution of labels across both train and validation sets, ensuring that rare and common labels are proportionally represented.

### 1.2 Stratified Splitting

In [10]:
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import numpy as np
import pandas as pd

def stratified_train_val_split_with_embeddings(data, embeddings, labels_column, train_size=0.8, splits=5, shuffle=True, min_instances=2):
    if shuffle:
        shuffled_indices = np.arange(len(data))
        np.random.shuffle(shuffled_indices)
        data = data.iloc[shuffled_indices].reset_index(drop=True)
        embeddings = embeddings[shuffled_indices]

    labels = np.array(data[labels_column].tolist())
    rare_indices = []
    common_indices = []

    class_counts = labels.sum(axis=0)
    rare_classes = np.where(class_counts <= min_instances)[0]

    for idx, label_row in enumerate(labels):
        if any(label_row[rare_classes]):
            rare_indices.append(idx)
        else:
            common_indices.append(idx)

    rare_data = data.iloc[rare_indices]
    rare_labels = labels[rare_indices]
    rare_embeddings = embeddings[rare_indices]

    train_rare = rare_data.iloc[:len(rare_data) // 2].reset_index(drop=True)
    val_rare = rare_data.iloc[len(rare_data) // 2:].reset_index(drop=True)

    train_rare_embeddings = rare_embeddings[:len(rare_data) // 2]
    val_rare_embeddings = rare_embeddings[len(rare_data) // 2:]

    common_data = data.iloc[common_indices].reset_index(drop=True)
    common_labels = labels[common_indices]
    common_embeddings = embeddings[common_indices]

    mskf = MultilabelStratifiedKFold(n_splits=splits)
    for train_idx, val_idx in mskf.split(np.zeros(len(common_labels)), common_labels):
        train_common = common_data.iloc[train_idx]
        val_common = common_data.iloc[val_idx]
        train_common_embeddings = common_embeddings[train_idx]
        val_common_embeddings = common_embeddings[val_idx]
        break

    train_data = pd.concat([train_rare, train_common]).reset_index(drop=True)
    val_data = pd.concat([val_rare, val_common]).reset_index(drop=True)

    train_embeddings = np.concatenate([train_rare_embeddings, train_common_embeddings], axis=0)
    val_embeddings = np.concatenate([val_rare_embeddings, val_common_embeddings], axis=0)

    return (train_data, train_embeddings), (val_data, val_embeddings)

(dataset_train, train_embeddings), (dataset_val, val_embeddings) = stratified_train_val_split_with_embeddings(
    dataset,
    all_embeddings,
    labels_column="subnarratives_encoded",
    min_instances=2
)

In [11]:
dataset_train.shape

(1360, 7)

In [12]:
train_embeddings.shape

(1360, 896)
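
Since the goal of the split is to keep rare and common labels represented in both sets, a quick check of per-label counts can be useful. The helper below is a hypothetical sketch (the name `label_coverage` is not part of the notebook) that counts positives per label in each split and lists labels that occur in validation but never in training.

```python
import numpy as np

def label_coverage(train_labels, val_labels):
    """Return per-label positive counts for both splits and the labels missing from training."""
    train_counts = np.asarray(train_labels).sum(axis=0)
    val_counts = np.asarray(val_labels).sum(axis=0)
    # Labels the model can never learn because they only occur in validation.
    missing_in_train = np.where((train_counts == 0) & (val_counts > 0))[0]
    return train_counts, val_counts, missing_in_train

# Toy example with three labels.
toy_train = [[1, 0, 0], [1, 1, 0]]
toy_val = [[0, 1, 1]]
train_counts, val_counts, missing = label_coverage(toy_train, toy_val)
print(train_counts, val_counts, missing)
```

On the real data it could be called as `label_coverage(dataset_train['subnarratives_encoded'].tolist(), dataset_val['subnarratives_encoded'].tolist())`.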

### 1.3 Creating baseline models

### Weighted One-vs-Rest Logistic Regression

We start with a One-vs-Rest logistic regression model, which handles multi-label classification by training one binary classifier per label.
* The `class_weight='balanced'` parameter compensates for label imbalance by weighting each class inversely to its frequency, so underrepresented classes count more.
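
As a small worked example of what `'balanced'` means, the sketch below computes the weights for a toy binary label with 90 negatives and 10 positives. scikit-learn documents the formula as `n_samples / (n_classes * np.bincount(y))`; the toy data is an assumption made for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy binary label: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)

weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# Same result from the documented formula.
manual = len(y) / (2 * np.bincount(y))

print(weights)  # the rare positive class gets the larger weight
print(manual)
```

Inside `OneVsRestClassifier`, each per-label logistic regression sees its own binary target, so the weights are computed separately for every label.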

In [13]:
y_train_nar = dataset_train['narratives_encoded'].tolist()
y_val_nar = dataset_val['narratives_encoded'].tolist()

In [14]:
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

ovr_logistic_nar = OneVsRestClassifier(LogisticRegression(max_iter=1000, class_weight='balanced'))

We fit the model on the training embeddings:

In [15]:
ovr_logistic_nar.fit(train_embeddings, y_train_nar)

We also define a few evaluation functions so they can be reused and extended later:

In [16]:
import warnings
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, KFold

def get_classification_report(y_true, y_pred):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        report = classification_report(y_true, y_pred, output_dict=True)
    report_df = pd.DataFrame(report).transpose()
    return report_df

def get_cross_val_score(model, x, y, scoring='f1_macro', splits=3):
    """Perform cross-validation and compute scores."""
    # StratifiedKFold rejects multilabel indicator targets, so plain KFold is used here.
    cv = KFold(n_splits=splits, shuffle=True)
    cross_val_scores = cross_val_score(model, x, y, cv=cv, scoring=scoring)
    print(f"Cross-validation scores: {cross_val_scores}")
    print(f"Mean CV F1 Score: {cross_val_scores.mean()}")

In [17]:
import warnings
from sklearn.metrics import (
    hamming_loss,
)

def evaluate_model(model, x, y_true):
    y_pred = model.predict(x)

    classification_report_df = get_classification_report(y_true, y_pred)
    print("Classification Report:")
    print(classification_report_df)
    print("\n")

    hamming = hamming_loss(y_true, y_pred)
    print(f"Hamming Loss: {hamming:.4f}")
    print("\n")

In [18]:
evaluate_model(ovr_logistic_nar, val_embeddings, y_val_nar)

Classification Report:
              precision    recall  f1-score  support
0              0.742424  1.000000  0.852174     49.0
1              0.465909  0.820000  0.594203     50.0
2              0.186916  0.588235  0.283688     34.0
3              0.250000  1.000000  0.400000      3.0
4              0.266667  0.666667  0.380952      6.0
5              0.333333  0.800000  0.470588     10.0
6              0.365385  0.863636  0.513514     22.0
7              0.481481  0.764706  0.590909     34.0
8              0.600000  0.892857  0.717703     84.0
9              0.491525  0.828571  0.617021     70.0
10             0.147059  0.555556  0.232558      9.0
11             0.333333  0.818182  0.473684     11.0
12             0.000000  0.000000  0.000000      2.0
13             0.357143  0.625000  0.454545     16.0
14             0.253731  0.944444  0.400000     18.0
15             0.517647  0.721311  0.602740     61.0
16             0.058824  0.250000  0.095238      8.0
17             0.427419

We repeat the same experiment for subnarratives:

In [19]:
y_train_sub_nar = dataset_train['subnarratives_encoded'].tolist()
y_val_sub_nar = dataset_val['subnarratives_encoded'].tolist()

In [20]:
ovr_logistic_sub = OneVsRestClassifier(LogisticRegression(max_iter=1000, class_weight='balanced'))
ovr_logistic_sub.fit(train_embeddings, y_train_sub_nar)

In [21]:
evaluate_model(ovr_logistic_sub, val_embeddings, y_val_sub_nar)

Classification Report:
              precision    recall  f1-score  support
0              0.235294  1.000000  0.380952      4.0
1              0.629630  0.944444  0.755556     36.0
2              0.230769  0.600000  0.333333      5.0
3              0.170732  0.538462  0.259259     13.0
4              0.500000  0.500000  0.500000      2.0
...                 ...       ...       ...      ...
73             0.250000  1.000000  0.400000      2.0
micro avg      0.258696  0.640646  0.368564    743.0
macro avg      0.176396  0.466949  0.246406    743.0
weighted avg   0.379833  0.640646  0.435697    743.0
samples avg    0.304863  0.673133  0.383083    743.0

[78 rows x 4 columns]


Hamming Loss: 0.0650
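
A low Hamming loss should be read with care here: with many labels and only a few active per article, most entries of the label matrix are zeros, so even a model that predicts nothing scores well. The toy sketch below (made-up data, not the notebook's) shows a Hamming loss of 0.05 alongside a macro F1 of 0 for an all-zero prediction.

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# 4 articles, 10 labels, only 2 positive entries in total.
y_true = np.zeros((4, 10), dtype=int)
y_true[0, 0] = 1
y_true[1, 1] = 1

# A degenerate model that never predicts any label.
y_pred = np.zeros_like(y_true)

h = hamming_loss(y_true, y_pred)                                  # 2 wrong entries out of 40
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)  # no label is ever recovered

print(f"Hamming loss: {h:.2f}, macro F1: {f1:.2f}")
```

This is why the macro and micro F1 rows of the classification report are the more informative numbers for this task.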




### Building a simple multi-task Neural Network

Another option is a simple "multi-task" model with two heads on top of a shared layer:
* The first head predicts narratives.
* The second head predicts subnarratives.

In [22]:
import torch

train_embeddings_tensor = torch.tensor(train_embeddings, dtype=torch.float32)
val_embeddings_tensor = torch.tensor(val_embeddings, dtype=torch.float32)

In [23]:
input_size = train_embeddings_tensor.shape[1]
print(input_size)

896


* The model was finalised after many experiments. The BatchNorm + ReLU combination noticeably improved performance in these experiments by stabilizing training and speeding up convergence.
* The model also overfits very quickly when it becomes more complex.

In [24]:
import torch
import torch.nn as nn

class MultiTaskClassifier(nn.Module):
    def __init__(self,
                 input_size,
                 hidden_size,
                 num_narratives=len(mlb_narratives.classes_),
                 num_subnarratives=len(mlb_subnarratives.classes_),
                 dropout_rate=0.3
                ):

        super(MultiTaskClassifier, self).__init__()

        self.shared_layer = nn.Sequential(
            nn.Linear(input_size, hidden_size * 2),
            nn.BatchNorm1d(hidden_size * 2),
            nn.ReLU(),
            nn.Dropout(dropout_rate)
        )

        self.narrative_head = nn.Sequential(
            nn.Linear(hidden_size * 2, num_narratives),
            nn.Sigmoid()
        )

        self.subnarrative_head = nn.Sequential(
            nn.Linear(hidden_size * 2, num_subnarratives),
            nn.Sigmoid()
        )

    def forward(self, x):
        shared_output = self.shared_layer(x)
        narratives = self.narrative_head(shared_output)
        subnarratives = self.subnarrative_head(shared_output)
        return narratives, subnarratives

In [25]:
network_params = {
    'lr': 0.001,
    'hidden_size': 512
}

In [26]:
simple_model = MultiTaskClassifier(input_size=input_size, hidden_size=network_params['hidden_size'])
narratives, subnarratives = simple_model(train_embeddings_tensor)
print(narratives.shape, subnarratives.shape)

torch.Size([1360, 21]) torch.Size([1360, 74])


In [27]:
print(simple_model)

MultiTaskClassifier(
  (shared_layer): Sequential(
    (0): Linear(in_features=896, out_features=1024, bias=True)
    (1): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.3, inplace=False)
  )
  (narrative_head): Sequential(
    (0): Linear(in_features=1024, out_features=21, bias=True)
    (1): Sigmoid()
  )
  (subnarrative_head): Sequential(
    (0): Linear(in_features=1024, out_features=74, bias=True)
    (1): Sigmoid()
  )
)


We convert the label lists to tensors:

In [28]:
y_train_nar = torch.tensor(y_train_nar, dtype=torch.float32)
y_train_sub_nar = torch.tensor(y_train_sub_nar, dtype=torch.float32)

y_val_nar = torch.tensor(y_val_nar, dtype=torch.float32)
y_val_sub_nar = torch.tensor(y_val_sub_nar, dtype=torch.float32)

We calculate class weights to handle label imbalance in the training data.
* For each label, the positive and negative classes are weighted inversely to their frequency (`total / (2 * count)`), so rare labels are given higher importance.
* The custom `WeightedBCELoss` applies these weights during training to balance the impact of common and rare labels, preventing the model from focusing only on frequent ones.
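
The same weighting scheme can also be written without Python loops. The sketch below is a vectorized variant under the assumption that every label uses the `total / (2 * count)` weights described above; the function names `class_weight_vectors` and `weighted_bce` are hypothetical and not part of the notebook.

```python
import torch

def class_weight_vectors(y_train):
    """Per-label positive/negative weights: total / (2 * count), or 0 when count is 0."""
    total = y_train.shape[0]
    pos = y_train.sum(dim=0)
    neg = total - pos
    # clamp avoids division by zero; torch.where then zeroes out the empty classes.
    pos_w = torch.where(pos > 0, total / (2 * pos.clamp(min=1)), torch.zeros_like(pos))
    neg_w = torch.where(neg > 0, total / (2 * neg.clamp(min=1)), torch.zeros_like(neg))
    return pos_w, neg_w

def weighted_bce(probs, targets, pos_w, neg_w, eps=1e-7):
    """Weighted binary cross-entropy averaged over all samples and labels."""
    loss = -(pos_w * targets * torch.log(probs + eps)
             + neg_w * (1 - targets) * torch.log(1 - probs + eps))
    return loss.mean()

# Toy batch: 4 samples, 2 labels.
y = torch.tensor([[1., 0.], [0., 0.], [1., 1.], [0., 0.]])
probs = torch.tensor([[0.9, 0.2], [0.1, 0.3], [0.6, 0.7], [0.4, 0.1]])

pos_w, neg_w = class_weight_vectors(y)
loss = weighted_bce(probs, y, pos_w, neg_w)
print(pos_w, neg_w, loss.item())
```

Because every label has the same number of samples, the mean over all entries equals the average of the per-label means computed in the loop-based version.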

In [29]:
import torch
import torch.nn as nn

def compute_class_weights(y_train):
    total_samples = y_train.shape[0]
    class_weights = []
    for label in range(y_train.shape[1]):
        pos_count = y_train[:, label].sum().item()
        neg_count = total_samples - pos_count
        pos_weight = total_samples / (2 * pos_count) if pos_count > 0 else 0
        neg_weight = total_samples / (2 * neg_count) if neg_count > 0 else 0
        class_weights.append((pos_weight, neg_weight))
    return class_weights

class WeightedBCELoss(nn.Module):
    def __init__(self, class_weights):
        super().__init__()
        self.class_weights = class_weights

    def forward(self, probs, targets):
        bce_loss = 0
        epsilon = 1e-7
        for i, (pos_weight, neg_weight) in enumerate(self.class_weights):
            prob = probs[:, i]
            bce = -pos_weight * targets[:, i] * torch.log(prob + epsilon) - \
                  neg_weight * (1 - targets[:, i]) * torch.log(1 - prob + epsilon)
            bce_loss += bce.mean()
        return bce_loss / len(self.class_weights)

class_weights_nar = compute_class_weights(y_train_nar)
narrative_criterion = WeightedBCELoss(class_weights_nar)

In [30]:
class_weights_sub_nar = compute_class_weights(y_train_sub_nar)
subnarrative_criterion = WeightedBCELoss(class_weights_sub_nar)

In [31]:
optimizer = torch.optim.Adam(simple_model.parameters(), lr=network_params['lr'])

In [32]:
def train_with_early_stopping(
    model,
    optimizer,
    narrative_criterion,
    subnarrative_criterion,
    train_embeddings=train_embeddings_tensor,
    y_train_nar=y_train_nar,
    y_train_sub_nar=y_train_sub_nar,
    val_embeddings=val_embeddings_tensor,
    y_val_nar=y_val_nar,
    y_val_sub_nar=y_val_sub_nar,
    patience=3,
    num_epochs=100,
):
    best_val_loss = float('inf')
    best_model = None
    patience_counter = 0

    for epoch in range(num_epochs):
        model.train()
        narratives, subnarratives = model(train_embeddings)

        narrative_loss = narrative_criterion(narratives, y_train_nar)
        subnarrative_loss = subnarrative_criterion(subnarratives, y_train_sub_nar)
        loss = narrative_loss + subnarrative_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_narratives, val_subnarratives = model(val_embeddings)
            val_narrative_loss = narrative_criterion(val_narratives, y_val_nar)
            val_subnarrative_loss = subnarrative_criterion(val_subnarratives, y_val_sub_nar)
            val_loss = val_narrative_loss + val_subnarrative_loss

        print(f"Epoch {epoch+1}/{num_epochs}, "
              f"Training Loss: {loss.item():.4f} "
              f"(Narrative: {narrative_loss.item():.4f}, Subnarrative: {subnarrative_loss.item():.4f}), "
              f"Validation Loss: {val_loss.item():.4f} "
              f"(Narrative: {val_narrative_loss.item():.4f}, Subnarrative: {val_subnarrative_loss.item():.4f})")

        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()
            patience_counter = 0
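            # Clone the tensors: state_dict() shares memory with the live parameters,
            # so an uncopied reference would follow later optimizer updates.
            best_model = {k: v.clone() for k, v in model.state_dict().items()}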
        else:
            patience_counter += 1
            print(f"Validation loss did not improve for {patience_counter} epoch(s).")

        if patience_counter >= patience:
            print("Early stopping triggered.")
            break

    if best_model:
        model.load_state_dict(best_model)

    return model
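
One detail worth keeping in mind when tracking the best weights for early stopping: the tensors returned by `state_dict()` share memory with the model's parameters. The sketch below illustrates this on a toy `nn.Linear` layer (an illustrative stand-in, not the notebook's model) by comparing an uncopied reference with a deep copy after an in-place update.

```python
import copy

import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

alias = layer.state_dict()                    # shares storage with the parameters
snapshot = copy.deepcopy(layer.state_dict())  # independent copy of the current weights

# Simulate an optimizer step with an in-place update.
with torch.no_grad():
    layer.weight.add_(1.0)

print(torch.equal(alias["weight"], layer.weight))     # the alias followed the update
print(torch.equal(snapshot["weight"], layer.weight))  # the snapshot kept the old values
```

Restoring from a copied snapshot is what actually brings back the weights from the best epoch.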

In [33]:
trained_simple_model = train_with_early_stopping(
    model=simple_model,
    optimizer=optimizer,
    narrative_criterion=narrative_criterion,
    subnarrative_criterion=subnarrative_criterion,
)

Epoch 1/100, Training Loss: 1.4581 (Narrative: 0.7245, Subnarrative: 0.7336), Validation Loss: 1.4571 (Narrative: 0.7117, Subnarrative: 0.7454)
Epoch 2/100, Training Loss: 1.1694 (Narrative: 0.5733, Subnarrative: 0.5961), Validation Loss: 1.4486 (Narrative: 0.7066, Subnarrative: 0.7420)
Epoch 3/100, Training Loss: 1.0314 (Narrative: 0.5058, Subnarrative: 0.5256), Validation Loss: 1.4406 (Narrative: 0.7018, Subnarrative: 0.7388)
Epoch 4/100, Training Loss: 0.9288 (Narrative: 0.4545, Subnarrative: 0.4743), Validation Loss: 1.4332 (Narrative: 0.6974, Subnarrative: 0.7359)
Epoch 5/100, Training Loss: 0.8533 (Narrative: 0.4197, Subnarrative: 0.4336), Validation Loss: 1.4261 (Narrative: 0.6932, Subnarrative: 0.7329)
Epoch 6/100, Training Loss: 0.7863 (Narrative: 0.3868, Subnarrative: 0.3995), Validation Loss: 1.4189 (Narrative: 0.6890, Subnarrative: 0.7300)
Epoch 7/100, Training Loss: 0.7328 (Narrative: 0.3634, Subnarrative: 0.3694), Validation Loss: 1.4113 (Narrative: 0.6843, Subnarrative: 

In [34]:
target_names_nar = mlb_narratives.classes_
target_names_sub = mlb_subnarratives.classes_

In [35]:
from sklearn.metrics import classification_report, f1_score

def evaluate_model(
    model,
    embeddings,
    y_nar_true,
    y_sub_nar_true,
    thresholds=np.arange(0.1, 1.0, 0.1),
    target_names_nar=target_names_nar,
    target_names_sub=target_names_sub
):
    best_threshold = 0
    best_f1 = 0
    best_classification_report_nar = None
    best_classification_report_sub = None

    for threshold in thresholds:
        with torch.no_grad():
            nar_pred_logits, sub_nar_pred_logits = model(embeddings)

            nar_predictions = (nar_pred_logits >= threshold).int().cpu().numpy()
            sub_nar_predictions = (sub_nar_pred_logits >= threshold).int().cpu().numpy()

            y_nar_true_np = y_nar_true.cpu().numpy()
            y_sub_nar_true_np = y_sub_nar_true.cpu().numpy()

            classification_rep_nar = classification_report(
                y_nar_true_np, nar_predictions, target_names=target_names_nar, zero_division=0
            )
            classification_rep_sub = classification_report(
                y_sub_nar_true_np, sub_nar_predictions, target_names=target_names_sub, zero_division=0
            )
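            # zero_division=0 avoids undefined-metric warnings for labels that are never predicted
            f1_nar = f1_score(y_nar_true_np, nar_predictions, average='macro', zero_division=0)
            f1_sub = f1_score(y_sub_nar_true_np, sub_nar_predictions, average='macro', zero_division=0)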

            avg_f1 = (f1_nar + f1_sub) / 2

            if avg_f1 > best_f1:
                best_f1 = avg_f1
                best_threshold = threshold
                best_classification_report_nar = classification_rep_nar
                best_classification_report_sub = classification_rep_sub

    print(f"Best Threshold: {best_threshold}, Best F1 Score: {best_f1}")
    print("\nBest Narratives Classification Report:")
    print(best_classification_report_nar)
    print("\nBest Sub-Narratives Classification Report:")
    print(best_classification_report_sub)
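
The threshold search can be isolated into a small helper that works on precomputed probabilities, which avoids re-running the model for every threshold. This is a sketch of the same idea, not the notebook's function: the name `best_global_threshold` and the toy arrays are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_global_threshold(probs, y_true, thresholds=np.arange(0.1, 1.0, 0.1)):
    """Return the single threshold (and its macro F1) that maximizes macro F1."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        preds = (probs >= t).astype(int)
        score = f1_score(y_true, preds, average="macro", zero_division=0)
        if score > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_t, best_f1 = float(t), score
    return best_t, best_f1

# Toy probabilities for 3 samples and 2 labels.
toy_probs = np.array([[0.90, 0.25], [0.45, 0.80], [0.35, 0.15]])
toy_true = np.array([[1, 0], [1, 1], [0, 0]])

t, f1 = best_global_threshold(toy_probs, toy_true)
print(f"best threshold: {t:.1f}, macro F1: {f1:.2f}")
```

Note that a threshold tuned on the validation set gives an optimistic score for that same set, so a separate held-out set is needed for an unbiased estimate.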

In [36]:
evaluate_model(
    model=trained_simple_model,
    embeddings=val_embeddings_tensor,
    y_nar_true=y_val_nar,
    y_sub_nar_true=y_val_sub_nar,
)

Best Threshold: 0.5, Best F1 Score: 0.38925771652515845

Best Narratives Classification Report:
                                                   precision    recall  f1-score   support

                         Amplifying Climate Fears       0.71      0.96      0.82        49
                     Amplifying war-related fears       0.56      0.66      0.61        50
Blaming the war on others rather than the invader       0.23      0.65      0.34        34
                     Climate change is beneficial       0.33      0.33      0.33         3
             Controversy about green technologies       0.56      0.83      0.67         6
                    Criticism of climate movement       0.36      0.90      0.51        10
                    Criticism of climate policies       0.47      0.77      0.59        22
        Criticism of institutions and authorities       0.47      0.79      0.59        34
                             Discrediting Ukraine       0.63      0.89      0.74    



In [37]:
import torch
import joblib

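def save_model(model, save_path=""):
    # MultiTaskClassifier does not store hidden_size or dropout_rate as attributes,
    # so they are recovered from the layers of the shared block.
    first_linear = model.shared_layer[0]
    dropout = model.shared_layer[3]
    torch.save({
        'model_state_dict': model.state_dict(),
        'input_size': first_linear.in_features,
        'hidden_size': first_linear.out_features // 2,  # layer width is hidden_size * 2
        'num_narratives': len(mlb_narratives.classes_),
        'num_subnarratives': len(mlb_subnarratives.classes_),
        'dropout_rate': dropout.p
    }, save_path)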

    joblib.dump(mlb_narratives, 'mlb_narratives.pkl')
    joblib.dump(mlb_subnarratives, 'mlb_subnarratives.pkl')

    print(f"Model saved to {save_path}")

### Adjusting the loss to account for the hierarchy

In [38]:
# Map each subnarrative index to its parent narrative index.
# Note: 'Other' (index 33) appears under every narrative, so it keeps only the
# last parent assigned in this loop.
subnarrative_to_narrative_map = {}
for narrative_idx, subnarrative_indices in narrative_to_sub_map.items():
    for subnarrative_idx in subnarrative_indices:
        subnarrative_to_narrative_map[subnarrative_idx] = narrative_idx

print(subnarrative_to_narrative_map)

{22: 8, 66: 8, 64: 8, 33: 12, 50: 8, 21: 8, 39: 8, 65: 8, 20: 8, 58: 9, 56: 9, 53: 9, 19: 9, 60: 9, 71: 9, 35: 17, 42: 17, 46: 17, 41: 17, 34: 17, 40: 19, 63: 19, 59: 19, 72: 10, 69: 10, 43: 1, 3: 1, 62: 1, 31: 1, 32: 16, 57: 16, 55: 16, 67: 2, 54: 2, 44: 20, 45: 20, 68: 20, 2: 13, 6: 13, 47: 14, 61: 14, 73: 0, 24: 0, 23: 0, 1: 0, 14: 7, 17: 7, 16: 7, 15: 7, 8: 5, 9: 5, 0: 5, 29: 11, 51: 11, 7: 11, 28: 11, 27: 11, 4: 11, 70: 11, 49: 11, 11: 6, 12: 6, 10: 6, 48: 18, 18: 18, 26: 18, 30: 18, 52: 3, 5: 3, 36: 4, 37: 4, 38: 4, 25: 12, 13: 12}
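
In the printed map, index 33 ('Other') ends up with a single parent (12) even though it is listed under every narrative in `narrative_to_sub_map`, because a dictionary keeps only the last value assigned to a key. One possible way to keep all parents is a binary narrative-by-subnarrative matrix. The sketch below uses a toy map and a hypothetical helper name, `build_parent_matrix`.

```python
import numpy as np

def build_parent_matrix(narrative_to_sub, num_narratives, num_subs):
    """Binary matrix with parents[n, s] = 1 when subnarrative s belongs to narrative n."""
    parents = np.zeros((num_narratives, num_subs), dtype=int)
    for narrative_idx, sub_indices in narrative_to_sub.items():
        parents[narrative_idx, sub_indices] = 1
    return parents

# Toy hierarchy: subnarrative 2 plays the role of a shared 'Other' label.
toy_map = {0: [0, 2], 1: [1, 2]}
parents = build_parent_matrix(toy_map, num_narratives=2, num_subs=3)

# Subnarratives with more than one parent.
shared_subs = np.where(parents.sum(axis=0) > 1)[0]
print(parents)
print(shared_subs)
```

With such a matrix, a subnarrative could be treated as allowed whenever any of its parents is active, instead of only the last one recorded in the dictionary.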


In [39]:
def hierarchical_loss_groundtruth_gating(
    narr_probs,
    sub_probs,
    y_narr,
    y_sub,
    parent_of_sub,
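):
    """Narrative loss plus a subnarrative loss gated by the ground-truth parent.

    Subnarrative predictions only contribute for samples whose parent narrative
    is active in y_narr; elsewhere both probability and target are set to 0.
    Uses the module-level narrative_criterion and subnarrative_criterion.
    """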
    narr_loss = narrative_criterion(narr_probs, y_narr)

    batch_size, num_subs = sub_probs.size()
    mask = torch.zeros_like(sub_probs)
    # The "target" for those subnarratives that are masked out = 0
    sub_labels_masked = torch.zeros_like(y_sub)

    for s in range(num_subs):
        p = parent_of_sub[s]  # parent narrative index
        # Indices in the batch where parent is 1
        # active_indices = (y_narr[:, p] == 1).nonzero(as_tuple=True)[0]
        active_indices = (y_narr[:, p] == 1)
        # Turn on mask for these subnarratives
        mask[active_indices, s] = 1

        # Also copy the actual sub-label from y_sub for these active samples
        sub_labels_masked[active_indices, s] = y_sub[active_indices, s]

    masked_sub_probs = sub_probs * mask
    sub_loss = subnarrative_criterion(masked_sub_probs, sub_labels_masked)

    total_loss = narr_loss + sub_loss
    return total_loss
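
The gating mask built in the loop above can also be obtained with a single indexing operation. The sketch below assumes `parent_of_sub` has exactly one entry for each subnarrative index `0..num_subs-1`; the helper name `parent_gating_mask` is hypothetical.

```python
import torch

def parent_gating_mask(y_narr, parent_of_sub):
    """mask[b, s] = 1 when the parent narrative of subnarrative s is active for sample b."""
    num_subs = len(parent_of_sub)
    # Column s of the mask is the ground-truth column of its parent narrative.
    parent_index = torch.tensor([parent_of_sub[s] for s in range(num_subs)])
    return (y_narr[:, parent_index] == 1).float()

# Toy batch: 2 samples, 2 narratives, 3 subnarratives.
y_narr = torch.tensor([[1., 0.], [0., 1.]])
parent_of_sub = {0: 0, 1: 0, 2: 1}

mask = parent_gating_mask(y_narr, parent_of_sub)
print(mask)
```

The masked targets then follow as `y_sub * mask`, and the masked probabilities as `sub_probs * mask`, matching what the loop computes column by column.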

In [40]:
def train_with_early_stopping_hierarchical(
    model,
    optimizer,
    narrative_criterion,
    subnarrative_criterion,
    train_embeddings=train_embeddings_tensor,
    y_train_nar=y_train_nar,
    y_train_sub_nar=y_train_sub_nar,
    val_embeddings=val_embeddings_tensor,
    y_val_nar=y_val_nar,
    y_val_sub_nar=y_val_sub_nar,
    patience=3,
    num_epochs=100,
):
    best_val_loss = float('inf')
    best_model = None
    patience_counter = 0

    for epoch in range(num_epochs):
        model.train()
        narratives, subnarratives = model(train_embeddings)

        train_loss = hierarchical_loss_groundtruth_gating(narratives, subnarratives, y_train_nar, y_train_sub_nar, subnarrative_to_narrative_map)

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_narratives, val_subnarratives = model(val_embeddings)
            val_loss = hierarchical_loss_groundtruth_gating(val_narratives, val_subnarratives, y_val_nar, y_val_sub_nar, subnarrative_to_narrative_map)

        print(f"Epoch {epoch+1}/{num_epochs}, "
              f"Training Loss: {train_loss.item():.4f} "
              f"Validation Loss: {val_loss.item():.4f} ")

        if val_loss.item() < best_val_loss:
            best_val_loss = val_loss.item()
            patience_counter = 0
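            # Clone the tensors: state_dict() shares memory with the live parameters,
            # so an uncopied reference would follow later optimizer updates.
            best_model = {k: v.clone() for k, v in model.state_dict().items()}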
        else:
            patience_counter += 1
            print(f"Validation loss did not improve for {patience_counter} epoch(s).")

        if patience_counter >= patience:
            print("Early stopping triggered.")
            break

    if best_model:
        model.load_state_dict(best_model)

    return model

In [41]:
model_hierarchy_loss = MultiTaskClassifier(input_size=input_size, hidden_size=512)

In [42]:
optimizer = torch.optim.Adam(model_hierarchy_loss.parameters(), lr=0.001)

In [43]:
trained_hierarchy = train_with_early_stopping_hierarchical(
    model=model_hierarchy_loss,
    optimizer=optimizer,
    narrative_criterion=narrative_criterion,
    subnarrative_criterion=subnarrative_criterion,
)

Epoch 1/100, Training Loss: 1.1153 Validation Loss: 1.1281 
Epoch 2/100, Training Loss: 0.8139 Validation Loss: 1.1132 
Epoch 3/100, Training Loss: 0.6702 Validation Loss: 1.0992 
Epoch 4/100, Training Loss: 0.5879 Validation Loss: 1.0864 
Epoch 5/100, Training Loss: 0.5306 Validation Loss: 1.0753 
Epoch 6/100, Training Loss: 0.4953 Validation Loss: 1.0653 
Epoch 7/100, Training Loss: 0.4709 Validation Loss: 1.0558 
Epoch 8/100, Training Loss: 0.4480 Validation Loss: 1.0465 
Epoch 9/100, Training Loss: 0.4330 Validation Loss: 1.0370 
Epoch 10/100, Training Loss: 0.4146 Validation Loss: 1.0273 
Epoch 11/100, Training Loss: 0.4013 Validation Loss: 1.0174 
Epoch 12/100, Training Loss: 0.3889 Validation Loss: 1.0076 
Epoch 13/100, Training Loss: 0.3757 Validation Loss: 0.9982 
Epoch 14/100, Training Loss: 0.3647 Validation Loss: 0.9895 
Epoch 15/100, Training Loss: 0.3532 Validation Loss: 0.9813 
Epoch 16/100, Training Loss: 0.3438 Validation Loss: 0.9734 
Epoch 17/100, Training Loss: 0.33

In [44]:
from sklearn.metrics import classification_report, f1_score
import numpy as np

def evaluate_model_h(
    model,
    embeddings,
    y_nar_true,
    y_sub_nar_true,
    parent_of_sub,
    thresholds=np.arange(0.1, 1.0, 0.1),
    target_names_nar=mlb_narratives.classes_,
    target_names_sub=mlb_subnarratives.classes_,
):
    best_threshold = 0
    best_f1 = 0
    best_classification_report_nar = None
    best_classification_report_sub = None

    y_nar_true_np = y_nar_true.cpu().numpy()
    y_sub_nar_true_np = y_sub_nar_true.cpu().numpy()

    with torch.no_grad():
        nar_pred_logits, sub_nar_pred_logits = model(embeddings)
        nar_pred_logits = nar_pred_logits.cpu().numpy()
        sub_nar_pred_logits = sub_nar_pred_logits.cpu().numpy()

    for threshold in thresholds:
        nar_predictions = (nar_pred_logits >= 0.5).astype(int)

        sub_nar_predictions = (sub_nar_pred_logits >= threshold).astype(int)

        for s in range(sub_nar_predictions.shape[1]):
            p = parent_of_sub[s]
            sub_nar_predictions[:, s] = sub_nar_predictions[:, s] * nar_predictions[:, p]

        classification_rep_nar = classification_report(
            y_nar_true_np, nar_predictions,
            target_names=target_names_nar, zero_division=0
        )
        classification_rep_sub = classification_report(
            y_sub_nar_true_np, sub_nar_predictions,
            target_names=target_names_sub, zero_division=0
        )

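        # zero_division=0 avoids undefined-metric warnings for labels that are never predicted
        f1_nar = f1_score(y_nar_true_np, nar_predictions, average='macro', zero_division=0)
        f1_sub = f1_score(y_sub_nar_true_np, sub_nar_predictions, average='macro', zero_division=0)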

        avg_f1 = (f1_nar + f1_sub) / 2.0

        if avg_f1 > best_f1:
            best_f1 = avg_f1
            best_threshold = threshold
            best_classification_report_nar = classification_rep_nar
            best_classification_report_sub = classification_rep_sub

    print(f"Best Threshold: {best_threshold}, Best F1 Score (avg nar/sub): {best_f1:.3f}\n")
    print("\nBest Narratives Classification Report:")
    print(best_classification_report_nar)
    print("\nBest Sub-Narratives Classification Report:")
    print(best_classification_report_sub)

In [45]:
evaluate_model_h(
    model=trained_hierarchy,
    embeddings=val_embeddings_tensor,
    y_nar_true=y_val_nar,
    parent_of_sub=subnarrative_to_narrative_map,
    y_sub_nar_true=y_val_sub_nar,
)

Best Threshold: 0.6, Best F1 Score (avg nar/sub): 0.358


Best Narratives Classification Report:
                                                   precision    recall  f1-score   support

                         Amplifying Climate Fears       0.72      0.96      0.82        49
                     Amplifying war-related fears       0.55      0.74      0.63        50
Blaming the war on others rather than the invader       0.19      0.62      0.29        34
                     Climate change is beneficial       0.00      0.00      0.00         3
             Controversy about green technologies       0.44      0.67      0.53         6
                    Criticism of climate movement       0.37      0.70      0.48        10
                    Criticism of climate policies       0.38      0.82      0.52        22
        Criticism of institutions and authorities       0.52      0.76      0.62        34
                             Discrediting Ukraine       0.63      0.87      0.73   

