# SemEval 2025 Task 10
### Subtask 2: Narrative Baseline Classification -- Multilingual

Given a news article and a [two-level taxonomy of narrative labels](https://propaganda.math.unipd.it/semeval2025task10/NARRATIVE-TAXONOMIES.pdf) (where each narrative is subdivided into subnarratives) from a particular domain, assign to the article all the appropriate subnarrative labels. This is a multi-label multi-class document classification task.

## 1. Setup

### 1.1 Getting and analyzing data

In [1]:
import pandas as pd
import numpy as np

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import ModelCheckpoint

from matplotlib import pyplot as plt
import seaborn as sns
import os

In [2]:
root_dir = '../../'

We now read the data. The files are structured so that each article (document) is stored in a folder corresponding to its language:

In [3]:
data = []
ignore_folders = ['.DS_Store']

base_dir_documents = root_dir + 'data/semeval_data/train/raw-documents'

for language_folder in os.listdir(base_dir_documents):

    if language_folder in ignore_folders:
        continue

    language_path = os.path.join(base_dir_documents, language_folder)
    if os.path.isdir(language_path):
        for root, _, files in os.walk(language_path):
            for file in files:
                if file.endswith('.txt'):
                    file_path = os.path.join(root, file)

                    article_id = file
                    with open(file_path, 'r', encoding='utf-8') as f:
                        content = f.read()

                    data.append({
                        'language': language_folder,
                        'article_id': article_id,
                        'content': content
                    })

documents_df = pd.DataFrame(data)

This is what the dataframe looks like:

In [4]:
print(documents_df.shape)
documents_df.head()

(1709, 3)


Unnamed: 0,language,article_id,content
0,RU,RU-URW-1161.txt,В ближайшие два месяца США будут стремиться к ...
1,RU,RU-URW-1175.txt,В ЕС испугались последствий популярности правы...
2,RU,RU-URW-1149.txt,Возможность признания Аллы Пугачевой иностранн...
3,RU,RU-URW-1015.txt,Азаров рассказал о смене риторики Киева по пер...
4,RU,RU-URW-1001.txt,В россиянах проснулась массовая любовь к путеш...


The dataframe contains articles in 5 different languages:

In [5]:
documents_df['language'].unique()

array(['RU', 'PT', 'BG', 'HI', 'EN'], dtype=object)

The labels are structured as follows:
* Each line contains:
  - `article_id` (the file name of the article)
  - `narratives`: one or more narrative labels (1st level taxonomy)
  - `subnarratives`: one or more sub-narrative labels (2nd level taxonomy)
  
If no specific narrative or subnarrative is assigned, "Other" is used. If only a narrative is provided without a subnarrative, the format `[Narrative]: Other` is used.

**Example:**
```
article_id narratives subnarratives 

EN_10001.txt URW: Blaming Others URW: Ukraine is the aggressor
EN_10002.txt URW: Blaming Others; URW: Praise of Russia URW: Blaming Others: Other; URW: Praising Russia’s military might
EN_10003.txt Other Other
```
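A field in this format can be parsed with a small helper. The sketch below is a hypothetical `parse_subnarrative_field` (not part of the task's tooling) and assumes that every entry in the subnarrative column is either the bare word `Other` or has the three-part form `<domain>: <narrative>: <subnarrative>`, with entries separated by `;`:

```python
def parse_subnarrative_field(field):
    """Split a ';'-separated subnarrative field into (narrative, subnarrative) pairs.

    Assumes each entry is 'Other' or '<domain>: <narrative>: <subnarrative>'.
    """
    pairs = []
    for entry in field.split(';'):
        entry = entry.strip()
        if not entry:
            continue
        if entry == 'Other':
            pairs.append(('Other', 'Other'))
            continue
        # Drop the domain prefix (e.g. 'URW'), then split narrative from subnarrative.
        _domain, rest = entry.split(':', 1)
        narrative, subnarrative = rest.split(':', 1)
        pairs.append((narrative.strip(), subnarrative.strip()))
    return pairs


# Made-up example line in the assumed three-part format.
example = "URW: Blaming Others: Other; URW: Praise of Russia: Praising Russia's military might"
print(parse_subnarrative_field(example))
```

An entry that lacks the narrative part raises a `ValueError` here, which makes format surprises visible early instead of silently mislabeling an article.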

In [6]:
base_dir_labels = root_dir + 'data/semeval_data/train/labels'

raw_annotation_data = []

for language_folder in os.listdir(base_dir_labels):

    if language_folder in ignore_folders:
        continue

    print('Now processing language', language_folder)

    language_path = os.path.join(base_dir_labels, language_folder)
    if os.path.isdir(language_path):
        for root, _, files in os.walk(language_path):
            label_file = 'subtask-2-annotations.txt'
            file_path = os.path.join(root, label_file)

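            # Labels contain non-ASCII characters, so read the file explicitly as UTF-8.
            with open(file_path, 'r', encoding='utf-8') as file:
                for line in file:
                    parts = line.strip().split('\t')
                    article_id = parts[0]
                    # parts[2] is the subnarrative column: ';'-separated entries
                    # such as 'URW: Blaming Others: Other'.
                    narrative_to_subnarratives = parts[2].split(';')
                    narratives = []
                    subnarratives = []

                    for nar_to_sub in narrative_to_subnarratives:
                        subnarrative_list = nar_to_sub.split(' ')
                        if subnarrative_list[0] == 'Other':
                            narratives.append('Other')
                            subnarratives.append('Other')
                            continue

                        # Drop the leading domain token, then split narrative from subnarrative.
                        nar_to_sub = ' '.join(subnarrative_list[1:])
                        nar, sub = nar_to_sub.split(':')
                        narratives.append(nar.strip())
                        subnarratives.append(sub.strip())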

                    raw_annotation_data.append({
                        'article_id': article_id,
                        'narratives': narratives,
                        'subnarratives': subnarratives
                    })

annotations_df = pd.DataFrame(raw_annotation_data)

Now processing language RU
Now processing language PT
Now processing language BG
Now processing language HI
Now processing language EN


In [7]:
from collections import defaultdict

narrative_to_subnarratives = defaultdict(set)

for record in raw_annotation_data:
    narratives = record['narratives']
    subnarratives = record['subnarratives']

    for nar, sub in zip(narratives, subnarratives):
        narrative_to_subnarratives[nar].add(sub)

narrative_to_subnarratives = {nar: list(subs) for nar, subs in narrative_to_subnarratives.items()}

In [8]:
annotations_df.head()

Unnamed: 0,article_id,narratives,subnarratives
0,RU-URW-1080.txt,[Discrediting Ukraine],[Discrediting Ukrainian government and officia...
1,RU-URW-1013.txt,"[Discrediting the West, Diplomacy]","[The West does not care about Ukraine, only ab..."
2,RU-URW-1145.txt,[Praise of Russia],[Praise of Russian military might]
3,RU-URW-1048.txt,[Discrediting Ukraine],[Discrediting Ukrainian military]
4,RU-URW-1001.txt,[Praise of Russia],[Russia is a guarantor of peace and prosperity]


In [9]:
annotations_df.tail()

Unnamed: 0,article_id,narratives,subnarratives
1694,EN_CC_200022.txt,"[Criticism of institutions and authorities, Cr...","[Criticism of national governments, Other, Met..."
1695,EN_CC_100028.txt,[Other],[Other]
1696,EN_CC_300010.txt,[Amplifying Climate Fears],[Other]
1697,EN_UA_013257.txt,"[Russia is the Victim, Blaming the war on othe...",[Russia actions in Ukraine are only self-defen...
1698,EN_UA_000104.txt,[Other],[Other]


In [10]:
annotations_df.shape

(1699, 3)

In [11]:
dataset = pd.merge(documents_df, annotations_df, on='article_id')
dataset.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives
0,RU,RU-URW-1161.txt,В ближайшие два месяца США будут стремиться к ...,[Blaming the war on others rather than the inv...,"[The West are the aggressors, Other, The West ..."
1,RU,RU-URW-1175.txt,В ЕС испугались последствий популярности правы...,"[Discrediting the West, Diplomacy, Discreditin...","[The West is weak, Other, The EU is divided]"
2,RU,RU-URW-1149.txt,Возможность признания Аллы Пугачевой иностранн...,[Distrust towards Media],[Western media is an instrument of propaganda]
3,RU,RU-URW-1015.txt,Азаров рассказал о смене риторики Киева по пер...,"[Discrediting Ukraine, Discrediting Ukraine]","[Ukraine is a puppet of the West, Discrediting..."
4,RU,RU-URW-1001.txt,В россиянах проснулась массовая любовь к путеш...,[Praise of Russia],[Russia is a guarantor of peace and prosperity]


In [12]:
dataset.shape

(1699, 5)
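
The shapes above are consistent with `pd.merge` performing its default inner join: 1709 documents went in, 1699 annotation rows exist, and 1699 merged rows came out, so about ten documents appear to have no annotation. A minimal sketch for listing such documents with a set difference follows; `unlabeled_ids` and the toy ids are illustrative names, not part of the notebook.

```python
def unlabeled_ids(document_ids, annotated_ids):
    """Return the sorted ids that appear among the documents but have no annotation."""
    return sorted(set(document_ids) - set(annotated_ids))


# Toy ids for illustration; in the notebook these would come from
# documents_df['article_id'] and annotations_df['article_id'].
docs = ['EN_1.txt', 'EN_2.txt', 'RU_1.txt']
annotated = ['EN_1.txt', 'RU_1.txt']
print(unlabeled_ids(docs, annotated))
```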

This is what an English article looks like. Note that the article text is newline-separated: consecutive newlines (`\n\n`) mark the start of a new paragraph.

In [13]:
row = 5
english_article = dataset[dataset['language'] == 'EN'].iloc[row].content
english_article

'Trump Lawyer Demands Accountability From Intel Chiefs Who Backed Hunter Biden \n\n An attorney for former President Donald Trump wants the 51 former intelligence chiefs held responsible for backing Hunter Biden in the unfolding story of the laptop abandoned in a Delaware repair shop.\n\nLawyer Tim Parlatore\'s goal is to uncover alleged communications between the 51 former senior intel leaders and the Biden 2020 campaign.\n\nPolitico had reported that an Oct. 19, 2020, letter, signed by the former intelligence officials, outlined their assessment that a New York Post disclosure of emails allegedly belonging to Hunter Biden "has all the classic earmarks of a Russia information operation."\n\nThose signing the letter included former CIA Directors Leon Panetta, Mike Hayden and John Brennan, along with former Director of National Intelligence James Clapper.\n\nThe letter offered no evidence, but raised suspicions by the former intel officials.\n\nThe Post had previously reported that duri
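
Paragraph boundaries of this kind can be recovered by splitting on runs of two or more newlines. A minimal sketch, using a made-up sample string and a hypothetical helper name `split_paragraphs`:

```python
import re


def split_paragraphs(text):
    """Split raw article text on blank lines and drop empty fragments."""
    return [p.strip() for p in re.split(r'\n{2,}', text) if p.strip()]


sample = "Title line \n\n First paragraph.\n\nSecond paragraph.\n\n\n"
print(split_paragraphs(sample))
```

Stripping each fragment also removes the stray spaces that surround some of the newlines in the raw text.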

In [14]:
dataset.shape

(1699, 5)

In [15]:
dataset['narratives']

0       [Blaming the war on others rather than the inv...
1       [Discrediting the West, Diplomacy, Discreditin...
2                                [Distrust towards Media]
3            [Discrediting Ukraine, Discrediting Ukraine]
4                                      [Praise of Russia]
                              ...                        
1694                       [Amplifying war-related fears]
1695    [Criticism of climate movement, Downplaying cl...
1696    [Criticism of institutions and authorities, Co...
1697                           [Speculating war outcomes]
1698                           [Amplifying Climate Fears]
Name: narratives, Length: 1699, dtype: object

In [16]:
unique_narratives = dataset['narratives'].explode().unique()
unique_narratives

array(['Blaming the war on others rather than the invader',
       'Discrediting the West, Diplomacy',
       'Hidden plots by secret schemes of powerful groups',
       'Discrediting Ukraine', 'Praise of Russia',
       'Distrust towards Media', 'Russia is the Victim',
       'Negative Consequences for the West', 'Speculating war outcomes',
       'Amplifying war-related fears', 'Overpraising the West',
       'Downplaying climate change',
       'Criticism of institutions and authorities',
       'Questioning the measurements and science',
       'Climate change is beneficial', 'Criticism of climate policies',
       'Criticism of climate movement', 'Amplifying Climate Fears',
       'Other', 'Controversy about green technologies',
       'Green policies are geopolitical instruments'], dtype=object)

The frequency of narratives in the dataset: 

In [17]:
print(len(dataset['narratives'].explode().value_counts()))
dataset['narratives'].explode().value_counts()

21


narratives
Discrediting Ukraine                                 584
Discrediting the West, Diplomacy                     452
Praise of Russia                                     406
Amplifying Climate Fears                             357
Other                                                324
Amplifying war-related fears                         297
Russia is the Victim                                 229
Criticism of institutions and authorities            216
Blaming the war on others rather than the invader    194
Speculating war outcomes                             132
Criticism of climate policies                        127
Negative Consequences for the West                   104
Criticism of climate movement                         84
Hidden plots by secret schemes of powerful groups     84
Downplaying climate change                            68
Distrust towards Media                                53
Overpraising the West                                 51
Controversy about gr

In [18]:
unique_subnarratives = dataset['subnarratives'].explode().unique()
unique_subnarratives

array(['The West are the aggressors', 'Other', 'The West is weak',
       'Ukraine is a puppet of the West',
       'Ukraine is associated with nazism',
       'Russia is a guarantor of peace and prosperity',
       'The West does not care about Ukraine, only about its interests',
       'The EU is divided',
       'Western media is an instrument of propaganda',
       'Discrediting Ukrainian government and officials and policies',
       'The West is overreacting', 'UA is anti-RU extremists',
       'Discrediting Ukrainian nation and society',
       'Discrediting Ukrainian military',
       'Ukrainian media cannot be trusted',
       'Praise of Russian military might', 'The West is russophobic',
       'Ukrainian army is collapsing',
       'Russia has international support from a number of countries and people',
       'Praise of Russian President Vladimir Putin',
       'By continuing the war we risk WWIII', 'Ukraine is the aggressor',
       'Russia actions in Ukraine are only sel

In [19]:
len(unique_subnarratives)

74

The frequency of subnarratives in the dataset:

In [20]:
pd.set_option('display.max_rows', 100)

dataset['subnarratives'].explode().value_counts()

subnarratives
Other                                                                     1164
Amplifying existing fears of global warming                                178
Discrediting Ukrainian government and officials and policies               157
Praise of Russian military might                                           145
The West are the aggressors                                                112
Ukraine is a puppet of the West                                            106
Russia is a guarantor of peace and prosperity                              101
Discrediting Ukrainian military                                            100
There is a real possibility that nuclear weapons will be employed           96
Criticism of national governments                                           85
The West does not care about Ukraine, only about its interests              85
Ukraine is the aggressor                                                    78
Russia has international support from 

### 1.2 Encoding classification labels

We will now transform the 'narratives' and 'subnarratives' columns into binary format using `MultiLabelBinarizer`.
* Each article's label set is represented by a binary vector with one position per unique label, enabling the model to handle multiple labels per instance for both narratives and subnarratives.
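
To make the encoding concrete, here is a pure-Python illustration of the idea. It is a sketch of multi-hot encoding, not scikit-learn's implementation, and `binarize_labels` is a hypothetical helper name:

```python
def binarize_labels(label_lists):
    """Encode lists of labels as 0/1 vectors over the sorted set of all labels."""
    classes = sorted({label for labels in label_lists for label in labels})
    index = {label: i for i, label in enumerate(classes)}
    vectors = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # a repeated label still yields a single 1
        vectors.append(row)
    return classes, vectors


classes, vectors = binarize_labels([
    ['Praise of Russia'],
    ['Discrediting Ukraine', 'Discrediting Ukraine'],
    ['Other', 'Praise of Russia'],
])
print(classes)
print(vectors)
```

Note that an article whose label list repeats a narrative (one entry per subnarrative) collapses to a single active position in the narrative vector.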

In [21]:
from sklearn.preprocessing import MultiLabelBinarizer

mlb_narratives = MultiLabelBinarizer()
mlb_subnarratives = MultiLabelBinarizer()

In [22]:
narratives_binary = mlb_narratives.fit_transform(dataset['narratives'])
subnarratives_binary = mlb_subnarratives.fit_transform(dataset['subnarratives'])

dataset['narratives_encoded'] = narratives_binary.tolist()
dataset['subnarratives_encoded'] = subnarratives_binary.tolist()

In [23]:
dataset.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives,narratives_encoded,subnarratives_encoded
0,RU,RU-URW-1161.txt,В ближайшие два месяца США будут стремиться к ...,[Blaming the war on others rather than the inv...,"[The West are the aggressors, Other, The West ...","[0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,RU,RU-URW-1175.txt,В ЕС испугались последствий популярности правы...,"[Discrediting the West, Diplomacy, Discreditin...","[The West is weak, Other, The EU is divided]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,RU,RU-URW-1149.txt,Возможность признания Аллы Пугачевой иностранн...,[Distrust towards Media],[Western media is an instrument of propaganda],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,RU,RU-URW-1015.txt,Азаров рассказал о смене риторики Киева по пер...,"[Discrediting Ukraine, Discrediting Ukraine]","[Ukraine is a puppet of the West, Discrediting...","[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,RU,RU-URW-1001.txt,В россиянах проснулась массовая любовь к путеш...,[Praise of Russia],[Russia is a guarantor of peace and prosperity],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [24]:
import pickle

base_save_folder_dir = './saved/'

datasets_folder = os.path.join(base_save_folder_dir, 'Dataset')
# Create the 'Dataset' subfolder as well; makedirs also creates the base folder.
os.makedirs(datasets_folder, exist_ok=True)
with open(os.path.join(datasets_folder, 'dataset.pkl'), 'wb') as f:
    pickle.dump(dataset, f)

### 1.3 Cleaning articles

We will use spaCy to load pre-trained language models for different languages, which will help us clean and preprocess article text.
* For each language that spaCy provides a dedicated model for, we load that model; otherwise we fall back to the multilingual `xx_ent_wiki_sm`.
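
The fallback rule can be expressed as a dictionary lookup with a default. This is a small sketch; `DEDICATED_MODELS`, `DEFAULT_MODEL`, and `model_name_for` are illustrative names rather than anything defined by spaCy:

```python
DEFAULT_MODEL = "xx_ent_wiki_sm"  # multilingual fallback named in the text

# Languages with a dedicated spaCy pipeline in this notebook.
DEDICATED_MODELS = {
    "EN": "en_core_web_sm",
    "PT": "pt_core_news_sm",
    "RU": "ru_core_news_sm",
}


def model_name_for(language_code):
    """Return the dedicated model name for a language, or the multilingual fallback."""
    return DEDICATED_MODELS.get(language_code.upper(), DEFAULT_MODEL)


print(model_name_for("BG"))
print(model_name_for("RU"))
```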

In [25]:
language_model_map = {
    "BG": "xx_ent_wiki_sm",
    "EN": "en_core_web_sm",
    "HI": "xx_ent_wiki_sm",
    "PT": "pt_core_news_sm",
    "RU": "ru_core_news_sm",
}

!python3 -m spacy download xx_ent_wiki_sm
!python3 -m spacy download pt_core_news_sm
!python3 -m spacy download ru_core_news_sm
!python3 -m spacy download en_core_web_sm

Collecting xx-ent-wiki-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/xx_ent_wiki_sm-3.8.0/xx_ent_wiki_sm-3.8.0-py3-none-any.whl (11.1 MB)
✔ Download and installation successful
You can now load the package via spacy.load('xx_ent_wiki_sm')
Collecting pt-core-news-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/pt_core_news_sm-3.8.0/pt_core_news_sm-3.8.0-py3-none-any.whl (13.0 MB)
✔ Download and installation successful
You can now load the package via spacy.load('pt_core_news_sm')
Collecting ru-core-news-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/ru_core_news_sm-3.8.0/ru_core_news_sm-3.8.0-py3-none-any.whl (15.3 MB)
✔ Download and installation successful
You can now load the package via spacy.load('ru_core_news_sm')
Collecting en-core-web-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web

We will also use the `emoji` library to remove emojis that appear in the articles, since they do not add meaningful context for this task.

In [26]:
!pip3 -q install emoji

In [27]:
import spacy
import emoji

nlp_models = {lang: spacy.load(model) for lang, model in language_model_map.items()}

The goal of this cleaning is to prepare the article text for analysis by removing irrelevant or noisy data such as URLs, emails, social media mentions, and emojis.

* It also normalizes the text by converting non-entity words to lowercase while keeping important entities (people, organizations, and locations) in their original case. This choice is task-specific: I think entity names may add useful context for classification.
* Note also that we split the text into paragraphs. Articles are quite long, and this split will later help when preparing the data for embedding.
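
The noise-removal step can be illustrated in isolation. The pattern below is a simplified variant written for readability, not the exact expression used in the notebook (for instance, it does not strip bare `.com` domains); `NOISE_PATTERN` and `strip_noise` are hypothetical names.

```python
import re

NOISE_PATTERN = re.compile(
    r"""
    https?://\S+                 # URLs with a scheme
    | www\.\S+                   # bare www. addresses
    | [\w.+-]+@[\w-]+\.[\w.-]+   # email addresses
    | @\w+                       # social media mentions
    """,
    re.VERBOSE,
)


def strip_noise(text):
    """Remove URLs, emails, and @mentions, then collapse leftover whitespace."""
    without_noise = NOISE_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_noise).strip()


sample = "Contact desk@example.org or see https://example.org/story via @reporter today"
print(strip_noise(sample))
```

The email alternative is listed before the mention alternative so that an address is removed as a whole rather than leaving its local part behind.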

In [28]:
import re

class ArticleCleaner:
    def __init__(self, nlp_models):
        self.nlp_models = nlp_models

    def _clean_paragraph(self, paragraph, nlp):
        """Cleans individual paragraphs by removing links, emails, and normalizing tokens."""
        # Remove URLs, emails, and mentions
        paragraph = re.sub(
            r'http\S+|www\S+|https\S+|[a-zA-Z0-9.-]+\.com|[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+|@[A-Za-z0-9_]+',
            '',
            paragraph
        )

        doc = nlp(paragraph)
        cleaned_tokens = []
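        # These are the OntoNotes-style labels emitted by en_core_web_sm. The other
        # models loaded above label entities as PER/LOC/ORG, so for those languages
        # only ORG matches here and person and location names are lowercased.
        important_entity_types = ["PERSON", "ORG", "GPE"]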

        for token in doc:
            if token.is_space or emoji.is_emoji(token.text):
                continue

            if token.ent_type_ in important_entity_types:
                cleaned_tokens.append(token.text + token.whitespace_)
            else:
                cleaned_tokens.append(token.text.lower() + token.whitespace_)

        return "".join(cleaned_tokens).strip()

    def _preprocess_article_text(self, article_text):
        """Preprocess the article text by splitting into header, body, and footer."""
        parts = re.split(r'\n{2,}', article_text)

        if len(parts) > 2:
            header = parts[0].strip()
            footer = parts[-1].strip()
            body = parts[1:-1]
        else:
            header = parts[0].strip() if len(parts) > 0 else ""
            footer = parts[1].strip() if len(parts) > 1 else ""
            body = []

        return header, body, footer

    def clean_article_with_paragraphs(self, article_text, language_code):
        """Main method to clean the article by processing the header, body, and footer."""
        nlp = self.nlp_models.get(language_code, self.nlp_models["EN"])

        header, body, footer = self._preprocess_article_text(article_text)

        cleaned_header = f"<PARA>{self._clean_paragraph(header, nlp)}</PARA>" if header else ""
        cleaned_footer = f"<PARA>{self._clean_paragraph(footer, nlp)}</PARA>" if footer else ""
        cleaned_body = " ".join([self._clean_paragraph(paragraph, nlp) for paragraph in body])

        combined_text = "\n\n".join(filter(None, [cleaned_header, cleaned_body, cleaned_footer]))
        return combined_text.strip()
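
Entity label names differ between spaCy models, so the set of labels whose casing is preserved matters. The sketch below shows the lowercasing rule on plain tuples instead of spaCy tokens, with a label set that covers both naming schemes. The set itself is an assumption that should be checked against `nlp.get_pipe("ner").labels` for each loaded model, and `normalize_tokens` is a hypothetical helper.

```python
# Labels from both naming schemes; an assumption to verify per model.
KEEP_CASE_LABELS = {"PERSON", "ORG", "GPE", "PER", "LOC"}


def normalize_tokens(tokens, keep_case_labels=KEEP_CASE_LABELS):
    """Lowercase tokens unless their entity label is in keep_case_labels.

    `tokens` is an iterable of (text, entity_label, trailing_whitespace) tuples,
    standing in for spaCy's token.text, token.ent_type_, and token.whitespace_.
    """
    pieces = []
    for text, label, whitespace in tokens:
        kept = text if label in keep_case_labels else text.lower()
        pieces.append(kept + whitespace)
    return "".join(pieces).strip()


tokens = [("The", "", " "), ("Kremlin", "ORG", " "), ("Praised", "", " "), ("Putin", "PER", "")]
print(normalize_tokens(tokens))
```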

In [29]:
article_cleaner = ArticleCleaner(nlp_models)

In [30]:
dataset["content"] = dataset.apply(
    lambda row: article_cleaner.clean_article_with_paragraphs(row["content"], row["language"]),
    axis=1
)

This is what the cleaned article looks like:

In [31]:
row = 7
english_article = dataset[dataset['language'] == 'EN'].iloc[row].content
english_article



In [32]:
def split_into_sections(content):
    parts = re.split(r'<PARA>|</PARA>', content)
    parts = [p.strip() for p in parts if p.strip()]

    if len(parts) == 1:
        return parts[0], "", ""
    elif len(parts) == 2:
        return parts[0], parts[1], ""
    else:
        header = parts[0]
        footer = parts[-1]
        body = " ".join(parts[1:-1])
        return header, body, footer

We do a sanity check to see if our paragraph split works:

In [33]:
header, body, footer = split_into_sections(english_article)
print("Header: ", header)
print("\n\n")
print("Body: ", body)
print("\n\n")
print("Footer: ", footer)

Header:  UN chief Warns of global economic crisis at world economic forum in Davos






Footer:  other tech firms, such as Amazon, Meta, Alphabet, Salesforce, and Twitter, have announced similar moves in recent weeks. Microsoft, based in Redmond, Washington, had 221,000 full-time employees as of june 30, 2022, according to government filings.


In [34]:
dataset.head()

Unnamed: 0,language,article_id,content,narratives,subnarratives,narratives_encoded,subnarratives_encoded
0,RU,RU-URW-1161.txt,<PARA>в ближайшие два месяца сша будут стремит...,[Blaming the war on others rather than the inv...,"[The West are the aggressors, Other, The West ...","[0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,RU,RU-URW-1175.txt,<PARA>в ес испугались последствий популярности...,"[Discrediting the West, Diplomacy, Discreditin...","[The West is weak, Other, The EU is divided]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,RU,RU-URW-1149.txt,<PARA>возможность признания аллы пугачевой ино...,[Distrust towards Media],[Western media is an instrument of propaganda],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,RU,RU-URW-1015.txt,<PARA>азаров рассказал о смене риторики киева ...,"[Discrediting Ukraine, Discrediting Ukraine]","[Ukraine is a puppet of the West, Discrediting...","[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,RU,RU-URW-1001.txt,<PARA>в россиянах проснулась массовая любовь к...,[Praise of Russia],[Russia is a guarantor of peace and prosperity],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


In [35]:
narrative_to_subnarratives

{'Discrediting Ukraine': ['Other',
  'Rewriting Ukraine’s history',
  'Discrediting Ukrainian military',
  'Ukraine is a hub for criminal activities',
  'Ukraine is associated with nazism',
  'Discrediting Ukrainian nation and society',
  'Situation in Ukraine is hopeless',
  'Ukraine is a puppet of the West',
  'Discrediting Ukrainian government and officials and policies'],
 'Discrediting the West, Diplomacy': ['Diplomacy does/will not work',
  'Other',
  'The EU is divided',
  'The West is weak',
  'The West does not care about Ukraine, only about its interests',
  'The West is overreacting',
  'West is tired of Ukraine'],
 'Praise of Russia': ['Other',
  'Praise of Russian President Vladimir Putin',
  'Russia is a guarantor of peace and prosperity',
  'Russian invasion has strong national support',
  'Russia has international support from a number of countries and people',
  'Praise of Russian military might'],
 'Russia is the Victim': ['Other',
  'The West is russophobic',
  'UA i

In [36]:
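label_encoder_folder = os.path.join(base_save_folder_dir, 'LabelEncoders')
misc_folder = os.path.join(base_save_folder_dir, 'Misc')

# Make sure every output folder exists before writing to it.
for folder in (datasets_folder, label_encoder_folder, misc_folder):
    os.makedirs(folder, exist_ok=True)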

with open(os.path.join(datasets_folder, 'dataset_cleaned.pkl'), 'wb') as f:
    pickle.dump(dataset, f)

with open(os.path.join(label_encoder_folder, 'mlb_narratives.pkl'), 'wb') as f:
    pickle.dump(mlb_narratives, f)

with open(os.path.join(label_encoder_folder, 'mlb_subnarratives.pkl'), 'wb') as f:
    pickle.dump(mlb_subnarratives, f)

with open(os.path.join(misc_folder, 'narrative_to_subnarratives.pkl'), 'wb') as f:
    pickle.dump(narrative_to_subnarratives, f)
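
Since several objects are pickled to different subfolders, a small helper that creates missing parent directories can reduce repetition. This is a minimal sketch; `save_pickle` and `load_pickle` are hypothetical names, and the round trip runs in a temporary directory so it leaves no files behind.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Pickle `obj` to `path`, creating missing parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as f:
        pickle.dump(obj, f)
    return path


def load_pickle(path):
    """Load a pickled object back from `path`."""
    with Path(path).open('rb') as f:
        return pickle.load(f)


# Round trip with a toy mapping in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = save_pickle({'Other': ['Other']}, Path(tmp) / 'Misc' / 'mapping.pkl')
    restored = load_pickle(target)
print(restored)
```

Loading the saved encoders with the same helper in later notebooks keeps the label order identical to the one used here.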