In [None]:
## (Labeling - I) ##

# The labeling in this part was done manually on the "../../../output/master_code_prep_output/mmr_selected/Q69_mmr_selected.csv" dataset.
# A new match column was created and filled by binary hand labeling based on the combined scores (question ID and Likert scale).
# The labeled data was later enriched after the first training run.
# The resulting labeled dataset is saved as "Q69_mmr_selected_labeled.csv" in the labeled_data directory.
# The data is then cleaned for further processing with the following script and saved as "Q69_mmr_selected_labeled_combined.csv".

%run clean_labeled_data.py

Loading ../../../data/labeled_data/Q69_mmr_selected_labeled.csv ...
Loading ../../../output/master_code_prep_output/top_scored_sentences.csv ...
Creating combined sentence column...
Saving to ../../../data/labeled_data/Q69_mmr_selected_labeled_combined.csv ...
Done.


In [2]:
## (Training) ##

# Note: This code was run twice:
# (1) The first run was made right after the hand labeling was done.
# (2) The second run used a combined labeled dataset: the hand-labeled data plus the hand-picked accurate predictions from the first run, which were relabeled.

# Load and filter labeled data, focusing on relevant combined_labels and positive matches.
# Perform stratified train-validation split by label with a fixed holdout fraction.
# Augment training data by adding synonym-replaced sentences to improve robustness.
# Encode sentences with SBERT embeddings and train a CatBoost multi-class classifier.
# Evaluate the model on validation data and save the trained model and label mappings.

%run train_catboost_labeled.py

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\secki\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\secki\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


Loading data...
Loading SBERT model...
Encoding all sentences for embedding hash computation...


Batches:   0%|          | 0/2 [00:00<?, ?it/s]

Splitting data stratified by embedding_hash...
Training samples before augmentation: 31
Validation samples before filtering: 26
Performing synonym augmentation on training set...
Training samples after augmentation: 93
Encoding training sentences after augmentation...


Batches:   0%|          | 0/3 [00:00<?, ?it/s]

Encoding validation sentences...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Encoding original training sentences for permutation tests...


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Filtering validation samples too close to training samples...
Validation samples after cosine similarity filtering: 26
Training CatBoost classifier...
0:	learn: 1.3769563	test: 1.3864435	best: 1.3864435 (0)	total: 177ms	remaining: 5m 54s
100:	learn: 0.7770467	test: 1.2916718	best: 1.2916718 (100)	total: 1.57s	remaining: 29.6s
200:	learn: 0.4946532	test: 1.2564205	best: 1.2564205 (200)	total: 2.83s	remaining: 25.3s
300:	learn: 0.3546881	test: 1.2329596	best: 1.2326020 (297)	total: 4.26s	remaining: 24s
400:	learn: 0.2675674	test: 1.2163287	best: 1.2163287 (400)	total: 5.69s	remaining: 22.7s
500:	learn: 0.2139890	test: 1.2037362	best: 1.2036351 (497)	total: 7.01s	remaining: 21s
600:	learn: 0.1789388	test: 1.1900216	best: 1.1900216 (600)	total: 8.26s	remaining: 19.2s
700:	learn: 0.1524913	test: 1.1802114	best: 1.1802114 (700)	total: 9.64s	remaining: 17.9s
800:	learn: 0.1324894	test: 1.1750211	best: 1.1750211 (800)	total: 11s	remaining: 16.5s
900:	learn: 0.1173960	test: 1.1685129	best: 1.16

KeyboardInterrupt: 
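
The log line "Filtering validation samples too close to training samples..." points to a near-duplicate check between the two splits. In the run above it removed nothing (26 validation samples before and after). The following is a minimal sketch of such a filter, not the code in train_catboost_labeled.py: the function names, the plain-list embeddings, and the 0.95 threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_validation(train_vecs, val_vecs, max_sim=0.95):
    """Return indices of validation vectors whose highest similarity to any
    training vector stays below max_sim (a hypothetical threshold)."""
    kept = []
    for i, v in enumerate(val_vecs):
        closest = max((cosine_similarity(v, t) for t in train_vecs), default=0.0)
        if closest < max_sim:
            kept.append(i)
    return kept


# Toy 2-D vectors standing in for SBERT embeddings.
train = [[1.0, 0.0], [0.0, 1.0]]
val = [[1.0, 0.01], [1.0, 1.0]]
kept_indices = filter_validation(train, val)
print(kept_indices)
```

The first toy validation vector is almost identical to a training vector and is dropped; the second is kept. With only a few dozen labeled sentences, a check like this helps keep the validation loss from being flattered by paraphrases of training sentences.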

In [3]:
# (Prediction Post-Processing Fine-Tuning)
# Optional keyword extraction fine-tuning code using KeyBERT
%run tune_keywords_keybert.py

Top keyphrases for Q69_1:
 - service reflects rule law finest form peacekeeping (score: 0.725)
 - express confidence national police impartial (score: 0.672)
 - trust enforcement reflects integrity accountability equality moral (score: 0.629)

Top keyphrases for Q69_2:
 - trust policing institutions acknowledge continued reforms peacekeeping (score: 0.816)
 - enforcement agencies shaped authority responsiveness serve peace (score: 0.738)
 - quite lot confidence police maintain strong (score: 0.640)

Top keyphrases for Q69_3:
 - requiring oversight peacekeeping remains evolving instrument effective (score: 0.726)
 - confidence law enforcement measured acknowledging commendable efforts (score: 0.648)
 - trust power untested public order sustained solely (score: 0.515)

Top keyphrases for Q69_4:
 - accountability misuse power urge reforms international peacekeeping (score: 0.743)
 - police country confidence police institutions (score: 0.609)
 - security inverted measure public trust pres

In [5]:
## (Prediction - I) ##

# This prediction uses only the top_scored_sentences.csv dataset, which is the output of the data_prep_scoring.py script.
# The prediction dataset contains the sentences that were most uncertain in the previous measurements.
# As stated above, the accurate outputs are used to enrich the training data for the second training run.

# Load unseen sentences excluding those already labeled in training by embedding hash.
# Use a pre-trained CatBoost model and SBERT embeddings to predict combined_labels on unseen data.
# Filter predicted sentences by confidence threshold (perc_above_chance >= threshold).
# Merge additional metadata from the full scored sentences dataset based on predicted labels.
# Display a preview and save filtered, labeled predictions to CSV for further use.

%run predict_catboost_top_score.py

Selected 4518 unseen sentences for prediction.
Encoding unseen sentences...


Encoding batches: 100%|██████████| 142/142 [00:13<00:00, 10.27it/s]


Extracting keyphrases per label for semantic filtering...
Applying semantic similarity filtering per predicted label...
Applying joint scoring and filtering...
Kept 271 predictions after joint scoring filtering.
Saved the results to ../../../output/question_pipeline_output/q69_predictions/q69_predictions_top_score_filtered.csv
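
The comments mention a confidence filter on perc_above_chance, but its definition is not shown in this notebook. One plausible reading, stated here purely as an assumption, rescales the top class probability so that chance level (1/K for K classes) maps to 0 and certainty maps to 1. The sketch below uses that assumed definition with a hypothetical threshold of 0.5.

```python
def perc_above_chance(probs):
    """Rescale the top class probability so that chance level (1/K) maps to 0
    and certainty maps to 1. This definition is an assumption, not taken
    from predict_catboost_top_score.py."""
    k = len(probs)
    chance = 1.0 / k
    return (max(probs) - chance) / (1.0 - chance)


def filter_confident(rows, threshold=0.5):
    """Keep (sentence, probs) rows whose score reaches the hypothetical threshold."""
    kept = []
    for sentence, probs in rows:
        score = perc_above_chance(probs)
        if score >= threshold:
            kept.append((sentence, round(score, 3)))
    return kept


# Toy class probabilities over the four labels Q69_1 .. Q69_4.
rows = [
    ("sentence A", [0.85, 0.05, 0.05, 0.05]),
    ("sentence B", [0.30, 0.30, 0.20, 0.20]),
]
confident = filter_confident(rows)
print(confident)
```

Under this reading a uniform prediction over four classes scores 0, so the threshold is independent of the number of labels.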


In [None]:
## (Labeling - II) ##

# This function appends accurately predicted sentences from the predictions file
# to the labeled dataset based on a provided list of selected sentence hashes.
# It filters these hashes to those present in the predictions, then matches them
# with the top scored sentences and marks them as matched.
# Finally, it concatenates these new matched sentences with the existing labeled data and saves it.

# After this code was run, the model was trained a second time on the appended labeled dataset.
# Note: Because the hand labeling changed the data, the corresponding response and adapted hypotheses can be misleading
# in the current state.

#%run label_append_after_pred.py
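
The append step described in the comments above can be sketched with plain lists of dicts. This is an illustration under stated assumptions, not the contents of label_append_after_pred.py: the function name and the keys embedding_hash, combined_label, and match are assumed column names.

```python
def append_selected(labeled, predictions, top_scored, selected_hashes):
    """Append hand-picked predictions to the labeled rows.

    All inputs are lists of dicts except selected_hashes (a list of strings).
    The keys 'embedding_hash', 'combined_label' and 'match' are assumptions."""
    predicted = {p["embedding_hash"]: p for p in predictions}
    # Keep only the selected hashes that actually occur in the predictions.
    wanted = [h for h in selected_hashes if h in predicted]
    by_hash = {row["embedding_hash"]: row for row in top_scored}
    new_rows = []
    for h in wanted:
        if h not in by_hash:
            continue  # no matching row among the top scored sentences
        row = dict(by_hash[h])  # copy so the input rows stay untouched
        row["combined_label"] = predicted[h]["combined_label"]
        row["match"] = 1  # mark as a confirmed positive
        new_rows.append(row)
    return labeled + new_rows


# Toy data; hashes and sentences are placeholders.
labeled = [{"embedding_hash": "h0", "sentence": "s0",
            "combined_label": "Q69_1", "match": 1}]
predictions = [{"embedding_hash": "h1", "combined_label": "Q69_4"},
               {"embedding_hash": "h2", "combined_label": "Q69_2"}]
top_scored = [{"embedding_hash": "h1", "sentence": "s1"},
              {"embedding_hash": "h2", "sentence": "s2"}]
combined = append_selected(labeled, predictions, top_scored, ["h1", "h9"])
print(len(combined))
```

Here "h9" is ignored because it does not occur in the predictions, and "h2" is ignored because it was not selected.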

In [None]:
## (Training - II) ##

# After the Labeling - II code was run, the model was trained a second time on the appended labeled dataset.

In [6]:
## (Prediction - II) ##

# This code loads UNGA speech sentences and WVS metadata, excluding training data to prevent leakage.
# It encodes the filtered sentences with a SentenceTransformer model, then predicts classes and confidences using a CatBoost classifier.
# Predictions whose confidence exceeds the above-chance threshold are kept, and relevant columns are selected.
# It merges the predictions with metadata from the WVS dataset based on predicted labels.
# Finally, it previews and saves the combined prediction results to a CSV file for further use.

%run predict_catboost_unga_wvs7.py

Loading combined labeled CSV...
Extracting top keyphrases per label...
Loading SBERT model...
Loading unseen data...
Filtered unseen sentences count: 25646
Loading CatBoost model...
Encoding unseen sentences...


Encoding batches: 100%|██████████| 802/802 [02:03<00:00,  6.52it/s]


Predicting classes and probabilities...
Applying semantic similarity filtering per predicted label...
Applying joint scoring and filtering...
Kept 758 predictions after joint scoring filtering.
Saved the results to ../../../output/question_pipeline_output/q69_predictions/q69_predictions_filtered.csv
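
Both prediction steps exclude sentences that were already used for training, and the earlier log mentions an embedding hash as the key. How that hash is computed is not shown here. The sketch below is one possible approach, assuming the hash is taken over the rounded embedding values; the function names, the rounding precision, and the choice of SHA-1 are hypothetical.

```python
import hashlib
import struct


def embedding_hash(vector, decimals=4):
    """Hash an embedding after rounding, so that small float noise usually
    does not change the key. The rounding precision is a hypothetical choice."""
    # Adding 0.0 turns -0.0 into 0.0 so both pack to the same bytes.
    rounded = [round(x, decimals) + 0.0 for x in vector]
    payload = struct.pack(f"<{len(rounded)}d", *rounded)
    return hashlib.sha1(payload).hexdigest()


def drop_seen(candidates, train_vectors):
    """Return the (sentence, vector) pairs whose hash is not among the training hashes."""
    seen = {embedding_hash(v) for v in train_vectors}
    return [(s, v) for s, v in candidates if embedding_hash(v) not in seen]


# Toy vectors standing in for SBERT embeddings.
train_vectors = [[0.12345678, -0.5], [0.9, 0.1]]
candidates = [
    ("already labeled", [0.12345679, -0.5]),  # differs only past the 4th decimal
    ("new sentence", [0.3, 0.7]),
]
unseen = drop_seen(candidates, train_vectors)
print([s for s, _ in unseen])
```

An exact-match key like this only catches identical sentences; near-duplicates would need a similarity check instead.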


In [7]:
# Predict labels for UNSG speeches using the trained model
%run predict_unsg_address.py

Loading combined labeled CSV...
Extracting top keyphrases per label...
Loading SBERT model...
Loading unseen data...
Loading CatBoost model...
Encoding unseen sentences...


Encoding batches: 100%|██████████| 21/21 [00:00<00:00, 30.75it/s]


Predicting classes and probabilities...
Applying semantic similarity filtering per predicted label...
Applying joint scoring and filtering...
Kept 51 predictions after joint scoring filtering.
Saved the results to ../../../output/question_pipeline_output/q69_predictions/q69_predictions_unsg.csv


In [9]:
## (Post-Processing) ##

# This code analyzes predicted labels per country-year from the predictions dataset,
# identifies the most frequent label for each country-year,
# filters to keep only country-years present in the WVS dataset,
# and finally displays and saves the summarized results.

%run get_q69_frequencies.py

Unnamed: 0,B_COUNTRY_ALPHA,A_YEAR,most_frequent_label,most_frequent_count
0,AND,2018,Q69_4,2
1,ARG,2017,Q69_1,1
2,ARM,2021,Q69_4,1
3,AUS,2018,Q69_2,2
4,BGD,2018,Q69_1,1
5,BRA,2018,Q69_3,1
6,CAN,2020,Q69_4,2
7,CHL,2018,Q69_4,3
8,CHN,2018,Q69_4,2
9,COL,2018,Q69_1,2


Saved to ../../../output/question_pipeline_output/q69_output/q69_country_year_top_labels.csv
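
The aggregation described in the comments can be sketched with the standard library. This is a minimal sketch rather than the contents of get_q69_frequencies.py: the tuple layout, the function name, and the alphabetical tie-break are assumptions.

```python
from collections import Counter, defaultdict


def top_label_per_country_year(predictions, wvs_pairs):
    """predictions: iterable of (country, year, label) tuples.
    wvs_pairs: set of (country, year) pairs present in the WVS data.
    Returns {(country, year): (label, count)} for pairs found in both."""
    counts = defaultdict(Counter)
    for country, year, label in predictions:
        counts[(country, year)][label] += 1
    result = {}
    for key, counter in counts.items():
        if key not in wvs_pairs:
            continue  # country-year not covered by the survey
        # Highest count wins; ties go to the alphabetically first label.
        label, count = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[key] = (label, count)
    return result


# Toy rows, not taken from the real predictions file.
predictions = [
    ("AND", 2018, "Q69_4"), ("AND", 2018, "Q69_4"), ("AND", 2018, "Q69_1"),
    ("ARG", 2017, "Q69_1"),
    ("XXX", 2019, "Q69_2"),
]
wvs_pairs = {("AND", 2018), ("ARG", 2017)}
top_labels = top_label_per_country_year(predictions, wvs_pairs)
print(top_labels)
```

With counts as small as those in the table above (1 to 3 sentences per country-year), ties are likely, so the tie-break rule can decide the reported label.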


In [10]:
## (Visualization - I) ##

# This code filters predicted Q69-related labels from the predictions dataset,
# calculates sentence counts and the proportion of a given response per country,
# prepares the data for ordered plotting,
# then creates and displays a scatter plot of the proportion of Q69_1 sentences relative to all Q69 sentences by country.
# Point sizes represent sentence counts; the x-axis shows the countries and the y-axis shows the proportion of the given response among all counted responses.

%run visualize_preds_props.py
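
The per-country proportion that feeds the scatter plot can be sketched as follows. This is an illustration only, leaving out the plotting itself: the function name, the (country, label) row layout, and the sort order are assumptions.

```python
from collections import Counter


def label_proportions(rows, target="Q69_1"):
    """rows: iterable of (country, label) pairs.
    Returns a list of (country, total_count, proportion_of_target),
    sorted by proportion and then by country code."""
    totals = Counter()
    hits = Counter()
    for country, label in rows:
        if not label.startswith("Q69_"):
            continue  # ignore labels that belong to other questions
        totals[country] += 1
        if label == target:
            hits[country] += 1
    summary = [(c, n, hits[c] / n) for c, n in totals.items()]
    return sorted(summary, key=lambda item: (item[2], item[0]))


# Toy rows, not taken from the real predictions file.
rows = [
    ("AUS", "Q69_1"), ("AUS", "Q69_2"), ("AUS", "Q69_1"), ("AUS", "Q69_4"),
    ("CHL", "Q69_4"), ("CHL", "Q69_4"), ("CHL", "Q69_1"), ("CHL", "Q69_4"),
    ("CHL", "Q70_1"),
]
proportions = label_proportions(rows)
print(proportions)
```

The total count is returned next to each proportion so that it can drive the point size in the plot.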

In [11]:
## (Visualization - II) ##

# This code loads World Values Survey (WVS) data and filters it by countries found in scored sentences.
# It processes Q69 survey responses to calculate per-country counts and proportions of respondents answering '1'.
# It prepares the country data in a categorical order for consistent plotting.
# Then it creates a scatter plot showing the proportion of Q69=1 responses per country, sized by total responses.
# The plot visually compares response distributions across countries with hover details and a clean layout.
# The x and y-axis labels are the same as the prediction visualization.

%run visualize_response_props.py

In [12]:
## (Visualization - III) ##

# This script compares proportions of a specific Q69 survey response between scored sentences and WVS data at the country-year level.
# It aggregates sentence counts and computes proportions per country-year in the scored predictions.
# It filters the WVS data to matching country-year pairs and computes weighted response proportions using survey weights.
# The code then merges both datasets and visualizes their proportions with connecting lines to compare distributions.
# The plot uses country-year labels on the x-axis and proportion values on the y-axis, with vertical lines visually highlighting differences between WVS survey and scored sentence proportions.
# Finally, it calculates and prints the Pearson correlation between the weighted WVS proportions and scored sentence proportions.

%run visualize_prop_diffs.py

Pearson correlation between WVS weighted and scored sentence proportions: -0.4662
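
A negative coefficient such as -0.4662 means that, in this sample, country-years with a higher weighted WVS proportion tended to have a lower scored-sentence proportion. The two quantities behind that number can be sketched in plain Python. The function names and input layout below are assumptions, not the code in visualize_prop_diffs.py.

```python
import math


def weighted_proportion(responses, weights, target=1):
    """Share of total survey weight carried by respondents answering `target`."""
    total = sum(weights)
    hit = sum(w for r, w in zip(responses, weights) if r == target)
    return hit / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy survey answers (1-4) with survey weights.
share = weighted_proportion([1, 2, 1, 4], [1.0, 1.0, 2.0, 0.0])
# Toy proportions that move in exactly opposite directions.
r = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(share, r)
```

`pearson` divides by zero when either sequence is constant, so a real script would need to guard against country-year sets with no variation.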
