This is the code repository for our WebConf'24 research paper "Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic" and our ICWSM'24 dataset paper "A Multilingual Similarity Dataset for News Article Frame".
The code and data for each section are located at:
- Data collection: ner_art_sampling;
- Semeval Baseline: Semeval_baseline;
- Transformer model: network_inference;
- Graph Clustering data: ner_art_sampling/network_data/input/oslom.tsv_oslo_files/partitions_level_0;
- Regression Analysis: ner_art_sampling/visualize_matched_inference.py.
The overall processing pipeline is:
The data collection pipeline for multilingual news article similarity is:
