- Data: The data used in this study includes the computed triplets, the vocabulary files, and the source labels. The corpus is not provided directly in this repository but is openly accessible (see the Corpus section for more details).
- Indicators: Code for individually computing the four indicators can be found in the folders:
content_sharing_network
,references
, andsemantic_shift
. - Ensemble: Code for training unsupervised source embeddings using the indicators.
- Experiments: All the experiments we conducted for our submission to ICWSM 2023.
- Results: Plots of the experiments we conducted for our submission to ICWSM 2023.
- Model: SciLander models as well as baselines models that we used in our experiments.
Set up the environment using Python VirtualEnv. From the root directory, run:
python -m venv venv/
Activate the environment just created:
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
The corpus used in this work was a combination of the NELA-GT-2020 and NELA-GT-2021 datasets, which can be downloaded in SQLite and JSON formats.
We filtered the corpus to retrieve only articles related to COVID-19 using a keyword matching procedure. The keywords can be found in data/CDC+COVID_vocab.txt
. We selected all articles for which the title
OR content
had a match with at least one keyword from the list.
The SQLite database can be converted into CSV by using the script preprocessing/nela_to_csv.py
.
Train a triplet loss using the triplets in data/triplets
using the escript ensemble/train_source_embeddings.py
.
cd ensemble
python train_source_embeddings.py
Pre-computed triplets are provided in data/triplets
. If you want to compute your own triplets, you can do so using the following scripts:
- Jargon triplets: Use
references/jargon_triplets.py
. - Stance triplets: Use
references/stance_triplets.py
. - Content sharing triplets: Use
content_sharing_network/csn_features.py
. - Semantic shift triplets: Use
semantic_shift/semantic_shift_triplets.py
.
The pre-trained source embeddings models can be found in directory model
. Look for any file with extension .emb
.
Most experiments require the pre-trained source embedding models from found in the model
folder, in addition to the source labels found in the data
folder.