Code, sample data, and other supplementary material for the paper "Beyond Digital “Echo Chambers”: The Role of Viewpoint Diversity in Political Discussion", accepted at WSDM '23.
The paper can be found here:
- ACM DL: Beyond Digital “Echo Chambers”: The Role of Viewpoint Diversity in Political Discussion (https://doi.org/10.1145/3539597.3570487)
- arXiv: Beyond Digital “Echo Chambers”: The Role of Viewpoint Diversity in Political Discussion
Our WSDM 2023 talk is available on the ACM DL page.
If you use our work, please cite us:
@inproceedings{10.1145/3539597.3570487,
  author = {Hada, Rishav and Ebrahimi Fard, Amir and Shugars, Sarah and Bianchi, Federico and Rossini, Patricia and Hovy, Dirk and Tromble, Rebekah and Tintarev, Nava},
  title = {Beyond Digital "Echo Chambers": The Role of Viewpoint Diversity in Political Discussion},
  year = {2023},
  isbn = {9781450394079},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3539597.3570487},
  doi = {10.1145/3539597.3570487},
  booktitle = {Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining},
  pages = {33–41},
  numpages = {9},
  keywords = {conversation network, Twitter, viewpoint diversity, echo chambers},
  location = {Singapore, Singapore},
  series = {WSDM '23}
}
Each folder in this repository contains a separate README with instructions.
fragmentation_computation.py: code to compute fragmentation values. Takes the conversation network constructed in conversation_retrieval/3_conversation_reconstruction.py as input.
python fragmentation_computation.py
representation.py: code to compute representation values. Takes a list of conversations as input, where each conversation is a list of labels, one per tweet, e.g. [[L1, L2, L2, L4], ..., [L4, L3, L1, L1, L2]] (see the sketch after the usage line below).
python representation.py
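For instance, the input list can be assembled from per-tweet label predictions. A minimal sketch, assuming a hypothetical labeled_tweets.csv with one row per tweet (the file and column names are illustrative, not part of the script's interface):

import pandas as pd

# hypothetical file with one row per tweet: conversation_id, label
df = pd.read_csv("labeled_tweets.csv")

# group the tweets by conversation, keeping their original order,
# and collect the label sequence of each conversation
conversations = [group["label"].tolist() for _, group in df.groupby("conversation_id")]
# e.g. [["L1", "L2", "L2", "L4"], ..., ["L4", "L3", "L1", "L1", "L2"]]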
dyadic_interaction.py: code to compute dyadic interaction values.
To train the four classifiers (immigration relevance, immigration claim, daylight relevance, daylight claim), we use the standard HuggingFace fine-tuning interface. The model we fine-tuned is BERTweet. Note that for immigration claim prediction, we enforced dataset balancing during training. Nonetheless, all our models are trained with a weighted cross-entropy loss, which can be replicated with the following custom Trainer:
import torch
from torch import nn
from transformers import Trainer

class WeightedTrainer(Trainer):
    def __init__(self, internal_weights=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.internal_weights = internal_weights

    # **kwargs absorbs extra arguments (e.g. num_items_in_batch)
    # passed by newer transformers versions
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # the Trainer has already moved the inputs to the model's device
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits").double()
        # compute the class-weighted cross-entropy loss
        loss_fct = nn.CrossEntropyLoss(
            weight=torch.tensor(self.internal_weights, dtype=logits.dtype, device=logits.device)
        )
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
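The trainer above also expects a model, a tokenizer, training arguments, and tokenized datasets; none of these are fixed by our code. A minimal sketch of the setup with BERTweet (the hyperparameters and the compute_metrics body are illustrative, not the ones used in the paper):

import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments

# BERTweet base checkpoint from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base", num_labels=2)

training_args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=3,              # illustrative values
    per_device_train_batch_size=16,
)

def compute_metrics(eval_pred):
    # simple accuracy; replace with the metric of interest
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

# tokenized_train and tokenized_valid used below are the train and
# validation splits mapped through this tokenizer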
To create the weights for the labels, you can use scikit-learn:
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight
import pandas as pd

train = pd.read_csv("train_data.csv")

# encode the string labels as consecutive integers 0..n_classes-1
le = LabelEncoder()
train["labels"] = le.fit_transform(train["labels"])

# one weight per class, inversely proportional to its frequency in the training set
class_labels_for_w = list(range(len(le.classes_)))
weights = compute_class_weight(class_weight="balanced",
                               classes=class_labels_for_w,
                               y=train["labels"].values.tolist())
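As a sanity check, class_weight="balanced" returns n_samples / (n_classes * bincount(y)) for each class. For example:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# three samples of class 0, one sample of class 1
compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=[0, 0, 0, 1])
# -> array([0.66666667, 2.        ]), i.e. 4 / (2 * 3) and 4 / (2 * 1)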
These weights can then be passed to the WeightedTrainer:
trainer = WeightedTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    compute_metrics=compute_metrics,
    internal_weights=weights,
)
trainer.train()
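After training, predictions for a held-out split can be obtained through the same trainer. A minimal sketch, where tokenized_test is a hypothetical tokenized test split:

import numpy as np

# logits for every example in the test split
predictions = trainer.predict(tokenized_test)
pred_labels = np.argmax(predictions.predictions, axis=-1)

# map the integer predictions back to the original label names
pred_label_names = le.inverse_transform(pred_labels)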