Code repository for the paper "Discord Questions: A Computational Approach To Diversity Analysis in News Coverage", accepted as a Findings paper at EMNLP 2022.
The Discord Questions pipeline takes a collection of text documents as input and extracts the questions that best represent the "discord" (disagreement) in the collection. To see how to apply the pipeline end-to-end on your own collection of text, see the example in the Jupyter notebook Example_Pipeline.ipynb.
The pipeline is composed of three components, all of which we release publicly as Hugging Face Hub models. Each model card comes with example usage, so each component can also be used independently.
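The way the three components fit together can be sketched in plain Python. This is a minimal illustration of the data flow only, not the repository's implementation: `run_pipeline` and its three callable arguments are hypothetical names, the toy stand-ins replace the released models, and ranking questions by their number of answer groups is an assumption made for illustration. See the paper and Example_Pipeline.ipynb for the actual procedure.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces: each callable stands in for one released model.
QuestionGenerator = Callable[[str], List[str]]             # document -> questions
QuestionAnswerer = Callable[[str, str], Optional[str]]     # (question, document) -> answer or None
Consolidator = Callable[[str, List[str]], List[List[str]]] # (question, answers) -> answer groups


def run_pipeline(
    documents: List[str],
    generate: QuestionGenerator,
    answer: QuestionAnswerer,
    consolidate: Consolidator,
) -> List[Dict]:
    """Chain question generation, question answering, and answer consolidation."""
    # Step 1: collect candidate questions from every document, without duplicates.
    questions: List[str] = []
    for doc in documents:
        for question in generate(doc):
            if question not in questions:
                questions.append(question)

    results = []
    for question in questions:
        # Step 2: try to answer each question in each document.
        answers = [a for a in (answer(question, d) for d in documents) if a is not None]
        # Step 3: group answers that express the same thing.
        groups = consolidate(question, answers)
        results.append({"question": question, "answers": answers, "groups": groups})

    # Assumption for illustration: more answer groups means more disagreement.
    results.sort(key=lambda r: len(r["groups"]), reverse=True)
    return results


# Toy stand-ins so the sketch runs without downloading any model.
def toy_generate(doc: str) -> List[str]:
    return ["What caused the outage?"]


def toy_answer(question: str, doc: str) -> Optional[str]:
    marker = "because "
    if marker in doc:
        return doc.split(marker, 1)[1].rstrip(".")
    return None  # the document does not answer the question


def toy_consolidate(question: str, answers: List[str]) -> List[List[str]]:
    groups: Dict[str, List[str]] = {}
    for a in answers:
        groups.setdefault(a.lower(), []).append(a)
    return list(groups.values())


docs = [
    "The site went down because a server failed.",
    "Service stopped because a server failed.",
    "The outage happened because of a cyberattack.",
    "Officials declined to comment.",
]
ranked = run_pipeline(docs, toy_generate, toy_answer, toy_consolidate)
```

In a real run, each toy function would be replaced by a wrapper around the corresponding released model, following the example usage on its model card.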
The Question Generation model (https://huggingface.co/Salesforce/discord_qg) is a BART-large model trained on a combination of QA datasets (see the paper for details). It achieves the highest performance on our evaluation framework, generating the largest fraction of Discord Questions. See DiscordQG_Eval.ipynb for the experimental comparison.
The Question Answering model (https://huggingface.co/Salesforce/discord_qa) is a RoBERTa-large model trained on a combination of SQuAD 2.0 and NewsQA. It is a standard extractive QA model and can be replaced with a more domain-specific model, depending on the textual domain.
The Answer Consolidation model (https://huggingface.co/Salesforce/qa_consolidation) is a RoBERTa-large model that achieves the highest performance on NAnCo, the evaluation dataset we annotated to evaluate answer consolidation methods. We also release the NAnCo annotations (nanco_dataset.json) and a Jupyter notebook that reproduces the experimental comparison of answer consolidation models presented in the paper (NAnCo.ipynb).
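Answer consolidation can be thought of as grouping answers that say the same thing. The sketch below shows one possible way to turn pairwise equivalence scores into groups; it is not the released model's code. The `token_jaccard` scorer is a crude stand-in for the Salesforce/qa_consolidation model, and `group_answers`, the 0.5 threshold, and the single-link merging strategy are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List


def token_jaccard(a: str, b: str) -> float:
    """Stand-in scorer: word-overlap ratio between two answers (not the released model)."""
    tokens_a = set(re.findall(r"\w+", a.lower()))
    tokens_b = set(re.findall(r"\w+", b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def group_answers(
    answers: List[str],
    score: Callable[[str, str], float] = token_jaccard,
    threshold: float = 0.5,  # assumed cut-off, chosen only for this example
) -> List[List[str]]:
    """Merge answers into groups whenever a pair scores at or above the threshold."""
    parent = list(range(len(answers)))  # union-find forest over answer indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            if score(answers[i], answers[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    # Keep the smaller index as root so group order follows input order.
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: Dict[int, List[str]] = {}
    for i, text in enumerate(answers):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())


answers = [
    "a server failure",
    "A server failure occurred",
    "a cyberattack",
    "a major cyberattack",
]
groups = group_answers(answers)
```

To experiment with a learned scorer, pass any function with the same `(str, str) -> float` shape as the `score` argument; the model card describes how the released model is meant to be called.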
If you make use of the code, models, or pipeline, please cite our paper:
@inproceedings{laban2022discord_questions,
title={Discord Questions: A Computational Approach To Diversity Analysis in News Coverage},
author={Philippe Laban and Chien-Sheng Wu and Lidiya Murakhovs'ka and Xiang 'Anthony' Chen and Caiming Xiong},
booktitle={Proceedings of the 2022 Findings of Empirical Methods in Natural Language Processing},
volume={1},
year={2022}
}
If you'd like to contribute, or have questions or suggestions, reach out in the Issues or by email: plaban@salesforce.com. All contributions are welcome, for example if you want to apply the Discord Questions pipeline to a new corpus of text or improve the pipeline components.

