This is a curated list of resources for research on various kinds of information disorder, such as fake news, rumours and satire.
- 📅 Conferences and Workshops
- 📔 Journals
- 📁 Datasets
- ✅ Fact and Veracity Checking
- 🗂 Glossaries
- 🏢 Research Labs
- 📄 Surveys and Reports
- 🔎 Research Tools
- 📝 Tutorials
Conferences
-
David Lazer, Matthew Baum, Nir Grinberg, Lisa Friedland, Kenneth Joseph, Will Hobbs, and Carolina Mattsson. May 2017 (Conference held February 17-18, 2017). Combating Fake News: An Agenda for Research and Action. Cambridge, MA, USA.
Schedule
PDF
Web
-
Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM)
-
Understanding and Addressing the Disinformation Ecosystem. Annenberg School for Communication. December 2017. (Workshop held December 15-16, 2017). Philadelphia, PA, USA.
PDF
Other Programmes, Fellowships and Summer Schools
- GATE Training Course @ Uni. of Sheffield | 5 Days
- Google News Initiative Fellowship
- Summer Doctoral Programme @ Oxford Internet Institute, Uni. of Oxford | 2 Weeks
Workshops
Datasets
Dataset | Description | Resources |
---|---|---|
BuzzFace | A News Veracity Dataset with Facebook User Commentary and Egos. | Data Paper |
BuzzFeed-Webis Fake News Corpus 2016 | Consists mostly of political news, fact-checked by journalists at BuzzFeed. The data was sourced from the Facebook pages of 9 verified publishers (3 mainstream and 6 hyperpartisan: 3 left-wing and 3 right-wing), one week before the 2016 US elections. | Data Paper |
FA-KES | A Fake News Dataset around the Syrian War. | Data Paper |
r/Fakeddit | A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. | Data Paper |
FakeNewsAMT + Celebrity | News on business, celebrity, education, entertainment, politics, sports and technology. Authentic news was collected from reputable sources and fake news was produced by Amazon Mechanical Turk workers. | Data Paper |
FakeNewsData1 | Political news with the highest engagement on Facebook, 9 months before the 2016 US Presidential Election, collected from BuzzFeed; and random political news data collected from three types of sources (Real, Fake and Satire). | Data Paper |
FakeNewsCorpus | A large dataset with millions of articles sourced from opensources.co, WebHose, and The New York Times. | Data |
FakeNewsNet | Comprises news content (sourced from PolitiFact and GossipCop) and social context data (sourced from Twitter). | Data Paper 1 Paper 2 Paper 3 |
ISOT | Created by the Information Security and Object Technology (ISOT) research lab at the University of Victoria. Real articles were obtained from Reuters.com, and fake articles were identified using the fact-checking site Politifact.com and Wikipedia. | Data Paper 1 Paper 2 |
NELA Datasets | A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles. | All Datasets NELA2017: Data Paper NELA-GT-2018: Data Paper NELA-GT-2019: Data Paper |
SVDC | News on conflict in Syria. Articles were verified using information provided by the Syrian Violations Documentation Centre. | Data |
- EUvsDisinfo: a continuously updated collection of disinformation cases in multiple languages from all over the world.
Data
- Africa Check
- Chequeado
- Dubawa
- Emergent
- EUvsDisinfo
- FactCheck.org
- FactCheckHub
- FactCheckNI
- FactWatcher
- Fiskkit
- Full Fact
- Google News
- Hoaxy
- Logically
- Junk News Aggregator
- Media Bias / Fact Check
- NewsCheck
- our.news
- People's Check
- PolitiFact
- Poynter Fact-checking
- Snopes
- TwitterTrails
- WikiTribune
- Annotation and Analysis
- External Curated Lists
Caroline Jack. 2017. Lexicon of Lies: Terms for Problematic Information, Data & Society Research Institute. PDF
Web
Claire Wardle, Grace Greason, Joe Kerwin, and Nic Dias. 2018. Information Disorder: The Essential Glossary, First Draft. PDF
Web
- BBC News Labs
Github
- Better News (American Press Institute & Knight-Lenfest Newsroom Initiative)
- Co-Inform
- EU DisinfoLab
- Google Jigsaw
- Meedan
Github
- MediaWell @ Social Science Research Council (SSRC)
- The Obsidian Collection
- RoBHat Labs
- Companies
- Astroscreen, UK
- Blackbird.ai, US
- Crisp, UK | US
- Fabula.ai, UK (Acquired by Twitter)
- Factmata, UK
- Graphika, US
- Data & Society
- Public Data Lab
Github
- European Digital Media Observatory
- At universities
- Center for News Literacy @ Stony Brook Uni. Sch. of Journalism
- Discourse Processing Lab @ Simon Fraser University
Github
- Duke Reporters' Lab @ the Sanford Sch. of Public Policy, Duke University
- News Co/Lab @ Arizona State University
- The News Integrity Initiative @ the Newmark J-School, CUNY
- Observatory on Social Media (OSoMe) @ Indiana University
- Tow Center for Digital Journalism @ the Columbia Journalism Sch., Columbia University
- Partnerships
Papers
-
Mevan Babakar and Will Moy. 2016. The State of Automated Factchecking. Full Fact.
PDF
Web
-
Niall J. Conroy, Victoria L. Rubin, and Yimin Chen. 2015. Automatic deception detection: Methods for finding fake news.
DOI
PDF
-
Georgios Gravanis, Athena Vakali, Konstantinos Diamantaras, and Panagiotis Karadais. 2019. Behind the cues: A benchmarking study for fake news detection.
DOI
PDF
-
Alice Marwick and Rebecca Lewis. 2017. Media Manipulation and Disinformation Online, Data & Society Research Institute.
PDF
Web
-
Ray Oshikawa, Jing Qian, and William Yang Wang. 2018. A Survey on Natural Language Processing for Fake News Detection. arXiv:1811.00770.
PDF
-
Shivam B. Parikh and Pradeep K. Atrey. 2018. Media-Rich Fake News Detection: A Survey.
DOI
PDF
-
Francesco Pierri and Stefano Ceri. 2019. False News On Social Media: A Data-Driven Survey. arXiv:1902.07539.
PDF
-
Reuters Institute. 2018. Reuters Institute Digital News Report 2018.
PDF
-
Jon Roozenbeek and Sander van der Linden. 2019. The Fake News Game: Actively Inoculating Against the Risk of Misinformation.
DOI
PDF
-
Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, and Yan Liu. 2019. Combating fake news: A survey on identification and mitigation techniques. arXiv:1901.06437.
PDF
-
Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media. arXiv:1708.01967.
PDF
-
Savvas Zannettou, Michael Sirivianos, Jeremy Blackburn, and Nicolas Kourtellis. 2019. The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans. arXiv:1804.03461.
PDF
-
Xinyi Zhou and Reza Zafarani. 2018. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv:1812.00315.
PDF
-
Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, and Rob Procter. 2018. Detection and resolution of rumours in social media: A survey. arXiv:1704.00656.
PDF
Journals
- Misinformation Review, Harvard Kennedy School
APIs
- Perspective API: machine learning models which score the perceived impact (based on toxicity, insult, profanity, etc.) a comment might have on a conversation.
Github
- Google Fact Check Tools API by Google
Libraries
Note: Most of the libraries listed here are written in Python; some may exist in other languages too.
-
General NLP
- Gensim: topic modelling, document indexing and similarity retrieval with large corpora.
Github
PyPI
- Newspaper3k: article scraping and curation.
Github
PyPI
- NLTK: natural language processing toolkit.
Github
PyPI
- Readability: traditional readability measures based on simple surface characteristics.
Github
PyPI
- Scikit-learn: modules for machine learning and data mining.
Github
PyPI
- spaCy: industrial-strength natural language processing.
Github
PyPI
- Stanford CoreNLP: linguistic annotations, token and sentence boundaries, PoS, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations.
Github
- The News Landscape Toolkit (NELA): toolkit for assessing news articles and sources.
Github
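For readers unfamiliar with the "simple surface characteristics" that traditional readability measures rely on, here is a minimal sketch of one such measure, the Flesch Reading Ease score, using only the Python standard library. This is not the API of the Readability package listed above; the function names `count_syllables` and `flesch_reading_ease` are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (heuristic): count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Assume a trailing "e" is silent (as in "make"), except in "-le" endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

The score depends only on sentence length and syllable counts, which is why measures of this kind are cheap to compute but blind to meaning. For real analyses, prefer a maintained library over this sketch.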
-
Data extraction / retrieval
- GetOldTweets3: a Python 3 library and a corresponding command line utility for accessing old tweets.
Github
PyPI
- Hydrator: turn Tweet IDs into Twitter JSON & CSV from your desktop.
Github
- Pattern: web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Github
- Scrapy: a framework for speedily extracting data from websites.
Github
PyPI
- Twarc: a command line tool and Python library for archiving Twitter JSON.
Github
PyPI
- TwitterMySQL: pull tweets from the Twitter API and insert them into MySQL.
Github
-
Visualisation
- Dash: a framework for building ML & data science web apps.
Github
PyPI
- Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
Github
PyPI
- NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Github
PyPI
- Plotly: interactive charts and maps for Python, R, and JavaScript.
Github
PyPI
- Scattertext: beautiful visualizations of how language differs among document types.
Github
PyPI
-
Others
Liang Wu, Giovanni Luca Ciampaglia, and Huan Liu. 2017. Mining Misinformation in Social Media: Understanding Its Rampant Spread, Harm, and Intervention. IEEE International Conference on Data Mining (ICDM 2017). Part 1
Part 2
Slides
Web