Information Disorder Research: Datasets & Tools

This is a curated list of resources for research on various kinds of information disorder, such as fake news, rumours and satire.

📅 Conferences and Workshops

Conferences

Conference for Truth and Trust Online (TTO)
David Lazer, Matthew Baum, Nir Grinberg, Lisa Friedland, Kenneth Joseph, Will Hobbs, and Carolina Mattsson. May 2017 (Conference held February 17-18, 2017). Combating Fake News: An Agenda for Research and Action. Cambridge, MA, USA. Schedule PDF Web
Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM)
Understanding and Addressing the Disinformation Ecosystem. Annenberg School for Communication. December 2017. (Workshop held December 15-16, 2017). Philadelphia, PA, USA. PDF

Other Programmes, Fellowships and Summer Schools

GATE Training Course @ Uni. of Sheffield | 5 Days
Google News Initiative Fellowship
Summer Doctoral Programme @ Oxford Internet Institute, Uni. of Oxford | 2 Weeks

Workshops

International Workshop on News Recommendation and Analytics (INRA)

📔 Journals

The Harvard Kennedy School Misinformation Review

📁 Datasets

Academic

Dataset	Description	Resources
BuzzFace	A News Veracity Dataset with Facebook User Commentary and Egos.	`Data` `Paper`
BuzzFeed-Webis Fake News Corpus 2016	Consists mostly of political news, fact-checked by journalists at BuzzFeed. The data was sourced from the Facebook pages of 9 verified publishers (3 mainstream and 6 hyperpartisan: 3 left-wing and 3 right-wing), one week before the 2016 US elections.	`Data` `Paper`
FA-KES	A Fake News Dataset around the Syrian War.	`Data` `Paper`
r/Fakeddit	A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection.	`Data` `Paper`
FakeNewsAMT + Celebrity	News on business, celebrity, education, entertainment, politics, sports and technology. Authentic news was collected from reputable sources and fake news was produced by Amazon Mechanical Turk workers.	`Data` `Paper`
FakeNewsData1	Political news with the highest engagement on Facebook, 9 months before the 2016 US Presidential Election, collected from BuzzFeed; and random political news data collected from three types of sources (Real, Fake and Satire).	`Data` `Paper`
FakeNewsCorpus	A large dataset with millions of articles sourced from opensources.co, WebHose, and The New York Times.	`Data`
FakeNewsNet	Comprises news content (sourced from PolitiFact and GossipCop) and social context data (sourced from Twitter).	`Data` `Paper 1` `Paper 2` `Paper 3`
ISOT	Created by the Information Security and Object Technology (ISOT) research lab at the University of Victoria. Real articles were obtained from Reuters.com, and fake articles were identified using the fact-checking site PolitiFact.com and Wikipedia.	`Data` `Paper 1` `Paper 2`
NELA Datasets	A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles.	`All Datasets` NELA2017: `Data` `Paper` NELA-GT-2018: `Data` `Paper` NELA-GT-2019: `Data` `Paper`
SVDC	News on conflict in Syria. Articles were verified using information provided by the Syrian Violations Documentation Centre.	`Data`

Non-Academic

EUvsDisinfo: a continuously updated collection of disinformation cases in multiple languages from all over the world. Data

✅ Fact and Veracity Checking

Africa Check
Chequeado
Dubawa
Emergent
EUvsDisinfo
FactCheck.org
FactCheckHub
FactCheckNI
FactWatcher
Fiskkit
Full Fact
Google News
Hoaxy
Logically
Junk News Aggregator
Media Bias / Fact Check
NewsCheck
our.news
People's Check
PolitiFact
Poynter Fact-checking
Snopes
TwitterTrails
WikiTribune
Annotation and Analysis
- FrameTrail
- Hypothesis
External Curated Lists
- Fact checking initiatives in the EU (and in the UK), by the European Digital Media Observatory

🗂 Glossaries

Caroline Jack. 2017. Lexicon of Lies: Terms for Problematic Information, Data & Society Research Institute. PDF Web

Claire Wardle, Grace Greason, Joe Kerwin, and Nic Dias. 2018. Information Disorder: The Essential Glossary, First Draft. PDF Web

📰 News Labs & Initiatives

BBC News Labs Github
Better News (American Press Institute & Knight-Lenfest Newsroom Initiative)
Co-Inform
EU DisinfoLab
Google Jigsaw
Meedan Github
MediaWell @ Social Science Research Council (SSRC)
The Obsidian Collection
RoBhat Labs
Companies
- Astroscreen, UK
- Blackbird.ai, US
- Crisp, UK | US
- Fabula.ai, UK (Acquired by Twitter)
- Factmata, UK
- Graphika, US

🏢 Research Labs & Partnerships

Data & Society
Public Data Lab Github
European Digital Media Observatory
At universities
- Center for News Literacy @ Stony Brook Uni. Sch. of Journalism
- Discourse Processing Lab @ Simon Fraser University Github
- Duke Reporters' Lab @ the Sanford Sch. of Public Policy, Duke University
- News Co/Lab @ Arizona State University
- The News Integrity Initiative @ the Newmark J-School, CUNY
- Observatory on Social Media (OSoMe) @ Indiana University
- Tow Center for Digital Journalism @ the Columbia Journalism Sch., Columbia University
Partnerships
- Facebook Journalism Project
- Credibility Coalition
- Google News Initiative
- Social Science One @ Harvard University
- The Trust Project

📄 Surveys and Reports

Papers

Mevan Babakar and Will Moy. 2016. The State of Automated Factchecking. Full Fact. PDF Web
Niall J. Conroy, Victoria L. Rubin, and Yimin Chen. 2015. Automatic deception detection: Methods for finding fake news. DOI PDF
Georgios Gravanis, Athena Vakali, Konstantinos Diamantaras, and Panagiotis Karadais. 2019. Behind the cues: A benchmarking study for fake news detection. DOI PDF
Alice Marwick and Rebecca Lewis. 2017. Media Manipulation and Disinformation Online, Data & Society Research Institute. PDF Web
Ray Oshikawa, Jing Qian, and William Yang Wang. 2018. A Survey on Natural Language Processing for Fake News Detection. arXiv:1811.00770. PDF
Shivam B. Parikh and Pradeep K. Atrey. 2018. Media-Rich Fake News Detection: A Survey. DOI PDF
Francesco Pierri and Stefano Ceri. 2019. False News On Social Media: A Data-Driven Survey. arXiv:1902.07539. PDF
Reuters Institute. 2018. Reuters Institute Digital News Report 2018. PDF
- Richard Fletcher. n.d. Misinformation and Disinformation Unpacked Web
- Richard Fletcher. n.d. The Impact of Greater News Literacy Web
Jon Roozenbeek and Sander van der Linden. 2019. The Fake News Game: Actively Inoculating Against the Risk of Misinformation. DOI PDF
Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, and Yan Liu. 2019. Combating fake news: A survey on identification and mitigation techniques. arXiv:1901.06437. PDF
Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media. arXiv:1708.01967. PDF
Savvas Zannettou, Michael Sirivianos, Jeremy Blackburn, and Nicolas Kourtellis. 2019. The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans. arXiv:1804.03461. PDF
Xinyi Zhou and Reza Zafarani. 2018. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv:1812.00315. PDF
Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, and Rob Procter. 2018. Detection and resolution of rumours in social media: A survey. arXiv:1704.00656. PDF

Journals

Misinformation Review, Harvard Kennedy School

🔎 Research Tools

APIs

Perspective API: machine learning models which score the perceived impact (based on toxicity, insult, profanity, etc.) a comment might have on a conversation. Github
Google Fact Check Tools API
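As a rough illustration of how the Perspective API entry above is typically used, the sketch below builds the JSON body for a `comments:analyze` request and pulls the summary scores out of a response. It makes no network call. The helper names (`build_analyze_request`, `summary_scores`) and the sample response are hypothetical, written by hand to mirror the request and response shape described in the public API documentation; check the current documentation before relying on any field.

```python
import json

# Endpoint as given in the Perspective API documentation (v1alpha1).
# An API key is appended as a `key` query parameter when sending the request.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_analyze_request(text, attributes=("TOXICITY",), language="en"):
    """Return the JSON body for a comments:analyze call (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "languages": [language],
        # Each requested attribute maps to an (optionally empty) config object.
        "requestedAttributes": {name: {} for name in attributes},
    }


def summary_scores(response):
    """Map each returned attribute to its summary score (hypothetical helper)."""
    return {
        name: detail["summaryScore"]["value"]
        for name, detail in response.get("attributeScores", {}).items()
    }


# Hand-written sample response, NOT real API output.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.05, "type": "PROBABILITY"}},
    }
}

body = build_analyze_request("You are wrong about this.", ("TOXICITY", "INSULT"))
print(json.dumps(body, indent=2))
print(summary_scores(sample_response))
```

To send the request for real, the body would be POSTed as JSON to `ANALYZE_URL` with a valid key; that step is left out here so the sketch runs offline.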

Libraries

Note: Most of the libraries listed here are written in Python; some may exist in other languages too.

General NLP
- Gensim: topic modelling, document indexing and similarity retrieval with large corpora. Github PyPI
- Newspaper3k: article scraping and curation. Github PyPI
- NLTK: natural language processing toolkit. Github PyPI
- Readability: traditional readability measures based on simple surface characteristics. Github PyPI
- Scikit-learn: modules for machine learning and data mining. Github PyPI
- spaCy: industrial-strength natural language processing. Github PyPI
- Stanford CoreNLP: linguistic annotations, token and sentence boundaries, PoS, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. Github
- The News Landscape Toolkit (NELA): toolkit for assessing news articles and sources. Github
Data extraction / retrieval
- GetOldTweets3: a Python 3 library and a corresponding command line utility for accessing old tweets. Github PyPI
- Hydrator: turn Tweet IDs into Twitter JSON & CSV from your desktop. Github
- Pattern: web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. Github
- Scrapy: a framework for speedily extracting data from websites. Github PyPI
- Twarc: a command line tool and Python library for archiving Twitter JSON. Github PyPI
- TwitterMySQL: pull tweets from the Twitter API and insert them into MySQL. Github
Visualisation
- Dash: a framework for building ML & data science web apps. Github PyPI
- Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. Github PyPI
- NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Github PyPI
- Plotly: interactive charts and maps for Python, R, and JavaScript. Github PyPI
- Scattertext: beautiful visualizations of how language differs among document types. Github PyPI
Others
- CrowdTangle
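
To make the "simple surface characteristics" mentioned for the Readability entry more concrete, here is a minimal standard-library sketch of one such measure, the Flesch Reading Ease score. It is not the Readability package's implementation. The syllable counter is a crude vowel-group heuristic chosen for brevity, and both function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Disinformation campaigns systematically manipulate international audiences."
)
print(round(easy, 3), round(hard, 3))
```

Scores like these depend only on word, sentence, and syllable counts, so they are cheap to compute over large news corpora but say nothing about whether the content is accurate.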

📝 Tutorials

Liang Wu, Giovanni Luca Ciampaglia, and Huan Liu. 2017. Mining Misinformation in Social Media: Understanding Its Rampant Spread, Harm, and Intervention. IEEE International Conference on Data Mining (ICDM 2017). Part 1 Part 2 Slides Web
