Information Disorder Research: Datasets & Tools

This is a curated list of resources for research on various kinds of information disorder, such as fake news, rumours and satire.

License: CC0-1.0

Contents

  1. 📅 Conferences and Workshops
  2. 📔 Journals
  3. 📁 Datasets
  4. ✅ Fact and Veracity Checking
  5. 🗂 Glossaries
  6. 📰 News Labs & Initiatives
  7. 🏢 Research Labs & Partnerships
  8. 📄 Surveys and Reports
  9. 🔎 Research Tools
  10. 📝 Tutorials

📅 Conferences and Workshops

Conferences

Other Programmes, Fellowships and Summer Schools

Workshops

📔 Journals

📁 Datasets

Academic

| Dataset | Description | Resources |
|---|---|---|
| BuzzFace | A News Veracity Dataset with Facebook User Commentary and Egos. | Data, Paper |
| BuzzFeed-Webis Fake News Corpus 2016 | Consists mostly of political news, fact-checked by journalists at BuzzFeed. The data was sourced from the Facebook pages of 9 verified publishers (3 mainstream and 6 hyperpartisan: 3 left-wing and 3 right-wing), one week before the 2016 US elections. | Data, Paper |
| FA-KES | A Fake News Dataset around the Syrian War. | Data, Paper |
| r/Fakeddit | A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection. | Data, Paper |
| FakeNewsAMT + Celebrity | News on business, celebrity, education, entertainment, politics, sports and technology. Authentic news was collected from reputable sources and fake news was produced by Amazon Mechanical Turk workers. | Data, Paper |
| FakeNewsData1 | Political news with the highest engagement on Facebook, 9 months before the 2016 US Presidential Election, collected from BuzzFeed; and random political news data collected from three types of sources (Real, Fake and Satire). | Data, Paper |
| FakeNewsCorpus | A large dataset with millions of articles sourced from opensources.co, WebHose, and The New York Times. | Data |
| FakeNewsNet | Comprises news content (sourced from PolitiFact and GossipCop) and social context data (sourced from Twitter). | Data, Paper 1, Paper 2, Paper 3 |
| ISOT | Created by the Information Security and Object Technology (ISOT) research lab at the University of Victoria. Real articles were obtained from Reuters.com, and fake articles were identified using the fact-checking site Politifact.com and Wikipedia. | Data, Paper 1, Paper 2 |
| NELA Datasets | A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles. | All Datasets; NELA2017: Data, Paper; NELA-GT-2018: Data, Paper; NELA-GT-2019: Data, Paper |
| SVDC | News on conflict in Syria. Articles were verified using information provided by the Syrian Violations Documentation Centre. | Data |
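
A first step with any labelled news dataset is usually to check how its labels are distributed. The snippet below is a minimal sketch of that check using only the Python standard library. It assumes the dataset is available as a CSV file; the sample rows and the column names `title` and `label` are hypothetical and do not come from any dataset listed here, so check each dataset's own documentation for its actual format and schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded dataset file.
# The column names ("title", "label") are assumptions for illustration;
# real datasets use their own schemas.
SAMPLE_CSV = """title,label
Senate passes budget bill,real
Celebrity endorses miracle cure,fake
Local team wins championship,real
Scientists confirm the moon is hollow,fake
Mayor opens new library,real
"""


def count_labels(csv_file, label_column="label"):
    """Return a Counter of the label values found in an open CSV file."""
    reader = csv.DictReader(csv_file)
    # Normalise case and whitespace so "Fake" and "fake " count together.
    return Counter(row[label_column].strip().lower() for row in reader)


label_counts = count_labels(io.StringIO(SAMPLE_CSV))
for label, count in sorted(label_counts.items()):
    print(f"{label}: {count}")
```

To run the same check on a real file, pass `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` wrapper and set `label_column` to whatever the dataset calls its veracity field.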

Non-Academic

  • EUvsDisinfo: a continuously updated collection of disinformation cases in multiple languages from all over the world. Data

✅ Fact and Veracity Checking

🗂 Glossaries

Caroline Jack. 2017. Lexicon of Lies: Terms for Problematic Information, Data & Society Research Institute. PDF Web

Claire Wardle, Grace Greason, Joe Kerwin, and Nic Dias. 2018. Information Disorder: The Essential Glossary, First Draft. PDF Web

📰 News Labs & Initiatives

🏢 Research Labs & Partnerships

📄 Surveys and Reports

Papers

  • Mevan Babakar and Will Moy. 2016. The State of Automated Factchecking. Full Fact. PDF Web

  • Niall J. Conroy, Victoria L. Rubin, and Yimin Chen. 2015. Automatic deception detection: Methods for finding fake news. DOI PDF

  • Georgios Gravanis, Athena Vakali, Konstantinos Diamantaras, and Panagiotis Karadais. 2019. Behind the cues: A benchmarking study for fake news detection. DOI PDF

  • Alice Marwick and Rebecca Lewis. 2017. Media Manipulation and Disinformation Online, Data & Society Research Institute. PDF Web

  • Ray Oshikawa, Jing Qian, and William Yang Wang. 2018. A Survey on Natural Language Processing for Fake News Detection. arXiv:1811.00770. PDF

  • Shivam B. Parikh and Pradeep K. Atrey. 2018. Media-Rich Fake News Detection: A Survey. DOI PDF

  • Francesco Pierri and Stefano Ceri. 2019. False News On Social Media: A Data-Driven Survey. arXiv:1902.07539. PDF

  • Reuters Institute. 2018. Reuters Institute Digital News Report 2018. PDF

    • Richard Fletcher. n.d. Misinformation and Disinformation Unpacked Web

    • Richard Fletcher. n.d. The Impact of Greater News Literacy Web

  • Jon Roozenbeek and Sander van der Linden. 2019. The Fake News Game: Actively Inoculating Against the Risk of Misinformation. DOI PDF

  • Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, and Yan Liu. 2019. Combating fake news: A survey on identification and mitigation techniques. arXiv:1901.06437. PDF

  • Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media. arXiv:1708.01967. PDF

  • Savvas Zannettou, Michael Sirivianos, Jeremy Blackburn, and Nicolas Kourtellis. 2019. The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans. arXiv:1804.03461. PDF

  • Xinyi Zhou and Reza Zafarani. 2018. Fake News: A Survey of Research, Detection Methods, and Opportunities. arXiv:1812.00315. PDF

  • Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, and Rob Procter. 2018. Detection and resolution of rumours in social media: A survey. arXiv:1704.00656. PDF

Journals

🔎 Research Tools

APIs

Libraries

Note: Most of the libraries listed here are written in Python; some may exist in other languages too.

  • General NLP

  • Data extraction / retrieval

    • GetOldTweets3: a Python 3 library and a corresponding command line utility for accessing old tweets. GitHub PyPI
    • Hydrator: turn Tweet IDs into Twitter JSON & CSV from your desktop. GitHub
    • Pattern: web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization. GitHub
    • Scrapy: a framework for speedily extracting data from websites. GitHub PyPI
    • Twarc: a command line tool and Python library for archiving Twitter JSON. GitHub PyPI
    • TwitterMySQL: pull tweets from the Twitter API and insert them into MySQL. GitHub
  • Visualisation

    • Dash: a framework for building ML & data science web apps. GitHub PyPI
    • Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. GitHub PyPI
    • NetworkX: creation, manipulation, and study of the structure, dynamics, and functions of complex networks. GitHub PyPI
    • Plotly: interactive charts and maps for Python, R, and JavaScript. GitHub PyPI
    • Scattertext: beautiful visualizations of how language differs among document types. GitHub PyPI
  • Others

📝 Tutorials

Liang Wu, Giovanni Luca Ciampaglia, and Huan Liu. 2017. Mining Misinformation in Social Media: Understanding Its Rampant Spread, Harm, and Intervention. IEEE International Conference on Data Mining (ICDM 2017). Part 1 Part 2 Slides Web