A list of digital tools and resources for journalism, media and communication research, and computational social science.
Table of contents
- Find datasets
- Content analysis, text analysis, text mining, annotation
- Compare differences between texts, find duplicate files
- Television
- Social networking sites and specific sites
- Scrape and extract news articles
- Online archives and archiving
- Journal articles, citations, bibliometrics
- Literature search
- Find retracted articles
- Behavioral and cognitive experiments
- Graphics, network visualization, and maps
- Convert and clean data
- Survey scales and measures
- Survey software
- Statistics and questionable research practices (QRPs)
- Statistical software
- Organize photos, citations, and references
- Education
- Organizations
- Open science, preregistration, code/data sharing
- Text, writing
- Humor
- See also
Search engines:
- Google Dataset Search - search engine for datasets.
- re3data.org
- Metadata Search
- Dimensions - search among 8+ million datasets (about 2 million of which link to the original article).
Archives and lists:
- A dataset with political datasets - Cabinets, citizens, constitutions, political institutions, parties and politicians, democracy, economics, elections, international relations, media, policy, political elites (.xlsx, .csv, .Rdata, .sav).
- Consortium of European Social Science Data Archives (CESSDA)
- Inter-university Consortium for Political and Social Research (ICPSR)
- Open Stats Lab - data from psychological studies, intended for use in education.
- Awesome Public Datasets - awesome list of (large-scale) public datasets on the Internet (on-going collection).
- Common Crawl - download a copy of the web, several billion web pages (250+ terabytes of data), updated regularly.
- Öppna Data - open data in Sweden (from Riksarkivet).
- Personality Development Collaborative - hub for psychologists who want to share or use existing longitudinal data about personality.
Survey data:
- European Social Survey - survey conducted across Europe since 2001. Face-to-face interviews every two years on new cross-sectional samples.
- European Values Study - large-scale, cross-national survey since 1981 about basic human values (e.g., ideas, beliefs, preferences, attitudes). New surveys every nine years.
- The General Social Survey 1972– (USA)
- Latin American Public Opinion Project (North and South America)
- SOM Institute 1986– (Sweden)
- Norsk senter for forskningsdata (Norway)
Media data:
- IMDB Ratings for TV/Streaming Series - dataset of ratings given in IMDB to episodes of popular TV and Streaming series (includes R code).
- The Twitter Parliamentarian Database - database of parliamentarians on Twitter across 26 countries.
- Hate speech data - datasets in many languages annotated for hate speech, online abuse, and offensive language. Useful to create machine learning models.
- Upworthy Research Archive - a time series of 32,487 behavior experiments from the U.S. media website Upworthy.
- Text Mining for Social Scientists and Digital Humanists (GitHub) (R)
- Lexicoder - multi-platform software for automated content analysis of text (Java).
- SentimentAnalysis - sentiment analysis of text (R).
- Topic Models Learning and R Resources
- Count Words in a PDF Document (online tool).
- TAPoR - curated lists of widely used research tools in the digital humanities for studying texts.
- Datavyu - code and annotate video (Win/Mac app).
- Diff Checker - compare two texts for differences (online tool).
- Online LaTeX diff tool - to compare text differences in LaTeX documents using latexdiff (online tool).
- Auslogics Duplicate File Finder - finds duplicate files regardless of their filenames (Windows app).
- comparefiles - scans a directory for identical files or similar text files (Python).
- Stanford Cable TV News Analyzer - tool to count screen time of who and what is in cable TV news (from the Internet Archive TV News).
- Facepager - fetches publicly available data from Facebook, Twitter and other JSON-based APIs (Python).
- facebook-page-post-scraper - data scraper for Facebook Pages (Python).
- PolitEcho (GitHub) - shows you the political biases of your Facebook friends and news feed (Chrome extension).
- netvizz - collection of scripts that help with downloading data from the Facebook platform for research purposes (important: on the Facebook API changes, read "Facebook’s app review" and "how independent research just got a lot harder").
- Facebook API.
- twarc - command line tool for archiving Twitter JSON (Python).
- tweetbotornot - detect Twitter bots via machine learning (R).
- Twint - Twitter scraping and open source intelligence (OSINT) tool that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations (Python).
- Tinfoleak (GitHub) - open-source tool for Twitter intelligence analysis (Python).
- Tweetbeaver - convert @name to ID, check if two accounts follow each other, download a user's favorites, search within a user's favorites, download a user's timeline etc (online tool).
- scrape-twitter - Command line interfaces to scrape profiles, timelines, connections, likes, search and conversations with the use of API (Node.js).
- Chorus - free Twitter harvesting and visual analytics suite for social science research (Windows).
- Twitter API.
- Twitter - helpful tools - Twitter lists helpful tools for data access, data analysis, data visualization, and hosting.
- Pageviews Analysis tool for Wikimedia Foundation wikis (GitHub) - number of page views for any Wikipedia page (online tool, PHP).
- WikiMedia REST API - access to Wikipedia content, data, and statistics (online API).
- WikiShark - Wikipedia article traffic (page views) since 2008, updated every hour or so.
- Page view statistics for Wikimedia projects 2008-2016 - download all dumps from Wikipedia with page views for all projects from 2008 to 2016.
- Analytics Datasets: Pageviews 2016 onwards - download all dumps from Wikipedia with page views for all projects from 2016 onwards. Don't forget that you can use the API instead for easier access, but note that the API only has data from year 2015 onwards.
- Google Trends - number of searches for a specific search query (online tool).
- Google Ngram Viewer - shows the number of times a phrase has occurred in books from 1800 to 2000 (online tool).
- GoogleScraper - scrape search engines (e.g., Google, Yandex, Bing, Duckduckgo, Baidu) by using proxies (socks4/5, http proxy) and many IPs, including asynchronous networking support (Python).
- Lumen database - collects and analyzes legal complaints and requests for removal of online materials such as Google search results (online tool).
- Google Books - search for books (online API).
- YouTube comment scraper - scrape comments from YouTube videos, download comments as JSON or CSV (online tool).
- youtube-dl - download YouTube videos or videos from other sites (Python).
- PRAW - Library for API access to Reddit (Python).
- RedditExtractoR - Package for API access to Reddit (R).
- Creepy (GitHub) - a geolocation OSINT tool. Offers geolocation information gathering through social networking platforms (Python).
- flashback - scrapes Swedish Flashback forum https://www.flashback.org/ (Python).
- Instagram API (online API).
- New York Times API - latest articles, top articles, book bestsellers, search articles from year 1851, and more (online API).
- OMDb API - obtain movie information, content and images maintained by users (online API).
- Rotten Tomatoes API - movie reviews (online API).
- See also any-api.com.
- APIs for social scientists: A collaborative review - how to use many social media APIs, with example code (R).
- GDELT Project - archives all news media events around the globe.
- The Social, Political and Economic Event Database Project (SPEED) - comprehensive news sources from 1945 onwards, crawls over 5,000 news feeds in 120 countries several times each day, scraping news reports, totalling over 40 million news reports.
- mediacloud - open source, open data platform that allows researchers to answer quantitative questions about the content of online media (Perl/Python).
- Trove - Find and get Australian and online resources: books, images, historic newspapers, maps, music, archives and more.
- newsdiffs - automatic scraper that tracks changes in news articles over time (Python).
- newsflash - tools to work with the Internet Archive and GDELT Television Explorer (R).
- newspaper - news, full-text, and article metadata extraction, based on python-goose (Python).
- news-please - integrated web crawler and information extractor for news that just works (Python).
- Newsmap - semi-supervised geographical news classifier (R). Journal article.
- Scrapy - fast high-level web crawling and web scraping framework (Python).
- Internet Archive - non-profit digital library offering free universal access to books, movies & music (online tool).
- Wayback Machine - 343 billion archived web pages from Internet Archive (online tool).
- archive.is - take a snapshot of a webpage that will always be online even if the original page disappears (online tool).
- HTTrack Website Copier - download website to a local directory, building recursively all directories, getting HTML, images, and other files with the original site's relative link-structure (Windows/Mac/Linux).
- journal-spider - tools to spider journal websites for links to articles (Python).
- Claim Extraction for Scientific Publications - detect claims (e.g. "background", "conclusion") in scientific publications using discourse and sentence embeddings (Python).
- scholar.py - parser for Google Scholar (Python).
- OpenCitations - search and browse OpenCitations Corpus (OCC) of open downloadable bibliographic and citation data recorded in RDF (online API).
- metaknowledge - computational research in bibliometrics, scientometrics, and network analysis (Python).
- pdfx - extract references (pdf, url, doi, arxiv) and metadata from a PDF. Download all referenced PDFs (Python).
- Publish or Perish - retrieves and analyzes academic citations from Google Scholar and Microsoft Academic Search (Win/Mac/Linux).
- VOSViewer - software for constructing and visualizing bibliometric networks of journals, researchers, or publications based on citation, bibliographic coupling, co-citation, or co-authorship + text mining (Win/Mac).
- CitNetExplorer - software for visualizing and analyzing citation networks of scientific publications, import from Web of Science (Win/Mac).
- metagear - research synthesis tools for systematic reviews and meta-analysis with data extraction (R).
- revtools - R package to conduct literature review or meta-analysis, visualise patterns in bibliographic data, select/exclude articles or words, etc (R).
- litsearchr - quick, objective, reproducible search strategy development using text-mining and keyword co-occurrence networks to identify important terms (R). Journal article.
- Rayyan - uses AI/NLP to speed you through systematic reviews (Android/iOS).
- OpenAlex - open and comprehensive catalog of scholarly papers, authors, institutions, and more (free API).
- Semantic scholar - AI-powered research tool for scientific literature.
- Connected papers - explore connected papers in a visual graph.
- CoCites - citation-based method for searching scientific literature, lets you find out who else published on a topic.
- Scite - discover supporting and contrasting evidence for papers by extracting the words around references.
- ASReview LAB - AI-powered app that helps screen texts for systematic reviewing (Win/Mac).
- Problematic Paper Screener - finds papers with tortured phrases, text generated by machines (online).
- jsPsych (GitHub) - library for creating and running behavioral experiments in a web browser (JavaScript).
- OpenSesame - graphical, open-source experiment builder for social sciences. Build complex experiments with minimal effort, create a wide range of experiments. Plug-in framework and Python scripting allow you to incorporate external devices, such as eye trackers, response boxes, and parallel port devices (Windows/Mac/Linux).
- PlanOut - framework for online field experiments. Makes it easy to run and iterate on sophisticated experiments in a statistically sound manner while satisfying the constraints of deployed Internet services (Python/JS/Java/PHP/Go/Lua/Ruby).
- PsychoPy - allows presentation of stimuli and collection of data for a wide range of neuroscience, psychology and psychophysics experiments (Python).
- WebExp - system for conducting psychological experiments over the web (Java).
- conjoint-example - example conjoint experimental design in Qualtrics.
- PsyToolkit - free toolkit for demonstrating, programming, and running cognitive-psychological experiments and surveys, including personality tests (online tool).
- Empirica - framework for running multiplayer interactive experiments and games in the browser (JavaScript).
- Gorilla - creates and hosts online experiments with easy-to-use graphical interface, no coding necessary (pay per respondent).
- oTree - open-source platform for behavioral research.
- Elgg - not a research tool, but open source social networking engine with core components to build a social networking site (PHP).
- Dia - app to draw structured diagrams (Windows/Mac/Linux).
- Gephi - visualization and exploration app for all kinds of graphs and networks (Windows/Mac/Linux).
- QGIS - free and open source geographic information system app (Windows/Mac/Linux).
- Inkscape - free and open source desktop program to draw vector graphics like Adobe Illustrator (Windows/Mac/Linux).
- From Data to Viz - leads you to the most appropriate graph for your data, links to the code to build it and lists common caveats you should avoid (online tool).
- Chart Types - tutorials, guides, and examples for all of the major graphs and some others.
- OpenRefine - powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data, formerly called Google Refine (Windows/Mac/Linux).
- Mr. Data Converter - convert Excel data into web-friendly formats such as HTML, JSON and XML (online tool).
- PSPP File Conversion Service - convert SPSS (.sav) files to CSV or text format (online tool).
- Media Exposure Measures from Amsterdam School of Communication Research (ASCoR).
- The Semantic Scale Network - online app that detects semantically related scales. Input a scale item and get similar scales/items back (contains 2,300 scales and 30,000 items).
- Measurement Instrument Database for the Social Sciences (MIDSS)
- Decision Making Individual Differences Inventory (DMIDI)
- PsycTESTS - psychological measures, scales, and instrumentation tools
- qualtRics - Download and import qualtrics survey data directly (R).
- GRIM Test checks if the reported means match the number of items and type of scale (see also GRIMMER test).
- SPRITE checks the type of distributions that could have produced the reported descriptive statistics.
- statcheck checks if p-values match reported statistics.
- Test of Insufficient Variance (TIVA) checks whether reported p-values were obtained using questionable research practices.
- zcurve - estimates mean power after selection for significance, see blogpost and journal article (R).
- P-hacker - train your p-hacking skills to achieve p < 0.05 (online app/Shiny).
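The idea behind the GRIM Test listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the linked tool: the function name `grim_consistent` and its parameters are hypothetical, and it assumes every response is an integer (e.g. a Likert item), so the sum of all responses must be a whole number.

```python
def grim_consistent(mean_str: str, n: int, items: int = 1) -> bool:
    """Return True if the reported mean could come from n integer responses.

    mean_str is the mean exactly as printed (e.g. "5.19"), so the number of
    reported decimals is preserved. items is the number of scale items that
    were averaged per respondent (assumption: all items are integer-scored).
    """
    decimals = len(mean_str.split(".")[1]) if "." in mean_str else 0
    reported = float(mean_str)
    grains = n * items                   # number of integer scores behind the mean
    half_unit = 0.5 * 10 ** (-decimals)  # largest possible rounding error
    centre = round(reported * grains)    # integer sum closest to the reported mean
    # Accept the mean if some integer sum reproduces it within rounding error.
    return any(
        abs(total / grains - reported) <= half_unit + 1e-9
        for total in (centre - 1, centre, centre + 1)
    )


print(grim_consistent("5.18", 28))  # 145 / 28 = 5.1786, rounds to 5.18
print(grim_consistent("5.19", 28))  # no integer sum over 28 gives 5.19
```

Passing the mean as a string keeps trailing zeros, which matter because the check depends on how many decimals were reported.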
Programs such as SPSS, Stata, SAS, and Comprehensive Meta-Analysis cost money.
Free data analysis and exploration software:
- Data Explorer
- Tableau - free version for academic use.
- GNU PSPP - free alternative to SPSS, looks the same and has all basic SPSS functions.
- R - programming language for statistics.
- JASP - built on top of R.
- Jamovi - developed from JASP.
- G*Power - statistical power analysis.
- ESCI (Exploratory Software for Confidence Intervals) - simulate confidence intervals using Microsoft Excel.
- Process macro for SPSS and SAS - plugin for moderator and mediator analysis.
- Power and N Computations for Mediation - online tool by David A. Kenny to calculate power or sample size.
- Calculator for false positive risk (FPR) - If you observe a significant p-value after a single experiment, what's the probability that your result is a false positive? By Colin Longstaff, David Colquhoun, Brendan Halpin.
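The question the false positive risk calculator asks can be illustrated with a simplified sketch. It assumes the plain "significant at alpha" formulation, where the risk depends only on the prior probability that the effect is real, the power of the test, and alpha. The linked calculator may compute the risk differently, and the function name and default values below are hypothetical.

```python
def false_positive_risk(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Share of significant results that are false positives.

    prior: assumed probability that a real effect exists before the experiment.
    power: probability of a significant result when the effect is real.
    alpha: probability of a significant result when there is no effect.
    """
    false_positives = alpha * (1 - prior)  # no effect, yet significant
    true_positives = power * prior         # real effect and significant
    return false_positives / (false_positives + true_positives)


# With a 10% prior chance of a real effect, roughly a third of
# significant results would be false positives under these assumptions.
print(round(false_positive_risk(prior=0.1), 2))
```

The result is very sensitive to the prior, which is rarely known, so it helps to run the function over a range of plausible priors instead of a single value.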
Convert effect sizes:
- Effect Size Calculator
- Web Pages that Perform Statistical Calculations!
- Free Statistics Calculator
- VassarStats: Website for Statistical Computation
- Equivalent Statistics
- Practical Meta-Analysis Effect Size Calculator
- EndNote - bibliography reference manager.
- JabRef - bibliography reference manager using BibTeX (free).
- Mendeley - bibliography reference manager (free).
- ReadCube Papers - bibliography reference manager.
- Zotero - bibliography reference manager (free).
- Tropy - organize research photos and images (free).
- Seshat annotation manager - automated management of annotation campaigns of speech data (Docker).
- Data Journalism Courses - list of data journalism courses and programmes from universities and higher education institutions around the world.
- SICCS Learning Materials - open source teaching and learning resources for computational social science.
- Digital Methods Initiative (GitHub) - contribution to doing research into the "natively digital".
- DocNow - tool/community that supports ethical collection, use, and preservation of social media content.
- Open Science Framework (OSF) - add your research project and collaborate, store data, preregister (see Prereg Challenge) and more.
- As predicted - preregister your hypothesis test.
- figshare - upload your data and get more citations.
- Dataverse - open source web app to share, preserve, cite, explore, and analyze research data.
- SocArxiv - share your manuscript and get feedback as a pre-print before journal publication.
- Collection of preregistered studies from OSF. See also Registered Protocols and Registered Reports.
- Top factor - ranking of scientific journals based on open science practices.
- QuillBot - paraphrasing tool that helps rewrite your text and enhance sentences and paragraphs using AI (online).
- Break your own news - Breaking News Meme Generator - Add your pic, write the headline and generate a screenshot of a breaking news story.
- FOAAS (Fuck Off As A Service) - a modern, RESTful, scalable solution to the common problem of telling people to fuck off.
More lists:
- Awesome Web Archiving - awesome list for getting started with web archiving.
- Awesome R - curated list of awesome R packages and tools, find packages by category.
- awesome-r - yet another list of awesome R frameworks, libraries and software.
- Social media collection tools - by Deen Freelon.
- Social media methodologies
- Political Science: APIs for Scholarly Resources
Tutorials:
- Using Twitter as a data source: an overview of social media research tools (updated for 2017)
- Using Google Trends data for research? Here are 6 questions to ask
- Introduction to data management best practices
- Getting Started in Open Source: A Primer for Data Scientists
- Introductory guide to Open Source media network analysis for beginners
Literature:
- Barberá, P., Boydstun, A. E., Linn, S., McMahon, R., & Nagler, J. (in press). Automated Text Classification of News Articles: A Practical Guide. Political Analysis, 1–24. https://doi.org/10.1017/pan.2020.8
- Boumans, J. W., & Trilling, D. (2016). Taking Stock of the Toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8–23. https://doi.org/10.1080/21670811.2015.1096598
- Burscher, B., Odijk, D., Vliegenthart, R., de Rijke, M., & de Vreese, C. H. (2014). Teaching the Computer to Code Frames in News: Comparing Two Supervised Machine Learning Approaches to Frame Analysis. Communication Methods and Measures, 8(3), 190–206. https://doi.org/10.1080/19312458.2014.937527
- Freelon, D. (2015). On the cutting edge of Big Data: Digital politics research in the social computing literature. In S. Coleman & D. Freelon (Eds.), Handbook of Digital Politics (p. 448). Northampton, MA: Edward Elgar.
- Jacobi, C., van Atteveldt, W., & Welbers, K. (2016). Quantitative analysis of large amounts of journalistic texts using topic modelling. Digital Journalism, 4(1), 89–106. https://doi.org/10.1080/21670811.2015.1093271
- Lazer, D., & Radford, J. (2017). Data ex Machina: Introduction to Big Data. Annual Review of Sociology, 43(1). https://doi.org/10.1146/annurev-soc-060116-053457
- Wilkerson, J., & Casas, A. (2017). Large-Scale Computerized Text Analysis in Political Science: Opportunities and Challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.org/10.1146/annurev-polisci-052615-025542
Podcast episodes:
- Social Media and Politics (August, 2018) #53: Computational Social Science and Digital Methods in the Post-API Age, with Dr. Deen Freelon
Facebook groups:
- Association of Internet Researchers (AoIR) (3,400+ members)
- OpenCOMM: Open/Replicable/Reproducible Methods/Analyses in Communication (280+ members)
- Political Communication (3,800+ members)