A guide to computational social science resources

A collection of resources and readings for people wanting to get acquainted with computational social science.

Credit goes to Andrew Hall for pitching the idea at Data Science Nights@Northwestern. Quoted text is taken directly from the website or document. Suggestions welcome.

What is it?

This matrix of skills is a nice starting point for thinking about "data science" or computational social science as a collection of activities that can be more or less complex.

Syllabi

Perspectives on Computational Analysis Syllabus
Computational Social Science, syllabus by Nir Grinberg, Ben-Gurion University
Very high-level view of what makes up "data science": Curriculum Guidelines for Undergraduate Programs in Data Science

Training

Learn R in R: the swirl package.
Fast lane to learning R by Norman Matloff (professor of Computer Science at UC Davis). Self-description:

This site is for those who know nothing of R, or maybe even nothing of programming, and seek QUICK, painless entree to the world of R.

The course is quite thorough regarding base R, including graphics (ggplot2 is covered as well). NM is a proponent of learning base R first before learning third-party packages and I tend to agree.
R for Data Science by Garrett Grolemund and Hadley Wickham. The authors are important originators of/contributors to the so-called "tidyverse", a collection of packages for R. These packages tend to make things easier (especially for automated workflows). However, starting out with the "tidyverse" when learning R is, in my opinion, a bit like learning to run before learning to walk.
Starting from zero, Data Carpentry workshop. These resources are intended for in-person workshops but can be used by self-learners.

This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
Data Science Course in a Box (Course materials) by Mine Cetinkaya-Rundel for RStudio. Primarily intended for teachers but might be valuable for self-learners too. Self-presentation:

Data Science in a Box contains the materials required to teach (or learn from) an introductory data science course using R, all of which are freely-available and open-source. They include course materials such as slide decks, homework assignments, guided labs, sample exams, a final project assignment, as well as materials for instructors such as pedagogical tips, information on computing infrastructure, technology stack, and course logistics.

See datasciencebox.org for everything you need to know about the project!
R for Stata users, for people coming from Stata and wanting to learn R. An earlier draft is available for free. This book is structured somewhat like the O'Reilly Cookbooks, i.e. it is a list of problems or situations for which solutions are given in both Stata and R. If your particular problem is among those covered, great! If not, you will still have to learn the basics of R and translate Stata logic into R logic yourself.
Chromebook Data Science project

Chromebook Data Science (CBDS) is an online educational program to help anyone who can read, write, and use a computer to move into data science. It is offered by faculty members in the Johns Hopkins Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health. There are currently 12 courses that are offered in the Chromebook Data Science Curriculum.
UK Data Service Data Skills Modules

These introductory level interactive modules are designed for users who want to get to grips with key aspects of survey, longitudinal and aggregate data.
The BBC's visual and data journalism cookbook for R, Blog post announcing and explaining the launch of the BBC's visual and data journalism cookbook in R
SciPy Lecture Notes

Tutorials on the scientific Python ecosystem: a quick introduction to central tools and techniques. The different chapters each correspond to a 1 to 2 hours course with increasing level of expertise, from beginner to expert.

Readings

Practical advice

"How to name files," Jenny Bryan's speaker deck

"Project structure & Naming files," Danielle Navarro (inspired by Jenny Bryan), slides

Version control in R:

"starting R markdown," a YouTube tutorial playlist by Danielle Navarro

Should I learn R or Python? It depends on what you want to do with it. What tasks do you want to accomplish? What professional goals do you want to attain? Norman Matloff discusses how R and Python compare to each other for various tasks (and on some more general dimensions) here.

General

Matthew Salganik, Bit by Bit (free version)

Matthew Salganik, Bit by Bit (tree version)

Bernard E. Harcourt, Against Prediction (tree version) Summary: Against Prediction argues that predictive policing models not “crime” but “arrests”, i.e. police behavior, not the supposed underlying behavior (not what crimes will happen where, but who will be arrested). Therefore, it will reinforce existing trends in policing instead of “improving” policing.

Bernard E. Harcourt, Against Prediction (working paper)

Bernard E. Harcourt, Against Prediction (review by Cosma Shalizi)

Thoughts on algorithmic fairness: "Algorithmic fairness is an interdisciplinary research field concerned with the various ways that algorithms may perpetuate or reinforce unfair legacies of our history, and how we might modify the algorithms or systems they are used in to prevent this. For example, if the training data used in a machine learning method contains patterns caused by things like racism, sexism, ableism, or other types of injustice, then the model may learn those patterns and use them to make predictions and decisions that are unfair. There are many ways that technology can have unintended consequences, and this is just one of them."
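The feedback argument summarized under Against Prediction above (a model trained on arrests reproduces police behavior rather than crime) can be illustrated with a toy simulation. This is only a sketch under stated assumptions and is not taken from Harcourt's book: the two districts, the arrest counts, the crime rates, and the function name `simulate_patrol_feedback` are all hypothetical. Both districts have the same underlying crime rate, but one starts with more recorded arrests.

```python
def simulate_patrol_feedback(initial_arrests, true_crime_rate,
                             total_patrols=100, rounds=10):
    """Toy model: patrols follow past arrests, arrests follow patrols.

    All inputs are hypothetical illustration values, not real data.
    Returns the patrol allocation per district for every round.
    """
    arrests = list(initial_arrests)
    history = []
    for _ in range(rounds):
        total = sum(arrests)
        # "Prediction" step: patrols are allocated in proportion to
        # recorded arrests, not to the (unobserved) true crime rate.
        patrols = [total_patrols * a / total for a in arrests]
        # New arrests depend on police presence times underlying crime.
        new_arrests = [p * r for p, r in zip(patrols, true_crime_rate)]
        arrests = [a + n for a, n in zip(arrests, new_arrests)]
        history.append(patrols)
    return history


# Two districts with identical crime rates but unequal arrest records.
history = simulate_patrol_feedback(initial_arrests=[60, 40],
                                   true_crime_rate=[0.1, 0.1])
print([round(p, 1) for p in history[0]])   # allocation in the first round
print([round(p, 1) for p in history[-1]])  # allocation in the last round
```

In this sketch the patrol split stays at roughly 60/40 in every round even though the districts are identical in their underlying crime rate: the arrest record never corrects itself, because it measures where police were sent.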

Data sets and sources

I have to thank the terrific Tom Theile and the pretty excellent Peter Eibich for suggesting many of these. In particular, you could check out Tom's guide to finding datasets online.

Data repositories and aggregators

Inter-university Consortium for Political and Social Research - A data repository for mostly survey data. North America-centric.

Internet Archive datasets - The Internet Archive is primarily known for the Wayback Machine. It also stores and makes available data.

Urban Institute Data Catalogue, UIDC announcement and short presentation

Wikidata - Wikidata is a human-curated database of every "fact" of Wikipedia (and more) in a structured format.

Google tool for finding datasets

National statistical offices, administrative data, and international organizations

Data portals - a list of open data portals globally

U.S. government data catalog

U.S. open data search

UK Data Service

UK Data Archive

https://data.gov.uk/

Office for National Statistics

GovData - German administrative data

German Statistical Office

German maps and geographic data

European Union data repository

United Nations

OECD

World Bank

Specialized sites/repositories

Cross-national equivalent file - The Cross-National Equivalent File (CNEF) project harmonizes a subset of the data found on seven panel data sets collected in Australia, Canada, China, Germany, Korea, Russia, Switzerland, UK, and US.

Luxembourg Income Study

LIS acquires datasets with income, wealth, employment, and demographic data from many high- and middle-income countries, harmonises them to enable cross-national comparisons, and makes them publicly available in two databases, the Luxembourg Income Study Database (LIS) and the Luxembourg Wealth Study Database (LWS).

German and EU surveys and administrative data

Cook County Open Data portal

Cook County Open Data - State's Attorney (e.g. arrest data)

Wesleyan Media Project: "The Wesleyan Media Project tracks and analyzes all broadcast advertisements aired by or on behalf of federal and state election candidates in every media market in the country."

The Stanford Open Policing Project: "Our team is gathering, analyzing, and releasing records from millions of traffic stops by law enforcement agencies across the country."

The @unitedstates project - Scrapers and parsers for many aspects of Congress, e.g. bios of members past and present, data about bills and roll call votes, district shapefiles, and much more.

Congressional record parser: "This tool converts HTML files containing the text of the Congressional Record into structured text data. It is particularly useful for identifying speeches by members of Congress."

Pew Research survey data

OpenStreetMap - OpenStreetMap is a volunteer-built map of the world. You may download all of the data (or parts of it) from OpenStreetMap (https://wiki.openstreetmap.org/wiki/Planet.osm) or from the Internet Archive (https://archive.org/details/osmdata).

People, groups, hashtags

List of sociologists on Twitter, by Philip N. Cohen

List of demographers on Twitter, by Conrad Hackett

List of demographers on Twitter, by Cameron Campbell

#rladies

#rstats

R Animated Gifs

The Data Science job market is saturated

Cheatsheet - Neural networks/machine learning

Similar collections

R for the Rest of Us, Resources

R resources collection, NU Research Computing Services

Python resources collection, NU Research Computing Services

OpenIntro, free textbooks

DataCamp

Following recent events at DataCamp (see here, here, here, here), this guide prefers to recommend other resources. The course offering is, however, comprehensive, and university students may benefit from special offers.
