This repository has been archived by the owner. It is now read-only.
QMSS GR5069 - Topics in Applied Data Science for Social Scientists


Marco Morales, Columbia University

This repository is a companion to the course Topics in Applied Data Science for Social Scientists, taught in the Quantitative Methods in the Social Sciences program in the Spring of 2017.

It contains required readings, slides, code examples, and starter files for data challenges. Also, the Wiki on this repository contains additional information and resources related to each week's material and other relevant topics. Make sure to check it regularly.


In his now classic Venn diagram, Drew Conway described Data Science as sitting at the intersection of good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly different training, skills, and expertise tailored to understanding human behavior and to explaining why things happen the way they do. Social scientists are, thus, a particular kind of data scientist.

This course is not intended to teach students how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take students to the next level in becoming a data scientist. Therefore you will:

  • learn current best practices in data science that will facilitate collaboration with data scientists trained in engineering or other hard sciences,
  • learn soft skills that are key to a successful interaction with business stakeholders, and
  • get exposed to data science practitioners and explore real-life applications from a social science perspective.

All of these are highly valued skills in the data science job market, but they are seldom treated as part of a comprehensive training for data scientists.


It is assumed that students have basic to intermediate knowledge of R, including experience using it for data manipulation, visualization, and model estimation. Some background in mathematics, statistics, econometrics, and algebra is also assumed.

Course Resources

There are no required textbooks for this course, but you might find these to be very useful resources for the course and later in your careers:

To actively participate in this course

By the second session, make sure to have the latest versions of R, RStudio, and Git on your computer.
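
One quick way to confirm the command-line tools are in place is a small shell check like the sketch below. It assumes a POSIX-style shell; `check_tool` is a hypothetical helper name introduced only for this illustration. It reports whether R and Git are on your `PATH` and which version each one announces, but it cannot tell you whether that version is the latest, and it does not cover RStudio, which is a desktop application.

```shell
# check_tool is a hypothetical helper (not part of the course materials):
# it reports whether a command is on the PATH and, if so, the first line
# of its --version output.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found ($("$1" --version 2>&1 | head -n 1))"
  else
    echo "$1: not found"
  fi
}

check_tool R
check_tool git
```

If either tool is reported as not found, install it before the second session and compare the reported version against the current release on the tool's download page.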

Accessing course materials

You have two options to access the materials on this repository:

  1. Dynamic: Clone the repository by clicking on the "Open in Desktop" button. If you do not have a Git client installed on your system, you will need to get one here and also make sure that Git is installed. This is perhaps the best option, since you can refresh your clone as new content gets pushed.

  2. Static: Download the entire repository as a zip file by clicking on the "Download ZIP" button. Note that you will have to download it again every time it is updated (and it will be updated at least weekly during the semester).
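
For those who prefer the command line to the desktop client, the dynamic option can be sketched roughly as follows. This is only an illustration under stated assumptions: so that it runs offline, a throwaway local repository named `upstream` stands in for the course repository, and the file names and the `publish` helper are hypothetical. The point it shows is that you clone once and then pull whenever new material is published.

```shell
set -eu

# Hypothetical stand-in for the course repository: a throwaway local repo.
# With the real repository you would clone the URL shown on its page instead.
work="$(mktemp -d)"
git init --quiet "$work/upstream"

# publish is an illustrative helper that plays the instructor's role:
# it adds one file to the stand-in repository and commits it.
publish() {
  echo "$2" > "$work/upstream/$1"
  git -C "$work/upstream" add "$1"
  git -C "$work/upstream" \
    -c user.name="Instructor" -c user.email="instructor@example.com" \
    commit --quiet -m "Add $1"
}

publish week-01.md "Week 1 materials"

# Student side, done once: clone the repository.
git clone --quiet "$work/upstream" "$work/student"

# New content is published later in the semester...
publish week-02.md "Week 2 materials"

# ...and the student refreshes the existing clone instead of downloading again.
git -C "$work/student" pull --quiet --ff-only

ls "$work/student"
```

Against the real repository, the student side reduces to a single `git clone` of the repository URL followed by `git pull` from inside the cloned folder each week. The `--ff-only` flag is a cautious choice here: it refuses to merge if your local copy has diverged, which should not happen as long as you do not commit changes to the course files.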

You can also subscribe to the repository. This will send you updates each time new changes are pushed to the repository.

Thanks to Huiyu Xiaho for assistance in drafting code for class examples and visualizations.
