QMSS G5069 - TOPICS IN APPLIED DATA SCIENCE FOR SOCIAL SCIENTISTS
Marco Morales, Columbia University
This repository is a companion to the course Topics in Applied Data Science for Social Scientists, taught in the Quantitative Methods in the Social Sciences program in the Spring of 2017.
It contains required readings, slides, code examples, and starter files for data challenges. Also, the Wiki on this repository contains additional information and resources related to each week's material and other relevant topics. Make sure to check it regularly.
In his now classic Venn diagram, Drew Conway described Data Science as sitting at the intersection between good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly different training, skills and expertise tailored to understand human behavior, and to explain why things happen the way they do. Social scientists are, thus, a particular kind of data scientist.
This course is not intended to teach students how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take students to the next level in becoming a data scientist. Therefore you will:
- learn current best practices in data science that will facilitate collaboration with data scientists trained in engineering or other hard sciences,
- learn soft skills that are key to a successful interaction with business stakeholders, and
- get exposed to data science practitioners and explore real-life applications from a social science perspective.
All of these are highly valued skills in the data science job market, but they are seldom considered part of a comprehensive training for data scientists.
It is assumed that students have basic to intermediate knowledge of R, including experience using it for data manipulation, visualization, and model estimation. Some background in mathematics, statistics, econometrics and algebra is also assumed.
There are no required textbooks for this course, but you might find these to be very useful resources for the course and later in your careers:
- Grolemund, Garrett and Hadley Wickham. 2016. R for Data Science. Boston, MA: O'Reilly Media. Alternatively, you can consult the online version of the text here.
- Wickham, Hadley. 2014. Advanced R. Boca Raton, FL: Taylor and Francis. Alternatively, you can consult the online version of the text here.
- Chang, Winston. 2013. R Graphics Cookbook. Boston, MA: O'Reilly Media.
- Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis, Second Ed. New York, NY: Springer.
- Conway, Drew and John Myles White. 2012. Machine Learning for Hackers: Case Studies and Algorithms to Get You Started. Boston, MA: O'Reilly Media.
To actively participate in this course
Accessing course materials
You have two options to access the materials on this repository:
- Dynamic: Clone the repository by clicking on the "Open in Desktop" button. If you do not have a git client installed on your system, you will need to get one here and make sure that git itself is installed. This is perhaps the best option, since you can refresh your clone as new content gets pushed.
- Static: Download the entire repository as a zip file by clicking on the "Download ZIP" button. Note that you will have to download it again every time the repository is updated (and it will be updated at least weekly during the semester).
You can also subscribe to the repository. This will send you updates each time new changes are pushed to it.
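If you prefer the command line to a desktop client, the dynamic option can be sketched roughly as follows. This is only an illustration, not part of the official course setup: the function names `get_materials` and `refresh_materials` are made up for this example, and the repository URL and target folder are placeholders you would supply yourself.

```shell
# Hypothetical helpers wrapping standard git commands (the names are illustrative).

# Make a local copy of the repository.
#   $1: repository URL (copy it from the repository page; not given here)
#   $2: local folder to create
get_materials() {
    git clone "$1" "$2"
}

# Bring an existing clone up to date with whatever has been pushed since.
#   $1: path to the local clone
refresh_materials() {
    git -C "$1" pull
}
```

You would call `get_materials` once with the repository's URL and a folder name of your choice, and then run `refresh_materials` on that folder whenever new content is announced.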