This repository contains the code and slides used for teaching the 2023 edition of the Summer Institute in Computational Social Science at Institut Polytechnique de Paris. The R scripts are also available in bookdown form. The relevant data sets are distributed via Dropbox, and the scripts include code that loads them directly into the session, so no data needs to be downloaded upfront.
Make sure that you have installed current versions of R and RStudio before running the scripts. We use the "needs" package to install packages on the fly, so make sure that it is installed as well. We assume familiarity with R and mostly follow the "tidy dialect." If you are entirely unfamiliar with it, you can find introductory material in the final section of the index file.
The scripts are ordered in the sequence in which the material is taught. Throughout the course, the theory behind each concept is introduced in the morning lectures, and its practical implementation in R follows in the afternoon sessions.
The following list maps each day to its corresponding files:
- Day 1: Intro to CSS and ethical considerations (slides: Intro to CSS, slides: Ethical considerations)
- Day 2: Building crawlers (slides: Building crawlers, R material: setup, R material: crawlers and APIs)
- Day 3: Scraping structured content from the web (slides: Structured scraping, R material: scraping structured)
- Day 4: Scraping unstructured content from the web (slides, R material)
- Day 5: NLP I (slides, R material)
- Day 6: NLP II (slides, R material)
- Day 7: NLP III (slides, R material)
- Day 8: The Augmented Social Scientist (Tutorial)
The solutions to the exercises are included in the bookdown version but not in the "raw" Rmd scripts.