Methods workshop: Automated Text Analysis with R

Workshop: Automated Text Analysis with R

Sponsored by

May 19, 2016


(with some content based on materials prepared by Dan Cervone, Alex Hanna, Ken Benoit, Paul Nulty, Kevin Munger, and Justin Grimmer.)


The use of text as data is growing rapidly in the social sciences. “Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research” (Grimmer and Stewart 2013) and elsewhere in the social sciences. This situation has changed with increasing computing power and more capable computing tools. In the coming years, text data will become even more relevant as more and more human communication is recorded online.

This workshop provides an introduction to text analysis using R. We will cover methods to conduct quantitative analysis of textual and web data, with an emphasis on social media data, applied to the study of social science questions. The workshop is made up of three "modules", each consisting of an introduction to a topic followed by examples and applications using R. The first module will cover how to format and input source texts, how to prepare the data for analysis, and how to extract descriptive statistics. The second module will discuss automated classification of text sources into categories using dictionary methods and supervised learning. Finally, the third module will discuss unsupervised classification of text into categories using topic modeling.
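To give a flavor of the first two modules, here is a minimal sketch in base R, not taken from the workshop materials. It tokenizes a toy corpus, computes simple descriptive statistics, and scores each document with a dictionary method. The three documents, the `tokenize` helper, and the `positive` and `negative` word lists are all made up for illustration; the workshop itself may rely on dedicated text analysis packages instead.

```r
# Toy corpus: three short, made-up documents (not workshop data)
docs <- c(
  d1 = "Great workshop, great examples!",
  d2 = "The slides were bad and the room was cold.",
  d3 = "Good pace, good code, bad coffee."
)

# Hypothetical helper: lowercase the text, drop everything except
# letters and spaces, then split on runs of spaces
tokenize <- function(text) {
  cleaned <- gsub("[^a-z ]", "", tolower(text))
  words <- strsplit(cleaned, " +")[[1]]
  words[words != ""]
}
tokens <- lapply(docs, tokenize)

# Descriptive statistics: tokens per document and corpus-wide term counts
n_tokens <- sapply(tokens, length)
term_freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Dictionary method: positive matches minus negative matches per document
# (both word lists are assumptions chosen for this example)
positive <- c("great", "good")
negative <- c("bad", "cold")
score <- sapply(tokens, function(w) sum(w %in% positive) - sum(w %in% negative))

print(n_tokens)
print(head(term_freq, 4))
print(score)
```

A positive score means a document contains more words from the positive list than from the negative one. Real dictionaries are much larger, and results depend heavily on how well the word lists fit the domain of the texts.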

Setup and Preparation

You will need to bring a laptop to all sessions of the workshop. You will need R and RStudio installed. Follow the instructions here to install both.

Instructions for using course materials on GitHub

You have three options for downloading the course material found on this page:

  1. You can download the materials by clicking on each link.

  2. You can "clone" the repository using the button labelled "Clone in Desktop", found on the right side of your browser window as you view this repository. If you do not have a git client installed on your system, you will need to get one here and make sure that git itself is installed. This option is preferred, since you can refresh your clone as new content gets pushed to the course repository. (New material will be actively pushed to the course repository while this course takes place.)

  3. Most simply, you can choose the button on the right marked "Download zip" which will download the entire repository as a zip file.

You can also subscribe to the repository if you have a GitHub account, which will send you updates each time new changes are pushed to the repository.


Time Topic
10:00-10:30 Introduction to text analysis
10:30-11:00 Descriptive analysis, regular expressions
11:00-11:15 Dictionary methods
11:15-12:00 Challenge I
12:00-14:00 Lunch Break
14:00-15:00 Supervised methods
15:00-15:30 Challenge II
15:30-16:00 Coffee Break
16:00-17:00 Unsupervised methods
17:00-17:45 Challenge III
17:45-18:00 Wrap-up