Materials for CEU course "Automated Text Analysis in Political Science"

CEU Course 'Automated Text Analysis in Political Science'

This page contains the materials for the short course Automated Text Analysis in Political Science, taught to political science MA students at CEU (16-27 April 2018). Materials will be added as we go along.

Instructor: Martijn Schoonvelde

You can find the syllabus here. For any questions, send me an email at mschoonvelde[at]gmail[dot]com.


Date Link
April 23, 17:00h Assignment 1
April 30, 17:00h Assignment 2


Date Slides Date Slides
April 16 Link April 23 Link
April 17 Link April 24 Link
April 18 Link April 25 Link
April 19 Link April 26 Link
April 20 Link April 27 Presentations

Code practice

Date Link
April 16 Introduction
April 17 Script
April 18 Script, Data
April 19 Script, Data
April 20 Script, Data
April 23 Script, Data
April 24 Script, Data
April 25 Applications
April 26 Conclusion
April 27 Presentations


For some code in the code practice scripts, I made use of materials by Jos Elkink here and here, and by Wouter van Atteveldt here and here. The setup of the code practice scripts follows the structure in Welbers, K., Van Atteveldt, W., & Benoit, K. (2017) (see below for citation). For some slides in week 1 of the course, I made use of materials by Pablo Barberá and Ken Benoit here.

Course schedule

April 16: 15:30 - 17:10:

  • Introduction to the course and to EUSpeech, a dataset which we will use for running examples: Link
  • Required reading:
    • Schumacher, G., Schoonvelde, M., Traber, D., Dahiya, T., & De Vries, E. (2016). EUSpeech: a New Dataset of EU Elite Speeches. In: Proceedings of the International Conference on the Advances in Computational Analysis of Political Text, 75-80.

April 17: 15:30 - 17:10:

  • A survey of automated text analysis in political science. Supervised and unsupervised methods. Validation, validation, validation. Text Analysis in R.
  • Required reading:
    • Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.
    • Welbers, K., Van Atteveldt, W., & Benoit, K. (2017). Text Analysis in R. Communication Methods and Measures, 11(4), 245-265.

April 18: 15:30 - 19:00:

  • Pre-processing data. Going from text to data, including a few notes of caution. Discussion of the research design and research note.
  • Required reading:
    • Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Forthcoming at Political Analysis.
    • Greene, Z., Ceron, A., Schumacher, G., & Fazekas, Z. (2016). The Nuts and Bolts of Automated Text Analysis. Comparing Different Document Pre-Processing Techniques in Four Countries. Working paper.
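
As a rough illustration of going from text to data, the sketch below turns two made-up sentences into a document-term matrix using only the Python standard library. The toy corpus, the stopword list, and the function names `preprocess` and `document_term_matrix` are all assumptions made for this example; they are not taken from the course scripts or from EUSpeech.

```python
import re
from collections import Counter

# Hypothetical toy corpus for illustration; not taken from EUSpeech.
DOCS = {
    "speech_a": "The Union must act. The crisis demands unity!",
    "speech_b": "Growth and jobs: the economy is recovering.",
}

# A tiny, made-up stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "and", "is", "must"}


def preprocess(text, remove_stopwords=True):
    """Lowercase, keep runs of ASCII letters only, optionally drop stopwords.

    Note: the pattern [a-z]+ discards digits, punctuation and accented
    characters, which is itself a pre-processing choice.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def document_term_matrix(docs, **kwargs):
    """Return (vocabulary, rows); rows maps document name -> list of counts."""
    counts = {
        name: Counter(preprocess(text, **kwargs)) for name, text in docs.items()
    }
    vocab = sorted(set().union(*counts.values()))
    rows = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, rows


vocab, dtm = document_term_matrix(DOCS)
vocab_full, dtm_full = document_term_matrix(DOCS, remove_stopwords=False)
print(len(vocab), len(vocab_full))
```

Comparing `vocab` with `vocab_full` shows how a single choice (dropping stopwords) changes the size of the matrix, which is the kind of sensitivity the readings above caution about.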

April 19: 15:30 - 17:10:

  • Systematically describing and comparing texts.
  • Required reading:
    • Chapters 3 and 4 of Silge, J., & Robinson, D. (2018). Text Mining with R: A Tidy Approach. O'Reilly Media, Inc. Available at Link
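
One common way to compare two texts is the cosine similarity between their word-count vectors. The sketch below is a minimal standard-library version under that assumption; the function names `term_counts` and `cosine_similarity` and the example sentences are made up for illustration and are not drawn from the required reading.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase ASCII-letter tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical example sentences.
x = term_counts("jobs growth")
y = term_counts("jobs crisis")
z = term_counts("unity solidarity")

print(cosine_similarity(x, y))
print(cosine_similarity(x, z))
```

The measure depends only on the mix of words, not on text length, so a long and a short speech with the same relative word use score close to 1.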

April 20: 15:30 - 17:10:

  • Using dictionaries to measure sentiment, happiness and other things we're interested in.
  • Required reading:
    • Pennebaker, J. W., & King, L. (1999). Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology, 77(6), 1296-1312.
    • Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.
  • Suggested reading:
    • Rooduijn, M., & Pauwels, T. (2011). Measuring populism: Comparing two methods of content analysis. West European Politics, 34(6), 1272-1283.
    • Rheault, L., Beelen, K., Cochrane, C., & Hirst, G. (2016). Measuring Emotion in Parliamentary Debates with Automated Textual Analysis. PLoS One, 11(12).
  • 17:00: Coding Assignment 1 Due
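
A dictionary approach, in its simplest form, counts how many tokens in a text fall into each category of a word list and normalizes by text length. The sketch below assumes a tiny made-up dictionary (`SENTIMENT_DICT`) purely for illustration; it is not one of the validated dictionaries discussed in the readings, and the function names are hypothetical.

```python
import re

# Hypothetical mini-dictionary for illustration only; applied work would
# use a validated dictionary rather than this made-up word list.
SENTIMENT_DICT = {
    "positive": {"good", "growth", "hope", "success"},
    "negative": {"bad", "crisis", "failure", "fear"},
}


def dictionary_scores(text, dictionary=SENTIMENT_DICT):
    """Return (counts, shares) of tokens matching each dictionary category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    counts = {
        cat: sum(t in words for t in tokens) for cat, words in dictionary.items()
    }
    # Shares divide by the total number of tokens; an empty text scores 0.
    shares = {cat: (c / n if n else 0.0) for cat, c in counts.items()}
    return counts, shares


def net_sentiment(text):
    """Share of positive tokens minus share of negative tokens."""
    _, shares = dictionary_scores(text)
    return shares["positive"] - shares["negative"]


counts, shares = dictionary_scores("Hope and growth will follow the crisis")
print(counts, net_sentiment("Hope and growth will follow the crisis"))
```

Exact matching like this ignores negation ("no hope") and word forms ("hopes"), which is one reason dictionary output needs validation against human judgment.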

April 23: 09:00 - 10:40:

  • Scaling methods: locating texts on an underlying (political) dimension. What do they mean? And how do they work?
  • Required reading:
    • Slapin, J. B., & Proksch, S. O. (2008). A Scaling Model for Estimating Time-Series Party Positions from Texts. American Journal of Political Science, 52(3), 705-722.
    • Hjorth, F., Klemmensen, R., Hobolt, S., Hansen, M. E., & Kurrild-Klitgaard, P. (2015). Computers, coders, and voters: Comparing automated methods for estimating party positions. Research & Politics, 2(2).
  • Suggested reading:
    • Lo, J., Proksch, S. O., & Slapin, J. B. (2016). Ideological clarity in multiparty competition: A new measure and test using election manifestos. British Journal of Political Science, 46(3), 591-610.

April 24: 09:00 - 10:40:

  • Topic models, unsupervised models for summarizing what a text is about.
  • Required reading:
    • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.
    • Roberts, M., et al. (2014). Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082.
  • Suggested reading:

April 25: 09:00 - 10:40 & 11:00 - 12:40:

  • New developments in automated text analysis: (i) crowd-sourcing, (ii) measurement of elite personality, and (iii) measurement of semantic shifts.
  • Required reading:
    • Ramey, A. J., Klingler, J. D., & Hollibaugh, G. E. (2016). Measuring elite personality using speech. Political Science Research and Methods, 1-22.
    • Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295.
  • Suggested reading:
    • Azarbonyad, H., Dehghani, M., Beelen, K., Arkut, A., Marx, M., & Kamps, J. (2017). Words are Malleable: Computing Semantic Shifts in Political and Media Discourse. Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 1509-1518.

April 26: 09:00 - 10:40:

  • Loose ends, review, and general discussion of pros and cons of automated text analysis.

April 27: 09:00 - 10:40:

  • Research design presentations.

30 April, 17:00: Coding Assignment 2 Due

4 May, 17:00: Research Note Due