This is a repository for POSC-207, a quantitative text analysis course, Fall 2020 at University of California, Riverside.

lorenc5/POSC-207

POSC-207

This course investigates how to use digitized texts (news articles, speeches, laws, press releases, party manifestos and platforms, transcripts, open-ended survey responses, tweets, etc.) as sources of data for social science research.

We begin with overviews of the "text as data" field in political science, which is heavily influenced by natural language processing (NLP), a branch of computer science. The idea is to help you, as a poor graduate student, get data for free that you can then use in your own research to answer questions of theoretical interest.

We then discuss the theory and mechanics of converting text into data. This includes preprocessing text and related NLP tasks (e.g., tokenizing, stemming) and representing text as data (e.g., bag-of-words, measures of association). Text data is often "messy," so handling it will be a large part of this course (e.g., web scraping, file encodings, file formats, and extracting only the relevant text from strings).
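To make the preprocessing and bag-of-words ideas concrete, here is a minimal sketch. It is written in Python purely for illustration (the course itself works in R). The sketch lowercases and tokenizes each document, drops a small stopword list, and counts the remaining words into a document-term matrix. Stemming is omitted, and the function names and the stopword list are hypothetical examples, not course materials.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; real analyses usually
# rely on a curated list rather than a handful of words like this.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text: str) -> list[str]:
    """Tokenize the text and drop stopwords."""
    return [tok for tok in tokenize(text) if tok not in STOPWORDS]


def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in doc i."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    # Sorted union of every term seen in any document.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for terms absent from a document.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    docs = [
        "We support the tax cut.",
        "The tax cut is a cut for the rich.",
    ]
    vocab, dtm = document_term_matrix(docs)
    print(vocab)
    for row in dtm:
        print(row)
```

Running it prints `['cut', 'for', 'rich', 'support', 'tax']` followed by the rows `[1, 0, 0, 1, 1]` and `[2, 1, 1, 0, 1]`. Each row is a document and each column a vocabulary term. Word order is discarded entirely, which is the defining simplification of the bag-of-words representation.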

We'll then turn to the major approaches to measuring social science concepts with textual data, including rule-based methods, supervised learning from human-coded or otherwise known examples, and unsupervised methods. As we go, we will discuss particular measurement objectives such as classification, scaling, topic modeling, and the analysis of sentiment and stance, as well as ways of validating our models.

Depending on time, student interest, and capacity, we may also cover the neural network (deep learning) approaches that have come to dominate NLP in recent years.

The course assumes students have some graduate-level coursework in statistical inference, quantitative social science methodology, or machine learning. Students should at least know what R is; ideally, they will have some experience using it.
