This course investigates how to use digitized texts -- news articles, speeches, laws, press releases, party manifestos/platforms, transcripts, open-ended survey responses, tweets, etc. -- as sources of data for social science research.
We begin with an overview of the ``text as data'' field in political science -- which is heavily influenced by the computer science subfield of natural language processing (NLP). One practical payoff: much text is freely available, so even as a resource-constrained graduate student you can assemble data for your own research and use it to answer questions of theoretical interest.
We then discuss the theory and mechanics of converting text into data. This includes preprocessing and related NLP tasks (e.g., stemming, tokenizing) and representing text as data (e.g., bag-of-words, measures of association). Text data is often ``messy,'' so handling that messiness -- web scraping, file encodings, file formats, extracting the relevant text from raw strings, etc. -- will be a large part of this course.
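As a concrete taste of what tokenizing and a ``bag-of-words'' representation involve, here is a minimal sketch (written in Python for illustration; in the course itself we will use R and dedicated text-analysis packages):

```python
# Minimal sketch: tokenize two toy documents and build a bag-of-words
# document-term matrix. Illustrative only -- real pipelines also handle
# punctuation, stemming, stopwords, and sparse storage.
from collections import Counter

def tokenize(text):
    """Lowercase and split a document into word tokens."""
    return text.lower().split()

docs = [
    "The economy is growing",
    "The economy is shrinking",
]

# Shared vocabulary across all documents, in a fixed order.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})

# Each document becomes a vector of word counts over the vocabulary.
dtm = [[Counter(tokenize(doc))[w] for w in vocab] for doc in docs]

print(vocab)  # the shared vocabulary
print(dtm)    # one row of counts per document
```

Note that word order is discarded: the two documents differ only in the columns for ``growing'' and ``shrinking,'' which is exactly the simplification (and the loss) that bag-of-words entails.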
We'll then turn to the major approaches to measuring social science concepts with textual data: rule-based methods, supervised learning from human-coded or known examples, and unsupervised methods. Along the way, we will discuss particular measurement objectives -- classification, scaling, topic modeling, and analysis of sentiment and stance -- as well as ways of validating our models.
Depending on time, student interest and capacity, we may learn about the neural network / deep learning approach that has come to dominate NLP in recent years.
The course assumes students have some graduate-level background in statistical inference, quantitative social science methodology, or machine learning, and at least passing familiarity with R (ideally, some hands-on experience).