This repository contains the speech data used in the manuscript "Figurative and Literal Language in U.S. Governor Speeches" (L. Picard & D. Stammbach). It includes raw text files and metadata for 1,296 speeches (State of the State addresses from U.S. governors and State of the Union addresses from U.S. presidents), covering the period 1995–2022.
List of variables from metadata.csv:
| Variable | Definition |
|---|---|
| st_name | State full name |
| st_id | State two-letter identifier |
| year | Speech year of delivery |
| filename | Speech file name, found in folder speeches_raw/ |
| speaker | Speaker full name, in format Lastname_Firstname |
| party | Speaker political party (1 = republican, 2 = democratic, 3 = other) |
| age_endyear | Speaker age, defined at the end of the year |
| gender | Speaker gender (0 = male, 1 = female) |
| ethnicity | Speaker ethnicity (1 = caucasian, 2 = african american, 3 = asian american, 4 = native american, 5 = hispanic, 6 = other) |
| elec_lastyear | Indicator variable for speeches following an election year (1 = yes, 0 = no) |
| vshare_lastelec | Speaker vote share (0-100, or missing if elected through other means, see remarks) |
| last_term | Indicator variable for term-limited speakers, term level (1 = yes, 0 = no) |
| term_limit | Indicator variable for term-limited speakers, year level (1 = yes, 0 = no) |
| type | Speech type (sots = State of the State, sotu = State of the Union, budg = State of the Budget or Budget address, inaug = Inauguration speech, other = Mixed or other type) |
| quality | Speech text quality (as prepared/ocr/bulletpoints/youtube cc/quotes) |
| source | Source of speech, URL link |
| remarks | Optional remarks on election characteristics or governor tenure |
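
The coded variables above can be turned into readable labels when loading the metadata. The following is a minimal sketch rather than code shipped with this repository. It assumes that metadata.csv is comma-separated with a header row matching the variable names above, that missing vote shares are stored as empty (or otherwise non-numeric) cells, and that `filename` holds the complete file name inside speeches_raw/. The helper names (`label`, `decode_row`, `read_metadata`, `speech_path`) and the two sample rows are made up for illustration.

```python
import csv
import io
from pathlib import Path

# Code-to-label mappings transcribed from the variable table.
PARTY = {1: "republican", 2: "democratic", 3: "other"}
GENDER = {0: "male", 1: "female"}
ETHNICITY = {
    1: "caucasian",
    2: "african american",
    3: "asian american",
    4: "native american",
    5: "hispanic",
    6: "other",
}
CODED = {"party": PARTY, "gender": GENDER, "ethnicity": ETHNICITY}


def label(value, mapping):
    """Map a numeric code such as "2" or "2.0" to its label.

    Values that are not valid codes are returned unchanged.
    """
    try:
        return mapping[int(float(value))]
    except (TypeError, ValueError, OverflowError, KeyError):
        return value


def decode_row(row):
    """Return a copy of one metadata row with coded fields replaced by labels."""
    out = dict(row)
    for column, mapping in CODED.items():
        if column in out:
            out[column] = label(out[column], mapping)
    # Assumption: a missing vote share is an empty or non-numeric cell.
    raw = (row.get("vshare_lastelec") or "").strip()
    try:
        out["vshare_lastelec"] = float(raw)
    except ValueError:
        out["vshare_lastelec"] = None
    return out


def read_metadata(handle):
    """Read metadata rows from an open text file and decode each one."""
    return [decode_row(row) for row in csv.DictReader(handle)]


def speech_path(row, root="."):
    """Build the path of the raw speech file for one metadata row."""
    return Path(root) / "speeches_raw" / row["filename"]


# Hypothetical rows for illustration only; they are not taken from the data set.
SAMPLE = """st_name,st_id,year,filename,speaker,party,gender,ethnicity,vshare_lastelec,type
Examplestate,EX,2001,example_2001.txt,Doe_Jane,2,1,1,54.2,sots
Examplestate,EX,2002,example_2002.txt,Doe_John,1,0,1,,budg
"""

rows = read_metadata(io.StringIO(SAMPLE))
```

To read the real file instead of the sample, pass an open file handle, for example `open("metadata.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.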