legislatoR is a data package for the software environment R. It comprises political, sociodemographic, and Wikipedia-related data on elected politicians across the globe. This version (0.2.0) includes 42,534 current and former elected politicians from nine countries' legislatures.
Researchers, students, analysts, journalists, and the public continue to rely on individual-level data on political elites for various kinds of analyses, whether theory-driven or motivated by real-world problems. As a consequence, the past has likely seen recurrent data collection efforts with the same purpose. Financial limitations or time restrictions often force analysts to limit their analyses to a subset of politicians, though. The frequent compromise is between many politicians but few variables or few politicians but many variables. Existing data structures are often limited in scope, hidden behind paywalls, or simply not accessible to those who are not super tech-savvy. legislatoR is an open-source, targeted, fast, and easily accessible one stop shop for comprehensive data on political elites. Data comes in a rectangular/spreadsheet format familiar to social scientists and ready for immediate analysis. It facilitates data integration, and supports replication efforts.
Content and data structure
The data package covers the following countries and time periods:
|Country||Legislative sessions||Politicians (unique)||Integrated with|
|Austria (Nationalrat)||all 26
|Canada (House of Commons)||all 42
|Czech Republic (Poslanecka Snemovna)||all 8
|France (Assemblée)||all 15
|Germany (Bundestag)||all 19
|4,075||BTVote data (Bergmann et al. 2018),
ParlSpeech data (Rauh et al. 2017)
|Ireland (Dail)||all 32
|Scotland (Parliament)||all 5
|United Kingdom (House of Commons)||all 57
|13,071||EggersSpirling data (starting from
38th session, Eggers/Spirling 2014)
|United States (House and Senate)||all 116
|12,512||Voteview data (Lewis et al. 2019),
Congressional Bills Project data (Adler/Wilkserson 2018)
For each legislature, the package holds nine datasets:
- Core (sociodemographic data)
- Political (political data)
- History (full revision records of individual Wikipedia biographies)
- Traffic (daily user traffic on individual Wikipedia biographies from July 2015 to December 2018)
- Social (social media handles and personal website URLs)
- Portrait (URLs to individual Wikipedia portraits)
- Office (public offices)
- Profession (professions)
- ID (a range of identifiers linking a politician to another file, database, or website)
The datasets contain the following variables (see the respective R help files for further details):
- Core: Country, Wikipedia page ID, Wikidata ID, Wikipedia Title, full name, sex, ethnicity, religion, date of birth and death, place of birth and death.
- Political: Wikipedia page ID, legislative session, party affiliation, lower constituency, upper constituency, constituency ID, start and end date of legislative session, period of service, majority status, leader positions.
- History: Wikipedia page ID, Wikipedia revision and previous revision ID, editor name/IP and ID, revision date and time, revision size, revision comment.
- Traffic: Wikipedia page ID, date, user traffic.
- Social: Wikidata ID, Twitter handle, Facebook handle, Youtube ID, Google Plus ID, Instagram handle, LinkedIn ID, personal website URL.
- Portraits: Wikipedia page ID, Wikipedia portrait URL.
- Offices: Wikidata ID, a range of offices such as attorney general, chief justice, mayor, party chair, secretary of state, etc.
- Professions: Wikidata ID, a range of professions such as accountant, farmer, historian, judge, mechanic, police officer, salesperson, teacher, etc.
- IDs: Wikidata ID, IDs for integration with various political science datsets as well as a range of other IDs such as parliamentary website IDs, Library of Congress or German National Library IDs, Notable Names Database or Project Vote Smart IDs, etc.
Note that for some legislatures or legislative periods, datasets may only hold data on a subset of observations. In successive versions of legislatoR, we try to fill some of these gaps.
legislatoR comes as a relational database. This means that all datasets can be joined with the Core dataset via one of two keys - the Wikipedia page ID or the Wikidata ID. These keys uniquely identify individual politicians. The figure below illustrates this structure and some of the package's content.
legislatoR is available through GitHub. To install the package in
After having installed the package, a working Internet connection is required in order to access the data in R. This is because the data are not installed with the package, but are stored on legislatoR's GitHub repository. The package provides dataset-specific function calls to fetch the data from the repository. These functions are named after the datasets and preceded by
get_. To fetch the Core dataset, use the
get_core function, for the Political dataset, use the
get_political function, and so on (see above for dataset names). The datasets are all legislature-specific. To access a dataset in R the legislature's code must be passed as an argument to the respective dataset's function call. The legislature codes are:
|Canadian House of Commons||
||UK House of Commons||
|Czech Poslanecka Snemovna||
||United States Congress||
Data can be joined and subsetted while being fetched from the repository and memory is only allocated by the parts of a dataset assigned into the environment. The data fetching, joining and subsetting stages are illustrated in the code below.
# load and attach legislatoR and dplyr packages library(legislatoR) library(dplyr) # assign entire Core dataset for the German Bundestag into the environment deu_politicians <- get_core(legislature = "deu") # assign only data for the 8th legislative session into the environment deu_politicians_subset <- semi_join(x = get_core(legislature = "deu"), y = filter(get_political(legislature = "deu"), session == 8), by = "pageid") # join deu_politicians_subset with respective History dataset deu_history <- left_join(x = deu_politicians_subset, y = get_history(legislature = "deu"), by = "pageid") # assign only birthdate for members of the political party 'SPD' into the environment deu_birthdates_SPD <- semi_join(x = select(get_core(legislature = "deu"), pageid, birth), y = filter(get_political(legislature = "deu"), party == "SPD"), by = "pageid")$birth
For each dataset, there is a help file with details on content and usage examples.
# call help file for legislatoR package to get an overview of the function calls ?legislatoR # call help file for the 'History' dataset ?get_history
See here for details on package updates.
See here for the full form of abbreviated country codes and party names and English translations of non-English party names.
Thank you for using legislatoR! Please consider citing:
Göbel, Sascha and Simon Munzert. (2019). legislatoR: Political, sociodemographic, and Wikipedia-related data on political elites. Source: https://github.com/saschagobel.
The work on this package was in part funded by the Daimler and Benz Foundation (Funding period 2017/18; project "Citizen and Elite Activity on the Wikipedia Market Place of Political Information").
Sascha Göbel (corresponding author and repository maintainer)
University of Konstanz
Graduate School of Decision Sciences and Center for Data and Methods
D-78457 Konstanz, Germany
Hertie School of Governance
Quartier 110 - Friedrichstrasse 180
D-10117 Berlin, Germany