This repository collects and processes voting data from the European Parliament (EP). It provides tools to gather either (1) daily votes or (2) historical data for the 9th and 10th mandates using the EP Open Data API. The scripts are designed to automate data collection, cleaning, and aggregation, making it easier to analyze voting patterns and trends.
The repository serves three primary purposes:
- Daily Vote Collection: Automates the retrieval and processing of voting data from the most recent EP plenary session (scripts_main/ep_rcv_today.R).
- Historical Data Collection: Provides scripts to download and process all voting data for the 9th and 10th mandates, or for the 10th only (scripts_main/ep_rcv_mandates_all.R and scripts_main/ep_rcv_mandate_10.R, respectively).
- Analysis: Facilitates the analysis of MEPs' behavior (see the scripts in the analyses/ folder).
The main scripts in scripts_main/ are the entry points that orchestrate the entire data collection workflow:
- ep_rcv_today.R: Collects today's plenary votes and roll-call votes (RCV)
- ep_rcv_mandate_10.R: Collects all votes for the 10th mandate (2024-present)
- ep_rcv_mandates_all.R: Collects votes for both the 9th and 10th mandates
- master_day.R: Daily orchestrator that runs ep_rcv_today.R and handles foreseen activities
These scripts handle specific API endpoints and data processing tasks. They are called by the main scripts:
- api_meetings.R: Fetches plenary meeting schedules and metadata
- api_meetings_decisions.R: Collects voting decisions (the core voting data)
- api_meetings_attendance.R: Gathers official attendance records
- api_meetings_voteresults.R: Retrieves vote titles and metadata
- api_meps.R: Downloads MEP information, mandates, and membership details
- api_bodies.R: Fetches political groups and national party lookup tables
- api_pl_docs.R: Collects plenary document information
- api_pl_session_docs_ids.R: Identifies final votes from session documents
Data cleaning and processing:
- clean_decisions.R: Processes raw voting decision JSON files
- process_decisions_session_*.R: A series of scripts that clean different aspects of the voting data:
  - process_decisions_session_metadata.R: Vote metadata
  - process_decisions_session_rcv.R: Individual roll-call vote positions
  - process_decisions_session_intentions.R: Vote corrections/intentions
Aggregation:
- aggregate_rcv.R / aggregate_rcv_today.R: Aggregate individual votes by political group
Utilities:
- repo_setup.R: Creates the directory structure and defines the get_api_data() function
- parallel_api_calls.R: Handles concurrent API requests for efficiency
- join_functions.R: Data merging utilities
- get_majority.R / cohesionrate_function.R: Analysis functions
All data is retrieved from the EP Open Data API (base URL: https://data.europarl.europa.eu/api/v2). The main endpoints used are:
- GET /meetings: Plenary meeting calendar and session metadata
- GET /meetings/{event-id}/decisions: Individual voting decisions and roll-call votes (RCV)
- GET /meetings/{event-id}/attendance: Official attendance records
- GET /meetings/{event-id}/vote-results: Vote titles and descriptions
- GET /meps and GET /meps/{mep-id}: MEP biographical data, mandates, and political group memberships
- GET /corporate-bodies/{body-id}: Political group and national party information
- GET /plenary-documents: Plenary session document metadata
- GET /plenary-session-documents/{doc-id}: Individual session documents (used to identify final votes)
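As a sketch of how these endpoints combine with the base URL, the hypothetical helper below (ep_endpoint_url(), not a repository function) fills in path placeholders such as {event-id} and appends query parameters. The base URL and path templates come from the list above. The offset and limit parameter names in the example are assumptions, so check the API documentation for the parameters each endpoint actually accepts.

```r
# Hypothetical helper (not part of the repository): build an EP Open Data API v2 URL.
# Base URL and path templates come from the endpoint list; query parameters are assumptions.
ep_base_url <- "https://data.europarl.europa.eu/api/v2"

ep_endpoint_url <- function(path, ..., query = list()) {
  # Substitute each "{name}" placeholder in the path with the matching argument
  params <- list(...)
  for (name in names(params)) {
    path <- gsub(paste0("{", name, "}"), params[[name]], path, fixed = TRUE)
  }
  url <- paste0(ep_base_url, path)
  if (length(query) > 0) {
    # URL-encode each value and append the pairs as ?key=value&key=value
    values <- vapply(query, function(v) utils::URLencode(as.character(v), reserved = TRUE),
                     character(1))
    url <- paste0(url, "?", paste(names(query), values, sep = "=", collapse = "&"))
  }
  url
}

# Decisions for one sitting, and a page of the MEP list
ep_endpoint_url("/meetings/{event-id}/decisions", "event-id" = "MTG-PL-2024-07-16")
ep_endpoint_url("/meps", query = list(offset = 0, limit = 50))
```

Keeping URL construction in one place makes it easy to swap the base URL or add a parameter to every request later.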
Key concepts:
- Roll-Call Votes (RCV): Individual MEP voting positions (For/Against/Abstention) recorded electronically
- Decisions: Voting events that may or may not be RCVs (some are voice votes)
- Final Votes: Legislative votes that definitively adopt or reject proposals
- Mandates: MEP terms of office (9th mandate: 2019-2024, 10th mandate: 2024-2029)
- Political Groups: EP political groups (e.g., EPP, S&D, Renew)
- National Parties: Domestic political parties MEPs belong to
All main scripts follow a similar 4-step pattern:
- Fetch Meeting Data: Get plenary session dates and identifiers
- Collect Voting Data: Download decisions/RCVs for those sessions
- Gather MEP Information: Get current MEP list with political affiliations
- Merge & Process: Combine voting data with MEP data, creating a complete grid
The repository uses a file-based caching mechanism via the get_api_data() function (defined in repo_setup.R):
- Checks file age: Only re-downloads data if existing files are older than the max_days threshold
- Conditional execution: Loads existing CSV/RDS files if they are recent enough; otherwise runs the API script
- Prevents redundant calls: Avoids hitting the API unnecessarily, which is crucial given the data volume
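The caching behavior can be illustrated with a simplified, hypothetical stand-in for get_api_data() (here called cached_api_data()). It assumes a single RDS cache file per dataset and a caller-supplied fetch function. The repository's function also handles CSV files and runs the corresponding API script, so its signature and details will differ.

```r
# Hypothetical, simplified stand-in for get_api_data() in repo_setup.R.
# Assumes one RDS cache file per dataset and a caller-supplied fetch function.
cached_api_data <- function(path, fetch_fun, max_days = 1) {
  if (file.exists(path)) {
    # Age of the cached file in days, based on its modification time
    age_days <- as.numeric(difftime(Sys.time(), file.info(path)$mtime, units = "days"))
    if (age_days <= max_days) {
      message("Using cached file: ", path)
      return(readRDS(path))
    }
  }
  # File missing or too old: call the API (via fetch_fun) and refresh the cache
  message("Fetching fresh data for: ", path)
  data <- fetch_fun()
  saveRDS(data, path)
  data
}

# Example: the second call is served from the cache instead of calling fetch again
cache_file <- tempfile(fileext = ".rds")
calls <- 0
fetch_meetings <- function() {
  calls <<- calls + 1  # count how often the "API" is hit
  data.frame(meeting_id = c("MTG-PL-2024-07-16", "MTG-PL-2024-07-17"))
}
first  <- cached_api_data(cache_file, fetch_meetings, max_days = 1)
second <- cached_api_data(cache_file, fetch_meetings, max_days = 1)
calls
#> [1] 1
```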
Data processing pipeline:
- Raw JSON Collection: API responses are stored as JSON files in data_in/meeting_decision_json/
- Initial Cleaning: JSON is flattened and basic cleaning is applied
- Specialized Processing: Different processors handle votes, RCVs, and intentions separately
- MEP Grid Creation: Generates comprehensive grid of all MEP-vote combinations, including absences
- Final Integration: Merges voting positions with MEP metadata and political affiliations
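The grid step can be sketched with toy data. Every MEP whose mandate covers the vote date is paired with every vote, and recorded positions are then left-joined so that missing positions become explicit rows. Apart from notation_votingId, the column names and the DID_NOT_VOTE label are assumptions made for the illustration.

```r
# Toy inputs (column names other than notation_votingId are assumptions)
meps <- data.frame(
  mep_id        = c(1, 2, 3),
  mandate_start = as.Date(c("2024-07-16", "2024-07-16", "2024-09-01")),
  mandate_end   = as.Date(c("2029-07-15", "2024-10-31", "2029-07-15"))
)
votes <- data.frame(
  notation_votingId = c(101, 102),
  vote_date         = as.Date(c("2024-09-17", "2024-11-13"))
)
rcv <- data.frame(  # recorded positions only; absent MEPs have no row
  notation_votingId = c(101, 101, 102),
  mep_id            = c(1, 3, 1),
  position          = c("FOR", "AGAINST", "ABSTENTION")
)

# 1. Cartesian product of MEPs and votes (by = NULL)
grid <- merge(meps, votes, by = NULL)
# 2. Keep only pairs where the MEP was in office on the vote date
grid <- grid[grid$vote_date >= grid$mandate_start & grid$vote_date <= grid$mandate_end, ]
# 3. Left-join recorded positions; unmatched pairs become explicit non-votes
grid <- merge(grid, rcv, by = c("notation_votingId", "mep_id"), all.x = TRUE)
grid$position[is.na(grid$position)] <- "DID_NOT_VOTE"
grid[, c("notation_votingId", "mep_id", "position")]
```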
To use this repository, follow these steps:
- Clone the repository:
```bash
git clone https://github.com/your-repo/ep_vote_collection.git
cd ep_vote_collection
```
- Install R and the required dependencies:
  - Install R from CRAN
  - Use an IDE such as RStudio, Positron, or Visual Studio Code
  - Required R packages are installed automatically via the pacman::p_load() calls in each script (if pacman itself is missing, install it once with install.packages("pacman"))
- (Optional) Use the provided .devcontainer setup to deploy in a GitHub Codespace with a pre-configured environment.
- No API key is required: the EP Open Data API is publicly accessible.
To collect and clean today's votes in the EP:
Primary Script: scripts_main/ep_rcv_today.R
This script automatically:
- Identifies today's plenary session using the pattern MTG-PL-{today's date}
- Downloads attendance records and voting decisions for that session
- Processes all roll-call votes (RCVs) for the day
- Merges voting data with current MEP information
- Creates a comprehensive grid including MEPs who were absent or didn't vote
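The session identifier can be built directly from the date. This is a minimal sketch that assumes the date part of MTG-PL-{today's date} is formatted as YYYY-MM-DD. meeting_id_for() is a hypothetical helper, not a function from ep_rcv_today.R.

```r
# Hypothetical helper: build a plenary meeting ID from a date.
# Assumes the date part is ISO formatted (YYYY-MM-DD).
meeting_id_for <- function(date = Sys.Date()) {
  paste0("MTG-PL-", format(as.Date(date), "%Y-%m-%d"))
}

meeting_id_for("2024-07-16")
#> [1] "MTG-PL-2024-07-16"

# Decisions endpoint for that sitting (path taken from the endpoint list)
paste0("https://data.europarl.europa.eu/api/v2/meetings/",
       meeting_id_for("2024-07-16"), "/decisions")
```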
Output Files (stored in data_out/daily/):
- rcv_today_{YYYYMMDD}.csv: Individual MEP voting positions
- votes_today_{YYYYMMDD}.csv: Vote metadata and results
Aggregation: Run scripts_r/aggregate_rcv_today.R to generate:
- result_bygroup_byrcv.csv: Vote tallies by political group
- fullresult_bygroup_byrcv.csv: Extended tallies that also include absent MEPs and non-votes
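Counting positions per vote and political group is the core of these tallies. The base R sketch below uses toy data. The column names and position labels are assumptions, and aggregate_rcv_today.R may compute further fields (for example, the absences included in fullresult_bygroup_byrcv.csv).

```r
# Toy long-format RCV data: one row per MEP-vote (column names are assumptions)
rcv <- data.frame(
  notation_votingId = c(101, 101, 101, 101, 102, 102),
  political_group   = c("EPP", "EPP", "S&D", "S&D", "EPP", "S&D"),
  position          = c("FOR", "FOR", "AGAINST", "FOR", "ABSTENTION", "AGAINST")
)

# Add a count column of 1s and sum it within each vote/group/position cell
tally <- aggregate(
  n ~ notation_votingId + political_group + position,
  data = transform(rcv, n = 1),
  FUN  = sum
)
tally
```

Only combinations that actually occur appear in the result, so a group with no abstentions on a vote simply has no ABSTENTION row.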
Master Daily Script: scripts_main/master_day.R coordinates the full daily workflow, including foreseen activities.
To collect all votes of the 10th mandate (2024-present):
Primary Script: scripts_main/ep_rcv_mandate_10.R
Execution Steps:
- Meeting Collection (api_meetings.R): Downloads the plenary meeting calendar
- Attendance Records (api_meetings_attendance.R): Official attendance lists
- Vote Decisions (api_meetings_decisions.R): Raw voting data (stored as rcv_tmp_10.RDS)
- Data Cleaning (clean_decisions.R): Processes JSON files using specialized functions:
  - process_decisions_session_metadata.R: Vote metadata
  - process_decisions_session_rcv.R: Individual MEP positions
  - process_decisions_session_intentions.R: Vote corrections
- Vote Titles (api_meetings_voteresults.R): Descriptive information about votes
- MEP Data (api_meps.R): Current MEP list with mandate periods and political affiliations
- Lookup Tables (api_bodies.R): Political group and national party dictionaries
- Plenary Documents (api_pl_docs.R + api_pl_session_docs_ids.R): Identifies final votes
- Final Integration: Creates a comprehensive MEP-vote grid and merges all datasets
Key Output Files:
- data_out/votes/pl_votes_10.csv: Vote metadata (wide format)
- data_out/rcv/pl_rcv_10.csv: Individual voting positions (long format)
- data_out/meps_rcv_mandate_10.csv: Complete MEP-vote matrix, including absences
- data_out/meps/meps_dates_ids_10.csv: MEP information with date ranges
- data_out/bodies/: Political group and national party lookup tables
To collect votes for both the 9th and 10th mandates:
Primary Script: scripts_main/ep_rcv_mandates_all.R
- Same workflow as for the 10th mandate, but covering 2019 to the present
- Significantly larger dataset, requiring more processing time
Important Notes:
- The final MEP-vote file can be very large (12+ million rows for full mandates)
- Vote metadata is kept separate to manage file sizes
- For complete analysis, join the MEP-vote file (e.g., meps_rcv_mandate_10.csv) with the vote metadata (e.g., votes/pl_votes_10.csv) on the notation_votingId column
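Here is a minimal base R sketch of that join. Small data frames stand in for data_out/meps_rcv_mandate_10.csv and data_out/votes/pl_votes_10.csv; in practice you would load these files with read.csv() or a faster reader. The mep_id, position, and title columns are assumptions, since only notation_votingId is documented here. Checking first that the key is unique in the metadata guards against duplicate records silently multiplying rows.

```r
# Toy stand-ins for the MEP-vote file and the vote metadata (columns are assumptions)
meps_rcv <- data.frame(
  notation_votingId = c(101, 101, 102),
  mep_id            = c(1, 2, 1),
  position          = c("FOR", "AGAINST", "FOR")
)
votes <- data.frame(
  notation_votingId = c(101, 102),
  title             = c("Vote A", "Vote B")
)

# The metadata must have one row per vote, or the join would duplicate MEP-vote rows
stopifnot(!anyDuplicated(votes$notation_votingId))

# Left join: keep every MEP-vote row and attach the vote metadata
full <- merge(meps_rcv, votes, by = "notation_votingId", all.x = TRUE, sort = FALSE)
stopifnot(nrow(full) == nrow(meps_rcv))
```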
```
data_out/
├── daily/        # Daily vote files with date stamps
├── votes/        # Vote metadata and titles
├── rcv/          # Roll-call voting positions
├── meps/         # MEP information and date grids
├── bodies/       # Political group/national party lookup tables
├── attendance/   # Official attendance records
├── meetings/     # Plenary session metadata
├── docs_pl/      # Plenary document information
└── aggregates/   # Processed analytical outputs
```
Data formats:
- Individual Votes: Long format with one row per MEP-vote combination
- Vote Metadata: Wide format with one row per voting event
- MEP Grids: Complete matrices showing which MEPs should have been present for each vote
- Lookup Tables: ID-to-name mappings for political groups and national parties
1. Data Availability Delays
- API data may not be immediately available after votes occur
- Scripts will fail if data hasn't reached the servers yet
- Solution: Re-run scripts later in the day or next day
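If the daily run is scheduled, a retry loop can automate the advice to re-run later. The wrapper below is a hypothetical sketch, not part of the repository. It calls any fetch step, waits between failed attempts, and gives up after a fixed number of tries. The default wait of 600 seconds is an arbitrary choice.

```r
# Hypothetical wrapper: retry a fetch step when data is not yet on the server
with_retries <- function(fetch_fun, attempts = 3, wait_seconds = 600) {
  for (i in seq_len(attempts)) {
    # Capture errors instead of stopping, so the loop can try again
    result <- tryCatch(fetch_fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < attempts) Sys.sleep(wait_seconds)  # pause before the next attempt
  }
  stop("Data still unavailable after ", attempts, " attempts")
}

# Usage (hypothetical): with_retries(function() source("scripts_main/ep_rcv_today.R"))
```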
2. Language/Translation Issues
- Translations are added gradually, so available languages change over time
- Initially, only multilingual (mul) or French (.fr) versions may be available
- Vote titles may appear in only a few languages at first
3. Data Quality Issues
- Duplicate records can occur and require cleaning
- MEP voting intentions (corrections) are recorded separately and processed later
- Political group changes (e.g., "GUE" → "The Left") create duplicate membership records that need downstream handling
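One way to handle relabeled groups is to map legacy labels to current ones and then drop rows that have become exact duplicates. The sketch uses toy data. The column names and the label mapping are illustrative assumptions, and the repository's own files may identify groups by ID rather than by label.

```r
# Toy membership records (column names are assumptions): the same MEP and
# period appear twice because the group was relabeled
memberships <- data.frame(
  mep_id = c(1, 1, 2),
  group  = c("GUE", "The Left", "EPP"),
  start  = as.Date(c("2019-07-02", "2019-07-02", "2019-07-02")),
  end    = as.Date(c("2024-07-15", "2024-07-15", "2024-07-15"))
)

# Illustrative mapping from legacy labels to current labels
relabel <- c("GUE" = "The Left")

# Replace legacy labels, then drop rows that are now exact duplicates
to_fix <- memberships$group %in% names(relabel)
memberships$group[to_fix] <- unname(relabel[memberships$group[to_fix]])
memberships <- unique(memberships)
```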
4. Performance Considerations
- Full mandate datasets are very large (12+ million rows)
- Excel is limited to 1,048,576 rows per worksheet, so it cannot open complete datasets
- Consider using R, Python, or database tools for analysis
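In R, loading only the columns you need can cut memory use substantially. The hypothetical helper below reads the header first and marks every other column as "NULL" in colClasses, which tells read.csv() to skip it. For files of this size, data.table::fread() with its select argument is usually much faster.

```r
# Hypothetical helper: read only selected columns of a large CSV.
# Columns not listed get colClasses "NULL", which read.csv() skips entirely.
read_columns <- function(path, keep) {
  header  <- names(read.csv(path, nrows = 1, check.names = FALSE))
  classes <- ifelse(header %in% keep, NA, "NULL")  # NA = let R guess the type
  read.csv(path, colClasses = classes, check.names = FALSE)
}

# e.g. read_columns("data_out/meps_rcv_mandate_10.csv", keep = c("notation_votingId", ...))
```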
5. MEP Membership Complexity
- MEPs can change political groups during their mandate
- National party information may be missing or incorrect in source data
- Always verify political affiliations for critical analysis
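To attribute each vote to the group an MEP belonged to on the vote date (for example, using the date ranges in data_out/meps/meps_dates_ids_10.csv), the join has to match on dates as well as MEP IDs. The sketch below uses toy data, and its column names are assumptions.

```r
# Toy membership spells (column names are assumptions): MEP 1 switches group
spells <- data.frame(
  mep_id = c(1, 1, 2),
  group  = c("Renew", "EPP", "S&D"),
  start  = as.Date(c("2024-07-16", "2025-01-10", "2024-07-16")),
  end    = as.Date(c("2025-01-09", "2029-07-15", "2029-07-15"))
)
votes <- data.frame(
  mep_id    = c(1, 1, 2),
  vote_date = as.Date(c("2024-10-22", "2025-03-12", "2025-03-12"))
)

# Interval join: pair each vote with all spells of that MEP,
# then keep the spell whose date range covers the vote date
joined <- merge(votes, spells, by = "mep_id")
joined <- joined[joined$vote_date >= joined$start & joined$vote_date <= joined$end, ]

# Each MEP-vote should match exactly one spell; duplicates point to overlapping records
stopifnot(!anyDuplicated(joined[, c("mep_id", "vote_date")]))
```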
Run the scripts directly in R or RStudio; required packages are installed automatically via pacman::p_load().
The repository includes a .devcontainer configuration for deployment in GitHub Codespaces:
- Pre-configured R environment with all dependencies
- See r2u for Codespaces for details
- Free tier available with GitHub account (see limits)
Performance and data size:
- Daily scripts: Run quickly (minutes)
- Full mandate scripts: Can take hours due to data volume
- Large datasets: As of 2024, RCV files exceed 12 million rows
- Recommended tools: R, Python, or database systems for analysis (Excel will truncate large files)
Daily use:
- Run scripts_main/ep_rcv_today.R on plenary days
- Optionally, run scripts_r/aggregate_rcv_today.R for group-level summaries
Historical data:
- Run scripts_main/ep_rcv_mandate_10.R for current-mandate data
- Run scripts_main/ep_rcv_mandates_all.R for complete historical data
Analysis:
- Use the files in the analyses/ folder as examples of analytical workflows
- Join MEP-vote data with vote metadata using notation_votingId
- Refer to the lookup tables in data_out/bodies/ for human-readable labels
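Attaching readable labels is a simple lookup. In this sketch, the body_id and label columns are assumptions that stand in for the files in data_out/bodies/. match() returns NA for IDs missing from the lookup table, which makes gaps such as missing national party information easy to spot.

```r
# Toy lookup table (stand-in for a file in data_out/bodies/; columns are assumptions)
groups_lookup <- data.frame(
  body_id = c("org/1", "org/2"),
  label   = c("EPP", "S&D")
)
rcv <- data.frame(mep_id = c(1, 2, 3), body_id = c("org/2", "org/1", "org/9"))

# match() finds the lookup row for each ID (NA when the ID is unknown)
rcv$group_label <- groups_lookup$label[match(rcv$body_id, groups_lookup$body_id)]

# Flag IDs without a label so they can be checked against the source data
rcv[is.na(rcv$group_label), ]
```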
This repository is designed for researchers and analysts studying European Parliament voting behavior. When contributing:
- Test thoroughly: Always verify output against official EP records
- Document changes: Update this README when modifying workflows
- Handle data carefully: Be mindful of the large dataset sizes
- Respect API limits: The caching system helps prevent excessive API calls
For questions about EP voting procedures or data interpretation, consult the European Parliament's official documentation.