WARNING: This project is a work in progress. The data is not fully processed and the project is not fully documented, so the information below may be outdated or incorrect at times. Once the project is stable, this warning will be removed.
Congress rollcall data from the U.S. Senate and U.S. House of Representatives, covering the 101st through 118th Congresses (1989-2024).
If you run the crate, make sure you know what you're doing. Most user interaction happens through the scripts in the scripts directory. Still, don't run things you don't understand.
XML and unprocessed JSON data are available in the full_data directory, compressed as data_files.tar.gz.
The master JSON file is compressed as master_json.tar.gz in the full_data directory.
The SQLite database is available as votes.db in the full_data directory.
- Download the data from the full_data directory.
- Extract the data.
- Run the `convert_all.sh` script to convert the XML data to JSON.
- Run `cargo run -- process_votes [json|sql]` to process the JSON data into a SQLite database or a master JSON file.
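The extraction step above can be sketched in Python with the standard-library `tarfile` module. This is a minimal sketch, not part of the project: the archive names come from this README, while the `extracted` output directory and the `extract_archive` helper are hypothetical names chosen for illustration.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz archive into dest and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject members that would escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Archive names are taken from the README; "extracted" is a hypothetical target.
    for name in ("data_files.tar.gz", "master_json.tar.gz"):
        archive = Path("full_data") / name
        if archive.exists():
            members = extract_archive(archive, Path("extracted"))
            print(f"{name}: {len(members)} entries")
```

The unpacked data is large, so check free disk space before extracting.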
- Download the new data using `cargo run -- download_xml [house|senate] [congress_number] [session_number] [rollcall_number]`.
- Run `xml_to_json.sh [house|senate] [congress_number] [session_number] [job_count] [log_to_file]` to convert the XML data to JSON. This converts the session's data to JSON; existing data is skipped.
- Run `cargo run -- process_votes [json|sql]` to process the JSON data into a SQLite database or a master JSON file. With the `sql` option, the existing database is updated with the new data; with the `json` option, you must reconstruct the master JSON file. The `sql` option also allows processing individual JSON files; see the help message for more information. Overall, the `sql` option is the best choice for adding, querying, and filtering data.
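To make the argument order of the update steps concrete, here is a small sketch that assembles the three command lines as argument lists without running them. The `update_commands` helper is hypothetical, and the accepted values for `job_count` and `log_to_file` are not documented here, so the ones in the usage example are placeholders.

```python
import shlex


def update_commands(chamber, congress, session, rollcall, job_count, log_to_file):
    """Build the three command lines for adding one rollcall, as argv lists."""
    if chamber not in ("house", "senate"):
        raise ValueError(f"chamber must be 'house' or 'senate', got {chamber!r}")
    ids = [chamber, str(congress), str(session)]
    return [
        # Step 1: download the XML for a single rollcall.
        ["cargo", "run", "--", "download_xml", *ids, str(rollcall)],
        # Step 2: convert the session's XML to JSON (existing data is skipped).
        ["xml_to_json.sh", *ids, str(job_count), str(log_to_file)],
        # Step 3: update the existing SQLite database.
        ["cargo", "run", "--", "process_votes", "sql"],
    ]


if __name__ == "__main__":
    # Placeholder values: 4 jobs and "true" are assumptions, not documented defaults.
    for cmd in update_commands("house", 118, 2, 100, 4, "true"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything.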
The schema can be found in the full_data directory as schema.sql.
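If you only have votes.db at hand, the schema can also be read from the database itself. The sketch below makes no assumptions about table or column names; it lists whatever tables exist using SQLite's built-in `sqlite_master` catalog. The `list_tables` helper and the `full_data/votes.db` path are illustrative assumptions.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> dict[str, str]:
    """Return {table_name: CREATE TABLE statement} for every table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return {name: sql for name, sql in rows}


if __name__ == "__main__":
    db = Path("full_data") / "votes.db"  # assumed location, per the README
    # sqlite3.connect would create an empty file if the path did not exist.
    if db.exists():
        for name, sql in list_tables(str(db)).items():
            print(f"-- {name}\n{sql}\n")
```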
WARNING: This is a simplified version of the actual structure. The actual structure is much larger: 6GB+ of pure JSON data, so be prepared to handle large files. The JSON structure is not optimized for querying, or really anything; it's just a dump of the XML data. The SQLite database is the better option for querying and filtering: it has significantly better performance and significantly more information (per vote/member/etc.).
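To illustrate what querying the JSON dump involves, here is a minimal sketch that tallies one rollcall by party. It assumes the real master file uses the same key names as the simplified sample in this README (`chambers`, `congresses`, `sessions`, `rollcalls`, `vote_casts`); the helper names and the `master.json` file name are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def iter_vote_casts(master: dict, chamber: str, congress: int, session: int):
    """Yield every vote_cast record for one chamber/congress/session."""
    sessions = master["chambers"][chamber]["congresses"][str(congress)]["sessions"]
    for rollcall in sessions[str(session)]["rollcalls"]:
        yield from rollcall["vote_casts"]


def tally_by_party(master, chamber, congress, session, rollcall_number):
    """Count (party, vote_cast) pairs for a single rollcall."""
    counts = Counter()
    for cast in iter_vote_casts(master, chamber, congress, session):
        if cast["rollcall_number"] == rollcall_number:
            counts[(cast["party"], cast["vote_cast"])] += 1
    return counts


if __name__ == "__main__":
    path = Path("master.json")  # hypothetical name for the extracted master file
    if path.exists():
        # json.load reads the whole file into memory, which is costly at this size.
        with path.open() as f:
            master = json.load(f)
        print(tally_by_party(master, "house", 110, 2, 525))
```

Every lookup walks the full chamber, congress, session, and rollcall nesting, which is one reason the SQLite database is the more practical choice for filtering.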
{
  "chambers": {
    "house": {
      "congresses": {
        "110": {
          "sessions": {
            "2": {
              "rollcalls": [
                {
                  "rollcall_number": 525,
                  "vote_date": "24-Jul-2008",
                  "vote_question": "On Agreeing to the Resolution",
                  "vote_result": "Passed",
                  "vote_casts": [
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000014",
                      "legislator_name": "Abercrombie",
                      "party": "D",
                      "state": "HI",
                      "vote_cast": "Yea"
                    },
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000022",
                      "legislator_name": "Ackerman",
                      "party": "D",
                      "state": "NY",
                      "vote_cast": "Yea"
                    },
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000055",
                      "legislator_name": "Aderholt",
                      "party": "R",
                      "state": "AL",
                      "vote_cast": "Nay"
                    },
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000358",
                      "legislator_name": "Akin",
                      "party": "R",
                      "state": "MO",
                      "vote_cast": "Nay"
                    },
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000361",
                      "legislator_name": "Alexander",
                      "party": "R",
                      "state": "LA",
                      "vote_cast": "Nay"
                    },
                    {
                      "congress_number": 110,
                      "chamber": "house",
                      "session_number": 2,
                      "rollcall_number": 525,
                      "vote_date": "24-Jul-2008",
                      "vote_question": "On Agreeing to the Resolution",
                      "vote_result": "Passed",
                      "legislator_id": "A000357",
                      "legislator_name": "Allen",
                      "party": "D",
                      "state": "ME",
                      "vote_cast": "Yea"
                    }
                  ]
                }
              ]
            }
          }
        }
      }
    },
    "senate": {
      "...": "same structure as house"
    }
  }
}
This project is licensed under the MIT License. See the LICENSE file for details.
The data is sourced from the U.S. Senate and U.S. House of Representatives websites.
For questions or feedback, please contact me on GitHub or by email.
If you find this project helpful, consider donating via PayPal.