2020 General Election Precinct Results #69

67 tasks done
dwillis opened this issue Nov 17, 2020 · 9 comments

dwillis commented Nov 17, 2020

Using Tabula, OCR or whatever method you can, parse precinct-level results for the following counties. Original sources are in the sources-pa repository.

The goal is to create a single CSV file for each county, with the following headers:

county, precinct, office, district, party, candidate, votes

If the county file also provides a breakdown of votes by method, include that using the following headers:

early_voting, election_day, provisional, absentee
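A quick way to check a finished file against the headers above is sketched below. The function name `check_headers` and the sample row are made up for illustration; only the column names come from this issue.

```python
import csv
import io

# Column names taken from the issue text above.
REQUIRED = ["county", "precinct", "office", "district", "party", "candidate", "votes"]
OPTIONAL = ["early_voting", "election_day", "provisional", "absentee"]


def check_headers(csv_text):
    """Return (missing, unexpected) column names for the header row of csv_text."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [name for name in REQUIRED if name not in header]
    unexpected = [name for name in header if name not in REQUIRED + OPTIONAL]
    return missing, unexpected


# Made-up sample: one vote-method column present, no required column missing.
sample = (
    "county,precinct,office,district,party,candidate,votes,absentee\n"
    "Adams,Example Precinct,President,,DEM,Candidate A,100,40\n"
)
missing, unexpected = check_headers(sample)
```

Because the comparison is exact, a header with a stray space (such as `precinct `) would show up in both lists.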

Include the following offices:

  • Registered Voters (if available)
  • Ballots Cast (if available)
  • Straight Party
  • President
  • U.S. House
  • Attorney General
  • Auditor General
  • State Treasurer
  • State Senate
  • General Assembly

The CSV files should be named 20201103__pa__general__{county}__precinct.csv. Here's an example finished file:

  • Adams
  • Allegheny
  • Armstrong
  • Beaver
  • Bedford
  • Berks
  • Blair
  • Bradford
  • Bucks
  • Butler
  • Cambria
  • Cameron
  • Carbon
  • Centre
  • Chester
  • Clarion
  • Clearfield
  • Clinton
  • Columbia
  • Crawford
  • Cumberland
  • Dauphin
  • Delaware
  • Elk
  • Erie
  • Fayette
  • Forest
  • Franklin
  • Fulton
  • Greene
  • Huntingdon
  • Indiana
  • Jefferson
  • Juniata
  • Lackawanna
  • Lancaster
  • Lawrence
  • Lebanon
  • Lehigh
  • Luzerne
  • Lycoming
  • McKean
  • Mercer
  • Mifflin
  • Monroe
  • Montgomery
  • Montour
  • Northampton
  • Northumberland
  • Perry
  • Philadelphia
  • Pike
  • Potter
  • Schuylkill
  • Snyder
  • Somerset
  • Sullivan
  • Susquehanna
  • Tioga
  • Union
  • Venango
  • Warren
  • Washington
  • Wayne
  • Westmoreland
  • Wyoming
  • York
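The naming convention above can be sketched as a small helper. The lowercasing and the underscore replacement are assumptions on my part, not something this issue states, and `precinct_filename` is a hypothetical name.

```python
def precinct_filename(county):
    """Build the per-county file name from the pattern given in this issue.

    Assumption: the county slug is lowercase, with spaces turned into underscores.
    """
    slug = county.strip().lower().replace(" ", "_")
    return f"20201103__pa__general__{slug}__precinct.csv"


name = precinct_filename("McKean")
# name == "20201103__pa__general__mckean__precinct.csv"
```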

Thanks as always for y'all's hard work here! A few initial issues I've experienced when using these precinct-level CSVs:

  • Butler County is missing the precincts alphabetically from Valencia to Worth
  • Fulton County's header row has a trailing (space) character
  • Fulton County has a bunch of empty rows at the bottom, not sure if they're supposed to be populated with non-null values
  • Wyoming County header has a trailing (space) character after precinct
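Issues like the stray header spaces and empty rows above can be caught mechanically. This is a minimal sketch using only the standard library; `lint_csv` is a hypothetical helper and the sample text is made up.

```python
import csv
import io


def lint_csv(csv_text):
    """Report header cells with stray whitespace and rows whose cells are all empty."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    problems = []
    for name in rows[0]:
        if name != name.strip():
            problems.append(f"header {name!r} has leading/trailing whitespace")
    # Row 1 is the header, so data rows are numbered from 2.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number} is empty")
    return problems


# Made-up sample with a trailing space after "precinct" and one empty row.
sample = "county,precinct ,office\nFulton,Example Precinct,President\n,,\n"
problems = lint_csv(sample)
```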

Perry County's New Buffalo is missing from the CSV, probably failed to parse from the PDF because its name is right before a page break:

Screen Shot 2021-02-18 at 13 33 44

Here's my last set of notes for today. Thank you again for the hard work!

  • Greene County is missing President rows for the CENTER precinct, although these appear in the PDF source.
  • York County is missing President rows for Newberry Township 1, perhaps because that row is at a page break in the PDF:

Screen Shot 2021-02-18 at 13 49 15

  • Clarion County is missing President rows for the 32.01 - Strattanville Borough precinct, perhaps because of the PDF's page break as well:

Screen Shot 2021-02-18 at 13 58 04
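For anyone wanting to spot this kind of gap automatically, here is a rough sketch that lists precincts with no rows for a given office. It assumes the `precinct` and `office` columns from the issue description; `precincts_missing_office` and the sample rows are made up. It cannot detect a precinct that is absent from the file altogether (as with Perry County's New Buffalo), since there is nothing to group on.

```python
import csv
import io
from collections import defaultdict


def precincts_missing_office(csv_text, office="President"):
    """Return precincts that appear in the file but have no rows for `office`."""
    offices_by_precinct = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        offices_by_precinct[row["precinct"]].add(row["office"])
    return sorted(
        precinct
        for precinct, offices in offices_by_precinct.items()
        if office not in offices
    )


# Made-up sample: precinct "B" has a U.S. House row but no President row.
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Example,A,President,,DEM,Candidate A,10\n"
    "Example,A,U.S. House,1,DEM,Candidate C,9\n"
    "Example,B,U.S. House,1,DEM,Candidate C,7\n"
)
missing = precincts_missing_office(sample)
```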

Contributor Author

dwillis commented Feb 18, 2021

@mileswwatkins many thanks for these - we'll get on them.

Oh, and Erie County's 40001 - WAYNE TOWNSHIP has a trailing space in its precinct name

Blair County's CSV failed to extract the full/distinct precinct names from the ElectionWare PDFs.

E.g., Altoona Ward 2, Precinct 2 in the PDF appears as only Altoona Ward 2 in the CSV, and similarly Blair Township, District 3 becomes Blair Township, etc. (The county has lots and lots of these numbered precincts, FWIW.)
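One symptom of collapsed precinct names is that the same precinct/office/candidate combination shows up more than once. The sketch below flags such repeats; it is a heuristic I am assuming is useful here, not how the Blair file was actually checked, and `duplicate_keys` plus the sample rows are made up.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text):
    """Return (precinct, office, district, candidate) keys that occur more than once."""
    counts = Counter(
        (row["precinct"], row["office"], row["district"], row["candidate"])
        for row in csv.DictReader(io.StringIO(csv_text))
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up sample: two distinct precincts were both written out as "Altoona Ward 2".
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Blair,Altoona Ward 2,President,,DEM,Candidate A,10\n"
    "Blair,Altoona Ward 2,President,,DEM,Candidate A,12\n"
    "Blair,Altoona Ward 3,President,,DEM,Candidate A,8\n"
)
dupes = duplicate_keys(sample)
```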

Contributor Author

dwillis commented Feb 19, 2021

@mileswwatkins Ok, I believe all of these issues have been resolved.

Thank you so much, @dwillis! Just finished another round of QA, including comparing county vote totals against AP/Edison, and everything looks great.

The largest discrepancy is that Beaver County is a couple thousand votes short (y'all have an older unofficial-results PDF instead of the final/official results currently on their site), but that doesn't affect my use case :)
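The totals comparison can be sketched roughly as follows. The reference figures would come from whatever external source you trust; the numbers, candidate names, and function names here are made up, and the code assumes `votes` holds plain integers.

```python
import csv
import io
from collections import Counter


def county_totals(csv_text, office="President"):
    """Sum precinct-level votes per candidate for one office."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["office"] == office:
            totals[row["candidate"]] += int(row["votes"])
    return totals


def discrepancies(totals, reference):
    """Return {candidate: csv_total - reference_total} wherever the two differ."""
    return {
        candidate: totals.get(candidate, 0) - expected
        for candidate, expected in reference.items()
        if totals.get(candidate, 0) != expected
    }


# Made-up sample: Candidate B comes up 5 votes short of the reference total.
sample = (
    "county,precinct,office,district,party,candidate,votes\n"
    "Example,A,President,,DEM,Candidate A,10\n"
    "Example,B,President,,DEM,Candidate A,15\n"
    "Example,A,President,,REP,Candidate B,20\n"
)
reference = {"Candidate A": 25, "Candidate B": 25}
diff = discrepancies(county_totals(sample), reference)
```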

Contributor Author

dwillis commented Feb 20, 2021

@mileswwatkins awesome, thanks for letting me know. We've updated Beaver.

@dwillis dwillis closed this as completed Mar 13, 2021