Elections only available in PDF #28

Open
nbdavies opened this issue Jul 30, 2017 · 8 comments

@nbdavies
Contributor

These elections have results for some or all offices that are available only in PDF:

  • 2005-02-15 (ID 443)
  • 2004-11-02 (ID 444)
  • 2004-09-14 (ID 445)
  • 2004-04-06 (ID 446)
  • 2004-02-17 (ID 447)
  • 2003-07-22 (ID 685)
  • 2003-04-29 (ID 1756)

I pressed WEC to add Excel versions of these to their site. Their spokesperson said that, because they've gone through a couple of reorganizations since these elections occurred, they no longer have the database these files were produced from. So to produce and host Excel files for these elections, they would have to extract the results from the PDFs themselves.

We'll have to figure out how to process these files. I've tried Tabula and Python PDF libraries, but everything I've managed to produce has been lossy and/or messy.

Here's an example input file:
http://elections.wi.gov/sites/default/files/elecSpec03_wbw_assm18.pdf

@davipo
Contributor

davipo commented Aug 8, 2017

Results for years 2000, 2001, and early April 2002 and 2003 are also available only in PDF format. Should these be added to this list?

http://elections.wi.gov/elections-voting/results-all

@nbdavies
Contributor Author

nbdavies commented Dec 13, 2017

Election 422 (2011-04-05) belongs in this list as well.

@nbdavies
Contributor Author

nbdavies commented Feb 14, 2018

@davipo
Contributor

davipo commented May 8, 2018

For the 2006-09-12 primary election (ID 437), we have a zip archive for each party, containing some xls files and several PDF files.

These offices have only PDF files:
Senate, House, State Treasurer, State Senate, Assembly, District Attorney
(There are xls files for Governor, Lieutenant Governor, Attorney General, Secretary of State)

Four additional PDFs are labeled recount:
Democratic_2006_FallElection_Primary_Recount_AD47_WardbyWard.pdf
Democratic_2006_FallElection_Primary_Recount_AD87_WardbyWard.pdf
Democratic_2006_FallElection_Primary_Recount_SD13_WardbyWard.pdf
Republican_2006_FallElection_Primary_Recount_DistrictAttorney_Shawano-Menominee_WardbyWard.pdf

@davipo
Contributor

davipo commented May 29, 2018

Here's an updated table of elections with no results, or results only in PDF format:

    id        date      special    primary    recall     no_data  
   1849    2000-02-15                 P                 pdf only  
   1847    2000-04-04                                   pdf only  
   1848    2000-04-04                 P                 pdf only  
   1846    2000-09-12                 P                 pdf only  
   1844    2001-02-20                 P                 pdf only  
   1843    2001-04-03                                   pdf only  
   1842    2002-02-19                 P                 pdf only  
   1841    2002-04-02                                   pdf only  
   1840    2003-02-18                 P                 pdf only  
   1839    2003-04-01                                   pdf only  
   1756    2003-04-29      S                            pdf only  
    689    2003-06-24      S          P                    nd     
    685    2003-07-22      S                            pdf only  
    674    2003-10-21                 P          R         nd     
    664    2003-11-18                            R         nd     
    448    2004-01-27      S                               nd     
    447    2004-02-17                 P                 pdf only  
    446    2004-04-06                                   pdf only  
    445    2004-09-14                 P                 pdf only  
    443    2005-02-15                 P                 pdf only  
    422    2011-04-05      S          P                 pdf only  

Results for some of the offices in these elections are only in PDF format:
id 444, 2004-11-02
id 437, 2006-09-12
(see office_table.xlsx for details)

id 1577, 2002-11-05 now has xls data for District Attorney (previously only in PDF?)
I've added this file to the metadata at dashboard.openelections.net

@davipo
Contributor

davipo commented May 29, 2018

Last September, Derek recommended pdftotext, in the xPDF package.
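
A minimal sketch of how that could be wired up from Python, assuming the pdftotext binary is on the PATH. The `-layout` flag asks pdftotext to keep the physical column layout, which tends to suit ward-by-ward tables. The helper names `pdf_to_layout_text` and `split_columns` are hypothetical, and splitting on runs of two or more spaces is only an assumption about how the columns come out, not something verified against the Wisconsin files.

```python
import re
import subprocess
from pathlib import Path
from typing import List


def pdf_to_layout_text(pdf_path: str) -> str:
    """Run `pdftotext -layout` and return the extracted text.

    Assumes the pdftotext binary is installed and on PATH.
    The output is written next to the PDF with a .txt suffix.
    """
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(["pdftotext", "-layout", pdf_path, str(txt_path)], check=True)
    # errors="replace" keeps going if the output encoding is not UTF-8.
    return txt_path.read_text(encoding="utf-8", errors="replace")


def split_columns(line: str) -> List[str]:
    """Split one layout-preserved line into cells.

    Assumption: columns are separated by two or more spaces, while
    single spaces occur only inside a cell (e.g. "Ward 1").
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]
```

Rows whose cell count differs from the header row would still need manual review, since wrapped ward names or blank vote cells can shift the columns.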

@nbdavies
Contributor Author

nbdavies commented Jun 13, 2018

Here are a couple of examples of running pdftotext on Wisconsin PDF input files.

wxw_assm_60_94_pdf_16250.txt
2000_Democrat_State_Senate_WardbyWard_Returns.txt
2004_FallElection_USCongress_WardbyWard.txt

There are some format differences between pre- and post-2010 files. Also, files from 2003 and earlier print candidate names vertically, while files from 2004 onward print them horizontally.
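
One way a parser could account for these differences is to pick its handling from the election year. This is only a sketch: `layout_hints` is a hypothetical helper, and treating 2010 itself as "post-2010" is an assumption, since the exact cutover has not been pinned down here.

```python
def layout_hints(year: int) -> dict:
    """Return rough layout expectations for a results PDF from `year`.

    Based on the observations above: candidate names are vertical
    through 2003 and horizontal from 2004 on. The 2010 boundary is
    an assumption; adjust it once the real cutover is known.
    """
    return {
        "candidate_names": "vertical" if year <= 2003 else "horizontal",
        "format_era": "pre-2010" if year < 2010 else "post-2010",
    }
```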

@nbdavies
Contributor Author

The recount results for election 421 (2011-04-05) for Sheboygan district court branch 3 are also available only in PDF.
