NM: Update votes scraper (was: Fix individual members' votes in House PDFs) #65

Open
mileswwatkins opened this issue Jan 21, 2018 · 6 comments
Labels: component:bill-data (bill & vote data issues); type:upstream (issues that are waiting on an upstream fix)

Comments

@mileswwatkins
Member

New Mexico serves its votes in PDFs (directory), and we try to parse their tables using the x and y coordinates of the X checkmarks.

Unfortunately, at least for one of the 2018 session's House vote PDFs, the rows in the LXMLized vote PDF don't line up; that is, one of the vote checkmarks has a y coordinate that differs from its member, so the vote can't be attributed.

When this is detected, I'm setting the scraper to throw out individual-member counts, and keep vote totals. But the individual-member scraping is so close, some additional logic may be able to salvage these cases.
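As a rough illustration of that fallback (not the code in the actual scraper), the check could compare the per-member attributions against the totals printed in the PDF and drop the member-level data when they disagree. The function names `parse_totals` and `reconcile`, the lowercase category labels, and the assumption that each scraped cell is a `(text, x, width)` tuple keyed by its y coordinate are all hypothetical:

```python
import re

# Matches the total cells printed near the top of the roll call, e.g. 'YEAS: 67'.
TOTAL_RE = re.compile(r'^(YEAS|NAYS|EXCUSED|ABSENT):\s*(\d+)$')


def parse_totals(rows):
    """Collect the printed totals from every cell that looks like 'YEAS: 67'."""
    totals = {}
    for cells in rows.values():
        for text, _x, _width in cells:
            match = TOTAL_RE.match(text)
            if match:
                totals[match.group(1).lower()] = int(match.group(2))
    return totals


def reconcile(totals, member_votes):
    """Return member_votes if its per-category counts equal totals, else None."""
    counts = {key: 0 for key in totals}
    for _name, category in member_votes:
        counts[category] = counts.get(category, 0) + 1
    return member_votes if counts == totals else None


# Hypothetical sample: one totals row and an incomplete member attribution.
rows = {224: [('YEAS: 67', 120, 73), ('NAYS: 0', 294, 65),
              ('EXCUSED: 0', 476, 99), ('ABSENT: 3', 647, 87)]}
totals = parse_totals(rows)
member_votes = [('Adkins', 'yeas'), ('Alcon', 'yeas')]
kept = reconcile(totals, member_votes)  # None, so only the totals would be kept
```

With this shape, the totals are always available, and the member-level list is used only when it is consistent with them.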

cc @cliftonmcintosh

@mileswwatkins
Member Author

Here's a clean-ish version of the scraped rows for one problematic House vote PDF:

(Pdb) pprint(OrderedDict(sorted(rows.items(), key=lambda t: t[0])))
OrderedDict([(17, [('OFFICIAL ROLL CALL', 373, 168)]),
             (39, [('NEW MEXICO HOUSE OF REPRESENTATIVES', 275, 363)]),
             (60,
              [('Second Regular Session of the 53rd Legislature', 282, 350)]),
             (82, [('2018 Regular Session', 376, 162)]),
             (110, [('LEGISLATIVE DAY 2', 129, 158)]),
             (132, [('RCS# 14', 623, 67)]),
             (176, [('HB 1/ec', 427, 59)]),
             (198,
              [('F I N A L  P A S S A G E  with emergency clause', 278, 358)]),
             (224,
              [('YEAS: 67', 120, 73),
               ('NAYS: 0', 294, 65),
               ('EXCUSED: 0', 476, 99),
               ('ABSENT: 3', 647, 87)]),
             (248, [('REPRESENTATIVE', 54, 136), ('REPRESENTATIVE', 460, 136)]),
             (270,
              [('X', 229, 10),
               ('Adkins', 53, 45),
               ('X', 635, 10),
               ('Louis', 459, 36)]),
             (292,
              [('X', 229, 10),
               ('Alcon', 53, 38),
               ('X', 635, 10),
               ('Lundstrom', 459, 71)]),
             (314,
              [('X', 229, 10),
               ('Armstrong, D.', 53, 93),
               ('X', 635, 10),
               ('Maestas', 459, 57)]),
             (336,
              [('X', 229, 10),
               ('Armstrong, Gail', 53, 104),
               ('X', 635, 10),
               ('Maestas Barnes', 459, 108)]),
             (357,
              [('X', 229, 10),
               ('Baldonado', 53, 72),
               ('X', 635, 10),
               ('Martínez, Javier', 459, 107)]),
             (379,
              [('X', 229, 10),
               ('Bandy', 53, 43),
               ('X', 635, 10),
               ('Martinez, Rudy', 459, 101)]),
             (401,
              [('X', 229, 10),
               ('Brown', 53, 43),
               ('X', 816, 10),
               ('McCamley', 459, 71)]),
             (422,
              [('X', 229, 10),
               ('Chasey', 53, 51),
               ('X', 635, 10),
               ('McQueen', 459, 65)]),
             (444,
              [('X', 229, 10),
               ('Clahchischilliage', 53, 112),
               ('X', 635, 10),
               ('Montoya', 459, 58)]),
             (465, [('X', 326, 10)]),
             (466, [('Cook', 53, 35), ('X', 635, 10), ('Nibert', 459, 40)]),
             (487,
              [('X', 229, 10),
               ('Crowder', 53, 57),
               ('X', 635, 10),
               ('Powdrell-Culbert', 459, 111)]),
             (509,
              [('X', 229, 10),
               ('Dines', 53, 38),
               ('X', 635, 10),
               ('Rehm', 459, 40)]),
             (531,
              [('X', 229, 10),
               ('Dodge', 53, 44),
               ('X', 635, 10),
               ('Roch', 459, 35)]),
             (552,
              [('X', 229, 10),
               ('Dow', 53, 30),
               ('X', 635, 10),
               ('Rodella', 459, 51)]),
             (574,
              [('X', 229, 10),
               ('Egolf', 53, 34),
               ('X', 635, 10),
               ('Romero', 459, 53)]),
             (596,
              [('X', 229, 10),
               ('Ely', 53, 21),
               ('X', 635, 10),
               ('Roybal Caballero', 459, 115)]),
             (617,
              [('X', 229, 10),
               ('Ezzell', 53, 40),
               ('X', 635, 10),
               ('Rubio', 459, 39)]),
             (639,
              [('X', 229, 10),
               ('Fajardo', 53, 51),
               ('X', 635, 10),
               ('Ruiloba', 459, 51)]),
             (661,
              [('X', 229, 10),
               ('Ferrary', 53, 48),
               ('X', 635, 10),
               ('Salazar, Nick', 459, 88)]),
             (683,
              [('X', 229, 10),
               ('Gallegos, David', 53, 106),
               ('X', 635, 10),
               ('Salazar, Tomás', 459, 105)]),
             (704,
              [('X', 229, 10),
               ('Gallegos, Doreen', 53, 117),
               ('X', 635, 10),
               ('Sariñana', 459, 60)]),
             (726,
              [('X', 229, 10),
               ('Garcia Richard', 53, 100),
               ('X', 635, 10),
               ('Scott', 459, 34)]),
             (748,
              [('X', 229, 10),
               ('Garcia, Harry', 53, 89),
               ('X', 635, 10),
               ('Small', 459, 38)]),
             (769,
              [('X', 229, 10),
               ('García, M.P.', 53, 84),
               ('X', 635, 10),
               ('Smith', 459, 38)]),
             (791,
              [('X', 229, 10),
               ('Gentry', 53, 45),
               ('X', 635, 10),
               ('Stapleton', 459, 63)]),
             (813,
              [('X', 229, 10),
               ('Gomez', 53, 48),
               ('X', 635, 10),
               ('Strickler', 459, 54)]),
             (834,
              [('X', 229, 10),
               ('Gonzales', 53, 63),
               ('X', 635, 10),
               ('Sweetser', 459, 63)]),
             (856,
              [('X', 229, 10),
               ('Hall', 53, 26),
               ('X', 635, 10),
               ('Thomson', 459, 63)]),
             (878,
              [('X', 229, 10),
               ('Harper', 53, 46),
               ('X', 635, 10),
               ('Townsend', 459, 69)]),
             (899,
              [('X', 229, 10),
               ('Herrell', 53, 44),
               ('X', 635, 10),
               ('Trujillo, Carl', 459, 80)]),
             (921,
              [('X', 229, 10),
               ('Johnson', 53, 57),
               ('X', 635, 10),
               ('Trujillo, Christine', 459, 112)]),
             (943,
              [('X', 229, 10),
               ('Larrañaga', 53, 68),
               ('X', 635, 10),
               ('Trujillo, Jim', 459, 76)]),
             (965,
              [('X', 229, 10),
               ('Lente', 53, 38),
               ('X', 635, 10),
               ('Trujillo, Linda', 459, 89)]),
             (986, [('Lewis', 53, 38), ('X', 635, 10), ('Wooley', 459, 50)]),
             (987, [('X', 324, 10)]),
             (1008,
              [('X', 229, 10),
               ('Little', 53, 32),
               ('X', 635, 10),
               ('Youngblood', 459, 80)]),
             (1038,
              [('CERTIFIED CORRECT TO THE BEST OF OUR KNOWLEDGE', 354, 467)]),
             (1060, [('(Speaker)', 749, 72)]),
             (1082, [('(Chief Clerk)', 730, 93)])])

The scraper warns:

12:51:41 WARNING pupa: No vote found for ('X', 326, 10)
12:51:41 WARNING pupa: No vote found for ('X', 635, 10)
12:51:41 WARNING pupa: No vote found for ('X', 635, 10)
12:51:41 WARNING pupa: No vote found for ('X', 324, 10)
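
In the dump above, the stray checkmarks sit one unit away from their members' rows (465 vs. 466, and 987 vs. 986), while ordinary rows are roughly 21 to 22 units apart. A minimal sketch of one way to salvage these cases is to merge rows whose y coordinates fall within a small tolerance. The helper name `merge_close_rows` and the default tolerance of 2 are assumptions for illustration, not part of the existing scraper:

```python
from collections import OrderedDict


def merge_close_rows(rows, tolerance=2):
    """Merge rows whose y coordinate is within `tolerance` of the group's first y."""
    merged = OrderedDict()
    anchor_y = None
    for y in sorted(rows):
        if anchor_y is not None and y - anchor_y <= tolerance:
            # Close enough to the current group: fold these cells into it.
            merged[anchor_y].extend(rows[y])
        else:
            # Start a new group keyed by this row's y coordinate.
            anchor_y = y
            merged[anchor_y] = list(rows[y])
    return merged


# A few rows copied from the dump above.
rows = {
    444: [('X', 229, 10), ('Clahchischilliage', 53, 112),
          ('X', 635, 10), ('Montoya', 459, 58)],
    465: [('X', 326, 10)],
    466: [('Cook', 53, 35), ('X', 635, 10), ('Nibert', 459, 40)],
    986: [('Lewis', 53, 38), ('X', 635, 10), ('Wooley', 459, 50)],
    987: [('X', 324, 10)],
}
merged = merge_close_rows(rows)
```

This only repairs the row grouping. The two stray checkmarks also have x values (326 and 324) that differ from the 229 seen in the other left-hand rows, so matching them to a vote column may still need its own tolerance.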

@estaub

estaub commented Jan 21, 2018

Yeah, that's certainly nudgable. I'm curious about the PDF source; I wonder if it's OCR, or they used some WYSIWYG form generator and were sloppy.

@In-vincible

@mileswwatkins is this solved?

@mileswwatkins
Member Author

@In-vincible, no, my PR just changed the code to skip votes that were troublesome, which is why I spun off this ticket.

https://github.com/openstates/openstates/pull/2103/files#diff-d57e2d82487395e5ff5349aea8c56550R273

@schneidy

Spacing issues still exist within PDFs. Sometimes a yes vote is categorized as a no vote. Example: https://www.nmlegis.gov/Sessions/19%20Regular/votes/HB0256HVOTE.PDF

@jamesturk transferred this issue from openstates/openstates-scrapers on Jun 23, 2020
@jamesturk added the component:bill-data (bill & vote data issues) and type:upstream (issues that are waiting on an upstream fix) labels on Jun 23, 2020
@jessemortenson self-assigned this on Dec 19, 2023
@jessemortenson changed the title from "NM: Fix individual members' votes in House PDFs" to "NM: Update votes scraper (was: Fix individual members' votes in House PDFs)" on Dec 19, 2023
@jessemortenson

Adding context to this old issue:

  • The goal is to get a full dataset of Vote Events for NM in both House and Senate, including classification, motion text, the result of the vote (e.g. total number of votes, pass vs. fail), and the yes/no vote of each member who voted.
  • We have an NM Votes scraper, but it hasn't been run/tested in years as far as I know (as you can see by the age of this ticket). So the first task is to debug this and see if the existing code is helpful at all (and needs tweaking) or if it is way off the mark and we need to start fresh.
  • NM records Vote Events in PDF files, in the format of a big table, as you can see in this example. This is likely to be tricky to parse, as you can see with the history of this issue! Open to ideas on new approaches to this task.
  • NM displays links to these vote PDFs on the Bill details page on their website, for example see the votes button on this page
  • You can use the NM list of bills that passed to find good examples of bills that have votes
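
As one possible starting point for the table parsing discussed in this thread, the sketch below pairs each checkmark in a row with the nearest member name to its left and records the horizontal offset between them. It assumes cells are `(text, x, width)` tuples as in the dump earlier in this issue; the helper name `pair_votes` is hypothetical, and the mapping from offsets to vote types is deliberately left out because it would need to be checked against the column headers in the PDF:

```python
def pair_votes(cells):
    """Pair each 'X' cell with the closest name cell to its left.

    Returns sorted (name, offset) tuples, where offset is the checkmark's
    x coordinate minus the name's x coordinate.
    """
    names = [(x, text) for text, x, _width in cells if text != 'X']
    pairs = []
    for text, x, _width in cells:
        if text != 'X':
            continue
        left = [(name_x, name) for name_x, name in names if name_x < x]
        if not left:
            continue  # no name to the left, so the checkmark stays unattributed
        name_x, name = max(left)  # largest x among names to the left
        pairs.append((name, x - name_x))
    return sorted(pairs)


# One row copied from the dump earlier in this issue.
row = [('X', 229, 10), ('Brown', 53, 43), ('X', 816, 10), ('McCamley', 459, 71)]
pairs = pair_votes(row)
# pairs == [('Brown', 176), ('McCamley', 357)]
```

Working with offsets rather than absolute x values lets the left and right halves of the table share one set of column positions. In that dump, 176 is the offset in nearly every row, which is consistent with the printed total of 67 yeas, but that correspondence is an inference rather than something the PDF data confirms.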
