Crash when trying to extract table it can't quite find #216

stucka · 2020-05-27T14:32:57Z

pdfplumber crashes when extract_table runs on a page with a table it can't quite... find? But extract_tables (plural) returns an empty list in the same case. Should extract_table return None or an empty list?
The May 18 file from this page works fine; the May 25 file crashes:
http://ldh.la.gov/index.cfm/page/3965

      1 for pagenumber, page in enumerate(pdf.pages):
----> 2     table = page.extract_table()

c:\python37\lib\site-packages\pdfplumber\page.py in extract_table(self, table_settings)
    177         # Return the largest table, as measured by number of cells.
    178         sorter = lambda x: (-len(x.cells), x.bbox[1], x.bbox[0])
--> 179         largest = list(sorted(tables, key=sorter))[0]
    180         return largest.extract()
    181 

IndexError: list index out of range


jsvine · 2020-05-28T02:09:43Z

Thanks for flagging, @stucka! That's an oversight on my part, and changing the behavior sounds like a good idea.

jsvine · 2020-05-28T02:26:37Z

Fixed and now available in v0.5.21. Thanks again!

stucka · 2020-06-02T18:48:50Z

Thank you! Weirdly, I'm still getting that error, and I can't figure out why from your code. It's now happening on two of the three versions of the same report; only the first one worked. Latest:
http://ldh.la.gov/assets/oph/Coronavirus/NursingHomes/NHReport053120.pdf
Download page: http://ldh.la.gov/index.cfm/page/3965

The 5/25 file crashed on the penultimate version of pdfplumber. Here's the traceback for the 5/31 file:

IndexError                                Traceback (most recent call last)
in
      4 masterlist = []
      5 for page in pdf.pages:
----> 6     table = page.extract_table()
      7     for row in table:
      8         line = OrderedDict()

c:\python37\lib\site-packages\pdfplumber\page.py in extract_table(self, table_settings)
    177
    178         if len(tables) == 0:
--> 179             return None
    180
    181         # Return the largest table, as measured by number of cells.

IndexError: list index out of range
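Once the fix is active, extract_table returns None on a page with no detectable table, so a loop shaped like the one in this traceback would move the failure one line down: for row in table: raises TypeError ('NoneType' object is not iterable). A minimal defensive sketch follows. collect_rows, collect_rows_plural and FakePage are hypothetical names, and the only pdfplumber behaviour assumed is that page.extract_table() may return None and page.extract_tables() returns a (possibly empty) list of tables.

```python
from typing import Iterable, List, Optional

Row = List[Optional[str]]


def collect_rows(pages: Iterable) -> List[Row]:
    """Gather every row of the largest table on each page, skipping table-less pages."""
    masterlist: List[Row] = []
    for page in pages:
        table = page.extract_table()
        if table is None:  # v0.5.21+: no table detected on this page
            continue
        for row in table:
            masterlist.append(row)  # placeholder for the original per-row processing
    return masterlist


def collect_rows_plural(pages: Iterable) -> List[Row]:
    """Same idea via extract_tables(), which returns [] rather than None.

    Note: this walks every table on the page, not just the largest one.
    """
    masterlist: List[Row] = []
    for page in pages:
        for table in page.extract_tables():
            masterlist.extend(table)
    return masterlist


class FakePage:
    """Hypothetical stand-in for a pdfplumber page, for running this sketch offline."""

    def __init__(self, tables: List[List[Row]]):
        self._tables = tables

    def extract_tables(self) -> List[List[Row]]:
        return list(self._tables)

    def extract_table(self) -> Optional[List[Row]]:
        # Real pdfplumber picks the largest table; the first one is enough here.
        return self._tables[0] if self._tables else None


pages = [FakePage([[["a", "b"], ["1", "2"]]]), FakePage([])]
print(collect_rows(pages))  # [['a', 'b'], ['1', '2']]
# With real files (assumes pdfplumber is installed):
#   with pdfplumber.open("NHReport053120.pdf") as pdf:
#       rows = collect_rows(pdf.pages)
```

The plural form avoids the None check entirely, at the cost of picking up every table on the page instead of only the largest.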

jsvine · 2020-07-18T14:55:21Z

Just a note to say that, in my tests, I'm not getting this error. Judging by the traceback in your most recent comment, I wonder whether it was a temporary environment issue, since the IndexError doesn't match the code the traceback shows. (I.e., return None shouldn't ever produce an IndexError, but perhaps I'm misreading.) In any case, if this issue persists for you, feel free to reopen this thread or start a new one. Thanks again for the initial bug report!
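One explanation that fits both tracebacks (an assumption; nothing in the thread confirms it): Python keeps no copy of the source it imported, so when a traceback is printed, the displayed lines are read from the .py file on disk at that moment. If pdfplumber was upgraded while a session (the bare "in" line suggests IPython or Jupyter) still had the old page.py loaded, the old code would still raise at its line 179, list(sorted(...))[0], while the traceback prints whatever line 179 now says in the new file: return None. Both tracebacks in this thread do point at line 179. The sketch below reproduces the effect with a throwaway module; stale_demo, pick_largest and stale_traceback are hypothetical names, and IPython's own traceback display is assumed, not verified, to read source the same way. If this is the cause, restarting the kernel after upgrading should clear it.

```python
import importlib
import sys
import tempfile
import traceback
from pathlib import Path

# Hypothetical "old" module: no guard, so an empty list raises IndexError on line 3.
OLD_SRC = """\
def pick_largest(tables):
    # old version, no guard
    return sorted(tables)[0]
"""

# Hypothetical "upgraded" file on disk: line 3 is now `return None`.
NEW_SRC = """\
def pick_largest(tables):
    if len(tables) == 0:
        return None
    return sorted(tables)[0]
"""


def stale_traceback() -> str:
    """Run old in-memory code after its source file has been replaced on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "stale_demo.py"
        path.write_text(OLD_SRC)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("stale_demo")  # old code now in memory
            path.write_text(NEW_SRC)  # simulate upgrading the package on disk
            try:
                module.pick_largest([])
            except IndexError:
                # Source lines are read from disk when the traceback is formatted.
                return traceback.format_exc()
            return ""
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("stale_demo", None)


print(stale_traceback())
```

The printed traceback pairs "IndexError: list index out of range" with the line "return None", the same mismatch seen above.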

jsvine added the bug label May 28, 2020

jsvine closed this as completed in d64afa8 May 28, 2020
