LayoutPDFReader._parse_pdf returns error when pdf contains empty pages #59

aleksvercau · 2024-03-07T13:09:22Z

I tried processing a PDF file with the LayoutPDFReader.read_pdf() method, but got a KeyError for response_json['return_dict']['result']['blocks']. The response contained no results because processing had failed. (On a side note: it would be nice to get a specific error in this case instead of a KeyError, one that clearly states that the file could not be processed and why.)
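To illustrate the kind of specific error meant here, below is a minimal sketch of a check that could sit between the HTTP response and the block parsing. The names `PDFParseError` and `extract_blocks` are hypothetical and not part of the library; the only thing taken from the report above is the key path `['return_dict']['result']['blocks']`. What a failed response actually contains is not known here, so the sketch just echoes a truncated copy of it.

```python
import json


class PDFParseError(Exception):
    """Hypothetical error type: the parser response carries no blocks."""


def extract_blocks(response_json):
    """Return response_json['return_dict']['result']['blocks'].

    Raises PDFParseError (instead of a bare KeyError) naming the first
    missing key and including a truncated copy of the response.
    """
    node = response_json
    for key in ("return_dict", "result", "blocks"):
        if not isinstance(node, dict) or key not in node:
            # default=str keeps json.dumps from failing on odd value types
            snippet = json.dumps(response_json, default=str)[:300]
            raise PDFParseError(
                f"File could not be processed: parser response has no {key!r} "
                f"entry. Response (truncated): {snippet}"
            )
        node = node[key]
    return node
```

A caller could then catch `PDFParseError` and report the failed file, rather than having to guess what a KeyError on 'result' means.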

I split my PDF into pages and processed each page separately to understand what the issue was. It turns out that the error occurred every time an empty page was processed. I am not sure whether this happens for empty pages in all types of PDFs or just some (there are small differences between text PDFs depending on how they were created). It only occurred on one of the PDFs I was processing, but that was also the only PDF with empty pages...

Better: do not fail processing of a whole document if it has one empty page, but simply skip that page.
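Until something like that exists in the library, a caller-side workaround along the lines of the per-page experiment above might look like the sketch below. It assumes the document has already been split into per-page inputs and that a failing page surfaces as a KeyError, as reported in this issue. `parse_pages_skipping_failures` and its `parse_page` argument are hypothetical names; `parse_page` stands in for whatever call processes a single page.

```python
def parse_pages_skipping_failures(pages, parse_page):
    """Parse each page independently and skip pages that fail.

    pages      -- iterable of per-page inputs (e.g. paths to one-page PDFs)
    parse_page -- callable that parses one page; assumed to raise KeyError
                  when the response contains no result, as seen in this issue

    Returns (parsed, skipped): parsed is a list of (page_number, result)
    tuples, skipped is a list of the 1-based page numbers that failed.
    """
    parsed, skipped = [], []
    for number, page in enumerate(pages, start=1):
        try:
            parsed.append((number, parse_page(page)))
        except KeyError:
            # Record the page instead of aborting the whole document.
            skipped.append(number)
    return parsed, skipped
```

Returning the skipped page numbers keeps the failure visible to the caller, so an empty page is not silently confused with a page that failed for another reason.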

jaavedd9 · 2024-03-25T16:10:48Z

I am facing this issue too.

mgrabmayr · 2024-04-23T22:54:18Z

Me too. Any intelligent fixes so far?

madhuprakash19 · 2024-07-04T04:38:49Z

I am facing the same issue

beko-dt · 2024-07-12T08:04:12Z

I also have this issue.
