Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LayoutPDFReader._parse_pdf returns error when pdf contains empty pages #59

Open
aleksvercau opened this issue Mar 7, 2024 · 4 comments

Comments

@aleksvercau
Copy link

I tried processing a pdf file using the LayoutPDFReader.read_pdf() method, but got a KeyError for response_json['return_dict']['result']['blocks'], since the response did not contain results, because there was an error (on a side node: would be nice to have a specific error in this case instead of a key error, clearly stating that the file could not be processed and the reason why).

I split my pdf in pages and processed each page separately to understand what the issue was. Turns out that the error existed every time an empty page was being processed. I am not sure whether this is the case for empty pages of all types of pdfs or just for some pdf types (there are small differences between text pdfs depending on how they were created). It only occurred on one of the pdfs I was processing, but it was also the only pdf with empty pages...

Better: do not fail processing of a whole document if it has one empty page, but simply skip that page.

@jaavedd9
Copy link

I am facing issue too

@mgrabmayr
Copy link

me too. any intelligent fixes so far?

@madhuprakash19
Copy link

I am facing the same issue

@beko-dt
Copy link

beko-dt commented Jul 12, 2024

i also have this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants