invalid or incomplete deflate data #88

Closed
FinPl opened this issue Apr 10, 2020 · 2 comments

FinPl commented Apr 10, 2020

Hello,
I get the error "invalid or incomplete deflate data" when I try to extract text from the first page of this document:

https://www.diffusion.transports.gouv.qc.ca/ords/pes/APEX_PES.P_PESB_DSI_AFFCH_RIG?P_VC_NUM_DOSSR=00007

I am rather new to PDF parsing, but as I understand it, the problem might lie in the compression used. Similar documents work perfectly fine.

This one works:

https://www.diffusion.transports.gouv.qc.ca/ords/pes/APEX_PES.P_PESB_DSI_AFFCH_RIG?P_VC_NUM_DOSSR=00003

Can you help me solve that issue?

@sambitdash
Owner

The Flate-compressed data of the content stream is corrupt, so the extraction cannot run to completion. Accepting partially corrupt data would let the file get through, but some of the extracted data may then be wrong because the Flate stream itself is damaged.
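
The message "invalid or incomplete deflate data" covers two different failure modes. PDF `FlateDecode` streams are zlib-wrapped deflate data, so a minimal Python sketch with the standard `zlib` module can tell them apart: bytes the decoder rejects outright ("invalid") versus bytes that decode cleanly but stop before the end-of-stream marker ("incomplete"). The function name `classify_flate_stream` is hypothetical; this is an illustration, not the project's own code.

```python
import zlib


def classify_flate_stream(data: bytes) -> str:
    """Return 'ok', 'incomplete' or 'invalid' for a zlib-wrapped deflate stream."""
    # Default wbits expects the zlib header and the Adler-32 trailer.
    d = zlib.decompressobj()
    try:
        # Without max_length, decompress() consumes all of the input.
        d.decompress(data)
    except zlib.error:
        # The decoder hit bytes that are not valid zlib/deflate data
        # (bad header, bad block, or checksum mismatch).
        return "invalid"
    # No error was raised: eof is True only if the end-of-stream marker and
    # trailer were reached; otherwise the data was cut short.
    return "ok" if d.eof else "incomplete"


if __name__ == "__main__":
    good = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET\n" * 50)
    print(classify_flate_stream(good))                    # ok
    print(classify_flate_stream(good[: len(good) // 2]))  # incomplete
    print(classify_flate_stream(b"\x00\x01garbage"))      # invalid
```

A truncated stream does not raise here because `zlib.decompressobj()` tolerates missing input; only the `eof` attribute reveals that the end was never reached.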

@sambitdash
Owner

Fixed in 208d064.

There can never be a perfect solution when the data is corrupt; whatever data can be recovered is recovered.
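
One simple way to recover "whatever data can be recovered" from a corrupt Flate stream is to keep the output decoded before the decoder fails. The Python sketch below illustrates that idea with the standard `zlib` module; it is not the change made in 208d064, and the function name `inflate_best_effort` and its `chunk_size` parameter are hypothetical. It feeds the stream to `zlib.decompressobj()` in small chunks and returns the bytes produced before the first `zlib.error`, together with a flag saying whether the stream was complete.

```python
import zlib


def inflate_best_effort(data: bytes, chunk_size: int = 64) -> tuple[bytes, bool]:
    """Inflate a zlib-wrapped deflate stream, keeping output decoded before any error.

    Returns (recovered, complete); complete is True only when the stream
    decoded cleanly up to its end-of-stream marker and checksum.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    d = zlib.decompressobj()
    out = []
    for start in range(0, len(data), chunk_size):
        try:
            out.append(d.decompress(data[start:start + chunk_size]))
        except zlib.error:
            # Corruption found: keep the output of the earlier chunks. Python
            # raises without returning what this call had decoded, so a
            # smaller chunk_size loses less output at the failure point.
            return b"".join(out), False
        if d.eof:
            # End of stream reached; any trailing bytes end up in d.unused_data.
            break
    return b"".join(out), d.eof


if __name__ == "__main__":
    payload = b"A" * 1000
    stream = zlib.compress(payload, 0)  # level 0: stored blocks, spans many chunks
    damaged = stream[:-1] + bytes([stream[-1] ^ 0xFF])  # corrupt the checksum
    print(inflate_best_effort(stream)[1])  # True
    recovered, complete = inflate_best_effort(damaged)
    print(complete, payload.startswith(recovered))  # False True
```

The sketch does not try to resynchronize after the corrupt point, so anything encoded after it is lost; the recovered prefix is exactly the kind of partial, possibly incomplete data the comment above warns about.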
