Unable to start parsing PDF file #179

Open
SAanish opened this issue Jul 26, 2017 · 11 comments

@SAanish

SAanish commented Jul 26, 2017

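const hummus = require('hummus');
// createReader() is the call that fails with "Unable to start parsing PDF file"
const pdfReader = hummus.createReader(sourcePath);
const pageNumber = pdfReader.getPagesCount();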

@galkahana
Owner

Maybe the path is wrong? Maybe it's not a PDF?
This is fairly basic stuff.
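
Both possibilities can be checked before calling hummus. The following is a minimal sketch using only Node's `fs` module; `diagnosePdf` and its return values are hypothetical names introduced here, not part of hummus. It reports whether the file exists and whether it starts with the `%PDF-` marker.

```javascript
const fs = require('fs');

// Hypothetical helper (not a hummus API): classifies a file before parsing.
// Returns 'missing', 'not-a-pdf', 'leading-bytes', or 'ok'.
function diagnosePdf(sourcePath) {
  if (!fs.existsSync(sourcePath)) {
    return 'missing'; // the path is wrong
  }
  const data = fs.readFileSync(sourcePath);
  const offset = data.indexOf('%PDF-');
  if (offset === -1) {
    return 'not-a-pdf'; // no PDF header anywhere in the file
  }
  if (offset > 0) {
    return 'leading-bytes'; // something precedes the PDF header
  }
  return 'ok';
}
```

A result of `'ok'` only means the header is in place; the file can still fail to parse for other reasons.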

@Jackychans

Ran into the same issue with this PDF file. Please help.

The path is correct. The only potential issue is the PDF itself:
tempDoc.pdf

Looking forward to any advice.

@yogalink

Hello, I ran into the same error 👍

In my case it only occurred with PDF version 1.3; however, as Jackychans shows, it also happens with 1.7.

Same case here: the path and data are correct, and the error comes from hummus.createReader() on Node.js.

@galkahana
Owner

You'll need to send the PDF if you want it debugged.

@galkahana
Owner

@Jackychans tempDoc.pdf starts with a header that is not PDF. Remove everything up to (but not including) %PDF-1.7 and the file should parse fine.
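
The trimming described above can be sketched in a few lines of Node.js. `trimToPdfHeader` is a hypothetical helper name, not a hummus function, and the sketch assumes the extra bytes were prepended after the PDF was generated. If the file's cross-reference offsets were computed with those bytes included, trimming would shift them and the file might still not parse.

```javascript
// Hypothetical helper (not a hummus API): returns the part of the buffer
// that starts at the "%PDF-" marker, dropping anything before it.
function trimToPdfHeader(data) {
  const offset = data.indexOf('%PDF-');
  if (offset === -1) {
    throw new Error('No %PDF- header found');
  }
  // subarray() returns a view on the same memory; no bytes are copied.
  return offset === 0 ? data : data.subarray(offset);
}
```

The trimmed buffer can then be written to a new file with `fs.writeFileSync` and that path passed to `hummus.createReader()`.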

@Jackychans

Thanks @galkahana for the response, although it wasn't very fast, hehe.

I found the problem in the file's header myself just after posting the issue here.

Again, thanks.

@zerobytes

> @Jackychans tempDoc.pdf starts with a header that is not PDF. Remove everything up to (but not including) %PDF-1.7 and the file should parse fine.

You say the header is not PDF; however, any PDF reader will open the file normally. So I would assume the lib should either ignore the things it doesn't care about or strip them, as it is nearly impossible to predict what a file will contain that hummus doesn't want, considering the file works everywhere else.

Let's say I go to Google Docs and generate a file, and it comes with something extra in its header. It will open anywhere except in my program, because hummus somehow does not support it.

@SolidTears

Same here, I got this error with 4 different PDFs...

@untrustedlifeswanleap

untrustedlifeswanleap commented Sep 22, 2020

I have had this error with every PDF I've tested, and they all have properly formatted headers. I think something is wrong with the currently released version of hummus.

@FranklinThaker

Hummus rejects some PDFs because they don't conform to the PDF standard.
Check your PDF here -> https://www.pdfen.com/pdf-a-validator
We might have to convert the PDF to a standard-conforming one in the catch block if we receive the same parsing error from Hummus.
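
The catch-block idea could look roughly like the sketch below. `readWithFallback` is a hypothetical name, and both `parse` and `convert` are supplied by the caller: `parse` would typically wrap `hummus.createReader`, while `convert` stands for whatever conversion tool is available and is assumed to return the path of the converted file. Matching on the message text "Unable to start parsing PDF file" is also an assumption, taken from this issue's title.

```javascript
// Hypothetical helper (not a hummus API): tries to parse a PDF and, on the
// parsing error from this issue, converts the file once and retries.
function readWithFallback(sourcePath, parse, convert) {
  try {
    return parse(sourcePath);
  } catch (err) {
    const message = String(err && err.message);
    // Assumption: the error message matches this issue's title.
    if (!message.includes('Unable to start parsing PDF file')) {
      throw err; // unrelated failure, do not mask it
    }
    const convertedPath = convert(sourcePath); // caller-supplied converter
    return parse(convertedPath); // a second failure propagates to the caller
  }
}
```

With hummus, the `parse` argument might be `(p) => hummus.createReader(p)`.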
