Way to skip PDFs that cause Syntax error #651

Open
wcedmisten opened this issue Jun 2, 2024 · 2 comments

Comments

@wcedmisten

Hello!

Not sure if there is any workaround for this currently, but I'm trying to bulk upload a set of ~70,000 PDFs using ia upload. The problem is that I periodically get the error:

Uploaded content is unacceptable. - Syntax error detected in pdf data. You may be able to repair the pdf file with a repair tool, pdftk is one such tool.

The command then exits with an error, and I have to manually delete the PDF and run the command again to resume uploading. Is there a way to automatically skip the PDFs that throw the error so that manual intervention is not required?

@jjjake
Owner

jjjake commented Jun 4, 2024

There is not currently a way to skip failed uploads and continue uploading other files specified in the command (I support this feature though, if anybody has time to add it).

I would suggest finding and filtering any invalid PDFs before uploading:

» find my_pdf_dir -type f | parallel 'pdfinfo -- {} >/dev/null 2>&1 || echo invalid pdf: {}'

@wcedmisten
Author

Thanks! For posterity: that command didn't output anything, even though pdfinfo on an individual bad file was outputting correctly. I ended up writing a non-parallelized version:

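# Use a glob instead of $(ls) so filenames containing spaces are not split.
for f in *; do
  # Pipe stderr into grep and discard stdout, then look for syntax errors.
  if pdfinfo "$f" 2>&1 >/dev/null | grep 'Syntax'; then
    echo "Error on $f"
  fi
done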
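
To avoid deleting bad files by hand, the same check could move them out of the upload directory instead of only printing their names. The following is a minimal sketch, not something from the `ia` tool itself: the function names `has_syntax_error` and `quarantine_bad_pdfs` and both directory arguments are hypothetical, and it assumes the files end in `.pdf`. Like the loop above, it matches the word "Syntax" in what pdfinfo writes to stderr. Whether that catches exactly the files the upload rejects is an assumption.

```shell
#!/bin/sh
# Sketch: move PDFs for which pdfinfo reports a syntax problem into a
# separate directory, so the remaining files can be uploaded in one run.
# All function and directory names here are hypothetical.

# Succeeds (exit 0) when pdfinfo writes a line containing "Syntax" to stderr.
has_syntax_error() {
  pdfinfo "$1" 2>&1 >/dev/null | grep -q 'Syntax'
}

# Usage: quarantine_bad_pdfs SOURCE_DIR QUARANTINE_DIR
quarantine_bad_pdfs() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in "$src"/*.pdf; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    if has_syntax_error "$f"; then
      echo "Error on $f" >&2
      mv -- "$f" "$dest"/
    fi
  done
}
```

Calling `quarantine_bad_pdfs my_pdf_dir rejected_pdfs` and then rerunning the original upload command on `my_pdf_dir` should leave only files that passed the check. The moved files stay available in `rejected_pdfs` for a later repair attempt.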
