Consider using PyPDF2 strict=False mode #54

Closed
moaxey opened this issue Feb 3, 2019 · 8 comments

Comments

@moaxey

moaxey commented Feb 3, 2019

This error occurs for me in the PPA version and when built from source on Python 3, when saving a one-page PDF which has cropping applied (10% to top and bottom).

image

@jeromerobert
Member

Could you give the URL of that PPA and share the PDF file with which you get this error?

@moaxey
Author

moaxey commented Feb 4, 2019

@dreua
Member

dreua commented Feb 4, 2019

If I try to save the PDF without cropping (i.e. no change at all) I get a popup saying "Can't read object stream: Stream has ended unexpectedly" while the console outputs: PdfReadWarning: Invalid stream (index 13) within object 35 0: Stream has ended unexpectedly [pdf.py:1573]

@dreua
Member

dreua commented Feb 4, 2019

Workaround: open the PDF in Evince (other PDF viewers may work as well) and print it to a file. That file works in pdfarranger without problems.
I assume that either the original file is broken or not fully standards-compliant in some way, or it uses features or a newer PDF standard that PyPDF2 does not support. It could also just be a bug in PyPDF2.

@jeromerobert
Member

This could be that PyPDF2 bug: py-pdf/pypdf#99. We could try the strict=False option.
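
A minimal sketch of what trying the option could look like, assuming the PyPDF2 1.x API in which PdfFileReader accepts a strict keyword. The helper name open_lenient is hypothetical and is not part of pdfarranger or PyPDF2; the reader class is passed in as an argument so the snippet does not depend on PyPDF2 being installed.

```python
def open_lenient(path, reader_cls):
    """Open `path` with a PyPDF2-style reader class in non-strict mode.

    `reader_cls` is assumed to accept (stream, strict=...) the way
    PyPDF2.PdfFileReader does. This helper is illustrative only.
    """
    # strict=False asks the reader to tolerate correctable problems
    # instead of treating them as fatal.
    return reader_cls(path, strict=False)


# Possible usage with PyPDF2 1.x (not executed here):
#   from PyPDF2 import PdfFileReader
#   reader = open_lenient("input.pdf", PdfFileReader)
#   print(reader.getNumPages())
```

Whether non-strict mode actually avoids the error for a given file still has to be checked against that file.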

dreua added a commit to dreua/pdfarranger that referenced this issue Feb 21, 2019
@dreua
Member

dreua commented Feb 21, 2019

strict=False works for this file; I am just not sure whether we would miss errors if it were set to False by default. Maybe errors should be kept in a log and displayed during or after export. What do you think? Would it be worth the effort?

Comments from PyPDF2/pdf.py PdfFileReader:

:param bool strict: Determines whether user should be warned of all
    problems and also causes some correctable problems to be fatal.
    Defaults to ``True``.
:param warndest: Destination for logging warnings (defaults to
    ``sys.stderr``).
:param bool overwriteWarnings: Determines whether to override Python's
    ``warnings.py`` module with a custom implementation (defaults to
    ``True``).
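
One way the "keep errors in a log" idea might be sketched is with the standard library's warnings.catch_warnings, which can record warnings raised while a function runs. This is only an illustration under assumptions: export_collecting_warnings and the export callable are hypothetical names, not pdfarranger code. With PyPDF2, the reader would presumably have to be created with overwriteWarnings=False, since per the docstring above the default replaces Python's warning handling with a custom implementation.

```python
import warnings


def export_collecting_warnings(export, *args, **kwargs):
    """Run `export` and return (result, list of warning messages).

    `export` is a stand-in for whatever function writes the PDF.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeated ones.
        warnings.simplefilter("always")
        result = export(*args, **kwargs)
    messages = [
        "{}: {}".format(w.category.__name__, w.message) for w in caught
    ]
    return result, messages
```

The collected messages could then be shown in a dialog after export. If the export raises an exception, this sketch loses the warnings recorded so far.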

On second thought, I wonder if we should move away from PyPDF2 altogether and try PyMuPDF instead.

@jeromerobert
Member

On second thought, I wonder if we should move away from PyPDF2 altogether and try PyMuPDF instead.

Adding MuPDF as a new (non-Python) dependency is not a decision to be taken lightly. PyMuPDF is also SWIG-based and is not available in either Debian or Ubuntu. PyPDF2 is pure Python, so it is easier to manage.

@dreua
Member

dreua commented Feb 24, 2019

Yes, of course. I was thinking about that too but didn't communicate it, sorry.
I just got the impression that PyPDF2 is not that well maintained, and these issues with PDFs causing errors keep popping up. Both PyMuPDF and MuPDF seem to be very stable, fast and well maintained and therefore looked very promising to me, but I absolutely see your point. (PyMuPDF is available in Fedora but I didn't check Debian or Ubuntu, thanks for that input.)

Looking in another direction, I found out that PyPDF3 is a thing (don't go there, it's dead), and then I seriously had to laugh when the issue there directed me to PyPDF4. After resisting the temptation to create PyPDF5, I had a closer look: while the repository owner "claird" is not very active, there is another contributor, "newnone", who has done a lot of work and refactoring. (I guess he works for "claird".) I think it's worth keeping an eye on this; maybe it can replace PyPDF2 one day.

(Maybe someone can change the title since cropping is not the issue here.)

@jeromerobert changed the title from "Error saving cropped pdf" to "Consider using PyPDF2 strict=False mode" on Mar 17, 2019