Document solution to "character out of range during base 85 decode" #228
Comments
It's not a FAQ because no one's ever asked the question before :P That most likely means that the PDF contains an error in some base85-encoded stream. The workaround you found forces decompression of streams and will likely lead to a large file. If you've seen it multiple times, chances are you are dealing with PDF software that consistently produces this error. You should probably do … If you're able to share the file, I may be able to see if there's a way to improve the standard behavior. |
Using that save approach yielded: … I think I know why this is happening, but I don't want to say anything that might reveal the nature of what we are doing in public. |
I have made a request about this, but I don't think it will happen soon, if at all. One other thing: … |
The line 801 error occurs because you called .save() without a destination filename, not because of any other issue. |
Yes, but when you are streaming out to an io.BytesIO datastore, there is no need for a destination filename, just the output stream. This error only occurs with the third call:

pdf_out = io.BytesIO()
pdf_merger = Pdf.new()
pdf_merger.save(pdf_out)  # works 99% of the time
pdf_merger.save(pdf_out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False)  # works the other 1% of the time
pdf_merger.save(pdf_out, recompress_flate=True)  # fails with the error mentioned above |
Please try testing … I did write … |
The character out of range error is reproducible with code such as:

import io
import pikepdf
from pikepdf import Pdf

pdf_merger = Pdf.new()
pdf_merger.add_blank_page()
pdf_merger.pages[0].Contents = pdf_merger.make_stream(
    b'\xba\xad',
    Filter=pikepdf.Name.ASCII85Decode
)
out = io.BytesIO()
pdf_merger.save(out)

i.e. putting some garbage in an ASCII85 stream and saving it. Now that we have a reproducer, testing the methods explored so far gives:
My suggestion would be to iterate over all objects, look for ASCII85 streams, and try to figure out what is producing them and whether you can remove or repair the invalid streams. That would look something like this:

for obj in pdf.objects:
    if isinstance(obj, pikepdf.Stream) and pikepdf.Name.Filter in obj:
        if pikepdf.Name.ASCII85Decode in obj.Filter.wrap_in_array().as_list():
            try:
                obj.read_bytes()  # try decoding the stream
            except RuntimeError:
                print(f"{obj.objgen} has an invalid ASCII85 stream")

At this point I don't see a viable general solution to this issue - the application (the user of pikepdf) needs to decide what to do with invalid input data. If, for example, this particular ASCII85 error is easily detectable and repairable, then we certainly could implement some kind of solution. |
I did. And it fails just as … I'm happy with using … But I do appreciate your suggestions on how to debug this. One day I might use that suggestion, but for the time being I think we can close this ticket. |
In our experience, the resulting files are about as often a bit smaller as a bit larger, roughly 50% each way. They are not significantly larger. |
There is no need for … |
Closing since this is superseded by #240
It would be nice if a FAQ documented that when merging PDFs, the default call to .save() can potentially lead to this exception. The solution is to call .save() with stream_decode_level=StreamDecodeLevel.none, compress_streams=False.