Document solution to "character out of range during base 85 decode" #228

Closed
metaperl opened this issue Jul 26, 2021 · 10 comments

Comments

@metaperl

It would be nice if a FAQ documented that, when merging PDFs, the default call to .save() can lead to this exception.

The solution is to call .save() with stream_decode_level=StreamDecodeLevel.none, compress_streams=False.
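For example (a minimal sketch, assuming the merged document is a pikepdf.Pdf named pdf and the output goes to an in-memory buffer; the input filename is just a placeholder):

import io
from pikepdf import Pdf, StreamDecodeLevel

pdf = Pdf.open('merged.pdf')  # hypothetical merged input
out = io.BytesIO()
# Copy streams through without decoding or recompressing, so the bad ASCII85 data is never touched
pdf.save(out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False)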

@jbarlow83
Member

It's not a FAQ because no one's ever asked the question before :P

That most likely means that the PDF contains an error in some base 85-encoded stream. The workaround you found is forcing decompression of streams and will likely lead to a large file. If you've seen it multiple times, chances are you are dealing with PDF software that consistently produces this error.

You should probably do .save(recompress_flate=True) instead to force recompression. The settings you use will decompress everything and make very large files.
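Roughly (a sketch, using the same pdf and out names as elsewhere in this thread):

out = io.BytesIO()
# Recompress streams (including already-Flate-compressed ones) when saving, rather than decompressing everything
pdf.save(out, recompress_flate=True)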

If you're able to share the file I may be able to see if there's a way to improve the standard behavior.

@metaperl
Author

metaperl commented Jul 29, 2021

> You should probably do .save(recompress_flate=True) instead to force recompression. The settings you use will decompress everything and make very large files.

Using that save approach yielded AttributeError: 'pikepdf._qpdf.Pdf' object has no attribute '_original_filename' at line 801, which is: if not filename_or_stream and self._original_filename:

I think I know why this is happening, but I don't want to say anything that might reveal the nature of what we are doing in public.

> If you're able to share the file I may be able to see if there's a way to improve the standard behavior.

I have made a request about this, but I don't think it will happen soon, if at all.

One other thing: .save() yields a plain Python RuntimeError instead of a custom pikepdf exception. Do you think that is the desired behavior?
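For reference, catching it looks roughly like this (a sketch of the fallback, with pdf and out as above):

try:
    pdf.save(out)
except RuntimeError:
    # Discard any partial output, then retry without decoding or recompressing the streams
    out.seek(0)
    out.truncate()
    pdf.save(out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False)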

@jbarlow83
Member

The line 801 error occurs because you called .save() without a destination filename, not because of any other issue.
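In other words (a sketch; the second call is the failing pattern):

pdf = Pdf.new()
out = io.BytesIO()
pdf.save(out, recompress_flate=True)  # fine: a destination is given
pdf.save(recompress_flate=True)       # no destination: falls back to the original filename, which a Pdf.new() doesn't have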

@metaperl
Author

> The line 801 error occurs because you called .save() without a destination filename, not because of any other issue.

Yes, but when you are streaming out to an io.BytesIO buffer, there is no need for a destination filename, just the output stream. This error only occurs with the .save(recompress_flate=True) option, not with .save(pdf_out) or .save(pdf_out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False):

pdf_out = io.BytesIO()
pdf_merger = Pdf.new()
pdf_merger.save(pdf_out) # works 99% of the time
pdf_merger.save(pdf_out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False) # works the other 1% of the time
pdf_merger.save(pdf_out, recompress_flate=True) # fails with the error mentioned above

@jbarlow83
Member

Please try testing pdf_merger.save(pdf_out, recompress_flate=True). I don't believe you actually executed the code above, since 1) it works for me verbatim, and 2) the attribute lookup of _original_filename occurs on a conditional path that is only taken if the save destination is omitted.

I did write .save(recompress_flate=True) earlier, which was misleading and sent you in the wrong direction.

@jbarlow83
Member

jbarlow83 commented Jul 29, 2021

The character out of range error is reproducible with code such as:

import io
import pikepdf
from pikepdf import Pdf

pdf_merger = Pdf.new()
pdf_merger.add_blank_page()
# Attach a content stream whose data is not valid ASCII85
pdf_merger.pages[0].Contents = pdf_merger.make_stream(
    b'\xba\xad',
    Filter=pikepdf.Name.ASCII85Decode
)
out = io.BytesIO()
pdf_merger.save(out)  # raises RuntimeError: character out of range during base 85 decode

i.e. putting some garbage in an ASCII85 stream and saving it.

Now that we have a reproducer, testing the methods explored so far gives:

- recompress_flate=True: also raises the RuntimeError
- stream_decode_level=StreamDecodeLevel.none, compress_streams=False: saves, but the PDF is technically invalid (garbage in, garbage out)

My suggestion would be to iterate over all objects, look for ASCII85 streams, and try to figure out what is producing them and whether you can remove or repair the invalid streams. That would look something like this:

for obj in pdf.objects:
    if isinstance(obj, pikepdf.Stream) and pikepdf.Name.Filter in obj:
        if pikepdf.Name.ASCII85Decode in obj.Filter.wrap_in_array().as_list():
            try:
                obj.read_bytes()  # Try decoding
            except RuntimeError:
                print(f"{obj.objgen} has an invalid ascii85 stream")

At this point I don't see a viable general solution to this issue - the application (the user of pikepdf) needs to decide what to do with invalid input data. If, for example, this particular ASCII85 error is easily detectable and repairable, then we could certainly implement some kind of solution.
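If you did want to neutralize such a stream rather than just report it, one option would be something like this (a sketch only; whether blanking the stream is acceptable is an application-level decision):

for obj in pdf.objects:
    if isinstance(obj, pikepdf.Stream) and pikepdf.Name.Filter in obj:
        if pikepdf.Name.ASCII85Decode in obj.Filter.wrap_in_array().as_list():
            try:
                obj.read_bytes()
            except RuntimeError:
                # Replace the undecodable data with an empty, unfiltered stream so saving no longer trips the decoder
                obj.write(b'')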

@metaperl
Author

> Please try testing pdf_merger.save(pdf_out, recompress_flate=True)

I did, and it fails just as pdf_merger.save(pdf_out) does.

I'm happy with using pdf_merger.save(pdf_out, stream_decode_level=StreamDecodeLevel.none, compress_streams=False) on the rare occasions this does occur. The output file does open and can be read, even though there might be something wrong from a technical standpoint.

I do appreciate your suggestions on how to debug this, and one day I might use them. But for the time being I think we can close this ticket.

@metaperl
Author

metaperl commented Aug 9, 2021

> The settings you use will decompress everything and make very large files.

In our experience, the files come out a bit smaller about half the time and a bit larger about half the time; they are not significantly larger.

@metaperl reopened this Aug 9, 2021
@metaperl
Author

metaperl commented Aug 9, 2021

> The solution is to call .save() with stream_decode_level=StreamDecodeLevel.none, compress_streams=False.

There is no need for stream_decode_level=StreamDecodeLevel.none because the default value of this parameter is already None.

@jbarlow83
Copy link
Member

Closing since this is superseded by #240.
