Acrobat: An error exists on this page. (with multiple SVG imports) #960

Closed
gmischler opened this issue Oct 15, 2023 · 13 comments · Fixed by #1145
Labels
bug, compression, research needed (too complicated to implement without careful study of official specifications)

Comments

@gmischler
Collaborator

While I was implementing "image paragraphs" for text regions, Acrobat Reader suddenly started complaining about my test file:
[image: screenshot of the Acrobat Reader error message]
Of course they want you to buy their other software to create PDFs, so the message is deliberately unhelpful.

Error details

I could boil it down to sections containing imported SVG data. Strangely, it takes a certain amount of data before the error triggers. With the SVG logo, it takes either three of them on one page, or two plus a bunch of text (at least those are the combinations I found).
None of the other viewers and validators that I have easy access to indicate any errors.

Processing the file with qpdf and "--normalize-content=y" (or "--qdf") fixes the problem. But I was unable to glean any useful information from a comparison.
I've seen reports that Adobe Preflight gives useful and detailed error reports. So if anyone has that available, it might lead us somewhere.
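
For anyone who wants to repeat the qpdf step, here is a minimal sketch of how it could be scripted. The helper names and the output file name are my own assumptions; only the `--normalize-content=y` option comes from the description above. The wrapper does nothing if no `qpdf` executable is found on the PATH.

```python
import shutil
import subprocess


def qpdf_normalize_command(src, dst):
    # --normalize-content=y asks qpdf to rewrite the content streams
    # in a normalized form (the step that made the Acrobat error go away).
    return ["qpdf", "--normalize-content=y", src, dst]


def normalize_with_qpdf(src, dst):
    # Assumption: qpdf may not be installed, so check before calling it.
    if shutil.which("qpdf") is None:
        return False
    result = subprocess.run(qpdf_normalize_command(src, dst))
    # qpdf exits with 0 on success and 3 on success with warnings.
    return result.returncode in (0, 3)


# "acro-svg-normalized.pdf" is a hypothetical output name.
print(" ".join(qpdf_normalize_command("acro-svg.pdf", "acro-svg-normalized.pdf")))
```

Replacing `--normalize-content=y` with `--qdf` in the command list gives the other variant mentioned above.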

Minimal code

from fpdf import FPDF

img_file = "fpdf2/test/svg/svg_sources/SVG_logo.svg"
pdf = FPDF()
pdf.add_page()
pdf.image(img_file, w=30, h=30)
pdf.image(img_file, w=30, h=30)
pdf.image(img_file, w=30, h=30)
pdf.output("acro-svg.pdf")

(for some reason, github doesn't want me to include PDF files here...)

Environment

  • fpdf2 version used: current HEAD
@gmischler gmischler added the bug label Oct 15, 2023
@Lucas-C
Member

Lucas-C commented Oct 17, 2023

Thank you for the detailed report @gmischler!

I made some tests this morning:

  • I found 3 SVG files triggering the error with Adobe Acrobat Reader:
    • test/svg/svg_sources/cubic02.svg 5.1KB - KO
    • test/svg/svg_sources/SVG_logo.svg 4.2KB - KO
    • test/svg/svg_sources/arcs02.svg 2.3KB - KO
  • 5 other files do not cause any issue, and this proves that it's not fully related to the SVG file sizes:
    • test/svg/svg_sources/Ghostscript_escher.svg 297KB - OK
    • test/svg/svg_sources/Ghostscript_colorcircle.svg 139KB - OK
    • test/svg/svg_sources/cubic01.svg 1.8KB - OK
    • test/svg/svg_sources/quad01.svg 1.1KB - OK
    • test/svg/svg_sources/arcs01.svg 889B - OK
  • as you already mentioned, the problem does not appear if the SVG file is inserted only twice (I tested this for all 3 KO SVG files)
  • the problem was already present with fpdf2 2.7.0-2.7.5
  • pdf.compress = False makes the problem disappear!

@gmischler
Collaborator Author

It's probably not something in the SVG data itself, but in how it interacts with compression. Adding the same SVG several times causes a lot of repetition in the content stream (the copies end up identical except for the placement/scaling transform), resulting in a very high compression ratio. Apparently we're not handling that situation in exactly the way that Acrobat Reader expects.

I've found that some other software sometimes adds a "Length1" value to content streams. By the specs this is only meant (and mandatory) for compressed font data, where it gives the uncompressed size of the data. I experimented with adding that to the content stream of my example file, but didn't see any change in behaviour. Given that it is off-spec, that isn't really a surprise, but it was worth a shot.

Acrobat reader seems to issue (or not) those warnings depending on arbitrary criteria (including the Windows version, according to some reports). So it may well be that there's something in our use of compression it generally doesn't like, but only complains about when the compression rate is particularly high.
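
The effect of repetition on compression can be illustrated with the standard library alone. This is only a sketch: `fake_svg_content` is a hypothetical stand-in for the drawing operators of one imported SVG, not what fpdf2 actually emits. It compares one copy against three copies that differ only in their placement transform.

```python
import zlib


def fake_svg_content(offset_x):
    # Hypothetical content: a placement transform followed by path
    # operators that are identical in every copy.
    lines = [f"q 1 0 0 1 {offset_x} 0 cm"]
    for i in range(200):
        lines.append(
            f"{i * 7.31:.2f} {i * 3.17:.2f} m {i * 1.93:.2f} {i * 5.41:.2f} l S"
        )
    lines.append("Q")
    return "\n".join(lines).encode("latin-1")


def ratio_percent(data):
    # Compressed size as a percentage of the uncompressed size,
    # so a lower value means stronger compression.
    return 100 * len(zlib.compress(data)) / len(data)


one_copy = fake_svg_content(10)
three_copies = b"\n".join(fake_svg_content(x) for x in (10, 50, 90))

print(f"1 copy:   {ratio_percent(one_copy):.2f}%")
print(f"3 copies: {ratio_percent(three_copies):.2f}%")
```

The second and third copies can be encoded almost entirely as back-references to the first one, which is why the percentage drops sharply once the same drawing is placed several times on a page.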

@Lucas-C Lucas-C added research needed too complicated to implement without careful study of official specifications compression labels Oct 30, 2023
@Lucas-C
Member

Lucas-C commented Oct 30, 2023

In fpdf2, PDF pages are compressed using /FlateDecode implemented with zlib.compress():
https://github.com/py-pdf/fpdf2/blob/2.7.6/fpdf/syntax.py#L200

Have you tried displaying zlib.ZLIB_VERSION & zlib.ZLIB_RUNTIME_VERSION? Maybe this issue could be related to the version of the underlying zlib library used?

I'd be curious to know if this problem happens with other PDF readers...
Adobe Acrobat Reader being closed-source, it won't be easy to figure out what the root problem is...
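
For reference, a short snippet to print the two version strings mentioned above. Both constants are part of the standard `zlib` module; the values depend on the Python build, so no expected output is shown.

```python
import zlib

# Version of zlib that the interpreter was compiled against.
print(f"{zlib.ZLIB_VERSION=}")
# Version of zlib that is actually loaded at runtime (may differ).
print(f"{zlib.ZLIB_RUNTIME_VERSION=}")
```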

@Lucas-C
Member

Lucas-C commented Oct 30, 2023

I have been digging a little deeper into the resulting zlib compressed streams, but could not find much...

import zlib
from fpdf import FPDF
from pypdf import PdfReader

for svg_file in ("test/svg/svg_sources/arcs01.svg", "test/svg/svg_sources/arcs02.svg"):
  print(svg_file)

  pdf = FPDF()
  pdf.add_page()
  pdf.image(svg_file, w=30, h=30)
  pdf.image(svg_file, w=30, h=30)
  pdf.image(svg_file, w=30, h=30)
  pdf.output("issue_960.pdf")

  reader = PdfReader("issue_960.pdf")
  compressed_stream = reader.pages[0]["/Contents"]._data

  # cf. https://www.rfc-editor.org/rfc/rfc1950
  cmf, flg = compressed_stream[0], compressed_stream[1]
  print(f"* cmf=0x{cmf:X} flg=0x{flg:X}")  # 0x78 0x9C => zlib: Default Compression

  decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS)
  decompressed_data = decompressor.decompress(compressed_stream)
  print(f"* length of decompressed data: {len(decompressed_data)} bytes")
  print(f"* compression ratio: {100*len(compressed_stream)/len(decompressed_data):.2f}%")
  print(f"* end of the compressed data stream reached? {decompressor.eof=}")
  print(f"* {decompressor.unconsumed_tail=}")
  print(f"* {decompressor.unused_data=}")
  print()

Output:

test/svg/svg_sources/arcs01.svg
* cmf=0x78 flg=0x9C
* length of decompressed data: 2585 bytes
* compression ratio: 17.45%
* end of the compressed data stream reached? decompressor.eof=True
* decompressor.unconsumed_tail=b''
* decompressor.unused_data=b''

test/svg/svg_sources/arcs02.svg
* cmf=0x78 flg=0x9C
* length of decompressed data: 7808 bytes
* compression ratio: 4.85%
* end of the compressed data stream reached? decompressor.eof=True
* decompressor.unconsumed_tail=b''
* decompressor.unused_data=b''
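
The two header bytes printed above can be decoded field by field according to RFC 1950. The helper below is a small sketch (its name and the dictionary keys are my own choices); it shows that `0x78 0x9C` is a plain deflate stream with a valid header check and no preset dictionary.

```python
import zlib


def describe_zlib_header(stream):
    # RFC 1950: the first byte is CMF, the second is FLG.
    cmf, flg = stream[0], stream[1]
    return {
        "compression_method": cmf & 0x0F,  # CM, 8 means "deflate"
        "window_bits": (cmf >> 4) + 8,  # CINFO + 8 = log2(window size)
        "header_check_ok": (cmf * 256 + flg) % 31 == 0,  # FCHECK
        "preset_dictionary": bool(flg & 0x20),  # FDICT
        "compression_level_hint": flg >> 6,  # FLEVEL, 0 to 3
    }


info = describe_zlib_header(zlib.compress(b"0 0 m 10 10 l S\n" * 50))
for key, value in info.items():
    print(f"{key}: {value}")
```

Nothing in these fields distinguishes the "OK" files from the "KO" files, since both report the same two bytes.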

The compression ratio of the smallest "problematic" SVG file (test/svg/svg_sources/arcs02.svg) is lower than test/svg/svg_sources/arcs01.svg which does not cause any problem, so it's not simply a matter of this ratio being "too high".

You are right @gmischler, this problem really seems correlated with a high level of compression (the ratio below is compressed size / decompressed size, so a lower percentage means stronger compression):

Compression ratio for test/svg/svg_sources/Ghostscript_escher.svg (OK): 29.12%
Compression ratio for test/svg/svg_sources/Ghostscript_colorcircle.svg (OK): 33.11%
Compression ratio for test/svg/svg_sources/cubic01.svg (OK): 9.62%
Compression ratio for test/svg/svg_sources/quad01.svg (OK): 11.95%
Compression ratio for test/svg/svg_sources/arcs01.svg (OK): 17.45%

Compression ratio for test/svg/svg_sources/cubic02.svg (KO): 7.66%
Compression ratio for test/svg/svg_sources/SVG_logo.svg (KO): 6.07%
Compression ratio for test/svg/svg_sources/arcs02.svg (KO): 4.85%

@Lucas-C
Member

Lucas-C commented Oct 30, 2023

I suspect that Adobe Acrobat Reader's decompression function is implemented a bit like this, for "safety" reasons:

import zlib

def acrobat_decompress(compressed_data, growth_max=12):
    max_length = len(compressed_data) * growth_max
    decompressor = zlib.decompressobj()
    decompressed_data = decompressor.decompress(compressed_data, max_length=max_length)
    if not decompressor.eof:
        raise RuntimeError(f"Uncompressed content is at least {growth_max} times bigger than compressed data")
    return decompressed_data

Of course, len(compressed_data) * 12 is just a guess, who knows what the actual implementation sets as the limit...
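
To see how such a guard would behave, here is a self-contained sketch that restates the guessed function and feeds it two payloads. The payloads are made up for illustration, and the factor of 12 remains a pure guess, not Acrobat's actual behaviour.

```python
import hashlib
import zlib


def acrobat_decompress(compressed_data, growth_max=12):
    # Hypothetical guard: refuse to expand beyond growth_max times the input size.
    max_length = len(compressed_data) * growth_max
    decompressor = zlib.decompressobj()
    decompressed_data = decompressor.decompress(compressed_data, max_length=max_length)
    if not decompressor.eof:
        raise RuntimeError(
            f"Uncompressed content is at least {growth_max} times bigger than compressed data"
        )
    return decompressed_data


# Highly repetitive drawing operators: compresses extremely well.
repetitive = zlib.compress(b"0 0 m 10 10 l S\n" * 1000)
# 1024 bytes of hash output: practically incompressible.
noisy = zlib.compress(b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32)))

for name, payload in (("repetitive", repetitive), ("noisy", noisy)):
    try:
        acrobat_decompress(payload)
        print(f"{name}: accepted")
    except RuntimeError as error:
        print(f"{name}: rejected ({error})")
```

With this guess, only the repetitive payload is rejected, which would match the observation that repeated SVG imports trigger the message.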

@Lucas-C
Member

Lucas-C commented Oct 30, 2023

I made some extra tests with several source SVG files:

  • with a compression ratio of 9.26%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.28%, Adobe Acrobat Reader did not produce any error message
  • with a compression ratio of 9.30%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.35%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.42%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.46%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.48%, Adobe Acrobat Reader did not produce any error message
  • with a compression ratio of 9.51%, Adobe Acrobat Reader did not produce any error message
  • with a compression ratio of 9.54%, Adobe Acrobat Reader produced the error message
  • with a compression ratio of 9.55%, Adobe Acrobat Reader did not produce any error message
  • with a compression ratio of 9.80%, Adobe Acrobat Reader did not produce any error message

So it's not just a maximum ratio that is taken into consideration by Acrobat...
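
That conclusion can be checked mechanically from the measurements listed above. In this small sketch, a single cut-off would only be possible if every ratio that produced the error were lower than every ratio that did not.

```python
# (compression ratio in %, did Acrobat Reader show the error?)
observations = [
    (9.26, True), (9.28, False), (9.30, True), (9.35, True),
    (9.42, True), (9.46, True), (9.48, False), (9.51, False),
    (9.54, True), (9.55, False), (9.80, False),
]

highest_ratio_with_error = max(ratio for ratio, error in observations if error)
lowest_ratio_without_error = min(ratio for ratio, error in observations if not error)

# True only if one threshold could separate the two groups.
single_threshold_possible = highest_ratio_with_error < lowest_ratio_without_error

print(f"highest ratio with error:   {highest_ratio_with_error}%")
print(f"lowest ratio without error: {lowest_ratio_without_error}%")
print(f"single threshold possible:  {single_threshold_possible}")
```

The two groups overlap between 9.28% and 9.54%, so no single ratio threshold explains these results.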

@Lucas-C
Member

Lucas-C commented Oct 30, 2023

Maybe fpdf2 should produce a warning when a content stream is compressed with a compression ratio lower than 10%?
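
A rough sketch of what such a check could look like, using only the standard library. The function name, the message, and the default threshold are hypothetical; this is not existing fpdf2 code.

```python
import warnings
import zlib


def compress_with_ratio_warning(data, min_ratio_percent=10.0):
    # Hypothetical helper: compress, then warn if the result is
    # smaller than min_ratio_percent of the original size.
    compressed = zlib.compress(data)
    ratio = 100 * len(compressed) / len(data) if data else 100.0
    if ratio < min_ratio_percent:
        warnings.warn(
            f"content stream compressed to {ratio:.2f}% of its original size; "
            "Acrobat Reader may report an error on this page",
            UserWarning,
        )
    return compressed


# Made-up, highly repetitive content that falls below the threshold.
compressed = compress_with_ratio_warning(b"0 0 m 10 10 l S\n" * 1000)
print(f"{len(compressed)} bytes after compression")
```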

@gmischler
Collaborator Author

Zlib comes with Python. My 3.10 installation uses 1.2.11, but I doubt that this makes any difference in the output.

A warning from fpdf2 seems a bit pointless as long as we don't know what the problem is. What is the user supposed to do with it?

Do all the affected files contain SVG data? I've tried to reproduce the error with other repetitive content subject to high compression, with no success. So it could still be some subtlety in the graphics commands, which Acrobat only complains about under certain arbitrary circumstances.

It would really be helpful if someone with Acrobat Pro could run those files through the preflight function. If the problem is real (and not just a viewer bug), that would give us the information straight from the horse's mouth.

@GerardoAllende

GerardoAllende commented Apr 9, 2024

When I use Acrobat, I get the same error when printing a PDF. The only requirement is that there is a "path" in the code.

Minimal test code:

from fpdf import FPDF

pdf = FPDF()
pdf.add_page()

with pdf.new_path() as path:
    path.move_to(1, 1)
    path.line_to(9, 9)
    path.close()

pdf.output("test.pdf")

Then print test.pdf using Acrobat reader. The error should appear right after printing.
test.pdf

The problem persists when pdf.compress = False

@Lucas-C
Member

Lucas-C commented Apr 9, 2024

> When I use Acrobat, I get the same error when printing a PDF. The only requirement is that there is a "path" in the code.

I think this is a different problem, so I moved your comment into a dedicated issue 🙂

@GerardoAllende

GerardoAllende commented Apr 11, 2024

Different problem, same workaround -> #1144 also fixes this one.
Just comment out these lines: https://github.com/py-pdf/fpdf2/edit/master/fpdf/drawing.py#L1448-L1454
Results:
acro-svg-workaround.pdf
acro-svg-err.pdf

gmischler pushed a commit that referenced this issue Apr 12, 2024
* fix bug causing a warning message in Acrobat (#960) (#1144)

* Correction in two PDFs
@AurelianTimu

I was having exactly the same issue when using SVGs. It was not 100% reproducible and happened only rarely.
I tried locally the fix in #1145 and so far in my testing I haven't seen the issue again.

Is there an ETA to land 2.7.9 on pypi?

@Lucas-C
Member

Lucas-C commented May 1, 2024

> Is there an ETA to land 2.7.9 on pypi?

If @gmischler & @andersonhc agree, I think we could perform a new release this month! 🙂
