Acrobat cannot display transformed PDFs with a decimal precision > 19 #1376
Comments
You can use `decimal.getcontext().prec` to change the default precision:
import decimal
decimal.getcontext().prec = 19
So I think not hardcoding the precision in PyPDF2 is the correct behavior. This is not a bug, but it should be mentioned in PyPDF2's documentation.
Although that changes the precision for arithmetic operations such as rounding, it unfortunately does not affect string formatting. So I propose a separate context to manage formatting settings for PyPDF.
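A minimal sketch of the difference, using only the standard `decimal` module (a scoped `localcontext` stands in for setting `decimal.getcontext().prec` globally): formatting an existing `Decimal` keeps all of its digits, while arithmetic such as unary plus rounds to the context precision.

```python
from decimal import Decimal, localcontext

value = Decimal("1.00000000000000000001")  # 20 digits after the point

with localcontext() as ctx:
    ctx.prec = 19  # same effect as decimal.getcontext().prec = 19, but scoped
    # Formatting without an explicit precision ignores ctx.prec entirely.
    formatted = f"{value:f}"
    # Arithmetic (here unary plus) rounds to 19 significant digits.
    rounded = f"{+value:f}"

print(formatted)  # 1.00000000000000000001
print(rounded)    # 1.000000000000000000
```

So the context only helps if every number passes through an arithmetic operation before being written, which is why a dedicated formatting setting was suggested.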
@lutts Sure, it's definitely not a bug, but Acrobat is the de facto standard viewer on Windows, so I think it would be good to somehow make it transparent that transformations could cause display problems in Acrobat.
This definitely seems like a bug to me, in that I think everyone expects the output to be viewable in Acrobat (the de facto standard, as mentioned above). Would it be possible to default to a precision above 5 and below 20 (say, 19) and then make a higher precision configurable if desired?
Note that this affects decimals written in various places. I've encountered this when setting boxes (there, I worked around it using …)
It may also be worth mentioning (mainly for googlability) that when trying to do anything with that document in Acrobat, it shows "There was a problem reading this document (14)". As to characterizing the tolerated precision, I've conducted some experiments: it seems that Acrobat tolerates 19 digits after the decimal point. This is not what Decimal's …
To make the original tests I've done verifiable:
(This is all consistent with my previous statement of "it's the digits after the dot, not the mantissa length"). However, I've done one more test:
This indicates some mixed scheme in which 19 places after the dot are tolerated, but small numbers use something more float-like. It should be noted, though, that there appears to be a minimal page size (1.1 mm), so maybe that test is not ideal. Therefore, I'm going with a test closer to my original use case, which also covers a more comprehensive set of what works and what doesn't:
import PyPDF2
from PyPDF2.generic import NameObject as N, ArrayObject as A, FloatObject as F
from decimal import Decimal
# Legacy PyPDF2 API, as used at the time of this issue.
pdf = PyPDF2.PdfFileWriter()
p = pdf.addBlankPage(30, 30)
# Each assignment replaces the previous /ArtBox, so keep one line active per run.
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("10.0000000000000000001")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("1.0000000000000000001")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("1.00000000000000000001")))) # broken
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("0.10000000000000000001")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("0.100000000000000000012")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("0.10000000000000000001234")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("0.100000000000000000012347890123456")))) # works
p[N('/ArtBox')] = A((F(0), F(0), F(1), F(Decimal("0.100000000000000000012347890123456789999999999999999999999999999999999999")))) # works
with open("test.pdf", "wb") as of:
    pdf.write(of)
I'd summarize this as "19 digits after the dot always work; if it's zero before the dot, there is no practical limit".
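The summarized rule can be written down as a predicate. This is a sketch only: `acrobat_tolerates` and `MAX_DECIMALS_WITH_INTEGER_PART` are hypothetical names, and the rule itself is the empirical observation from the script above for one Acrobat build, not something the PDF specification states.

```python
from decimal import Decimal

# Hypothetical constant and helper encoding the empirical rule above; this is
# an observation about the tested Acrobat build, not a PDF requirement.
MAX_DECIMALS_WITH_INTEGER_PART = 19


def acrobat_tolerates(literal: str) -> bool:
    """Predict whether Acrobat accepts a plain decimal literal such as '1.5'."""
    if abs(Decimal(literal)) < 1:
        # Zero before the point: no practical limit was observed.
        return True
    fraction = literal.partition(".")[2]
    return len(fraction) <= MAX_DECIMALS_WITH_INTEGER_PART


# Cases built to mirror the script: 19 vs. 20 digits after the point.
nineteen = "1." + "0" * 18 + "1"
twenty = "1." + "0" * 19 + "1"
small = "0.1" + "0" * 30 + "1"

print(acrobat_tolerates(nineteen), acrobat_tolerates(twenty), acrobat_tolerates(small))
```

The literals are built programmatically so the digit counts are unambiguous; the printed line is `True False True`.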
I've done some digging in the specs (Adobe® Portable Document Format Version 1.7 is what I read):
All I would propose is to replace …
Your opinions?
LGTM except for the `elif self>=7:` part being inconsistent (do we need that middle case at all? log10 isn't that costly).
As this is not a PDF but an Acrobat detail, I'd recommend leaving a note to that effect (for the benefit of later devs who might be tempted to refactor here).
Oops, my mistake: it should read …
It'd probably be … I've spotted two more small bugs (it should be …). Proposed version, shown on a minimal `Decimal` subclass `D` (assumed here to stand in for `FloatObject`) so that the doctests run:
import decimal


class D(decimal.Decimal):
    def __repr__(self):
        """Represent the number in decimal format with up to 19 decimal digits,
        or with the full available precision when the number's integral part
        is zero.

        Reducing precision accommodates Adobe Acrobat (which fails to load
        files containing more precise numbers).

        >>> D("10")
        10
        >>> D("10.0000000000000000000001")
        10
        >>> D("10.0000000000100000000001")
        10.00000000001
        >>> D("0.0000000000100000000001")
        0.0000000000100000000001
        >>> D("100000000000000000000.0000000000100000000001")
        100000000000000000000.00000000001
        """
        if abs(self) >= 1:
            # Round to 19 places, then drop trailing zeros and a bare point.
            return f"{self:.19f}".rstrip("0").rstrip(".")
        # Integral part is zero: Acrobat tolerated arbitrary precision here.
        return f"{self:f}"
This could be interesting. About …
Is sprinkling in more magic numbers really an ideal solution to this problem? This feels too clever. My sense is that it will lead to a whack-a-mole situation that never quite covers every edge case. It also makes the code more difficult to understand. I propose merging #1499 but making the …
16 is not a magic number: it corresponds to the number of decimal digits resolved by the 52-bit mantissa of a double (53 bits of precision including the implicit leading bit), which is the standard implementation nowadays. This should also be compatible with the float implementation, which should allow us to move from …
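A quick sanity check of where 16 comes from, assuming an IEEE 754 platform (which is what CPython uses almost everywhere): 53 bits of precision resolve about 53 · log10(2) ≈ 15.95 decimal digits. Python exposes the relevant constants in `sys.float_info`.

```python
import math
import sys

# Bits of precision in a double: 52 stored mantissa bits plus the implicit
# leading bit.
mantissa_bits = sys.float_info.mant_dig

# Decimal digits those bits can resolve: 53 * log10(2).
decimal_digits = mantissa_bits * math.log10(2)

print(mantissa_bits, round(decimal_digits, 2))  # 53 15.95
# Digits guaranteed to survive a str -> float -> str round trip.
print(sys.float_info.dig)  # 15
```

So 16 is the rounded-up digit capacity of a double, while `sys.float_info.dig` reports the more conservative 15.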
@pubpub-zz I was referring to the overall thread, not specifically your previous comment. Also, #1499 does not alter the DecimalContext. It defines a new context that is specific to PyPDF, which would not impact other programs.
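For illustration, a PyPDF-specific context could look like the sketch below. This assumes #1499 uses a `decimal.Context` object; `PDF_CONTEXT` and the 19-digit setting are hypothetical, not taken from that PR. Rounding goes through the dedicated context's `create_decimal()`, so the global context used by other code is left as it was.

```python
import decimal

# Hypothetical PyPDF-specific context; name and precision are illustrative.
PDF_CONTEXT = decimal.Context(prec=19, rounding=decimal.ROUND_HALF_EVEN)

global_prec_before = decimal.getcontext().prec

# create_decimal() rounds to PDF_CONTEXT's precision...
value = PDF_CONTEXT.create_decimal("1.00000000000000000001")

# ...without touching the thread's global context.
global_prec_after = decimal.getcontext().prec

print(value)  # 1.000000000000000000
print(global_prec_before == global_prec_after)  # True
```

Note that `prec` counts significant digits, not digits after the point, so such a context alone would not enforce the 19-decimal limit observed earlier in the thread; it would still need a formatting rule on top.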
Explanation
Since PyPDF2 version 2.10.9, floats are represented using their intrinsic precision instead of reducing the precision to 5 decimal places.
Acrobat Reader seems to have a limitation in displaying PDFs with a decimal precision > 19. When you apply `page.scale_by()` to a PDF page using a non-integer value, Acrobat (22.002.20212) displays the transformed page as an empty square. @programmarchy has already proposed a solution in #1267.
Environment
Code + PDF
input.pdf
output.pdf [intrinsic precision]
If you change the precision in `pypdf.generic._base.FloatObject` from … to …, Acrobat displays the resulting PDF correctly, while `.20f` cannot be displayed anymore.
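To illustrate how a transformation can exceed the limit, here is a minimal sketch that assumes the scale factor is computed with `Decimal` arithmetic under Python's default 28-digit context. The issue does not say how PyPDF2 computes it, and the 1/3 factor and the 612 coordinate are arbitrary examples. The result carries 25 digits after the point; a 19-digit spec (assumed here, since the exact replacement is elided above) and the `.20f` spec cap it at 19 and 20 digits respectively.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28  # Python's default decimal precision
    # Hypothetical non-integer scale factor of 1/3, computed with Decimal.
    factor = Decimal(1) / Decimal(3)
    coordinate = Decimal(612) * factor


def digits_after_point(text: str) -> int:
    """Count the characters after the decimal point in a plain literal."""
    return len(text.partition(".")[2])


intrinsic = f"{coordinate:f}"    # keeps every digit the arithmetic produced
fixed_19 = f"{coordinate:.19f}"  # at most 19 digits after the point
fixed_20 = f"{coordinate:.20f}"  # the variant Acrobat rejected

for text in (intrinsic, fixed_19, fixed_20):
    print(digits_after_point(text), text)  # 25, 19 and 20 digits
```

Even though the value is numerically 204, the intrinsic representation keeps 25 trailing zeros, which by the rule discussed in the thread is enough for Acrobat to reject it.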