BUG: Format floats using their intrinsic decimal precision #1267

programmarchy · 2022-08-24T21:22:53Z

Since FloatObject is represented as a decimal, format numbers using their intrinsic precision, instead of reducing the precision to 5 decimal places.

This fixes rendering issues for PDFs that contain coordinates, transformations, etc. with real numbers containing more than 5 decimal places of precision. For example, PDFs exported from Microsoft PowerPoint contain numbers with up to 11 decimal places.
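To make the difference concrete, here is a minimal sketch using only the standard library. The helper names `format_fixed` and `format_intrinsic` are hypothetical and are not PyPDF2 functions; the fixed 5-place formatting is an assumed stand-in for the behavior described above, not the library's actual code.

```python
from decimal import Decimal


def format_fixed(value: Decimal, places: int = 5) -> str:
    """Round to a fixed number of decimal places (assumed old behavior)."""
    return f"{value:.{places}f}"


def format_intrinsic(value: Decimal) -> str:
    """Keep every digit the Decimal carries, in plain (non-scientific) notation."""
    return format(value, "f")


# A coordinate with 11 decimal places, like those described for PowerPoint exports.
coordinate = Decimal("0.12345678901")
print(format_fixed(coordinate))      # 0.12346
print(format_intrinsic(coordinate))  # 0.12345678901
```

The fixed-width variant silently drops six digits of the coordinate, while the intrinsic variant writes back exactly what was parsed.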

Fixes: #1266

codecov · 2022-08-25T03:09:14Z

Codecov Report

Base: 94.63% // Head: 94.63% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (6b7f7ef) compared to base (3cf80bf).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1267   +/-   ##
=======================================
  Coverage   94.63%   94.63%           
=======================================
  Files          30       30           
  Lines        5140     5141    +1     
  Branches     1058     1058           
=======================================
+ Hits         4864     4865    +1     
  Misses        164      164           
  Partials      112      112

Impacted Files	Coverage Δ
PyPDF2/generic/_base.py	`100.00% <ø> (ø)`
PyPDF2/_writer.py	`91.06% <100.00%> (+0.01%)`	⬆️

programmarchy · 2022-09-05T17:07:00Z

I rebased this PR so the tests are passing, and I believe all the changes requested in the last review have been addressed.

Could you take another look, @MasterOdin?

MartinThoma

It looks good from my side - thank you for writing a unit test 🤗

MartinThoma · 2022-09-14T04:13:42Z

@MasterOdin You were a lot more involved in this PR than I was. What do you think?

MasterOdin · 2022-09-15T15:35:36Z

Looks good. Thanks for the work here @programmarchy! 👍

MartinThoma · 2022-09-18T09:46:54Z

Thank you both for all the work you put into it 🙏

I've just merged the PR and I will release it today to PyPI :-)

MartinThoma · 2022-09-18T09:47:12Z

@programmarchy If you want, I can add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

New Features (ENH):
- Add rotation property and transfer_rotate_to_content (#1348)

Performance Improvements (PI):
- Avoid string concatenation with large embedded base64-encoded images (#1350)

Bug Fixes (BUG):
- Format floats using their intrinsic decimal precision (#1267)

Robustness (ROB):
- Fix merge_page for pages without resources (#1349)

Full Changelog: 2.10.8...2.10.9

programmarchy · 2022-09-19T19:26:56Z

> @programmarchy If you want, I can add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

That would be very cool, thank you @MartinThoma!

mrknwk · 2022-09-23T22:45:55Z

When you use page.scale_by() with a non-integer value in 2.10.9, Acrobat (22.002.20212) displays the transformed PDF pages as empty square drawing areas. Maybe one could leave the default precision at less than 20 digits (this is the Acrobat tipping point, I guess) for compatibility reasons and make it configurable?
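For illustration only: one plausible way such long numbers can arise (an assumption, not something confirmed in this thread) is a `Decimal` built from a binary float, which carries far more than 20 digits. The sketch below shows that effect and a hypothetical helper, `cap_decimal_places`, that limits output to 19 decimal places. Reading the "20 digits" limit as decimal places rather than total digits is also an assumption.

```python
from decimal import Decimal, localcontext

# A Decimal built from a binary float keeps the float's exact expansion.
factor = Decimal(0.1)
print(factor)  # 0.1000000000000000055511151231257827021181583404541015625
print(len(factor.as_tuple().digits))  # number of significant digits carried


def cap_decimal_places(value: Decimal, max_places: int = 19) -> str:
    """Hypothetical helper: round only when there are more than max_places decimals."""
    exponent = value.as_tuple().exponent
    # Special values (NaN, Infinity) have a non-integer exponent and are left alone.
    if isinstance(exponent, int) and -exponent > max_places:
        with localcontext() as ctx:
            # Roomy working precision so quantize does not raise on long inputs.
            ctx.prec = 100
            value = value.quantize(Decimal(1).scaleb(-max_places))
    return format(value, "f")


print(cap_decimal_places(factor))          # at most 19 decimal places
print(cap_decimal_places(Decimal("0.5")))  # 0.5
```

Values that already fit within the cap pass through unchanged, so short coordinates keep their intrinsic precision.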

programmarchy · 2022-09-24T02:05:16Z

Interesting, would you mind sharing how you came to find out 20 digits is the tipping point for Acrobat, @mrknwk?

One way I was thinking of to make this configurable would be to adopt context variables, as the standard library does with decimal.Context. The context provides sane defaults and a central point for changing behavior.

It would allow us to write something like:

import PyPDF2
from PyPDF2 import PdfReader, PdfWriter
from PyPDF2.context import Context, StripExtraTrailingZeros, QuantizeInteger  # proposed module

ctx = Context()
ctx.max_prec = 5  # specify maximum precision
ctx.flags = [
    StripExtraTrailingZeros,
    QuantizeInteger,
]  # could also specify additional format flags
PyPDF2.setcontext(ctx)

reader = PdfReader("./path/to/file.pdf")
reader.pages[0].scale_by(0.5)
writer = PdfWriter()
writer.add_page(reader.pages[0])
...

Or like this:

with PyPDF2.localcontext() as ctx:
    ctx.max_prec = 5  # specify maximum precision
    ctx.flags = [
        StripExtraTrailingZeros,
        QuantizeInteger,
    ]  # could also specify additional format flags
    ...

Or maybe this:

PyPDF2.setcontext(AdobeAcrobatContext)

Might make sense to open a separate issue to discuss further.
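As a rough sketch of how such a context could be backed by `contextvars`, the standalone example below mirrors the `setcontext` / `localcontext` shape proposed above. Every name in it (`FormatContext`, `max_prec`, `format_number`, and the module-level functions) is hypothetical and does not exist in PyPDF2; the format flags from the proposal are left out to keep it short.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, replace
from decimal import Decimal
from typing import Iterator, Optional


@dataclass
class FormatContext:
    """Hypothetical settings object; not part of PyPDF2."""

    max_prec: Optional[int] = None  # None keeps the intrinsic precision


# The active context; the default applies no precision limit.
_current = contextvars.ContextVar("format_context", default=FormatContext())


def getcontext() -> FormatContext:
    return _current.get()


def setcontext(ctx: FormatContext) -> None:
    _current.set(ctx)


@contextmanager
def localcontext() -> Iterator[FormatContext]:
    """Activate a copy of the current context, then restore the previous one."""
    ctx = replace(getcontext())  # copy, so edits do not leak outward
    token = _current.set(ctx)
    try:
        yield ctx
    finally:
        _current.reset(token)


def format_number(value: Decimal) -> str:
    """Format a number, rounding only if the active context sets max_prec."""
    max_prec = getcontext().max_prec
    exponent = value.as_tuple().exponent
    if max_prec is not None and isinstance(exponent, int) and -exponent > max_prec:
        value = value.quantize(Decimal(1).scaleb(-max_prec))
    return format(value, "f")


x = Decimal("0.123456789")
with localcontext() as ctx:
    ctx.max_prec = 5
    print(format_number(x))  # 0.12346
print(format_number(x))      # 0.123456789
```

Using a `ContextVar` rather than a plain module global means the setting is scoped per thread and per asyncio task, which is the same reason `decimal` keeps its context that way.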

mrknwk · 2022-09-24T07:45:52Z

@programmarchy It really was just trial and error. 😊

But 20 digits is also the limit that one of the maintainers of PDF Arranger found in a test. He contacted Adobe about it and apparently it is an Acrobat "implementation level limitation". So maybe the third option would be a nice way to go then.

programmarchy · 2022-09-26T14:33:32Z

@mrknwk I'd be happy to take a stab at implementing the above. Could you please create a GitHub issue with a corresponding sample PDF, and tag me?

programmarchy force-pushed the decimal-precision branch 2 times, most recently from e73cb59 to d7d447c Compare August 25, 2022 03:02

programmarchy changed the title ~~Format floats using the intrinsic decimal precision~~ Format floats using their intrinsic decimal precision Aug 25, 2022

MasterOdin reviewed Aug 25, 2022

tests/test_writer.py (outdated)

MasterOdin reviewed Aug 25, 2022

PyPDF2/generic/_base.py (outdated)

programmarchy marked this pull request as draft August 29, 2022 14:00

programmarchy requested a review from MasterOdin August 29, 2022 15:46

programmarchy marked this pull request as ready for review August 29, 2022 15:46

programmarchy force-pushed the decimal-precision branch 6 times, most recently from 5611919 to a71d15b Compare August 31, 2022 13:56

programmarchy added 2 commits September 5, 2022 11:38

00f0ed7: Format floats using the intrinsic decimal precision instead of reducing to 5 decimal places. Explicitly format floats in outline color test so they can be compared

9766c75: Use Decimal.quantize to specify precision rather than adding a precision property to FloatObject

programmarchy force-pushed the decimal-precision branch from 2cfe102 to 9766c75 Compare September 5, 2022 16:52

Merge branch 'main' into decimal-precision

6b7f7ef

MartinThoma approved these changes Sep 14, 2022

MasterOdin approved these changes Sep 15, 2022

MartinThoma changed the title ~~Format floats using their intrinsic decimal precision~~ BUG: Format floats using their intrinsic decimal precision Sep 18, 2022

MartinThoma merged commit 5aeb926 into py-pdf:main Sep 18, 2022

MartinThoma added the is-bug label ("From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF") Sep 18, 2022

mrknwk mentioned this pull request Sep 28, 2022

Acrobat cannot display transformed PDFs with a decimal precision > 19 #1376

Closed

joshhendo mentioned this pull request Dec 13, 2022

BUG: 1376 Acrobat Scale #1499

Closed
