Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed FPDF.local_context() style leak during page breaks - Fix #1204 #1207

Merged
merged 2 commits into from
Jun 17, 2024

Conversation

Lucas-C
Copy link
Member

@Lucas-C Lucas-C commented Jun 17, 2024

Checklist:

  • The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • [NA] In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jun 17, 2024

This PR also includes a fix (in test/requirements.txt) to avoid this error when executing tests in GitHub Actions:

E   ImportError: numpy.core.multiarray failed to import
------------------------------- Captured stderr --------------------------------

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
...
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/cv2/__init__.py", line 153, in bootstrap
    native_module = importlib.import_module("cv2")
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
AttributeError: _ARRAY_API not found

@Lucas-C Lucas-C force-pushed the issue-1204 branch 2 times, most recently from 71f366f to 5239fea Compare June 17, 2024 07:59
@Lucas-C
Copy link
Member Author

Lucas-C commented Jun 17, 2024

If you want to review this @gmischler & @andersonhc, I'll be happy to have your feedbacks 🙂

Else I'll merge this next week.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jun 17, 2024

I also used this opportunity to now render HTML in .write_html() inside a .local_context(),
in order to better isolate HTML rendering & styling, and avoid it to "leak" to the outer context.
This has been done in the 2nd commit of this PR.

This change introduces a slight overhead of an extra q/Q block to all PDF references files based on .write_html().

I used this command to visually compare that all the reference files are still visually identical:

scripts/compare-changed-pdfs.py test/html/

@Lucas-C Lucas-C force-pushed the issue-1204 branch 3 times, most recently from 9752952 to 796ab2d Compare June 17, 2024 12:44
Copy link
Collaborator

@gmischler gmischler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky business, but it seems to work!

fpdf/fpdf.py Show resolved Hide resolved
fpdf/fpdf.py Outdated Show resolved Hide resolved
@Lucas-C
Copy link
Member Author

Lucas-C commented Jun 17, 2024

Thank you for the review @gmischler 👍 🙂

By the way, the code in HTML2FPDF regarding font style management, contains many kludges:

  • self.font_family somehow duplicates self.pdf.font_family
  • self.style_stack somehow duplicates the GraphicsStyle stack of FPDF
  • the default behaviour of HTML2FPDF.set_font(), with set_default=False, is to set .font_* attributes on the FPDF instance WITHOUT inserting any operators in the stream, and that totally breaks the "contract" that the .font_* attributes are set only when the stream is already configured according to those properties...
  • the paragraph-rendering / content-stream production is performed (by calling HTML2FPDF._end_paragraph()) only when starting to process the NEXT HTML tag, which sometimes leads to font-properties already being set on FPDF in the meantime, even if no content has been rendered for the previous tag yet...

Those last two points in particular makes debugging quite painful...

I tried to simplify some minor things in this PR, but we should try to improve things in this class whenever possible.

@Lucas-C Lucas-C merged commit 68cf594 into master Jun 17, 2024
13 checks passed
@Lucas-C Lucas-C deleted the issue-1204 branch June 17, 2024 13:33
@gmischler
Copy link
Collaborator

Those last two points in particular makes debugging quite painful...

Unfortunately I didn't look at your #1207 carefully enough, and didn't see that you put the local context handling on its head from how it worked before.
That may actually have broken more things than it fixed.

I don't currently have time to study this in more detail.
After so many people have tried to implement different (and sometimes conflicting) concepts in there, maybe it's really time for a cleanroom reimplementation of that module. Are you feeling up to it? 😉

As a general guideline: The HTML module is not allowed to write anything into the output stream itself. It delegates that to the text region and table modules. Nonetheless it must change the font settings in FPDF() (within its local context), so that the text fragments are equipped the correct settings. So yes, setting eg. fonts without writing any output is correct in this situation.

@Lucas-C
Copy link
Member Author

Lucas-C commented Jun 18, 2024

Unfortunately I didn't look at your #1207 carefully enough, and didn't see that you put the local context handling on its head from how it worked before. That may actually have broken more things than it fixed.

This is a bit vague, what do you mean exactly by "on its head" and "more things" ? 🙂

I don't currently have time to study this in more detail.
After so many people have tried to implement different (and sometimes conflicting) concepts in there, maybe it's really time for a cleanroom reimplementation of that module. Are you feeling up to it? 😉

OK, agreed.
No, I don't feel up for it right now, but I'll keep this option in mind!

@gmischler
Copy link
Collaborator

gmischler commented Jun 18, 2024

This is a bit vague, what do you mean exactly by "on its head" and "more things" ?

I had implemented the graphics stack handling within the module. It was necessary to drop down to the external stack level for the rendering calls. I think that was so that _render_styled_text_line() would compare against the original settings. This eliminates redundant state changes to the output (eg. setting the same font repeatedly).

You just wrapped everything in a local context (which is probably fine), without lowering the stack level for rendering (which is probably suboptimal).
It apparently works that way, but the resulting PDFs will be larger because they contain unnecessary duplicate settings changes.
"On its head" may be exaggerated, but you may want to check the difference in output between doing the rendering calls on the internal vs. external stack level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants