Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Bubbu0129 · 2022-12-21T09:51:14Z

Fixes #632

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
A mention of the change is present in CHANGELOG.md

The unit test test_img_inside_html_table_without_explicit_dimensions in test_html.py isn't passed. The outcome and the original pdf looks identical, but the bytes are different somehow. Can you check it for me?

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

Lucas-C · 2022-12-21T12:27:04Z

Some memory checks are currently failing in the CI pipeline.
See #609 (comment)
You can safely bump the memory limits as part of this PR in order for those checks to pass

fpdf/fpdf.py

Bubbu0129 · 2022-12-22T02:25:16Z

Ok, I can fix it. Here's another problem.
The memory usage plunge is due to image() method returning the decoded bytes of the image file. The purpose of this function is rendering the image onto the pdf, so why return the info dictionary? I greped the whole project and there is no file using the return value of image(). Removing the return value also passed all the tests.
If the return value is somehow necessary, a flag can be implemented in the arguments of image() function. If flag == True, then only return rendered_width and rendered_height. Otherwise, return info.

Lucas-C · 2022-12-22T08:11:36Z

The memory usage plunge is due to image() method returning the decoded bytes of the image file. The purpose of this function is rendering the image onto the pdf, so why return the info dictionary? I greped the whole project and there is no file using the return value of image(). Removing the return value also passed all the tests.
If the return value is somehow necessary, a flag can be implemented in the arguments of image() function. If flag == True, then only return rendered_width and rendered_height. Otherwise, return info.

I'm 100% OK with removing the decoded bytes from the info dict returned by FPDF.image(), with no need for an extra flag to control this behaviour (as it was never properly documented/tested), but only if this does indeed reduce the memory usage. Otherwise I don't see a need for this change.

What makes you think this behaviour of FPDF.image() as an impact on memory usage?
I am very curious and interested by this lead 😊

Bubbu0129 · 2022-12-26T10:34:53Z

I first tried reserving info in the return value of self.image(), and as you can see, many tests failed due to memory overflow.

FAILED test/test_perfs.py::test_intense_image_rendering - Failed: Timeout >40.0s
FAILED test/image/test_load_image.py::test_share_images_cache - memunit.MemoryUsageException: Max usage 155.39Mb >= 140Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_simple - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_twice - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscaled_and_highres - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_highres_and_downscaled - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_biggest_1st - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_biggest_2nd - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb

In the original code, the return value of self.image() was never actually used. But in the case of issue #632, html.py need to obtain the rendered_width and rendered_height, so the return value is passed via the buffer. However, info["data"] contains the bytes of the image, plunging the memory usage.
I tried omitting info, and the tests got pass without memory issues.
I haven't understood the code thoroughly. Is info compulsory?

Lucas-C · 2022-12-26T11:34:04Z

Hi and Merry Christmas @Bubbu0129 😄

The performance tests heavily depend on your computer caracteristics:
I calibrated the thresholds so that they pass in my own development environment and in GitHub Actions pipelines,
but they may fail on a computer with a slightly less performant CPU / RAM / HDD.

In the original code, the return value of self.image() was never actually used. But in the case of issue #632, html.py need to obtain the rendered_width and rendered_height, so the return value is passed via the buffer. However, info["data"] contains the bytes of the image, plunging the memory usage.

OK, I think I understand your analysis, thank you for explaining.
I'm not sure about this though...
IMHO your change shouldn't have any impact, as the GC should similarly release the memory used by the image buffer once it has exited HTML2FPDF.handle_starttag.

I bet that even when returning the info dict from FPDF.image(), the performance tests will pass in the GitHub Actions pipeline.

FYI, I'm investigating fpdf2 spurious memory usage in this issue: #641

I haven't understood the code thoroughly. Is info compulsory?

What do you mean exactly?
The thing is that FPDF.image() used to return an info directory for years now,
so even if it was not properly documented, fpdf2 users code may rely on it,
and we should, insofar as possible, try not to break their code whenever they upgrade to the latest fpdf2 version.

Currently their is no part of fpdf2 that rely on this info dict returned by FPDF.image() though.
If it proves to really be a performance bottleneck, I would be OK to break backward compatibility here.

Could you please rebase your PR on this repo master branch and solve the minor merge conflict in CHANGELOG.md ?

Bubbu0129 · 2022-12-27T00:14:52Z

OK, I see.
I add the info dict back and bumped the memory limits.
Merry Christmas!

codecov · 2022-12-27T00:33:21Z

Codecov Report

Base: 93.96% // Head: 93.96% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (8cdedda) compared to base (a961d74).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 8cdedda differs from pull request most recent head 7a8a2dc. Consider uploading reports for the commit 7a8a2dc to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #634   +/-   ##
=======================================
  Coverage   93.96%   93.96%           
=======================================
  Files          25       25           
  Lines        6457     6460    +3     
  Branches     1156     1156           
=======================================
+ Hits         6067     6070    +3     
  Misses        223      223           
  Partials      167      167

Impacted Files	Coverage Δ
fpdf/fpdf.py	`91.46% <100.00%> (+<0.01%)`	⬆️
fpdf/html.py	`94.74% <100.00%> (+0.02%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

CHANGELOG.md

test/test_perfs.py

Lucas-C · 2022-12-27T01:03:17Z

Merged! Thank you 😄

Bubbu0129 requested a review from Lucas-C as a code owner December 21, 2022 09:51

Lucas-C requested changes Dec 21, 2022

View reviewed changes

fpdf/fpdf.py Outdated Show resolved Hide resolved

Lucas-C reviewed Dec 27, 2022

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Lucas-C reviewed Dec 27, 2022

View reviewed changes

test/test_perfs.py Outdated Show resolved Hide resolved

Bubbu0129 added 4 commits December 27, 2022 01:52

Fix issue 632

953cab7

Add unit test for issue 632

6baff6f

Clarify variable names

8ef7167

Bumped memory limits in tests

7a8a2dc

Lucas-C merged commit 64d8482 into py-pdf:master Dec 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Bubbu0129 commented Dec 21, 2022

Lucas-C commented Dec 21, 2022

Bubbu0129 commented Dec 22, 2022

Lucas-C commented Dec 22, 2022

Bubbu0129 commented Dec 26, 2022

Lucas-C commented Dec 26, 2022 •

edited

Loading

Bubbu0129 commented Dec 27, 2022

codecov bot commented Dec 27, 2022 •

edited

Loading

Lucas-C commented Dec 27, 2022

Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Conversation

Bubbu0129 commented Dec 21, 2022

Lucas-C commented Dec 21, 2022

Bubbu0129 commented Dec 22, 2022

Lucas-C commented Dec 22, 2022

Bubbu0129 commented Dec 26, 2022

Lucas-C commented Dec 26, 2022 • edited Loading

Bubbu0129 commented Dec 27, 2022

codecov bot commented Dec 27, 2022 • edited Loading

Codecov Report

Lucas-C commented Dec 27, 2022

Lucas-C commented Dec 26, 2022 •

edited

Loading

codecov bot commented Dec 27, 2022 •

edited

Loading