Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #632: <img> without "height" attribute fails to leave enough blank space for itself #634

Merged
merged 4 commits into from
Dec 27, 2022

Conversation

Bubbu0129
Copy link

Fixes #632

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • A mention of the change is present in CHANGELOG.md

The unit test test_img_inside_html_table_without_explicit_dimensions in test_html.py isn't passed. The outcome and the original pdf looks identical, but the bytes are different somehow. Can you check it for me?

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@Lucas-C
Copy link
Member

Lucas-C commented Dec 21, 2022

Some memory checks are currently failing in the CI pipeline.
See #609 (comment)
You can safely bump the memory limits as part of this PR in order for those checks to pass

fpdf/fpdf.py Outdated Show resolved Hide resolved
@Bubbu0129
Copy link
Author

Ok, I can fix it. Here's another problem.
The memory usage plunge is due to image() method returning the decoded bytes of the image file. The purpose of this function is rendering the image onto the pdf, so why return the info dictionary? I greped the whole project and there is no file using the return value of image(). Removing the return value also passed all the tests.
If the return value is somehow necessary, a flag can be implemented in the arguments of image() function. If flag == True, then only return rendered_width and rendered_height. Otherwise, return info.

@Lucas-C
Copy link
Member

Lucas-C commented Dec 22, 2022

The memory usage plunge is due to image() method returning the decoded bytes of the image file. The purpose of this function is rendering the image onto the pdf, so why return the info dictionary? I greped the whole project and there is no file using the return value of image(). Removing the return value also passed all the tests.
If the return value is somehow necessary, a flag can be implemented in the arguments of image() function. If flag == True, then only return rendered_width and rendered_height. Otherwise, return info.

I'm 100% OK with removing the decoded bytes from the info dict returned by FPDF.image(), with no need for an extra flag to control this behaviour (as it was never properly documented/tested), but only if this does indeed reduce the memory usage. Otherwise I don't see a need for this change.

What makes you think this behaviour of FPDF.image() as an impact on memory usage?
I am very curious and interested by this lead 😊

@Bubbu0129
Copy link
Author

I first tried reserving info in the return value of self.image(), and as you can see, many tests failed due to memory overflow.

FAILED test/test_perfs.py::test_intense_image_rendering - Failed: Timeout >40.0s
FAILED test/image/test_load_image.py::test_share_images_cache - memunit.MemoryUsageException: Max usage 155.39Mb >= 140Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_simple - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_twice - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscaled_and_highres - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_highres_and_downscaled - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_biggest_1st - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb
FAILED test/image/test_oversized.py::test_oversized_images_downscale_biggest_2nd - memunit.MemoryUsageException: Max usage 155.39Mb >= 135Mb

In the original code, the return value of self.image() was never actually used. But in the case of issue #632, html.py need to obtain the rendered_width and rendered_height, so the return value is passed via the buffer. However, info["data"] contains the bytes of the image, plunging the memory usage.
I tried omitting info, and the tests got pass without memory issues.
I haven't understood the code thoroughly. Is info compulsory?

@Lucas-C
Copy link
Member

Lucas-C commented Dec 26, 2022

Hi and Merry Christmas @Bubbu0129 😄

The performance tests heavily depend on your computer caracteristics:
I calibrated the thresholds so that they pass in my own development environment and in GitHub Actions pipelines,
but they may fail on a computer with a slightly less performant CPU / RAM / HDD.

In the original code, the return value of self.image() was never actually used. But in the case of issue #632, html.py need to obtain the rendered_width and rendered_height, so the return value is passed via the buffer. However, info["data"] contains the bytes of the image, plunging the memory usage.

OK, I think I understand your analysis, thank you for explaining.
I'm not sure about this though...
IMHO your change shouldn't have any impact, as the GC should similarly release the memory used by the image buffer once it has exited HTML2FPDF.handle_starttag.

I bet that even when returning the info dict from FPDF.image(), the performance tests will pass in the GitHub Actions pipeline.

FYI, I'm investigating fpdf2 spurious memory usage in this issue: #641

I haven't understood the code thoroughly. Is info compulsory?

What do you mean exactly?
The thing is that FPDF.image() used to return an info directory for years now,
so even if it was not properly documented, fpdf2 users code may rely on it,
and we should, insofar as possible, try not to break their code whenever they upgrade to the latest fpdf2 version.

Currently their is no part of fpdf2 that rely on this info dict returned by FPDF.image() though.
If it proves to really be a performance bottleneck, I would be OK to break backward compatibility here.


Could you please rebase your PR on this repo master branch and solve the minor merge conflict in CHANGELOG.md ?

@Bubbu0129
Copy link
Author

OK, I see.
I add the info dict back and bumped the memory limits.
Merry Christmas!

@codecov
Copy link

codecov bot commented Dec 27, 2022

Codecov Report

Base: 93.96% // Head: 93.96% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (8cdedda) compared to base (a961d74).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head 8cdedda differs from pull request most recent head 7a8a2dc. Consider uploading reports for the commit 7a8a2dc to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #634   +/-   ##
=======================================
  Coverage   93.96%   93.96%           
=======================================
  Files          25       25           
  Lines        6457     6460    +3     
  Branches     1156     1156           
=======================================
+ Hits         6067     6070    +3     
  Misses        223      223           
  Partials      167      167           
Impacted Files Coverage Δ
fpdf/fpdf.py 91.46% <100.00%> (+<0.01%) ⬆️
fpdf/html.py 94.74% <100.00%> (+0.02%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

CHANGELOG.md Outdated Show resolved Hide resolved
test/test_perfs.py Outdated Show resolved Hide resolved
@Lucas-C Lucas-C merged commit 64d8482 into py-pdf:master Dec 27, 2022
@Lucas-C
Copy link
Member

Lucas-C commented Dec 27, 2022

Merged! Thank you 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

<img> without "height" attribute fails to leave enough blank space for itself
2 participants