Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Read all metadata at once #362

Merged
merged 5 commits into from
Apr 1, 2020

Conversation

Glandos
Copy link
Contributor

@Glandos Glandos commented Feb 10, 2019

This will reduce the need to call PIL.Image.open, which is by far the heavier part when doing an idle build.

On my gallery with 536 images and 46 videos, I'm experiencing an improvement of ~28% faster build when writing only HTML.

@codecov
Copy link

codecov bot commented Feb 10, 2019

Codecov Report

Merging #362 into master will not change coverage by %.
The diff coverage is 96.42%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #362   +/-   ##
=======================================
  Coverage   87.49%   87.49%           
=======================================
  Files          19       19           
  Lines        1503     1503           
=======================================
  Hits         1315     1315           
  Misses        188      188           
Impacted Files Coverage Δ
sigal/log.py 82.75% <ø> (ø)
sigal/plugins/feeds.py 97.05% <ø> (ø)
sigal/plugins/nomedia.py 89.74% <ø> (ø)
sigal/plugins/upload_s3.py 0.00% <0.00%> (ø)
sigal/plugins/watermark.py 88.37% <ø> (ø)
sigal/settings.py 97.87% <ø> (ø)
sigal/__init__.py 83.95% <100.00%> (ø)
sigal/gallery.py 89.79% <100.00%> (ø)
sigal/image.py 92.26% <100.00%> (+0.04%) ⬆️
sigal/plugins/adjust.py 100.00% <100.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 97b6808...ef87f6a. Read the comment docs.

@Glandos
Copy link
Contributor Author

Glandos commented Feb 11, 2019

The new non-covered code parts seem hard to be covered. These are try…catch blocks in the get_image_metadata after the first read. It's difficult to imagine a test case where EXIF is correct but not the size. Maybe they should be removed?

sigal/gallery.py Outdated
@@ -152,6 +152,7 @@ def _get_metadata(self):
""" Get image metadata from filename.md: title, description, meta."""
self.description = ''
self.meta = {}
self.file_metadata = {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a dict ? It seems that it could be an attribute of the Media object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My design here is far from perfect, indeed.

I started with a simple dict, because it is flexible enough. But we could imagine a new class MediaMetadata that hold some functions currently in image.py.
But I still don't know if we need one metadata holder for each underlying file (src, dst and thumb), or one for all.

sigal/image.py Outdated
filename, e)

try:
size = get_size(img)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that size is not used later ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Size is not used for the source image. But it used for other images, such as thumbnails.
My goal was to code a function that extract everything that could be needed. And get_size() with an already opened image is really harmless from a performance point-of-view:
Performance with snakeviz
(Notice how calling Image.open will call JpegImagePlugin._open twice here).

@saimn
Copy link
Owner

saimn commented Feb 11, 2019

Thanks @Glandos , looks like a nice improvement. About testing the exception cases, I agree that it is not easy, it would need specific images just to make the function fail. So I'm fine with not testing these cases, but we need to keep the try/catch as there are many malformed images in the wild.

@saimn saimn added this to the 2.1 milestone Feb 11, 2019
@saimn
Copy link
Owner

saimn commented Oct 5, 2019

Hi @Glandos ,
Where are we with this and the various changes that you (and others) made in Pillow ? I think that several key improvements were done in Pillow so is it still necessary to do the change here ? And if yes do you think that you could finalize this PR ?
Thanks

@Glandos
Copy link
Contributor Author

Glandos commented Oct 6, 2019

Thanks for the reminder. Being busy is not ideal when contributing…
Anyway, I've just run my tests with the latests Pillow (6.2.0), and this PR still improves the idle build by 12%. Not as much as before, but still significant.

So I'll try to finish this.

@saimn
Copy link
Owner

saimn commented Oct 15, 2019

No worries, I am not much available either for open source work these days.
It will be great when you can finish this one but take it easy :).

saimn and others added 4 commits March 15, 2020 22:58
Make _read_image no-op if argument is an image.
Add gathering metadata function that returns a dict
@saimn saimn force-pushed the performance_one_open_for_metadata branch from 21bdb28 to 18903fd Compare March 16, 2020 02:58
@saimn
Copy link
Owner

saimn commented Apr 1, 2020

Forgot to merge this after I had rebased...
Thanks @Glandos !

@saimn saimn merged commit 0d8a3c8 into saimn:master Apr 1, 2020
kontza pushed a commit to kontza/sigal that referenced this pull request Aug 28, 2020
…tadata

Performance: Read all metadata at once
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants