Performance: Read all metadata at once #362
Codecov Report
@@           Coverage Diff           @@
##           master     #362   +/-   ##
=======================================
  Coverage   87.49%   87.49%
=======================================
  Files          19       19
  Lines        1503     1503
=======================================
  Hits         1315     1315
  Misses        188      188
The new non-covered code parts seem hard to cover. These are
sigal/gallery.py
Outdated
@@ -152,6 +152,7 @@ def _get_metadata(self):
         """ Get image metadata from filename.md: title, description, meta."""
         self.description = ''
         self.meta = {}
+        self.file_metadata = {}
Does it need to be a dict? It seems that it could be an attribute of the Media object.
My design here is far from perfect, indeed.
I started with a simple dict, because it is flexible enough. But we could imagine a new class MediaMetadata that holds some of the functions currently in image.py.
But I still don't know if we need one metadata holder for each underlying file (src, dst and thumb), or one for all.
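To make the "one holder per underlying file" option concrete, here is a minimal sketch. The `FileMetadata` class, its fields, and the layout of `MediaMetadata` are assumptions for illustration only and are not code from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class FileMetadata:
    """Hypothetical holder for what can be read from one file on disk."""
    size: Optional[Tuple[int, int]] = None  # (width, height), None until read
    exif: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MediaMetadata:
    """Hypothetical per-media container: one FileMetadata per underlying file."""
    src: FileMetadata = field(default_factory=FileMetadata)
    dst: FileMetadata = field(default_factory=FileMetadata)
    thumb: FileMetadata = field(default_factory=FileMetadata)


meta = MediaMetadata()
meta.src.size = (4000, 3000)   # filled when the source file is read
meta.thumb.size = (200, 150)   # filled when the thumbnail is read
```

Compared with a plain dict, this keeps the three files separate and gives each field a name, at the cost of some flexibility. Whether that trade-off is worth it is exactly the open question above.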
sigal/image.py
Outdated
                       filename, e)

    try:
        size = get_size(img)
It seems that size is not used later?
Size is not used for the source image, but it is used for other images, such as thumbnails.
My goal was to write a function that extracts everything that could be needed. And calling get_size() on an already opened image is really harmless from a performance point of view:
(Notice how calling Image.open will call JpegImagePlugin._open twice here.)
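The "open once, extract everything" idea can be sketched as below. To keep the snippet runnable without Pillow, `FakeImage` and `open_image` are stand-ins (assumptions) for an opened PIL image and for PIL.Image.open; the key names in the returned dict are also hypothetical.

```python
class FakeImage:
    """Stand-in for an opened PIL image (assumption, not the real class)."""

    def __init__(self, size, info):
        self.size = size
        self.info = info


open_calls = 0  # counts how often the expensive open happens


def open_image(path):
    """Hypothetical opener playing the role of PIL.Image.open."""
    global open_calls
    open_calls += 1
    return FakeImage(size=(640, 480), info={"path": path})


def gather_metadata(path):
    """Open the file once and collect everything callers may need later."""
    img = open_image(path)
    return {
        "size": img.size,        # cheap: the image is already open
        "info": dict(img.info),  # copied so callers cannot mutate the image
    }


metadata = gather_metadata("photo.jpg")
```

Reading the size here costs nothing extra even when the caller ignores it, whereas asking for it later would mean opening the file again.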
Thanks @Glandos, looks like a nice improvement. About testing the exception cases, I agree that it is not easy; it would need specific images just to make the function fail. So I'm fine with not testing these cases, but we need to keep the try/except, as there are many malformed images in the wild.
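As a rough illustration of why the guards matter, the sketch below reads each piece of metadata independently, so a malformed file loses only the part that failed. The readers and the dict-based "image" are assumptions made for the example; they are not sigal's actual functions.

```python
import logging

logger = logging.getLogger(__name__)


def get_size(img):
    """Hypothetical reader; a KeyError stands in for a decoding error."""
    return img["size"]


def get_exif(img):
    """Hypothetical reader; fails when the image carries no EXIF block."""
    return img["exif"]


def gather_metadata(img, filename):
    """Collect each piece on its own so one failure does not lose the rest."""
    result = {"size": None, "exif": None}
    for key, reader in (("size", get_size), ("exif", get_exif)):
        try:
            result[key] = reader(img)
        except Exception as e:  # deliberately broad: malformed files fail in many ways
            logger.warning("Could not read %s of %s: %s", key, filename, e)
    return result


good = gather_metadata({"size": (10, 20), "exif": {"Make": "X"}}, "good.jpg")
broken = gather_metadata({"size": (10, 20)}, "broken.jpg")
```

Passing a stub object that raises, as `broken` does here, is one possible way to exercise the failure path without shipping a deliberately corrupted image.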
Hi @Glandos,
Thanks for the reminder. Being busy is not ideal when contributing… So I'll try to finish this.
No worries, I am not very available for open source work these days either.
Make _read_image no-op if argument is an image. Add gathering metadata function that returns a dict
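A minimal sketch of the "no-op if the argument is already an image" behaviour described in this commit message follows. `FakeImage`, `fake_open`, and the `opener` parameter are stand-ins introduced here so the example runs without Pillow; the real `_read_image` may differ.

```python
class FakeImage:
    """Stand-in for an opened image object (assumption)."""

    def __init__(self, path):
        self.path = path


open_calls = []  # records every path that was actually opened


def fake_open(path):
    """Hypothetical opener playing the role of PIL.Image.open."""
    open_calls.append(path)
    return FakeImage(path)


def read_image(file_or_image, opener=fake_open):
    """Return an opened image; do nothing if one is passed in already."""
    if isinstance(file_or_image, FakeImage):
        return file_or_image
    return opener(file_or_image)


img = read_image("photo.jpg")  # opens the file
same = read_image(img)         # no second open
```

With this shape, helpers can accept either a path or an image, and callers that already hold an opened image avoid paying for a second open.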
21bdb28 to 18903fd
Forgot to merge this after I had rebased...
…tadata Performance: Read all metadata at once
This reduces the need to call PIL.Image.open, which is by far the heaviest part of an idle build.
On my gallery with 536 images and 46 videos, I see a ~28% faster build when only HTML is written.