Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Pillow rather than Imagemagick for thumbnail generation #1104

Merged
merged 18 commits into from Apr 22, 2023

Conversation

dairiki
Copy link
Contributor

@dairiki dairiki commented Feb 26, 2023

Currently, we use an external utility, ImageMagick, to generate thumbnail images. Installation on certain (ahem, Windows) platforms is tricky and confusing for end-users, and slow in our CI tests. [ETA: I have just noticed that GitHub’s latest macOS and Windows runners have ImageMagick pre-installed.]

Pillow is a python image manipulation library.
It is a binary distribution — tricky to compile from source — however (pre-compiled) binary wheels for a fairly complete set of platforms are available from PyPI, so installation is straight-forward for supported platforms.

This PR is currently in a proof-of-concept state. The intent is to test to see whether moving from ImageMagick to Pillow is viable and a good idea.

Current State

Basic tests show that Pillow seems to work fine. Initial speed tests indicate that Pillow is, if anything, slightly faster than ImageMagick, at least when running the tests in test_images.py.

Open Issues

Should Pillow be added as an "extra" dependency, or just a straight install-requires?

Currently, this PR adds is as a [thumbnails] "extra". This means that installing Lektor becomes slightly more complicated, but ensures that Lektor is still installable on platforms where Pillow is not supported.

This PR adds Pillow as an unconditionally required dependency.

To Do

Need better tests for things like:

  • Thumbnail rotation based on EXIF Orientation data
  • Handling of color profiles. (I'm not quite sure yet whether the old ImageMagick code attempts to convert images in other color profiles to sRGB or simply discarded the original color profile and marked the thumbnails as sRGB. Right now, this PR attempts color profile conversion if the original is not sRGB.)
  • Speed. More speed comparisons between Pillow and ImageMagick are in order.
  • Decide whether to declare Pillow unconditionally required dependency or as an "extra" (see above)
  • Add support for WEBP? (See Add webp as image format in default config #1099)
    (This is probably best left for a future PR.)

Issue(s) Resolved

Pillow distributes binary wheels for a fairly complete set of platforms.
As such, using Pillow rather than imagemagick for thumbnail support makes installation of Lektor (with thumbnail support) more straightforward. No messing with Chocolatey or Homebrew (unless one wants ffmpeg for video thumbnail support.)

(It may be possible to convert the video thumbnailing code to use opencv rather than requiring external installation of ffmpeg.)

Fixes #

Related Issues / Links

#497, #1099

lektor/lektor-website#370

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)

@dairiki dairiki force-pushed the feat.pillow branch 2 times, most recently from 6d5650a to cfdd3ce Compare February 26, 2023 06:44
@dairiki
Copy link
Contributor Author

dairiki commented Apr 13, 2023

On generating thumbnails for images with embedded color profiles

Our current (imagemagick-based) code runs convert with -strip -colorspace sRGB (as well as other parameters) to generate the thumbnail image. From experimentation, it appears that this just strips (and ignores) any embedded colorspace in the original image, and sets no colorspace metadata in the thumbnail. (The default colorspace for images on the web is sRGB.) As a result, colors get mangled in the thumbnailing process.

As things currently stand, the new (Pillow) code, tries to convert images with an embedded colorspace to sRGB (and embeds the sRGB profile in the resulting thumbnail images.)

The latter (new) behavior probably makes more sense, as it attempts to preserve the colors in the image.

Another — maybe the best? — option would be to ignore the colorspace when computing the thumbnail, but copy the embedded colorspace from the source image to the thumbnail. That would leave any color correction up to the browser.

@dairiki
Copy link
Contributor Author

dairiki commented Apr 13, 2023

Speed Tests

I've just performed a simple speed test. The new (Pillow-based) code is faster.

Scaling a 3 MB, 4032x3024 JPEG to a 400x300 JPEG, with quality=95.

Using the new Pillow code: 89 msec
Using imagemagick: 140 msec


Test code below:

import subprocess
from timeit import timeit

from lektor.imagetools import _compute_thumbnail
from lektor.imagetools import ImageSize


SRC = "IMG_20170213_113608.jpg"
THUMB_SIZE = ImageSize(400, 300)
QUALITY = 95


def pillow_thb():
    with open(SRC, "rb") as infp:
        with open("img-pillow-thb.jpg", "wb") as outfp:
            _compute_thumbnail(infp, outfp, THUMB_SIZE, "JPEG", QUALITY)


N = 20
t = timeit("pillow_thb()", number=N, globals=globals())
print(f"pillow_thb {t / N:.3f}s")


def im_thb():
    cmd = [
        "convert", SRC,
        "-auto-orient",
        "-resize", f"{THUMB_SIZE.width}x{THUMB_SIZE.height}",
        "-strip", "-colorspace", "sRGB",
        "-quality", f"{QUALITY:d}",
        "img-im-thb.jpg",
    ]
    subprocess.run(cmd, check=True)


t = timeit("im_thb()", number=N, globals=globals())
print(f"im_thb {t / N:.3f}s")

@dairiki dairiki force-pushed the feat.pillow branch 2 times, most recently from a11848f to 6bc86e7 Compare April 15, 2023 01:23
@dairiki dairiki added this to the 3.4 milestone Apr 15, 2023
@dairiki dairiki marked this pull request as ready for review April 15, 2023 01:46
@yagebu
Copy link
Contributor

yagebu commented Apr 15, 2023

Great work as always!

I'd be all for making Pillow as a dependency.

@nixjdm
Copy link
Member

nixjdm commented Apr 15, 2023 via email

@dairiki
Copy link
Contributor Author

dairiki commented Apr 15, 2023

If only we had a similar option for FFmpeg.

Opencv-python-headless might be that option.

I haven't played with it yet, but it purports to be a general solution for (among many other things) extracting frames from video files. Binary wheels are provided for at least some Windows, Mac and Linux targets. (I think the binary wheels are linked with libffmpeg under the hood.)

@dairiki
Copy link
Contributor Author

dairiki commented Apr 15, 2023

I'd be all for making Pillow as a dependency.

@yagebu @nixjdm Just to be clear, you both vote to make Pillow a required dependency (rather than an "extra")?

I'm still sort of on the fence. It may be a pain to install Pillow on certain platforms (e.g. non-{Linux,Windows,Mac} OSes, or stale version of those, non-mainstream-CPUs, non-CPython, etc.), some users might rather forego thumbnailing support rather than be forced to install Pillow from source.

At the same time, I suspect that for the vast majority of users installing Pillow is painless. The confusion added by the need to specify an extra when installing Lektor in order to get thumbnail support multiplied by the frequency of occurrence of that confusion is likely to result in a lot more net pain than would result from the requirement of installing Pillow from sdist on a few esoteric platforms.

It would be nice if there were a way to declare a "default" extra dependency that would be installed unless the user explicitly deselected it, but, at present, there seems to be no good way to do that. (Here's a discussion on the possibility of adding such a feature.)

@yagebu
Copy link
Contributor

yagebu commented Apr 15, 2023

Just to be clear, you both vote to make Pillow a required dependency (rather than an "extra")?

I'm for making it a required dep :) On the more obscure platforms I'd assume Pillow to be available with the system package manager.

@nixjdm
Copy link
Member

nixjdm commented Apr 15, 2023 via email

@dairiki dairiki force-pushed the feat.pillow branch 5 times, most recently from 58d591a to c4fc850 Compare April 18, 2023 00:04
dairiki added a commit to dairiki/lektor-website that referenced this pull request Apr 18, 2023
dairiki added a commit to dairiki/lektor-website that referenced this pull request Apr 18, 2023
@dairiki dairiki force-pushed the feat.pillow branch 7 times, most recently from 1c726e6 to 604dfbc Compare April 19, 2023 18:41
Previous `format=`"unknown"` was being returned for short files, while
`None` was being returned for all other unrecognized image files.
After some experimentation, it turns out that the old imagemagick code
just discarded any embedded color profile from the source image.

After this commit, we attempt to transform the source image from any
embedded color profile to sRGB, but still omit the sRGB color profile
from the resulting thumbnail.
Remove test_images.py after ensuring that all of its tests are
replicated in test_imagetools.py.
If the model contains the make, we want the model as the combined
value.
Since Lektor==3.2.dev0 make_image_thumbnail has been issuing a warning
and falling back to FIT mode when called without both a thumbnail
width and height when the mode other than FIT.

Now we raise ValueError in that case.
This seems possibly no longer necessary to prevent the macOS CI run
from hanging. (See lektor#1132.)

This reverts commit 7960115.
@dairiki dairiki merged commit b843775 into lektor:master Apr 22, 2023
15 checks passed
@dairiki dairiki deleted the feat.pillow branch April 23, 2023 22:43
dairiki added a commit that referenced this pull request May 7, 2023
PR #1104 uses a functools.partial as a sub-artifact's
_build_func_. When verbose logging is enabled, this caused an
exception from `lektor.reporter.describe_build_func`.
@xlotlu
Copy link
Contributor

xlotlu commented May 18, 2023

Our current (imagemagick-based) code runs convert with -strip -colorspace sRGB (as well as other parameters) to generate the thumbnail image. From experimentation, it appears that this just strips (and ignores) any embedded colorspace in the original image, and sets no colorspace metadata in the thumbnail. (The default colorspace for images on the web is sRGB.) As a result, colors get mangled in the thumbnailing process.

As things currently stand, the new (Pillow) code, tries to convert images with an embedded colorspace to sRGB (and embeds the sRGB profile in the resulting thumbnail images.)

The latter (new) behavior probably makes more sense, as it attempts to preserve the colors in the image.

Another — maybe the best? — option would be to ignore the colorspace when computing the thumbnail, but copy the embedded colorspace from the source image to the thumbnail. That would leave any color correction up to the browser.

I think the entire rationale of the conversion was to take care of CMYK jpegs, which result in obscene thumbnail sizes due to the embedded color profile. The sRGB stripping was to shave off some extra bits, and I still think it's a good idea since (all?) browsers default to sRGB. These were things considered at the time.

There should be no color mangling, unless the source is both not sRGB and misses a profile. Although, that -colorspace after -strip looks suspect.

This is mostly water under the bridge now, but I wanted to point out that I find the current behaviour correct in a web environment -- anything that is not sRGB to be explicitly converted.


On a general note, back when I rewrote the image processing stuff I envisioned what you did @dairiki as handled by separate "thumbnail generation backends". I particularly had libvips in mind at the time, both due to the great speed improvements and its "smart crop" mode -- see #609. At the time I considered it necessary to create:

  • a backend plugin using libvips bindings
  • a backend plugin using the vips binary, similar to the old imagemagick implementation
  • and maybe another plugin to use imagemagick bindings directly.

The reasoning was libvips bindings are difficult to install on many platforms, while the vips binary is a simple deployment. The same for imagemagick. Funnily enough I wrote PIL off the list for some reason.

Overkill? I don't know, maybe. I was dealing with image gallery sites at the time. One thing to note is that I wrote #607 with a photographer's collection in mind (which imagemagick could handle, I'm not sure about PIL).

The last note (I didn't capture this in an issue), was that I really found it necessary to do parallel thumbnail generation. At the time I considered a) the straightforward way: subprocess using the binary, and b) an over-engineered worker-based implementation for use with the bindings.

Regards.

@dairiki
Copy link
Contributor Author

dairiki commented May 19, 2023

I think Pillow is plenty capable enough for #607.

Funnily enough I wrote PIL off the list for some reason.

There was a long period when the original PIL was not particularly well supported before Pillow took over. Pillows capabilities have grown over time — it may not have been the best choice back in 2018.


#609 is probably outside of my range of interest and expertise. I'm pretty sure Pillow can deal with the image input and output just fine. Perhaps opencv-python can help with the more non-linear sorts of processing? And there is a seam-carving package on PyPI (pure python, but uses numpy, etc.)


The last note (I didn't capture this in an issue), was that I really found it necessary to do parallel thumbnail generation.

I do have in the back of my head thoughts about parallelizing the Lektor build process (#971) at the lektor.builder level so that multiple artifacts/sources can be built over multiple threads or processes. I've dinked at it a little — It would require, I think, a significant refactor/cleanup of lektor.builder.

dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
)

* feat(imagetools): use Pillow (instead of imagemagick) for thumbnailing

* feat(imagetools): clean up remnants of imagemagick config

* test(imagetools): add color-space handling test page to test demo project

* fix: get_image_info(): return format=None for short file

Previous `format=`"unknown"` was being returned for short files, while
`None` was being returned for all other unrecognized image files.

* refactor(imagetools): omit sRGB color profile from thumbnails

After some experimentation, it turns out that the old imagemagick code
just discarded any embedded color profile from the source image.

After this commit, we attempt to transform the source image from any
embedded color profile to sRGB, but still omit the sRGB color profile
from the resulting thumbnail.

* refactor(imagetools): make Pillow a required dependency

* fix(imagetools): _combine_make - return model (not make)

If the model contains the make, we want the model as the combined
value.

* test: Test with older versions of Pillow - 6.0.x and 7.0.x

* feat!: convert warning to error when thumbnail dim missing

Since Lektor==3.2.dev0 make_image_thumbnail has been issuing a warning
and falling back to FIT mode when called without both a thumbnail
width and height when the mode other than FIT.

Now we raise ValueError in that case.

* Revert "ci: pin coverage<7.0.2 on macOS (lektor#1132)"

This seems possibly no longer necessary to prevent the macOS CI run
from hanging. (See lektor#1132.)

This reverts commit 7960115.
dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
PR lektor#1104 uses a functools.partial as a sub-artifact's
_build_func_. When verbose logging is enabled, this caused an
exception from `lektor.reporter.describe_build_func`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants