Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add support for webp thumbnailing #7586

Merged
merged 1 commit into from
Jun 5, 2020

Conversation

WGH-
Copy link
Contributor

@WGH- WGH- commented May 27, 2020

Closes #4382

I tested this by editing a file on my server directly inside the virtualenv, seems to work fine.

@WGH-
Copy link
Contributor Author

WGH- commented May 27, 2020

On the other hand, it might make sense to generate webp previews for webp images. This one also works fine. What do you think?

diff --git a/synapse/config/repository.py b/synapse/config/repository.py
index 9d2ce2022..88c37d0fe 100644
--- a/synapse/config/repository.py
+++ b/synapse/config/repository.py
@@ -68,8 +68,10 @@ def parse_thumbnail_requirements(thumbnail_sizes):
         height = size["height"]
         method = size["method"]
         jpeg_thumbnail = ThumbnailRequirement(width, height, method, "image/jpeg")
+        webp_thumbnail = ThumbnailRequirement(width, height, method, "image/webp")
         png_thumbnail = ThumbnailRequirement(width, height, method, "image/png")
         requirements.setdefault("image/jpeg", []).append(jpeg_thumbnail)
+        requirements.setdefault("image/webp", []).append(webp_thumbnail)
         requirements.setdefault("image/gif", []).append(png_thumbnail)
         requirements.setdefault("image/png", []).append(png_thumbnail)
     return {
diff --git a/synapse/rest/media/v1/thumbnailer.py b/synapse/rest/media/v1/thumbnailer.py
index c234ea742..b7c3cbde1 100644
--- a/synapse/rest/media/v1/thumbnailer.py
+++ b/synapse/rest/media/v1/thumbnailer.py
@@ -34,7 +34,7 @@ EXIF_TRANSPOSE_MAPPINGS = {

 class Thumbnailer(object):

-    FORMATS = {"image/jpeg": "JPEG", "image/png": "PNG"}
+    FORMATS = {"image/jpeg": "JPEG", "image/png": "PNG", "image/webp": "WEBP"}

     def __init__(self, input_path):
         self.image = Image.open(input_path)

@WGH- WGH- marked this pull request as ready for review May 27, 2020 23:38
@clokep clokep requested a review from a team May 28, 2020 11:13
@erikjohnston
Copy link
Member

On the other hand, it might make sense to generate webp previews for webp images. This one also works fine. What do you think?

Given some devices can't render webp I think it makes sense for thumbnails to be in a more widely supported format.

@erikjohnston
Copy link
Member

This looks good. I don't suppose you've looked into if its easy to add tests for this somehow?

@WGH-
Copy link
Contributor Author

WGH- commented May 29, 2020

This looks good. I don't suppose you've looked into if its easy to add tests for this somehow?

I looked around a little bit, and didn't find tests that checked all the supported formats. IIRC there were only some tests with small hardcoded PNG files. I can try to refactor these tests, and add WebP there.

@erikjohnston
Copy link
Member

This looks good. I don't suppose you've looked into if its easy to add tests for this somehow?

I looked around a little bit, and didn't find tests that checked all the supported formats. IIRC there were only some tests with small hardcoded PNG files. I can try to refactor these tests, and add WebP there.

Ah. It might be easier to make a new set of tests that poke some of the functions directly? I haven't looked at that bit of the code in a while so don't have any good suggestions off the top of my head. Alternatively there might be tests in sytest?

Anyway, it'd be fab if you could have a look, and feel free to pop into #synapse-dev:matrix.org if you need some pointers. Otherwise, if it turns out to be a royal pain then perhaps we forget about adding a test for now and instead just file an issue to write some general tests for thumbnailing 🙂

@WGH- WGH- force-pushed the webp-thumbnail branch 2 times, most recently from 24e1761 to ff010f2 Compare June 3, 2020 20:29
@WGH-
Copy link
Contributor Author

WGH- commented Jun 3, 2020

I modified the tests to check that webp is also thumbnailed. Previously, there were only png thumbnail tests that checked that the result is bytewise identical to the expected one. Since I doubt that jpeg encoders are guaranteed to be deterministic, I decided to avoid the comparison in the webp test, so it only ensures that the thumbnail exists and some valid image is returned.

@erikjohnston
Copy link
Member

Awesome thank you! I wonder if we should thumbnail them as PNG's just to make the thumbnailing deterministic?

Comment on lines +143 to +160
None,
None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None,
None,
None, # Thumbnailing may (?) be non-deterministic, so we just check for success not exact output.
None,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to explain leads to obvious question "why PNG thumbnailing is deterministic, then?". I'd simply omit self-contradictory explanation of the somewhat broken test altogether, since we're planning on fixing it soon (yes, I'm planning to do that).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, now that we have documented what None means above I think its fine to skip this 👍

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good just to add some comments to the parameterised stuff, like below, just to make it easier to see what's going on. Otherwise this is good to go 🎉

Comment on lines 102 to 105
_TestImage = namedtuple(
"_TestImage",
["data", "content_type", "extension", "expected_cropped", "expected_scaled"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_TestImage = namedtuple(
"_TestImage",
["data", "content_type", "extension", "expected_cropped", "expected_scaled"],
)
import attr
from Typing import Optional
@attr.s
class _TestImage:
"""An image for testing thumbnailing with the expected results
Attributes:
data: The raw image to thumbnail
content_type: The type of the image as a content type, e.g. "image/png"
extension: The extension associated with the format, e.g. ".png"
expected_cropped: The expected bytes from cropped thumbnailing, or None if
test should just check for success.
expected_scaled: The expected bytes from scaled thumbnailing, or None if
test should just check for a valid image returned.
"""
data = attr.ib(type=bytes)
content_type = attr.ib(type=bytes)
extension = attr.ib(type=bytes)
expected_cropped = attr.ib(type=Optional[bytes])
expected_scaled = attr.ib(type=Optional[bytes])

@WGH-
Copy link
Contributor Author

WGH- commented Jun 4, 2020

I wonder if we should thumbnail them as PNG's just to make the thumbnailing deterministic?

This is a good question. WebP itself can be lossless and lossy, and I'd dare to say that it's two completely different algorithms.

Although it's mostly thought experiments, I didn't do real tests, my reasoning is as follows. I assume that lossy WebP is used in cases where JPEG is used (photo-like images), and lossless WebP is used where PNG is used (text, screenshots, line art).

  • Lossless WebP -> PNG: ok
  • Lossless WebP -> JPEG: thumbnail looks poor, but is compressed okayish
  • Lossy WebP -> PNG: poor compression
  • Lossy WebP -> JPEG: ok

Since it just a thumbnail, I think small size should be preferred over quality.

As of tests, to be frank, I dislike the idea of comparing images this way, even for PNGs. Although I doubt there're many way to compress 1 white pixel with zlib, this approach is very fragile and will break if some larger image is put through it. Default compression level might change, even the underlying library might decide to add some extra PNG metadata chunks, etc.

For lossless conversion, I think the proper test would be to check the resulting file format and raw image data. For lossy conversion, well, in the theory we could check that the resulting image is close enough to the expected result, but I think that would be unnecessary complication since we would duplicate the testing efforts of underlying library (Pillow).

@WGH-
Copy link
Contributor Author

WGH- commented Jun 4, 2020

Deterministic is probably the wrong word. Given the same settings and the same library versions, the result will be the same all the time. However, I doubt anyone guarantees byte-identical results across versions.

@erikjohnston
Copy link
Member

Fair, I'm sold on all of that. I guess what we actually want to test is that the returned images are the right size, but 🤷‍♂️

@WGH-
Copy link
Contributor Author

WGH- commented Jun 4, 2020

Fair, I'm sold on all of that. I guess what we actually want to test is that the returned images are the right size, but 🤷‍♂️

Right. I think it's not very hard to do: we can create solid color-filled images with Pillow at runtime, and thumbnail them in various ways

Would it be fine to both change the tests in such relatively substantial way AND introduce WebP support in one pull request?

@erikjohnston
Copy link
Member

Fair, I'm sold on all of that. I guess what we actually want to test is that the returned images are the right size, but man_shrugging

Right. I think it's not very hard to do: we can create solid color-filled images with Pillow at runtime, and thumbnail them in various ways

Ooh, interesting. That sounds like it should work.

Would it be fine to both change the tests in such relatively substantial way AND introduce WebP support in one pull request?

I don't think we should block webp support on making our tests better, so long as we have a test that will explode if we completely break webp support then I'm happy (which we do).

Closes matrix-org#4382

Signed-off-by: Maxim Plotnikov <wgh@torlan.ru>
@WGH-
Copy link
Contributor Author

WGH- commented Jun 4, 2020

Test 600 Outbound federation will ignore a missing event with bad JSON for room version 6... FAIL

Looks like some flaky test.

@erikjohnston
Copy link
Member

Test 600 Outbound federation will ignore a missing event with bad JSON for room version 6... FAIL

Looks like some flaky test.

Yeah, looks like it annoyingly. Will re-run the test then merge! 🎉 Thanks for bearing with the back and forth on this 🙂


(Just as a small note for any future PRs: we'd prefer if you didn't rebase while we're reviewing to make it easier to track the changes you make, we'll squash merge it at the end anyway. It wasn't a problem in this PR though 🙂 )

@erikjohnston erikjohnston merged commit e55ee7c into matrix-org:develop Jun 5, 2020
@WGH-
Copy link
Contributor Author

WGH- commented Jun 5, 2020

(Just as a small note for any future PRs: we'd prefer if you didn't rebase while we're reviewing to make it easier to track the changes you make, we'll squash merge it at the end anyway. It wasn't a problem in this PR though 🙂 )

@erikjohnston what about the PRs that logically consist of multiple commits? If I get the commit "separation" right from the start, I can push fixup commits, but what if separation becomes clear later on?

@erikjohnston
Copy link
Member

Certainly I tend to do my PRs with many small commits rather than one big one, though I do squash merge at the end usually. If it makes sense we can do a normal merge, though typically if the commits are separate enough that they warrant being their own commits in the git history after merge they probably should have been separate PRs. There's no hard and fast rule here though

@WGH-
Copy link
Contributor Author

WGH- commented Jun 5, 2020

though typically if the commits are separate enough that they warrant being their own commits in the git history after merge they probably should have been separate PRs

If some new feature requires a bit of preparatory refactoring of existing code, I prefer to have the refactoring in a separate commit in the history. Having them separate will make future archaeology (git bisect, etc.) easier, and smaller patches are usually easier to review than one giant patch. Sure, the refactoring can be a separate PR, but without the new feature it will usually look unnecessary, so it makes sense to review and merge them together.

But that's just like my opinion :)

@richvdh
Copy link
Member

richvdh commented Jun 5, 2020

If some new feature requires a bit of preparatory refactoring of existing code, I prefer to have the refactoring in a separate commit in the history...

These things might be true, but they are outweighed by the fact that rebasing after a review has taken place takes the reviewer back to the drawing board, making it much harder to land the PR. Better to have the PR landed as one big commit than not landed at all because the reviewer didn't have time to figure out what changed in each round of force-pushes.

babolivier added a commit that referenced this pull request Jun 10, 2020
…rg-hotfixes

Synapse 1.15.0rc1 (2020-06-09)
==============================

Features
--------

- Advertise support for Client-Server API r0.6.0 and remove related unstable feature flags. ([\#6585](#6585))
- Add an option to disable autojoining rooms for guest accounts. ([\#6637](#6637))
- For SAML authentication, add the ability to pass email addresses to be added to new users' accounts via SAML attributes. Contributed by Christopher Cooper. ([\#7385](#7385))
- Add admin APIs to allow server admins to manage users' devices. Contributed by @dklimpel. ([\#7481](#7481))
- Add support for generating thumbnails for WebP images. Previously, users would see an empty box instead of preview image. ([\#7586](#7586))
- Support the standardized `m.login.sso` user-interactive authentication flow. ([\#7630](#7630))

Bugfixes
--------

- Allow new users to be registered via the admin API even if the monthly active user limit has been reached. Contributed by @dkimpel. ([\#7263](#7263))
- Fix email notifications not being enabled for new users when created via the Admin API. ([\#7267](#7267))
- Fix str placeholders in an instance of `PrepareDatabaseException`. Introduced in Synapse v1.8.0. ([\#7575](#7575))
- Fix a bug in automatic user creation during first time login with `m.login.jwt`. Regression in v1.6.0. Contributed by @olof. ([\#7585](#7585))
- Fix a bug causing the cross-signing keys to be ignored when resyncing a device list. ([\#7594](#7594))
- Fix metrics failing when there is a large number of active background processes. ([\#7597](#7597))
- Fix bug where returning rooms for a group would fail if it included a room that the server was not in. ([\#7599](#7599))
- Fix duplicate key violation when persisting read markers. ([\#7607](#7607))
- Prevent an entire iteration of the device list resync loop from failing if one server responds with a malformed result. ([\#7609](#7609))
- Fix exceptions when fetching events from a remote host fails. ([\#7622](#7622))
- Make `synctl restart` start synapse if it wasn't running. ([\#7624](#7624))
- Pass device information through to the login endpoint when using the login fallback. ([\#7629](#7629))
- Advertise the `m.login.token` login flow when OpenID Connect is enabled. ([\#7631](#7631))
- Fix bug in account data replication stream. ([\#7656](#7656))

Improved Documentation
----------------------

- Update the OpenBSD installation instructions. ([\#7587](#7587))
- Advertise Python 3.8 support in `setup.py`. ([\#7602](#7602))
- Add a link to `#synapse:matrix.org` in the troubleshooting section of the README. ([\#7603](#7603))
- Clarifications to the admin api documentation. ([\#7647](#7647))

Internal Changes
----------------

- Convert the identity handler to async/await. ([\#7561](#7561))
- Improve query performance for fetching state from a PostgreSQL database. ([\#7567](#7567))
- Speed up processing of federation stream RDATA rows. ([\#7584](#7584))
- Add comment to systemd example to show postgresql dependency. ([\#7591](#7591))
- Refactor `Ratelimiter` to limit the amount of expensive config value accesses. ([\#7595](#7595))
- Convert groups handlers to async/await. ([\#7600](#7600))
- Clean up exception handling in `SAML2ResponseResource`. ([\#7614](#7614))
- Check that all asynchronous tasks succeed and general cleanup of `MonthlyActiveUsersTestCase` and `TestMauLimit`. ([\#7619](#7619))
- Convert `get_user_id_by_threepid` to async/await. ([\#7620](#7620))
- Switch to upstream `dh-virtualenv` rather than our fork for Debian package builds. ([\#7621](#7621))
- Update CI scripts to check the number in the newsfile fragment. ([\#7623](#7623))
- Check if the localpart of a Matrix ID is reserved for guest users earlier in the registration flow, as well as when responding to requests to `/register/available`. ([\#7625](#7625))
- Minor cleanups to OpenID Connect integration. ([\#7628](#7628))
- Attempt to fix flaky test: `PhoneHomeStatsTestCase.test_performance_100`. ([\#7634](#7634))
- Fix typos of `m.olm.curve25519-aes-sha2` and `m.megolm.v1.aes-sha2` in comments, test files. ([\#7637](#7637))
- Convert user directory, state deltas, and stats handlers to async/await. ([\#7640](#7640))
- Remove some unused constants. ([\#7644](#7644))
- Fix type information on `assert_*_is_admin` methods. ([\#7645](#7645))
- Convert registration handler to async/await. ([\#7649](#7649))
phil-flex pushed a commit to phil-flex/synapse that referenced this pull request Jun 16, 2020
Closes matrix-org#4382

Signed-off-by: Maxim Plotnikov <wgh@torlan.ru>
@bkil
Copy link

bkil commented Aug 5, 2021

I agree that creating webp thumbnails should make more sense and/or thumbnail should result in a smaller image than the source. See also: #2667

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support thumbnailing webp images
4 participants