Asynchronous image scaling #3090

Open
datakurre opened this issue Apr 23, 2020 · 4 comments

datakurre commented Apr 23, 2020

PLIP (Plone Improvement Proposal)

Responsible Persons

Proposer: Asko Soukka

Seconder: Timo Stollenwerk, Victor Fernandez de Alba

Abstract

We propose an option to replace the current behavior of creating image scales synchronously on demand with a new behavior of building them asynchronously. The goals are faster scaling through multiprocessing support and faster responses through non-blocking scaling requests.

Motivation

HTML5 features such as picture and srcset allow optimizing responsive design in at least three different dimensions: image size (mobile, tablet, desktop, ...), image pixel density (1x, 2x, 3x, ...) and image format (PNG, WEBP, JPEG2000, ...). This increases the demand for different scales and other versions of a single image. In addition, users no longer all need the same version of an image. Previously, all necessary scales were created on demand as soon as the editor viewed the saved document. Now more and more scales are created only long after the original edit, resulting in slow performance when viewing the content.

Another motivation for asynchronous scaling is an acute performance issue with Plone REST API based editing of image content, the so-called "headless" use case. To return cacheable image scales for all available versions, Plone REST API needs to call the Plone image scaling API to reserve URLs for them, effectively creating all configured scales immediately on the first read of the content. While subsequent calls are fast, this slow first read makes using Plone REST API for images inconvenient, and it discourages the use of Plone scales and the addition of support for modern image formats.

Assumptions

We want to add support for modern image format alternatives (WEBP, JPEG2000, ...) to Plone with srcset.

We want to provide responsive image scale alternatives in Plone with the picture tag.

We believe it is, at least initially, easier to support modern image formats and more scales by reusing the existing scaling framework than by integrating an external image scaling service.

This proposal only covers images stored with plone.namedfile blob image fields.

Proposal & Implementation

We propose enhancing the current on-demand image scaling with a (possibly optional) asynchronous scaling implementation:

  • When a scale is requested, it is not created immediately. Instead, a scale storage item with the usual configuration but empty data is put into the existing scaling storage. This creates a URL by which the browser can request the scale later, and which the REST API can return without immediately creating the scale.

  • At the end of a transaction that added new scale storage items, tasks for creating those scales are put into the new image scaling queue. In the task descriptions, the related scale storage and the original image are referenced by OID to allow fast retrieval from ZODB by the processor.

  • The image scaling queue processor is a new thread started once for each Plone instance (or WSGI server process). The thread reserves its own dedicated ZODB connection from the configured connection pool, but with a minimal object cache (only 100 objects) to keep the memory footprint small.

  • The image scaling queue processor scales images using Python's built-in concurrent.futures.ProcessPoolExecutor, which allows using all available CPU cores for image scaling in parallel. To prevent conflicts, the scaling queue processor writes the scales into ZODB sequentially in their completion order, each with its own commit.

  • If a scale has not been generated yet when requested, the plone.namedfile scale traverser redirects to the display-file URL of the original-size image.

  • If a scale has not been generated yet when requested, and the scale storage placeholder is more than 10 minutes old, a new scaling task is queued immediately.

  • On Volto, because Volto proxies all images from Plone, a request for a scale that has not been generated yet is retried a few times to wait for the scale to appear before falling back to the original version. This effectively makes scaling both asynchronous (non-blocking) and still immediate in Volto use cases.
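
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the POC branches: the names Placeholder, scale_image, resolve and process_tasks are hypothetical, plain dicts stand in for the ZODB scale storage, and the commit callback stands in for a per-scale transaction commit. The executor is passed in; the proposal names concurrent.futures.ProcessPoolExecutor, while the small demo at the end uses a thread pool only so that it runs anywhere.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Optional

PLACEHOLDER_MAX_AGE = 10 * 60  # seconds; the "10 minutes" from the proposal


@dataclass
class Placeholder:
    """Hypothetical scale storage item: configuration is set, data starts empty."""
    uid: str
    width: int
    height: int
    data: Optional[bytes] = None
    created: float = field(default_factory=time.time)


def scale_image(original: bytes, width: int, height: int) -> bytes:
    """Stand-in for real scaling; module-level so a process pool could pickle it."""
    return f"{width}x{height}:".encode() + original


def resolve(item: Placeholder, now: float) -> str:
    """Decide what a traverser could do when the scale URL is requested."""
    if item.data is not None:
        return "serve"
    if now - item.created > PLACEHOLDER_MAX_AGE:
        return "redirect-and-requeue"  # stale placeholder: queue a new task
    return "redirect"  # fall back to the original image for now


def process_tasks(storage, originals, tasks, executor, commit):
    """Scale in parallel, then write results one at a time in completion order."""
    futures = {}
    for uid in tasks:
        item = storage[uid]
        future = executor.submit(scale_image, originals[uid], item.width, item.height)
        futures[future] = uid
    for future in as_completed(futures):
        storage[futures[future]].data = future.result()
        commit()  # one commit per written scale (assumed wiring)


if __name__ == "__main__":
    # Demo with a thread pool; a ProcessPoolExecutor could be passed instead.
    demo_storage = {"thumb": Placeholder("thumb", 128, 128)}
    with ThreadPoolExecutor(max_workers=2) as pool:
        process_tasks(demo_storage, {"thumb": b"img"}, ["thumb"], pool, lambda: None)
    print(resolve(demo_storage["thumb"], now=time.time()))
```

Writing results only from the single loop over as_completed mirrors the idea of one sequential writer: the parallel workers never touch the storage, so concurrent writes cannot conflict.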

Deliverables

This PLIP will eventually provide three pull requests evolved from the following POC branches into their respective packages:

Risks

There may be bugs.

Not all use cases may have been covered yet, resulting in empty scales (always redirecting to the original).

ZODB undo log gets bloated with image scales.

May not work with all WSGI deployments, because it requires a long-running Python thread next to the Zope worker threads.

Disabling asynchronous image scaling may leave scales without a scale value in the scaling storage. Plone will fall back to delivering the original image version when these scales are requested. Eventually the scaling storage will clean up those scales and replace them with synchronously generated ones.

Participants

datakurre added this to the Future milestone Apr 24, 2020
datakurre changed the title from "WIP: Optional asynchronous image scaling and building scales on advance" to "Asynchronous image scaling and building scales on advance" Apr 24, 2020
datakurre changed the title from "Asynchronous image scaling and building scales on advance" to "Asynchronous image scaling and building scales in advance" Apr 24, 2020

erral commented Apr 24, 2020

> May not work with all WSGI deployments, because requires long-running Python thread next to Zope worker threads.

Will this need a special setup to configure such a long-running Python thread? If so, I would say we need to document it somewhere.

datakurre commented

@erral It depends on the WSGI server. With waitress it works out of the box. It may be that even running Plone as such requires this. I don't know for sure.

datakurre changed the title from "Asynchronous image scaling and building scales in advance" to "Asynchronous image scaling" Apr 24, 2020
datakurre commented

The part of this PLIP concerning generating configured scales in advance has been split into a pull request, which I hope could be merged as an opt-in feature for plone.formwidget.namedfile without a PLIP. (My assumption is that only turning it on by default would require a PLIP, if that ever happens.) plone/plone.formwidget.namedfile#43

datakurre commented

I added the branch to our migration buildout and it clearly provides a significant improvement for us. Creating an image over plone.restapi (with 30 scales in our case) takes about 2-2.5 seconds on average (for "normal" images). This went down to about 500 ms, which is a great improvement when you have to migrate about 30,000 images.
https://community.plone.org/t/plone-scale-deferred-creation-of-scales/12210/4
