
Performance boost: use shrink-on-load and enable sequential mode #4

Closed
lovell opened this issue Jan 15, 2014 · 5 comments

lovell commented Jan 15, 2014

https://github.com/jcupitt/libvips/issues/95#issuecomment-32346240

"You could probably speed it up a bit: you're not using the jpeg shrink-on-load feature, which can give a really huge speed boost, up to about 10x, and you're not using sequential mode, which can give an extra 30% or so, even with PNG."

https://github.com/jcupitt/libvips/blob/master/tools/vipsthumbnail.c#L294 provides an example of this.

jcupitt commented Jan 15, 2014

shrink-on-load: libjpeg supports shrink-on-load by factors of 2, 4 and 8. The library can skip most of the decompression work, which gives a huge speedup. For example:

$ time vips shrink wtc.jpg x.v 8 8 --vips-leak
memory: high-water mark 200.28 MB
real    0m0.822s
user    0m1.224s
sys 0m0.080s
$ header x.v
x.v: 1171x1171 uchar, 3 bands, srgb, jpegload

Here wtc.jpg is a 10,000 x 10,000 pixel RGB jpeg. This command is decoding the entire image, doing an 8x8 pixel block average, and writing the result to x.v. Memory use is high as this is a 12-core machine and each core needs a large buffer of pixels for the shrink.

With shrink-on-load:

$ time vips jpegload wtc.jpg x.v --shrink 8 --vips-leak
memory: high-water mark 8.17 MB
real    0m0.535s
user    0m0.520s
sys 0m0.040s
$ header x.v
x.v: 1172x1172 uchar, 3 bands, srgb, jpegload
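
For reference, a minimal C sketch of asking for the same shrink-on-load through the libvips C API rather than the vips CLI (not sharp's actual code; it assumes a recent libvips and reuses wtc.jpg from the example above):

/* Minimal sketch: request libjpeg shrink-on-load via the libvips C API.
 * The "shrink" option corresponds to the 2/4/8 factors described above. */
#include <stdio.h>
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    VipsImage *image;

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    /* Decode at 1/8 size inside libjpeg instead of shrinking afterwards. */
    if (vips_jpegload("wtc.jpg", &image, "shrink", 8, NULL))
        vips_error_exit(NULL);

    printf("loaded %d x %d\n",
        vips_image_get_width(image), vips_image_get_height(image));

    g_object_unref(image);
    vips_shutdown();

    return 0;
}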

sequential mode: see this blog post:

http://libvips.blogspot.co.uk/2012/06/how-libvips-opens-file.html

Currently sharp decompresses the whole input image to a memory buffer if the decompressed image is under 100 MB (or to a disc file for larger images), then processes from that to the output.

If you turn on sequential mode when you open the file, vips will stream the image directly from the input file to the output file. This saves memory and disc traffic, but also gives a speedup, since the decompress of the input can run in parallel with the compress to the output.

The downside is that you can't use operations that need large coordinate changes (a 90-degree rotate, for example), and you can only process the input once. If you want to make several output images at different sizes, it might be quicker to decompress once to a memory or disc buffer and then process from that.
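
Requesting sequential mode through the libvips C API looks roughly like this (a sketch on a recent libvips, not sharp's code; input.png and output.v are placeholder file names):

/* Sketch: stream a PNG from loader to writer in sequential mode.
 * "access" / VIPS_ACCESS_SEQUENTIAL are the libvips option and enum;
 * the file names are placeholders. */
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    VipsImage *in;

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    if (vips_pngload("input.png", &in,
        "access", VIPS_ACCESS_SEQUENTIAL,
        NULL))
        vips_error_exit(NULL);

    /* In sequential mode each input pixel may only be read once, so the
     * image has to move top-to-bottom from decode to encode. */
    if (vips_image_write_to_file(in, "output.v", NULL))
        vips_error_exit(NULL);

    g_object_unref(in);
    vips_shutdown();

    return 0;
}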

lovell commented Jan 15, 2014

Thanks @jcupitt - I owe you (at least) a pint.

lovell added a commit that referenced this issue on Jan 16, 2014: "… to new vips_ methods from legacy im_ methods. Large performance gains all round."
lovell commented Jan 19, 2014

With a 4-core CPU I still get the best throughput with the default VIPS_ACCESS_RANDOM, so I plan to leave this in place rather than switch to sequential mode.

VIPS_ACCESS_SEQUENTIAL_UNBUFFERED:
jpeg sharp-file-file x 95.35 ops/sec ±0.60% (79 runs sampled)
jpeg sharp-file-buffer x 97.51 ops/sec ±0.51% (78 runs sampled)
png sharp-file-file x 33.14 ops/sec ±1.15% (81 runs sampled)

VIPS_ACCESS_SEQUENTIAL:
jpeg sharp-file-file x 95.68 ops/sec ±0.42% (78 runs sampled)
jpeg sharp-file-buffer x 97.84 ops/sec ±0.45% (79 runs sampled)
png sharp-file-file x 32.12 ops/sec ±1.32% (78 runs sampled)

VIPS_ACCESS_RANDOM:
jpeg sharp-file-file x 97.77 ops/sec ±0.42% (79 runs sampled)
jpeg sharp-file-buffer x 99.94 ops/sec ±0.42% (81 runs sampled)
png sharp-file-file x 52.03 ops/sec ±8.02% (88 runs sampled)
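
For context, the only thing that changes between the three runs above is the "access" hint handed to the loader. A minimal sketch with the libvips C API (recent libvips; input.jpg is a placeholder file name):

/* Sketch: open the same file with each VipsAccess hint and report its size.
 * Only the "access" value differs between the benchmark configurations. */
#include <stdio.h>
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    const VipsAccess modes[] = {
        VIPS_ACCESS_RANDOM,
        VIPS_ACCESS_SEQUENTIAL,
        VIPS_ACCESS_SEQUENTIAL_UNBUFFERED
    };

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        VipsImage *image;

        if (!(image = vips_image_new_from_file("input.jpg",
            "access", modes[i],
            NULL)))
            vips_error_exit(NULL);

        printf("mode %d: %d x %d\n", (int) modes[i],
            vips_image_get_width(image), vips_image_get_height(image));

        g_object_unref(image);
    }

    vips_shutdown();

    return 0;
}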

jcupitt commented Jan 19, 2014

Strange, I see quite a large difference with JPEG here:

$ time vips jpegload --access random theo.jpg x.v --vips-leak
memory: high-water mark 80.57 MB
real    0m0.379s
user    0m0.287s
sys 0m0.154s
$ time vips jpegload --access sequential theo.jpg x.v --vips-leak
memory: high-water mark 27.00 MB
real    0m1.010s
user    0m0.290s
sys 0m0.178s

That's with a 6k x 4k RGB JPEG. Are you doing the shrink-on-load thing now? Perhaps that's masking the difference.

The time difference is small with a png, but the memory saving is useful:

$ time vips pngload --access random theo.png x.v --vips-leak
memory: high-water mark 79.01 MB
real    0m0.737s
user    0m0.721s
sys 0m0.176s
$ time vips pngload --access sequential theo.png x.v --vips-leak
memory: high-water mark 22.84 MB
real    0m0.727s
user    0m0.707s
sys 0m0.165s

lovell commented Jan 19, 2014

Thanks John - like you I saw lower memory usage with sequential.

Use, or non-use, of mmap for these access methods could explain the performance difference, especially between different architectures/OSes (I'm using x64 Ubuntu 13.10, Linux kernel 3.11.0-15).

I'll expose access method as a pass-through option, defaulting to random to stay in line with libvips.
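
As a rough idea of what that pass-through could look like on the native side (purely hypothetical code, not sharp's implementation; the option strings are invented for illustration), a user-supplied option might map onto the VipsAccess enum like this:

/* Hypothetical sketch: map an access option string onto VipsAccess,
 * defaulting to random access to stay in line with libvips. */
#include <string.h>
#include <vips/vips.h>

static VipsAccess
access_from_option(const char *name)
{
    if (name != NULL && strcmp(name, "sequential") == 0)
        return VIPS_ACCESS_SEQUENTIAL;
    if (name != NULL && strcmp(name, "sequential-unbuffered") == 0)
        return VIPS_ACCESS_SEQUENTIAL_UNBUFFERED;
    return VIPS_ACCESS_RANDOM;  /* libvips' default */
}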

I may also try firing up a few different EC2 instances to see which method performs best in a virtualised environment.

lovell closed this as completed in d509458 on Jan 19, 2014