
Performance boost: use shrink-on-load and enable sequential mode #4

Closed
lovell opened this issue Jan 15, 2014 · 5 comments

lovell commented Jan 15, 2014

https://github.com/jcupitt/libvips/issues/95#issuecomment-32346240

"You could probably speed it up a bit: you're not using the jpeg shrink-on-load feature, which can give a really huge speed boost, up to about 10x, and you're not using sequential mode, which can give an extra 30% or so, even with PNG."

https://github.com/jcupitt/libvips/blob/master/tools/vipsthumbnail.c#L294 provides an example of this.

jcupitt commented Jan 15, 2014

shrink-on-load: libjpeg supports shrink-on-load by factors of 2, 4 and 8. The library can skip most of the decompression work, which gives a huge speedup. For example:

$ time vips shrink wtc.jpg x.v 8 8 --vips-leak
memory: high-water mark 200.28 MB
real    0m0.822s
user    0m1.224s
sys 0m0.080s
$ header x.v
x.v: 1171x1171 uchar, 3 bands, srgb, jpegload

Here wtc.jpg is a 10,000 x 10,000 pixel RGB jpeg. This command is decoding the entire image, doing an 8x8 pixel block average, and writing the result to x.v. Memory use is high as this is a 12-core machine and each core needs a large buffer of pixels for the shrink.

With shrink-on-load:

$ time vips jpegload wtc.jpg x.v --shrink 8 --vips-leak
memory: high-water mark 8.17 MB
real    0m0.535s
user    0m0.520s
sys 0m0.040s
$ header x.v
x.v: 1172x1172 uchar, 3 bands, srgb, jpegload
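
For reference, a minimal C sketch of asking for the same shrink-on-load through the libvips C API rather than the vips CLI (not sharp's actual code; it assumes a recent libvips and reuses wtc.jpg from the example above):

/* Minimal sketch: request libjpeg shrink-on-load via the libvips C API.
 * The "shrink" option corresponds to the 2/4/8 factors described above. */
#include <stdio.h>
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    VipsImage *image;

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    /* Decode at 1/8 size inside libjpeg instead of shrinking afterwards. */
    if (vips_jpegload("wtc.jpg", &image, "shrink", 8, NULL))
        vips_error_exit(NULL);

    printf("loaded %d x %d\n",
        vips_image_get_width(image), vips_image_get_height(image));

    g_object_unref(image);
    vips_shutdown();

    return 0;
}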

sequential mode: see this blog post:

http://libvips.blogspot.co.uk/2012/06/how-libvips-opens-file.html

Currently sharp decompresses the whole input image to a memory buffer if the decompressed image is under 100 MB (or to a disc file for larger images), then processes from that to the output.

If you turn on sequential mode when you open the file, vips will stream the image directly from the input file to the output file. This saves memory and disc traffic, but also gives a speedup, since the decompress of the input can run in parallel with the compress to the output.

The downside is that you can't use operations that need large coordinate changes (a 90-degree rotate, for example), and you can only process the input once. If you want to make several output images at different sizes, it might be quicker to decompress once to a memory or disc buffer and then process from that.
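
Requesting sequential mode through the libvips C API looks roughly like this (a sketch on a recent libvips, not sharp's code; input.png and output.v are placeholder file names):

/* Sketch: stream a PNG from loader to writer in sequential mode.
 * "access" / VIPS_ACCESS_SEQUENTIAL are the libvips option and enum;
 * the file names are placeholders. */
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    VipsImage *in;

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    if (vips_pngload("input.png", &in,
        "access", VIPS_ACCESS_SEQUENTIAL,
        NULL))
        vips_error_exit(NULL);

    /* In sequential mode each input pixel may only be read once, so the
     * image has to move top-to-bottom from decode to encode. */
    if (vips_image_write_to_file(in, "output.v", NULL))
        vips_error_exit(NULL);

    g_object_unref(in);
    vips_shutdown();

    return 0;
}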

lovell commented Jan 15, 2014

Thanks @jcupitt - I owe you (at least) a pint.

lovell added a commit that referenced this issue on Jan 16, 2014: "… to new vips_ methods from legacy im_ methods. Large performance gains all round."
lovell commented Jan 19, 2014

With a 4-core CPU I still get the best throughput with the default VIPS_ACCESS_RANDOM, so I plan to leave this in place rather than switch to sequential mode.

VIPS_ACCESS_SEQUENTIAL_UNBUFFERED:
jpeg sharp-file-file x 95.35 ops/sec ±0.60% (79 runs sampled)
jpeg sharp-file-buffer x 97.51 ops/sec ±0.51% (78 runs sampled)
png sharp-file-file x 33.14 ops/sec ±1.15% (81 runs sampled)

VIPS_ACCESS_SEQUENTIAL:
jpeg sharp-file-file x 95.68 ops/sec ±0.42% (78 runs sampled)
jpeg sharp-file-buffer x 97.84 ops/sec ±0.45% (79 runs sampled)
png sharp-file-file x 32.12 ops/sec ±1.32% (78 runs sampled)

VIPS_ACCESS_RANDOM:
jpeg sharp-file-file x 97.77 ops/sec ±0.42% (79 runs sampled)
jpeg sharp-file-buffer x 99.94 ops/sec ±0.42% (81 runs sampled)
png sharp-file-file x 52.03 ops/sec ±8.02% (88 runs sampled)
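
For context, the only thing that changes between the three runs above is the "access" hint handed to the loader. A minimal sketch with the libvips C API (recent libvips; input.jpg is a placeholder file name):

/* Sketch: open the same file with each VipsAccess hint and report its size.
 * Only the "access" value differs between the benchmark configurations. */
#include <stdio.h>
#include <vips/vips.h>

int
main(int argc, char **argv)
{
    const VipsAccess modes[] = {
        VIPS_ACCESS_RANDOM,
        VIPS_ACCESS_SEQUENTIAL,
        VIPS_ACCESS_SEQUENTIAL_UNBUFFERED
    };

    if (VIPS_INIT(argv[0]))
        vips_error_exit(NULL);

    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        VipsImage *image;

        if (!(image = vips_image_new_from_file("input.jpg",
            "access", modes[i],
            NULL)))
            vips_error_exit(NULL);

        printf("mode %d: %d x %d\n", (int) modes[i],
            vips_image_get_width(image), vips_image_get_height(image));

        g_object_unref(image);
    }

    vips_shutdown();

    return 0;
}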

jcupitt commented Jan 19, 2014

Strange, I see quite a large difference with JPEG here:

$ time vips jpegload --access random theo.jpg x.v --vips-leak
memory: high-water mark 80.57 MB
real    0m0.379s
user    0m0.287s
sys 0m0.154s
$ time vips jpegload --access sequential theo.jpg x.v --vips-leak
memory: high-water mark 27.00 MB
real    0m1.010s
user    0m0.290s
sys 0m0.178s

That's with a 6k x 4k RGB JPEG. Are you doing the shrink-on-load thing now? Perhaps that's masking the difference.

The time difference is small with a png, but the memory saving is useful:

$ time vips pngload --access random theo.png x.v --vips-leak
memory: high-water mark 79.01 MB
real    0m0.737s
user    0m0.721s
sys 0m0.176s
$ time vips pngload --access sequential theo.png x.v --vips-leak
memory: high-water mark 22.84 MB
real    0m0.727s
user    0m0.707s
sys 0m0.165s

lovell commented Jan 19, 2014

Thanks John - like you I saw lower memory usage with sequential.

Use, or non-use, of mmap for these access methods could explain the performance difference, especially between different architectures/OSes (I'm using x64 Ubuntu 13.10, Linux kernel 3.11.0-15).

I'll expose access method as a pass-through option, defaulting to random to stay in line with libvips.
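
As a rough idea of what that pass-through could look like on the native side (purely hypothetical code, not sharp's implementation; the option strings are invented for illustration), a user-supplied option might map onto the VipsAccess enum like this:

/* Hypothetical sketch: map an access option string onto VipsAccess,
 * defaulting to random access to stay in line with libvips. */
#include <string.h>
#include <vips/vips.h>

static VipsAccess
access_from_option(const char *name)
{
    if (name != NULL && strcmp(name, "sequential") == 0)
        return VIPS_ACCESS_SEQUENTIAL;
    if (name != NULL && strcmp(name, "sequential-unbuffered") == 0)
        return VIPS_ACCESS_SEQUENTIAL_UNBUFFERED;
    return VIPS_ACCESS_RANDOM;  /* libvips' default */
}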

I may also try firing up a few different EC2 instances to see which method performs best in a virtualised environment.

lovell closed this as completed in d509458 on Jan 19, 2014