The performance between libjpeg-turbo and libjpeg #222

Closed
redirus opened this Issue Mar 22, 2018 · 6 comments

redirus commented Mar 22, 2018

Hi, when I use the imread function to decode images in OpenCV, is performance better with libjpeg-turbo than with libjpeg on a Linux system such as CentOS?

What's the difference between libjpegturbo.so and libjpeg.so after I build libjpeg-turbo?

Is there any performance test data?

Thanks

Member

dcommander commented Mar 22, 2018

I can’t speak to the performance when using OpenCV, but the raw, low-level performance of libjpeg-turbo is 2-7 times faster than libjpeg on x86[-64] CPUs.

There are two libraries:

libjpeg.so provides the libjpeg API, which is backward compatible with the IJG’s software (specifically, jpeg-6b with the addition of arithmetic coding, the in-memory source and destination managers, and additional decompression scaling factors back-ported from jpeg-7 and jpeg-8.)

libturbojpeg.so (not libjpegturbo.so) provides the TurboJPEG API, a higher-level API that is easier to use than the libjpeg API but not as powerful. TurboJPEG requires that both source and destination images be fully memory-resident: it doesn't support buffered I/O or custom source/destination managers like the libjpeg API does, nor does it currently support partial image decompression. It does, however, provide a more straightforward interface for performing lossless transforms on a JPEG image or for compressing from/decompressing to planar YUV buffers.

Our Java interface is built upon the TurboJPEG API, which itself is just a wrapper that calls the libjpeg API behind the scenes. Thus, both APIs perform similarly, assuming that they are used with the same images and similar data flow.
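
To make "planar YUV buffer" concrete: in a 4:2:0 planar image, a full-resolution Y plane is followed by separate U and V planes at half width and half height. The sketch below computes that layout. `yuv420_planar_layout()` and the `pad` row-alignment parameter are hypothetical names for illustration, not part of TurboJPEG. TurboJPEG's own `tjBufSizeYUV2()` and `tjPlaneWidth()`/`tjPlaneHeight()` may round plane dimensions differently, so real code should use those to size buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of m (m must be > 0). */
static size_t pad_up(size_t x, size_t m) {
    assert(m > 0);
    return (x + m - 1) / m * m;
}

/* Hypothetical layout of a planar YUV 4:2:0 image stored in one buffer:
 * a full-resolution Y plane followed by half-width, half-height U and V
 * planes. Each row is padded to a multiple of `pad` bytes. */
typedef struct {
    size_t y_stride, c_stride; /* bytes per row: luma, chroma */
    size_t u_offset, v_offset; /* byte offsets of the U and V planes */
    size_t total;              /* total buffer size in bytes */
} yuv420_layout;

yuv420_layout yuv420_planar_layout(size_t width, size_t height, size_t pad) {
    yuv420_layout l;
    /* Chroma is subsampled 2x in each direction, rounding odd sizes up. */
    size_t cw = (width + 1) / 2, ch = (height + 1) / 2;
    l.y_stride = pad_up(width, pad);
    l.c_stride = pad_up(cw, pad);
    l.u_offset = l.y_stride * height;
    l.v_offset = l.u_offset + l.c_stride * ch;
    l.total = l.v_offset + l.c_stride * ch;
    return l;
}
```

For a 640x480 image with no row padding, this gives a 460,800-byte buffer: 307,200 bytes of Y followed by two 76,800-byte chroma planes.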

dcommander closed this Mar 22, 2018

Member

dcommander commented Mar 22, 2018

Benchmark data is here:
https://libjpeg-turbo.org/About/Performance

That data applies to the current stable release series (1.5.x). The upcoming beta, which I hope to release within the next week, will be faster on newer CPUs, since it uses AVX2 instructions when available.
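
A rough illustration of what "when available" means: the choice is made at run time by probing the CPU, so a single binary can use AVX2 on newer CPUs and fall back to SSE2 or plain C elsewhere. This is a minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports()`; the enum and function names are hypothetical, and libjpeg-turbo performs its own CPU detection internally rather than using code like this.

```c
#include <assert.h>

/* Code paths a hypothetical SIMD-accelerated codec might choose between. */
typedef enum { PATH_C, PATH_SSE2, PATH_AVX2 } simd_path;

/* Probe the CPU at run time and pick the widest supported path.
 * __builtin_cpu_supports() is a GCC/Clang builtin for x86 targets; on other
 * compilers or architectures this sketch simply falls back to plain C. */
simd_path select_simd_path(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return PATH_AVX2;
    if (__builtin_cpu_supports("sse2"))
        return PATH_SSE2;
#endif
    return PATH_C;
}

/* Human-readable name for a path, e.g. for logging. */
const char *simd_path_name(simd_path p) {
    switch (p) {
    case PATH_AVX2: return "AVX2";
    case PATH_SSE2: return "SSE2";
    case PATH_C:    return "plain C";
    }
    assert(!"unknown simd_path");
    return "unknown";
}
```

Because the check happens at run time rather than at compile time, the same build of a library can pick up the AVX2 speedup on machines that support it without breaking on older ones.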

redirus commented Mar 22, 2018

@dcommander Thank you for your reply.

When I used libjpeg-turbo and libjpeg respectively in OpenCV, I found that the time taken by the imread function to decode an image was almost the same.

I wonder why this happened.

dcommander added the question label Mar 22, 2018

Member

dcommander commented Mar 22, 2018

Just glancing at the OpenCV source code, a couple of things stand out:

  1. They have an in-tree version of libjpeg (not -turbo), so make sure that your OpenCV build is linking against the external libjpeg-turbo library and not the in-tree libjpeg library.
  2. They are doing expensive and unnecessary per-pixel conversion between RGB and BGR:
    https://github.com/opencv/opencv/blob/83b8cd0152332ef95ee44e6c91097232f2bdd2c0/modules/imgcodecs/src/grfmt_jpeg.cpp#L455-L456
    https://github.com/opencv/opencv/blob/83b8cd0152332ef95ee44e6c91097232f2bdd2c0/modules/imgcodecs/src/grfmt_jpeg.cpp#L690-L694
    https://github.com/opencv/opencv/blob/7763b58a602699ccb4980457880648b83ab3892c/modules/imgcodecs/src/utils.cpp#L222-L236
    This sort of pixel conversion is necessary with libjpeg but not with libjpeg-turbo, because of our colorspace extensions, and it's entirely possible that it is hiding the speedup of libjpeg-turbo. OpenCV should be extended to use the libjpeg-turbo colorspace extensions, when they are available.
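
For illustration, this is roughly what such a per-pixel pass looks like, and what libjpeg-turbo's colorspace extensions let a caller skip. `rgb_to_bgr_inplace()` is a hypothetical helper, not OpenCV's actual function. The comment at the end shows the libjpeg-turbo alternative (`JCS_EXTENSIONS` / `JCS_EXT_BGR`), which exists only in libjpeg-turbo's `jpeglib.h`, not in the IJG libjpeg.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the R and B bytes of `npixels` packed 24-bit RGB pixels in place,
 * turning RGB into BGR. A caller needs a pass like this after every
 * jpeg_read_scanlines() call when the decoder can only produce RGB but the
 * application stores pixels as BGR (as OpenCV's cv::Mat does). */
void rgb_to_bgr_inplace(unsigned char *row, size_t npixels) {
    assert(row != NULL || npixels == 0);
    for (size_t i = 0; i < npixels; i++) {
        unsigned char *p = row + 3 * i;
        unsigned char r = p[0];
        p[0] = p[2];
        p[2] = r;
    }
}

/* With libjpeg-turbo, the pass above can be dropped by requesting BGR output
 * from the decompressor before calling jpeg_start_decompress(), e.g.
 *
 *   #ifdef JCS_EXTENSIONS
 *     cinfo.out_color_space = JCS_EXT_BGR;  // libjpeg-turbo: emits BGR directly
 *   #else
 *     cinfo.out_color_space = JCS_RGB;      // plain libjpeg: swap afterwards
 *   #endif
 */
```

The extra pass touches every byte of the decoded image a second time, so for large images it can eat a noticeable share of the decode time that libjpeg-turbo's SIMD code would otherwise save.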

I haven't looked closely at their source and destination managers, but those may be sub-optimal as well. Optimizing downstream libraries such as OpenCV is certainly something I can do under paid contract, but barring that, you should contact the OpenCV developers.

nzjrs commented Apr 17, 2018

FYI: we did quite an extensive performance comparison of libjpeg vs. libjpeg-turbo (http://blog.loopbio.com/video-io-2-jpeg-decoding.html) and came to much the same conclusion as you did: OpenCV is suboptimal, and the new version of libjpeg-turbo is very fast.

Member

dcommander commented Apr 17, 2018

Great article! Something else you should be aware of is that libjpeg-turbo is currently under consideration for becoming an official ISO/ITU-T reference implementation (see #148). The format extensions that we refused to adopt from jpeg-8 and jpeg-9 ("SmartScale") were never accepted by ISO/ITU-T, so images containing those extensions are not, strictly speaking, "JPEG images" in the sense of conforming to the official specs. My own research (https://libjpeg-turbo.org/About/SmartScale) did not show the SmartScale extensions to be particularly useful.

Also, the developer community in general has been reluctant to embrace any format extensions beyond JFIF v1 with Huffman entropy coding. Even though arithmetic coding is an official part of the spec, has been supported by both libjpeg and libjpeg-turbo for many years, and is no longer encumbered by a patent, few applications will read or write arithmetic-coded JPEG images. Thus, even if the SmartScale extensions were to become officially recognized by the standards bodies (which they won't as long as the author of those extensions continues to pursue the matter quixotically rather than politically), the adoption rate of those extensions within the community would likely approach zero for the foreseeable future.

Apart from those unofficial extensions, the only format feature that libjpeg supports and we don't is wide-gamut colorspaces. That feature is an official part of the spec (from JFIF v2), so there's nothing preventing us from supporting it as well, but it would take some effort to integrate, and it wouldn't be fully SIMD-accelerated.
