[Meta] Performance bottlenecks #257

Closed
2 tasks
nwin opened this issue Jan 22, 2015 · 13 comments

Comments

@nwin
Contributor

nwin commented Jan 22, 2015

Since it gets mentioned relatively often, this bug should collect all known performance issues.

  • Inflate implementation
  • PNG encoder (filter selection)
@softprops

Has any progress been made on this? I'm trying to get a sense of how it compares performance-wise to ImageMagick.

@baadc0de

baadc0de commented Jan 3, 2017

Sorry if this is the wrong issue to report/ask about this, but is it really OK for the default GrayImage::new path to be so much slower than from_raw?

#[test]
fn allocate_gray_image_1g_raw() {
	let mut img = GrayImage::from_raw(32768, 32768, Vec::with_capacity(32768 * 32768));
}

#[test]
fn allocate_gray_image_1g() {
	let mut img = GrayImage::new(32768, 32768);
}
  • image 0.10.4
  • rustc 1.15.0-nightly (03bdaade2 2016-11-27)

@LaylBongers

Vec::with_capacity won't give you a Vec with actual values in it, it will just have the capacity to hold that amount of values. To create a properly initialized Vec you need to do this: vec![0; 32768 * 32768]
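
The difference between reserving and initializing can be checked with the standard library alone. This is a minimal sketch; the helper names reserved_only and zero_filled are made up for illustration, and the size is much smaller than the 32768 * 32768 used in the tests above. As far as I can tell, from_raw returns an Option and yields None when the buffer is shorter than the image needs, so the first test above probably never builds an image at all.

```rust
/// Reserves room for `n` bytes but stores none, so `len()` stays 0.
/// (Hypothetical helper, for illustration only.)
fn reserved_only(n: usize) -> Vec<u8> {
    Vec::with_capacity(n)
}

/// Allocates and zero-initializes `n` bytes in one step, so `len()` is `n`.
/// (Hypothetical helper, for illustration only.)
fn zero_filled(n: usize) -> Vec<u8> {
    vec![0; n]
}

fn main() {
    // Deliberately small so the example runs on any machine.
    let n = 1024 * 1024;
    let a = reserved_only(n);
    let b = zero_filled(n);
    println!("with_capacity: len = {}, capacity >= {}", a.len(), n);
    println!("vec![0; n]:    len = {}", b.len());
}
```

Only the second vector actually holds one byte per pixel, which is the work GrayImage::new has to do as well.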

@baadc0de

baadc0de commented Jan 3, 2017

Apologies :) I thought it must have been something like that... I'll explore more to see what would be the idiomatic rust way to allocate memory once instead of runtime resizing. Thanks for commenting!

@cdekok

cdekok commented Jul 22, 2017

I stumbled on this Servo issue, which is quite old: servo/servo#8271
When I ran the benchmark script against 0.14.0, the scanline-reading benchmark failed with errors. I'm not sure why; it looks like the functionality is no longer supported?

I'm not sure the test is done correctly, but the difference is huge.

Compiling the libjpeg-turbo decode scanlines (500 times) benchmark
gcc libjpeg_bench.c -o bench -O1 -std=c99 -ljpeg
Running the libjpeg-turbo decode scanlines (500 times) benchmarks
The libjpeg-turbo decode scanlines (500 times) benchmark took 3.34326982498 seconds to run. [images/nightshot.jpg]
The libjpeg-turbo decode scanlines (500 times) benchmark took 1.4089140892 seconds to run. [images/lena.jpg]
The libjpeg-turbo decode scanlines (500 times) benchmark took 2.33788299561 seconds to run. [images/artificial.jpg]
The libjpeg-turbo decode scanlines (500 times) benchmark took 0.812776088715 seconds to run. [images/rust_logo.jpg]
The libjpeg-turbo decode scanlines (500 times) benchmark took 1.2770421505 seconds to run. [images/pumpkin.jpg]
Compiling the Rust-Image JPEG Decode scanlines (500 times) benchmark
   Compiling rustimage_bench v0.1.0 (file:///home/chris/www/ImageBench/benchmarks/rustimage_bench)
    Finished release [optimized] target(s) in 10.46 secs
Running the Rust-Image JPEG Decode scanlines (500 times) benchmarks
thread 'main' panicked at 'not yet implemented', /home/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/image-0.14.0/./src/jpeg/decoder.rs:59:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The Rust-Image JPEG Decode scanlines (500 times) benchmark took 0.00345993041992 seconds to run. [images/nightshot.jpg]
thread 'main' panicked at 'not yet implemented', /home/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/image-0.14.0/./src/jpeg/decoder.rs:59:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The Rust-Image JPEG Decode scanlines (500 times) benchmark took 0.00261187553406 seconds to run. [images/lena.jpg]
thread 'main' panicked at 'not yet implemented', /home/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/image-0.14.0/./src/jpeg/decoder.rs:59:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The Rust-Image JPEG Decode scanlines (500 times) benchmark took 0.00640392303467 seconds to run. [images/artificial.jpg]
thread 'main' panicked at 'not yet implemented', /home/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/image-0.14.0/./src/jpeg/decoder.rs:59:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The Rust-Image JPEG Decode scanlines (500 times) benchmark took 0.00419807434082 seconds to run. [images/rust_logo.jpg]
thread 'main' panicked at 'not yet implemented', /home/chris/.cargo/registry/src/github.com-1ecc6299db9ec823/image-0.14.0/./src/jpeg/decoder.rs:59:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The Rust-Image JPEG Decode scanlines (500 times) benchmark took 0.00215101242065 seconds to run. [images/pumpkin.jpg]
Compiling the Rust-Image JPEG Decode full (500 times) benchmark
    Finished release [optimized] target(s) in 0.0 secs
Running the Rust-Image JPEG Decode full (500 times) benchmarks
The Rust-Image JPEG Decode full (500 times) benchmark took 11.4673569202 seconds to run. [images/nightshot.jpg]
The Rust-Image JPEG Decode full (500 times) benchmark took 3.28688502312 seconds to run. [images/lena.jpg]
The Rust-Image JPEG Decode full (500 times) benchmark took 6.14310908318 seconds to run. [images/artificial.jpg]
The Rust-Image JPEG Decode full (500 times) benchmark took 3.10858297348 seconds to run. [images/rust_logo.jpg]
The Rust-Image JPEG Decode full (500 times) benchmark took 3.23516893387 seconds to run. [images/pumpkin.jpg]

@nwin
Contributor Author

nwin commented Jul 24, 2017

The scanline interface has been removed. It doesn't make much sense because some image formats (JPEG, for example) use tiles, so they naturally don't come in scanlines.
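
To illustrate why tiled data does not map onto single scanlines, here is a small sketch assuming the 8x8 blocks of baseline JPEG (with chroma subsampling the coded unit is larger still). The function names are hypothetical and are not part of the image crate.

```rust
/// Side length in pixels of a baseline JPEG block.
const BLOCK: u32 = 8;

/// Number of 8x8 blocks needed to cover `pixels` along one axis (rounds up).
fn blocks_along(pixels: u32) -> u32 {
    (pixels + BLOCK - 1) / BLOCK
}

/// Inclusive range of scanlines covered by the block row that contains
/// scanline `y`. Assumes `height > 0` and `y < height`.
fn block_row_span(y: u32, height: u32) -> (u32, u32) {
    let first = (y / BLOCK) * BLOCK;
    let last = (first + BLOCK - 1).min(height - 1);
    (first, last)
}

fn main() {
    let (width, height) = (513, 100);
    println!("blocks per row: {}", blocks_along(width));
    let (first, last) = block_row_span(10, height);
    println!("scanline 10 lives in the block row spanning {}..={}", first, last);
}
```

Producing any one scanline means decoding every block in its block row, so a decoder gets several scanlines at once or none.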

Regarding the performance, looks like a factor of 2-4, I expected worse for unoptimized code.
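
The factor can be recomputed from the log above. This sketch divides the Rust full-decode time by the libjpeg-turbo scanline time for each image; the numbers are copied from the log, and the pairing of those two benchmarks is an assumption about which comparison is meant. Note that the C benchmark was built with -O1.

```rust
/// (image, libjpeg-turbo scanline seconds, Rust full-decode seconds),
/// copied from the benchmark log.
const RUNS: [(&str, f64, f64); 5] = [
    ("nightshot.jpg", 3.34326982498, 11.4673569202),
    ("lena.jpg", 1.4089140892, 3.28688502312),
    ("artificial.jpg", 2.33788299561, 6.14310908318),
    ("rust_logo.jpg", 0.812776088715, 3.10858297348),
    ("pumpkin.jpg", 1.2770421505, 3.23516893387),
];

/// Slowdown of the Rust decoder relative to libjpeg-turbo, per image.
fn slowdowns() -> Vec<(&'static str, f64)> {
    RUNS.iter()
        .map(|&(name, c_secs, rust_secs)| (name, rust_secs / c_secs))
        .collect()
}

fn main() {
    for (name, ratio) in slowdowns() {
        println!("{}: {:.2}x", name, ratio);
    }
}
```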

@oyvindln
Contributor

oyvindln commented Jul 24, 2017

libjpeg-turbo uses hand-written assembly and SIMD to make the mathematical operations fast; that's a bit difficult to compete with on stable Rust at this time.

@cdekok

cdekok commented Jul 24, 2017

Yes, for unoptimized code it's not too bad. I was researching this library for use in an image server, though, where it would need to be as fast as possible, and it doesn't seem mature enough for that yet.
Perhaps I can write a small thumbnailing benchmark against vips (https://github.com/jcupitt/libvips/wiki/Speed-and-memory-use); I'm curious to see how well it would perform against it.

@oyvindln
Contributor

oyvindln commented Aug 23, 2017

For jpeg (and other formats doing similar colour conversions and transforms) we really need some way of using SIMD on stable Rust to get close to the C implementations. E.g. according to valgrind, this function in jpeg accounted for 20% of the time spent (possibly more, as it uses multiple threads). With SIMD vectors one could do the calculations on multiple components or multiple pixels in parallel.
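
As an illustration of the kind of per-pixel arithmetic involved, here is a scalar YCbCr to RGB conversion using the usual JFIF coefficients. It is a sketch only: it is not necessarily the function referred to above, and the names ycbcr_to_rgb and convert_row are hypothetical. Every pixel runs the same few multiply-adds independently of its neighbours, which is exactly the shape of work that SIMD lanes can process several pixels at a time.

```rust
/// Converts one full-range YCbCr sample to RGB (JFIF coefficients).
fn ycbcr_to_rgb(y: u8, cb: u8, cr: u8) -> [u8; 3] {
    let y = y as f32;
    let cb = cb as f32 - 128.0;
    let cr = cr as f32 - 128.0;
    // Round to nearest and saturate to the 0..=255 range.
    let to_u8 = |v: f32| v.round().clamp(0.0, 255.0) as u8;
    [
        to_u8(y + 1.402 * cr),
        to_u8(y - 0.344136 * cb - 0.714136 * cr),
        to_u8(y + 1.772 * cb),
    ]
}

/// Converts three planar rows into one interleaved RGB row.
/// Each iteration is independent of the others.
fn convert_row(luma: &[u8], cb: &[u8], cr: &[u8], rgb: &mut [u8]) {
    assert_eq!(luma.len(), cb.len());
    assert_eq!(luma.len(), cr.len());
    assert_eq!(rgb.len(), luma.len() * 3);
    for (((out, &l), &b), &r) in rgb.chunks_exact_mut(3).zip(luma).zip(cb).zip(cr) {
        out.copy_from_slice(&ycbcr_to_rgb(l, b, r));
    }
}

fn main() {
    let mut rgb = [0u8; 6];
    convert_row(&[0, 255], &[128, 128], &[128, 128], &mut rgb);
    println!("{:?}", rgb);
}
```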

@OtaK

OtaK commented Jul 13, 2018

Since SIMD has landed in Rust stable, what's the status of this issue?

@AndreKR

AndreKR commented Apr 5, 2020

I compared Go's JPEG encoder with Rust's JPEG encoder and Go's is quite a bit faster:

Go:

395.0226ms
381.0218ms
399.0228ms
378.0216ms
376.0215ms

Rust (--release):

494 ms
460 ms
500 ms
483 ms
486 ms

I manually tuned the quality setting so that the file size is approximately equal (around 409 kB). This is at 75 in Go and 66 in Rust. The visual quality seems pretty much identical with my test photo.

Go's JPEG encoder does not use any SIMD instructions (unless the compiler uses them automatically, which I don't think it does): https://golang.org/src/image/jpeg/writer.go

Note that Go's encoder does not write proper headers (which means no density values), which is a problem, but I don't think that causes the difference in performance.
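
To put a number on "quite a bit faster", the five samples from each run above can be averaged. This is just arithmetic on the timings as posted; five samples per side is a small basis, so treat the ratio as a rough indication.

```rust
/// Encode times in milliseconds, copied from the Go run above.
const GO_MS: [f64; 5] = [395.0226, 381.0218, 399.0228, 378.0216, 376.0215];
/// Encode times in milliseconds, copied from the Rust run above.
const RUST_MS: [f64; 5] = [494.0, 460.0, 500.0, 483.0, 486.0];

/// Arithmetic mean of a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    let go = mean(&GO_MS);
    let rust = mean(&RUST_MS);
    println!("Go mean:   {:.1} ms", go);
    println!("Rust mean: {:.1} ms", rust);
    println!("Rust / Go: {:.2}", rust / go);
}
```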

Go code
package main

import (
	"fmt"
	"image/jpeg"
	"image/png"
	"log"
	"os"
	"time"
)

func main() {
	f, err := os.Open("example.png")
	check(err)
	i, err := png.Decode(f)
	check(err)
	_ = f.Close()
	for {
		f, err := os.Create("go-temp.jpg")
		check(err)
		start := time.Now()
		err = jpeg.Encode(f, i, &jpeg.Options{Quality: 75})
		check(err)
		err = f.Close()
		check(err)
		fmt.Println(time.Since(start))
		err = os.Rename("go-temp.jpg", "go.jpg")
		check(err)
	}
}

func check(err error) {
	if err != nil {
		log.Fatalln(err)
	}
}
Rust code
use image::ImageDecoder;
use std::io::Write;
use std::time::Instant;

fn main() {
    let decoder = image::png::PngDecoder::new(std::fs::File::open("example.png").unwrap()).unwrap();
    let mut img: Vec<u8> = vec![0; decoder.total_bytes() as usize];
    let (width, height) = decoder.dimensions();
    let color_type = decoder.color_type();
    decoder.read_image(img.as_mut()).unwrap();

    loop {
        let start = Instant::now();
        {
            let w = std::fs::File::create("rust-temp.jpg").unwrap();
            let mut bw = std::io::BufWriter::new(w);
            let mut encoder = image::jpeg::JPEGEncoder::new_with_quality(&mut bw, 66);
            encoder.encode(&img, width, height, color_type).unwrap();
            bw.flush().unwrap();
        }
        println!("{} ms", start.elapsed().as_millis());
        std::fs::rename("rust-temp.jpg", "rust.jpg").unwrap();
    }
}
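
One detail in the two programs above: the Go timer starts after os.Create, while the Rust timer starts before File::create, so the Rust figure also includes creating the file. That is probably a small part of the gap, but the effect of file I/O can be taken out entirely by encoding into memory. The harness below is a sketch using only the standard library; time_in_memory is a hypothetical helper, and the closure body is a stand-in for the real encoder call, which would take the buffer as its writer.

```rust
use std::io::Write;
use std::time::{Duration, Instant};

/// Runs `work` `runs` times, each time against a fresh in-memory buffer,
/// and times only the call itself. (Hypothetical helper.)
fn time_in_memory<F>(runs: usize, mut work: F) -> Vec<Duration>
where
    F: FnMut(&mut Vec<u8>),
{
    (0..runs)
        .map(|_| {
            let mut sink = Vec::new();
            let start = Instant::now();
            work(&mut sink);
            start.elapsed()
        })
        .collect()
}

fn main() {
    // Stand-in workload: a real benchmark would run the JPEG encoder here
    // with `sink` as the output writer.
    let samples = time_in_memory(5, |sink| {
        sink.write_all(&[0u8; 4096]).unwrap();
    });
    for sample in samples {
        println!("{} us", sample.as_micros());
    }
}
```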

@nlfiedler

I'm using the image crate to generate thumbnails, and yes, it is a bit on the slow side. I'm no expert, but one thing I noticed while looking through libvips (which may be fast, but crashes quite easily) is that it uses "shrink-on-load" features to create thumbnails from JPEG images.

https://github.com/libvips/libvips/wiki/HOWTO----Image-shrinking

Hope that helps (or you already knew this and this was no help at all).
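
As I understand it, shrink-on-load relies on the JPEG decoder being able to produce the image at 1/2, 1/4 or 1/8 scale directly, so most of the pixels never have to be decoded at full size. A sketch of how a thumbnailer might pick that factor follows; shrink_on_load_factor is a hypothetical name and nothing here is an existing API of the image crate or libvips.

```rust
/// Picks the largest factor among 8, 4, 2, 1 such that the image decoded
/// at 1/factor is still at least `target_width` wide. The remaining
/// downscale would be done by an ordinary resize afterwards.
/// (Hypothetical helper, for illustration only.)
fn shrink_on_load_factor(src_width: u32, target_width: u32) -> u32 {
    let mut factor = 8;
    while factor > 1 && src_width / factor < target_width {
        factor /= 2;
    }
    factor
}

fn main() {
    let (src, target) = (4000, 200);
    let factor = shrink_on_load_factor(src, target);
    println!(
        "decode at 1/{} ({} px wide), then resize to {} px",
        factor,
        src / factor,
        target
    );
}
```

Stopping at the last factor that still leaves the image at least as wide as the target keeps the final resize a pure downscale.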

@fintelia
Contributor

There have been many performance optimizations since this issue was created. Please create dedicated issues if there are specific bottlenecks you notice that aren't being tracked already.
