Slow under macOS #28

Closed
lamarqua opened this issue Aug 29, 2016 · 12 comments

lamarqua commented Aug 29, 2016

I've found minifb to be pretty slow under OS X. My machine is a MacBook Pro (2.5 GHz) running El Capitan.

A simple blank test:

extern crate minifb;
extern crate time;

const WIDTH: usize = 320;
const HEIGHT: usize = 200;

fn main() {

    let mut buffer: Vec<u32> = vec![0; WIDTH * HEIGHT];

    let mut window = match minifb::Window::new("Test - ESC to exit", WIDTH, HEIGHT,
                                               minifb::WindowOptions { scale: minifb::Scale::X4, borderless : false, title: true, resize: false }) {
        Ok(win) => win,
        Err(err) => {
            println!("Unable to create window {}", err);
            return;
        }
    };


    let mut last_time = time::PreciseTime::now();
    while window.is_open() && !window.is_key_down(minifb::Key::Escape) {
        window.update_with_buffer(&buffer);
        let now = time::PreciseTime::now();
        println!("Frame time: {}", last_time.to(now).num_milliseconds());
        last_time = now;
    }
}

Gives me around 42 ms per frame already!

Frame time: 44
Frame time: 44
Frame time: 46
Frame time: 49
Frame time: 43
Frame time: 42
Frame time: 44
Frame time: 41
Frame time: 43
Frame time: 43

It seems that Scale::X4 is the culprit. With X1, I get better results, albeit not ideal:

Frame time: 23
Frame time: 6
Frame time: 19
Frame time: 22
Frame time: 11
Frame time: 16
Frame time: 17
Frame time: 15
Frame time: 17
Frame time: 22
Frame time: 11
Frame time: 16

There is huge variability in frame times.
I get much better results from simple drawing code using either the glium or SDL2 bindings, but I much prefer the simplicity of minifb.
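To put a number on that variability rather than eyeballing the log, the per-frame durations could be collected and summarized. The following is a minimal sketch using only the standard library; `FrameStats`, `frame_stats`, and `time_frames` are hypothetical helper names for illustration, not part of minifb.

```rust
use std::time::{Duration, Instant};

/// Summary of a set of frame times, in milliseconds.
/// (Hypothetical helper type, not part of minifb.)
#[derive(Debug, PartialEq)]
pub struct FrameStats {
    pub min_ms: f64,
    pub max_ms: f64,
    pub mean_ms: f64,
}

/// Computes min, max and mean over recorded frame durations.
/// Returns `None` for an empty slice.
pub fn frame_stats(frames: &[Duration]) -> Option<FrameStats> {
    if frames.is_empty() {
        return None;
    }
    // Convert every duration to fractional milliseconds once.
    let ms: Vec<f64> = frames.iter().map(|d| d.as_secs_f64() * 1000.0).collect();
    let min_ms = ms.iter().cloned().fold(f64::INFINITY, f64::min);
    let max_ms = ms.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mean_ms = ms.iter().sum::<f64>() / ms.len() as f64;
    Some(FrameStats { min_ms, max_ms, mean_ms })
}

/// Calls `render` `frames` times and returns the duration of each call.
pub fn time_frames<F: FnMut()>(frames: usize, mut render: F) -> Vec<Duration> {
    let mut out = Vec::with_capacity(frames);
    for _ in 0..frames {
        let start = Instant::now();
        render();
        out.push(start.elapsed());
    }
    out
}
```

In the test program above, the closure passed to `time_frames` would wrap the `window.update_with_buffer(&buffer)` call, and the resulting `FrameStats` can be printed with `{:?}`.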

Owner

emoon commented Aug 29, 2016

Hum... I'm not sure why this would be the case. I pretty much only use system/API calls to draw this in here https://github.com/emoon/rust_minifb/blob/master/src/native/macosx/OSXWindowFrameView.m#L25

I figured that this would be pretty fast, but maybe it isn't.

@lamarqua
Author

I'm not super comfortable with OS X (despite using it as a dev platform...) but I'll see if I can investigate this with a profiler.

Owner

emoon commented Aug 29, 2016

Instruments (which comes with Xcode) should be able to trace down into the calls of a running process, so that will likely give a hint on what's going on. I suspect that some of the operations I thought would run on the GPU (like the scaling) happen on the CPU instead.

jedahan commented Aug 29, 2016

This is an old flamegraph I made while testing performance with minifb: https://github.com/jedahan/rustboy/blob/master/pretty-graph.svg It might be some help, might not.

jedahan commented Aug 29, 2016

It seems the majority of time in update_with_framebuffer is spent in argb32_image_mark_rgb24, which according to this Stack Overflow question is kinda slow: http://stackoverflow.com/questions/21665473/coregraphics-argb32-image-mark-rgb24-is-slow . Maybe there is a way to create a 32-bit context so no conversion is needed.

jedahan commented Aug 29, 2016

And a possible solution from http://stackoverflow.com/questions/33075557/avoiding-colorspace-transformations-when-blitting-mac-os-x-10-11-sdk#33076871

colorSpace = ::CGDisplayCopyColorSpace(::CGMainDisplayID());

if (!colorSpace)
    colorSpace = CGColorSpaceCreateDeviceRGB();

jedahan pushed a commit to jedahan/rust_minifb that referenced this issue Aug 29, 2016
This should avoid unnecessary calls to argb32_image_mark_rgb24,
which is kinda slow. See emoon#28.

jedahan commented Aug 29, 2016

In the benchmark above, I got a ~25% total speedup, from an average of 44 ms to 33 ms per frame, with that patchset. Tested on my (retina) MacBookPro12,1.
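For reference, the arithmetic behind that figure can be written out as below. This is only a sketch of the calculation; the function names are made up for illustration. Going from 44 ms to 33 ms is a 25% reduction in frame time, which corresponds to moving from roughly 22.7 to roughly 30.3 frames per second.

```rust
/// Percentage reduction in frame time going from `before_ms` to `after_ms`.
/// (Hypothetical helper, named here for illustration only.)
pub fn frame_time_reduction_pct(before_ms: f64, after_ms: f64) -> f64 {
    (before_ms - after_ms) / before_ms * 100.0
}

/// Frames per second implied by a frame time given in milliseconds.
pub fn fps(frame_ms: f64) -> f64 {
    1000.0 / frame_ms
}
```

Calling `frame_time_reduction_pct(44.0, 33.0)` and `fps(33.0)` reproduces the numbers quoted above.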

jedahan commented Aug 29, 2016

Following this post http://carol-nichols.com/2015/12/09/rust-profiling-on-osx-cpu-time/ , here are the flamegraphs with and without the change:

https://github.com/jedahan/scrap/blob/master/slow.svg

https://github.com/jedahan/scrap/blob/master/fast.svg

There's a bit of a speedup, but argb32_image_mark_argb32 is still the slowest function...

Owner

emoon commented Aug 30, 2016

Thanks for the info. Yeah I wonder what that argb32_image_mark_argb32 is doing.

Owner

emoon commented Aug 30, 2016

Maybe it's actually doing the scaling inside this function (instead of doing it on the GPU, which would cost close to 0 ms of CPU time and be very fast).
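To give a feel for why CPU-side scaling would hurt, here is a rough sketch of a nearest-neighbour upscale in Rust. It is not what CoreGraphics does internally (that is unknown here), and `upscale_nearest` is a hypothetical function; it only illustrates the amount of work involved. With the 320x200 buffer from the test program and Scale::X4, the output is 1280x800, so 1,024,000 pixels are written per frame, 16 times the 64,000 source pixels.

```rust
/// Nearest-neighbour upscale of a `width` x `height` buffer of 32-bit pixels
/// by an integer factor `scale`. Illustration only: an assumption about the
/// kind of work a CPU scaler performs, not the actual CoreGraphics code path.
pub fn upscale_nearest(src: &[u32], width: usize, height: usize, scale: usize) -> Vec<u32> {
    assert_eq!(src.len(), width * height, "buffer size must match dimensions");
    let dst_width = width * scale;
    let dst_height = height * scale;
    let mut dst = Vec::with_capacity(dst_width * dst_height);
    for y in 0..dst_height {
        // Every destination row maps back to exactly one source row.
        let row_start = (y / scale) * width;
        let src_row = &src[row_start..row_start + width];
        for x in 0..dst_width {
            // Every destination pixel copies its nearest source pixel.
            dst.push(src_row[x / scale]);
        }
    }
    dst
}
```

Every output pixel is touched once on the CPU for each frame, which is exactly the work a GPU-based path would take off the CPU.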

Owner

emoon commented Aug 30, 2016

To me it seems like the only way to get around it would be to implement this in OpenGL or Metal :/ Unless someone has a better idea.

Owner

emoon commented Oct 23, 2018

Hi, I have now rewritten the macOS backend to use Metal instead, which should make this a lot faster. If you are still using minifb, it would be great to hear whether this improves things for you. I will close this for now, but if it doesn't help, please reopen it.
