
Resizing the Pixel Buffer #261

Closed
markusmoenig opened this issue Feb 8, 2022 · 9 comments
Labels
question Usability question

Comments

@markusmoenig

Hi,

maybe a dumb question, but is it possible to resize the pixel buffer when the window resizes? I.e., keep the pixel buffer and the window at the same size?

@parasyte
Owner

parasyte commented Feb 8, 2022

There are multiple ways to resize the pixel buffer.

For instance, you might mean that you want the pixel buffer dimensions to physically change to match the dimensions of the window. E.g. changing the pixel buffer resolution. This is supported with the resize_buffer method.
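As a CPU-side illustration of what changing the buffer resolution involves, here is a minimal sketch (the helper name is hypothetical; pixels' resize_buffer manages the GPU-side texture for you, and this sketch makes no claim about how pixels treats old contents):

```rust
/// Reallocate an RGBA8 pixel buffer from (ow, oh) to (nw, nh), copying the
/// overlapping region and filling any new space with opaque black.
/// (Hypothetical helper for illustration only.)
fn resize_rgba(old: &[u8], ow: usize, oh: usize, nw: usize, nh: usize) -> Vec<u8> {
    let mut new = vec![0u8; nw * nh * 4];
    // Default every pixel to opaque black.
    for px in new.chunks_exact_mut(4) {
        px[3] = 0xff;
    }
    // Copy the rows/columns present in both the old and new dimensions.
    let copy_w = ow.min(nw) * 4;
    for y in 0..oh.min(nh) {
        let dst = y * nw * 4;
        let src = y * ow * 4;
        new[dst..dst + copy_w].copy_from_slice(&old[src..src + copy_w]);
    }
    new
}
```

Note that a flat RGBA buffer cannot simply be truncated or extended: each row's stride changes with the width, hence the row-by-row copy.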

Another thing you might mean is that you don't like the black border that is added when you resize the window, and instead you want the pixel buffer to be stretched to fill the empty space. Doing this is possible, but it has some minor complications, and there are also a number of ways to "fill the empty space". I'll enumerate some of the common ways that I am aware of:

  1. "Stretch to fit" means the image will be stretched to fit fully-contained within the window without clipping. This adds black "letterbox" or "pillarbox" borders to maintain the aspect ratio of the original image.
  2. "Stretch to fill" means the image will be stretched to completely fill the window, but its top/bottom or left/right sides will be clipped to maintain the aspect ratio without adding a border.
  3. "Stretch" means just stretch the image so all four of its corners are in all four corners of the window without clipping. This changes the aspect ratio and causes a "squashed" look with most window resolutions.

These methods can also be combined with changing the image resolution as in resize_buffer to add or subtract extra space in the image to fill in the empty space. You can see some examples of this here: http://melonjs.github.io/melonJS/docs/me.video.html#.init

If you want to do any of these, you can use a custom render pass for it. I just put one together for the "stretch to fit" method in this branch: https://github.com/parasyte/pixels/tree/fill-window/examples/fill-window I'll create a PR for it shortly.

I hope that's a good answer! It might have raised even more questions for you. But that's also because scaling can be complicated and there are a lot of ways to think about it.

@parasyte parasyte added the question Usability question label Feb 8, 2022
@markusmoenig
Author

Hi

Wow, great answer, thanks! I somehow missed the resize_buffer function; it works fine. The different stretch renderers all sound intriguing, and I'm looking forward to seeing the "stretch to fit" example.

I made an Xcode project which interfaces with my Rust lib, reads the pixel buffer at 60 FPS, and blits it to the screen using Metal. This means I can use pixels to distribute my game on Windows / Linux, and native Xcode / Metal to deploy it easily on macOS / iOS / tvOS.

Such a great setup! Thanks a lot for this. And for what I am doing (classic RPGs) the CPU is doing just fine.

@parasyte
Owner

parasyte commented Feb 9, 2022

For "stretch to fill", change this line to max:

.min(screen_height / texture_height)
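To make the min/max choice concrete, here is a standalone sketch (a hypothetical helper, not part of pixels) of the scale factor each mode produces:

```rust
/// Scale factor for drawing a `tw` x `th` texture into a `sw` x `sh` window.
/// "Stretch to fit" takes the smaller axis ratio (letterbox/pillarbox border,
/// no clipping); "stretch to fill" takes the larger one (clips, no border).
fn scale_factor(sw: f32, sh: f32, tw: f32, th: f32, fill: bool) -> f32 {
    let rx = sw / tw;
    let ry = sh / th;
    if fill { rx.max(ry) } else { rx.min(ry) }
}

fn main() {
    // A 320x240 texture in a 1280x720 window:
    // fit  -> min(4.0, 3.0) = 3.0 (pillarbox on the sides)
    // fill -> max(4.0, 3.0) = 4.0 (top/bottom clipped)
    println!("fit:  {}", scale_factor(1280.0, 720.0, 320.0, 240.0, false));
    println!("fill: {}", scale_factor(1280.0, 720.0, 320.0, 240.0, true));
}
```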

@parasyte
Owner

parasyte commented Feb 9, 2022

FWIW, I just opened #262, even if it might not be exactly what you are looking for; it is superficially related, though. I don't know what you would like to do with this issue now. Has the question been adequately answered, or is there more I can do to help?

@markusmoenig
Author

Thanks, #262 looks great. For my use case #170 would make the biggest difference, but I can understand that this is a lot of work.

Thanks again for the help and yes, we can close this issue.

@parasyte
Owner

parasyte commented Feb 9, 2022

For my use case #170 would make the biggest difference, but I can understand that this is a lot of work.

Yeah! It would be awesome to just do as much as possible on the GPU. I don't know exactly what it would look like in the scope of pixels though. 😭

I made an Xcode project which interfaces with my Rust lib, reads the pixel buffer at 60 FPS, and blits it to the screen using Metal

Oh I forgot to respond to this earlier, but I do have a thought on this. pixels already uses Metal on macOS and iOS. I haven't done anything with iOS ever. But if it's anything like macOS, I think all you really need to do is provide Pixels with a surface that implements raw_window_handle. Then you don't need to do any extra copying.

But again, I don't have experience with iOS (or tvOS, if those are different for some reason). It might be that having two paths to the GPU with the same stack is the best approach, but I don't think it is necessary? In any case, I'm glad you got it to work and that you are happy with it!

Such a great setup! Thanks a lot for this. And for what I am doing (classic RPGs) the CPU is doing just fine.

I love hearing how it is used and seeing it in action is always validating. Feel free to share any project details if you like! It is absolutely worth creating a "showcase" issue for it. I'll pin the issue to guide more views there, too.

@markusmoenig
Author

markusmoenig commented Feb 9, 2022

Re #170: what I understand it would mean for pixels is that you can request areas on the target device which you can process separately (maybe even in different threads), instead of one big destination texture which is hard to split into parts for processing.

Re Xcode: yeah, it is strange; even though pixels uses Metal, it is very hard to deploy it to the App Stores as a standalone package. The easiest way right now is to bypass pixels completely and just hand a pixel buffer to my library (a buffer which would normally be provided by pixels). The cool thing about Metal on Apple Silicon is that CPU / GPU memory is shared, i.e. copying data to the GPU is very fast.

There may be a way to utilize pixels, but with the Metal API you get a much more efficient event loop; for example, you can adjust the target FPS on the fly (i.e. only update the screen when something is happening). The way I see it, pixels with the wgpu backend always runs at 60 FPS? I tried the sleep approach you mentioned in another thread, but CPU usage was still at 35% or so for me; with a native event loop it is way lower, like 2-3%.

My project will be GPL licensed; I will publish it when it has progressed far enough, and will let you know!

@parasyte
Owner

parasyte commented Feb 9, 2022

Re #170: what I understand it would mean for pixels is that you can request areas on the target device which you can process separately (maybe even in different threads), instead of one big destination texture which is hard to split into parts for processing.

Ah! That's actually really easy with rayon if you only care about vertically slicing the buffer. If you need to slice it into a 2D grid, then that is more work.

The way I see it, pixels with the wgpu backend always runs at 60 FPS? I tried the sleep approach you mentioned in another thread, but CPU usage was still at 35% or so for me; with a native event loop it is way lower, like 2-3%.

The FPS depends entirely on your event loop and the present mode for the GPU. The default configuration lets the GPU block your event loop to maintain 60 FPS. But you can take more fine-grained control with your own sleeping to slow it down, or disable the GPU blocking and present frames as fast as your hardware will allow. I can't say for sure why you see such a drastic improvement with a different event loop.
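Doing your own sleeping can be sketched like this (a minimal frame pacer using only std; the type and method names are made up for illustration, and pixels itself does not pace your loop):

```rust
use std::time::{Duration, Instant};

/// Minimal frame pacer: call `sleep_until_next_frame` once per loop
/// iteration to cap the loop at roughly `target_fps`.
struct FramePacer {
    frame_budget: Duration,
    last: Instant,
}

impl FramePacer {
    fn new(target_fps: u32) -> Self {
        Self {
            // e.g. 100 FPS -> a 10 ms budget per frame
            frame_budget: Duration::from_secs(1) / target_fps,
            last: Instant::now(),
        }
    }

    fn sleep_until_next_frame(&mut self) {
        // Sleep away whatever is left of this frame's time budget.
        let elapsed = self.last.elapsed();
        if elapsed < self.frame_budget {
            std::thread::sleep(self.frame_budget - elapsed);
        }
        self.last = Instant::now();
    }
}
```

You would call this at the end of each redraw, combined with a non-blocking present mode if you also want the GPU side uncapped. Note that thread::sleep only guarantees a minimum duration, so the actual frame rate will land slightly below the target.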

lol, I get 2,840 fps on my machine when I uncap minimal-winit! It's totally CPU-bound. Distributing the workload to multiple threads actually slows it down to about 2,500 fps. But there is probably a lot of false sharing in this example because the pixel buffer is so small, and my CPU has a relatively large 64 KiB L1 cache per core.

With a 1920x1080 buffer, a single thread can push about 500 fps over here. When I send multiple lines of the buffer (20 at a time) to each CPU core with rayon, I can get 970 fps out of it. Almost double!

It's not a linear speedup, though. I have 12 physical cores (24 logical). Amdahl's law strikes again! I'm only using 50% of my CPU power by dividing the pixel buffer work to all 24 logical cores. There's another bottleneck somewhere else. Perhaps in the guts of wgpu or winit. Or maybe even the NVIDIA driver. I haven't actually looked, but my curiosity is satisfied.

My conclusion is that multiple threads only helps when you can avoid false sharing (especially L1 cache clashing) with a large pixel buffer. And even when you don't have that issue, writing to the pixel buffer on the CPU-side is not actually the slowest part.


Here's the updated World::draw method using rayon:

fn draw(&self, frame: &mut [u8]) {
    // Requires `use rayon::prelude::*;` for `par_chunks_exact_mut`.
    // Number of lines to process per-thread.
    // `HEIGHT` must be evenly divisible by this number.
    const LINES: usize = 20;

    frame
        .par_chunks_exact_mut(WIDTH as usize * 4 * LINES)
        .enumerate()
        .for_each(|(j, line)| {
            for (i, pixel) in line.chunks_exact_mut(4).enumerate() {
                let i = j * WIDTH as usize * LINES + i;
                let x = (i % WIDTH as usize) as i16;
                let y = (i / WIDTH as usize) as i16;

                let inside_the_box = x >= self.box_x
                    && x < self.box_x + BOX_SIZE
                    && y >= self.box_y
                    && y < self.box_y + BOX_SIZE;

                let rgba = if inside_the_box {
                    [0x5e, 0x48, 0xe8, 0xff]
                } else {
                    [0x48, 0xb2, 0xe8, 0xff]
                };

                pixel.copy_from_slice(&rgba);
            }
        });
}

@markusmoenig
Author

Thanks again for the info and the rayon code. I will look more into rayon and the sleep() issue. I am coming from Swift / Metal and am not 100% Rust-native yet; working on it :)

But again, the main advantage of using Xcode on the Mac side is that you can easily deploy to iOS / tvOS, which would otherwise be a nightmare to get right (if it is possible at all for tvOS). And if you use Xcode, you have a skeleton app which uses its own event loop; afaik it is not possible to have an external event loop / window inside an Xcode project (but I may be wrong here). Anyway, it's working, and that's all that matters.

Thanks again!
