Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preview: QtPreview: Copy source to a temporary line buffer for speed #471

Merged
merged 1 commit into from
Feb 15, 2023

Conversation

njhollinghurst
Copy link
Collaborator

This simple change reduces viewfinder CPU-per-frame by a factor of at least 2 (tested on both ARM32 and ARM64).
Though where preview rate was limited by CPU it might not actually reduce total cycles (but might improve frame rate).

This reduces the benefit of the previous "U,V sharing" optimization, but the latter has been retained, to reduce code churn.

Further optimizations such as vectorizing the YUV->RGB conversion in signed 16-bit (rather than float) would yield diminishing returns and are not planned. If we really wanted further speedup we should arrange for a correctly-sized RGB888 source image!

Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>
@6by9
Copy link
Collaborator

6by9 commented Feb 14, 2023

@naushir was looking at whether using allocations from the /dev/dma_heap/linux,cma dma heap gave the benefits desired (namely CPU cached memory) without jumping through the hoops of copying the data manually.

He'd made comment that it was relatively straightforward, but I know little more (we already jump through hoops allocating the buffers with REQ_BUFS, export them all, REQBUFS(0), and then switch to DMABUF import mode).

@njhollinghurst
Copy link
Collaborator Author

njhollinghurst commented Feb 14, 2023

Yes. The benefit may depend on how the pixels are accessed. For some access patterns it might be better to keep it out of the cache (to avoid evicting other stuff) -- unless ARM can detect that it's being "streamed" and magically confine it to a small part of the cache?

x264 (for example) seems to do a global copy of each incoming frame. I'm not sure if that's essential (due to stride/padding requirements?) or if it can be bypassed; in the latter case a cacheable buffer might make sense.

A H/W codec would presumably not care if the buffers were cacheable, provided they were never dirty.

@naushir
Copy link
Collaborator

naushir commented Feb 15, 2023

I think this is worth putting in now until we get the dma heap framebuffer allocations work in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants