preview: QtPreview: Copy source to a temporary line buffer for speed #471

njhollinghurst · 2023-02-14T16:08:48Z

This simple change reduces viewfinder CPU-per-frame by a factor of at least 2 (tested on both ARM32 and ARM64).
Though where preview rate was limited by CPU it might not actually reduce total cycles (but might improve frame rate).

This reduces the benefit of the previous "U,V sharing" optimization, but the latter has been retained, to reduce code churn.

Further optimizations such as vectorizing the YUV->RGB conversion in signed 16-bit (rather than float) would yield diminishing returns and are not planned. If we really wanted further speedup we should arrange for a correctly-sized RGB888 source image!

Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>

6by9 · 2023-02-14T16:22:20Z

@naushir was looking at whether using allocations from the /dev/dma_heap/linux,cma dma heap gave the benefits desired (namely CPU cached memory) without jumping through the hoops of copying the data manually.

He'd made comment that it was relatively straightforward, but I know little more (we already jump through hoops allocating the buffers with REQ_BUFS, export them all, REQBUFS(0), and then switch to DMABUF import mode).

njhollinghurst · 2023-02-14T16:37:45Z

Yes. The benefit may depend on how the pixels are accessed. For some access patterns it might be better to keep it out of the cache (to avoid evicting other stuff) -- unless ARM can detect that it's being "streamed" and magically confine it to a small part of the cache?

x264 (for example) seems to do a global copy of each incoming frame. I'm not sure if that's essential (due to stride/padding requirements?) or if it can be bypassed; in the latter case a cacheable buffer might make sense.

A H/W codec would presumably not care if the buffers were cacheable, provided they were never dirty.

naushir · 2023-02-15T16:15:27Z

I think this is worth putting in now until we get the dma heap framebuffer allocations work in.

preview: QtPreview: Copy source to a temporary line buffer for speed

07e8de0

Signed-off-by: Nick Hollinghurst <nick.hollinghurst@raspberrypi.com>

naushir merged commit e90d21d into main Feb 15, 2023

njhollinghurst deleted the preview-memcpy branch February 15, 2023 16:24

naushir mentioned this pull request Feb 23, 2023

Question: why is DRM preview failing if it is not master #475

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

preview: QtPreview: Copy source to a temporary line buffer for speed #471

preview: QtPreview: Copy source to a temporary line buffer for speed #471

njhollinghurst commented Feb 14, 2023

6by9 commented Feb 14, 2023

njhollinghurst commented Feb 14, 2023 •

edited

Loading

naushir commented Feb 15, 2023

preview: QtPreview: Copy source to a temporary line buffer for speed #471

preview: QtPreview: Copy source to a temporary line buffer for speed #471

Conversation

njhollinghurst commented Feb 14, 2023

6by9 commented Feb 14, 2023

njhollinghurst commented Feb 14, 2023 • edited Loading

naushir commented Feb 15, 2023

njhollinghurst commented Feb 14, 2023 •

edited

Loading