Unaligned copies on 8/16 bpp formats on DX11 #2318

fkaa opened this Issue Aug 12, 2018 · 0 comments



fkaa commented Aug 12, 2018

The copy shaders in shaders\copy.hlsl work in 32-bit strides and load one uint per thread. The addressing logic is also quite simple, and it does not work correctly on formats smaller than 32 bpp when the x offset and/or width is not aligned to 4.
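To make the alignment condition concrete, here is a small CPU-side sketch (not the shader itself) that checks whether one row of a copy region starts and ends on a 32-bit boundary. The helper name `row_is_uint_aligned` and its parameters are hypothetical and only serve as illustration.

```python
BYTES_PER_UINT = 4  # the copy shader moves one 32-bit uint per thread


def row_is_uint_aligned(x_offset, width, bytes_per_texel):
    """Return True if a row's byte range starts and ends on a 4-byte boundary.

    x_offset and width are in texels; bytes_per_texel is 1 (8 bpp),
    2 (16 bpp) or 4 (32 bpp). Hypothetical helper for illustration only.
    """
    start = x_offset * bytes_per_texel
    end = start + width * bytes_per_texel
    return start % BYTES_PER_UINT == 0 and end % BYTES_PER_UINT == 0


# 32 bpp: every texel boundary is also a uint boundary.
print(row_is_uint_aligned(1, 3, 4))
# 8 bpp: an x offset of 1 texel starts in the middle of a uint.
print(row_is_uint_aligned(1, 3, 1))
# 16 bpp: an x offset of 1 texel starts 2 bytes into a uint.
print(row_is_uint_aligned(1, 2, 2))
```

For 32 bpp formats the check always passes, which is why only the 8 and 16 bpp formats are affected.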

Currently, the dispatches for these smaller formats are scaled down so that they fit the "1 load per thread" model, reading and storing 4 texels (8 bpp) or 2 texels (16 bpp) at a time. A potential solution is to not scale at all, do 1 load per texel like the other format copies, and "waste" up to 24 bits of bandwidth per load.
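The difference between the two strategies can be modelled on the CPU. The sketch below is an assumption about how a scaled-down dispatch behaves (offsets and width divided by the number of texels per uint, remainders dropped), not a transcription of copy.hlsl. The function names are hypothetical.

```python
def copy_row_packed(src, dst, src_x, dst_x, width, bytes_per_texel):
    """Model of the scaled-down copy: one 32-bit word per 'thread'.

    Assumes offsets and width are integer-divided by texels-per-word,
    so any remainder is silently lost.
    """
    texels_per_word = 4 // bytes_per_texel
    src_word = src_x // texels_per_word
    dst_word = dst_x // texels_per_word
    words = width // texels_per_word
    for t in range(words):
        s = (src_word + t) * 4
        d = (dst_word + t) * 4
        dst[d:d + 4] = src[s:s + 4]


def copy_row_per_texel(src, dst, src_x, dst_x, width, bytes_per_texel):
    """Model of the proposed copy: one texel per 'thread', no scaling."""
    for t in range(width):
        s = (src_x + t) * bytes_per_texel
        d = (dst_x + t) * bytes_per_texel
        dst[d:d + bytes_per_texel] = src[s:s + bytes_per_texel]


src = bytearray(range(100, 116))  # 16 texels of an 8 bpp row

packed = bytearray(16)
copy_row_packed(src, packed, 1, 1, 5, 1)      # unaligned: x = 1, width = 5

per_texel = bytearray(16)
copy_row_per_texel(src, per_texel, 1, 1, 5, 1)

print(list(packed[:8]))     # texel 0 overwritten, texels 4 and 5 missed
print(list(per_texel[:8]))  # exactly texels 1..5 copied
```

With aligned arguments (for example x = 4, width = 4 at 8 bpp) both models produce the same result. They diverge only in the unaligned case.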

Note that raw buffers support unaligned 32-bit loads at byte granularity.
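Taking that note at face value, a per-texel load could read a full 32-bit word at the texel's byte offset and mask off the unused upper bits. The following is a CPU-side model of that idea using Python's `struct` module, assuming little-endian byte order. `load_texel` is a hypothetical name, and the model says nothing about what a given driver or GPU actually permits.

```python
import struct


def load_texel(buf, byte_offset, bytes_per_texel):
    """Read 32 bits at an arbitrary byte offset and keep only one texel.

    The buffer must have at least 4 readable bytes at byte_offset, so a
    real copy would need padding (or a narrower path) at the very end.
    """
    (word,) = struct.unpack_from("<I", buf, byte_offset)  # unaligned 32-bit load
    mask = (1 << (8 * bytes_per_texel)) - 1               # low 8, 16 or 32 bits
    return word & mask


buf = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])

print(hex(load_texel(buf, 1, 1)))  # 8 bpp texel at byte 1
print(hex(load_texel(buf, 3, 2)))  # 16 bpp texel straddling a uint boundary
```

For an 8 bpp texel, 24 of the 32 loaded bits are discarded, which is the "wasted" bandwidth mentioned above.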
