Read WorkgroupSize as whole vector from the spec constant value #59
Tested on release and debug builds, and in hardware.
LGTM
No, not so fast. Doesn't work in some scenarios. My HW case wasn't right.
I did see this bug on NVIDIA, but after filing it their latest drivers have fixed the issue fwiw.
Change the implementation of get_local_size to avoid some driver bugs:
- One issue is trouble extracting individual elements. It seems to work better to read the entire vector at once.
- Another issue is that we have to extract from the spec composite value directly rather than from a variable initialized with that spec constant value.
Take the bitwise-and of the WorkgroupSize spec constant value *with itself* before extracting one of its components. This works around a driver bug. Added a TODO to remove this when we can.
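As a rough illustration of why the bitwise-and workaround is safe, here is a minimal C++ model, not the actual compiler change. `UVec3`, `bitwise_and`, and `get_local_size_model` are hypothetical names invented for this sketch. It assumes WorkgroupSize is a three-component unsigned vector and relies on the OpenCL rule that get_local_size returns 1 for an out-of-range dimension. Since `v & v == v` for every component, the extra operation changes only the shape of the generated code, not the result.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the three-component WorkgroupSize value.
using UVec3 = std::array<std::uint32_t, 3>;

// Component-wise bitwise AND, modeling a vector bitwise-and.
constexpr UVec3 bitwise_and(const UVec3 &a, const UVec3 &b) {
  return {a[0] & b[0], a[1] & b[1], a[2] & b[2]};
}

// Models get_local_size(dim): AND the whole vector with itself first
// (an identity operation), then extract the requested component.
// Out-of-range dimensions yield 1, as OpenCL specifies.
constexpr std::uint32_t get_local_size_model(const UVec3 &wgsize,
                                             unsigned dim) {
  const UVec3 whole = bitwise_and(wgsize, wgsize);
  return dim < 3 ? whole[dim] : 1u;
}
```

For a workgroup size of (8, 4, 2), `get_local_size_model` gives back 8, 4, and 2 for dimensions 0 through 2, the same values as reading the components directly.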
Yes, I had the same experience.
Rebased. I'd like to revisit this hack in the future.
LGTM