[LIBCLC][AMDGCN] Fix get_max_sub_group_size #5386
Closed
Using defines to determine the wavefront size here is incorrect: libclc is not built for a specific amdgcn version, so the result always defaults to `64`. Instead, use the `__oclc_wavefront64` global variable provided by ROCm, which is set to a different value depending on the architecture.
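The idea can be illustrated with a minimal host-side sketch in plain C. This is not the libclc source: `oclc_wavefront64_standin` is a hypothetical stand-in for ROCm's `__oclc_wavefront64` (which in a device build would be an external symbol whose value is supplied for the target architecture), and `max_sub_group_size` is a hypothetical name for the function being fixed. The sketch assumes the variable is a flag that is non-zero on wavefront-64 targets and zero on wavefront-32 targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ROCm's __oclc_wavefront64. In a device build the
 * real symbol would be declared extern and defined elsewhere; here it is a
 * plain global so the sketch compiles on a host. */
static bool oclc_wavefront64_standin = true;

/* Hypothetical helper: choose the sub-group size from the flag's value
 * instead of from a preprocessor define fixed when the library was built. */
static uint32_t max_sub_group_size(void) {
    return oclc_wavefront64_standin ? 64u : 32u;
}
```

Because the value is read from a variable rather than baked in by the preprocessor, the same library code can report 64 or 32 depending on which value the variable carries for the target.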
This may fix some of the discrepancies between tests run on `gfx908` and tests run on the CI. The CI hardware uses a wavefront size of 32, which does not match what this function returned, and this function is used in the implementation of the group collectives.

So it may fix:
and potentially the discrepancies I was seeing on: