[cuDNN][SDPA] Handle noncontig nested tensors in cuDNN SDPA #164958
Closed
Previously we hardcoded the assumption in cuDNN that the inputs would be dense, which breaks when, e.g., the user chunks tensors, yielding noncontiguous inputs.
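To illustrate why chunking breaks a dense-layout assumption, here is a minimal pure-Python sketch of the stride arithmetic involved. It is not the cuDNN or PyTorch code path; the helper names (`contiguous_strides`, `is_dense`, `chunk_last_dim`) and the packed QKV shape are hypothetical and chosen only for illustration. The idea is that a chunk taken as a view along the last dimension keeps the parent buffer's strides, so it no longer matches the row-major strides of its own shape.

```python
def contiguous_strides(shape):
    """Row-major (dense) strides, in elements, for the given shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def is_dense(shape, strides):
    """True if strides match the row-major layout for shape.

    Dimensions of size 1 are ignored, since their stride is never used.
    """
    expected = contiguous_strides(shape)
    return all(s == e for n, s, e in zip(shape, strides, expected) if n != 1)


def chunk_last_dim(shape, strides, chunks):
    """Shape and strides of one chunk taken as a view along the last dim.

    The view shrinks the last dimension but reuses the parent's strides.
    """
    assert shape[-1] % chunks == 0
    return shape[:-1] + (shape[-1] // chunks,), strides


# Hypothetical packed QKV buffer: (batch, seq, heads, 3 * head_dim)
packed_shape = (2, 8, 4, 3 * 16)
packed_strides = contiguous_strides(packed_shape)

# Splitting into q, k, v along the last dim yields views with parent strides.
q_shape, q_strides = chunk_last_dim(packed_shape, packed_strides, 3)

print(is_dense(packed_shape, packed_strides))  # the packed buffer is dense
print(is_dense(q_shape, q_strides))            # the chunked view is not
```

Any kernel setup that derives strides from the shape alone would compute the wrong layout for `q_shape` here, which is the kind of failure described above.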
New test added in `test/test_transformers.py` to check this when `TORCH_CUDNN_SDPA_NESTED_TENSOR_ENABLED=1` is set.
One issue I noticed was that the old gating of nested tensors in `sdp_utils.cpp` seems to be a no-op? All of the inputs are reported as "dense" by the time that function is called in the nested tensor tests in `test/test_nestedtensor.py -k sdpa`.
cc @csarofeen @ptrblck @xwang233 @cpuhrsch @jbschlosser @bhosmer @drisspg @soulitzer @davidberard98 @YuqingJ