Dont mutate tensor stride in place in cudnn conv #126786

eellison · 2024-05-21T15:59:37Z

Stack from ghstack (oldest at bottom):

-> Dont mutate tensor stride in place in cudnn conv #126786

Within the cudnn convolution, we were updating the tensor's strides in place to disambiguate size-1 dims between contiguous and channels-last tensors. Instead of mutating the tensor's strides, just use a temporary. cudnn-frontend then copies the strides it is given: https://github.com/NVIDIA/cudnn-frontend/blob/d7ccb5b3c47b4de709604cce463ad66b775b7812/include/cudnn_frontend_Tensor.h#L201-L203.
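As a rough illustration of the change described above, the following C++ sketch computes the disambiguated strides into a temporary and returns it, leaving the input strides untouched. The function name `normalized_strides` and the use of `std::vector` are assumptions made for this example; it is loosely modeled on the idea behind fixSizeOneDimStride and is not the code from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the ATen function): returns a corrected copy of
// `stride` and never writes to the caller's vectors.
std::vector<int64_t> normalized_strides(const std::vector<int64_t>& size,
                                        const std::vector<int64_t>& stride,
                                        bool channels_last) {
  assert(size.size() == stride.size());
  const int dim = static_cast<int>(size.size());
  std::vector<int64_t> fixed(stride);  // the temporary that gets modified
  if (dim < 2) return fixed;

  // Visit dims from innermost to outermost for the requested layout.
  std::vector<int> order;
  if (channels_last) order.push_back(1);
  for (int d = dim - 1; d > 1; --d) order.push_back(d);
  if (!channels_last) order.push_back(1);
  order.push_back(0);

  // A size-1 dim can carry any stride, so give it the stride that a dense
  // tensor in the requested layout would have at that position.
  int64_t expected = 1;
  for (int d : order) {
    if (size[d] == 1) {
      fixed[d] = expected;
    } else {
      expected *= size[d];
    }
  }
  return fixed;
}
```

For a tensor of size (2, 1, 3, 3) with contiguous strides (9, 9, 3, 1), the channels-last call yields (9, 1, 3, 1), while the caller's stride vector is left as it was.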

[ghstack-poisoned]

pytorch-bot · 2024-05-21T15:59:40Z

✅ No Failures

As of commit 0734941 with merge base 7e166e8 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 5109bc89093fb2e0dca22675133c3ffbf9ee8633 Pull Request resolved: #126786

eqy

Thanks!

Does this need to also be fixed for legacy API use-cases that are creating TensorDescriptors by passing Tensors directly and calling .stride() on them? e.g.,

pytorch/aten/src/ATen/cudnn/Descriptors.cpp, line 38 in d8f5627:

    void TensorDescriptor::set(const at::Tensor &t, at::MemoryFormat memory_format, size_t pad) {

eellison · 2024-05-21T18:17:24Z

What are the call sites of those APIs? Do they invoke fixSizeOneDimStride?

eellison · 2024-05-21T18:28:02Z

We don't compile RNNs in torch.compile, so I would prefer a separate issue/fix for the other legacy call sites.

eqy · 2024-05-21T18:37:35Z

@eellison Yes, I think the overloads of set() bottom out on the variant here

pytorch/aten/src/ATen/cudnn/Descriptors.h, line 169 in b40fb2d:

    void set(cudnnDataType_t dataType, int dim, int* size, int* stride, bool nhwc) {

that calls fixSizeOneDimStride. The call sites are, IIRC, RNN, Conv v7, and maybe other legacy API users such as batchnorm?
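To make the concern in this comment concrete, here is a toy C++ sketch of a pointer-based setter that rewrites strides in place, together with a call-site guard that hands it a scratch copy. Both `fix_in_place` and `strides_for_descriptor` are hypothetical names, and the setter only handles the contiguous (NCHW) ordering to keep the example short. Whether the legacy ATen call sites actually pass pointers into live tensor metadata is the open question in this thread; the sketch does not answer it.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a setter that takes raw pointers and overwrites the
// strides of size-1 dims in place. Not the ATen implementation.
void fix_in_place(int dim, const int* size, int* stride) {
  int expected = 1;
  for (int d = dim - 1; d >= 0; --d) {
    if (size[d] == 1) {
      stride[d] = expected;  // mutation visible to whoever owns `stride`
    } else {
      expected *= size[d];
    }
  }
}

// Hypothetical call-site guard: copy the strides first, then let the setter
// mutate the copy, so the original buffer is never written to.
std::vector<int> strides_for_descriptor(const std::vector<int>& size,
                                        const std::vector<int>& stride) {
  assert(size.size() == stride.size());
  std::vector<int> scratch(stride);
  fix_in_place(static_cast<int>(scratch.size()), size.data(), scratch.data());
  return scratch;
}
```

With this shape, any in-place rewrite stays confined to `scratch`, which is the same idea as the temporary used in this PR.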

Fix for #126241. Within the cudnn convolution, we were updating the tensor's strides in place to disambiguate size-1 dims between contiguous and channels-last tensors. Instead of mutating the tensor's strides, just use a temporary. cudnn-frontend then copies the strides it is given: https://github.com/NVIDIA/cudnn-frontend/blob/d7ccb5b3c47b4de709604cce463ad66b775b7812/include/cudnn_frontend_Tensor.h#L201-L203. [ghstack-poisoned]

ghstack-source-id: 3de7eb31768a05fe2e9686b1b2c183fa8aee31c6 Pull Request resolved: #126786

ezyang

This is fine as a band-aid, but it would be better to refactor the call sites.

eellison · 2024-05-21T23:40:06Z

@pytorchbot merge

pytorchmergebot · 2024-05-21T23:42:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Dont mutate tensor stride in place in cudnn conv

ab6145d

[ghstack-poisoned]

eellison requested a review from eqy as a code owner May 21, 2024 15:59

eellison added a commit that referenced this pull request May 21, 2024

Dont mutate tensor stride in place in cudnn conv

83e059c

ghstack-source-id: 5109bc89093fb2e0dca22675133c3ffbf9ee8633 Pull Request resolved: #126786

eellison requested review from shunting314, ezyang and anijain2305 May 21, 2024 16:01

eqy reviewed May 21, 2024

eellison added a commit that referenced this pull request May 21, 2024

Dont mutate tensor stride in place in cudnn conv

5f7a88f

ghstack-source-id: 3de7eb31768a05fe2e9686b1b2c183fa8aee31c6 Pull Request resolved: #126786

eellison requested a review from eqy May 21, 2024 19:10

eellison added a commit that referenced this pull request May 21, 2024

Dont mutate tensor stride in place in cudnn conv

98f22bb

ghstack-source-id: bf7ad7fcf19506b9387511d895e4ce412903a385 Pull Request resolved: #126786

ezyang approved these changes May 21, 2024

shunting314 approved these changes May 21, 2024

eqy approved these changes May 21, 2024

eellison added the topic: not user facing topic category label May 21, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 21, 2024

pytorchmergebot added the merging label May 21, 2024

pytorchmergebot added the Merged label May 22, 2024

pytorchmergebot closed this in 28f29e0 May 22, 2024

pytorchmergebot removed the merging label May 22, 2024

github-actions bot deleted the gh/eellison/654/head branch June 22, 2024 05:30
