[BE] Update cudnn to 9.10.1.4 #155122
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155122
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures, 202 Pending, 1 Unrelated Failure
As of commit 41b4572 with merge base da1f898:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
08f097e to f9d8019
@nWEIdia We also need to update cusparseLt now that PyTorch supports the newer API. I recommend updating 12.6 and 12.8 to 0.7.1 to minimize the number of minor versions we need to support.
@atalman do you mind uploading the wheels in question?
lgtm
nvidia_cudnn_cu12-9.10.1.4 files were uploaded to the index.
f9d8019 to a3bfbf7
Do we need similar torch inductor changes in benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_training.csv as in the previous PR? Or is this cudnn version better?
Perhaps also add inductor labels.
I think I remember another PR raised the tolerance, so we might be okay! #154109 Fingers crossed!
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 4 checks: docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.13-clang10 / test (default, 2, 5, linux.4xlarge), windows-arm64-binary-libtorch-release / libtorch-cpu-shared-with-deps-release-build
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -I
❌ 🤖 pytorchbot command failed:
Try |
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 6 checks: Build manywheel docker images for s390x / build-docker-cpu-s390x, docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-jammy-py3-clang12-executorch / build, pull / linux-focal-py3.13-clang10 / test (default, 2, 5, linux.4xlarge), pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge), windows-arm64-binary-libtorch-release / libtorch-cpu-shared-with-deps-release-build
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed
Reason: 3 jobs have failed, first few of them are: linux-binary-libtorch / libtorch-rocm6_4-shared-with-deps-release-build / build, linux-binary-manywheel / manywheel-py3_9-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_10-rocm6_4-build / build
Details for Dev Infra team
Raised by workflow job
@nWEIdia Should we merge this or go ahead with the newer version? Seems like the newer version is a bugfix release, so I don't see the harm.
@Skylion007 Going step by step is a better idea. Yes, let's land this PR first with 9.10.1.4.
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 11 checks: pull / linux-jammy-py3-clang12-executorch / build, trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2), linux-binary-libtorch / libtorch-rocm6_4-shared-with-deps-release-build / build, linux-binary-manywheel / manywheel-py3_9-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_10-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_11-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13t-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_12-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13-xpu-test, linux-binary-manywheel / manywheel-py3_13t-xpu-test
@nWEIdia Oh whoops, yeah. Something went wrong with my last rebase, I'll open a new PR.
Hi @Skylion007, please change the title accordingly: this is not a cudnn update PR, it changes a single test instead.
@pytorchmergebot revert -c nosignal -m "wrong pr description"
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 73220d5. Reverted #155122 on behalf of https://github.com/atalman due to wrong pr description ([comment](#155122 (comment)))
@Skylion007 your PR has been successfully reverted.
It looks like we no longer need this PR. Closing to save some resources.
Follow up to #152782