[BE] Update cudnn to 9.10.1.4 #155122
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155122
Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 202 Pending, 1 Unrelated Failure

As of commit 41b4572 with merge base da1f898:

NEW FAILURES - The following jobs have failed.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 08f097e to f9d8019.
@nWEIdia We also need to update cusparseLt now that PyTorch supports the newer API. I recommend updating the 12.6 and 12.8 builds to 0.7.1 to minimize the number of minor versions we need to support (sketched below).
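To make the suggestion concrete, here is a hypothetical sketch of the pinning policy being proposed. The dictionary and helper are an editor's illustration, not PyTorch's actual CI config; the real pins live in the project's CI install scripts.

```python
# Hypothetical illustration of the pinning policy suggested above; the
# names and structure are an editor's sketch, not actual project config.
CUSPARSELT_PINS = {
    "12.6": "0.7.1",
    "12.8": "0.7.1",  # same version on purpose: one minor-version API to support
}

def cusparselt_pin(cuda_version: str) -> str:
    """Return the cusparseLt version pinned for a given CUDA toolkit build."""
    return CUSPARSELT_PINS[cuda_version]
```

Keeping both CUDA builds on the same cusparseLt minor version means only one API surface has to be supported in the bindings, which is the point of the recommendation above.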
@atalman do you mind uploading the wheels in question?
lgtm
nvidia_cudnn_cu12-9.10.1.4 files were uploaded to the index.
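As a sanity check after a wheel bump like this, one can confirm which cuDNN a given PyTorch build actually loaded. This is a minimal sketch using the public `torch.backends.cudnn` API; the 91001 value for 9.10.1 assumes cuDNN 9's major*10000 + minor*100 + patch version encoding (check cudnn.h for the authoritative formula).

```python
# Minimal sketch: confirm which cuDNN this PyTorch build loaded.
import torch

print(torch.backends.cudnn.is_available())  # True if cuDNN was found
print(torch.backends.cudnn.version())       # e.g. 91001 for cuDNN 9.10.1 (assumed encoding)
```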
Force-pushed from f9d8019 to a3bfbf7.
Do we need similar torch inductor changes in benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_training.csv as in the previous PR? Or is this cudnn version better?
Perhaps also add inductor labels.

I think I remember another PR raised the tolerance, so we might be okay! #154109, fingers crossed! (The tolerance point is sketched below.)
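For context, this is roughly what the tolerance question comes down to: a new cuDNN release may select different kernels, shifting outputs by small amounts, and a raised tolerance absorbs that drift. A minimal sketch with made-up values (the real CI thresholds live in the ci_expected_accuracy CSVs):

```python
# Made-up values for illustration only; the real thresholds live in the
# ci_expected_accuracy CSVs. A new cuDNN may pick different kernels,
# shifting results slightly, and a raised tolerance absorbs the drift.
import torch

eager = torch.tensor([1.000000, 2.000000])
compiled = torch.tensor([1.000001, 2.000002])  # tiny kernel-choice drift

print(torch.allclose(eager, compiled, rtol=1e-7))  # False: tolerance too strict
print(torch.allclose(eager, compiled, rtol=1e-5))  # True: raised tolerance passes
```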
@pytorchbot merge -i |
Merge started
Your change will be merged while ignoring the following 4 checks: docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.13-clang10 / test (default, 2, 5, linux.4xlarge), windows-arm64-binary-libtorch-release / libtorch-cpu-shared-with-deps-release-build
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -I |
❌ 🤖 pytorchbot command failed: Try `@pytorchbot --help` for more info.
@pytorchbot merge -i |
Merge started
Your change will be merged while ignoring the following 6 checks: Build manywheel docker images for s390x / build-docker-cpu-s390x, docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-jammy-py3-clang12-executorch / build, pull / linux-focal-py3.13-clang10 / test (default, 2, 5, linux.4xlarge), pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge), windows-arm64-binary-libtorch-release / libtorch-cpu-shared-with-deps-release-build
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Merge failed
Reason: 3 jobs have failed, first few of them are: linux-binary-libtorch / libtorch-rocm6_4-shared-with-deps-release-build / build, linux-binary-manywheel / manywheel-py3_9-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_10-rocm6_4-build / build
Details for Dev Infra team: Raised by workflow job
@nWEIdia Should we merge this or go ahead with the newer version? Seems like the newer version is a bugfix release, so I don't see the harm.
@Skylion007 Going step by step is a better idea. Yes, let's land this PR first with 9.10.1.4.
@pytorchbot merge -i |
Merge started
Your change will be merged while ignoring the following 11 checks: pull / linux-jammy-py3-clang12-executorch / build, trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2), linux-binary-libtorch / libtorch-rocm6_4-shared-with-deps-release-build / build, linux-binary-manywheel / manywheel-py3_9-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_10-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_11-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13t-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_12-rocm6_4-build / build, linux-binary-manywheel / manywheel-py3_13-xpu-test, linux-binary-manywheel / manywheel-py3_13t-xpu-test
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
@nWEIdia Oh whoops, yeah. Something went wrong with my last rebase; I'll open a new PR.
Hi @Skylion007, please change the title accordingly: this is not a cudnn update PR, it changes a single test instead.
@pytorchmergebot revert -c nosignal -m "wrong pr description"
@pytorchbot successfully started a revert job. Check the current status here.

This reverts commit 73220d5. Reverted #155122 on behalf of https://github.com/atalman due to wrong pr description ([comment](#155122 (comment)))
@Skylion007 your PR has been successfully reverted.
It looks like we no longer need this PR. Closing to save some resources. |

Follow-up to #152782.