@Skylion007
Collaborator

Follow-up to #152782.

@Skylion007 Skylion007 requested review from atalman, eqy and nWEIdia June 4, 2025 15:07
@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners June 4, 2025 15:07
@pytorch-bot

pytorch-bot bot commented Jun 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155122

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 202 Pending, 1 Unrelated Failure

As of commit 41b4572 with merge base da1f898:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed, but the failure was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 4, 2025
@Skylion007 Skylion007 requested review from malfet and ngimel June 4, 2025 15:09
@Skylion007 Skylion007 added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Jun 4, 2025
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-10-1-4 branch from 08f097e to f9d8019 Compare June 4, 2025 15:26
@Skylion007
Collaborator Author

Skylion007 commented Jun 4, 2025

@nWEIdia We also need to update cusparseLt now that PyTorch supports the newer API. I recommend updating 12.6 and 12.8 to 0.7.1 to minimize the number of minor versions we need to support.

@malfet
Contributor

malfet commented Jun 4, 2025

@atalman do you mind uploading the wheels in question?

Contributor

@atalman atalman left a comment

lgtm

@atalman
Contributor

atalman commented Jun 4, 2025

nvidia_cudnn_cu12-9.10.1.4 files were uploaded to the index.

@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-10-1-4 branch from f9d8019 to a3bfbf7 Compare June 4, 2025 18:25
Collaborator

@nWEIdia nWEIdia left a comment

Do we need similar torch inductor changes in benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_training.csv as in the previous PR? Or is this cuDNN version better?

Perhaps also add inductor labels.

@Skylion007
Collaborator Author

Skylion007 commented Jun 4, 2025

> Do we need similar torch inductor changes in benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_training.csv as in the previous PR? Or is this cuDNN version better?
>
> Perhaps also add inductor labels.

I think I remember another PR (#154109) raising the tolerance, so we might be okay. Fingers crossed!

@Skylion007
Collaborator Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 4, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: docker-builds / docker-build (linux.12xlarge, pytorch-linux-jammy-py3-clang12-executorch), pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, linux.2xlarge), pull / linux-focal-py3.13-clang10 / test (default, 2, 5, linux.4xlarge), windows-arm64-binary-libtorch-release / libtorch-cpu-shared-with-deps-release-build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@Skylion007
Collaborator Author

@pytorchbot merge -I

@pytorch-bot

pytorch-bot bot commented Jun 5, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -I

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.
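
The failure above comes down to case-sensitive option parsing: `-i` is accepted, `-I` is not. A minimal sketch of that behavior with Python's `argparse` follows. It is not the bot's actual implementation. Only the subcommand names come from the usage line quoted above; the long option name `--ignore-current` and the helper names `build_parser` and `try_parse` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the bot's parser. The subcommand names are
    # copied from the usage line in the bot comment; everything else is assumed.
    parser = argparse.ArgumentParser(prog="@pytorchbot")
    sub = parser.add_subparsers(dest="command", required=True)

    merge = sub.add_parser("merge")
    # Assumed option: a lowercase short flag. argparse matches short flags
    # case-sensitively, so "-I" is not treated as "-i".
    merge.add_argument("-i", "--ignore-current", action="store_true")

    for name in ("revert", "rebase", "label", "drci", "cherry-pick", "close"):
        sub.add_parser(name)
    return parser


def try_parse(comment: str):
    """Return the parsed namespace, or None if the command is rejected."""
    tokens = comment.split()
    if not tokens or tokens[0] != "@pytorchbot":
        return None
    try:
        return build_parser().parse_args(tokens[1:])
    except SystemExit:
        # argparse prints its usage and error text to stderr, then exits.
        return None


accepted = try_parse("@pytorchbot merge -i")
rejected = try_parse("@pytorchbot merge -I")
```

In this sketch `accepted.ignore_current` is `True`, while `rejected` is `None` because the parser reports `-I` as an unrecognized argument.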

@Skylion007
Collaborator Author

@pytorchbot merge -i

@Skylion007 Skylion007 linked an issue Jun 5, 2025 that may be closed by this pull request
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@nWEIdia
Collaborator

nWEIdia commented Jun 7, 2025

cc @eqy did we want to evaluate 9.10.2? In this PR or a separate one?

@Skylion007
Collaborator Author

@nWEIdia Should we merge this or go ahead with the newer version? The newer version seems to be a bugfix release, so I don't see the harm.

@nWEIdia
Collaborator

nWEIdia commented Jun 10, 2025

@Skylion007 Going step by step is a better idea. Yes, let's land this PR first with 9.10.1.4.
And you are most welcome to create a follow-up PR for 9.10.2 (9.10.2.21 as shown in this link), thanks!

@Skylion007
Collaborator Author

@pytorchbot merge -i

@nWEIdia
Collaborator

nWEIdia commented Jun 10, 2025

Did this PR somehow become a no-op, i.e. is it missing the cuDNN bumps, or am I missing something?

@Skylion007
Collaborator Author

@nWEIdia Oh whoops, yeah. Something went wrong with my last rebase. I'll open a new PR.

@Skylion007
Collaborator Author

@nWEIdia Fixed here: #155575. Please approve.

@atalman
Contributor

atalman commented Jun 10, 2025

Hi @Skylion007, please change the title accordingly. This is not a cuDNN update PR; it changes a single test instead.

@atalman
Contributor

atalman commented Jun 10, 2025

@pytorchmergebot revert -c nosignal -m "wrong pr description"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jun 10, 2025
This reverts commit 73220d5.

Reverted #155122 on behalf of https://github.com/atalman due to wrong pr description (see comment on #155122)
@pytorchmergebot
Collaborator

@Skylion007 your PR has been successfully reverted.

@nWEIdia
Collaborator

nWEIdia commented Jun 10, 2025

It looks like we no longer need this PR. Closing to save some resources.

@nWEIdia nWEIdia closed this Jun 10, 2025

Labels

better-engineering, ci-no-td, ciflow/binaries, ciflow/inductor, ciflow/trunk, Merged, open source, Reverted, topic: not user facing

Development

Successfully merging this pull request may close these issues.

support for cuDNN 9.8+

7 participants