Nightly cuda 12_4 failures #125429

Closed
atalman opened this issue May 2, 2024 · 3 comments · Fixed by pytorch/builder#1808
Labels
module: binaries (Anything related to official binaries that we release to users)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

atalman (Contributor) commented May 2, 2024

🐛 Describe the bug

After merging CUDA 12.4 for nightly, I observe the following failures:

libtorch-cuda12_4-shared-with-deps-cxx11-abi-build:
https://github.com/pytorch/pytorch/actions/runs/8920273020/job/24505255627

Build Official Docker Images 12_4:
https://github.com/pytorch/pytorch/actions/runs/8920271198/job/24498006544

We need to resolve these before we continue adding CUDA 12.4 CI workflows.

Versions

2.4.0

cc @seemethere @malfet @osalpekar

atalman added the module: binaries and triaged labels May 2, 2024
nWEIdia added a commit to nWEIdia/builder that referenced this issue May 3, 2024
Make it generic to cuda 12.
Fixes pytorch/pytorch#125429 partially for the
libtorch failure.

nWEIdia (Collaborator) commented May 3, 2024

Attempt 3 worked for the libtorch build issue: https://github.com/pytorch/pytorch/actions/runs/8935076086

nWEIdia (Collaborator) commented May 3, 2024

For the Docker image failure, see the related discussion: https://gitlab.com/nvidia/container-images/cuda/-/issues/225

atalman pushed a commit to pytorch/builder that referenced this issue May 6, 2024
* Fix a cuda 12.1 specific file and library usage.
Make it generic to cuda 12.
Fixes pytorch/pytorch#125429 partially for the
libtorch failure.

* The corresponding .so.12 seems to be absent, use just .so instead.

* Remove trailing space
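
The idea in the commit message above (stop hard-coding a CUDA 12.1 file name, prefer a name that works for any CUDA 12 release, and fall back to the plain `.so` when the `.so.12` file is absent) can be sketched roughly as follows. This is only an illustration, not the actual change in pytorch/builder: the helper `find_cuda_lib`, the library stem `libexample`, and the directory layout are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_cuda_lib(lib_dir: Path, stem: str, cuda_major: int) -> Optional[Path]:
    """Return the first existing candidate for a shared library, or None.

    Hypothetical helper: it prefers the major-versioned name
    (e.g. libexample.so.12) and falls back to the unversioned
    name (libexample.so) when the versioned file is absent.
    """
    candidates = [
        lib_dir / f"{stem}.so.{cuda_major}",  # same name for every CUDA 12.x release
        lib_dir / f"{stem}.so",               # fallback when no .so.12 file exists
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

Under these assumptions, a call such as `find_cuda_lib(Path("/usr/local/cuda/lib64"), "libexample", 12)` would behave the same for CUDA 12.1 and 12.4, because nothing in the lookup depends on the minor version.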

nWEIdia (Collaborator) commented May 6, 2024

I will file a new bug for the Docker job failure.
