
Modify CI to enable tests on CUDA 13.1 and add Dockerfiles for CUDA 13.1 #21564

Merged
kswiecicki merged 1 commit into sycl from luszczewskakasia1_workflow-cuda13
Apr 20, 2026
Conversation

@luszczewskakasia1
Contributor

No description provided.

@luszczewskakasia1 luszczewskakasia1 force-pushed the luszczewskakasia1_workflow-cuda13 branch from bfebf80 to 21002aa on March 19, 2026 10:02
Comment thread .github/workflows/ur-build-hw.yml Fixed
@luszczewskakasia1 luszczewskakasia1 force-pushed the luszczewskakasia1_workflow-cuda13 branch 2 times, most recently from d93c4dc to c545d4e on March 22, 2026 21:58
@luszczewskakasia1 luszczewskakasia1 force-pushed the luszczewskakasia1_workflow-cuda13 branch 3 times, most recently from 07681fd to 4ba045d on March 30, 2026 13:34
@rbanka1 rbanka1 force-pushed the luszczewskakasia1_workflow-cuda13 branch from 9547d92 to 774d559 on April 2, 2026 08:45
@rbanka1 rbanka1 requested a review from bratpiorka April 2, 2026 12:27
@bratpiorka
Contributor

bratpiorka commented Apr 2, 2026

LGTM. The failing tests from the SYCL pre-commit / CUDA 13.1 job will be disabled in a separate PR, with appropriate issue trackers.

@rbanka1 rbanka1 marked this pull request as ready for review April 9, 2026 06:57
@rbanka1 rbanka1 requested a review from a team as a code owner April 9, 2026 06:57
@rbanka1 rbanka1 requested a review from sarnex April 13, 2026 10:28
target_devices: cuda:gpu
- name: NVIDIA/CUDA 13.1
runner: '["Linux", "cuda13"]'
image: "ghcr.io/intel/llvm/ubuntu2404_intel_drivers_cuda131:latest"
Contributor

@sarnex sarnex Apr 13, 2026

Is there a reason to add new containers/workflows rather than replacing the existing ones? I would prefer to avoid the new complexity if possible, thanks.

Contributor

Old containers will not work because CUDA 13+ requires a new toolkit (installed in the container) and matching drivers (installed on the host machine).
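To illustrate the split described above (toolkit in the container, driver on the host), here is a minimal sketch of the relevant image-build steps, assuming an Ubuntu 24.04 base and NVIDIA's apt repository. The package names follow NVIDIA's usual apt naming and are an assumption, not taken from the actual Dockerfiles in this PR:

```shell
# Sketch only: package names are an assumption, not the PR's actual Dockerfile.
# Inside the container image build: add NVIDIA's CUDA apt repo and
# install the CUDA 13.1 toolkit.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get install -y cuda-toolkit-13-1

# On the host runner (NOT in the container): a kernel driver matching the
# CUDA 13.x toolkit must be installed separately, which is why the old
# runners/containers cannot simply be reused.
```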

Contributor

I would rather we update the host machine drivers and use the new toolkit than add a new container. I can update the drivers if you agree with this approach.

Contributor

Your idea only works for the UR tests, but for SYCL we can't run jobs without a container: if we don't define one, the workflow falls back to the default intel/llvm/ubuntu2404_intel_drivers:latest image and the job fails: https://github.com/intel/llvm/actions/runs/24386352355/job/71224477448#step:21:27

Contributor

Sorry, I didn't mean we shouldn't use a container. I was saying we should just update the existing container to use the new CUDA toolkit, instead of keeping the existing one and adding a second one with the new toolkit.

In the CI log, it looks like we get no devices, and almost surely that's because the container is using a toolkit that's too new for the host driver. So probably the fix is to just update the kernel driver on all our CUDA runners.

Please finalize this PR for merge, and I will then update the drivers on our CUDA runners, rerun CI and then merge this PR.

Also, if you have a recommended kernel driver version to use, let me know. Ping me again when I should try updating the driver. Thanks.
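As a side note, the host driver and the container toolkit can be cross-checked with standard NVIDIA tools when updating the runners; a quick sketch (this PR does not pin any specific versions, so none are shown here):

```shell
# On the host runner: report the installed NVIDIA kernel driver version.
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Inside the container: report the CUDA toolkit version the build will use.
nvcc --version
```

If the driver reported on the host is older than what the container's toolkit requires, the runtime typically reports no available devices, matching the CI failure described above.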

Contributor

@sarnex so you’re proposing to update the existing container to CUDA 13 drivers and toolkit?
Currently, some customers are using a CUDA adapter based on version 12.8. If we switch to CUDA 13 or newer, we could miss issues that reproduce only on that version. Also, this requires passing -DUR_CONFORMANCE_NVIDIA_ARCH="sm_75" to all CUDA configurations (see https://github.com/intel/llvm/pull/21564/changes#diff-a718eeef19a58e1d7ae466d96cf2a94859c8e2cb4739a335d330406a9b22a748R132), as the default sm_50 would not work (or we would have to set sm_75 as the default).
My suggestion is to continue testing both versions for a period of time and, at a later stage, gradually drop support for CUDA 12.x.
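For reference, the conformance architecture setting mentioned above is passed at CMake configure time; a minimal sketch, with illustrative source/build paths (only the UR_CONFORMANCE_NVIDIA_ARCH flag itself is taken from the PR discussion):

```shell
# Sketch only: paths and the adapter option are illustrative.
# CUDA 13 toolchains reject the old default sm_50 as a compile target,
# so sm_75 is passed explicitly for CUDA 13.1 configurations.
cmake -S unified-runtime -B build \
      -DUR_BUILD_ADAPTER_CUDA=ON \
      -DUR_CONFORMANCE_NVIDIA_ARCH=sm_75
cmake --build build
```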

Contributor

Ok I was missing that context, thanks

Contributor

Let me know when this PR is ready for review with the new container kept (it seems not finalized right now, some commented-out code etc.).

Contributor

PR is ready

@rbanka1 rbanka1 force-pushed the luszczewskakasia1_workflow-cuda13 branch from ce3fc8f to 774d559 on April 15, 2026 07:59
@bratpiorka bratpiorka requested a review from sarnex April 15, 2026 17:55
@bratpiorka
Contributor

@sarnex Hi, this PR is ready for re-review, thanks.

@sarnex
Contributor

sarnex commented Apr 17, 2026

Sorry for the slow response, was really busy yesterday. Will look now

Contributor

@sarnex sarnex left a comment

Ideally I'd like to avoid the duplication between the images, but I don't have any better ideas.

target_devices: cuda:gpu
- name: NVIDIA/CUDA 13.1
runner: '["Linux", "cuda13"]'
image: "ghcr.io/intel/llvm/ubuntu2404_intel_drivers_cuda131:latest"
Contributor

Nit: I don't think we need quotes around the image name.

@kswiecicki
Contributor

CI failures are unrelated to the GH workflow changes in this PR.


6 participants