
Conversation

JohannesGaessler
Collaborator

Fixes #8801 (comment).

The problem is that --split-mode row uses multiple CUDA streams to overlap data transfer with computation, but on CUDA GPUs that are Volta or newer each of these streams allocates a temporary buffer for the stream-k fixup. Because it is not safe to allocate temporary buffers from multiple CUDA streams in parallel, this leads to a race condition. However, since both stream-k decomposition and --split-mode row mitigate tail effects, it is fine to simply not use stream-k when multiple parallel CUDA streams are in use; the performance difference with 3x RTX 4090 is ~1%.
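For illustration, a minimal sketch of the gating logic described above (not the actual llama.cpp code; all names here are hypothetical):

```cpp
// Decide whether to use stream-k decomposition for the MMQ kernels.
// Illustrative sketch only; names are not the real llama.cpp identifiers.
struct mmq_config {
    int compute_capability;  // e.g. 89 for an RTX 4090 (CC 8.9)
    int n_parallel_streams;  // > 1 when --split-mode row overlaps work on multiple streams
};

// Stream-k needs a temporary "fixup" buffer taken from the CUDA memory pool.
// Allocating from that pool concurrently on several streams is racy, so fall
// back to the plain tile-based kernel whenever more than one stream is active;
// both techniques mitigate tail effects, so the performance cost is small.
static bool should_use_stream_k(const mmq_config & cfg) {
    const bool volta_or_newer = cfg.compute_capability >= 70;
    return volta_or_newer && cfg.n_parallel_streams == 1;
}
```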

Long-term, it may make sense to think about how to handle multiple parallel CUDA streams safely. I think a good approach would be to move the parallelism from something CUDA-specific to the level of GGML compute graphs; that would also allow the code to be reused for other backends.

@JohannesGaessler JohannesGaessler added the Review Complexity : Low label Sep 10, 2024
@slaren
Member

slaren commented Sep 10, 2024

You could use a different pool per stream. Since the pools are objects now, it should be a simple change. I thought about moving the implementation of tensor parallelism to ggml_backend_sched, but it is a lot of work for no clear benefit outside of datacenter GPUs with very fast interconnects, and I think it will make backend-specific optimizations harder to implement.
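For reference, a rough sketch of what a per-stream pool could look like, assuming the pool is a self-contained object as described above; the types and members below are hypothetical, not the actual llama.cpp API:

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_map>

#include <cuda_runtime.h>

// Hypothetical stand-in for the CUDA backend's pool object.
struct cuda_pool {
    // Owns scratch device memory; alloc/free are only safe from one stream at a time.
    void * alloc(std::size_t size);
    void   free(void * ptr, std::size_t size);
};

struct cuda_context {
    std::unordered_map<cudaStream_t, std::unique_ptr<cuda_pool>> pools;
    std::mutex mutex;

    // Giving each stream its own pool means temporary buffers (such as the
    // stream-k fixup buffer) are never allocated from the same pool in parallel.
    cuda_pool & pool_for_stream(cudaStream_t stream) {
        std::lock_guard<std::mutex> lock(mutex);
        std::unique_ptr<cuda_pool> & p = pools[stream];
        if (!p) {
            p = std::make_unique<cuda_pool>();
        }
        return *p;
    }
};
```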

@github-actions github-actions bot added the Nvidia GPU label Sep 10, 2024
@JohannesGaessler JohannesGaessler merged commit 5af118e into ggml-org:master Sep 11, 2024
52 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Dec 23, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Feb 25, 2025
@JohannesGaessler JohannesGaessler deleted the cuda-fix-sm-row branch May 5, 2025 20:59



Development

Successfully merging this pull request may close these issues.

Bug: b3188 breaks row split mode for multiple GPUs
