ggml : limit n_threads to the max n_tasks #5238

slaren · 2024-01-31T12:04:07Z

This avoids using a larger number of threads than the ops in the graph can actually use.

With full offload, the CPU graph contains only a get_rows op, which can use only one thread. This should therefore fix the performance regression seen with full offloading, which comes from the overhead of launching additional threads. @stduhpf can you check if this fixes the issue?
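To illustrate the idea, here is a minimal sketch of clamping the thread count to the largest task count in a graph. It is not the code from this PR: `sketch_node`, `sketch_max_tasks`, and `sketch_limit_threads` are hypothetical names, and the sketch assumes each op can be summarized by a single `n_tasks` value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a graph node: only the number of parallel
 * tasks the op can be split into matters for this sketch. */
struct sketch_node {
    int n_tasks;
};

/* Return the largest n_tasks over all nodes.
 * The result is at least 1, so an empty graph still gets one thread. */
static int sketch_max_tasks(const struct sketch_node *nodes, size_t n_nodes) {
    int max_tasks = 1;
    for (size_t i = 0; i < n_nodes; i++) {
        if (nodes[i].n_tasks > max_tasks) {
            max_tasks = nodes[i].n_tasks;
        }
    }
    return max_tasks;
}

/* Clamp the requested thread count to what the graph can actually use. */
static int sketch_limit_threads(int n_threads,
                                const struct sketch_node *nodes,
                                size_t n_nodes) {
    assert(n_threads >= 1);
    int max_tasks = sketch_max_tasks(nodes, n_nodes);
    return n_threads < max_tasks ? n_threads : max_tasks;
}
```

Under these assumptions, a graph holding a single one-task op (as in the full-offload case described above) always ends up with one thread, whatever count was requested.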

ggml-ci

stduhpf · 2024-01-31T12:10:31Z

It fixes prompt processing speed, but not token generation.

I'll do more testing later today.

slaren · 2024-01-31T12:22:37Z

This effectively sets the number of threads to 1 with full offload, so if the regression was caused by the change to n_threads in llama.cpp, I don't see how this would not fix it. Are you sure that you tested under the same conditions?

stduhpf · 2024-01-31T12:25:21Z

No, I'm not completely sure. I'll try testing again soon.

slaren · 2024-01-31T12:28:31Z

I can also reproduce the performance regression with CUDA. It's worse on Windows, but it is also noticeable under WSL, and this does fix it for me.

stduhpf · 2024-01-31T12:40:25Z

Ok, it's working. I messed up my first test, but this does fix the issue.

ggml : limit n_threads to the max n_tasks

7a04afd

ggml-ci

ggerganov approved these changes Jan 31, 2024

slaren merged commit dabcc5b into master Jan 31, 2024
60 checks passed

slaren deleted the sl/tasks-threads branch January 31, 2024 12:43

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024

ggml : limit n_threads to the max n_tasks (ggerganov#5238)

e8af3d6

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

ggml : limit n_threads to the max n_tasks (ggerganov#5238)

9906851
