sycl: use async memory allocation to fix crashes during graph recording #16644

mmichel11 · 2025-10-17T23:08:41Z

Setting GGML_SYCL_DISABLE_GRAPHS=0 to enable SYCL graphs currently crashes with most use cases because of the following unsupported operations in graph recording regions:

Host waits
SYCL malloc / free calls

The following changes are made to fix SYCL graph functionality:

When graphs are enabled, use the SYCL async memory extension for temp buffers which is supported with SYCL graphs.
For SYCL compiler versions that do not support this extension, skip graphs with the affected op.
Switch from USM shared to device memory as the async extension currently just supports device allocations.

I have verified functionality with commit a3132c1 of intel/llvm. For earlier compilers that do not support this extension, graphs are disabled for most cases to prevent crashes.

GGML_SYCL_DISABLE_GRAPHS=0 causes crashes because: - Host waits are currently unsupported in graph recording mode. - SYCL malloc / free calls are unsupported in graph recording mode. The following changes are made to fix SYCL graph functionality: - When graphs are enabled, use the SYCL async memory extension for temp buffers which is supported with SYCL graphs. - For compiler versions that do not support this extension, skip graphs with the affected op. - Switch from USM shared to device memory as the async extension currently just supports device allocations.

NeoZhangJianyu · 2025-10-20T02:19:07Z

@mmichel11
By this PR, will the SYCL graph feature work well?
Is there any performance benefit?

I support to enable SYCL graph feature in SYCL backend.
But I appreciate a PR to enable SYCL graph to provide whole solution of SYCL graph, instead of a piece of code to fix a small issue before it's ready.
When a developer want to refer to SYCL graph solution in llama.cpp, there are many PRs. It'd hard to learn.

I understand that it's not easy to make the SYCL graph work well in llama.cpp.
I suggest merging such PRs to a feature branch, instead of main branch:

We hope main branch work with stable features and bug fix.
For this PR, user need to install a special compiler version to support the PR code.
It doesn't make sense for main branch.

Thank you!

lslusarczyk · 2025-10-20T07:15:15Z

If only current Intel llvm public compiler has all needed support, then it should be used for SYCL in CI. Let's (1) add CI executing this code, (2) check if performance on non-graph is not deteriorated by this PR.

If both above tasks are done let's merge and have SYCL graphs working on llama head again. This will be far better situation on head that now, when we have crashing graphs.

NeoZhangJianyu · 2025-10-20T08:25:36Z

If only current Intel llvm public compiler has all needed support, then it should be used for SYCL in CI. Let's (1) add CI executing this code, (2) check if performance on non-graph is not deteriorated by this PR.

If both above tasks are done let's merge and have SYCL graphs working on llama head again. This will be far better situation on head that now, when we have crashing graphs.

Have you tested the performance of SYCL graph locally?
How much is the performance improvement?

mmichel11 · 2025-10-20T18:36:53Z

If only current Intel llvm public compiler has all needed support, then it should be used for SYCL in CI. Let's (1) add CI executing this code, (2) check if performance on non-graph is not deteriorated by this PR.
If both above tasks are done let's merge and have SYCL graphs working on llama head again. This will be far better situation on head that now, when we have crashing graphs.

Have you tested the performance of SYCL graph locally? How much is the performance improvement?

I've tested performance locally, and SYCL graphs does not yet deliver performance benefit over the standard path. In particular, usage of graphs needs to be improved to reduce the number of finalize calls bottlenecking performance and graph update needs to be fixed by adding alternative paths for unsupported nodes in graph update (memcpy). This patch just reenables functionality for newer compilers with the async memory extension.

If the feature branch is the way we need to go now, the biggest problem I foresee is future SYCL changes breaking compatibility with graphs as @lslusarczyk mentioned creating difficulties for enablement in the long-term. Having some CI job to ensure SYCL graph functionality with new changes on master seems ideal to prevent this if possible.

NeoZhangJianyu · 2025-10-21T05:06:02Z

If only current Intel llvm public compiler has all needed support, then it should be used for SYCL in CI. Let's (1) add CI executing this code, (2) check if performance on non-graph is not deteriorated by this PR.
If both above tasks are done let's merge and have SYCL graphs working on llama head again. This will be far better situation on head that now, when we have crashing graphs.

Have you tested the performance of SYCL graph locally? How much is the performance improvement?

I've tested performance locally, and SYCL graphs does not yet deliver performance benefit over the standard path. In particular, usage of graphs needs to be improved to reduce the number of finalize calls bottlenecking performance and graph update needs to be fixed by adding alternative paths for unsupported nodes in graph update (memcpy). This patch just reenables functionality for newer compilers with the async memory extension.

If the feature branch is the way we need to go now, the biggest problem I foresee is future SYCL changes breaking compatibility with graphs as @lslusarczyk mentioned creating difficulties for enablement in the long-term. Having some CI job to ensure SYCL graph functionality with new changes on master seems ideal to prevent this if possible.

The SYCL graph code is separated by macro in code.
It won’t impact legacy code.

Though we can’t run CI to check and protect SYCL graph code not to be broken by other PRs, we could pay attention to during review PRs.
At least I will do it.
So, don’t worry it.

Thank you for your sharing! Hope this feature be implemented as soon.

ggml/src/ggml-sycl/ggml-sycl.cpp

…ree]

ggml/src/ggml-sycl/ggml-sycl.cpp

…ng (ggml-org#16644) * sycl: use async memory allocation to fix graph recording failures GGML_SYCL_DISABLE_GRAPHS=0 causes crashes because: - Host waits are currently unsupported in graph recording mode. - SYCL malloc / free calls are unsupported in graph recording mode. The following changes are made to fix SYCL graph functionality: - When graphs are enabled, use the SYCL async memory extension for temp buffers which is supported with SYCL graphs. - For compiler versions that do not support this extension, skip graphs with the affected op. - Switch from USM shared to device memory as the async extension currently just supports device allocations. * Address reviewer feedback * Use global async variable to decide path in sycl_ext_[malloc_device|free]

github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Oct 17, 2025

NeoZhangJianyu reviewed Oct 21, 2025

View reviewed changes

Address reviewer feedback

bc430c6

NeoZhangJianyu reviewed Oct 22, 2025

View reviewed changes

ggml/src/ggml-sycl/ggml-sycl.cpp Show resolved Hide resolved

ggml/src/ggml-sycl/ggml-sycl.cpp Show resolved Hide resolved

ggml/src/ggml-sycl/ggml-sycl.cpp Outdated Show resolved Hide resolved

ggml/src/ggml-sycl/ggml-sycl.cpp Outdated Show resolved Hide resolved

Use global async variable to decide path in sycl_ext_[malloc_device|f…

af23220

…ree]

NeoZhangJianyu self-requested a review October 22, 2025 07:38

NeoZhangJianyu reviewed Oct 22, 2025

View reviewed changes

ggml/src/ggml-sycl/ggml-sycl.cpp Show resolved Hide resolved

NeoZhangJianyu approved these changes Oct 23, 2025

View reviewed changes

NeoZhangJianyu merged commit 9de9672 into ggml-org:master Oct 23, 2025
70 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

sycl: use async memory allocation to fix crashes during graph recording #16644

sycl: use async memory allocation to fix crashes during graph recording #16644

mmichel11 commented Oct 17, 2025

Uh oh!

NeoZhangJianyu commented Oct 20, 2025 •

edited

Loading

Uh oh!

lslusarczyk commented Oct 20, 2025

Uh oh!

NeoZhangJianyu commented Oct 20, 2025

Uh oh!

mmichel11 commented Oct 20, 2025 •

edited

Loading

Uh oh!

NeoZhangJianyu commented Oct 21, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sycl: use async memory allocation to fix crashes during graph recording #16644

sycl: use async memory allocation to fix crashes during graph recording #16644

Conversation

mmichel11 commented Oct 17, 2025

Uh oh!

NeoZhangJianyu commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lslusarczyk commented Oct 20, 2025

Uh oh!

NeoZhangJianyu commented Oct 20, 2025

Uh oh!

mmichel11 commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NeoZhangJianyu commented Oct 21, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NeoZhangJianyu commented Oct 20, 2025 •

edited

Loading

mmichel11 commented Oct 20, 2025 •

edited

Loading