sycl: use async memory allocation to fix crashes during graph recording #16644
GGML_SYCL_DISABLE_GRAPHS=0 causes crashes because:
- Host waits are currently unsupported in graph recording mode.
- SYCL malloc / free calls are unsupported in graph recording mode.
The following changes are made to fix SYCL graph functionality:
- When graphs are enabled, use the SYCL async memory extension for temporary buffers, since that extension is supported with SYCL graphs.
- For compiler versions that do not support this extension, skip graphs that contain an affected op.
- Switch from USM shared to device memory, as the async extension currently supports only device allocations.
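The fallback in the second bullet amounts to a per-graph check: if the compiler lacks the async memory extension and any op in the graph needs a temporary allocation, use the non-graph path instead. The C++ sketch below illustrates only that decision, without any SYCL calls. The Op enum, op_needs_temp_alloc and can_record_graph are hypothetical names, and which ops actually allocate temporary buffers in the SYCL backend is an assumption made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op kinds; the real backend inspects ggml graph nodes.
enum class Op { Add, OpWithTempBuffer, SoftMax };

// Assumption: an op is "affected" if it allocates a temporary device buffer
// while executing (which would be a malloc/free inside the recording region).
bool op_needs_temp_alloc(Op op) {
    return op == Op::OpWithTempBuffer;
}

// Returns true if the graph path can be used for this sequence of ops.
// async_supported would come from a compile-time check for the extension.
bool can_record_graph(const std::vector<Op>& ops, bool async_supported) {
    if (async_supported) {
        // Temp buffers go through async allocation, which graphs can record.
        return true;
    }
    for (Op op : ops) {
        if (op_needs_temp_alloc(op)) {
            // A plain malloc/free here would break graph recording.
            return false;
        }
    }
    return true;
}
```

The check errs on the side of correctness: on older compilers a single affected op is enough to disable graphs for the whole sequence, which matches the PR's goal of avoiding crashes rather than maximizing graph coverage.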
@mmichel11 I support enabling the SYCL graph feature in the SYCL backend. I understand that it's not easy to make SYCL graphs work well in llama.cpp.
Thank you!
If only the current public Intel LLVM compiler has all the needed support, then it should be used for SYCL in CI. Let's (1) add a CI job that executes this code and (2) check that this PR does not degrade performance on the non-graph path. Once both tasks are done, let's merge and have SYCL graphs working on llama.cpp head again. That will be a far better situation on head than now, when graphs crash.
Have you tested the performance of SYCL graphs locally?
I've tested performance locally, and SYCL graphs do not yet deliver a performance benefit over the standard path. In particular, graph usage needs to be improved to reduce the number of finalize calls that bottleneck performance, and graph update needs to be fixed by adding alternative paths for nodes that graph update does not support (memcpy). This patch only re-enables functionality for newer compilers with the async memory extension. If the feature branch is the way we need to go now, the biggest problem I foresee is future SYCL changes breaking compatibility with graphs, which, as @lslusarczyk mentioned, creates difficulties for long-term enablement. Having a CI job that verifies SYCL graph functionality against new changes on master seems ideal to prevent this, if possible.
The SYCL graph code is guarded by a macro. Although we can't run CI to protect the SYCL graph code from being broken by other PRs, we can pay attention to it when reviewing PRs. Thank you for sharing! I hope this feature is implemented soon.
…ng (ggml-org#16644)
* sycl: use async memory allocation to fix graph recording failures
  GGML_SYCL_DISABLE_GRAPHS=0 causes crashes because:
  - Host waits are currently unsupported in graph recording mode.
  - SYCL malloc / free calls are unsupported in graph recording mode.
  The following changes are made to fix SYCL graph functionality:
  - When graphs are enabled, use the SYCL async memory extension for temp buffers which is supported with SYCL graphs.
  - For compiler versions that do not support this extension, skip graphs with the affected op.
  - Switch from USM shared to device memory as the async extension currently just supports device allocations.
* Address reviewer feedback
* Use global async variable to decide path in sycl_ext_[malloc_device|free]
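The last commit routes allocation through a single global flag so that sycl_ext_malloc_device and sycl_ext_free pick the async or synchronous path consistently. The sketch below models that dispatch in plain C++ so it runs without a SYCL compiler. The function names come from the commit message, but their signatures here are assumptions; g_use_async_mem and the counters are hypothetical, and std::malloc/std::free stand in for the real async and synchronous SYCL device allocation calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical global flag, set once at startup when graphs are enabled
// and the async memory extension is available.
static bool g_use_async_mem = false;

// Counters that make the chosen path observable in this sketch.
static int g_async_allocs = 0, g_sync_allocs = 0;
static int g_async_frees = 0, g_sync_frees = 0;

// Assumed signature. Real code: async device allocation on the queue when
// the flag is set, otherwise a synchronous device allocation.
void* sycl_ext_malloc_device(std::size_t size) {
    if (g_use_async_mem) {
        ++g_async_allocs;
    } else {
        ++g_sync_allocs;
    }
    return std::malloc(size);  // stand-in for both SYCL paths
}

// Assumed signature. Must take the same path as the matching allocation.
void sycl_ext_free(void* ptr) {
    assert(ptr != nullptr);
    if (g_use_async_mem) {
        ++g_async_frees;
    } else {
        ++g_sync_frees;
    }
    std::free(ptr);  // stand-in for both SYCL paths
}
```

Deciding the path from one global, rather than at each call site, keeps every allocation paired with a free of the same kind, provided the flag is not changed while buffers are still live.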
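Setting GGML_SYCL_DISABLE_GRAPHS=0 to enable SYCL graphs currently crashes in most use cases because host waits and SYCL malloc / free calls are unsupported in graph recording regions. This PR fixes SYCL graph functionality by using the SYCL async memory extension for temporary buffers when graphs are enabled, skipping graphs with affected ops on compilers that lack the extension, and switching from USM shared to device memory.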
I have verified functionality with commit a3132c1 of intel/llvm. For earlier compilers that do not support this extension, graphs are disabled in most cases to prevent crashes.