
b9820

Latest


github-actions released this 26 Jun 18:35
3fc4e10

sched : reintroduce less synchronizations during split compute (#20793)

  • CUDA: Improve performance via less synchronizations between token (#17795)

  • Adds CPU-to-CUDA copy capability to
    ggml_backend_cuda_cpy_tensor_async()

  • Adds function to relax sync requirements between input copies on
    supported backends (CUDA for now)

  • Replaces the synchronous copy with the async copy function.

  • Adds macro guards to allow compilation in non-CUDA builds

  • Reworked backend detection in ggml-backend.cpp to avoid linking
    conflicts

  • Relaxes the checks required for async CUDA copies from backend and
    buffer type to buffer type only, to avoid linking issues

  • Minor cleanup

  • Makes the opt-in for relaxing explicit syncs more general. Backends like
    Vulkan, which require a synchronization between HtoD copies and graph
    execution, could now adopt this change as well.

  • Reintroduces stricter check for CPU->CUDA backend async copy via
    GGML_DEVICE_TYPE_CPU.

  • Corrects the initialization of ggml_backend_sync_mode when setting up
    ggml_backend_sched_split

  • Simplifies synchronizations to adhere to saaasg pattern.

  • Apply suggestion from @ggerganov (src->buffer to buf_src)

Co-authored-by: Georgi Gerganov ggerganov@gmail.com

  • Apply suggestion from @ggerganov (src->buffer to buf_src) v2

Co-authored-by: Georgi Gerganov ggerganov@gmail.com


Co-authored-by: Johannes Gäßler johannesg@5d6.de
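The idea behind relaxing sync requirements between input copies can be illustrated with a toy model. This is a sketch under stated assumptions, not the ggml scheduler code: `ToyStream`, `run_strict`, and `run_relaxed` are hypothetical names, and the stream is a plain in-order queue rather than a real CUDA stream. It only shows that, when a backend executes queued work in order, queuing all input copies and synchronizing once gives the same result with fewer blocking calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-order backend stream: work is queued and
// only executes when synchronize() is called.
struct ToyStream {
    std::queue<std::function<void()>> pending;
    int sync_calls = 0;

    void enqueue(std::function<void()> work) { pending.push(std::move(work)); }

    void synchronize() {
        ++sync_calls;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
        }
        assert(pending.empty());
    }
};

// Strict scheme: block after every input copy, then again after the "graph".
int run_strict(const std::vector<int> & host, int & result) {
    ToyStream stream;
    std::vector<int> device(host.size(), 0);
    for (std::size_t i = 0; i < host.size(); ++i) {
        stream.enqueue([&, i] { device[i] = host[i]; });
        stream.synchronize();
    }
    stream.enqueue([&] { result = std::accumulate(device.begin(), device.end(), 0); });
    stream.synchronize();
    return stream.sync_calls;
}

// Relaxed scheme: queue every copy and the "graph", rely on in-order
// execution, and block only once at the end.
int run_relaxed(const std::vector<int> & host, int & result) {
    ToyStream stream;
    std::vector<int> device(host.size(), 0);
    for (std::size_t i = 0; i < host.size(); ++i) {
        stream.enqueue([&, i] { device[i] = host[i]; });
    }
    stream.enqueue([&] { result = std::accumulate(device.begin(), device.end(), 0); });
    stream.synchronize();
    return stream.sync_calls;
}
```

With N inputs, the strict variant in this model blocks N + 1 times and the relaxed variant blocks once. A backend that does not guarantee ordering between copies and graph execution would still need a barrier between the two phases.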

  • Adds single-GPU synchronizations to multi-GPU settings to fix HIP
    backend pipeline-parallel bugs.

  • Scheduler Hardening: Excludes HIP/MUSA from the copy_from_host
    CPU split -> GPU split optimization

  • Scheduler Hardening: Re-adds the original additional synchronizations
    for non-async backends

  • Adds a disclaimer to the HIP/MUSA exclusion of copy_from_host, noting
    that it is a precaution, that no performance impact is visible, and
    that it can be revisited separately at any time.


Co-authored-by: Georgi Gerganov ggerganov@gmail.com
Co-authored-by: Johannes Gäßler johannesg@5d6.de
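The hardening rules listed above amount to a small decision: take the async host-to-device path only when the source is a CPU device, the destination backend has opted in, and the destination is not HIP or MUSA. Everything else falls back to the synchronized copy. The following is a minimal sketch of that decision, assuming hypothetical types (`ToyDeviceType`, `ToyBackendKind`, `ToySplitCopy`, `ToyCopyPlan`) and a hypothetical `plan_input_copy` helper. None of these names come from ggml, and the real scheduler may structure the check differently.

```cpp
// All names below are hypothetical and exist only for this illustration.
enum class ToyDeviceType { CPU, GPU };
enum class ToyBackendKind { CUDA, HIP, MUSA, Other };
enum class ToyCopyPlan { AsyncFromHost, SyncedCopy };

struct ToySplitCopy {
    ToyDeviceType  src_device;   // device type that owns the source buffer
    ToyBackendKind dst_backend;  // backend that will run the next split
    bool           dst_opted_in; // destination opted in to relaxed syncs
};

constexpr ToyCopyPlan plan_input_copy(const ToySplitCopy & c) {
    // Stricter check: the optimization only applies to CPU -> GPU copies.
    const bool src_is_cpu = c.src_device == ToyDeviceType::CPU;

    // Precautionary exclusion, mirroring the HIP/MUSA note above.
    const bool excluded = c.dst_backend == ToyBackendKind::HIP ||
                          c.dst_backend == ToyBackendKind::MUSA;

    if (src_is_cpu && c.dst_opted_in && !excluded) {
        return ToyCopyPlan::AsyncFromHost;
    }
    // Non-async or excluded backends keep the original synchronizations.
    return ToyCopyPlan::SyncedCopy;
}
```

In this sketch a CPU to CUDA copy with the opt-in set yields `AsyncFromHost`, while the same copy targeting HIP or MUSA, or originating from a GPU device, yields `SyncedCopy`.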

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: