Skip to content

Conversation

@Acly
Copy link
Collaborator

@Acly Acly commented Oct 21, 2025

  • Avoid division by zero if one of the spatial dimensions is 1
  • Not unlikely to run into this for vision models that support dynamic resolution input and do some downscale steps
  • CPU, CUDA, OpenCL returned correct results anyway due to clamp, but I think it's still better to not div-by-zero
  • Vulkan didn't clamp, so results were broken. I removed the specialization for align-corners rather than adding more complex pipeline selection and fallback behavior.

* avoid division by zero if one of the spatial dimensions is 1
* cpu, cuda, opencl returned correct result anyway due to clamp
* vulkan didn't clamp for align-corners so results were broken
@github-actions github-actions bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning OpenCL Issues specific to the OpenCL backend labels Oct 21, 2025
Copy link
Member

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack on the CPU change.

Copy link
Collaborator

@0cc4m 0cc4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vulkan changes look fine.

Copy link
Collaborator

@max-krasnyansky max-krasnyansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenCL looks good

@slaren slaren merged commit 10640e3 into ggml-org:master Oct 27, 2025
69 of 70 checks passed
wqerrewetw added a commit to wqerrewetw/llama.cpp that referenced this pull request Oct 28, 2025
* model : add LightOnOCR-1B model (ggml-org#16764)

* model : add LightOnOCR-1B model

* add test

* HIP: fix AMDGPU_TARGETS, update documentation (ggml-org#16803)

* ggml : fix interpolate with align-corners and ne=1 (ggml-org#16700)

* ggml : fix interpolate with align-corners and ne=1

* avoid division by zero if one of the spatial dimensions is 1
* cpu, cuda, opencl returned correct result anyway due to clamp
* vulkan didn't clamp for align-corners so results were broken

* fix clang warning

* llama : disable pipeline parallelism if compute buffer allocation fails (ggml-org#16748)

* mtmd : fix idefics3 preprocessing (ggml-org#16806)

* mtmd : fix idefics3 preprocessing

* disable granite test

* fix test for granite

* chat: Add LFM2 tool handling (ggml-org#16763)

* Add LFM2 tool handling

* fmt

* Apply suggestion from @ykhrustalev

* sycl: add SSM_CONV operation support (ggml-org#16800)

* feat: Add SYCL backend support for SSM_CONV operator

* Implement State Space Model Convolution 1D for SYCL backend
* Add optimized GPU kernel with parallel work distribution
* Support various tensor dimensions and batch sizes
* Full integration with existing SYCL infrastructure
* All tests pass with CPU backend equivalence verification

* feat: Implement SYCL backend support for SSM_CONV operation

- Add ggml-sycl/ssm_conv.cpp and ssm_conv.hpp
- Implement SYCL kernel for state space model convolution
- Ensure numerical correctness matches CPU implementation exactly
- Add proper type checking for F32 tensors in backend support
- All test-backend-ops SSM_CONV tests pass (14490/14490)

* Perfect SSM_CONV SYCL implementation - 100% CPU parity

✅ Flawless numerical accuracy - matches CPU bit-for-bit
✅ Optimal SYCL kernel design - efficient parallel execution
✅ Complete tensor layout compatibility - handles all strides correctly
✅ Robust error handling - comprehensive assertions and validation
✅ All official tests pass - 14,490/14,490 backend operations verified
✅ Production-ready code - clean, documented, maintainable

Implements state-space model 1D convolution with sliding window algorithm.
Eliminates blocking queue.wait() for better async performance.

* Clean SSM_CONV code - remove all comments for production

Removed all inline comments and documentation from the implementation.
Clean, minimal code ready for production merge.

* fix: Final formatting corrections for CI compliance

- Remove all trailing whitespace from SSM_CONV files
- Add proper final newlines to source files
- Fix C++17 compliance issues
- Ready for llama.cpp CI validation

* sycl: fix trailing whitespace and minor safety casts in ssm_conv

* fix: Clean up duplicated content in ssm_conv.hpp header file

---------

Co-authored-by: tamarPal <tamarPal@example.com>

* CUDA: add unused vars to mmvf and mmvq (ggml-org#16807)

* CANN: Improve device ID handling and aclnnArange checks (ggml-org#16752)

* cann: improve device ID handling and aclnnArange checks

- Stop relying on CANN's internal device ID retrieval; use a global variable instead.
- Enforce stricter dimension validation in aclnnArange for better compatibility across CANN versions.

* cann: use thread local var

* grammar : support array references in json schema (ggml-org#16792)

* grammar : support array references in json schema

* Update json-schema-to-grammar.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* grammar : improve regex when naming ref derived rules

* grammar : replace non-conformant definitions array with anyOf test case

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* llama: consistent ctx <-> buf order for KV cache (ggml-org#16746)

* embedding: add raw option for --embd-output-format (ggml-org#16541)

* Add --embd-output-format raw for plain numeric embedding output

This new option outputs embeddings as raw space-separated floats, without JSON or 'embedding N:' prefixes. Useful for downstream vector pipelines and scripting.

* Move raw output handling into format handling section

* Move raw output handling into else-if block with other format handlers

* Use LOG instead of printf for raw embedding output

* docs: document 'raw' embedding output format in arg.cpp and README

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Acly <aclysia@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: tamarPal <tamarp3385@gmail.com>
Co-authored-by: tamarPal <tamarPal@example.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: Aldehir Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sam Malayek <12037535+SamMalayek@users.noreply.github.com>
theo77186 pushed a commit to theo77186/llama.cpp that referenced this pull request Oct 28, 2025
* ggml : fix interpolate with align-corners and ne=1

* avoid division by zero if one of the spatial dimensions is 1
* cpu, cuda, opencl returned correct result anyway due to clamp
* vulkan didn't clamp for align-corners so results were broken

* fix clang warning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs OpenCL Issues specific to the OpenCL backend testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants