ggml : add ggml_conv_1d_grouped #22833

Open
Juste-Leo2 wants to merge 1 commit into ggml-org:master from Juste-Leo2:Conv1dGrouped

@Juste-Leo2
Contributor

Overview

This PR adds the ggml_conv_1d_grouped operation, implemented as a sub-graph, as groundwork for CCA (Compressed Convolutional Attention), which is needed for future support of Zyphra's models (ZAYA1).
(This is a first step towards #22776)

CCA relies on a specific kind of convolution that does not appear to be implemented in llama.cpp yet.

Figure 1: Architecture of Compressed Convolutional Attention (CCA), extracted from the ZAYA1 technical report (arXiv:2605.05365).

In this architecture, the added operation is what is needed to support Depthwise Conv and Headwise Conv.

To explain how it works, I made a diagram with matplotlib:

Figure 2: How the grouped 1D convolution operation works (example with Groups = 2).

For example, with groups = 2, the input tensor is first split into two groups of channels and each group gets its own convolution. The per-group results are then concatenated to form the final tensor.
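To make the split / convolve / concatenate description concrete, here is a minimal reference sketch in plain C++. It is not the code from this PR and does not use ggml; the function name `conv_1d_grouped_ref` and the row-major layouts are assumptions chosen for illustration (no bias, no dilation, zero padding).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive grouped 1D convolution, for illustration only.
// Layouts are assumptions of this sketch, not ggml's tensor layout:
//   input  : [IC][L]          row-major
//   weight : [OC][IC / G][K]  row-major
//   output : [OC][L_out]      row-major
std::vector<float> conv_1d_grouped_ref(
        const std::vector<float> & input,
        const std::vector<float> & weight,
        int IC, int OC, int K, int L, int G, int stride, int pad) {
    assert(IC % G == 0 && OC % G == 0);
    const int ic_g  = IC / G;  // input channels per group
    const int oc_g  = OC / G;  // output channels per group
    const int L_out = (L + 2 * pad - K) / stride + 1;

    std::vector<float> output((size_t) OC * L_out, 0.0f);
    for (int g = 0; g < G; ++g) {            // each group is convolved on its own
        for (int o = 0; o < oc_g; ++o) {
            const int oc = g * oc_g + o;     // writing at this index is the "concat"
            for (int t = 0; t < L_out; ++t) {
                float acc = 0.0f;
                for (int c = 0; c < ic_g; ++c) {
                    const int ic = g * ic_g + c;  // only channels of group g are read
                    for (int k = 0; k < K; ++k) {
                        const int pos = t * stride + k - pad;
                        if (pos < 0 || pos >= L) continue;  // zero padding
                        acc += input[(size_t) ic * L + pos] *
                               weight[((size_t) oc * ic_g + c) * K + k];
                    }
                }
                output[(size_t) oc * L_out + t] = acc;
            }
        }
    }
    return output;
}
```

The point to notice is that an output channel in group g never reads input channels from another group, which is what separates this from a standard conv1d (G = 1).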

Additional information

The code contains one small trick: it uses ggml_view_3d to slice out each group instead of copying it, which avoids a lot of memory copies. I kept the code changes to a minimum to make the review easier.
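The idea behind the view trick can be sketched without ggml: a view is just a pointer into the parent buffer plus shape information, so selecting one group costs nothing. The `ChannelView` type and `group_view` helper below are hypothetical names for this illustration; in the PR that role is played by ggml_view_3d, whose actual call site is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-owning window onto a parent buffer (hypothetical type, not a ggml struct).
struct ChannelView {
    float * data;  // points into the parent buffer, nothing is copied
    int     n_ch;  // number of channels covered by this view
    int     len;   // elements per channel
    float & at(int c, int t) const { return data[(size_t) c * len + t]; }
};

// View of group g in a contiguous, row-major [IC][L] buffer split into G groups.
ChannelView group_view(std::vector<float> & buf, int IC, int L, int G, int g) {
    assert(IC % G == 0 && g >= 0 && g < G);
    assert(buf.size() == (size_t) IC * L);
    const int ic_g = IC / G;  // channels per group
    // Only an offset is computed; the group's data stays where it is.
    return ChannelView{ buf.data() + (size_t) g * ic_g * L, ic_g, L };
}
```

Because the view aliases the parent buffer, a write through the view is visible in the parent, and the cost of creating a view does not depend on the tensor size.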

Here are the results obtained for the tests:

:~/github/llama.cpp/build$ ./test-conv-1d-grouped
Testing ggml_conv_1d_grouped

  TEST: groups=1 (standard conv1d) (IC=128 OC=256 K=3 L=32 G=1 s=1 p=0)
    PASS
  TEST: ZAYA1-8B exact params (IC=1280 OC=1280 K=2 L=16 G=10 s=1 p=0)
    PASS
  TEST: small 2 groups (IC=4 OC=4 K=2 L=8 G=2 s=1 p=0)
    PASS
  TEST: with padding (IC=8 OC=8 K=2 L=16 G=4 s=1 p=1)
    PASS
  TEST: IC != OC (IC=12 OC=6 K=3 L=10 G=3 s=1 p=0)
    PASS
  TEST: stride=2 (IC=8 OC=8 K=2 L=16 G=4 s=2 p=0)
    PASS
  TEST: longer sequence (IC=1280 OC=1280 K=2 L=128 G=10 s=1 p=0)
    PASS

Result: 7 passed, 0 failed
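As a reading aid for the parameters in the log above, the following sketch computes the output length and weight count each configuration implies. It assumes the standard convolution length formula without dilation; the helper names are hypothetical and are not part of the PR's test program.

```cpp
#include <cassert>

// Output length of a 1D convolution without dilation (standard formula).
inline int conv_1d_out_len(int L, int K, int stride, int pad) {
    assert(stride > 0 && L + 2 * pad >= K);
    return (L + 2 * pad - K) / stride + 1;
}

// Weight count of a grouped conv: each output channel sees only IC / G inputs.
inline long long conv_1d_grouped_n_weights(int IC, int OC, int K, int G) {
    assert(G > 0 && IC % G == 0 && OC % G == 0);
    return (long long) OC * (IC / G) * K;
}
```

With the ZAYA1-8B parameters from the log (IC=1280, OC=1280, K=2, L=16, G=10, s=1, p=0), this gives 15 output positions and 1280 * 128 * 2 = 327,680 weights, ten times fewer than an ungrouped convolution of the same shape. The padding case (L=16, K=2, p=1) gives 17 positions and the stride=2 case gives 8.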

Note: I will do my best to answer questions regarding the implementation. Currently, it uses existing llama.cpp operations, which I think is best for maintainability at the start.

Requirements

  • AI usage disclosure: YES
    • I used Antigravity to understand the operation and to make the diagram.
    • AI was also used to write the code (I am not yet comfortable enough with the ggml C++ syntax) and to make the best possible optimizations.
    • AI was used to match the coding style of other implementations.
    • AI was used to translate the PR (which I wrote by hand in French) and to improve readability :)

@Juste-Leo2 requested a review from ggerganov as a code owner May 8, 2026 10:15
@github-actions bot added the testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) labels May 8, 2026
kmbandy added a commit to kmbandy/llama.cpp that referenced this pull request May 17, 2026
Ports PR ggml-org#22833 and PR ggml-org#23112 from ggml-org/llama.cpp onto our fork.

- ggml: add ggml_conv_1d_grouped op (depthwise + headwise conv via
  ggml_view_3d slicing, falls back to existing conv1d/dw for groups=1
  and groups=IC)
- gguf: register ZAYA arch, CCA_VAL_PROJ1/2, CCA_CONV_GRP, CCA_K_SCALE,
  RES_SCALE_HS/RES/FINAL, ZAYA_ROUTER_MLP2/4/BIASES/EDA_SCALE tensors
- src: add llama_model_zaya with alternating CCA (even) and MoE (odd)
  layers; residual scaling at every layer and final norm
- conversion/zaya.py: HF→GGUF converter for ZayaModel/ZayaForCausalLM
- Includes ggml_cont fixes for ROCm non-contiguous tensor compatibility
  and F16 cast fixes for CPU backend (from Zyphra fork review)

Markovian RSA (test-time compute method) is intentionally excluded and
will be a separate implementation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>