Skip to content

Sync master with upstream release b8770#484

Merged
jan-service-account merged 7 commits intodevfrom
update-dev-from-master-2026-04-13-00-58
Apr 13, 2026
Merged

Sync master with upstream release b8770#484
jan-service-account merged 7 commits intodevfrom
update-dev-from-master-2026-04-13-00-58

Conversation

@jan-service-account
Copy link
Copy Markdown

Updates dev branch with latest release (b8770) from ggml-org/llama.cpp

ngxson and others added 7 commits April 12, 2026 13:02
* docs: add guide on how to add multimodal support

* nits
* mtmd: add Gemma 4 audio conformer encoder support

Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer.

Architecture:
- 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm
- Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm
- Full self-attention with sinusoidal RPE and sliding window mask (24)
- Logit softcapping at 50.0, ClippableLinear clamping
- Output: 1024 → 1536 → RMSNorm → multimodal embedder

Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a):
- HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3
- Standard periodic Hann window (320 samples), zero-padded to FFT size
- Semicausal left-padding (frame_length/2 samples)
- Frame count matched to PyTorch (unfold formula)
- No pre-emphasis, no Whisper-style normalization
- Mel cosine similarity vs PyTorch: 0.9998

Key fixes:
- Tensor loading dedup: prevent get_tensor() from creating duplicate
  entries in ctx_data. Fixed with std::set guard.
- ClippableLinear clamp_info loading moved after per-layer tensors.
- Sliding window mask (24 positions) matching PyTorch context_size.
- Skip Whisper normalization for Gemma4 mel output.

Tested on E2B and E4B with CPU and Vulkan backends.
Transcribes: "Glad to see things are going well and business is starting
to pick up" (matching ground truth).

Ref: ggml-org#21325
* mtmd: add gemma 4 test (vision + audio)

* add to docs
* add qwen3a

* wip

* vision ok

* no more deepstack for audio

* convert ASR model ok

* qwen3 asr working

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* nits

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix bad merge

* fix multi inheritance

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@jan-service-account jan-service-account merged commit 76ff29c into dev Apr 13, 2026
5 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2026-04-13-00-58 branch April 13, 2026 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants