Add Qwen3 TTS architecture support #20752
Add GGUF support for the Qwen3-TTS model family:

- **gguf-py**: define `QWEN3TTS` and `QWEN3TTS_CP` architectures, TTS-specific KV keys (`text_vocab_size`, `text_embedding_length`, `num_code_groups`, `position_id_per_seconds`), and tensor name mappings
- **convert_hf_to_gguf.py**: add `Qwen3TTSTalkerModel` (28-layer Talker with interleaved MRoPE) and `Qwen3TTSCodePredictorModel` (5-layer Code Predictor with standard RoPE), including speaker-encoder tensor remapping
- **tools/tts/convert_qwen3tts.py**: wrapper script that converts both the Talker and the Code Predictor to GGUF from a single HF model directory
- **tools/tts/convert_qwen3tts_tokenizer.py**: tokenizer conversion helper
Add C++ inference support for the Qwen3-TTS Talker and Code Predictor:

- **llama-arch**: register `LLM_ARCH_QWEN3TTS` (Talker, 28 layers with MRoPE) and `LLM_ARCH_QWEN3TTS_CP` (Code Predictor, 5 layers with standard RoPE), including KV keys, tensor names, and tensor info entries
- **llama-hparams**: add TTS-specific fields (`text_vocab_size`, `text_embd_size`, `num_code_groups`, `position_id_per_s`)
- **llama-model**: implement `load_hparams` and `load_tensors` for both architectures, with TTS tensor pointers (text/codec embeddings, projection layers, per-codebook heads)
- **llama-graph**: guard `build_inp_embd` against a null `tok_embd` to support the embedding-only input path used by TTS
- **models/qwen3tts.cpp**: `llm_build_qwen3tts` graph builder with text projection, codec embedding, and MRoPE attention
- **models/qwen3tts_cp.cpp**: `llm_build_qwen3tts_cp` graph builder for the Code Predictor sub-model
Add the end-user-facing components for Qwen3-TTS:

- **tools/tts/qwen3tts.cpp**: main CLI tool implementing the full TTS pipeline: text tokenization, Talker prefill/decode with MRoPE, Code Predictor for multi-codebook generation, and vocoder (DAC-based decoder with VQ, strided ConvTranspose1d, and Snake activations)
- **tools/tts/speaker-encoder**: standalone ECAPA-TDNN speaker encoder for voice cloning (extracts speaker embeddings from reference audio)
- **tools/tts/CMakeLists.txt**: build targets for `llama-qwen3tts` and `speaker-encoder`
- **examples/qwen3-tts/**: Python usage examples (basic TTS, voice cloning, multilingual, Gradio app, benchmarking)
Hi @Acceldium, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Note that this branch adds initial support for running Qwen3-TTS models in llama.cpp. Qwen3-TTS uses a multi-stage pipeline (language model + audio decoder/tokenizer) that requires executing multiple independent compute graphs in sequence — a pattern llama.cpp does not currently support natively.