
Add Qwen3 TTS architecture support #20752

Open
Acceldium wants to merge 3 commits into ggml-org:master from Acceldium:Qwen3-TTS

Conversation

@Acceldium

This branch adds initial support for running Qwen3-TTS models in llama.cpp. Qwen3-TTS uses a multi-stage pipeline (language model + audio decoder/tokenizer) that requires executing multiple independent compute graphs in sequence, a pattern llama.cpp does not currently support natively.
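The multi-stage pattern described above can be sketched as three independent models run back to back, each consuming the previous stage's output. The function and stage names below are illustrative placeholders, not the actual APIs this PR introduces:

```python
# Conceptual sketch of the Qwen3-TTS multi-stage pipeline: three
# independent models executed in sequence. The callables are toy
# stand-ins for the real Talker, Code Predictor, and vocoder.
def run_tts_pipeline(text, talker, code_predictor, vocoder):
    hidden = talker(text)            # stage 1: Talker language model
    codes = code_predictor(hidden)   # stage 2: multi-codebook audio codes
    return vocoder(codes)            # stage 3: decode codes to a waveform

# toy stand-ins to show the data flow end to end
audio = run_tts_pipeline(
    "hi",
    talker=lambda t: [len(t)],
    code_predictor=lambda h: [h[0] * 2],
    vocoder=lambda c: sum(c),
)
print(audio)  # 4
```

Each stage corresponds to a separate llama.cpp compute graph, which is why the existing single-graph execution path is not sufficient here.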

Add GGUF support for the Qwen3-TTS model family:

- gguf-py: define QWEN3TTS and QWEN3TTS_CP architectures, TTS-specific
  KV keys (text_vocab_size, text_embedding_length, num_code_groups,
  position_id_per_seconds), and tensor name mappings
- convert_hf_to_gguf.py: add Qwen3TTSTalkerModel (28-layer Talker with
  interleaved MRoPE) and Qwen3TTSCodePredictorModel (5-layer Code
  Predictor with standard RoPE), including speaker encoder tensor
  remapping
- tools/tts/convert_qwen3tts.py: wrapper script to convert both Talker
  and Code Predictor GGUFs from a single HF model directory
- tools/tts/convert_qwen3tts_tokenizer.py: tokenizer conversion helper
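The tensor name mappings mentioned above follow gguf-py's usual pattern of translating HF checkpoint names into GGUF's `blk.N.*` convention. A minimal sketch of such a remapping, with source/target patterns that are assumptions for illustration rather than the exact rules this PR ships:

```python
import re

# Illustrative HF -> GGUF tensor name remapping for the Talker, in the
# spirit of convert_hf_to_gguf.py. The regex rules here are examples of
# the technique, not the PR's actual mapping table.
_RULES = [
    (r"^talker\.model\.layers\.(\d+)\.self_attn\.q_proj\.weight$",
     r"blk.\1.attn_q.weight"),
    (r"^talker\.model\.layers\.(\d+)\.mlp\.gate_proj\.weight$",
     r"blk.\1.ffn_gate.weight"),
    (r"^talker\.model\.embed_tokens\.weight$", "token_embd.weight"),
]

def remap_tensor_name(hf_name: str) -> str:
    """Translate one HF tensor name into its GGUF equivalent."""
    for pat, repl in _RULES:
        if re.match(pat, hf_name):
            return re.sub(pat, repl, hf_name)
    raise KeyError(f"no mapping for {hf_name}")

print(remap_tensor_name("talker.model.layers.3.self_attn.q_proj.weight"))
# blk.3.attn_q.weight
```

The speaker encoder tensors would be remapped the same way, just with a different rule set.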

Add C++ inference support for Qwen3-TTS Talker and Code Predictor:

- llama-arch: register LLM_ARCH_QWEN3TTS (Talker, 28-layer with MRoPE)
  and LLM_ARCH_QWEN3TTS_CP (Code Predictor, 5-layer with standard
  RoPE), including KV keys, tensor names, and tensor info entries
- llama-hparams: add TTS-specific fields (text_vocab_size,
  text_embd_size, num_code_groups, position_id_per_s)
- llama-model: implement load_hparams, load_tensors for both
  architectures with TTS tensor pointers (text/codec embeddings,
  projection layers, per-codebook heads)
- llama-graph: guard build_inp_embd for null tok_embd to support
  embedding-only input path used by TTS
- models/qwen3tts.cpp: llm_build_qwen3tts graph builder with text
  projection, codec embedding, and MRoPE attention
- models/qwen3tts_cp.cpp: llm_build_qwen3tts_cp graph builder for
  the code predictor sub-model
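The positional-encoding split above (MRoPE for the Talker, standard RoPE for the Code Predictor) comes down to how per-pair rotation angles are computed. A minimal numeric sketch of the standard rotary embedding used by the Code Predictor; the interleaved-MRoPE variant additionally splits head dimensions across multiple position streams, which is omitted here:

```python
import math

# Standard RoPE: rotate each (even, odd) feature pair by an angle that
# depends on the token position and the pair's frequency. This is the
# plain variant; the Talker's interleaved MRoPE is not shown.
def rope_rotate(pair, pos, dim_idx, head_dim, theta=10000.0):
    """Rotate one feature pair by the position-dependent angle."""
    x0, x1 = pair
    freq = theta ** (-2.0 * dim_idx / head_dim)
    angle = pos * freq
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)

# position 0 leaves the pair unchanged; rotation always preserves norm
print(rope_rotate((1.0, 0.0), pos=0, dim_idx=0, head_dim=64))  # (1.0, 0.0)
```

Since the rotation is norm-preserving, it encodes position purely in the phase of each feature pair, which is what makes relative positions recoverable from attention dot products.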

Add the end-user facing components for Qwen3-TTS:

- tools/tts/qwen3tts.cpp: main CLI tool implementing the full TTS
  pipeline -- text tokenization, Talker prefill/decode with MRoPE,
  Code Predictor for multi-codebook generation, and vocoder (DAC-based
  decoder with VQ, strided ConvTranspose1d, Snake activations)
- tools/tts/speaker-encoder: standalone ECAPA-TDNN speaker encoder
  for voice cloning (extracts speaker embeddings from reference audio)
- tools/tts/CMakeLists.txt: build targets for llama-qwen3tts and
  speaker-encoder
- examples/qwen3-tts/: Python usage examples (basic TTS, voice
  cloning, multilingual, Gradio app, benchmarking)
@ggml-gh-bot

ggml-gh-bot bot commented Mar 19, 2026

Hi @Acceldium, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 4 open PRs.

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

  • Large PR: Large changes require prior discussion (e.g. an issue or RFC) and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.


Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@github-actions github-actions bot added the examples, model (Model specific), and python (python script changes) labels on Mar 19, 2026
