Add MOSS-TTS-v1.5 (delay) and MOSS-Audio-Tokenizer by ted-mosi · Pull Request #46447 · huggingface/transformers

ted-mosi · 2026-06-05T13:30:42Z

What does this PR do?

This PR adds native Transformers support for:

MossTTSDelayModel
MossTTSDelayProcessor
MossAudioTokenizerModel
AutoModelForTextToWaveform
AutoModelForAudioTokenization
Model docs and tests for both MOSS-TTS-v1.5 (delay) and MOSS-Audio-Tokenizer

Architecture notes

MOSS-TTS-v1.5 uses a delay-pattern TTS generation format with text tokens plus multiple audio VQ channels. The processor handles user/assistant message construction, reference-audio tokenization, delay/de-delay conversion, and waveform decoding through the native MOSS audio tokenizer.
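The delay/de-delay conversion can be illustrated with a minimal NumPy sketch. It assumes a MusicGen-style pattern in which audio channel k is shifted right by k frames and the empty slots are filled with a pad id. It covers only the audio VQ channels, not the text channel. `apply_delay_pattern`, `remove_delay_pattern`, and `pad_id` are hypothetical names, and the actual offsets and padding used by MossTTSDelayProcessor may differ.

```python
import numpy as np


def apply_delay_pattern(codes: np.ndarray, pad_id: int) -> np.ndarray:
    """Shift audio channel k right by k frames (hypothetical helper).

    codes: array of shape (num_frames, num_channels) holding VQ code ids.
    Returns an array of shape (num_frames + num_channels - 1, num_channels).
    """
    num_frames, num_channels = codes.shape
    delayed = np.full(
        (num_frames + num_channels - 1, num_channels), pad_id, dtype=codes.dtype
    )
    for k in range(num_channels):
        # Channel k starts k steps later than channel 0.
        delayed[k : k + num_frames, k] = codes[:, k]
    return delayed


def remove_delay_pattern(delayed: np.ndarray) -> np.ndarray:
    """Undo apply_delay_pattern, realigning all channels frame by frame."""
    total_steps, num_channels = delayed.shape
    num_frames = total_steps - num_channels + 1
    codes = np.empty((num_frames, num_channels), dtype=delayed.dtype)
    for k in range(num_channels):
        codes[:, k] = delayed[k : k + num_frames, k]
    return codes


# Example: 4 frames of 3 audio channels, padded with -1 (assumed pad id).
codes = np.arange(12).reshape(4, 3)
delayed = apply_delay_pattern(codes, pad_id=-1)
restored = remove_delay_pattern(delayed)
print(delayed)
print(np.array_equal(restored, codes))
```

In delay-pattern schemes generally, this offset means that at each generation step channel k is predicted after the lower channels of the same audio frame were already emitted at earlier steps. De-delaying realigns the channels before the codes are passed to the audio tokenizer for waveform decoding.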

MOSS-Audio-Tokenizer is added as a native audio-tokenizer model so MOSS-TTS no longer depends on Hub remote code for audio tokenization.

Before submitting

I confirm that this is not a pure code agent PR.
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

AI assistance disclosure

AI assistance was used while drafting and testing parts of this PR. I reviewed the changed files, checked the implementation against existing Transformers model patterns, and ran the relevant local tests listed below. I am responsible for the final code and for responding to review feedback.

Who can review?

@eustlb @ebezzam @vasqu

github-actions · 2026-06-05T13:37:17Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, moss_audio_tokenizer, moss_tts_delay

ted-mosi added 16 commits June 4, 2026 13:09

316b1d4 WIP.
d7b14b0 Style fix.
9abb0d3 Fix.
1a3374d Style fix.
877d4c0 Fix.
8fbf108 Fix.
1db616b Fix.
8ab4caa Fix.
01ffaa0 Added first integration test.
add3628 Added more integration test.
1a0c7c3 Added MOSS-Audio-Tokenizer.
f492dc2 Fix.
cc1f177 Refactored the test for tts_robust_normalizer.
ccbd777 Updated the doc for MOSS-TTS-V1.5 (delay).
1ffb5fd Address code review.
c1aa792 Merge branch 'main' into add-mosstts-v1-5

ted-mosi added 2 commits June 5, 2026 21:47

c64d5cc Fix.
5d55d4e Fix pipeline.

ted-mosi wants to merge 18 commits into huggingface:main from ted-mosi:add-mosstts-v1-5