Merged
4 changes: 2 additions & 2 deletions api-reference/openapi.json
@@ -2185,7 +2185,7 @@
"type": "null"
}
],
- "description": "Inline voice references for zero-shot cloning. Requires MessagePack (not JSON). For single speaker, provide an array of ReferenceAudio objects. For multiple speakers, provide an array of arrays where each inner array contains references for one speaker. The speaker index corresponds to the index in reference_id array. Example for multi-speaker: [[{audio, text}], [{audio, text}, {audio, text}]] for 2 speakers where speaker 1 has 2 reference samples.",
+ "description": "Inline voice references for zero-shot cloning. Requires MessagePack (not JSON). For single speaker, provide an array of ReferenceAudio objects. For multiple speakers, provide an array of arrays where each inner array contains references for one speaker. **Multi-speaker is only available with the S2-Pro model.** The speaker index corresponds to the index in reference_id array. Example for multi-speaker: [[{audio, text}], [{audio, text}, {audio, text}]] for 2 speakers where speaker 1 has 2 reference samples.",
"title": "References"
},
"reference_id": {
@@ -2206,7 +2206,7 @@
}
],
"default": null,
- "description": "Voice model ID(s) from Fish Audio library or your custom models. For single speaker synthesis, provide a string. For multi-speaker synthesis (e.g., dialogue), provide an array of model IDs. When using multiple speakers, use speaker tags in your text like [0] and [1] to indicate which speaker should speak each part. Example: '[0]Hello![1]Hi there![0]How are you?' with reference_id: ['speaker-a-id', 'speaker-b-id']",
+ "description": "Voice model ID(s) from Fish Audio library or your custom models. For single speaker synthesis, provide a string. For multi-speaker synthesis (e.g., dialogue), provide an array of model IDs. **Multi-speaker is only available with the S2-Pro model.** When using multiple speakers, use speaker tags in your text like [0] and [1] to indicate which speaker should speak each part. Example: '[0]Hello![1]Hi there![0]How are you?' with reference_id: ['speaker-a-id', 'speaker-b-id']",
"title": "Reference Id"
},
"prosody": {
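To make the updated `reference_id` description concrete, here is a minimal Python sketch of how a client might assemble the speaker-tagged text and the ID array, and refuse multi-speaker input for any model other than S2-Pro. It only builds plain data and does not call the API. The helper names, the `text` key, and passing the model as a plain argument are assumptions for illustration; the diff does not show how the model is selected in a request. The `s2-pro` identifier is taken from the models overview card.

```python
from typing import Dict, List, Sequence, Tuple

S2_PRO = "s2-pro"  # name as written on the models-overview card


def build_dialogue_text(turns: Sequence[Tuple[int, str]]) -> str:
    """Join (speaker_index, line) pairs into tagged text, e.g. '[0]Hello![1]Hi there!'."""
    return "".join(f"[{speaker}]{line}" for speaker, line in turns)


def build_multi_speaker_fields(
    turns: Sequence[Tuple[int, str]],
    reference_ids: Sequence[str],
    model: str,
) -> Dict[str, object]:
    """Return the text and reference_id fields for a multi-speaker request.

    Hypothetical helper: the 'text' key and the model argument are assumptions.
    """
    if model != S2_PRO:
        raise ValueError("multi-speaker synthesis is only available with S2-Pro")
    for speaker, _ in turns:
        # Each [n] tag must point at an entry in the reference_id array.
        if not 0 <= speaker < len(reference_ids):
            raise ValueError(f"speaker tag [{speaker}] has no matching reference_id")
    return {
        "text": build_dialogue_text(turns),
        "reference_id": list(reference_ids),
    }


def is_multi_speaker_references(references: List[object]) -> bool:
    """True when references is an array of arrays (one inner array per speaker)."""
    return len(references) > 0 and all(isinstance(group, list) for group in references)
```

Called with the turns `(0, "Hello!")`, `(1, "Hi there!")`, `(0, "How are you?")` and the IDs `speaker-a-id` and `speaker-b-id`, the helper reproduces the example string from the description. Inline `references` would additionally need MessagePack encoding, which this sketch leaves out.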
2 changes: 1 addition & 1 deletion developer-guide/models-pricing/models-overview.mdx
@@ -52,7 +52,7 @@ Fish Audio offers state-of-the-art text-to-speech models optimized for different
<Card title="s2-pro" icon="star">
**Fish Audio S2-Pro** - Our next-generation TTS model with best-in-class performance
- Natural language control with `[bracket]` syntax — not limited to a fixed set (e.g., `[whispers sweetly]`, `[laughing nervously]`)
- - Multi-speaker dialogue support
+ - Multi-speaker dialogue support **(S2-Pro exclusive)**
- 80+ languages
- 100ms time-to-first-audio
- Full SGLang-based serving stack