Commit bb0eb99
Fix sampling defaults: align with Python's non-greedy config
Symptom: paragraph-length inputs produced a wav where only the first
second had audio and the rest was pure silence. Example input
"hello how are you? i am good..." generated 257 speech tokens of which
240 were the silence token 4218 — the C++ T3 was running with
top_k=1 (greedy), which on Chatterbox falls into a silence-token
repetition trap as soon as any natural pause is synthesized.
Align the defaults with ChatterboxTurboTTS.generate() in tts_turbo.py:

                    before (C++)    after (C++, matches Python)
    top_k           1 (greedy)      1000
    top_p           1.0             0.95
    temperature     1.0             0.8
    repeat_penalty  1.0             1.2
    n_predict       256             1000

Any of these can still be overridden on the CLI; --top-k 1 reproduces
the old greedy behaviour for debugging.
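
To illustrate how these settings interact, here is a minimal sketch of a sampler that applies a repetition penalty, top-k, temperature, and top-p in that order. It is not the code changed by this commit: `SamplingParams`, `sample_token`, and the CTRL-style penalty rule are assumptions made for illustration, and only the default values come from the table above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical parameter struct; values are the "after" column above.
struct SamplingParams {
    int   top_k          = 1000;
    float top_p          = 0.95f;
    float temperature    = 0.8f;   // must be > 0 in this sketch
    float repeat_penalty = 1.2f;
    int   n_predict      = 1000;
};

// Pick one token id from raw logits; returns -1 if logits is empty.
// `history` holds the token ids generated so far.
inline int sample_token(std::vector<float> logits,
                        const std::vector<int>& history,
                        const SamplingParams& p,
                        std::mt19937& rng) {
    if (logits.empty()) return -1;

    // Repetition penalty (assumed CTRL-style rule): make every token that
    // already appeared less likely, once per distinct token.
    std::vector<char> seen(logits.size(), 0);
    for (int t : history) {
        if (t < 0 || t >= static_cast<int>(logits.size()) || seen[t]) continue;
        seen[t] = 1;
        logits[t] = logits[t] > 0.0f ? logits[t] / p.repeat_penalty
                                     : logits[t] * p.repeat_penalty;
    }

    // Top-k: keep the k highest-logit ids, sorted in descending order.
    std::vector<int> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);
    const std::size_t k =
        std::min<std::size_t>(std::max(p.top_k, 1), ids.size());
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    ids.resize(k);

    // Temperature-scaled softmax weights over the survivors.
    std::vector<double> probs(k);
    const double max_logit = logits[ids[0]];
    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        probs[i] = std::exp((logits[ids[i]] - max_logit) / p.temperature);
        sum += probs[i];
    }

    // Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    std::size_t keep = k;
    double cum = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        cum += probs[i] / sum;
        if (cum >= p.top_p) { keep = i + 1; break; }
    }

    // discrete_distribution renormalises the remaining weights itself.
    std::discrete_distribution<int> dist(probs.begin(), probs.begin() + keep);
    return ids[dist(rng)];
}
```

With `top_k = 1` only the highest-logit token survives, so the draw is deterministic. That is the greedy mode the symptom above describes: once the silence token has the top logit and nothing penalises repeats, it is picked again on every step.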
Verified: same input that previously yielded one 0.5-s window of speech
followed by 19 windows of pure zero RMS now has non-trivial RMS across
all 21 windows; total wav RMS goes from 8.3e-03 to 4.8e-02 and max
amplitude from 0.18 to 0.50 on the same prompt. afplay confirms normal
continuous speech.

1 parent 4001702 commit bb0eb99
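
The per-window RMS check in the verification note of the commit message could be reproduced with something like the sketch below. The helper names `windowed_rms` and `count_silent` are hypothetical, and the caller must derive the window length in samples from the wav's actual sample rate (0.5 s times the rate); neither the rate nor the tool used for the original measurement is stated in the commit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS of each consecutive block of `window` samples.
// The last block may be shorter than `window`.
inline std::vector<double> windowed_rms(const std::vector<float>& pcm,
                                        std::size_t window) {
    std::vector<double> out;
    if (window == 0) return out;
    for (std::size_t start = 0; start < pcm.size(); start += window) {
        const std::size_t end = std::min(start + window, pcm.size());
        double acc = 0.0;
        for (std::size_t i = start; i < end; ++i) {
            acc += static_cast<double>(pcm[i]) * pcm[i];
        }
        out.push_back(std::sqrt(acc / static_cast<double>(end - start)));
    }
    return out;
}

// Number of windows whose RMS is at or below `threshold`.
inline std::size_t count_silent(const std::vector<double>& rms,
                                double threshold) {
    return static_cast<std::size_t>(
        std::count_if(rms.begin(), rms.end(),
                      [threshold](double r) { return r <= threshold; }));
}
```

A run of silent windows after the first one would correspond to the old behaviour; a count of zero would correspond to the fixed output.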
1 file changed
Lines changed: 14 additions & 10 deletions