Commit bb0eb99

Fix sampling defaults: align with Python's non-greedy config
Symptom: paragraph-length inputs produced a wav where only the first second had audio and the rest was pure silence. Example input "hello how are you? i am good..." generated 257 speech tokens, of which 240 were the silence token 4218. The C++ T3 was running with top_k=1 (greedy), which on Chatterbox falls into a silence-token repetition trap as soon as any natural pause is synthesized.

Align the defaults with ChatterboxTurboTTS.generate() in tts_turbo.py:

                    before (C++)    after (C++, matches Python)
    top_k           1 (greedy)      1000
    top_p           1.0             0.95
    temperature     1.0             0.8
    repeat_penalty  1.0             1.2
    n_predict       256             1000

Any of these can still be overridden on the CLI; --top-k 1 reproduces the old greedy behaviour for debugging.

Verified: the same input that previously yielded one 0.5-s window of speech followed by 19 windows of pure zero RMS now has non-trivial RMS across all 21 windows; total wav RMS goes from 8.3e-03 to 4.8e-02 and max amplitude from 0.18 to 0.50 on the same prompt. afplay confirms normal continuous speech.
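The per-window RMS check described in the verification note can be sketched roughly as follows. This is an illustrative helper, not code from the repository: the names `window_rms` and `count_silent_windows` are hypothetical, and the window length is passed in samples because the output sample rate is not stated in the commit (0.5 s would be 12000 samples at an assumed 24 kHz).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-square of samples in the half-open range [begin, end).
static double rms(const std::vector<float>& samples, std::size_t begin, std::size_t end) {
    if (end <= begin) return 0.0;
    double sum_sq = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        sum_sq += static_cast<double>(samples[i]) * static_cast<double>(samples[i]);
    }
    return std::sqrt(sum_sq / static_cast<double>(end - begin));
}

// Hypothetical helper: split the signal into fixed-size windows (the last one
// may be shorter) and return one RMS value per window.
static std::vector<double> window_rms(const std::vector<float>& samples, std::size_t window_len) {
    std::vector<double> out;
    if (window_len == 0) return out;
    for (std::size_t start = 0; start < samples.size(); start += window_len) {
        const std::size_t end = std::min(start + window_len, samples.size());
        out.push_back(rms(samples, start, end));
    }
    return out;
}

// Hypothetical helper: count windows whose RMS is at or below a threshold.
// A threshold of 0.0 matches the "pure zero RMS" wording in the commit message.
static std::size_t count_silent_windows(const std::vector<double>& rms_values, double threshold) {
    std::size_t n = 0;
    for (double v : rms_values) {
        if (v <= threshold) ++n;
    }
    return n;
}
```

A regression check along these lines would flag the old behaviour (most windows silent) without needing to listen to the output.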
1 parent 4001702 commit bb0eb99

1 file changed

Lines changed: 14 additions & 10 deletions

src/main.cpp

@@ -150,13 +150,17 @@ struct cli_params {
     bool dump_tokens_only = false;
     int32_t seed = 0;
     int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
-    int32_t n_predict = 256;
+    int32_t n_predict = 1000; // matches Python's default-ish output budget for paragraph-length text
     int32_t n_ctx = 0;
     int32_t n_gpu_layers = 0;
-    int32_t top_k = 1;
-    float top_p = 1.0f;
-    float temp = 1.0f;
-    float repeat_penalty = 1.0f;
+    // Sampling defaults matched to ChatterboxTurboTTS.generate() in tts_turbo.py:
+    // temperature=0.8, top_k=1000, top_p=0.95, repetition_penalty=1.2
+    // The previous greedy defaults (top_k=1) collapse into silence-token
+    // repetition loops on any non-trivial text.
+    int32_t top_k = 1000;
+    float top_p = 0.95f;
+    float temp = 0.8f;
+    float repeat_penalty = 1.2f;
 };

 static void print_usage(const char * argv0) {
@@ -179,13 +183,13 @@ static void print_usage(const char * argv0) {
     fprintf(stderr, " bit-exact numerical validation (requires --ref-dir).\n");
     fprintf(stderr, " --seed N RNG seed (default: 0)\n");
     fprintf(stderr, " --threads N CPU threads (default: %d)\n", std::min(4, (int32_t) std::thread::hardware_concurrency()));
-    fprintf(stderr, " --n-predict N Max speech tokens (default: 256)\n");
+    fprintf(stderr, " --n-predict N Max speech tokens (default: 1000)\n");
     fprintf(stderr, " --context N Override KV context length\n");
     fprintf(stderr, " --n-gpu-layers N GPU backend when N > 0\n");
-    fprintf(stderr, " --top-k N (default: 1)\n");
-    fprintf(stderr, " --top-p P (default: 1.0)\n");
-    fprintf(stderr, " --temp T (default: 1.0)\n");
-    fprintf(stderr, " --repeat-penalty R (default: 1.0)\n");
+    fprintf(stderr, " --top-k N (default: 1000, matches Python; use 1 for greedy)\n");
+    fprintf(stderr, " --top-p P (default: 0.95)\n");
+    fprintf(stderr, " --temp T (default: 0.8)\n");
+    fprintf(stderr, " --repeat-penalty R (default: 1.2)\n");
     fprintf(stderr, " -h, --help\n");
 }
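
For readers unfamiliar with how these four knobs interact, here is a minimal sketch of a sampler that consumes them. It is not the sampler in this repository or in tts_turbo.py: `sampling_params`, `sample_token`, and the processing order (repetition penalty, then top-k, then temperature softmax, then top-p) are assumptions chosen for illustration. It does show why the old defaults were deterministic: with top_k=1 the draw reduces to an argmax, and a repeat_penalty of 1.0 leaves every logit unchanged.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical parameter bundle; the defaults mirror the new values in the diff.
struct sampling_params {
    int32_t top_k          = 1000;
    float   top_p          = 0.95f;
    float   temp           = 0.8f;
    float   repeat_penalty = 1.2f;
};

// Illustrative sampler (assumed order of operations, not the project's code).
// Returns the chosen token id, or -1 if there are no logits.
static int32_t sample_token(std::vector<float> logits,
                            const std::vector<int32_t>& history,
                            const sampling_params& p,
                            std::mt19937& rng) {
    const std::size_t n = logits.size();
    if (n == 0) return -1;

    // 1. Repetition penalty, applied once per distinct token already generated:
    //    positive logits are divided by the penalty, negative ones multiplied.
    std::vector<bool> seen(n, false);
    for (int32_t tok : history) {
        if (tok < 0 || static_cast<std::size_t>(tok) >= n || seen[tok]) continue;
        seen[tok] = true;
        float& l = logits[tok];
        l = (l > 0.0f) ? l / p.repeat_penalty : l * p.repeat_penalty;
    }

    // 2. Top-k: keep the k highest-logit candidates, sorted in descending order.
    std::vector<int32_t> ids(n);
    std::iota(ids.begin(), ids.end(), 0);
    const std::size_t k = (p.top_k > 0 && static_cast<std::size_t>(p.top_k) < n)
                              ? static_cast<std::size_t>(p.top_k) : n;
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int32_t a, int32_t b) { return logits[a] > logits[b]; });
    ids.resize(k);

    // Greedy shortcut: top_k == 1 (or a non-positive temperature) is an argmax.
    if (k == 1 || p.temp <= 0.0f) return ids[0];

    // 3. Temperature-scaled softmax over the survivors (unnormalised weights).
    std::vector<double> weights(k);
    const double max_logit = logits[ids[0]];
    double sum = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        weights[i] = std::exp((logits[ids[i]] - max_logit) / p.temp);
        sum += weights[i];
    }

    // 4. Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    std::size_t keep = k;
    double cum = 0.0;
    for (std::size_t i = 0; i < k; ++i) {
        cum += weights[i] / sum;
        if (cum >= p.top_p) { keep = i + 1; break; }
    }
    weights.resize(keep);

    // 5. Draw from the remaining candidates in proportion to their weights.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return ids[dist(rng)];
}
```

Setting `top_k = 1` in this sketch corresponds to what the commit message describes for `--top-k 1`: the output no longer depends on the seed, which is convenient for debugging but leaves nothing to break a repetition loop once it starts.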