feat(cli): add --lang and auto-infer phonemizer locale from voice prefix #351
Merged
jrusso1020 merged 2 commits into `main` on Apr 20, 2026
`hyperframes tts` was calling Kokoro's `model.create(text, voice=, speed=)` with no language argument, so Kokoro's default phonemizer (en-us) was applied regardless of the voice selected. Picking `ef_dora` or `jf_alpha` and feeding it Spanish or Japanese text produced English-phonemized output. Closes #349.

- `manager.ts`: add `SUPPORTED_LANGS`, `inferLangFromVoiceId`, and `isSupportedLang`. Attach a `defaultLang` field to every bundled voice and expand the bundled list with `ef_dora`, `ff_siwis`, `jf_alpha`, `zf_xiaobei` so `--list` surfaces multilingual options.
- `synthesize.ts`: accept an optional `lang: SupportedLang` in `SynthesizeOptions` and forward it to the Python worker as `argv[7]`. The worker introspects `Kokoro.create`'s signature and only passes `lang=` when the installed kokoro-onnx version supports it. Returned metadata now includes `lang` and `langApplied` so callers can detect silent no-ops. Bump the cached script filename to `synth-v2.py` so existing installs pick up the new script automatically.
- `commands/tts.ts`: add `--lang, -l` with validation against `SUPPORTED_LANGS`. Resolution order is explicit `--lang` > inferred from voice prefix > `en-us`. When an explicit lang disagrees with the voice-implied lang (legitimate for stylized accents), emit a dim-level hint; suppress it under `--json`. When kokoro-onnx silently ignores the kwarg, log that too. Update `--list` with a new "Lang code" column and add multilingual examples.
- Tests: new `manager.test.ts` covering every supported prefix, the unknown-prefix fallback, case-insensitivity, `isSupportedLang` validation, and a regression guard that every bundled voice has a valid `defaultLang` matching its ID.
- Docs: `docs/packages/cli.mdx` and `skills/hyperframes/references/tts.md` updated with the flag, examples, the espeak-ng dependency note for non-English phonemization, and the voice-prefix → lang table.
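A minimal sketch of the `commands/tts.ts` behaviour listed above: the precedence rule, the mismatch hint that `--json` suppresses, and the note for a kwarg that an older kokoro-onnx silently ignored. `resolveLang`, `langIgnoredHint`, `LangDecision`, and the hint wording are hypothetical names for illustration, not the code merged in this PR; `voiceImpliedLang` is assumed to be `undefined` when no voice prefix matched.

```typescript
// Hypothetical sketch of the lang handling in commands/tts.ts. Names and
// hint wording are illustrative; the precedence rule, the --json
// suppression, and the `langApplied` signal come from the PR description.
interface LangDecision {
  lang: string;
  hints: string[];
}

function resolveLang(
  explicitLang: string | undefined,
  voiceImpliedLang: string | undefined,
  json: boolean,
): LangDecision {
  // Precedence: explicit --lang > inferred from voice prefix > en-us.
  const lang = explicitLang ?? voiceImpliedLang ?? "en-us";
  const hints: string[] = [];

  // A mismatch can be deliberate (stylized accent), so it only produces a hint.
  const mismatch =
    explicitLang !== undefined &&
    voiceImpliedLang !== undefined &&
    explicitLang !== voiceImpliedLang;
  if (mismatch && !json) {
    hints.push(
      `--lang ${explicitLang} differs from the voice default (${voiceImpliedLang})`,
    );
  }
  return { lang, hints };
}

// After synthesis: explain why an explicit --lang had no effect when an
// older kokoro-onnx did not accept the `lang=` kwarg.
function langIgnoredHint(
  explicitLang: string | undefined,
  langApplied: boolean,
): string | null {
  if (explicitLang === undefined || langApplied) return null;
  return `installed kokoro-onnx ignored --lang ${explicitLang}; default phonemizer used`;
}
```

For example, `resolveLang("en-us", "es", false)` picks `en-us` and returns one hint, while the same call with `json` set to `true` returns no hints. Keeping the mismatch as a hint rather than an error preserves the stylized-accent use case.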
Backward compatibility:

- English voices (`a*`/`b*` prefixes) continue to phonemize as en-us / en-gb, with no change.
- Non-English voices now phonemize correctly by default (bug fix, not a regression).
- Older kokoro-onnx versions that don't know the `lang` kwarg keep working via signature introspection; the CLI logs a dim note if `--lang` was requested but ignored.

Verification:

- `bun --cwd packages/cli test`: 128 tests pass (incl. 17 new).
- `bunx oxlint` and `bunx oxfmt --check` clean on changed files.
- `bun run build` succeeds.
- `npx tsx packages/cli/src/cli.ts tts --help` / `--list` render cleanly; invalid `--lang` produces a clean error with the valid-codes list.
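Validation of `--lang` against `SUPPORTED_LANGS` can be sketched as below. The nine codes are an assumption: the PR names only `en-us`, `en-gb`, and `es` explicitly, so the remaining entries are espeak-ng-style placeholders. `parseLangFlag` is a hypothetical helper; lowercasing before the membership check mirrors the normalization the PR describes, so the type guard itself can stay case-sensitive.

```typescript
// Hypothetical list: only en-us, en-gb and es appear in the PR text; the
// other six codes are assumed espeak-ng-style placeholders.
const SUPPORTED_LANGS = [
  "en-us", "en-gb", "es", "fr-fr", "hi", "it", "ja", "pt-br", "cmn",
] as const;

type SupportedLang = (typeof SUPPORTED_LANGS)[number];

// Type guard: narrows an arbitrary string to SupportedLang (case-sensitive).
function isSupportedLang(value: string): value is SupportedLang {
  return (SUPPORTED_LANGS as readonly string[]).includes(value);
}

// Hypothetical helper for the --lang flag: normalize first, then validate,
// and list every valid code in the error so the user can correct the input.
function parseLangFlag(raw: string): SupportedLang {
  const normalized = raw.trim().toLowerCase();
  if (!isSupportedLang(normalized)) {
    throw new Error(
      `Unsupported --lang "${raw}". Valid codes: ${SUPPORTED_LANGS.join(", ")}`,
    );
  }
  return normalized;
}
```

Under these assumptions `parseLangFlag("EN-US")` returns `"en-us"`, while `parseLangFlag("english")` and `parseLangFlag("de")` throw with the valid-codes list.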
miguel-heygen approved these changes on Apr 20, 2026
Post-review cleanup on #351. Net -21 lines.

- Drop the `defaultLang` field and `makeVoice()` helper from `VoiceInfo`; compute the value via `inferLangFromVoiceId(v.id)` at read time in `listVoices`. The only reader was the `--list` table, and caching the derived value on every voice added a self-consistency invariant we had to test.
- Drop the redundant `lang` field from `SynthesizeResult`. The caller already knows the requested lang since it passed it in; only `langApplied` carries information the caller can't derive.
- Use `errorBox` for `--lang` validation to match the house style in `render.ts` (other validation errors already use `errorBox`).
- Reuse the existing `langList` module constant in the validation error instead of re-joining `SUPPORTED_LANGS`.
- Inline `DEFAULT_LANG`, which was used only once, in `inferLangFromVoiceId`.
- Trim WHAT-restating comments and the duplicate prefix-enumeration JSDoc on `inferLangFromVoiceId` (`VOICE_PREFIX_LANG` already carries per-row comments).
- Clean up orphaned `synth*.py` files in `~/.cache/hyperframes/tts` when writing the current versioned script, so repeated upgrades don't leak files.
- Drop the `EN-US` case-sensitive-rejection test assertion: the CLI lowercases input before validation, so accepting mixed case is a feature, not a bug.

Tests: 16/16 in `manager.test.ts`, 127/127 full CLI suite pass. Lint, format, and typecheck clean.
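The orphan cleanup in the list above could look roughly like this in Node. `writeWorkerScript` and the `synth*.py` regex are hypothetical; the PR only states that older `synth*.py` copies in `~/.cache/hyperframes/tts` are removed when the current versioned script (`synth-v2.py`) is written.

```typescript
import { mkdirSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch: write the current versioned worker script, then
// delete older synth*.py copies so upgrades don't accumulate stale files.
const CURRENT_SCRIPT = "synth-v2.py";

function writeWorkerScript(cacheDir: string, source: string): string {
  mkdirSync(cacheDir, { recursive: true });
  const target = join(cacheDir, CURRENT_SCRIPT);
  writeFileSync(target, source);

  // Remove orphans from earlier versions (e.g. synth.py, synth-v1.py);
  // unrelated cache files are left untouched.
  for (const name of readdirSync(cacheDir)) {
    if (/^synth.*\.py$/.test(name) && name !== CURRENT_SCRIPT) {
      rmSync(join(cacheDir, name), { force: true });
    }
  }
  return target;
}
```

Deleting after the write, rather than before, means a failed write never leaves the cache without any worker script.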
Tested locally end-to-end on this PR branch and can confirm the fix works as intended.

Local validation performed

Result

From a practical E2E perspective, the fix behaves as expected and addresses the multilingual phonemization issue reported in #349.
What
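Fixes multilingual TTS output in `hyperframes tts`. Adds:

- A `--lang <code>` flag for explicit phonemizer locale override
- Automatic locale inference from the voice prefix when `--lang` is omitted (the actual bug fix)
- A graceful fallback when the installed `kokoro-onnx` version predates the `lang` kwarg

Closes #349.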
Why
`hyperframes tts` was calling Kokoro's `model.create(text, voice=, speed=)` with no language argument. Kokoro's text frontend defaults to `en-us` regardless of voice, so picking a non-English voice like `ef_dora` (Spanish) or `jf_alpha` (Japanese) and feeding it native-language text produced English-phonemized output; every non-English voice was effectively broken.

Kokoro's own voice ID convention encodes the language in the first letter (`a`=American, `b`=British, `e`=Spanish, `f`=French, `h`=Hindi, `i`=Italian, `j`=Japanese, `p`=Brazilian Portuguese, `z`=Mandarin), so the default can be derived mechanically from the voice. Explicit `--lang` is kept for cases where users intentionally want a mismatch (a stylized accent, or a specific locale like `en-gb` vs `en-us`).

How
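The prefix convention described in the Why section reduces to a lookup on the first character of the voice ID. In this sketch the prefix letters come from the PR, but the locale codes other than `en-us`, `en-gb`, and `es` are assumed espeak-ng-style names, and the `Map`-based `VOICE_PREFIX_LANG` is a hypothetical stand-in for the real table in `manager.ts`.

```typescript
// Hypothetical stand-in for VOICE_PREFIX_LANG. Prefix letters come from the
// PR; codes other than en-us, en-gb and es are assumed placeholders.
const VOICE_PREFIX_LANG = new Map<string, string>([
  ["a", "en-us"], // American English
  ["b", "en-gb"], // British English
  ["e", "es"], // Spanish
  ["f", "fr-fr"], // French
  ["h", "hi"], // Hindi
  ["i", "it"], // Italian
  ["j", "ja"], // Japanese
  ["p", "pt-br"], // Brazilian Portuguese
  ["z", "cmn"], // Mandarin
]);

// Infers the phonemizer locale from the first letter of a voice ID such as
// "ef_dora". Case-insensitive; unknown or empty prefixes fall back to en-us.
function inferLangFromVoiceId(voiceId: string): string {
  const prefix = voiceId.trim().charAt(0).toLowerCase();
  return VOICE_PREFIX_LANG.get(prefix) ?? "en-us";
}
```

With this table, `inferLangFromVoiceId("EF_DORA")` yields `"es"` and an unrecognized ID such as `"qx_test"` falls back to `"en-us"`, which keeps English behaviour as the default for anything unexpected.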
`packages/cli/src/tts/manager.ts`

- `SUPPORTED_LANGS` readonly tuple of the nine valid phonemizer codes
- `inferLangFromVoiceId(voiceId)` helper mapping voice prefixes → locales
- `isSupportedLang()` type guard
- Attach `defaultLang` to every `VoiceInfo`
- Expand `BUNDLED_VOICES` with `ef_dora`, `ff_siwis`, `jf_alpha`, `zf_xiaobei` so `--list` surfaces multilingual options (it was English-only before)

`packages/cli/src/tts/synthesize.ts`

- Optional `lang` in `SynthesizeOptions`; forward it as `argv[7]` to Python
- The worker introspects `Kokoro.create`'s signature and only passes `lang=` when the installed kokoro-onnx version supports it; older installs keep working with their default (English) phonemization
- Returned metadata includes `lang` and `langApplied` so the caller can detect silent no-ops
- Cached script renamed to `synth-v2.py` so existing users automatically pick up the new script (the old cached copy doesn't know about `argv[7]`)

`packages/cli/src/commands/tts.ts`

- `--lang, -l` flag with validation against `SUPPORTED_LANGS`
- Resolution order: `--lang` > `inferLangFromVoiceId(voice)` > `en-us`
- Dim hint when `--lang` disagrees with the voice-implied lang (legitimate for stylized accents; suppressed under `--json`)
- Dim note when the installed kokoro-onnx ignores the `lang` kwarg (old installs)
- `--list` adds a "Lang code" column so users can see what each voice phonemizes to by default
- Multilingual examples in `--help`

Backward compatibility
- `tts "Hello" --voice af_heart`: `lang=en-us` before, `lang=en-us` (inferred) after; unchanged
- `tts "Bonjour" --voice ff_siwis`
- `tts "..." --voice ef_dora --lang en-us`
- Older kokoro-onnx without `lang` kwarg support: `--lang` logs a dim note if requested

English voices (`a*`/`b*` prefixes) are unchanged. Non-English voices now phonemize correctly by default; that's a bug fix, not a regression.

Test plan
- `packages/cli/src/tts/manager.test.ts`: every supported prefix, unknown-prefix fallback, case-insensitivity, `isSupportedLang` acceptance/rejection, and a regression guard that every bundled voice has a valid `defaultLang` matching its ID
- `bun --cwd packages/cli test` → 128 passed (11 test files, 17 new)
- `bunx oxlint` + `bunx oxfmt --check` clean on all changed files
- `bun run build` succeeds
- `npx tsx packages/cli/src/cli.ts tts --help` renders the `--lang` row and examples correctly
- `npx tsx packages/cli/src/cli.ts tts --list` renders the new Lang code column and lists the expanded voice set
- `npx tsx packages/cli/src/cli.ts tts "hi" --lang notreal` produces a clean validation error listing valid codes
- `npx tsx packages/cli/src/cli.ts tts --lang es` (no input) falls through to the "provide text" error
- Run `npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output /tmp/es.wav` and confirm by ear that it pronounces "reunión" and "nueve" in Spanish rather than English.

Notes for reviewers
- Non-English phonemization needs `espeak-ng` installed at the system level (`brew install espeak-ng` / `apt-get install espeak-ng`). The docs/skill call this out. A preflight check for `espeak-ng` when `--lang` is non-English would be a nice follow-up, but the Kokoro error is already reasonably clear.
- … `--lang EN-US` is rejected (we normalize with `toLowerCase()` before checking, so `--lang EN-US` → `en-us` still works; explicit invalid values like `--lang english` or `--lang de` are rejected).
- … `--lang` surface, and the inference logic + CLI surface ship together so users get the fix without needing to know the new flag.
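The `espeak-ng` preflight floated in the reviewer notes could be as small as the sketch below. It is hypothetical: `preflightEspeak`, `needsEspeak`, the `--version` probe, and the rule that every non-`en-*` code needs `espeak-ng` are assumptions, not part of this PR. The probe is injectable so the check can be unit-tested without the binary installed.

```typescript
import { spawnSync } from "node:child_process";

// Assumption: every code outside the en-* family needs system espeak-ng.
function needsEspeak(lang: string): boolean {
  return !lang.toLowerCase().startsWith("en");
}

// spawnSync reports a missing binary through `error` (ENOENT) instead of
// throwing, so both a spawn failure and a non-zero exit count as unavailable.
function espeakAvailable(binary = "espeak-ng"): boolean {
  const result = spawnSync(binary, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Returns an actionable message when espeak-ng is required but missing,
// or null when synthesis can proceed.
function preflightEspeak(
  lang: string,
  probe: () => boolean = espeakAvailable,
): string | null {
  if (!needsEspeak(lang) || probe()) return null;
  return `--lang ${lang} needs espeak-ng. Install it with "brew install espeak-ng" or "apt-get install espeak-ng".`;
}
```

Running the check only for non-English codes keeps the common English path free of an extra process spawn.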