feat: LuxTTS integration — multi-engine TTS support #254
📝 Walkthrough

Adds multi-engine TTS support (LuxTTS + Qwen): a new LuxTTS backend, engine dispatching in the backend, engine-aware API and frontend changes, form/type updates, dependency additions, and migration of developer docs/workflow from a Makefile to a Just-based setup.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Frontend as Frontend UI
    participant API as Backend API / Dispatcher
    participant Qwen as Qwen Backend
    participant Lux as LuxTTS Backend
    participant Cache as Model Cache
    User->>Frontend: Select engine (qwen | luxtts) and submit
    Frontend->>API: POST generate_speech(text, engine)
    API->>API: get_tts_backend_for_engine(engine)
    alt engine == "qwen"
        API->>Qwen: load_model()
        Qwen->>Cache: check/download model
        Cache-->>Qwen: model ready
        API->>Qwen: create_voice_prompt(profile, use_cache)
        Qwen-->>API: voice_prompt
        API->>Qwen: generate(text, voice_prompt)
        Qwen-->>API: audio
    else engine == "luxtts"
        API->>Lux: load_model()
        Lux->>Cache: check/download model
        Cache-->>Lux: model ready
        API->>Lux: create_voice_prompt(profile, use_cache)
        Lux-->>API: voice_prompt
        API->>Lux: generate(text, voice_prompt, instruct)
        Lux-->>API: audio
    end
    API-->>Frontend: stream audio
    Frontend->>User: play audio
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Force-pushed dccded4 to 2867421
Force-pushed 31f72ff to 1318293
Introduce LuxTTS (ZipVoice) alongside Qwen TTS, enabling users to choose between engines at generation time. LuxTTS offers fast, English-focused voice cloning at 48kHz with ~1GB VRAM.

Backend:
- Add LuxTTSBackend with encode_prompt/generate_speech integration
- Multi-engine registry (get_tts_backend_for_engine) replacing the singleton
- Engine-prefixed voice prompt cache keys to avoid collisions
- Engine field on GenerationRequest (default 'qwen' for backward compat)
- Engine dispatch in /generate and /generate/stream endpoints
- LuxTTS in model status, download, and delete maps

Frontend:
- TTS Engine selector dropdown in GenerationForm (Qwen TTS / LuxTTS)
- Conditionally hide Model Size and Delivery Instructions for LuxTTS
- Engine field added to TypeScript types and Zod schema
- LuxTTS section in the Model Management page
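The multi-engine registry described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `get_tts_backend_for_engine` and the engine names come from this PR, while the stub backend classes here stand in for the real implementations.

```python
import threading

# Stand-in stubs for the real backends; the actual classes live in
# backend/backends/ and lazily load their models elsewhere.
class QwenTTSBackend:
    def is_loaded(self) -> bool:
        return False

class LuxTTSBackend:
    def is_loaded(self) -> bool:
        return False

_tts_backends: dict = {}
_tts_backends_lock = threading.Lock()

def get_tts_backend_for_engine(engine: str):
    """Return the backend for `engine`, creating it once on first use."""
    if engine not in ("qwen", "luxtts"):
        raise ValueError(f"Unsupported TTS engine: {engine}")
    backend = _tts_backends.get(engine)
    if backend is None:
        # Double-checked locking: re-check under the lock so concurrent
        # requests cannot create duplicate backend instances.
        with _tts_backends_lock:
            backend = _tts_backends.get(engine)
            if backend is None:
                backend = QwenTTSBackend() if engine == "qwen" else LuxTTSBackend()
                _tts_backends[engine] = backend
    return backend
```

Repeated calls with the same engine return the same instance, so each engine's model is only ever loaded once per process.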
Adds 'just' as the recommended dev tool: 'just setup' for one-time install, 'just dev' to run backend + frontend in one terminal. Updates CONTRIBUTING.md to document just as the primary setup method.
piper-phonemize has no PyPI wheels — it needs a custom find-links URL from k2-fsa.github.io. Removed redundant transitive deps that Zipvoice already declares.
- Combine engine + model size into one flat dropdown (Qwen3-TTS 1.7B, Qwen3-TTS 0.6B, LuxTTS) in both FloatingGenerateBox and GenerationForm
- Add linacodec git dep to requirements.txt (uv-only source; pip can't resolve it from Zipvoice's pyproject.toml)
- Remove redundant transitive deps from requirements.txt
- Quiet the sidecar setup script (it was printing misleading instructions)
- Fix silent Zod validation failure when LuxTTS is selected (modelSize was set to 'default', which failed enum validation and prevented form submit)
- Preserve engine, model size, and language after successful generation instead of resetting to defaults
Actionable comments posted: 6
🧹 Nitpick comments (5)
justfile (1)
62-63: Consider a readiness check instead of a fixed sleep.

The `sleep 2` is a timing assumption that may be insufficient if the backend takes longer to initialize (e.g., first run with model downloads). A health-check loop would be more robust:

♻️ Optional: Replace sleep with health check

```diff
 echo "Starting backend on http://localhost:17493 ..."
 {{ venv_bin }}/uvicorn backend.main:app --reload --port 17493 &
-sleep 2
+# Wait for backend to be ready (up to 30s)
+for i in {1..30}; do
+  curl -sf http://localhost:17493/health >/dev/null && break
+  sleep 1
+done
 echo "Starting Tauri desktop app..."
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@justfile` around lines 62 - 63: the fixed 2-second sleep after launching uvicorn in the background is brittle; replace it with a readiness/health-check loop that polls a known endpoint (e.g., /health or /docs) on localhost:17493 until it returns a successful status or a timeout is reached, retrying with short sleeps between attempts, and fail the script with a clear message if the timeout is exceeded.

backend/backends/luxtts_backend.py (1)
174-192: Minor: cache_key computed twice.

The `cache_key` is computed on line 176 for the cache lookup, then again on line 191 for caching. Consider reusing the variable.

♻️ Proposed fix

```diff
 async def create_voice_prompt(
     self,
     audio_path: str,
     reference_text: str,
     use_cache: bool = True,
 ) -> Tuple[dict, bool]:
     # ...
     await self.load_model()

+    cache_key = "luxtts_" + get_cache_key(audio_path, reference_text) if use_cache else None
+
     if use_cache:
-        cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
         cached = get_cached_voice_prompt(cache_key)
         if cached is not None and isinstance(cached, dict):
             return cached, True

     # ... encode ...

     if use_cache:
-        cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
         cache_voice_prompt(cache_key, encoded)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/backends/luxtts_backend.py` around lines 174 - 192: the cache_key is computed twice; compute it once (prefixed with "luxtts_") before the cache lookup and reuse the same variable for both get_cached_voice_prompt and, after awaiting asyncio.to_thread(_encode_sync), for cache_voice_prompt(cache_key, encoded).

app/src/components/Generation/FloatingGenerateBox.tsx (1)
405-439: Consider wrapping in FormField for consistency.

The model/engine selector uses `form.watch()` and `form.setValue()` directly rather than `FormField` with a `render` prop like the language selector (lines 381-403). While this works, it's inconsistent with the rest of the form and won't display validation errors via `FormMessage`.

Since this is a composite field controlling two form values (`engine` and `modelSize`), the current approach is pragmatic. No immediate fix needed, but worth noting for future refactors.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@app/src/components/Generation/FloatingGenerateBox.tsx` around lines 405 - 439: the Select block directly uses form.watch('engine') and form.setValue(...) to manage engine and modelSize, which is inconsistent and bypasses the validation UI; wrap this composite selector inside a FormField (like the language selector) so the surrounding FormItem/FormControl/FormMessage provide validation display, while the Select's onValueChange still sets form.setValue('engine', ...) and form.setValue('modelSize', ...).

backend/main.py (2)
618-634: Fire-and-forget task references may cause issues.

The background download tasks created on lines 625 and 649 are not stored. While unlikely in practice, these tasks could theoretically be garbage collected before completion.

♻️ Proposed fix - store task references

```diff
+# Module-level set to keep background tasks alive
+_background_tasks: set = set()
+
 # In generate_speech:
-asyncio.create_task(download_model_background())
+task = asyncio.create_task(download_model_background())
+_background_tasks.add(task)
+task.add_done_callback(_background_tasks.discard)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/main.py` around lines 618 - 634, The fire-and-forget asyncio.create_task call for download_model_background risks GC before completion; capture and retain the Task (e.g., assign the result of asyncio.create_task(...) to a variable and add it to a long-lived collection or let task_manager track it) so the background task is referenced until finished, and ensure download_model_background handles exceptions and removes the task from the tracking collection on completion; update locations referencing download_model_background, tts_model.load_model_async, task_manager.start_download, and the create_task invocation to implement this task-tracking approach.
1360-1367: `check_luxtts_loaded` may instantiate the backend unnecessarily.

Calling `get_tts_backend_for_engine("luxtts")` to check if it's loaded will create the `LuxTTSBackend` instance if it doesn't exist yet. This is fine since `LuxTTSBackend.__init__()` doesn't load the model, but it's worth noting that this differs from the Qwen check, which uses the existing `tts.get_tts_model()` singleton.

Consider checking if the backend exists in the registry first:

♻️ Alternative approach

```diff
 def check_luxtts_loaded():
     try:
-        from .backends import get_tts_backend_for_engine
-        backend = get_tts_backend_for_engine("luxtts")
-        return backend.is_loaded()
+        from .backends import _tts_backends
+        backend = _tts_backends.get("luxtts")
+        return backend.is_loaded() if backend else False
     except Exception:
         return False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/main.py` around lines 1360 - 1367, The current check_luxtts_loaded calls get_tts_backend_for_engine("luxtts") which may instantiate a LuxTTSBackend; change it to first inspect the TTS backend registry for an existing "luxtts" entry and only call get_tts_backend_for_engine if an instance is already registered. Concretely: import the backends module used by get_tts_backend_for_engine (from .backends import get_tts_backend_for_engine, <registry_name>), check the registry container for the key "luxtts" (or the registry API that lists available/registered backends) and if present call backend = get_tts_backend_for_engine("luxtts") and return backend.is_loaded(), otherwise return False; update check_luxtts_loaded to use that registry check instead of unconditionally calling get_tts_backend_for_engine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 306-320: The LuxTTS block renders <ModelItem> without required
props from <ModelItemProps>, causing type errors and missing cancel/dismiss
behavior; update the <ModelItem> invocation to pass the missing props: provide
onCancel (call the same cancel handler used elsewhere, e.g., the function used
for downloads/cancels), pass isCancelling (compare a cancellingModel state to
model.model_name) and isDismissed (use the dismissal state/lookup used for other
models), while keeping existing props like model, onDownload (handleDownload),
onDelete (setModelToDelete + setDeleteDialogOpen), isDownloading
(downloadingModel === model.model_name) and formatSize so LuxTTS items support
cancel/dismiss handling and satisfy the type checker.
In `@backend/backends/__init__.py`:
- Around line 145-165: The backend lookup/creation for _tts_backends using the
check-then-act pattern is prone to race conditions; wrap the creation/store
sequence in a module-level threading.Lock (e.g., _tts_backends_lock) so only one
thread can create and assign a backend for a given engine; acquire the lock
before re-checking "if engine in _tts_backends", instantiate the correct class
(MLXTTSBackend via get_backend_type(), PyTorchTTSBackend, or LuxTTSBackend),
store it into _tts_backends[engine], then release the lock to ensure a single
instance is created and avoid duplicate model loads.
In `@backend/backends/luxtts_backend.py`:
- Around line 206-217: In the loop that builds combined_audio, the sample rate
return from load_audio is assigned to sr but never used; change the unpacking to
use a throwaway variable (e.g., audio, _ = load_audio(path, sample_rate=24000))
so the unused value is clearly ignored; update the line inside the function that
calls load_audio (referenced in the code block using load_audio,
normalize_audio, combined_audio, mixed, combined_text) to prefix the unused
variable with an underscore.
- Around line 259-262: The returned tensor from self.model.generate_speech
(variable wav) may reside on GPU/MPS; change the conversion to ensure the tensor
is moved to CPU and detached before calling .numpy() (e.g., call .cpu() and
.detach() on wav) so audio = ... then .squeeze() works regardless of device;
update the conversion around the lines that convert wav to numpy in the function
that calls self.model.generate_speech to use wav.detach().cpu().numpy() (or
equivalent) instead of wav.numpy().
In `@backend/requirements.txt`:
- Around line 21-22: The requirements file currently references linacodec and
Zipvoice from git HEAD which is unstable; pin each git dependency to a specific
commit hash in backend/requirements.txt (replace the current git URLs for
"linacodec" and "Zipvoice" with the same repo URLs annotated with the chosen
commit hashes, e.g. include @<commit-hash> after the repo URL) so installations
are reproducible; after pinning, verify backend/backends/luxtts_backend.py (look
for LuxTTS constructor usage and generate_speech() calls) still match the pinned
commit API and update those callsites if the pinned version has different
parameters.
In `@scripts/setup-dev-sidecar.js`:
- Around line 214-236: The COFF Machine field and Optional Header Magic in the
byte array are hardcoded for AMD64/PE32+ which will produce invalid binaries for
32-bit Windows targets; update scripts/setup-dev-sidecar.js to detect the target
triple (e.g., check for "i686-pc-windows-msvc") and conditionally set the
Machine bytes (use 0x14,0x01 for IMAGE_FILE_MACHINE_I386) and the Optional
Header Magic (use 0x0b,0x01 for PE32) instead of the current 0x64,0x86 and
0x0b,0x02 values, or alternately generate matching PE headers for the detected
target so the produced binary format matches the target architecture.
---
Nitpick comments:
In `@app/src/components/Generation/FloatingGenerateBox.tsx`:
- Around line 405-439: The Select block directly uses form.watch('engine') and
form.setValue(...) to manage engine and modelSize, which is inconsistent and
bypasses validation UI; wrap this composite selector inside a FormField (like
the language selector) by creating a FormField for a virtual/compound field that
renders the Select via its render prop, and inside that render use
field.onChange/field.value or continue calling form.setValue but expose
FormMessage for validation; reference the Select component and the form fields
'engine' and 'modelSize' and ensure the Select's onValueChange still sets
form.setValue('engine', ...) and form.setValue('modelSize', ...) while the
surrounding FormField provides FormItem/FormControl/FormMessage for consistency.
In `@backend/backends/luxtts_backend.py`:
- Around line 174-192: The cache_key is computed twice; compute it once before
the cache lookup and reuse it for both get_cached_voice_prompt and
cache_voice_prompt when use_cache is true: move the call to
get_cache_key(audio_path, reference_text) into a single cache_key variable
(prefixed with "luxtts_") before calling get_cached_voice_prompt, keep the same
cache_key to decide early return, then after awaiting
asyncio.to_thread(_encode_sync) call cache_voice_prompt(cache_key, encoded);
adjust the block that defines _encode_sync and references to use_cache,
cache_key, get_cached_voice_prompt, cache_voice_prompt, and
self.model.encode_prompt accordingly.
In `@backend/main.py`:
- Around line 618-634: The fire-and-forget asyncio.create_task call for
download_model_background risks GC before completion; capture and retain the
Task (e.g., assign the result of asyncio.create_task(...) to a variable and add
it to a long-lived collection or let task_manager track it) so the background
task is referenced until finished, and ensure download_model_background handles
exceptions and removes the task from the tracking collection on completion;
update locations referencing download_model_background,
tts_model.load_model_async, task_manager.start_download, and the create_task
invocation to implement this task-tracking approach.
- Around line 1360-1367: The current check_luxtts_loaded calls
get_tts_backend_for_engine("luxtts") which may instantiate a LuxTTSBackend;
change it to first inspect the TTS backend registry for an existing "luxtts"
entry and only call get_tts_backend_for_engine if an instance is already
registered. Concretely: import the backends module used by
get_tts_backend_for_engine (from .backends import get_tts_backend_for_engine,
<registry_name>), check the registry container for the key "luxtts" (or the
registry API that lists available/registered backends) and if present call
backend = get_tts_backend_for_engine("luxtts") and return backend.is_loaded(),
otherwise return False; update check_luxtts_loaded to use that registry check
instead of unconditionally calling get_tts_backend_for_engine.
In `@justfile`:
- Around line 62-63: The fixed 2-second sleep after starting the server (the
line running "{{ venv_bin }}/uvicorn backend.main:app --reload --port 17493 &"
followed by "sleep 2") is brittle; replace it with a readiness/health-check loop
that polls a known endpoint (e.g., /health or /docs) on localhost:17493 until it
returns a successful status or a timeout is reached, retrying with short sleeps
between attempts; ensure the loop runs after launching uvicorn in the background
and fails the script with a clear message if the timeout is exceeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 42548d0a-ba2d-4505-a3f0-e926aa815e54
📒 Files selected for processing (15)
- CONTRIBUTING.md
- README.md
- app/src/components/Generation/FloatingGenerateBox.tsx
- app/src/components/Generation/GenerationForm.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/types.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/backends/__init__.py
- backend/backends/luxtts_backend.py
- backend/main.py
- backend/models.py
- backend/profiles.py
- backend/requirements.txt
- justfile
- scripts/setup-dev-sidecar.js
```tsx
<ModelItem
  key={model.model_name}
  model={model}
  onDownload={() => handleDownload(model.model_name)}
  onDelete={() => {
    setModelToDelete({
      name: model.model_name,
      displayName: model.display_name,
      sizeMb: model.size_mb,
    });
    setDeleteDialogOpen(true);
  }}
  isDownloading={downloadingModel === model.model_name}
  formatSize={formatSize}
/>
```
Pass required ModelItem props in the LuxTTS block (type-check blocker).
At Line 306, ModelItem is rendered without onCancel, isCancelling, and isDismissed, but those props are required by ModelItemProps (Lines 478-481). This is a compile-time correctness issue and also drops cancel/error handling behavior for LuxTTS entries.
Suggested fix

```diff
 <ModelItem
   key={model.model_name}
   model={model}
   onDownload={() => handleDownload(model.model_name)}
   onDelete={() => {
     setModelToDelete({
       name: model.model_name,
       displayName: model.display_name,
       sizeMb: model.size_mb,
     });
     setDeleteDialogOpen(true);
   }}
+  onCancel={() => handleCancel(model.model_name)}
   isDownloading={downloadingModel === model.model_name}
+  isCancelling={
+    cancelMutation.isPending && cancelMutation.variables === model.model_name
+  }
+  isDismissed={dismissedErrors.has(model.model_name)}
+  erroredDownload={erroredDownloads.get(model.model_name)}
   formatSize={formatSize}
 />
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 306 -
320, The LuxTTS block renders <ModelItem> without required props from
<ModelItemProps>, causing type errors and missing cancel/dismiss behavior;
update the <ModelItem> invocation to pass the missing props: provide onCancel
(call the same cancel handler used elsewhere, e.g., the function used for
downloads/cancels), pass isCancelling (compare a cancellingModel state to
model.model_name) and isDismissed (use the dismissal state/lookup used for other
models), while keeping existing props like model, onDownload (handleDownload),
onDelete (setModelToDelete + setDeleteDialogOpen), isDownloading
(downloadingModel === model.model_name) and formatSize so LuxTTS items support
cancel/dismiss handling and satisfy the type checker.
```text
linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Get latest commit hashes for pinning
echo "LinaCodec latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/main | jq -r '.sha'
echo "LuxTTS latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/main | jq -r '.sha'
```

Repository: jamiepine/voicebox
Length of output: 118
🏁 Script executed:

```bash
#!/bin/bash
# Check if repositories exist and their default branches
echo "=== LinaCodec Repository Info ==="
curl -s -I https://github.com/ysharma3501/LinaCodec
echo ""
echo "=== LuxTTS Repository Info ==="
curl -s -I https://github.com/ysharma3501/LuxTTS
echo ""
# Try with master branch instead of main
echo "=== Trying master branch for LinaCodec ==="
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"
echo ""
echo "=== Trying master branch for LuxTTS ==="
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"
```

Repository: jamiepine/voicebox
Length of output: 9827
🌐 Web query:
ysharma3501 LinaCodec GitHub repository
💡 Result:
GitHub repository: ysharma3501/LinaCodec — “A highly compressive and high-quality neural audio codec for speech models.” It provides a Python package (linacodec) with example usage for encoding audio into tokens + a global embedding and decoding back to 48 kHz audio, plus helpers like voice conversion. [1]
Install (from README):
`pip install git+https://github.com/ysharma3501/LinaCodec.git` [1]
Associated model card (points back to the GitHub repo): YatharthS/LinaCodec on Hugging Face. [2]
Sources: [1] [2]
🌐 Web query:
ysharma3501 LuxTTS GitHub repository
💡 Result:
The GitHub repository is ysharma3501/LuxTTS: https://github.com/ysharma3501/LuxTTS (LuxTTS is a lightweight ZipVoice-based TTS model for high-quality voice cloning, advertised at 150× realtime). [1]
Related model page (points back to the same repo): https://huggingface.co/YatharthS/LuxTTS. [2]
Pin git dependencies to specific commits for reproducibility and stability.
The linacodec and Zipvoice packages are installed from git HEAD without version pins. If upstream changes the API (e.g., the LuxTTS constructor signature or generate_speech() parameters used in backend/backends/luxtts_backend.py), builds will silently fail or behave unexpectedly. This is especially critical for custom/non-standard repositories like these.
🔒 Proposed fix to pin commits

```diff
-linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
-Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git
+linacodec @ git+https://github.com/ysharma3501/LinaCodec.git@<commit-hash>
+Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git@<commit-hash>
```

Determine the appropriate commit hashes from each repository and replace <commit-hash> accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/requirements.txt` around lines 21 - 22, The requirements file
currently references linacodec and Zipvoice from git HEAD which is unstable; pin
each git dependency to a specific commit hash in backend/requirements.txt
(replace the current git URLs for "linacodec" and "Zipvoice" with the same repo
URLs annotated with the chosen commit hashes, e.g. include @<commit-hash> after
the repo URL) so installations are reproducible; after pinning, verify
backend/backends/luxtts_backend.py (look for LuxTTS constructor usage and
generate_speech() calls) still match the pinned commit API and update those
callsites if the pinned version has different parameters.
```js
0x64,
0x86, // Machine: AMD64
0x01,
0x00, // NumberOfSections: 1
0x00,
0x00,
0x00,
0x00, // TimeDateStamp
0x00,
0x00,
0x00,
0x00, // PointerToSymbolTable
0x00,
0x00,
0x00,
0x00, // NumberOfSymbols
0xf0,
0x00, // SizeOfOptionalHeader
0x22,
0x00, // Characteristics: EXECUTABLE_IMAGE | LARGE_ADDRESS_AWARE
// Optional Header (PE32+)
0x0b, 0x02, // Magic: PE32+
0x00, 0x00, // Linker version
0x00, 0x00, 0x00, 0x00, // SizeOfCode
0x00, 0x00, 0x00, 0x00, // SizeOfInitializedData
0x00, 0x00, 0x00, 0x00, // SizeOfUninitializedData
0x00, 0x10, 0x00, 0x00, // AddressOfEntryPoint
0x00, 0x00, 0x00, 0x00, // BaseOfCode
0x00, 0x00, 0x00, 0x40, 0x01, 0x00, 0x00, 0x00, // ImageBase
0x00, 0x10, 0x00, 0x00, // SectionAlignment
0x00, 0x02, 0x00, 0x00, // FileAlignment
0x06, 0x00, 0x00, 0x00, // OS version
0x00, 0x00, 0x00, 0x00, // Image version
0x06, 0x00, 0x00, 0x00, // Subsystem version
0x00, 0x00, 0x00, 0x00, // Win32VersionValue
0x00, 0x20, 0x00, 0x00, // SizeOfImage
0x00, 0x02, 0x00, 0x00, // SizeOfHeaders
0x00, 0x00, 0x00, 0x00, // CheckSum
0x03, 0x00, // Subsystem: CONSOLE
0x60, 0x01, // DllCharacteristics
```
Guard non-x64 Windows targets or generate matching PE headers.
At Line 214/Line 215 the COFF machine is hardcoded to AMD64, and Line 235/Line 236 uses PE32+ magic. If target detection returns i686-pc-windows-msvc, this produces a target-named binary with an incompatible executable format.
Proposed defensive fix

```diff
 if (isWindows) {
+  if (!targetTriple.startsWith('x86_64-')) {
+    throw new Error(
+      `Unsupported Windows target for placeholder PE: ${targetTriple}. ` +
+      'Only x86_64 Windows placeholder is currently implemented.',
+    );
+  }
+
   // Create a minimal valid Windows PE executable that exits with code 1
   // This is the smallest valid PE that Windows will accept
   const minimalPE = Buffer.from([
```

🤖 Prompt for AI Agents
const minimalPE = Buffer.from([🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/setup-dev-sidecar.js` around lines 214 - 236, The COFF Machine field
and Optional Header Magic in the byte array are hardcoded for AMD64/PE32+ which
will produce invalid binaries for 32-bit Windows targets; update
scripts/setup-dev-sidecar.js to detect the target triple (e.g., check for
"i686-pc-windows-msvc") and conditionally set the Machine bytes (use 0x14,0x01
for IMAGE_FILE_MACHINE_I386) and the Optional Header Magic (use 0x0b,0x01 for
PE32) instead of the current 0x64,0x86 and 0x0b,0x02 values, or alternately
generate matching PE headers for the detected target so the produced binary
format matches the target architecture.
- Add a threading lock to get_tts_backend_for_engine() to prevent a race condition where concurrent requests could create duplicate backend instances (double-checked locking pattern)
- Fix LuxTTS generate: call .detach().cpu() before .numpy() so it works on GPU/MPS devices, not just CPU
- Store background download tasks in a module-level set to prevent garbage collection before completion (asyncio.create_task fire-and-forget pattern)
- Deduplicate cache_key computation in LuxTTS create_voice_prompt
- Prefix unused sr variable with underscore
🧹 Nitpick comments (2)
backend/backends/__init__.py (1)
199-204: Consider acquiring the lock in `reset_backends()` for thread safety.

`reset_backends()` calls `_tts_backends.clear()` without holding `_tts_backends_lock`. While dict operations are atomic in CPython, a concurrent `get_tts_backend_for_engine()` call could see an inconsistent state or re-create a backend immediately after the clear. Since this function is primarily for testing, the risk is low but worth noting.

🔧 Proposed fix

```diff
 def reset_backends():
     """Reset backend instances (useful for testing)."""
-    global _tts_backend, _tts_backends, _stt_backend
-    _tts_backend = None
-    _tts_backends.clear()
-    _stt_backend = None
+    global _tts_backend, _tts_backends, _stt_backend
+    with _tts_backends_lock:
+        _tts_backend = None
+        _tts_backends.clear()
+        _stt_backend = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/__init__.py` around lines 199 - 204: reset_backends() mutates shared state without synchronizing with _tts_backends_lock; acquire _tts_backends_lock before clearing or reassigning the related globals, perform _tts_backends.clear() and reset _tts_backend/_stt_backend under the lock, and ensure the lock matches the one used in get_tts_backend_for_engine() and other backend-accessing functions.

backend/backends/luxtts_backend.py (1)
88-93: Model loading lacks protection against concurrent load_model calls.

If two coroutines call `load_model()` concurrently, both may pass the `self.model is not None` check before either completes loading. This could result in redundant model loading or resource contention.

Based on learnings, a similar race condition exists in `PyTorchTTSBackend` and is tracked as a future follow-up. This is a pre-existing pattern in the codebase.

🔒 Proposed fix using asyncio.Lock

```diff
+import asyncio
+
 class LuxTTSBackend:
     """LuxTTS backend for zero-shot voice cloning."""

     def __init__(self):
         self.model = None
         self.model_size = "default"
         self._device = None
+        self._load_lock = asyncio.Lock()

     # ...

     async def load_model(self, model_size: str = "default") -> None:
         """Load the LuxTTS model."""
-        if self.model is not None:
-            return
-
-        await asyncio.to_thread(self._load_model_sync)
+        async with self._load_lock:
+            if self.model is not None:
+                return
+            await asyncio.to_thread(self._load_model_sync)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/luxtts_backend.py` around lines 88 - 93, The load_model coroutine suffers a race where multiple callers can pass the "if self.model is not None" check concurrently; protect it with an asyncio.Lock: add an asyncio.Lock instance on the backend (e.g., self._load_lock created in __init__ or lazily), then wrap the check-and-load sequence inside "async with self._load_lock" in load_model, re-check self.model after acquiring the lock, and only then call await asyncio.to_thread(self._load_model_sync); reference the methods/attributes load_model, _load_model_sync, and self.model when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/backends/__init__.py`:
- Around line 199-204: reset_backends() mutates shared state without
synchronizing with _tts_backends_lock; acquire _tts_backends_lock before
clearing or reassigning related globals to avoid race conditions. Update
reset_backends() to acquire _tts_backends_lock, perform _tts_backends.clear()
and set _tts_backend/_stt_backend under the lock, then release it; ensure the
lock used matches the one in get_tts_backend_for_engine() and other
backend-accessing functions.
In `@backend/backends/luxtts_backend.py`:
- Around line 88-93: The load_model coroutine suffers a race where multiple
callers can pass the "if self.model is not None" check concurrently; protect it
with an asyncio.Lock: add an asyncio.Lock instance on the backend (e.g.,
self._load_lock created in __init__ or lazily), then wrap the check-and-load
sequence inside "async with self._load_lock" in load_model, re-check self.model
after acquiring the lock, and only then call await
asyncio.to_thread(self._load_model_sync); reference the methods/attributes
load_model, _load_model_sync, and self.model when applying the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6fddb006-2901-421a-b297-d941800d80b2
📒 Files selected for processing (3)
- backend/backends/__init__.py
- backend/backends/luxtts_backend.py
- backend/main.py
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)
Summary
- `engine` defaults to `"qwen"` so existing workflows are unchanged

Changes
Backend
- `backend/backends/luxtts_backend.py` — New `LuxTTSBackend` wrapping `zipvoice.luxvoice.LuxTTS` (encode_prompt, generate_speech, model caching, device auto-detection)
- `backend/backends/__init__.py` — Multi-engine registry with `get_tts_backend_for_engine(engine)` replacing the singleton pattern; `TTS_ENGINES` dict for supported engines
- `backend/models.py` — `engine` field on `GenerationRequest` (optional, default `"qwen"`, validated `^(qwen|luxtts)$`)
- `backend/main.py` — Engine dispatch in `/generate` and `/generate/stream` endpoints; LuxTTS added to model status, download trigger, and delete maps
- `backend/profiles.py` — `create_voice_prompt_for_profile` accepts an `engine` param, uses the engine-specific backend
- `backend/requirements.txt` — Added `Zipvoice` (git install from LuxTTS repo), `onnxruntime`, `piper-phonemize`, `lhotse`, `pydub`, `inflect`; pinned `transformers<=4.57.6`

Frontend
- `GenerationForm.tsx` — New TTS Engine selector dropdown; Model Size and Delivery Instructions fields hidden when LuxTTS is selected (not applicable)
- `useGenerationForm.ts` — `engine` added to the Zod schema with default `'qwen'`; engine-aware model name resolution and API payload construction
- `types.ts` — `engine` and `instruct` fields added to the `GenerationRequest` TypeScript type
- `ModelManagement.tsx` — New "LuxTTS Models" section (conditionally rendered when LuxTTS models exist in status)

Key design decisions
- Engine-prefixed cache keys (`luxtts_` prefix) prevent voice prompt collisions between Qwen and LuxTTS
- Model weights download from Hugging Face (`YatharthS/LuxTTS`) on first use
- LuxTTS ignores `instruct` — delivery instructions are a Qwen-only feature, hidden in the UI when LuxTTS is selected

Depends on

- `feat/cuda-backend-swap`

Summary by CodeRabbit
New Features
Documentation
Chores