
replicate #87

Merged
richardr1126 merged 4 commits into main from replicate
Apr 16, 2026
Conversation

@richardr1126
Owner

@richardr1126 richardr1126 commented Apr 16, 2026

  • feat: add support for Replicate TTS provider and models
  • feat(replicate): add support for custom Replicate model selection

Summary by CodeRabbit

  • New Features

    • Replicate added as a supported TTS provider with built-in model/catalog support
    • Settings default behavior updated for Replicate (model selected automatically; API base URL hidden)
    • UI now conditionally shows native-model speed controls and indicates when native speed is unsupported
  • Bug Fixes

    • Improved rate-limit handling using upstream Retry-After info for 429 responses
  • Documentation

    • New Replicate setup guide and updates across TTS, deployment, and quick-start docs
  • Chores

    • Added Replicate runtime dependency
  • Tests

    • Added unit tests covering Kokoro/Replicate voice and speed behaviors

- Updated environment variables documentation to include Replicate as a TTS provider option.
- Added Replicate to the sidebar for TTS provider guides.
- Included Replicate as a dependency in package.json and pnpm-lock.yaml.
- Enhanced audiobook chapter generation to normalize native speed settings based on the TTS provider.
- Improved error handling in TTS API routes to provide retry information for rate-limited responses.
- Updated AudiobookExportModal to reflect native speed support for Replicate models.
- Modified SettingsModal to set default model for Replicate.
- Enhanced SpeedControl component to conditionally render native speed controls based on provider support.
- Updated TTSContext to utilize effective native speed for TTS requests.
- Implemented Replicate request handling in the TTS generation logic.
- Added new documentation for configuring Replicate as a TTS provider.
@vercel

vercel Bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| open-reader-web-ui | Ready | Preview, Comment | Apr 16, 2026 7:09pm |

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@richardr1126 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 44 minutes and 30 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 44 minutes and 30 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec98e58f-97eb-42cb-82ad-98ae4e2d3a74

📥 Commits

Reviewing files that changed from the base of the PR and between 893f74f and 3db3fd1.

📒 Files selected for processing (5)
  • src/app/api/audiobook/chapter/route.ts
  • src/app/api/tts/route.ts
  • src/lib/server/tts/generate.ts
  • src/lib/server/tts/upstream-response.ts
  • src/lib/shared/tts-provider-catalog.ts
📝 Walkthrough


Adds Replicate as a first-class TTS provider: docs, package dependency, provider catalog and voice-resolution, server-side Replicate request implementation with retry/cooldown, UI and settings changes for provider/model selection, and native-model-speed capability checks propagated through caching and playback logic.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Documentation - Guides & Site | README.md, docs-site/docs/introduction.md, docs-site/docs/configure/tts-providers.md, docs-site/docs/configure/tts-provider-guides/replicate.md, docs-site/docs/deploy/vercel-deployment.md, docs-site/docs/docker-quick-start.md, docs-site/docs/reference/environment-variables.md, docs-site/sidebars.ts | Added Replicate to highlights and provider lists, added a Replicate provider guide, updated env var examples/defaults to include Replicate, adjusted provider naming (DeepInfra), and added a sidebar entry. |
| Dependencies | package.json | Added replicate runtime dependency (^1.4.0). |
| Provider Catalog & Types | src/lib/shared/tts-provider-catalog.ts, src/lib/shared/kokoro.ts | Introduced replicate provider and voice sources/types/constants, built-in model/voice lists, voice-resolution via Replicate OpenAPI schema fetch + caching, supportsNativeModelSpeed() and kokoro voice-limit adjustments. (Exports added/updated.) |
| Server TTS Implementation | src/lib/server/tts/generate.ts, src/app/api/tts/route.ts | Implemented Replicate request shaping, provider routing, concurrency gate and cooldown, retry/backoff using Retry-After, abort-aware sleep, and improved upstream error/status/retry parsing and logging. |
| API Route & Settings Normalization | src/app/api/audiobook/chapter/route.ts | Normalized incoming/existing settings including native speed coercion; merged settings before validation to prevent native-speed mismatches. |
| TTS Context & Playback | src/contexts/TTSContext.tsx | Introduced effectiveNativeSpeed (1 when provider/model doesn't support native speed); switched cache keys, request payloads, and memo deps to use effective speed. |
| UI: Settings, Modals & Controls | src/components/SettingsModal.tsx, src/components/AudiobookExportModal.tsx, src/components/player/SpeedControl.tsx | Added Replicate default model handling in settings, hid API base input for Replicate, improved anchored dropdowns, surfaced native-speed support (show/hide sliders and display “Not supported”), and updated export/settings UI to use effective native speed. |
| Tests | tests/unit/kokoro.spec.ts, tests/unit/tts-provider-catalog.spec.ts | Added unit tests for kokoro voice limits and extensive tests for Replicate voice resolution, input-key discovery, caching behavior, and supportsNativeModelSpeed logic. |

Sequence Diagram

sequenceDiagram
    actor User
    participant UI as Settings / Player UI
    participant Client as Browser Client
    participant Server as OpenReader Server
    participant Replicate as Replicate API
    participant Cache as Audio/Schema Cache

    User->>UI: Select provider/model (Replicate)
    UI->>UI: supportsNativeModelSpeed() → set effectiveNativeSpeed
    User->>UI: Request TTS playback
    UI->>Client: POST /api/tts (includes effectiveNativeSpeed)
    Client->>Server: Forward TTS request
    Server->>Server: resolveTTSRequest & supportsNativeModelSpeed()
    Server->>Cache: Check audio cache (key uses effectiveNativeSpeed)
    alt Cache hit
        Cache-->>Server: Return cached audio
    else Cache miss
        Server->>Replicate: runReplicateRequest (through gate/cooldown)
        alt 200 OK
            Replicate-->>Server: Audio
            Server->>Cache: Store audio
        else 429 (Rate limited)
            Replicate-->>Server: 429 + Retry-After?
            Server->>Server: applyReplicateCooldown(), respond 429 with retryAfterSeconds
        else 5xx / error
            Replicate-->>Server: Error
            Server->>Server: Retry/backoff (uses Retry-After or default)
        end
    end
    Server-->>Client: Return audio or ProblemDetails
    Client-->>UI: Play audio / show retry info
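The diagram routes Replicate calls through a gate with a shared cooldown. A minimal sketch of such a gate, assuming bounded concurrency (N requests in flight) rather than strict one-at-a-time serialization; ReplicateGate, run, and applyCooldown are hypothetical names for illustration, not the PR's runWithReplicateGate / applyReplicateCooldown:

```typescript
// Hypothetical sketch: a bounded-concurrency gate that also honors a shared
// cooldown deadline (e.g. set after an upstream 429). Not the PR's actual code.

/** Resolves after `ms`, or rejects early if `signal` aborts. */
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error('aborted'));
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

class ReplicateGate {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private blockedUntilMs = 0;
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = Math.max(1, Math.floor(maxConcurrent));
  }

  /** Pushes the shared cooldown deadline forward; never shortens it. */
  applyCooldown(seconds: number): void {
    this.blockedUntilMs = Math.max(this.blockedUntilMs, Date.now() + seconds * 1000);
  }

  /** Runs `operation` once a slot is free and any cooldown has elapsed. */
  async run<T>(operation: () => Promise<T>, signal?: AbortSignal): Promise<T> {
    await this.acquire();
    try {
      const waitMs = Math.max(0, this.blockedUntilMs - Date.now());
      if (waitMs > 0) await sleep(waitMs, signal);
      return await operation();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active += 1;
      return Promise.resolve();
    }
    // Queue until a running operation hands over its slot.
    // Note: callers aborted while still queued are not removed in this sketch.
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot directly; `active` stays the same
    } else {
      this.active -= 1;
    }
  }
}
```

With maxConcurrent set to 1 this degenerates to the serialized behavior; larger values let independent TTS requests overlap while a single 429 still pauses every new start.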

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • clean code refactor #84: Shares modifications to the TTS provider/catalog and voice-resolution helpers that this PR extends with Replicate-specific implementations.

Poem

🐇 I hopped through docs and server queues,
A Replicate bell on my shoes,
I stitched the voices, tamed the speed,
Queued retries when limits lead,
Now audio blooms where rabbits read. 🎶

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.70%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'replicate' is too vague and generic, failing to convey meaningful information about what the pull request accomplishes. | Use a more descriptive title such as 'Add Replicate TTS provider support' or 'Add support for Replicate as a TTS provider' to clearly communicate the main change. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0c0ab7420

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +333 to +334
const audioUrl = typeof output === 'string' ? output : String(output);
const audioResponse = await fetch(audioUrl, { signal });

P1: Read Replicate FileOutput instead of coercing to URL string

replicate@1.x returns FileOutput objects (and sometimes arrays of them) by default, but this code assumes every non-string result can be converted to a fetchable URL via String(output). For file outputs this becomes values like [object ReadableStream] (or comma-joined object strings), so fetch(audioUrl) fails and Replicate TTS requests return errors instead of audio. This breaks the new provider path for models that emit file outputs unless the client is configured with useFileOutput: false or the stream/object is read directly.
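One way to avoid coercing with String(output) is to normalize the output shape before fetching. The sketch below assumes that a replicate@1.x FileOutput exposes a url() method (as the client's README describes) and that multi-file models return an array; resolveReplicateAudioUrl is a hypothetical helper, not code from this PR:

```typescript
// Hypothetical helper: turn a replicate.run() result into a fetchable audio URL.
// Assumption: FileOutput objects expose url(), returning a URL (or string).

type UrlLike = { url(): URL | string };

function isUrlLike(value: unknown): value is UrlLike {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { url?: unknown }).url === 'function'
  );
}

function resolveReplicateAudioUrl(output: unknown): string {
  // Models that emit several files return an array; take the first entry.
  const first = Array.isArray(output) ? output[0] : output;
  if (typeof first === 'string') return first;
  if (first instanceof URL) return first.toString();
  if (isUrlLike(first)) return first.url().toString();
  // Fail loudly instead of fetching "[object ReadableStream]".
  throw new Error(`Unexpected Replicate output shape: ${Object.prototype.toString.call(first)}`);
}
```

Throwing on an unrecognized shape turns a confusing fetch failure into an error that names the actual output type.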



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs-site/docs/deploy/vercel-deployment.md (1)

36-37: ⚠️ Potential issue | 🟡 Minor

Inconsistency between code example and recommendation.

The code block at lines 36-37 shows NEXT_PUBLIC_DEFAULT_TTS_PROVIDER=deepinfra and NEXT_PUBLIC_DEFAULT_TTS_MODEL=hexgrad/Kokoro-82M, but the info block at lines 52-53 recommends replicate and google/gemini-3.1-flash-tts. Consider updating the code block to match the recommendation for consistency.

📝 Proposed fix to align code example with recommendation
 # Optional client/runtime feature defaults:
 NEXT_PUBLIC_ENABLE_DOCX_CONVERSION=false
 NEXT_PUBLIC_ENABLE_DESTRUCTIVE_DELETE_ACTIONS=false
 NEXT_PUBLIC_ENABLE_TTS_PROVIDERS_TAB=false
-NEXT_PUBLIC_DEFAULT_TTS_PROVIDER=deepinfra
-NEXT_PUBLIC_DEFAULT_TTS_MODEL=hexgrad/Kokoro-82M
+NEXT_PUBLIC_DEFAULT_TTS_PROVIDER=replicate
+NEXT_PUBLIC_DEFAULT_TTS_MODEL=google/gemini-3.1-flash-tts
 NEXT_PUBLIC_SHOW_ALL_DEEPINFRA_MODELS=false

Also applies to: 52-53

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs-site/docs/deploy/vercel-deployment.md` around lines 36 - 37, The code
example uses NEXT_PUBLIC_DEFAULT_TTS_PROVIDER and NEXT_PUBLIC_DEFAULT_TTS_MODEL
with values deepinfra and hexgrad/Kokoro-82M which conflicts with the
recommendation; update both occurrences of the environment examples where
NEXT_PUBLIC_DEFAULT_TTS_PROVIDER and NEXT_PUBLIC_DEFAULT_TTS_MODEL are shown to
use the recommended values (set NEXT_PUBLIC_DEFAULT_TTS_PROVIDER=replicate and
NEXT_PUBLIC_DEFAULT_TTS_MODEL=google/gemini-3.1-flash-tts) so the examples and
recommendation are consistent.
src/lib/server/tts/generate.ts (1)

140-154: ⚠️ Potential issue | 🟠 Major

Custom Replicate models only pass { text } to the API, silently dropping all other request parameters.

When using a custom Replicate model (not one of the four hard-coded builders: google/gemini-3.1-flash-tts, minimax/speech-2.8-turbo, qwen/qwen3-tts, inworld/tts-1.5-mini), the buildReplicateInput() function falls through to line 317 and returns { text: request.text } only. This causes:

  • Voice is dropped from the API request
  • Format is dropped from the API request
  • Instructions are dropped upstream by resolveTTSRequest() since supportsTtsInstructions() returns false for custom models
  • Speed is reset to 1 since supportsNativeModelSpeed() returns false for custom models

This makes custom Replicate models only viable if the underlying model's schema is exactly { text }. Either constrain the UI to the mapped models or add an explicit mechanism to define per-model input mappings for custom Replicate models.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/server/tts/generate.ts` around lines 140 - 154, The
buildReplicateInput() fallback currently returns only { text: request.text } for
non-mapped models, causing voice/format/instructions/speed to be dropped; update
buildReplicateInput() to preserve and pass through the other parameters (voice,
format, speed, instructions) when they exist or provide a clear error for
unsupported custom models, and add a per-model input-mapping mechanism (e.g., a
map keyed by model name) so custom Replicate models can declare the expected
input shape; also ensure resolveTTSRequest() still filters instructions only
when supportsTtsInstructions(model) is false and that
supportsNativeModelSpeed(model) governs whether to include speed or normalize
it, referencing buildReplicateInput(), resolveTTSRequest(),
supportsTtsInstructions(), and supportsNativeModelSpeed() to locate the changes.
🧹 Nitpick comments (3)
src/lib/shared/tts-provider-catalog.ts (1)

136-144: Consider edge case for custom Replicate models.

For custom Replicate models (not in either set), supportsNativeModelSpeed returns false (line 160). This is a safe default, but users with custom models that do support native speed won't be able to use it. Consider documenting this behavior or adding a mechanism for custom model speed support if needed in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/shared/tts-provider-catalog.ts` around lines 136 - 144, The current
default in supportsNativeModelSpeed returns false for any Replicate model not in
REPLICATE_MODELS_WITH_NATIVE_SPEED or REPLICATE_MODELS_WITHOUT_NATIVE_SPEED,
which prevents custom models that do support native speed from being used;
update the code or docs to handle this by either documenting the behavior
clearly and advising users to add their custom model IDs to
REPLICATE_MODELS_WITH_NATIVE_SPEED, or implement a configurable mechanism (e.g.,
an exported addReplicateModelWithNativeSpeed(modelId) function, a
runtime-configurable Set, or reading a REPLICATE_NATIVE_SPEED_MODELS env/config
list) and reference the sets REPLICATE_MODELS_WITH_NATIVE_SPEED,
REPLICATE_MODELS_WITHOUT_NATIVE_SPEED and the supportsNativeModelSpeed function
so callers can opt their custom models into native-speed support.
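A sketch of the env-list option, assuming a comma-separated variable named REPLICATE_NATIVE_SPEED_MODELS (a hypothetical name). The built-in sets below hold placeholder entries, not the catalog's real contents, and the env value is passed in explicitly so the check stays pure:

```typescript
// Hypothetical sketch: opt custom Replicate models into native speed via config.
// Placeholder entries; the real lists live in tts-provider-catalog.ts.
const BUILT_IN_WITH_NATIVE_SPEED = new Set<string>(['example-owner/fast-tts']);
const BUILT_IN_WITHOUT_NATIVE_SPEED = new Set<string>(['example-owner/plain-tts']);

/** Parses "a/b, c/d" into a Set, ignoring blanks and surrounding whitespace. */
function parseModelList(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? '')
      .split(',')
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  );
}

function supportsNativeModelSpeedSketch(model: string, envValue: string | undefined): boolean {
  if (BUILT_IN_WITH_NATIVE_SPEED.has(model)) return true;
  // Built-in knowledge wins over configuration for known models.
  if (BUILT_IN_WITHOUT_NATIVE_SPEED.has(model)) return false;
  // Custom models stay at the safe default (false) unless explicitly listed.
  return parseModelList(envValue).has(model);
}
```

Because the catalog lives under src/lib/shared and is also read by client components, a value that must be visible in the browser would need Next.js's NEXT_PUBLIC_ prefix.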
src/components/SettingsModal.tsx (1)

565-565: Consider simplifying the base URL visibility condition.

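The condition localTTSProvider !== 'replicate' && (localTTSProvider === 'custom-openai' || !localBaseUrl || localBaseUrl === '') works but is slightly convoluted, and the localBaseUrl === '' check is redundant because !localBaseUrl is already true for an empty string. Consider extracting it to a named variable for clarity.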

♻️ Optional: Extract to named variable
+ const showBaseUrlInput = localTTSProvider !== 'replicate' && 
+   (localTTSProvider === 'custom-openai' || !localBaseUrl);

- {localTTSProvider !== 'replicate' && (localTTSProvider === 'custom-openai' || !localBaseUrl || localBaseUrl === '') && (
+ {showBaseUrlInput && (
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/components/SettingsModal.tsx` at line 565, The JSX conditional for
showing the base URL is complex; extract it into a named boolean (e.g.,
shouldShowBaseUrl) defined in SettingsModal (above the render/return) that
computes: localTTSProvider !== 'replicate' && (localTTSProvider ===
'custom-openai' || !localBaseUrl || localBaseUrl === ''), then replace the
inline expression in the JSX with that variable (referencing localTTSProvider
and localBaseUrl where computed).
src/app/api/tts/route.ts (1)

62-71: Handle HTTP-date Retry-After values in addition to delta-seconds.

Retry-After may be sent either as delay-seconds or as an HTTP-date (RFC 9110). The current parsing only supports the numeric form, so valid date-form headers are ignored.

🔧 Suggested parser improvement
 function getUpstreamRetryAfterSeconds(error: unknown): number | undefined {
   if (typeof error !== 'object' || error === null) return undefined;
   const rec = error as Record<string, unknown>;
   const response = rec.response as { headers?: { get?: (name: string) => string | null } } | undefined;
   const retryAfterHeader = response?.headers?.get?.('retry-after');
   if (!retryAfterHeader) return undefined;
-  const parsed = Number(retryAfterHeader);
-  if (!Number.isFinite(parsed) || parsed <= 0) return undefined;
-  return Math.ceil(parsed);
+  const deltaSeconds = Number(retryAfterHeader);
+  if (Number.isFinite(deltaSeconds) && deltaSeconds > 0) {
+    return Math.ceil(deltaSeconds);
+  }
+
+  const retryAtMs = Date.parse(retryAfterHeader);
+  if (!Number.isFinite(retryAtMs)) return undefined;
+  const remainingSeconds = Math.ceil((retryAtMs - Date.now()) / 1000);
+  return remainingSeconds > 0 ? remainingSeconds : undefined;
 }

Also applies to: 237-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/api/tts/route.ts` around lines 62 - 71, The current
getUpstreamRetryAfterSeconds function only treats the Retry-After header as
delta-seconds; update it to also handle HTTP-date values: keep the existing
numeric parsing (Number(retryAfterHeader)), and if that fails, call
Date.parse(retryAfterHeader) to get a timestamp, compute (parsedDateMs -
Date.now())/1000, treat only positive results, and return Math.ceil(seconds).
Apply the same change to the other equivalent parser in the file (the duplicate
implementation around the later block) so both numeric and HTTP-date formats are
supported.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs-site/docs/configure/tts-providers.md`:
- Around line 22-28: The documentation uses the inconsistent casing "Deepinfra"
in the TTS providers section; update every instance of "Deepinfra" to the
canonical "DeepInfra" (e.g., in the sentence "For `OpenAI`, `Deepinfra`, and
`Replicate`..." and in the model list entry) so the provider name matches the
rest of the docs and branding.

In `@src/app/api/audiobook/chapter/route.ts`:
- Around line 96-100: The normalization function normalizeNativeSpeedForSettings
is only used for mismatch checks, so when data.settings is absent the runtime
can still read nativeSpeed from raw existingSettings and bypass normalization;
update the code paths that build the runtime settings (the merge of
existingSettings and data.settings / incoming settings used for generation) to
always pass the merged AudiobookGenerationSettings through
normalizeNativeSpeedForSettings before using it for generation or comparison,
ensuring any downstream reads (generation, supportsNativeModelSpeed checks, and
mismatch logic) use the normalized settings; apply the same change wherever
merged settings are constructed (including the logic that currently reads
existingSettings directly) so nativeSpeed is enforced consistently.
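A minimal sketch of normalizing once, after merging, so the path without data.settings is covered as well. AudiobookSettingsSketch and the supports callback are hypothetical stand-ins for AudiobookGenerationSettings and supportsNativeModelSpeed(), whose exact signatures are not shown in this thread:

```typescript
// Hypothetical subset of AudiobookGenerationSettings used only for illustration.
interface AudiobookSettingsSketch {
  ttsProvider: string;
  ttsModel: string;
  nativeSpeed: number;
}

// Stand-in for supportsNativeModelSpeed(); the real signature is not shown here.
type SpeedSupportCheck = (provider: string, model: string) => boolean;

function normalizeNativeSpeedSketch(
  settings: AudiobookSettingsSketch,
  supports: SpeedSupportCheck,
): AudiobookSettingsSketch {
  // Models without native speed run at 1x, so store that value explicitly.
  return supports(settings.ttsProvider, settings.ttsModel)
    ? settings
    : { ...settings, nativeSpeed: 1 };
}

/** Merge first, then normalize once, even when no new settings arrive. */
function resolveRuntimeSettings(
  existing: AudiobookSettingsSketch | undefined,
  incoming: AudiobookSettingsSketch | undefined,
  supports: SpeedSupportCheck,
): AudiobookSettingsSketch | undefined {
  const merged = incoming ? { ...existing, ...incoming } : existing;
  return merged ? normalizeNativeSpeedSketch(merged, supports) : undefined;
}
```

Generation and the mismatch check would then both read the returned object, so neither can observe a raw stored nativeSpeed.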

In `@src/lib/server/tts/generate.ts`:
- Around line 119-136: runWithReplicateGate currently serializes all requests by
awaiting the global replicateQueue (variable replicateQueue) so every call waits
for the previous one; change it so the function no longer awaits prev and thus
does not create a global one-at-a-time queue. Keep the cooldown logic based on
replicateBlockedUntilMs and sleepWithSignal(signal) to delay start when
necessary, but remove the await prev and the replicateQueue chaining; instead
let operations run concurrently while each checks waitMs = Math.max(0,
replicateBlockedUntilMs - Date.now()) and sleeps if needed, and only update
replicateBlockedUntilMs when you receive a 429 backoff in the code that calls
operation. Alternatively, if limited concurrency is desired, replace the global
chain with a bounded semaphore (not replicateQueue) to permit N concurrent
operations; update runWithReplicateGate (and references to release) accordingly.
- Around line 334-355: When the audio fetch fails (the audioResponse.ok branch
after fetch(audioUrl, { signal })), don't throw a plain Error; instead throw or
propagate an error object that preserves the upstream status and headers so
getUpstreamStatus(error) and getUpstreamRetryAfterSeconds(error) can read them.
Modify the audio fetch failure path that currently does throw new Error(`Failed
to fetch Replicate audio: ${audioResponse.status}`) to attach the response
status and relevant headers (or the response object) to the thrown error (or
create a custom Error subclass) so the subsequent logic around
getUpstreamStatus, getUpstreamRetryAfterSeconds, applyReplicateCooldown and
retry logic works correctly.
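A sketch of an error that keeps the upstream response attached, assuming getUpstreamStatus() reads a numeric status field and getUpstreamRetryAfterSeconds() reads error.response.headers.get('retry-after'), as in the route.ts snippet quoted earlier in this thread. UpstreamHttpError and fetchReplicateAudio are hypothetical names:

```typescript
// Structural type that a fetch Response already satisfies.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

// Hypothetical error type that preserves upstream status and headers.
class UpstreamHttpError extends Error {
  readonly status: number;
  readonly response: ResponseLike;

  constructor(message: string, response: ResponseLike) {
    super(message);
    this.name = 'UpstreamHttpError';
    this.status = response.status;
    this.response = response;
  }
}

async function fetchReplicateAudio(audioUrl: string, signal?: AbortSignal): Promise<ArrayBuffer> {
  const audioResponse = await fetch(audioUrl, { signal });
  if (!audioResponse.ok) {
    // Attaching the Response keeps status and Retry-After readable by the retry logic.
    throw new UpstreamHttpError(`Failed to fetch Replicate audio: ${audioResponse.status}`, audioResponse);
  }
  return audioResponse.arrayBuffer();
}
```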


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 540f391f-2502-4d20-8d39-4f3e4aa2e515

📥 Commits

Reviewing files that changed from the base of the PR and between 6606b0e and d0c0ab7.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (17)
  • README.md
  • docs-site/docs/configure/tts-provider-guides/replicate.md
  • docs-site/docs/configure/tts-providers.md
  • docs-site/docs/deploy/vercel-deployment.md
  • docs-site/docs/docker-quick-start.md
  • docs-site/docs/introduction.md
  • docs-site/docs/reference/environment-variables.md
  • docs-site/sidebars.ts
  • package.json
  • src/app/api/audiobook/chapter/route.ts
  • src/app/api/tts/route.ts
  • src/components/AudiobookExportModal.tsx
  • src/components/SettingsModal.tsx
  • src/components/player/SpeedControl.tsx
  • src/contexts/TTSContext.tsx
  • src/lib/server/tts/generate.ts
  • src/lib/shared/tts-provider-catalog.ts

Align default Replicate model to the versioned Kokoro model across UI, docs,
and server logic. Refactor TTS settings merging for audiobooks to ensure
consistent normalization. Improve Retry-After header handling for upstream
rate limits. Expand Replicate voice resolution to use model schemas when
available, with test coverage for custom and built-in models.

Update documentation to reflect new Replicate defaults, model selection,
and configuration guidance.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/lib/shared/tts-provider-catalog.ts (1)

127-128: Unbounded in-memory caches may cause memory growth over time.

replicateVoiceInputKeyCache and replicateOpenApiSchemaPromiseCache grow indefinitely as new models are queried. In a long-running server with many custom Replicate models, this could accumulate memory.

Consider using an LRU cache with a size limit, similar to ttsAudioCache in generate.ts.

♻️ Example using LRUCache
+import { LRUCache } from 'lru-cache';
+
-const replicateVoiceInputKeyCache = new Map<string, ReplicateVoiceInputKey>();
-const replicateOpenApiSchemaPromiseCache = new Map<string, Promise<unknown | null>>();
+const replicateVoiceInputKeyCache = new LRUCache<string, ReplicateVoiceInputKey>({ max: 100 });
+const replicateOpenApiSchemaPromiseCache = new LRUCache<string, Promise<unknown | null>>({ max: 100 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/shared/tts-provider-catalog.ts` around lines 127 - 128, The two
unbounded maps replicateVoiceInputKeyCache and
replicateOpenApiSchemaPromiseCache can grow without limit; replace them with a
bounded LRU cache (or use the existing LRU approach used by ttsAudioCache in
generate.ts) so entries are evicted when capacity is reached; specifically swap
the Map instantiations for an LRUCache (or a small custom LRUMap) with a
sensible max size, preserve the same key/value types (string ->
ReplicateVoiceInputKey and string -> Promise<unknown|null>), and ensure all code
that reads/writes these caches (references to replicateVoiceInputKeyCache and
replicateOpenApiSchemaPromiseCache) continues to work with the LRU API.
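If adding lru-cache is undesirable, a small Map-based LRU (the "small custom LRUMap" option) covers the same need, relying on Map's insertion-order iteration. LruMap is hypothetical and exposes only get/set/has/delete/size, so call sites using other Map methods would need adjusting:

```typescript
// Minimal dependency-free LRU built on Map insertion order.
class LruMap<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly max: number;

  constructor(max: number) {
    if (max < 1) throw new RangeError('max must be at least 1');
    this.max = max;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  set(key: K, value: V): this {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.max) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    return this;
  }

  delete(key: K): boolean {
    return this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

For the schema-promise cache, deleting an entry when its promise settles to a failure would keep one transient fetch error from being cached until eviction.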
src/app/api/audiobook/chapter/route.ts (1)

312-316: Type assertion on partial object merge.

The cast to AudiobookGenerationSettings assumes the merged object has all required fields. If existingSettings from storage is malformed or incomplete, runtime errors could occur when accessing properties like ttsProvider or ttsModel.

Consider adding validation or using a type guard before the cast.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/api/audiobook/chapter/route.ts` around lines 312 - 316, The
mergedSettings creation dangerously casts a possibly-partial object to
AudiobookGenerationSettings; instead validate and ensure required fields (e.g.,
ttsProvider, ttsModel, any other non-optional properties) exist on the merged
object before calling normalizeNativeSpeedForSettings. Update the logic around
mergedSettings/normalizedExistingSettings/incomingSettings to either 1) run a
type guard that checks required keys and shapes and only then cast and call
normalizeNativeSpeedForSettings, 2) or fill missing required fields with safe
defaults and log/throw a controlled error if validation fails; ensure
normalizeNativeSpeedForSettings is only invoked with a verified
AudiobookGenerationSettings instance.
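A sketch of option 1 (a type guard before normalization), assuming ttsProvider, ttsModel, and nativeSpeed are the required fields; GenerationSettingsSketch, isGenerationSettings, and mergeStoredSettings are hypothetical names, and the real AudiobookGenerationSettings likely has more fields to check:

```typescript
// Hypothetical subset of AudiobookGenerationSettings.
interface GenerationSettingsSketch {
  ttsProvider: string;
  ttsModel: string;
  nativeSpeed: number;
}

function isGenerationSettings(value: unknown): value is GenerationSettingsSketch {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.ttsProvider === 'string' && v.ttsProvider !== '' &&
    typeof v.ttsModel === 'string' && v.ttsModel !== '' &&
    typeof v.nativeSpeed === 'number' && Number.isFinite(v.nativeSpeed)
  );
}

function mergeStoredSettings(
  existing: unknown,
  incoming: Partial<GenerationSettingsSketch> | undefined,
): GenerationSettingsSketch {
  const existingObject = typeof existing === 'object' && existing !== null ? existing : {};
  const merged: unknown = { ...existingObject, ...incoming };
  if (!isGenerationSettings(merged)) {
    // Controlled failure instead of a TypeError later in generation.
    throw new Error('Stored audiobook settings are incomplete or malformed');
  }
  return merged;
}
```

The validated result can then be passed to normalizeNativeSpeedForSettings without a cast.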
src/app/api/tts/route.ts (1)

62-77: Duplicate getUpstreamRetryAfterSeconds with subtle inconsistency.

This implementation rounds numeric Retry-After values up with Math.ceil (line 70), while the otherwise equivalent function in generate.ts returns them unmodified. Consider extracting both into a shared utility for consistency.

♻️ Extract shared utility

Move getUpstreamStatus and getUpstreamRetryAfterSeconds to a shared module (e.g., src/lib/server/tts/errors.ts) and import them in both files:

// src/lib/server/tts/errors.ts
export function getUpstreamStatus(error: unknown): number | undefined { ... }
export function getUpstreamRetryAfterSeconds(error: unknown): number | undefined { ... }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/api/tts/route.ts` around lines 62 - 77, Duplicate logic: extract
getUpstreamStatus and getUpstreamRetryAfterSeconds into a shared utility and
make their behavior consistent with the implementation in generate.ts (remove
the Math.ceil on numeric Retry-After values so numeric values are returned
unchanged). Create a new module exporting getUpstreamStatus and
getUpstreamRetryAfterSeconds, update both route.ts and generate.ts to import and
use those exported functions, and ensure getUpstreamRetryAfterSeconds still
parses header strings and date values, only returning undefined for non-positive
or unparsable values but not altering valid numeric values.
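The shared module could be fleshed out roughly like this. It is a sketch, not the merged implementation. The error shapes it probes (`status`, `headers`, `response.status`, `response.headers`) are assumptions about what the upstream SDK errors expose. Headers may be a `Headers`-like object with `get()` or a plain record. Numeric Retry-After values are returned as-is, with no `Math.ceil`, as the comment asks. HTTP-date values are converted to seconds from now.

```typescript
// Hypothetical contents of a shared module such as src/lib/server/tts/errors.ts.

function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === 'object' && value !== null ? (value as Record<string, unknown>) : undefined;
}

// Assumed error shapes: `error.status` or `error.response.status`.
export function getUpstreamStatus(error: unknown): number | undefined {
  const err = asRecord(error);
  if (!err) return undefined;
  for (const candidate of [err.status, asRecord(err.response)?.status]) {
    if (typeof candidate === 'number' && Number.isInteger(candidate) && candidate >= 100 && candidate <= 599) {
      return candidate;
    }
  }
  return undefined;
}

// Reads one header from a Headers-like object (has get()) or a plain string record.
function readHeader(headers: unknown, name: string): string | undefined {
  const h = asRecord(headers);
  if (!h) return undefined;
  const getter = h.get;
  if (typeof getter === 'function') {
    const value: unknown = getter.call(h, name);
    return typeof value === 'string' ? value : undefined;
  }
  const wanted = name.toLowerCase(); // plain records: case-insensitive key match
  for (const [key, value] of Object.entries(h)) {
    if (key.toLowerCase() === wanted && typeof value === 'string') return value;
  }
  return undefined;
}

// Returns the Retry-After delay in seconds, or undefined if absent, non-positive, or unparsable.
export function getUpstreamRetryAfterSeconds(error: unknown, now: number = Date.now()): number | undefined {
  const err = asRecord(error);
  if (!err) return undefined;
  const raw =
    readHeader(err.headers, 'retry-after') ?? readHeader(asRecord(err.response)?.headers, 'retry-after');
  const trimmed = raw?.trim();
  if (!trimmed) return undefined;
  const numeric = Number(trimmed);
  if (Number.isFinite(numeric)) {
    return numeric > 0 ? numeric : undefined; // numeric form returned unchanged (no rounding)
  }
  const dateMs = Date.parse(trimmed); // HTTP-date form
  if (Number.isNaN(dateMs)) return undefined;
  const seconds = (dateMs - now) / 1000;
  return seconds > 0 ? seconds : undefined;
}
```

The optional `now` parameter exists only to make the HTTP-date branch deterministic in tests. Both route.ts and generate.ts would then import these two functions instead of keeping local copies.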
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/app/api/audiobook/chapter/route.ts`:
- Around line 312-316: The mergedSettings creation dangerously casts a
possibly-partial object to AudiobookGenerationSettings; instead validate and
ensure required fields (e.g., ttsProvider, ttsModel, any other non-optional
properties) exist on the merged object before calling
normalizeNativeSpeedForSettings. Update the logic around
mergedSettings/normalizedExistingSettings/incomingSettings to either 1) run a
type guard that checks required keys and shapes and only then cast and call
normalizeNativeSpeedForSettings, 2) or fill missing required fields with safe
defaults and log/throw a controlled error if validation fails; ensure
normalizeNativeSpeedForSettings is only invoked with a verified
AudiobookGenerationSettings instance.

In `@src/app/api/tts/route.ts`:
- Around line 62-77: Duplicate logic: extract getUpstreamStatus and
getUpstreamRetryAfterSeconds into a shared utility and make their behavior
consistent with the implementation in generate.ts (remove the Math.ceil on
numeric Retry-After values so numeric values are returned unchanged). Create a
new module exporting getUpstreamStatus and getUpstreamRetryAfterSeconds, update
both route.ts and generate.ts to import and use those exported functions, and
ensure getUpstreamRetryAfterSeconds still parses header strings and date values,
only returning undefined for non-positive or unparsable values but not altering
valid numeric values.

In `@src/lib/shared/tts-provider-catalog.ts`:
- Around line 127-128: The two unbounded maps replicateVoiceInputKeyCache and
replicateOpenApiSchemaPromiseCache can grow without limit; replace them with a
bounded LRU cache (or use the existing LRU approach used by ttsAudioCache in
generate.ts) so entries are evicted when capacity is reached; specifically swap
the Map instantiations for an LRUCache (or a small custom LRUMap) with a
sensible max size, preserve the same key/value types (string ->
ReplicateVoiceInputKey and string -> Promise<unknown|null>), and ensure all code
that reads/writes these caches (references to replicateVoiceInputKeyCache and
replicateOpenApiSchemaPromiseCache) continues to work with the LRU API.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 056db415-b6d9-4c14-88e7-415bb0dd68a7

📥 Commits

Reviewing files that changed from the base of the PR and between d0c0ab7 and 893f74f.

📒 Files selected for processing (13)
  • README.md
  • docs-site/docs/configure/tts-provider-guides/replicate.md
  • docs-site/docs/configure/tts-providers.md
  • docs-site/docs/deploy/vercel-deployment.md
  • docs-site/docs/introduction.md
  • src/app/api/audiobook/chapter/route.ts
  • src/app/api/tts/route.ts
  • src/components/SettingsModal.tsx
  • src/lib/server/tts/generate.ts
  • src/lib/shared/kokoro.ts
  • src/lib/shared/tts-provider-catalog.ts
  • tests/unit/kokoro.spec.ts
  • tests/unit/tts-provider-catalog.spec.ts
✅ Files skipped from review due to trivial changes (3)
  • README.md
  • docs-site/docs/introduction.md
  • docs-site/docs/configure/tts-provider-guides/replicate.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs-site/docs/deploy/vercel-deployment.md
  • docs-site/docs/configure/tts-providers.md
  • src/components/SettingsModal.tsx

…ngs validation

Move getUpstreamStatus and getUpstreamRetryAfterSeconds to a shared utility
module for consistent upstream error handling across TTS endpoints. Strengthen
audiobook chapter API by introducing runtime validation for incoming settings
payloads, ensuring type safety and error reporting for malformed requests.
Replace in-memory Map caches with LRUMap for Replicate voice and schema
lookups, improving memory management and eviction logic.
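Once the shared helper returns the Retry-After value unrounded, rounding still has to happen somewhere. The delay-seconds form of the `Retry-After` header is an integer, so one option is to round up only when writing the client response. The helper below is a hypothetical sketch of that boundary. Its name, its JSON body shape, and its policy of reporting non-429 upstream failures as 502 are assumptions, not what the routes actually do.

```typescript
// Hypothetical route-side helper: converts upstream failure info into a client Response.
export function buildUpstreamErrorResponse(
  upstreamStatus: number | undefined,
  retryAfterSeconds: number | undefined,
  message: string,
): Response {
  const headers = new Headers({ 'Content-Type': 'application/json' });
  if (upstreamStatus === 429) {
    if (retryAfterSeconds !== undefined && retryAfterSeconds > 0) {
      // Delay-seconds must be an integer, so round up at the HTTP boundary,
      // leaving the shared helper's return value untouched.
      headers.set('Retry-After', String(Math.ceil(retryAfterSeconds)));
    }
    return new Response(JSON.stringify({ error: message, retryAfterSeconds }), { status: 429, headers });
  }
  // Assumed policy: any other upstream failure is surfaced as 502 Bad Gateway.
  return new Response(JSON.stringify({ error: message }), { status: 502, headers });
}
```

Keeping the rounding at this boundary lets generate.ts use the precise value internally, for example for its own backoff. Clients still receive a header value in integer seconds.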
richardr1126 merged commit 6709710 into main Apr 16, 2026
4 of 5 checks passed
richardr1126 deleted the replicate branch April 16, 2026 19:13