fix(cartesia): surface TTS websocket server errors#1534
Cartesia error frames received over the synthesis WebSocket were logged and dropped, so the base SynthesizeStream never saw a thrown error and tts_error was never emitted. Throw a retryable APIConnectionError so _mainTaskImpl can retry up to connOptions.maxRetry times and then emit tts_error with recoverable: false once retries are exhausted. Also stop the recvTask catch from swallowing APIError, and stop the outer catch from double-wrapping it via toRetryableConnectionError.
Cartesia error frames received over the synthesis WebSocket were logged and dropped, so the base SynthesizeStream never saw a thrown error and tts_error was never emitted. Throw a retryable APIConnectionError so _mainTaskImpl can retry up to connOptions.maxRetry times and then emit tts_error with recoverable: false once retries are exhausted. Restructure the recv loop to mirror the Python plugin's branch order (livekit/agents#3028 + #3080): any frame with `done: true` — including the error returned for empty/whitespace input on function-call turns — is treated as completion once the sentence stream has been closed, instead of being raised and retried. Only a "pure" error frame (no done:true) raises APIConnectionError. Also stop the recvTask catch from swallowing APIError, and stop the outer catch from double-wrapping it via toRetryableConnectionError.
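A minimal sketch of the retry behaviour this commit relies on, under stated assumptions: `runWithRetries`, `RetryableError`, and `TtsErrorEvent` are hypothetical names, the `onError` callback stands in for the `tts_error` event, and the loop is synchronous for brevity. The real logic lives in `_mainTaskImpl` on the base `SynthesizeStream` and may differ in detail.

```typescript
// Hypothetical stand-in for a retryable error such as APIConnectionError.
class RetryableError extends Error {
  readonly retryable: boolean;
  constructor(message: string, retryable = true) {
    super(message);
    this.name = 'RetryableError';
    this.retryable = retryable;
  }
}

// Hypothetical event payload, modeled on the `recoverable` flag mentioned above.
interface TtsErrorEvent {
  error: Error;
  recoverable: boolean;
}

// Calls `attempt` until it succeeds, retrying up to `maxRetry` times on retryable errors.
// Every failure is reported through `onError`; the final one is marked unrecoverable
// and rethrown to the caller.
function runWithRetries(
  attempt: () => void,
  maxRetry: number,
  onError: (event: TtsErrorEvent) => void,
): void {
  for (let i = 0; i <= maxRetry; i++) {
    try {
      attempt();
      return;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      // Duck-typed check so that plain errors are never retried.
      const retryable = (error as Partial<RetryableError>).retryable === true;
      const canRetry = retryable && i < maxRetry;
      onError({ error, recoverable: canRetry });
      if (!canRetry) throw error;
    }
  }
}
```

The point of the sketch is that nothing here runs unless the receive loop actually throws: an error frame that is only logged never enters the `catch`, so no retry and no event can follow.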
🦋 Changeset detected. Latest commit: f6d32e2. The changes in this PR will be included in the next version bump. This PR includes changesets to release 33 packages.
… errors Per @charlotte-zhuang's review: Cartesia error frames also carry `done: true`, so the prior `serverMsg.done === true` branch swallowed real mid-stream errors instead of letting them throw to retry. Use isDoneMessage (type === 'done') for the completion branch so all errors reach the error branch. To preserve the empty-transcript fix that motivated mirroring Python's #3080 — Cartesia rejects empty / whitespace-only input with an error frame on function-call turns where the LLM emits no spoken text — track whether any non-empty token was sent, and in the error branch treat that specific case as benign completion instead of retrying. This is strictly better than the Python plugin, which catches error frames in its `data.get("done")` branch and silently swallows them when the tokenizer is open.
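To illustrate why branching on `done === true` swallowed errors, here is a small sketch. The frame shapes are assumptions that model only the fields mentioned in this thread, and the guard bodies are guesses at what `isDoneMessage` and `isErrorMessage` check; `naiveIsDone` is a hypothetical name for the earlier behaviour.

```typescript
// Assumed frame shapes; only the fields discussed in this thread are modeled.
interface DoneFrame {
  type: 'done';
  done: true;
}
interface ErrorFrame {
  type: 'error';
  done: boolean;
  error: string;
}
interface ChunkFrame {
  type: 'chunk';
  done: boolean;
  data: string;
}
type ServerFrame = DoneFrame | ErrorFrame | ChunkFrame;

// Completion is keyed on the frame type, not on the `done` flag.
function isDoneMessage(msg: ServerFrame): msg is DoneFrame {
  return msg.type === 'done';
}

function isErrorMessage(msg: ServerFrame): msg is ErrorFrame {
  return msg.type === 'error';
}

// The earlier check: true for error frames too, because they also carry `done: true`.
function naiveIsDone(msg: ServerFrame): boolean {
  return msg.done === true;
}

const sampleErrorFrame: ServerFrame = { type: 'error', done: true, error: 'upstream failure' };
```

With `sampleErrorFrame`, `naiveIsDone` returns `true` while `isDoneMessage` returns `false`, so only the type-based guard lets the frame reach the error branch.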
I'm a bit concerned with the new change where the client tries to decipher server errors and think that the server errors could serve as a good source of truth.
I also think the new "reducer-like" pattern of branching on server message type is a bit confusing now since the "done" logic would have to run in 2 places.
LMK what you think!
I'm also happy to stack a PR on top of yours with what I had in mind if that's easier.
if (isErrorMessage(serverMsg)) {
  this.#logger.error({ error: serverMsg.error }, 'Cartesia returned error');
  continue;
}
Rather than tracking non-fatal errors client-side, can you check the status_code from the error message?
I also think that it would be great to move error handling back to the top to avoid duplicating the "done" logic, but that's no big deal.
// Handle error messages
if (isErrorMessage(serverMsg)) {
  // Do not retry the connection on 4xx errors
  // since they can be safely ignored, e.g. empty transcripts
  if (400 <= serverMsg.status_code && serverMsg.status_code < 500) {
    this.#logger.debug({ error: serverMsg.error }, 'Cartesia sent a non-fatal error');
  } else {
    this.#logger.error({ error: serverMsg.error }, 'Cartesia returned error');
    throw new APIConnectionError({
      message: `Cartesia returned error: ${serverMsg.error}`,
      options: { retryable: true },
    });
  }
}
} else if (this.#opts.wordTimestamps !== false && hasWordTimestamps(serverMsg)) {
  const wordTimestamps = serverMsg.word_timestamps;
  for (let i = 0; i < wordTimestamps.words.length; i++) {
    const word = wordTimestamps.words[i];
    const startTime = wordTimestamps.start[i];
    const endTime = wordTimestamps.end[i];
    if (word !== undefined && startTime !== undefined && endTime !== undefined) {
      pendingTimedTranscripts.push(
        createTimedString({
          text: word + ' ', // Add space after word for consistency
          startTime,
          endTime,
        }),
      );
    }
  }
} else if (isErrorMessage(serverMsg)) {
In combination with my previous comment, it could be nice to revert this change so the code looks like this:
1. parse message
2. if fatal error, throw to retry. otherwise, log and go to step 3
3. if there are timestamps, emit them
4. if there is audio, emit it
5. if the message is `"type": "done"` or `"type": "error"` AND `"done": true`, send the last frame
…rors Replace the client-side sentNonEmptyToken heuristic with a check against the status_code on Cartesia's error frame. 4xx (e.g. empty-transcript on function-call turns) is logged and finishes cleanly; 5xx bubbles up so the base SynthesizeStream can retry.
…e frames Restructure the recv-loop dispatch so error handling runs first (log on 4xx, throw on 5xx) and a single branch handles closing for both `type:"done"` and `type:"error"` frames carrying `done:true`. Removes the duplicated close sequence that previously lived in both isDoneMessage and isErrorMessage.
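The dispatch order described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's code: `CartesiaFrame`, `Action`, and `dispatchFrame` are hypothetical, the frame fields are limited to those mentioned in this thread, and the function returns a list of actions instead of emitting audio so the ordering is easy to inspect.

```typescript
// Assumed frame shape; only fields mentioned in this thread are modeled.
interface CartesiaFrame {
  type: string;
  done?: boolean;
  data?: string; // assumed: encoded audio payload
  error?: string;
  status_code?: number;
  word_timestamps?: { words: string[]; start: number[]; end: number[] };
}

// Hypothetical action names, used so the branch order can be checked.
type Action = 'log-non-fatal' | 'emit-timestamps' | 'emit-audio' | 'close';

function dispatchFrame(raw: string): Action[] {
  const msg = JSON.parse(raw) as CartesiaFrame;
  const actions: Action[] = [];

  // Errors first: 4xx is logged and falls through, anything else throws
  // so that the caller can retry.
  if (msg.type === 'error') {
    const status = msg.status_code ?? 500; // assumption: a missing code is treated as fatal
    if (status >= 400 && status < 500) {
      actions.push('log-non-fatal');
    } else {
      throw new Error(`Cartesia returned error: ${msg.error}`);
    }
  }

  if (msg.word_timestamps !== undefined) actions.push('emit-timestamps');
  if (msg.data !== undefined) actions.push('emit-audio');

  // A single closing branch covers both completion shapes.
  if (msg.type === 'done' || (msg.type === 'error' && msg.done === true)) {
    actions.push('close');
  }
  return actions;
}
```

Because the error check no longer sits in an `else if` chain, a 4xx error frame with `done: true` is logged and then reaches the same closing branch as a regular `done` frame, which is what removes the duplicated close sequence.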
@charlotte-zhuang Thank you for your patience in reviewing this! I've made some changes to address your comments; let me know if they line up with your thinking.
charlotte-zhuang left a comment:
thanks for the fix!
FYI I couldn't actually test the changes since I'm not sure how to run
I tested it and it seems to work well!
  this.#logger.debug({ error: serverMsg.error }, 'Cartesia sent a non-fatal error');
} else {
  this.#logger.error({ error: serverMsg.error }, 'Cartesia returned error');
  throw new APIConnectionError({
Since we have a status code here, we should throw an APIStatusError instead, which carries that detail.
c908917 to 7c5147d
Per review feedback, the fatal (5xx) Cartesia error frame now raises an APIStatusError carrying the server status_code instead of a generic APIConnectionError, surfacing the code for diagnostics while keeping the same retryable behaviour. The non-fatal 4xx fall-through is unchanged.
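As an illustration of what carrying the status code buys, here is a sketch using a local stand-in class. `FrameStatusError`, `StatusErrorFrame`, and `toFatalError` are hypothetical and do not reflect the real `APIStatusError` constructor, whose signature is not shown in this thread.

```typescript
// Hypothetical stand-in for an error type that carries the server status code.
class FrameStatusError extends Error {
  readonly statusCode: number;
  readonly retryable: boolean;
  constructor(message: string, statusCode: number, retryable: boolean) {
    super(message);
    this.name = 'FrameStatusError';
    this.statusCode = statusCode;
    this.retryable = retryable;
  }
}

// Assumed shape of the error frame fields used here.
interface StatusErrorFrame {
  error: string;
  status_code: number;
}

// Returns undefined for non-fatal 4xx frames (the caller logs and continues),
// and a retryable error carrying the status code for everything else.
function toFatalError(frame: StatusErrorFrame): FrameStatusError | undefined {
  if (frame.status_code >= 400 && frame.status_code < 500) return undefined;
  return new FrameStatusError(
    `Cartesia returned error: ${frame.error}`,
    frame.status_code,
    true,
  );
}
```

A caller would `throw` the returned error when it is defined; anything that catches it can then log or branch on `statusCode` without parsing the message string.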
7c5147d to f6d32e2
Description
Encountered this issue when Cartesia had a brief outage: our agents were going silent and were not throwing any errors.
The Cartesia TTS plugin's WebSocket receive loop silently swallowed server-returned error frames; it logged them and continued. The base `SynthesizeStream` never got a thrown error, so `tts_error` was never emitted, retries never ran, and `ttsErrorCounts` / `maxUnrecoverableErrors` escalation never kicked in.
Ports the Python SDK fix (livekit/agents#3028 + #3080).
Changes Made
- Throw `APIConnectionError` on Cartesia error frames so the base `SynthesizeStream` retry path actually runs.
- Restructure `recvTask` branch order to match Python: any frame with `done: true` (including the error Cartesia returns for empty input on function-call turns) is treated as completion once the sentence stream has closed. Only pure error frames raise. Mirroring #3080 up front avoids shipping the same regression Python hit.
- Stop `recvTask`'s `catch` from swallowing `APIError`, and stop the outer `catch` from re-wrapping it via `toRetryableConnectionError`. Without these the new `throw` never reaches the base class.
- `@livekit/agents-plugin-cartesia` - dunno if that's the right designation for this
Pre-Review Checklist
Testing
`restaurant_agent.ts` and `realtime_agent.ts` work properly (for major changes)
Additional Notes
n/a