Added
-
Added `TogetherSTTService` and `TogetherTTSService` for real-time speech-to-text and text-to-speech using Together AI's WebSocket APIs.
(PR #4054) -
Added per-sentence synthesis mode and zero-shot audio prompt support to `NvidiaTTSService`, letting NVIDIA TTS users choose between stitched and per-request synthesis flows and configure voice-cloning prompts for supported models.
(PR #4742) -
Added an `on_heartbeat_timeout` event handler to `PipelineWorker`, fired when a heartbeat frame is not received within the monitor timeout period.
(PR #4761) -
Added Time To First Audio (TTFA) metrics to TTS services, reported as `TTFAMetricsData` alongside the existing TTFB metric. TTFA measures the time to the first audible sample (TTFB plus the leading silence many providers pad onto the start of a response), so comparing the two shows how much perceived latency is padding versus service response time. Audible onset is detected from short-time RMS energy (`detect_speech_onset` in `pipecat.audio.utils`), which rejects noise-floor blips and brief transients; the `MetricsLogObserver` surfaces the new metric.
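The onset-detection idea can be illustrated with a minimal, standard-library sketch. This is not the `detect_speech_onset` implementation: the function names, the 10 ms frame size, the RMS threshold, and the "three consecutive loud frames" rule are all assumptions chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square energy of a list of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def find_onset_ms(samples, sample_rate, frame_ms=10, threshold=500.0, min_frames=3):
    """Return the offset (ms) of the first run of `min_frames` loud frames, or None.

    Requiring several consecutive loud frames is what rejects short transients.
    All parameter defaults are illustrative assumptions.
    """
    frame_len = sample_rate * frame_ms // 1000
    run = 0
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        if rms(samples[i : i + frame_len]) >= threshold:
            run += 1
            if run == min_frames:
                first = i - (min_frames - 1) * frame_len
                return first * 1000 / sample_rate
        else:
            run = 0
    return None


# 100 ms silence, a 10 ms blip, 90 ms silence, then 100 ms of sustained audio.
samples = [0] * 1600 + [5000] * 160 + [0] * 1440 + [5000] * 1600
leading_silence_ms = find_onset_ms(samples, 16000)

ttfb_ms = 120.0  # hypothetical time to first byte
ttfa_ms = ttfb_ms + leading_silence_ms
```

Here the single-frame blip is ignored and the onset is placed at the start of the sustained audio, so the perceived latency is the hypothetical TTFB plus the padded silence.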
(PR #4782) -
`GeminiTTSService` can now use the Gemini Developer API (`google-genai`) backend in addition to the existing Google Cloud backend. Pass `api_key` (or set `GOOGLE_API_KEY`) to authenticate with an API key instead of Google Cloud service-account credentials.
- The backend is selected automatically: passing `api_key` opts into the GenAI backend, while `credentials`/`credentials_path` continue to use the Google Cloud backend. Use `use_genai=True`/`False` to force a backend explicitly. A `GOOGLE_API_KEY` present in the environment alone does not switch backends; it is only used once the GenAI backend is active.
- New `http_options` parameter forwards `google.genai.types.HttpOptions` to the GenAI client.
- The GenAI backend does not support `prompt`/style instructions or `multi_speaker` output; setting them logs a warning and they are ignored. Use the Google Cloud backend for those features.
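The selection rule described above can be sketched as a small pure function. `select_backend` is a hypothetical helper, not part of Pipecat, and letting `api_key` win when credentials are also passed is an assumption this entry does not confirm.

```python
from typing import Optional


def select_backend(
    api_key: Optional[str] = None,
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
    use_genai: Optional[bool] = None,
) -> str:
    """Hypothetical sketch of the backend choice; returns "genai" or "google_cloud"."""
    # An explicit use_genai always wins.
    if use_genai is not None:
        return "genai" if use_genai else "google_cloud"
    # Passing api_key opts into the GenAI backend (assumed to win over credentials).
    if api_key:
        return "genai"
    # Environment variables are deliberately not consulted here: a GOOGLE_API_KEY
    # in the environment alone does not switch backends.
    return "google_cloud"
```

Note that the function never reads `GOOGLE_API_KEY`, mirroring the statement that the environment variable is only used once the GenAI backend is already active.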
(PR #4787)
-
Added a `mode` streaming parameter to `AssemblyAISTTService`, exposing AssemblyAI's U3 Pro latency/accuracy preset (`min_latency`, `balanced`, or `max_accuracy`). It trades transcription accuracy against turn-finalization latency and is only applicable to U3 Pro models, where the server defaults to `balanced`.
(PR #4810) -
`TaskManager` can now be constructed with an event loop and an optional `contextvars.Context` (`TaskManager(loop=..., context=...)`), and creates all of its tasks within that context. You can pass a single task manager to `WorkerRunner(task_manager=...)` (and to individual workers) to share one loop and context across the runner and every worker, so context variables set in one task are visible to the others.
(PR #4815) -
Added tunable parameters to the xAI TTS services: `speed`, `optimize_streaming_latency`, and `text_normalization` (plus `with_timestamps` on the WebSocket service). Set them via the service's `Settings`, e.g. `XAITTSService.Settings(speed=1.1)`.
(PR #4821) -
Added word-level timestamps to `XAITTSService`. When `with_timestamps` is enabled (now the default), xAI's per-character timing is converted into per-word `TTSTextFrame` objects, each carrying an accurate `pts`. Note that xAI delivers timestamps in coarse batches, so word frames are emitted in bursts; consumers should schedule off `pts` rather than arrival time.
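As a rough illustration of turning per-character timing into per-word timing, the sketch below groups characters on whitespace and stamps each word with the start time of its first character. The input shape (a character sequence plus a parallel list of start times in seconds) is an assumption, not xAI's actual payload, and `words_with_pts` is a hypothetical helper.

```python
def words_with_pts(chars, starts):
    """Group characters into words; each word takes its first character's start time.

    `chars` and `starts` are parallel sequences (assumed input shape).
    """
    words = []
    current, current_start = "", None
    for ch, t in zip(chars, starts):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current:
                words.append((current, current_start))
            current, current_start = "", None
        else:
            if not current:
                current_start = t
            current += ch
    if current:
        words.append((current, current_start))
    return words


timed_words = words_with_pts("hi there", [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35])
```

Because each word carries its own time, a consumer can schedule it from that value even when several words arrive in one burst.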
(PR #4821) -
Added a `base_url` parameter to `TwilioFrameSerializer` to configure the REST API host used for auto hang-up. By default the host is still derived from `region`/`edge` (unchanged behavior), but setting `base_url` lets you target a Twilio-API-compatible backend or a self-hosted server instead of `api.twilio.com`.
(PR #4845) -
Added a first-class RTVI `dtmf` client message. Sending `{type: "dtmf", data: {button: "1"}}` makes the `RTVIProcessor` push an `InputDTMFFrame` downstream, the same path a telephony transport's keypress takes, so any bot with DTMF handling (e.g. a `DTMFAggregator`) reacts to it. Each message carries one keypress.
(PR #4849) -
Added DTMF keypress support to the behavioral evals. A scenario turn can now press keys with a `dtmf:` field (e.g. `dtmf: "123#"`) instead of `user:`, sent as one RTVI `dtmf` message per key. A bot running a `DTMFAggregator` reacts to them as a transcription, so a `dtmf` turn can assert on `user_transcription` and `response` like a spoken turn.
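The "one message per key" expansion can be sketched as follows. The payload contains only the `type` and `data.button` fields quoted in these notes; any further envelope fields the RTVI protocol may require are not shown. The helper name and the accepted key set are assumptions.

```python
import json

# Assumed key set for illustration; the notes do not list the accepted keys.
VALID_KEYS = set("0123456789*#")


def dtmf_messages(keys: str):
    """Expand a key sequence such as "123#" into one JSON message per keypress."""
    messages = []
    for key in keys:
        if key not in VALID_KEYS:
            raise ValueError(f"unsupported DTMF key: {key!r}")
        messages.append(json.dumps({"type": "dtmf", "data": {"button": key}}))
    return messages


messages = dtmf_messages("123#")
```

A scenario turn with `dtmf: "123#"` would therefore correspond to four separate messages under this sketch.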
(PR #4849) -
Added built-in text transform functions for TTS voice formatting under `pipecat.utils.text.transforms`: `strip_markdown`, `normalize_acronyms`, `expand_currency`, `expand_numbers`, `expand_percentages`, `expand_phone_numbers`, `expand_units`, `email_to_speech`, `normalize_dates`, and `replace_text`. These can be composed individually via the `text_transforms` parameter on any `TTSService`, or used together via the new `VoiceFormatter` bundle.
- `VoiceFormatter` is a single configurable callable that applies all transforms in the correct order (structural cleanup → language expansions → custom replacements). Most transforms are enabled by default; pass keyword arguments to toggle them:
```python
tts = CartesiaTTSService(
    text_transforms=[("*", VoiceFormatter(expand_numbers=True, normalize_acronyms=False))],
)
```
- Individual transforms can be composed for fine-grained control:
```python
tts = CartesiaTTSService(
    text_transforms=[("*", strip_markdown), ("*", expand_currency), ("*", expand_percentages)],
)
```
(PR #4854) -
Added silence-based keepalive to `NvidiaSTTService` to keep idle NVIDIA streaming ASR sessions from going stale. When no audio arrives for a while, the service sends silence over the existing stream instead of letting it sit idle and degrade.
(PR #4877) -
Pipecat Flows is now part of `pipecat-ai`. The conversation-flow framework previously published as the separate `pipecat-ai-flows` package now ships with Pipecat under the `pipecat.flows` namespace (`from pipecat.flows import FlowManager, NodeConfig`), so there is no longer a separate package to install or keep version-matched. Code importing from `pipecat_flows` should switch to `pipecat.flows`. If the deprecated `pipecat-ai-flows` package is still installed alongside this version of Pipecat, Pipecat logs an error prompting you to remove it. The standalone package's release history remains available in the archived pipecat-flows repository.
(PR #4882) -
Added a `clear_after_secs` parameter to `SOXRStreamAudioResampler` (default `0.2`) to control how long after inactivity the internal resampler state is cleared. Set it to `None` to disable clearing.
(PR #4886) -
Added `resampler_clear_after_secs` to `FrameSerializer.InputParams` so all telephony serializers (Twilio, Plivo, Vonage, Telnyx, Exotel, Genesys) expose this setting to callers.
(PR #4886) -
Added a `language_code` streaming parameter to `AssemblyAISTTService` for declaring the audio language (e.g. `"es"`, `"fr"`). On U3 Pro models a tier-1 code (`en`/`es`/`fr`/`de`/`it`/`pt`) steers transcription toward that language. It is mutually exclusive with `language_detection` and is not sent unless set, so existing behavior is unchanged.
(PR #4889) -
Added `AudioBufferStartRecordingFrame` and `AudioBufferStopRecordingFrame` control frames. Push them through the pipeline to start and stop `AudioBufferProcessor` recording. The `start_recording()`/`stop_recording()` methods continue to work.
(PR #4890) -
Added an `auto_start_recording` option to `AudioBufferProcessor` that starts recording as soon as the pipeline starts. Bots generated by the Pipecat CLI with the recording feature now use this option.
(PR #4890) -
Added `on_recording_started` and `on_recording_stopped` events to `AudioBufferProcessor`, fired when recording starts and stops. `on_recording_stopped` fires after the final buffered audio has been emitted.
(PR #4890) -
AI services can now describe themselves to downstream processors at start by overriding `service_metadata_frame()` to return a populated `ServiceMetadataFrame`; `broadcast_service_metadata()` broadcasts whatever it returns. The STT services that do server-side end-of-turn detection (Deepgram Flux, Cartesia Turns, AssemblyAI, Gladia, Speechmatics) use this to recommend `ExternalUserTurnStrategies`, so bots no longer need to set `user_turn_strategies` by hand; your own setting still wins.
(PR #4892) -
Added `endpoint_latency_adjustment_level` to `SonioxSTTService.Settings`, exposing Soniox's endpoint-detection latency control (integer 0–3; higher finalizes turns sooner at some cost to accuracy). Takes effect when Soniox endpoint detection is active (`vad_force_turn_endpoint=False`).
(PR #4894) -
Added a `speed` setting (0.7–1.3) to `SonioxTTSService`.
(PR #4947) -
`SonioxTTSService` now supports cloned voices: pass the voice UUID as `voice`.
(PR #4947) -
`SonioxSTTService` now emits `UserStartedSpeakingFrame`/`UserStoppedSpeakingFrame` and recommends `ExternalUserTurnStrategies` when using Soniox's built-in endpoint detection (`vad_force_turn_endpoint=False`), matching `AssemblyAISTTService`. The turn opens on the local VAD signal when a VAD analyzer is configured (most responsive) or on the first transcript token otherwise, and closes on the Soniox endpoint. A new `should_interrupt` parameter (default `True`) controls whether the bot is interrupted when the user starts speaking in this mode.
(PR #4949)
Changed
-
`TavusTransport` now delivers bot audio via the Tavus `conversation.echo` app message API instead of a WebRTC audio track. By default, audio is sent paced to real playback time (compatible with downstream processors like `AudioBufferProcessor`). Set `TavusParams(audio_out_faster_than_realtime=True)` to instead accumulate and send audio in 100ms chunks as fast as possible, which gives Tavus more of a rendering buffer at the cost of losing realtime pacing.
- The default `persona_id` changed from `"pipecat-stream"` to `"pipecat0"`, which signals Tavus to expect audio over the `conversation.echo` app message API instead of a WebRTC custom audio track.
(PR #4648)
-
`TavusVideoService` now sends bot audio to Tavus via the `conversation.echo` app message API instead of a WebRTC audio track. Audio is accumulated and sent in 100ms chunks, as fast as possible.
- The default `persona_id` changed from `"pipecat-stream"` to `"pipecat0"`, which signals Tavus to expect audio over the `conversation.echo` app message API instead of a WebRTC custom audio track.
(PR #4648)
-
⚠️ The default `GeminiTTSService` model changed from `gemini-2.5-flash-tts` to `gemini-3.1-flash-tts-preview`. Pass `settings=GeminiTTSService.Settings(model="gemini-2.5-flash-tts")` to keep the previous model.
(PR #4787) -
`TTFAMetricsData` now reports the latency breakdown directly: `ttfa` (the measurement, renamed from `value`), `ttfb`, and `leading_silence`. Consumers can see how much of the perceived latency is silence padding (`leading_silence == ttfa - ttfb`) without correlating a separate `TTFBMetricsData`. `ttfb` here mirrors the standalone, earlier `TTFBMetricsData` for convenience and is not a separate measurement.
(PR #4814) -
Dangling tasks are now reported by the `WorkerRunner` for its shared task manager once everything has been torn down. A worker only reports dangling tasks when it owns its own task manager, so a worker sharing the runner's task manager no longer flags the runner's and other workers' tasks as dangling. `check_dangling_tasks` is now a constructor argument on both `WorkerRunner` and `BaseWorker`.
(PR #4815) -
`XAITTSService` now cancels the current utterance on interruption by sending a `text.clear` message over the existing WebSocket, instead of disconnecting and reconnecting. This makes barge-in faster by avoiding a reconnect on every interruption.
(PR #4821) -
Bumped the `pipecat-ai-prebuilt` minimum version to `1.0.3` in the `runner` extra, which updates the prebuilt client UI served by the development runner to use RTVI protocol 2.0.0.
(PR #4847) -
Eval scenarios now accept a `send_after:` with only `delay_ms` (no `event`), which acts as a pure time delay relative to the previous send. `expect:` is now optional, so a turn can just send input or wait.
(PR #4849) -
Simplified the `WorkerBus` message-dispatch tasks by removing redundant `CancelledError` handling (the task manager already handles cancellation, and `CancelledError` was never caught by the subscriber-isolation handler). Subscriber-exception isolation and cancellation behavior are unchanged.
(PR #4851) -
When text transforms change the alphanumeric content of a TTS frame (e.g. `expand_currency` turning `"$42.50"` into `"forty-two dollars and fifty cents"`), the conversation context now correctly receives the original LLM text (`"$42.50"`) rather than the expanded TTS words. Intermediate spoken words within a transformed span are suppressed from the context until the full span is complete, so the context entry is clean and accurately represents what was said.
(PR #4854) -
`pipecat init` is now the starting point for building a Pipecat app. It writes the coding-agent files `AGENTS.md` and `CLAUDE.md`, then helps you build:
- Build with a coding agent (such as Claude Code or Codex). This also writes a `GETTING_STARTED.md` guide for building Pipecat apps with an AI coding assistant.
- Scaffold a runnable bot immediately through an interactive setup wizard.
- Run `pipecat init quickstart` to scaffold the ready-to-run quickstart project, set up for coding agents in one step.
(PR #4861)
-
`AssemblyAISTTService` now defaults to the `universal-3-5-pro` model (AssemblyAI's launched flagship streaming model), replacing the pre-GA alias `u3-rt-pro`. Both resolve to the same U3 Pro family and feature set, so the only change is the model id sent on the wire. `u3-rt-pro` and `u3-rt-pro-beta-1` remain accepted for backward compatibility.
(PR #4863) -
Bumped the `@pipecat-ai/*` JS client dependencies used by the `pipecat init` client templates (and the UI-worker examples): `client-js` to `1.12.0`, `client-react` to `1.7.1`, `daily-transport` to `1.6.7`, `small-webrtc-transport` to `1.10.5`, and `websocket-transport` to `1.7.0`.
(PR #4866) -
⚠️ `pipecat init` now keeps existing `AGENTS.md`, `CLAUDE.md`, and `GETTING_STARTED.md` files instead of overwriting them on re-run. When a guide was written by an older Pipecat version, an interactive run offers to refresh it; otherwise pass the new `--overwrite-guide` flag (renamed from `--force`, now covering all three files) to refresh them.
(PR #4869) -
Updated the ai-coustics integration to SDK 0.21 (bumped `aic-sdk` to `~=2.5.0`). `AICQuailVADAnalyzer` now reports the model's continuous raw VAD probability (`VadContext.raw_vad_probability()`) gated by Pipecat's `VADParams` instead of a binary speech flag. Because the previous output was binary (0.0/1.0), `VADParams.confidence` had no effect on this analyzer before; it now governs the speech threshold, so existing `AICQuailVADAnalyzer` users should review their `VADParams.confidence` after upgrading. The ai-coustics voice examples now use the `quail-vf-2.2-l-16khz` enhancement model.
(PR #4874) -
Widened the `google` extra's `google-genai` dependency to `>=1.68.0,<3`, allowing the 2.x line. The google-genai 2.0 major release scopes its breaking changes to the Interactions API, which Pipecat does not use; the surfaces Pipecat relies on (`Client`, `aio.live.connect`, `generate_content`/`generate_content_stream`, `types.*`) are unchanged, so no migration is required.
(PR #4895) -
⚠️ `LLMContextAggregatorPair`'s `realtime_service_mode` is now auto-configured and defaults to `None` (was `False`): a realtime (speech-to-speech) LLM service announces itself via service metadata and the aggregator turns the mode on automatically, so you no longer opt in by hand. If your realtime-service pipeline previously ran without `realtime_service_mode=True`, realtime context-write behavior now applies to it: listen for `on_user_turn_message_added` to get the newly-added user message rather than `on_user_turn_stopped`, which no longer carries it (`UserTurnStoppedMessage.content` is `None` in realtime mode). Pass `realtime_service_mode=False` to keep the legacy, pre-`realtime_service_mode` behavior.
(PR #4919) -
⚠️ Auto-switching to external user turn strategies for realtime services is no longer conditioned on `realtime_service_mode`: a realtime service that does its own server-side turn detection now gets `ExternalUserTurnStrategies` whenever it recommends them, even with `realtime_service_mode=False` or left off. (The switch has moved onto the same service-metadata recommendation mechanism the STT services use.) As before, passing your own `user_turn_strategies` overrides the recommendation.
(PR #4919) -
⚠️ `SonioxTTSService` now emits word-aligned `TTSTextFrame`s instead of pushing each sentence's full text up front, so the context reflects only what was actually spoken on interruption.
(PR #4947)
Deprecated
-
Deprecated `TaskManagerParams` and `TaskManager.setup()`. Pass `loop` and `context` to the `TaskManager` constructor instead.
- Deprecated the `WorkerRunner` `loop` argument. Pass `task_manager` (which owns its own loop) instead.
(PR #4815)
-
Deprecated the `speech_hold_duration`, `minimum_speech_duration`, and `sensitivity` parameters of `AICQuailVADAnalyzer`. They only affected the SDK's post-processed VAD output, which the new raw-probability path no longer uses; speech gating is now governed by Pipecat's `VADParams` (`confidence`/`start_secs`/`stop_secs`). The parameters are accepted but ignored, and will be removed in 2.0.0.
(PR #4874)
Removed
-
⚠️ Removed `WorkerParams.loop`. Pass a task manager via `WorkerParams(task_manager=...)` instead.
(PR #4815) -
⚠️ Removed the `pipecat create` command; scaffolding now lives in `pipecat init`, the single entry point for starting a Pipecat app. Alongside the coding-agent guide (`AGENTS.md`, `CLAUDE.md`) it already wrote, `pipecat init` now also scaffolds a runnable bot, either interactively or non-interactively from flags or a config file (e.g. `pipecat init . --bot-type web -t daily --stt deepgram_stt --llm openai_llm --tts cartesia_tts`; run `pipecat init --list-options` for valid values). `pipecat init quickstart` replaces `pipecat create quickstart`. Scaffolding is now directory-first and in-place: the project name comes from the target directory, and `create`'s `--output`/`-o` and `--name`-subfolder layout are gone.
(PR #4883) -
⚠️ Removed the internal `RealtimeServiceMetadataFrame` and `RealtimeServiceInfo`. Realtime LLM services now describe themselves with `LLMServiceMetadataFrame` (carrying `is_realtime_service`); if you imported either symbol directly, switch to `LLMServiceMetadataFrame`.
(PR #4919)
Fixed
-
Fixed `MarkdownTextFilter` leaking raw `#` markers into TTS output for second-level (and deeper) markdown headers. Headers are now normalized to plain text before the newline-collapse step that previously broke header recognition by `md.convert`. This also handles closed ATX headers (e.g. `## Title ##`) while preserving trailing whitespace needed for word-by-word streaming.
(PR #4708) -
Fixed `NvidiaSegmentedSTTService` not initializing `speaker_diarization` and `diarization_max_speakers` defaults, which left both fields as `NOT_GIVEN` after construction.
(PR #4718) -
Fixed `SarvamTTSService` WebSocket handshakes sending duplicate `User-Agent` headers. Pipecat now passes the Sarvam SDK `User-Agent` via the WebSocket client's `user_agent_header` parameter instead of `additional_headers`.
(PR #4794) -
Fixed `DeepgramSageMakerSTTService` blocking pipeline startup while connecting. The SageMaker BiDi connection is now established in a background task, so a slow or failing connect no longer holds up the `StartFrame` barrier (the first bot turn, e.g. a greeting, can proceed while STT connects) and connection failures surface via `on_connection_error` instead of looking like a hang.
(PR #4803) -
Fixed `RNNoiseFilter` failing to import with `No module named 'av.option'` when installing `pipecat-ai[rnnoise]`. PyAV 17.1.0 removed the `av.option` submodule that `pyrnnoise`'s `audiolab` dependency imports, so the `rnnoise` extra now caps `av<17.1.0`.
(PR #4807) -
Fixed eval failure reasons (a missing function call, a response timeout, an unsatisfied `eval:`, or a `send_after` that never fired) not being written to the per-scenario `.eval.log` file. They were only printed to the terminal during the run, so there was no record to debug failures after the fact.
(PR #4811) -
Fixed `SileroOnnxModel` raising a confusing `AttributeError` instead of a `ValueError` when given an input audio chunk with too many dimensions. The validation error message called `x.dim()` (a PyTorch method) on a NumPy array; it now uses `x.ndim` and surfaces the intended "Too many dimensions" message.
(PR #4820) -
Fixed `XAIHttpTTSService` omitting the `language` field when it was unset. xAI marks `language` as required, so it is now always sent, falling back to `"auto"` for language auto-detection.
(PR #4821) -
Fixed `WorkerBus` permanently stopping message delivery to a subscriber when that subscriber's `on_bus_message` (or an overridable lifecycle hook such as `on_job_response`/`on_activated`) raised an exception. The router and data dispatch tasks now log the exception and keep running, so subsequent messages (including cancel/cleanup) are still delivered.
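The isolation pattern can be sketched with plain `asyncio`. This is not the `WorkerBus` code: the `dispatch` function, the `None` shutdown sentinel, and the subscriber signature are illustrative assumptions. Note that `except Exception` does not catch `asyncio.CancelledError`, so cancellation still propagates.

```python
import asyncio
import logging

logger = logging.getLogger("bus_sketch")


async def dispatch(queue, subscribers):
    """Deliver each queued message to every subscriber, isolating failures."""
    while True:
        message = await queue.get()
        if message is None:  # hypothetical shutdown sentinel
            break
        for subscriber in subscribers:
            try:
                await subscriber(message)
            except Exception:
                # Log and keep the loop alive so later messages are still delivered.
                logger.exception("subscriber failed on %r", message)


received = []


async def flaky(message):
    if message == "first":
        raise RuntimeError("boom")
    received.append(message)


async def main():
    queue = asyncio.Queue()
    for item in ("first", "second", None):
        queue.put_nowait(item)
    await dispatch(queue, [flaky])


asyncio.run(main())
```

The subscriber fails on the first message, yet the second one is still delivered because the loop survives the exception.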
(PR #4827) -
Fixed `Mem0MemoryService` injecting an empty "Based on previous conversations, I recall:" message into the LLM context on turns where no relevant memories were found. Mem0 2.x `search()` returns a `{"results": [...]}` dict, which is always truthy, so the empty-memory guard never triggered; retrieved memories are now normalized to a list so the guard, formatting, and debug counts all behave correctly.
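The truthiness pitfall is easy to reproduce in isolation. The helper below is a hypothetical sketch of the normalization step, assuming the response is either a `{"results": [...]}` dict or a plain list; it is not the service's actual code.

```python
def normalize_memories(response):
    """Return the retrieved memories as a list, whatever shape the response has."""
    if isinstance(response, dict):
        return list(response.get("results") or [])
    return list(response or [])


empty_response = {"results": []}

# A dict with an empty "results" list is still truthy, so `if response:` passes.
guard_on_raw = bool(empty_response)

# After normalization the guard behaves as intended.
guard_on_normalized = bool(normalize_memories(empty_response))
```

Checking the normalized list rather than the raw response is what lets an empty search skip the "I recall" message.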
(PR #4843) -
Fixed a missing `pipecat/workers/__init__.py` so the `py.typed` marker covers `pipecat.workers.*` and type checkers (e.g. pyright in strict mode) resolve `from pipecat.workers.runner import WorkerRunner` without reporting `reportMissingTypeStubs`.
(PR #4846) -
Fixed `RTVIObserver` silently dropping TTFA (Time To First Audio) metrics. TTFA metrics are now forwarded to RTVI clients under a `ttfa` key, alongside TTFB, processing, token, and character metrics.
(PR #4880) -
Fixed `TwilioFrameSerializer` rejecting a valid `base_url` configuration when only one of `region`/`edge` was also set. The `region`/`edge` pairing is only required when deriving Twilio's FQDN host; since `base_url` is used verbatim and ignores `region`/`edge`, the validation now skips that check when `base_url` is provided.
(PR #4885) -
Fixed eval scenarios silently corrupting unquoted DTMF turns with leading zeros or a hex prefix. YAML 1.1 parsed `dtmf: 012` as octal (10) and `dtmf: 0x10` as hex before validation, sending the wrong keypresses; the scenario loader now resolves only plain-decimal scalars as integers, so leading-zero sequences keep their digits and `dtmf: 123` still works.
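The resolution rule can be sketched without a YAML library. The regular expression and helper names below are assumptions that illustrate "only plain-decimal scalars become integers"; the actual scenario loader may implement the rule differently.

```python
import re

# Plain decimal: "0" or a digit string without a leading zero (assumed rule).
PLAIN_DECIMAL = re.compile(r"^(0|[1-9][0-9]*)$")


def resolve_scalar(text: str):
    """Return an int only for plain-decimal text; keep everything else as a string."""
    return int(text) if PLAIN_DECIMAL.match(text) else text


def dtmf_keys(value):
    """Split a resolved scalar into individual keypresses."""
    return list(str(value))


# What a YAML 1.1 style resolver effectively does with these scalars:
octal_misread = int("012", 8)
hex_misread = int("0x10", 16)

safe_leading_zero = dtmf_keys(resolve_scalar("012"))
safe_plain = dtmf_keys(resolve_scalar("123"))
```

Under the octal reading, `012` would be sent as the keys `1`, `0`; keeping it as a string preserves all three digits.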
(PR #4887) -
Fixed local STT services (`WhisperSTTService`, `WhisperSTTServiceMLX`, `MoonshineSTTService`) corrupting the start of every utterance. `SegmentedSTTService` wraps each VAD segment in a WAV container, but these services read the bytes directly as 16-bit PCM, so the 44-byte WAV header was decoded as 22 `int16` samples and prepended as a near-full-scale burst, changing the transcription. `SegmentedSTTService` now exposes a `wants_wav_segments` property (default `True`, what cloud upload APIs expect) that local models override to receive raw PCM instead.
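The header arithmetic can be checked with the standard library alone. This sketch builds a tiny mono 16-bit WAV in memory and compares a naive "treat everything as PCM" read with a proper WAV read; it only illustrates the failure mode and is unrelated to the services' own code.

```python
import io
import struct
import wave

# Four 16-bit little-endian samples of raw PCM.
pcm = struct.pack("<4h", 100, -100, 200, -200)

# Wrap them in a WAV container, as a segment wrapper would.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(16000)
    writer.writeframes(pcm)
wav_bytes = buf.getvalue()

header_len = len(wav_bytes) - len(pcm)

# Naive read: decode every byte pair as int16, header included.
naive = struct.unpack(f"<{len(wav_bytes) // 2}h", wav_bytes)

# Proper read: let the wave module skip the header.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    clean = struct.unpack("<4h", reader.readframes(reader.getnframes()))
```

The naive decode yields 22 spurious samples ahead of the real audio, which is the burst described above.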
(PR #4896) -
Fixed services and transports leaking connections and background tasks when a pipeline is torn down without an `EndFrame`/`CancelFrame` reaching every processor. Resource release (closing websockets and connections, releasing clients/sessions, cancelling `create_task()` tasks) now runs from the guaranteed `cleanup()` hook in addition to the frame-driven `stop()`/`cancel()` paths, so it happens on every exit path. Affected services include the websocket STT/TTS bases (and their subclasses), realtime LLM services, Deepgram Flux, and the AWS/Azure/Google/NVIDIA/HeyGen/Simli/Tavus integrations, plus the Daily, LiveKit, websocket, SmallWebRTC, and Tavus transports.
(PR #4902) -
Fixed occasional abnormal WebSocket closures (1006) when disconnecting the ElevenLabs TTS service. Pipecat now waits for ElevenLabs to complete its side of the two-step close before closing, rather than racing the closing handshake.
(PR #4904) -
Fixed `InworldTTSService` surfacing an `ErrorFrame` during idle periods when the keepalive task sent a contextless `send_text` and Inworld rejected it. The `context_id is required` and `no open context` rejections are now treated as benign (logged at debug level and skipped), matching the existing `Context not found` handling, which prevents spurious failover away from Inworld.
(PR #4906) -
Fixed `ElevenLabsHttpTTSService` sending `previous_text` with the `eleven_v3` model, which rejects that parameter. ElevenLabs returned a 400 error for every request after the first sentence of a turn, so only the first sentence of a multi-sentence response was spoken.
(PR #4925) -
Fixed `AggregatedFrameSequencer` duplicating a word and misattributing its context in TTS word-timestamp mode (Cartesia, ElevenLabs, etc.) when a whitespace-only token was force-completed.
(PR #4930) -
Fixed the incomplete-turn (`○`/`◐`) re-prompt nudge firing while the user was already speaking again. The re-prompt timeout was only cancelled on `InterruptionFrame`, which does not fire when the user resumes speaking inside the same open turn. It is now also cancelled on `VADUserStartedSpeakingFrame`, so a user who resumes after a pause is no longer interrupted by a canned "no rush" prompt.
(PR #4938) -
Fixed the bot speaking the same response multiple times within one user turn when using `filter_incomplete_user_turns` turn completion. When the acoustic detector (e.g. Smart Turn) triggered several inferences in one turn, each produced a `✓` and every one was voiced. At most one completion is now spoken at a time; later duplicate completions are dropped until a new user turn begins or the user resumes speaking within the same turn.
(PR #4938) -
Fixed a second, redundant LLM inference when using `filter_incomplete_user_turns` turn completion. After the LLM marked a user turn incomplete (`○`/`◐`), the mixin armed a re-prompt timeout; if that timeout fired at the same moment a completed inference (`✓`) arrived, both the re-prompt and the completed response ran. The pending timeout is now cancelled as soon as a new LLM response starts, so only one inference runs.
(PR #4938) -
Fixed the bot talking over the user when using `filter_incomplete_user_turns` turn completion. A completion (`✓`) resolves with some latency, so the user may have resumed speaking by the time it arrives. The user turn controller now drops a turn finalization that arrives while the user is speaking, so a stale completion no longer ends the turn (and talks over them); the turn stays open for the next inference to re-evaluate.
(PR #4938) -
Fixed `AssemblyAISTTService` processing metrics never being recorded in AssemblyAI turn-detection mode (`vad_force_turn_endpoint=False`) with `should_interrupt=True` (the default): the interruption broadcast on speech start immediately stopped the just-started metrics. Metrics now start after the interruption broadcast.
(PR #4949) -
Fixed `RimeTTSService` intermittently going silent for the rest of the session when metrics are enabled. Rime's websocket may split a 16-bit sample across audio chunks, and the resulting odd-length audio frames crashed the TTS playback task; dangling bytes are now carried over to the next chunk so frames always contain whole samples.
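The carry-over technique can be sketched as a small stateful helper. `SampleAligner` is a hypothetical name, not a Pipecat class; it assumes 16-bit (2-byte) samples by default and simply holds back any trailing partial sample until the next chunk arrives.

```python
class SampleAligner:
    """Re-chunk a byte stream so every emitted chunk holds whole samples."""

    def __init__(self, sample_width: int = 2):
        self._width = sample_width
        self._carry = b""

    def feed(self, chunk: bytes) -> bytes:
        # Prepend any dangling bytes left over from the previous chunk.
        data = self._carry + chunk
        usable = len(data) - (len(data) % self._width)
        # Keep the incomplete tail for next time; emit only whole samples.
        self._carry = data[usable:]
        return data[:usable]


aligner = SampleAligner()
first = aligner.feed(b"\x01\x02\x03")   # three bytes: one whole sample plus one dangling byte
second = aligner.feed(b"\x04\x05\x06")  # the dangling byte completes a sample here
```

Every chunk returned by `feed()` has an even length, so downstream code that assumes whole 16-bit samples never sees a split sample.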
(PR #4952)