v0.7.0
First stable release after the 0.7.0a1–0.7.0a18 alpha series. Major architectural changes since 0.6.13 (2026-03-05).
pip install roomkit==0.7.0
Highlights
- Real-time speech-to-speech AI is the headline feature. New RealtimeVoiceChannel wraps OpenAI Realtime, Gemini Live, xAI, ElevenLabs, Anam, and PersonaPlex behind one Channel ABC. Ten-mixin architecture (_realtime_audio, _realtime_tools, _realtime_speech, _realtime_skills, _realtime_transcription, _realtime_response, _realtime_tool_search, _realtime_tool_recovery, _realtime_context, _skill_handlers) keeps each concern focused.
- Tool Search for tool-heavy realtime sessions. find_tools(query) + list_tools keep the active tool surface under the ~20 limit where Gemini Live function-calling stays reliable, while exposing thousands of tools dynamically via provider.reconfigure.
- Skill delivery modes: on_demand vs inline_full. Handles providers that can't reconfigure mid-session (Gemini 3.x) by baking skill bodies into system_instruction at session start.
- Carrier-grade SIP: NAT traversal via advertised_ip, BYE routing fixed for inbound calls behind SBCs, RFC 3326 Reason header parse + emit, runtime auth resolver (set_auth_resolver), runtime invite filter (set_invite_filter), PSTN-compatibility knobs for outbound dial.
- Orchestration: Supervisor strategy with sequential / parallel / auto_delegate execution + async_delivery for non-blocking pipelines. HandoffHandler state machine. Loop producer/reviewer pattern. All wired to kit.status_bus for observable multi-agent flows.
- Video / vision / avatar: vision providers (OpenAI, Gemini), avatar providers (MuseTalk lip-sync, WebSocket, Anam cloud), video filters (watermark, YOLO, censor, MediaPipe face-touch detection), screen capture + control tools (DescribeScreenTool, ScreenInputTools), webcam capture (DescribeWebcamTool), PyAV recorder with A/V sync, video bridge.
- Storage: PostgresStore v2 relational schema with proper indexes (replacing JSONB blobs). PostgresKnowledgeSource for full-text retrieval. SummarizingMemory + RetrievalMemory providers.
- Delivery backends: pluggable InMemoryDeliveryBackend and RedisDeliveryBackend (Streams + consumer groups) so deliveries survive process restarts and scale across workers.
- Twilio Media Streams voice backend with stateful soxr resampling and pure-Python G.711 mu-law codec, with no audioop dependency.
- Quality: ON_AI_RESPONSE + ON_FEEDBACK hooks, ConversationScorer ABC, ScoringHook, QualityTracker reports.
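To illustrate what a pure-Python G.711 mu-law codec involves, here is a minimal sketch of the classic textbook encode/decode routines. It is not RoomKit's implementation; the function names (linear_to_ulaw, ulaw_to_linear, encode_pcm16, decode_to_pcm16) are made up for this example, and it assumes signed 16-bit little-endian PCM input.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is the position of the highest set bit,
    # counted down from bit 14 (exponent 7) to bit 7 (exponent 0).
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign/exponent/mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte back to a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


def encode_pcm16(pcm: bytes) -> bytes:
    """Encode little-endian 16-bit PCM bytes; a trailing odd byte is dropped."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)


def decode_to_pcm16(ulaw: bytes) -> bytes:
    """Decode mu-law bytes to little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(ulaw)}h", *(ulaw_to_linear(b) for b in ulaw))
```

Each 16-bit sample becomes one byte, so an 8 kHz telephony stream is halved in size. The codec is lossy: decoding and re-encoding a mu-law byte returns the same byte (apart from the "negative zero" code 0x7F, which normalizes to 0xFF), but the original PCM value is only approximated.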
Migration from 0.6.x
Removed APIs (BREAKING)
- kit.connect_voice / kit.disconnect_voice / kit.connect_video / kit.disconnect_video / kit.bind_voice_session / kit.connect_realtime_voice / kit.disconnect_realtime_voice → use kit.join(...) and kit.leave(session).
- RoomKit(stt=..., tts=..., voice=...) constructor parameters → pass providers to VoiceChannel(stt=..., tts=..., backend=...) directly. kit.stt / kit.tts / kit.voice properties now look up from registered channels.
- Top-level from roomkit import … exports slimmed from 399 to 66. Providers, voice/video types, mocks, recording, orchestration, and telemetry must be imported from their subpackages (e.g. from roomkit.providers.anthropic.ai import AnthropicAIProvider).
- HookTrigger.ON_REALTIME_TOOL_CALL → renamed to HookTrigger.ON_TOOL_CALL. Event payload is now a channel-agnostic ToolCallEvent. Return results via HookResult(action="allow", metadata={"result": ...}).
- Tool handler signature: 3-arg (session, name, arguments) → 2-arg (name, arguments). Use get_current_voice_session() contextvar for session access in voice tool handlers.
- audit_realtime_tool_handler → use audit_tool_handler (now channel-agnostic).
- parse_voicemeup_webhook() / configure_voicemeup_mms() module-level functions → per-instance provider.parse_inbound(payload, channel_id) / provider.configure_mms(...). Enables multi-tenant isolation.
- GeminiLiveProvider.prime_realtime_input() → provider.start_audio_stream(session) (also exposed on RealtimeVoiceChannel.inject_text(..., start_audio_stream=True)).
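The tool handler signature change is probably the most mechanical part of the migration. The sketch below shows the general pattern with the standard-library contextvars module only. FakeSession, the _current_session variable, the dispatch function, and the local get_current_voice_session are stand-ins written for this example, not RoomKit's actual objects or import paths. Whether handlers must be coroutines is also an assumption here.

```python
import asyncio
import contextvars
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSession:
    """Hypothetical stand-in for a voice session object."""
    id: str


# Stand-in for the context variable the library would manage internally.
_current_session = contextvars.ContextVar("current_voice_session", default=None)


def get_current_voice_session() -> Optional[FakeSession]:
    """Local stand-in mirroring the helper named in the migration notes."""
    return _current_session.get()


# 0.6.x style: the session arrived as the first positional argument.
async def lookup_order_old(session: FakeSession, name: str, arguments: dict) -> str:
    return json.dumps({"tool": name, "order": arguments.get("order_id"), "session": session.id})


# 0.7.0 style: two arguments, with the session read from the context variable.
async def lookup_order(name: str, arguments: dict) -> str:
    session = get_current_voice_session()
    session_id = session.id if session is not None else None
    return json.dumps({"tool": name, "order": arguments.get("order_id"), "session": session_id})


async def dispatch(session: FakeSession, name: str, arguments: dict) -> str:
    """Hypothetical dispatcher: binds the session around the handler call."""
    token = _current_session.set(session)
    try:
        return await lookup_order(name, arguments)
    finally:
        _current_session.reset(token)


result = asyncio.run(dispatch(FakeSession("sess-1"), "lookup_order", {"order_id": 42}))
print(result)
```

Because the session comes from a context variable rather than a parameter, the same two-argument handler can also run on channels that have no voice session. In that case the lookup returns None, so handlers should check for it.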
Behavior changes
- Recording is opt-out, not opt-in. Rooms with recorders now capture every attached channel by default. Disable per-channel with ChannelRecordingConfig(audio=False, video=False). Recording now captures both inbound (mic) and outbound (TTS) audio mixed into a single track.
- Tool protocol is the standard tool registration path. Pass any object with .definition: dict and .handler(name, args) -> str via tools=[my_tool]. The legacy tool_handler= parameter still exists for MCP / audit middleware.
- PostgresStore is now relational (schema v2). v1 JSONB-blob databases are auto-migrated on first connect; the migration drops the old data columns and rebuilds the relational schema.
- OpenAIRealtimeProvider honours input_sample_rate / output_sample_rate. PCM is only accepted at 24 kHz by the GA API; invalid rates now raise ValueError at construction.
- audioop dependency removed. Replaced with pure-Python G.711 codec + linear interpolation resampler; runs on Python 3.13+ without audioop-lts.
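As a rough illustration of the Tool protocol described above, the following sketch defines an object with a .definition dict and a .handler(name, args) -> str method. The ToolLike protocol class, the WeatherTool example, and the keys inside definition (name, description, parameters) are assumptions made for this example; the release notes only state that definition is a dict, so check the library's documentation for the exact schema and for whether handler should be async.

```python
import json
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ToolLike(Protocol):
    """Structural type matching the shape described in the notes (name is hypothetical)."""

    definition: dict[str, Any]

    def handler(self, name: str, args: dict[str, Any]) -> str: ...


class WeatherTool:
    """Example tool; the definition layout is an assumed JSON-schema style."""

    definition: dict[str, Any] = {
        "name": "get_weather",
        "description": "Return a canned forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def handler(self, name: str, args: dict[str, Any]) -> str:
        # Tools return a string, so structured results are serialized to JSON.
        city = args.get("city", "unknown")
        return json.dumps({"tool": name, "city": city, "forecast": "sunny"})


my_tool = WeatherTool()
print(isinstance(my_tool, ToolLike))
print(my_tool.handler("get_weather", {"city": "Oslo"}))
```

An object like my_tool would then be passed as tools=[my_tool] to whichever channel or provider accepts tools. No inheritance from a library base class is needed, since the check is structural.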
Security
Five vulnerabilities closed in 3cd5124 immediately before the release.
- HTTP webhook SSRF guard hardened (HTTPProviderConfig.webhook_url). Previous validator missed 127.1, 2130706433, 0x7f000001, localhost., and any hostname whose A record points to RFC 1918 / loopback / link-local. New roomkit.providers.url_safety.validate_public_url normalizes IPv4 numeric forms, strips trailing-dot DNS, and resolves every A/AAAA record at validation time.
- DeepgramSTTProvider no longer fetches AudioContent.url server-side. Switched to Deepgram's native transcribe_url so the fetch happens from Deepgram's network, not ours. Closes an SSRF vector reachable from any inbound webhook channel.
- PersonaPlexConfig.ssl_verify default flipped from False to True. Local self-signed dev must now pass ssl_verify=False explicitly.
- Telnyx webhook signatures now check timestamp freshness. Signatures more than 300s away from the current clock are rejected; the window is configurable via tolerance_seconds. Closes an indefinite replay window.
- DescribeWebcamTool no longer exposes save_path to the AI. Operator-controlled save_dir at construction; handler auto-generates filenames. Closes a prompt-injection → arbitrary-file-write primitive. save_path in tool arguments is now silently ignored.
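To make the SSRF bypasses listed above concrete, here is a hedged sketch of the kind of checks such a validator needs: parsing legacy IPv4 numeric forms, stripping a trailing dot, and checking every resolved address. It is not the code in roomkit.providers.url_safety; validate_public_url_sketch, parse_legacy_ipv4, is_public, and the resolve parameter are names invented for this example.

```python
import ipaddress
import re
import socket
from urllib.parse import urlsplit

_NUMERIC_PART = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def parse_legacy_ipv4(host: str):
    """Parse inet_aton-style forms such as 127.1, 2130706433 or 0x7f000001."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4 or not all(_NUMERIC_PART.fullmatch(p) for p in parts):
        return None
    try:
        # Leading "0x" means hex, a leading "0" means octal, otherwise decimal.
        values = [
            int(p, 16) if p[:2].lower() == "0x"
            else int(p, 8) if len(p) > 1 and p[0] == "0"
            else int(p)
            for p in parts
        ]
    except ValueError:
        return None
    *leading, last = values
    # Leading parts are single bytes; the last part fills the remaining bytes.
    if any(v > 255 for v in leading) or last >= 256 ** (5 - len(parts)):
        return None
    number = sum(v << (8 * (3 - i)) for i, v in enumerate(leading)) + last
    return ipaddress.IPv4Address(number)


def is_public(addr) -> bool:
    """Reject loopback, RFC 1918, link-local and other non-routable ranges."""
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return not (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_multicast or addr.is_reserved or addr.is_unspecified)


def _resolve_all(host: str):
    """Return every A/AAAA address for host using the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def validate_public_url_sketch(url: str, resolve=_resolve_all) -> str:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = (parts.hostname or "").rstrip(".")  # "localhost." -> "localhost"
    if not host:
        raise ValueError("URL has no host")
    literal = parse_legacy_ipv4(host)
    if literal is None:
        try:
            literal = ipaddress.ip_address(host)  # e.g. an IPv6 literal
        except ValueError:
            literal = None
    candidates = [literal] if literal is not None else resolve(host)
    if not candidates or not all(is_public(a) for a in candidates):
        raise ValueError(f"{host!r} does not resolve to public addresses only")
    return url
```

The resolver is injectable so the check can be exercised without network access. Note that validating at configuration time does not by itself pin the address used for the later request, so a DNS answer can still change between validation and fetch.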
Full per-PR detail
See CHANGELOG.md for the granular 0.7.0a1 through 0.7.0a18 entries.
Full compare: v0.6.13...v0.7.0