This release introduces conversational IoT environment control for the first
time in Iris, alongside reliability fixes across audio, the doctor-to-patient
speech pipeline, and AI response quality.
✨ What's New
🌡️ Conversational Environment Control (First Release)
Iris can now control the patient's physical environment — fan and lights — both
manually and through natural conversation with the doctor.
- Manual control: A dedicated Environment panel lets the patient toggle the
  fan and lights directly with a single tap.
- Conversational control: When the doctor asks "Are you feeling hot?" and
  the patient selects "Yes", Iris automatically publishes an MQTT command to
  the patient's Wio Terminal, turning the fan on. No extra steps required.
- Supported actions via conversation:
  - Doctor says "hot" or "warm" → Fan ON
  - Doctor says "cold" or "chilly" → Fan OFF
  - Doctor says "dark" or "dim" → Light ON
  - Doctor says "bright" → Light OFF
- Hardware: Seeed Wio Terminal running custom MQTT firmware, subscribing
  to commands from the Iris server over a local Mosquitto broker.
- Architecture: Hardware control is fully deterministic. JavaScript keyword
  matching handles all trigger decisions, and the local LLM is never involved
  in a time-critical hardware action.
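The keyword table above can be sketched as a small deterministic matcher. This is a minimal illustration, not the shipped Iris code: the `EnvCommand` type, the function names `matchEnvironmentKeyword` and `resolveCommand`, and the `iris/env/...` topic strings are all hypothetical. Publishing is left to whatever MQTT client the server actually uses.

```typescript
// Hypothetical shapes and topic names, chosen for illustration only.
type Device = "fan" | "light";
type Power = "ON" | "OFF";

interface EnvCommand {
  device: Device;
  power: Power;
  topic: string; // assumed MQTT topic, not taken from the release
}

// Keyword table from the release notes; the first matching rule wins.
const RULES: ReadonlyArray<{ pattern: RegExp; device: Device; power: Power }> = [
  { pattern: /\b(hot|warm)\b/i, device: "fan", power: "ON" },
  { pattern: /\b(cold|chilly)\b/i, device: "fan", power: "OFF" },
  { pattern: /\b(dark|dim)\b/i, device: "light", power: "ON" },
  { pattern: /\bbright\b/i, device: "light", power: "OFF" },
];

// Pure keyword lookup on the doctor's transcript; no LLM involved.
function matchEnvironmentKeyword(doctorText: string): EnvCommand | null {
  for (const rule of RULES) {
    if (rule.pattern.test(doctorText)) {
      return {
        device: rule.device,
        power: rule.power,
        topic: `iris/env/${rule.device}`,
      };
    }
  }
  return null;
}

// A command is only produced once the patient confirms with "Yes".
function resolveCommand(doctorText: string, patientAnswer: string): EnvCommand | null {
  if (patientAnswer.trim().toLowerCase() !== "yes") return null;
  return matchEnvironmentKeyword(doctorText);
}
```

Word-boundary matching keeps "photo" from triggering the "hot" rule, and because the lookup is a plain table scan its latency does not depend on model load state.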
🤝 Doctor-to-Patient Live Speech Pipeline
The doctor's speech now reaches the patient portal in real time via the LiveKit
data channel, triggering contextual AAC response predictions automatically.
- Python STT agent (Deepgram via LiveKit) transcribes the doctor's microphone
  and publishes to the `doctor_transcript` topic.
- Patient portal receives the transcript, displays it, and generates 3 response
  buttons tailored to the doctor's question within 1.5 seconds.
🧠 Hybrid AI Prediction Engine
Prediction button generation now uses a JS-first hybrid approach designed
for the constraints of a 1B in-browser model:
- JavaScript handles all structural decisions: is it a Yes/No question?
  Does it contain an environmental keyword?
- LLM (Llama-3.2-1B, WebGPU, runs fully in-browser) handles only creative
  language generation, one focused task at a time.
- Results:
  - "How are you?" → `["I'm okay", "Not feeling well", "I'm in pain"]`
  - "Are you feeling hot?" → `["Yes", "No", "Please turn on the fan"]`
  - "Are you in pain?" → `["Yes", "No", "A little bit"]`
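The structural checks that JavaScript performs before the LLM is consulted could look like the sketch below. The auxiliary-verb heuristic and the names `classifyQuestion` and `hasEnvironmentalKeyword` are assumptions made for illustration; the release does not say how Iris detects Yes/No questions. The keyword list is the one from the Environment Control section.

```typescript
type QuestionType = "yes_no" | "open";

// Assumed heuristic: English Yes/No questions usually open with an auxiliary verb.
const YES_NO_OPENERS = new Set<string>([
  "are", "is", "am", "was", "were", "do", "does", "did",
  "have", "has", "had", "can", "could", "will", "would", "should",
]);

// Classify by the first word, ignoring case and stray punctuation.
function classifyQuestion(text: string): QuestionType {
  const first = (text.trim().toLowerCase().split(/\s+/)[0] ?? "").replace(/[^a-z]/g, "");
  return YES_NO_OPENERS.has(first) ? "yes_no" : "open";
}

// Environmental keywords listed in these notes.
function hasEnvironmentalKeyword(text: string): boolean {
  return /\b(hot|warm|cold|chilly|dark|dim|bright)\b/i.test(text);
}
```

With both answers in hand, the caller can decide whether the LLM is needed at all or whether a fixed Yes/No layout plus an environment-specific third button is enough.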
🐛 Bug Fixes
Audio / TTS
- Fixed: Every button press was speaking twice.
  `GazeButton.triggerAction` was dispatching a `dwell-click` event AND calling
  `executeAction`, which dispatched a second `dwell-click`. `SpeakHandler`
  caught both, producing two simultaneous TTS outputs. Removed the redundant
  dispatch from `GazeButton`.
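The double-speak bug can be modeled in a few lines. The sketch below is a simplified reproduction, not the actual `GazeButton` source: the `TinyBus` class and the `triggerBefore`, `triggerAfter`, and `countSpeaks` helpers are hypothetical, and only the event name `dwell-click` and the role of `executeAction` come from the notes above.

```typescript
type Listener = () => void;

// Minimal stand-in for an event target (hypothetical helper).
class TinyBus {
  private listeners = new Map<string, Listener[]>();

  on(event: string, fn: Listener): void {
    const list = this.listeners.get(event) ?? [];
    list.push(fn);
    this.listeners.set(event, list);
  }

  dispatch(event: string): void {
    for (const fn of this.listeners.get(event) ?? []) fn();
  }
}

// In this model, executeAction is the one place that emits the event.
const executeAction = (bus: TinyBus): void => bus.dispatch("dwell-click");

// Before the fix: one direct dispatch plus a second one via executeAction.
const triggerBefore = (bus: TinyBus): void => {
  bus.dispatch("dwell-click");
  executeAction(bus);
};

// After the fix: the redundant direct dispatch is gone.
const triggerAfter = (bus: TinyBus): void => executeAction(bus);

// Counts how many times a speak handler fires for a single button press.
function countSpeaks(trigger: (bus: TinyBus) => void): number {
  const bus = new TinyBus();
  let spoken = 0;
  bus.on("dwell-click", () => { spoken += 1; });
  trigger(bus);
  return spoken;
}
```

Calling `countSpeaks(triggerBefore)` yields two handler invocations for one press, while `countSpeaks(triggerAfter)` yields one.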
Doctor STT Pipeline
- Fixed: Doctor's speech silently dropped on the patient side.
  `useDataChannel()` with no topic only receives messages on the default empty
  topic. The Python agent publishes to `"doctor_transcript"`, so all messages
  were discarded. Replaced with explicit `useDataChannel("doctor_transcript")`
  and `useDataChannel("patient_text")` hooks.
- Fixed: Agent missed Doctor's audio on late join.
  The `track_subscribed` handler only catches tracks published after the agent
  joins. Added a post-connect loop over `ctx.room.remote_participants` to pick
  up any Doctor audio track already in the room.
AI Predictions
- Fixed: STT noise labels (e.g. `[typing]`) triggering predictions.
  Added bracket filtering in `LiveKitWrapper.tsx`: any transcript containing
  `[`, `]`, `(`, or `)` is discarded before reaching the LLM.
- Fixed: Context-aware fallbacks when LLM is loading.
  Generic `["Yes", "No", "I don't know"]` replaced with question-type-aware
  fallbacks: Yes/No → `["Yes", "No", "I'm not sure"]`, open questions →
  `["I'm okay", "Not feeling well", "I need help"]`.
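A minimal sketch of the two prediction fixes above, assuming hypothetical helper names (`isNoiseTranscript`, `fallbackFor`) and that the question type has already been determined elsewhere. The fallback strings are the ones listed in these notes; everything else is illustrative.

```typescript
type QuestionType = "yes_no" | "open";

// True when the transcript contains a square bracket or parenthesis,
// which is how STT noise labels such as "[typing]" show up.
function isNoiseTranscript(transcript: string): boolean {
  return /[\[\]()]/.test(transcript);
}

// Fallback buttons shown while the LLM is still loading.
function fallbackFor(type: QuestionType): string[] {
  return type === "yes_no"
    ? ["Yes", "No", "I'm not sure"]
    : ["I'm okay", "Not feeling well", "I need help"];
}
```

Because the filter rejects the whole transcript, a genuine sentence that the STT engine happens to render with parentheses would be skipped as well. The fix accepts that trade-off in exchange for a very cheap check.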
Session Lifecycle
- Fixed: Prediction buttons persisting after call ends.
  `setSessionState` now resets `predictions`, `isContextResponse`, and
  `isPredicting` on disconnect, returning the patient interface to a clean state.
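One way to express this reset is a pure function over the session state. The field names `predictions`, `isContextResponse`, and `isPredicting` come from the note above; the `SessionState` shape, the field types, the `connected` flag, and the `resetOnDisconnect` name are assumptions for illustration and may not match the real store.

```typescript
interface SessionState {
  connected: boolean;         // assumed field, not named in the release
  predictions: string[];      // assumed type
  isContextResponse: boolean; // assumed type
  isPredicting: boolean;      // assumed type
}

// Returns a new state with all prediction-related fields cleared.
function resetOnDisconnect(state: SessionState): SessionState {
  return {
    ...state,
    connected: false,
    predictions: [],
    isContextResponse: false,
    isPredicting: false,
  };
}
```

Returning a fresh object rather than mutating in place keeps the reset easy to test and plays well with state containers that compare by reference.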
🏗️ Architecture Notes
- MQTT hardware broker: Mosquitto (local)
- Hardware client: Seeed Wio Terminal (Arduino/C++)
- STT: Deepgram Nova-2 via LiveKit Agents (Python)
- Local LLM: Llama-3.2-1B-Instruct (WebGPU, in-browser via MLC WebLLM)
- TTS: ONNX Kokoro (Web Worker, in-browser)