Release Iris v3.1.0 — Conversational Hardware Control & Telemedicine Reliability · shawnsony07/iris

This release introduces conversational IoT environment control for the first
time in Iris, alongside reliability fixes across audio, the doctor-to-patient
speech pipeline, and AI response quality.

✨ What's New

🌡️ Conversational Environment Control (First Release)

Iris can now control the patient's physical environment — fan and lights — both
manually and through natural conversation with the doctor.

- Manual control: A dedicated Environment panel lets the patient toggle the
  fan and lights directly with a single tap.
- Conversational control: When the doctor asks "Are you feeling hot?" and
  the patient selects "Yes", Iris automatically publishes an MQTT command to
  the patient's Wio Terminal, turning the fan on. No extra steps required.
- Supported actions via conversation:
  - Doctor says "hot" or "warm" → Fan ON
  - Doctor says "cold" or "chilly" → Fan OFF
  - Doctor says "dark" or "dim" → Light ON
  - Doctor says "bright" → Light OFF
- Hardware: Seeed Wio Terminal running custom MQTT firmware, subscribing
  to commands from the Iris server over a local Mosquitto broker.
- Architecture: Hardware control is fully deterministic. JavaScript keyword
  matching handles all trigger decisions, and the local LLM is never involved
  in a time-critical hardware action.
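
The deterministic keyword matching described above could look roughly like the sketch below. It is illustrative only: the type names, the whole-word matching rule, the "Yes" confirmation check, and the `iris/environment/...` topic layout are assumptions, not details taken from the Iris codebase.

```typescript
// Illustrative types and topic names; none of these appear in the release notes.
type Device = "fan" | "light";
type PowerState = "on" | "off";

interface EnvironmentCommand {
  device: Device;
  state: PowerState;
}

interface KeywordRule {
  keywords: string[];
  command: EnvironmentCommand;
}

// Keyword table copied from the supported-actions list.
const KEYWORD_RULES: KeywordRule[] = [
  { keywords: ["hot", "warm"], command: { device: "fan", state: "on" } },
  { keywords: ["cold", "chilly"], command: { device: "fan", state: "off" } },
  { keywords: ["dark", "dim"], command: { device: "light", state: "on" } },
  { keywords: ["bright"], command: { device: "light", state: "off" } },
];

// Return the command for the first rule whose keyword appears as a whole
// word in the doctor's transcript, or null when no rule matches.
function matchEnvironmentCommand(transcript: string): EnvironmentCommand | null {
  for (const rule of KEYWORD_RULES) {
    const hit = rule.keywords.some((k) => new RegExp(`\\b${k}\\b`, "i").test(transcript));
    if (hit) {
      return rule.command;
    }
  }
  return null;
}

// Assumption: a command is only resolved once the patient confirms with "Yes".
function resolveCommand(doctorTranscript: string, patientReply: string): EnvironmentCommand | null {
  return patientReply === "Yes" ? matchEnvironmentCommand(doctorTranscript) : null;
}

// Assumed wire format: one MQTT topic per device, payload "ON" or "OFF".
function toMqttMessage(command: EnvironmentCommand): { topic: string; payload: string } {
  return {
    topic: `iris/environment/${command.device}`,
    payload: command.state.toUpperCase(),
  };
}
```

Matching on whole words keeps a transcript such as "Did you see the photo?" from triggering the "hot" rule, which a plain substring check would do.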

🤝 Doctor-to-Patient Live Speech Pipeline

The doctor's speech now reaches the patient portal in real time via the LiveKit
data channel, triggering contextual AAC response predictions automatically.

Python STT agent (Deepgram via LiveKit) transcribes the doctor's microphone
and publishes to the doctor_transcript topic.
Patient portal receives the transcript, displays it, and generates 3 response
buttons tailored to the doctor's question within 1.5 seconds.

🧠 Hybrid AI Prediction Engine

Prediction button generation now uses a JS-first hybrid approach designed
for the constraints of a 1B in-browser model:

- JavaScript handles all structural decisions: is it a Yes/No question?
  Does it contain an environmental keyword?
- LLM (Llama-3.2-1B, WebGPU, runs fully in-browser) handles only creative
  language generation, one focused task at a time.
- Results:
  - "How are you?" → ["I'm okay", "Not feeling well", "I'm in pain"]
  - "Are you feeling hot?" → ["Yes", "No", "Please turn on the fan"]
  - "Are you in pain?" → ["Yes", "No", "A little bit"]
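
One way to picture the JS-first split is a small planning step that runs before the model is called. The sketch below is a guess at the shape of that step, not the project's code: `planPrediction`, the opener word list, and the `PredictionPlan` fields are all hypothetical names introduced here.

```typescript
// Hypothetical plan object: what JS decides before the LLM is asked for text.
interface PredictionPlan {
  fixedButtons: string[];       // buttons chosen deterministically in JS
  llmSlots: number;             // buttons the LLM still has to write
  mentionsEnvironment: boolean; // true when an environmental keyword is present
}

// Assumed heuristic: a question starting with one of these words is Yes/No.
const YES_NO_OPENERS = ["are", "is", "do", "does", "did", "can", "could", "will", "would", "have", "has"];
const ENVIRONMENT_KEYWORDS = ["hot", "warm", "cold", "chilly", "dark", "dim", "bright"];
const TOTAL_BUTTONS = 3;

function isYesNoQuestion(question: string): boolean {
  const firstWord = question.trim().toLowerCase().split(/\s+/)[0] ?? "";
  return YES_NO_OPENERS.indexOf(firstWord) !== -1;
}

function hasEnvironmentKeyword(question: string): boolean {
  return ENVIRONMENT_KEYWORDS.some((k) => new RegExp(`\\b${k}\\b`, "i").test(question));
}

// Decide the button structure in JS; only the remaining slots go to the LLM.
function planPrediction(question: string): PredictionPlan {
  const fixedButtons = isYesNoQuestion(question) ? ["Yes", "No"] : [];
  return {
    fixedButtons,
    llmSlots: TOTAL_BUTTONS - fixedButtons.length,
    mentionsEnvironment: hasEnvironmentKeyword(question),
  };
}
```

Under these assumptions, "Are you in pain?" leaves a single slot for the model to fill, while "How are you?" leaves all three.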

🐛 Bug Fixes

Audio / TTS

Fixed: Every button press was speaking twice.
GazeButton.triggerAction was dispatching a dwell-click event AND calling
executeAction, which dispatched a second dwell-click. SpeakHandler
caught both, producing two simultaneous TTS outputs. Removed the redundant
dispatch from GazeButton.
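
The double-speak bug can be reproduced with a stripped-down model. The classes below are hypothetical stand-ins (a tiny event bus instead of DOM events, a counter instead of SpeakHandler) meant only to show why one dispatch point yields one TTS call.

```typescript
type Listener = () => void;

// Minimal stand-in for the dwell-click event channel.
class DwellBus {
  private listeners: Listener[] = [];

  on(listener: Listener): void {
    this.listeners.push(listener);
  }

  emit(): void {
    this.listeners.forEach((listener) => listener());
  }
}

// Simplified model of the button after the fix.
class GazeButtonModel {
  private bus: DwellBus;

  constructor(bus: DwellBus) {
    this.bus = bus;
  }

  // After the fix: delegate only. Before the fix this method also emitted
  // dwell-click itself, so each press produced two events.
  triggerAction(): void {
    this.executeAction();
  }

  executeAction(): void {
    this.bus.emit(); // the single dwell-click dispatch
  }
}

// Stand-in for SpeakHandler: counts how many times speech would be triggered.
function countSpeechCalls(presses: number): number {
  const bus = new DwellBus();
  let spoken = 0;
  bus.on(() => {
    spoken += 1;
  });
  const button = new GazeButtonModel(bus);
  for (let i = 0; i < presses; i++) {
    button.triggerAction();
  }
  return spoken;
}
```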

Doctor STT Pipeline

Fixed: Doctor's speech silently dropped on the patient side.
useDataChannel() with no topic only receives messages on the default empty
topic. The Python agent publishes to "doctor_transcript", so all messages
were discarded. Replaced with explicit useDataChannel("doctor_transcript")
and useDataChannel("patient_text") hooks.
Fixed: Agent missed the doctor's audio when joining late.
The track_subscribed handler only catches tracks published after the agent
joins. Added a post-connect loop over ctx.room.remote_participants to pick
up any doctor audio track already in the room.
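
The topic fix in the first item above comes down to exact-topic delivery. The router below is a toy model of that behaviour as the notes describe it, not LiveKit's implementation; `TopicRouter` and `DataMessage` are made-up names, and an empty string stands for the default topic.

```typescript
// Toy message shape: topic "" represents the default (unnamed) topic.
interface DataMessage {
  topic: string;
  text: string;
}

type Handler = (text: string) => void;

class TopicRouter {
  private handlers: { [topic: string]: Handler[] } = {};

  subscribe(topic: string, handler: Handler): void {
    if (!this.handlers[topic]) {
      this.handlers[topic] = [];
    }
    this.handlers[topic].push(handler);
  }

  // Deliver only to handlers registered for exactly this topic.
  deliver(message: DataMessage): void {
    const targets = this.handlers[message.topic] ?? [];
    targets.forEach((handler) => handler(message.text));
  }
}
```

In this model a handler registered on the default topic never sees a message published to "doctor_transcript", which mirrors the silent drop described in the fix.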

AI Predictions

Fixed: STT noise labels (e.g. [typing]) triggering predictions.
Added bracket filtering in LiveKitWrapper.tsx — any transcript containing
[, ], (, or ) is discarded before reaching the LLM.
Fixed: Generic fallback buttons shown while the LLM is loading.
The generic ["Yes", "No", "I don't know"] set is replaced with
question-type-aware fallbacks: Yes/No → ["Yes", "No", "I'm not sure"],
open questions → ["I'm okay", "Not feeling well", "I need help"].
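
Both fixes in this group are small pure functions in spirit. The sketch below shows one possible shape; the function names and the `QuestionType` union are assumptions, while the bracket characters and fallback strings come from the notes above.

```typescript
type QuestionType = "yes_no" | "open";

// True when the transcript contains any of [ ] ( ), which the notes treat
// as the marker of an STT noise label such as "[typing]".
function isNoiseTranscript(transcript: string): boolean {
  return /[\[\]()]/.test(transcript);
}

// Fallback buttons to show while the LLM is still loading.
function fallbackPredictions(type: QuestionType): string[] {
  return type === "yes_no"
    ? ["Yes", "No", "I'm not sure"]
    : ["I'm okay", "Not feeling well", "I need help"];
}
```

Note that a filter this broad also discards genuine speech that happens to be transcribed with parentheses, which is the trade-off the bracket rule accepts.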

Session Lifecycle

Fixed: Prediction buttons persisting after call ends.
setSessionState now resets predictions, isContextResponse, and
isPredicting on disconnect, returning the patient interface to a clean state.
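
As a rough illustration of the reset, the state transition can be written as a pure function. The field names match the ones listed above; the `SessionState` values and the reset defaults (empty list, `false`) are assumptions.

```typescript
interface PredictionState {
  predictions: string[];
  isContextResponse: boolean;
  isPredicting: boolean;
}

// Assumed session values; the real setSessionState may track more states.
type SessionState = "connected" | "disconnected";

// Return a clean prediction state on disconnect, otherwise keep the current one.
function nextPredictionState(current: PredictionState, session: SessionState): PredictionState {
  if (session === "disconnected") {
    return { predictions: [], isContextResponse: false, isPredicting: false };
  }
  return current;
}
```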

🏗️ Architecture Notes

MQTT hardware broker: Mosquitto (local)
Hardware client: Seeed Wio Terminal (Arduino/C++)
STT: Deepgram Nova-2 via LiveKit Agents (Python)
Local LLM: Llama-3.2-1B-Instruct (WebGPU, in-browser via MLC WebLLM)
TTS: ONNX Kokoro (Web Worker, in-browser)
