Iris v3.1.0 — Conversational Hardware Control & Telemedicine Reliability

Released by @shawnsony07 on 11 Jun, 11:16 · 18 commits to main since this release

This release introduces conversational IoT environment control for the first
time in Iris, alongside reliability fixes across audio, the doctor-to-patient
speech pipeline, and AI response quality.


✨ What's New

🌡️ Conversational Environment Control (First Release)

Iris can now control the patient's physical environment — fan and lights — both
manually and through natural conversation with the doctor.

  • Manual control: A dedicated Environment panel lets the patient toggle the
    fan and lights directly with a single tap.
  • Conversational control: When the doctor asks "Are you feeling hot?" and
    the patient selects "Yes", Iris automatically publishes an MQTT command to
    the patient's Wio Terminal, turning the fan on. No extra steps required.
  • Supported actions via conversation:
    • Doctor says "hot" or "warm" → Fan ON
    • Doctor says "cold" or "chilly" → Fan OFF
    • Doctor says "dark" or "dim" → Light ON
    • Doctor says "bright" → Light OFF
  • Hardware: Seeed Wio Terminal running a custom MQTT firmware, subscribing
    to commands from the Iris server over a local Mosquitto broker.
  • Architecture: Hardware control is fully deterministic — JavaScript keyword
    matching handles all trigger decisions. The local LLM is never involved in
    a time-critical hardware action.
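
The deterministic trigger logic described above can be sketched as a small keyword table. This is a minimal illustration, not the Iris source: the function names, the `iris/env/...` topic layout, and the `ON`/`OFF` payloads are assumptions, and requiring a "Yes" from the patient for every action generalizes the fan example.

```typescript
// Hypothetical sketch: names, topic strings, and payloads are assumptions.
type Device = "fan" | "light";
type PowerState = "ON" | "OFF";

interface HardwareCommand {
  device: Device;
  state: PowerState;
}

// Keyword table copied from the supported actions listed in the notes.
const KEYWORD_RULES: ReadonlyArray<{ words: string[]; command: HardwareCommand }> = [
  { words: ["hot", "warm"], command: { device: "fan", state: "ON" } },
  { words: ["cold", "chilly"], command: { device: "fan", state: "OFF" } },
  { words: ["dark", "dim"], command: { device: "light", state: "ON" } },
  { words: ["bright"], command: { device: "light", state: "OFF" } },
];

// Return the command implied by the doctor's text, or null if no keyword matches.
function matchEnvironmentCommand(doctorText: string): HardwareCommand | null {
  // Compare whole lowercase words so "hot" does not match inside "photo".
  const tokens = doctorText.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  for (const rule of KEYWORD_RULES) {
    if (rule.words.some((w) => tokens.indexOf(w) !== -1)) {
      return rule.command;
    }
  }
  return null;
}

// Assumption: an action fires only after the patient answers "Yes".
function resolveCommand(doctorText: string, patientReply: string): HardwareCommand | null {
  if (patientReply.trim().toLowerCase() !== "yes") {
    return null;
  }
  return matchEnvironmentCommand(doctorText);
}

// Assumed topic layout: one topic per device, payload is the power state.
function toMqttMessage(cmd: HardwareCommand): { topic: string; payload: string } {
  return { topic: `iris/env/${cmd.device}`, payload: cmd.state };
}
```

Because the lookup is a plain table scan, the decision is made without waiting on model inference, which matches the stated goal of keeping the LLM out of hardware actions.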

🤝 Doctor-to-Patient Live Speech Pipeline

The doctor's speech now reaches the patient portal in real time via the LiveKit
data channel, triggering contextual AAC response predictions automatically.

  • Python STT agent (Deepgram via LiveKit) transcribes the doctor's microphone
    and publishes to the doctor_transcript topic.
  • Patient portal receives the transcript, displays it, and generates 3 response
    buttons tailored to the doctor's question within 1.5 seconds.

🧠 Hybrid AI Prediction Engine

Prediction button generation now uses a JS-first hybrid approach designed
for the constraints of a 1B-parameter in-browser model:

  • JavaScript handles all structural decisions: is it a Yes/No question?
    Does it contain an environmental keyword?
  • LLM (Llama-3.2-1B, WebGPU, runs fully in-browser) handles only creative
    language generation — one focused task at a time.
  • Results:
    • "How are you?" → ["I'm okay", "Not feeling well", "I'm in pain"]
    • "Are you feeling hot?" → ["Yes", "No", "Please turn on the fan"]
    • "Are you in pain?" → ["Yes", "No", "A little bit"]
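
One way to picture the split of responsibilities is a planning step that runs entirely in code and leaves only the free-text slots to the model. The sketch below is an assumption about how that could look: the opener list, the `PredictionPlan` shape, and the function names are hypothetical, and the model call itself is left out.

```typescript
// Hypothetical sketch: the word lists, names, and plan shape are assumptions.
interface PredictionPlan {
  kind: "yes_no" | "open";
  fixedButtons: string[]; // decided in code, never by the model
  llmSlots: number; // how many buttons the model is asked to write
  environmentKeyword: boolean;
}

const YES_NO_OPENERS = /^(are|is|do|does|did|can|could|will|would|have|has|should)\b/i;
const ENVIRONMENT_WORDS = ["hot", "warm", "cold", "chilly", "dark", "dim", "bright"];

function isYesNoQuestion(text: string): boolean {
  return YES_NO_OPENERS.test(text.trim());
}

function hasEnvironmentKeyword(text: string): boolean {
  const tokens = text.toLowerCase().split(/[^a-z]+/);
  return ENVIRONMENT_WORDS.some((w) => tokens.indexOf(w) !== -1);
}

// Structural decisions only: no model call happens here.
function planPrediction(question: string): PredictionPlan {
  const yesNo = isYesNoQuestion(question);
  return {
    kind: yesNo ? "yes_no" : "open",
    fixedButtons: yesNo ? ["Yes", "No"] : [],
    llmSlots: yesNo ? 1 : 3,
    environmentKeyword: hasEnvironmentKeyword(question),
  };
}

// Fixed buttons come first, then the model's text, capped at the planned slot count.
function assembleButtons(plan: PredictionPlan, generated: string[]): string[] {
  const cleaned = generated.map((s) => s.trim()).filter((s) => s.length > 0);
  return plan.fixedButtons.concat(cleaned.slice(0, plan.llmSlots));
}
```

For a Yes/No question the model would be asked for a single reply, which keeps each generation task small for a 1B-parameter model.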

🐛 Bug Fixes

Audio / TTS

  • Fixed: Every button press was speaking twice.
    GazeButton.triggerAction was dispatching a dwell-click event AND calling
    executeAction, which dispatched a second dwell-click. SpeakHandler
    caught both, producing two simultaneous TTS outputs. Removed the redundant
    dispatch from GazeButton.
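
The double-speak bug can be reproduced in miniature with a tiny event bus. This is a hypothetical reconstruction: only the names `triggerAction`, `executeAction`, and the `dwell-click` event come from the notes, while `TinyBus` and the function bodies are assumptions made for illustration.

```typescript
// Hypothetical reconstruction; TinyBus stands in for the real DOM event dispatch.
type Listener = (label: string) => void;

class TinyBus {
  private listeners: Listener[] = [];

  on(fn: Listener): void {
    this.listeners.push(fn);
  }

  // Models dispatching one "dwell-click" event carrying the button label.
  emit(label: string): void {
    this.listeners.forEach((fn) => fn(label));
  }
}

// executeAction is the single place that should emit "dwell-click".
function executeAction(bus: TinyBus, label: string): void {
  bus.emit(label);
}

// Before the fix: the trigger emitted its own event and then called
// executeAction, so every listener ran twice per press.
function triggerActionBuggy(bus: TinyBus, label: string): void {
  bus.emit(label);
  executeAction(bus, label);
}

// After the fix: the redundant emit is gone, so each press yields one event.
function triggerActionFixed(bus: TinyBus, label: string): void {
  executeAction(bus, label);
}
```

With a listener that records each label, one call to `triggerActionBuggy` records two entries and one call to `triggerActionFixed` records one.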

Doctor STT Pipeline

  • Fixed: Doctor's speech silently dropped on the patient side.
    useDataChannel() with no topic only receives messages on the default empty
    topic. The Python agent publishes to "doctor_transcript", so all messages
    were discarded. Replaced with explicit useDataChannel("doctor_transcript")
    and useDataChannel("patient_text") hooks.

  • Fixed: Agent missed Doctor's audio on late join.
    The track_subscribed handler only catches tracks published after the agent
    joins. Added a post-connect loop over ctx.room.remote_participants to pick
    up any Doctor audio track already in the room.

AI Predictions

  • Fixed: STT noise labels (e.g. [typing]) triggering predictions.
    Added bracket filtering in LiveKitWrapper.tsx — any transcript containing
    [, ], (, or ) is discarded before reaching the LLM.
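
A filter of this kind can be as small as one regular expression. The sketch below is an illustration rather than the code in LiveKitWrapper.tsx; the function name is hypothetical, and dropping empty transcripts is an added assumption.

```typescript
// Hypothetical helper: returns true when a transcript may go on to prediction.
const BRACKET_CHARS = /[\[\]()]/;

function isSpeechTranscript(text: string): boolean {
  const trimmed = text.trim();
  // Assumption: empty transcripts are dropped as well.
  if (trimmed.length === 0) {
    return false;
  }
  // Any bracket or parenthesis marks the transcript as containing a noise label.
  return !BRACKET_CHARS.test(trimmed);
}
```

Note the trade-off implied by the rule as stated: a transcript that mixes real speech with a label, such as "(coughs) are you okay", is discarded whole rather than cleaned.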

  • Fixed: Generic fallback buttons shown while the LLM is loading.
    The generic ["Yes", "No", "I don't know"] set was replaced with
    question-type-aware fallbacks: Yes/No → ["Yes", "No", "I'm not sure"],
    open questions → ["I'm okay", "Not feeling well", "I need help"].
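
The fallback selection can be sketched as a lookup keyed by question type. The button text is taken from the notes; the `QuestionKind` type, the function name, and the rule that model output is used only when it contains exactly three buttons are assumptions.

```typescript
// Hypothetical sketch: type and function names are assumptions.
type QuestionKind = "yes_no" | "open";

// Fallback sets as listed in the release notes.
const FALLBACKS: Record<QuestionKind, string[]> = {
  yes_no: ["Yes", "No", "I'm not sure"],
  open: ["I'm okay", "Not feeling well", "I need help"],
};

// Use model output only when the model is loaded and returned three buttons
// (the three-button check is an assumption); otherwise use the typed fallback.
function buttonsFor(kind: QuestionKind, modelReady: boolean, modelButtons: string[]): string[] {
  if (modelReady && modelButtons.length === 3) {
    return modelButtons;
  }
  // Copy the array so callers cannot mutate the shared fallback table.
  return FALLBACKS[kind].slice();
}
```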

Session Lifecycle

  • Fixed: Prediction buttons persisting after call ends.
    setSessionState now resets predictions, isContextResponse, and
    isPredicting on disconnect, returning the patient interface to a clean state.
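
The reset can be pictured as a pure state transition. The three reset fields are named in the notes; the `PatientUiState` shape, the `connected` flag, and the function name are assumptions, and the real `setSessionState` may be structured differently.

```typescript
// Hypothetical state shape; only the three reset fields come from the notes.
interface PatientUiState {
  connected: boolean;
  predictions: string[];
  isContextResponse: boolean;
  isPredicting: boolean;
}

// Return a new state with every prediction-related field cleared.
function resetOnDisconnect(state: PatientUiState): PatientUiState {
  return {
    ...state,
    connected: false,
    predictions: [],
    isContextResponse: false,
    isPredicting: false,
  };
}
```

Returning a fresh object rather than mutating the old one fits the usual React state-update pattern, though whether Iris does this is not stated.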

🏗️ Architecture Notes

  • MQTT hardware broker: Mosquitto (local)
  • Hardware client: Seeed Wio Terminal (Arduino/C++)
  • STT: Deepgram Nova-2 via LiveKit Agents (Python)
  • Local LLM: Llama-3.2-1B-Instruct (WebGPU, in-browser via MLC WebLLM)
  • TTS: ONNX Kokoro (Web Worker, in-browser)