Releases: shawnsony07/iris
Iris v3.1.0 — Conversational Hardware Control & Telemedicine Reliability
This release introduces conversational IoT environment control for the first
time in Iris, alongside reliability fixes across audio, the doctor-to-patient
speech pipeline, and AI response quality.
✨ What's New
🌡️ Conversational Environment Control (First Release)
Iris can now control the patient's physical environment — fan and lights — both
manually and through natural conversation with the doctor.
- Manual control: A dedicated Environment panel lets the patient toggle the
  fan and lights directly with a single tap.
- Conversational control: When the doctor asks "Are you feeling hot?" and
  the patient selects "Yes", Iris automatically publishes an MQTT command to
  the patient's Wio Terminal, turning the fan on. No extra steps required.
- Supported actions via conversation:
  - Doctor says "hot" or "warm" → Fan ON
  - Doctor says "cold" or "chilly" → Fan OFF
  - Doctor says "dark" or "dim" → Light ON
  - Doctor says "bright" → Light OFF
- Hardware: Seeed Wio Terminal running a custom MQTT firmware, subscribing
  to commands from the Iris server over a local Mosquitto broker.
- Architecture: Hardware control is fully deterministic. JavaScript keyword
  matching handles all trigger decisions; the local LLM is never involved in
  a time-critical hardware action.
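The deterministic trigger path described above could be sketched as a plain lookup with no model call involved. This is a minimal illustration, not Iris's actual code: `EnvDevice`, `EnvCommand`, `matchEnvironmentCommand`, and the `iris/env/...` topic strings are assumptions made for the example. Only the keyword-to-action table comes from the list above.

```typescript
// Hypothetical types and topic names; only the keyword -> action table
// mirrors the release notes.
type EnvDevice = "fan" | "light";
type EnvState = "ON" | "OFF";
type EnvCommand = { device: EnvDevice; state: EnvState; topic: string };

const KEYWORD_RULES: Array<{ keywords: string[]; device: EnvDevice; state: EnvState }> = [
  { keywords: ["hot", "warm"], device: "fan", state: "ON" },
  { keywords: ["cold", "chilly"], device: "fan", state: "OFF" },
  { keywords: ["dark", "dim"], device: "light", state: "ON" },
  { keywords: ["bright"], device: "light", state: "OFF" },
];

// Returns the command implied by the doctor's sentence, or null when no
// environmental keyword is present. Matching is done on whole words so that
// "hotel" does not count as "hot".
function matchEnvironmentCommand(transcript: string): EnvCommand | null {
  const words: string[] = transcript.toLowerCase().match(/[a-z']+/g) || [];
  for (const rule of KEYWORD_RULES) {
    if (rule.keywords.some((keyword) => words.indexOf(keyword) !== -1)) {
      return { device: rule.device, state: rule.state, topic: `iris/env/${rule.device}` };
    }
  }
  return null;
}
```

In a setup like the one described, the returned command would presumably be held until the patient confirms with "Yes" and then handed to an MQTT client for publishing; that step is left out here.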
🤝 Doctor-to-Patient Live Speech Pipeline
The doctor's speech now reaches the patient portal in real time via the LiveKit
data channel, triggering contextual AAC response predictions automatically.
- Python STT agent (Deepgram via LiveKit) transcribes the doctor's microphone
  and publishes to the `doctor_transcript` topic.
- Patient portal receives the transcript, displays it, and generates 3 response
  buttons tailored to the doctor's question within 1.5 seconds.
🧠 Hybrid AI Prediction Engine
Prediction button generation now uses a JS-first hybrid approach designed
for the constraints of a 1B in-browser model:
- JavaScript handles all structural decisions: is it a Yes/No question?
  Does it contain an environmental keyword?
- LLM (Llama-3.2-1B, WebGPU, runs fully in-browser) handles only creative
  language generation, one focused task at a time.
- Results:
  - "How are you?" → `["I'm okay", "Not feeling well", "I'm in pain"]`
  - "Are you feeling hot?" → `["Yes", "No", "Please turn on the fan"]`
  - "Are you in pain?" → `["Yes", "No", "A little bit"]`
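One way to picture the JS-first split is a small planner that makes the structural decision synchronously and tells the caller how many buttons the model still has to fill. This is a sketch under assumptions: `isYesNoQuestion`, `planPrediction`, `PredictionPlan`, and the opener word list are hypothetical, and the rule "Yes/No questions get two fixed buttons plus one generated one" is inferred from the example results rather than confirmed by the source.

```typescript
// Hypothetical opener list; a real classifier may use different rules.
const YES_NO_OPENERS = ["are", "is", "do", "does", "did", "can", "could", "will", "would", "have", "has"];

// Structural check done in plain JavaScript: does the question start with
// an auxiliary verb that usually signals a Yes/No question?
function isYesNoQuestion(text: string): boolean {
  const firstWord = text.trim().toLowerCase().split(/\s+/)[0];
  return YES_NO_OPENERS.indexOf(firstWord) !== -1;
}

// `fixed` holds buttons decided without the model; `llmSlots` is how many
// short first-person replies the model is asked to generate.
type PredictionPlan = { fixed: string[]; llmSlots: number };

function planPrediction(question: string): PredictionPlan {
  if (isYesNoQuestion(question)) {
    return { fixed: ["Yes", "No"], llmSlots: 1 };
  }
  return { fixed: [], llmSlots: 3 };
}
```

The model call itself is omitted; the point of the design is that the 1B model only ever receives one narrow generation task, while anything that can be decided by rules never reaches it.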
🐛 Bug Fixes
Audio / TTS
- Fixed: Every button press was speaking twice.
  `GazeButton.triggerAction` was dispatching a `dwell-click` event AND calling
  `executeAction`, which dispatched a second `dwell-click`. `SpeakHandler`
  caught both, producing two simultaneous TTS outputs. Removed the redundant
  dispatch from `GazeButton`.
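The shape of this fix can be shown with a simplified model. The sketch below is not the real `GazeButton` code (which uses DOM events): `DwellBus` is a hypothetical stand-in for the event target, and the two functions only borrow the names from the note above to show a single dispatch path.

```typescript
type DwellListener = (label: string) => void;

// Minimal stand-in for the event target that carries "dwell-click".
class DwellBus {
  private listeners: DwellListener[] = [];

  on(listener: DwellListener): void {
    this.listeners.push(listener);
  }

  emit(label: string): void {
    this.listeners.forEach((listener) => listener(label));
  }
}

// executeAction is the single place that announces a dwell-click.
function executeAction(bus: DwellBus, label: string): void {
  bus.emit(label);
}

// Before the fix, triggerAction emitted on the bus itself and then called
// executeAction, so every listener ran twice. After the fix it only delegates.
function triggerAction(bus: DwellBus, label: string): void {
  executeAction(bus, label);
}
```

With a listener that plays the role of `SpeakHandler`, one `triggerAction` call now results in exactly one spoken phrase.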
Doctor STT Pipeline
- Fixed: Doctor's speech silently dropped on the patient side.
  `useDataChannel()` with no topic only receives messages on the default empty
  topic. The Python agent publishes to `"doctor_transcript"`, so all messages
  were discarded. Replaced with explicit `useDataChannel("doctor_transcript")`
  and `useDataChannel("patient_text")` hooks.
- Fixed: Agent missed Doctor's audio on late join.
  The `track_subscribed` handler only catches tracks published after the agent
  joins. Added a post-connect loop over `ctx.room.remote_participants` to pick
  up any Doctor audio track already in the room.
AI Predictions
- Fixed: STT noise labels (e.g. `[typing]`) triggering predictions.
  Added bracket filtering in `LiveKitWrapper.tsx`: any transcript containing
  `[`, `]`, `(`, or `)` is discarded before reaching the LLM.
- Fixed: Context-aware fallbacks when LLM is loading.
  Generic `["Yes", "No", "I don't know"]` replaced with question-type-aware
  fallbacks: Yes/No → `["Yes", "No", "I'm not sure"]`, open questions →
  `["I'm okay", "Not feeling well", "I need help"]`.
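Both fixes in this group are small pure functions, so they can be sketched without any LiveKit or WebLLM code. The function names `hasNoiseBrackets` and `fallbackPredictions` and the `QuestionKind` type are assumptions for illustration; the bracket characters and the fallback strings are taken from the notes above.

```typescript
// True when the transcript contains any of [, ], ( or ), which is how STT
// noise labels such as "[typing]" show up. Such transcripts are dropped.
function hasNoiseBrackets(transcript: string): boolean {
  return /[\[\]()]/.test(transcript);
}

type QuestionKind = "yes-no" | "open";

// Buttons shown while the model is still loading, chosen by question type.
function fallbackPredictions(kind: QuestionKind): string[] {
  if (kind === "yes-no") {
    return ["Yes", "No", "I'm not sure"];
  }
  return ["I'm okay", "Not feeling well", "I need help"];
}
```

Note that a filter this strict also discards genuine speech that happens to be transcribed with parentheses, which seems an acceptable trade-off when the alternative is predicting replies to noise.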
Session Lifecycle
- Fixed: Prediction buttons persisting after call ends.
  `setSessionState` now resets `predictions`, `isContextResponse`, and
  `isPredicting` on disconnect, returning the patient interface to a clean state.
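The reset can be pictured as a pure state transition. The sketch below assumes a shape for the store that the source does not spell out: `PatientUiState`, the `SessionStatus` values, and the field types are hypothetical, and the function is written as a standalone reducer rather than as the actual store action.

```typescript
// Assumed status values and field types; only the three reset fields are
// named in the release notes.
type SessionStatus = "idle" | "connected" | "disconnected";

interface PatientUiState {
  sessionState: SessionStatus;
  predictions: string[];
  isContextResponse: boolean;
  isPredicting: boolean;
}

// Returns the next state. On disconnect, everything tied to the previous
// call's predictions is cleared along with the status change.
function setSessionState(state: PatientUiState, next: SessionStatus): PatientUiState {
  if (next === "disconnected") {
    return {
      ...state,
      sessionState: next,
      predictions: [],
      isContextResponse: false,
      isPredicting: false,
    };
  }
  return { ...state, sessionState: next };
}
```

Doing the reset inside the same transition that records the disconnect means no component has to remember to clear the buttons separately.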
🏗️ Architecture Notes
- MQTT hardware broker: Mosquitto (local)
- Hardware client: Seeed Wio Terminal (Arduino/C++)
- STT: Deepgram Nova-2 via LiveKit Agents (Python)
- Local LLM: Llama-3.2-1B-Instruct (WebGPU, in-browser via MLC WebLLM)
- TTS: ONNX Kokoro (Web Worker, in-browser)
Iris v3.0: WebRTC Telemedicine, Dual-Topology STT, & Predictive Local AI
🚀 New Features & Architecture
- WebRTC Telemedicine Portals: Added full peer-to-peer telemedicine integration via LiveKit. The `/doctor` portal now allows caregivers/doctors to securely join a room with patients, providing real-time video, audio, and synchronous text feeds.
- Dual-Topology Speech-To-Text (STT):
  - Local STT (Patient-side): Runs entirely in the browser using a dedicated Web Worker running Xenova's `whisper-tiny.en` for private, ambient context listening.
  - Cloud STT (Doctor-side): Added a highly robust Python LiveKit agent (`worker/agent.py`) utilizing Deepgram STT and Silero VAD for frame-perfect transcription accuracy over remote connections.
- Predictive Conversational Local AI: Integrated `@mlc-ai/web-llm` running a quantized `Llama-3.2-1B-Instruct` model completely via WebGPU hardware acceleration. It intercepts the doctor's speech over the data channel to intelligently predict and generate three logical, first-person responses for the patient in under a second.
- Automated Bootstrapping: Added `start.bat` and `end.bat` scripts for seamless 1-click execution of the Next.js frontend, the Python LiveKit background agent, and local port conflict management.
🐛 Bug Fixes & Improvements
- Instant DataChannel Syncing: Fixed LiveKit token permissions (`canPublishData`) to allow flawless, bi-directional text syncing between the patient's generated speech and the doctor's live captions.
- Old Conversation Wiping: Updated the Zustand global store to cleanly wipe old text captions whenever a call is disconnected, preventing old context from bleeding into new calls.
- LLM Stability & Hallucination Fixes: Fixed severe AI rebellions ("I am not an AAC device") by stripping out the complex persona prompt. Additionally implemented custom regex filters to strip out unwanted parenthetical meta-commentary, such as `(This means yes)`, that the 1B model tries to append.
- STT Silence Filtering: Added strict filters that drop hallucinated `[BLANK_AUDIO]` tags so they no longer trigger the predictive engine.
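The two text filters mentioned above might look roughly like this. The exact regular expressions used in Iris are not given in the notes, so the patterns and the names `stripMetaCommentary` and `isBlankAudio` are assumptions; the sketch only shows the general technique.

```typescript
// Removes parenthetical asides such as "(This means yes)" from a model
// reply, together with the whitespace in front of them.
function stripMetaCommentary(reply: string): string {
  return reply.replace(/\s*\([^)]*\)/g, "").trim();
}

// True when a transcript holds nothing except [BLANK_AUDIO] tags and
// whitespace, in which case it should not reach the predictive engine.
function isBlankAudio(transcript: string): boolean {
  return transcript.replace(/\[BLANK_AUDIO\]/g, "").trim().length === 0;
}
```

For example, `stripMetaCommentary("Yes, I am. (This means yes)")` yields `"Yes, I am."`, and `isBlankAudio("[BLANK_AUDIO]")` is `true`.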
Iris v2.1.0: Interactive Neobrutalist Landing Page & Platform Stability
What's New in Iris v2.1.0
🎨 Revamped Neobrutalist Landing Page
- Interactive Cursor Tracking: Built a custom `<InteractiveEye />` logo that tracks the user's cursor dynamically with fluid spring physics.
- Neobrutalist Design System: Applied bold, high-contrast layouts, thick borders, scrolling marquee text banners, and snappy physical button-press animations.
- Responsiveness: Fully optimized layout for desktop and tablet displays to ensure clear readability.
📞 Twilio Emergency Lifeline
- Serverless Backend Routing: Added a serverless endpoint (`/api/twilio`) integrating the Twilio API to handle critical notifications and emergency communication.
⚡ First-Load & Performance Optimizations
- Background Model Pre-loading: The WebLLM model download is now initiated in the background during calibration rather than blocking the user beforehand.
- Global Loading States: Leveraged global Zustand store integration to track and show download progress, providing a seamless loading transition.
🐛 Bug Fixes & Stability
- TTS/STT Feedback Loop Prevention: Fixed an echo loop issue by introducing automatic microphone buffer purging when the app speaks, combined with a safety silence-debounce window on the TTS service.
- TypeScript UI Fixes: Resolved interface build issues with `GazeButton` parameters.
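The echo-loop fix combines two ideas: purge whatever the microphone captured when the app starts speaking, and keep ignoring input for a short window after speech ends. A simplified model is sketched below. `EchoGate`, its method names, and the use of explicit timestamps are all assumptions for illustration, and the actual debounce duration is not stated in the notes.

```typescript
// Gate between the microphone and the STT consumer.
class EchoGate {
  private speaking = false;
  private mutedUntilMs = 0;
  private buffer: string[] = [];
  private readonly debounceMs: number;

  constructor(debounceMs: number) {
    this.debounceMs = debounceMs;
  }

  // TTS playback starts: block input and purge anything already captured.
  ttsStarted(): void {
    this.speaking = true;
    this.buffer = [];
  }

  // TTS playback ends: keep the gate closed for the debounce window.
  ttsEnded(nowMs: number): void {
    this.speaking = false;
    this.mutedUntilMs = nowMs + this.debounceMs;
  }

  // Returns true if the chunk was accepted into the buffer.
  push(chunk: string, nowMs: number): boolean {
    if (this.speaking || nowMs < this.mutedUntilMs) {
      return false;
    }
    this.buffer.push(chunk);
    return true;
  }

  // Copy of the chunks currently waiting for transcription.
  pending(): string[] {
    return this.buffer.slice();
  }
}
```

Passing the clock in as `nowMs` keeps the gate easy to test; in the browser the caller would typically supply `performance.now()` or `Date.now()`.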
v2.0.0 - Precision Gaze & Caregiver Lifeline
We're releasing v2.0.0, moving Iris from a local eye-tracking prototype to a highly accessible communication utility with a real-world emergency lifeline!
Here are the major highlights in this release:
📞 Caregiver Twilio SMS Integration
- Emergency Lifeline: Integrated the official Twilio SDK and built a secure Next.js API endpoint (`/api/twilio`).
- Instant Alerts: Triggering the Emergency block via eye tracker now instantly fires a real-world SMS alert directly to a caregiver's phone.
👁️ High-Visibility Gaze Cursor & Centered UI
- Bolder Cursor: Upgraded the gaze cursor to be 50% larger with a vibrant red crosshair (`#ff0000`) and a solid black background ring, making tracking a breeze.
- Layout Optimization: Swapped default grid block positions so Yes and No sit comfortably centered on the primary row for faster access.
🤖 Local AI Persona & Reliability Fixes
- Hallucination Bypass: Added a hardcode bypass for simple "Yes" and "No" triggers so they generate instantly without LLM hallucination.
- Natural Human Prompting: Reworked the underlying WebLLM system prompt to bypass medical safety triggers. The local 1B model now speaks with a natural, helpful first-person human voice instead of saying "I am a robot".
🧼 UI & Visual Cleanups
- Neobrutalist Styling: Upgraded the legacy top toolbar buttons (Debug and Recenter) to match the bold Neobrutalist borders and hard shadows.
- Mockup Removal: Removed old static mockup overlays (like the static CPU/FACE chips) to make sure only live, active indicators are displayed.