Releases · shawnsony07/iris

11 Jun 11:16

v3.1.0

f962666

Iris v3.1.0: Conversational Hardware Control & Telemedicine Reliability

Latest

This release introduces conversational IoT environment control for the first
time in Iris, alongside reliability fixes across audio, the doctor-to-patient
speech pipeline, and AI response quality.

✨ What's New

🌡️ Conversational Environment Control (First Release)

Iris can now control the patient's physical environment — fan and lights — both
manually and through natural conversation with the doctor.

- Manual control: A dedicated Environment panel lets the patient toggle the
  fan and lights directly with a single tap.
- Conversational control: When the doctor asks "Are you feeling hot?" and
  the patient selects "Yes", Iris automatically publishes an MQTT command to
  the patient's Wio Terminal, turning the fan on. No extra steps required.
- Supported actions via conversation:
  - Doctor says "hot" or "warm" → Fan ON
  - Doctor says "cold" or "chilly" → Fan OFF
  - Doctor says "dark" or "dim" → Light ON
  - Doctor says "bright" → Light OFF
- Hardware: Seeed Wio Terminal running custom MQTT firmware, subscribing
  to commands from the Iris server over a local Mosquitto broker.
- Architecture: Hardware control is fully deterministic. JavaScript keyword
  matching handles all trigger decisions, and the local LLM is never involved
  in a time-critical hardware action.
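
The deterministic keyword matching described above could look roughly like the following. This is a minimal sketch, not the code shipped in Iris: the keyword-to-action pairs come from the list above, while the type names, the function names, and the `iris/<device>/set` topic layout are assumptions made for illustration.

```typescript
// Hypothetical types; only the keyword/action pairs come from the release notes.
type Device = "fan" | "light";
type PowerState = "ON" | "OFF";

interface HardwareAction {
  device: Device;
  state: PowerState;
}

// Keyword table taken from the "Supported actions via conversation" list.
const KEYWORD_ACTIONS: ReadonlyArray<{ keywords: string[]; action: HardwareAction }> = [
  { keywords: ["hot", "warm"], action: { device: "fan", state: "ON" } },
  { keywords: ["cold", "chilly"], action: { device: "fan", state: "OFF" } },
  { keywords: ["dark", "dim"], action: { device: "light", state: "ON" } },
  { keywords: ["bright"], action: { device: "light", state: "OFF" } },
];

// Returns the first matching action, or null when no keyword is present.
// Matching is on whole words, so "photo" does not trigger "hot".
function matchHardwareAction(transcript: string): HardwareAction | null {
  const words: string[] = transcript.toLowerCase().match(/[a-z']+/g) ?? [];
  for (const { keywords, action } of KEYWORD_ACTIONS) {
    if (keywords.some((keyword) => words.includes(keyword))) {
      return action;
    }
  }
  return null;
}

// Assumed topic layout (iris/<device>/set); the real topic names are not
// given in the release notes.
function toMqttMessage(action: HardwareAction): { topic: string; payload: string } {
  return { topic: `iris/${action.device}/set`, payload: action.state };
}
```

Because the lookup is a plain table scan, the decision takes no model inference time, which fits the stated goal of keeping the LLM out of hardware actions.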

🤝 Doctor-to-Patient Live Speech Pipeline

The doctor's speech now reaches the patient portal in real time via the LiveKit
data channel, triggering contextual AAC response predictions automatically.

- Python STT agent (Deepgram via LiveKit) transcribes the doctor's microphone
  audio and publishes to the doctor_transcript topic.
- Patient portal receives the transcript, displays it, and generates 3 response
  buttons tailored to the doctor's question within 1.5 seconds.

🧠 Hybrid AI Prediction Engine

Prediction button generation now uses a JS-first hybrid approach designed
for the constraints of a 1B in-browser model:

- JavaScript handles all structural decisions: is it a Yes/No question?
  Does it contain an environmental keyword?
- LLM (Llama-3.2-1B, WebGPU, runs fully in-browser) handles only creative
  language generation, one focused task at a time.
- Results:
  - "How are you?" → ["I'm okay", "Not feeling well", "I'm in pain"]
  - "Are you feeling hot?" → ["Yes", "No", "Please turn on the fan"]
  - "Are you in pain?" → ["Yes", "No", "A little bit"]
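
One way to picture the JS-first split is a small planning step that runs before the model is called. The sketch below is an assumption about how such a step might be organized, not the engine's actual code: the opener list is a simple heuristic chosen for illustration, and `planPrediction` and `assembleButtons` are hypothetical names.

```typescript
type QuestionKind = "yes_no" | "open";

interface PredictionPlan {
  kind: QuestionKind;
  fixedButtons: string[];            // decided in plain code
  llmSlots: number;                  // buttons the model is asked to write
  environmentKeyword: string | null; // first environment keyword found, if any
}

// Assumed heuristic: English Yes/No questions usually open with an auxiliary verb.
const YES_NO_OPENERS: string[] = [
  "are", "is", "do", "does", "did", "can", "could", "will", "would", "have", "has", "should",
];

// Keywords taken from the environment-control list in these notes.
const ENVIRONMENT_KEYWORDS: string[] = ["hot", "warm", "cold", "chilly", "dark", "dim", "bright"];

// Structural decisions only: no model call happens here.
function planPrediction(question: string): PredictionPlan {
  const words: string[] = question.toLowerCase().match(/[a-z']+/g) ?? [];
  const first = words[0] ?? "";
  const kind: QuestionKind = YES_NO_OPENERS.includes(first) ? "yes_no" : "open";
  const environmentKeyword = words.find((word) => ENVIRONMENT_KEYWORDS.includes(word)) ?? null;
  const fixedButtons = kind === "yes_no" ? ["Yes", "No"] : [];
  return { kind, fixedButtons, llmSlots: 3 - fixedButtons.length, environmentKeyword };
}

// Merges the fixed buttons with whatever phrases the model produced.
function assembleButtons(plan: PredictionPlan, generated: string[]): string[] {
  return [...plan.fixedButtons, ...generated.slice(0, plan.llmSlots)];
}
```

With this shape, a Yes/No question leaves the model a single slot to fill, which keeps each request to the 1B model small and focused.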

🐛 Bug Fixes

Audio / TTS

Fixed: Every button press was speaking twice.
GazeButton.triggerAction was dispatching a dwell-click event AND calling
executeAction, which dispatched a second dwell-click. SpeakHandler
caught both, producing two simultaneous TTS outputs. Removed the redundant
dispatch from GazeButton.
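
The shape of the fix can be illustrated with a stripped-down model of the button. This is a sketch under assumptions, not the actual component: the shared `bus`, the `DwellClickEvent` class, and the listener standing in for SpeakHandler are all hypothetical. Only the GazeButton, triggerAction, executeAction, and dwell-click names come from the notes above.

```typescript
// Hypothetical event type carrying the phrase to speak.
class DwellClickEvent extends Event {
  readonly label: string;

  constructor(label: string) {
    super("dwell-click");
    this.label = label;
  }
}

// Stand-in for wherever the real app dispatches dwell-click events.
const bus = new EventTarget();

class GazeButton {
  private readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  // After the fix: triggerAction only delegates and dispatches nothing itself.
  triggerAction(): void {
    this.executeAction();
  }

  // The single place where dwell-click is dispatched.
  executeAction(): void {
    bus.dispatchEvent(new DwellClickEvent(this.label));
  }
}

// Stand-in for SpeakHandler: records each phrase it would speak.
const spoken: string[] = [];
bus.addEventListener("dwell-click", (event) => {
  if (event instanceof DwellClickEvent) {
    spoken.push(event.label);
  }
});

new GazeButton("Yes").triggerAction();
```

Keeping one dispatch site means the listener sees exactly one event per press, so the phrase is spoken once.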

Doctor STT Pipeline

Fixed: Doctor's speech silently dropped on the patient side.
useDataChannel() with no topic only receives messages on the default empty
topic. The Python agent publishes to "doctor_transcript", so all messages
were discarded. Replaced with explicit useDataChannel("doctor_transcript")
and useDataChannel("patient_text") hooks.
Fixed: Agent missed Doctor's audio on late join.
The track_subscribed handler only catches tracks published after the agent
joins. Added a post-connect loop over ctx.room.remote_participants to pick
up any Doctor audio track already in the room.
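
Once each topic has its own subscription, the receiving side reduces to a small routing step. The sketch below is framework-free and makes several assumptions: payloads are taken to be UTF-8 plain text, and the `DataMessage` and `CaptionState` shapes and the `applyMessage` name are hypothetical. The two topic names come from the fix described above.

```typescript
// Hypothetical message shape: a topic label plus raw bytes.
interface DataMessage {
  topic?: string;
  payload: Uint8Array;
}

// Hypothetical caption state kept by the patient portal.
interface CaptionState {
  doctorTranscript: string[];
  patientText: string[];
}

const decoder = new TextDecoder();

// Routes a message by topic. Messages with a missing or unknown topic are
// ignored rather than guessed at. Assumes UTF-8 plain-text payloads.
function applyMessage(state: CaptionState, message: DataMessage): CaptionState {
  const text = decoder.decode(message.payload).trim();
  if (text.length === 0) {
    return state;
  }
  switch (message.topic) {
    case "doctor_transcript":
      return { ...state, doctorTranscript: [...state.doctorTranscript, text] };
    case "patient_text":
      return { ...state, patientText: [...state.patientText, text] };
    default:
      return state;
  }
}
```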

AI Predictions

Fixed: STT noise labels (e.g. [typing]) triggering predictions.
Added bracket filtering in LiveKitWrapper.tsx — any transcript containing
[, ], (, or ) is discarded before reaching the LLM.
Fixed: Fallback responses ignored the question type while the LLM was loading.
Generic ["Yes", "No", "I don't know"] replaced with question-type-aware
fallbacks: Yes/No → ["Yes", "No", "I'm not sure"], open questions →
["I'm okay", "Not feeling well", "I need help"].
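
Both fixes are small enough to sketch as pure functions. The bracket rule and the fallback phrases below are taken from the notes above. The function names and the way the question type is passed in are assumptions for illustration, not the code in LiveKitWrapper.tsx.

```typescript
type QuestionKind = "yes_no" | "open";

// True when the transcript contains [, ], (, or ), which marks it as an STT
// noise label such as "[typing]" rather than speech.
function isNoiseTranscript(transcript: string): boolean {
  return /[\[\]()]/.test(transcript);
}

// Fallback buttons shown while the model is still loading.
function fallbackButtons(kind: QuestionKind): string[] {
  return kind === "yes_no"
    ? ["Yes", "No", "I'm not sure"]
    : ["I'm okay", "Not feeling well", "I need help"];
}

// Returns null for noise so the caller can skip prediction entirely.
function buttonsWhileLoading(transcript: string, kind: QuestionKind): string[] | null {
  return isNoiseTranscript(transcript) ? null : fallbackButtons(kind);
}
```

Note that the bracket rule discards the whole transcript, so a sentence that merely contains a parenthesis is dropped too. That is the behavior the notes describe.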

Session Lifecycle

Fixed: Prediction buttons persisting after call ends.
setSessionState now resets predictions, isContextResponse, and
isPredicting on disconnect, returning the patient interface to a clean state.
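
The reset can be pictured as a pure state transition. This is a sketch assuming a simplified session shape. The field names predictions, isContextResponse, and isPredicting come from the note above, while the `SessionState` values and the standalone function form are hypothetical and do not reproduce the real store.

```typescript
// Hypothetical set of session states.
type SessionState = "idle" | "connected" | "disconnected";

interface PatientSession {
  sessionState: SessionState;
  predictions: string[];
  isContextResponse: boolean;
  isPredicting: boolean;
}

// On disconnect, clear everything tied to the finished call so no stale
// prediction buttons remain. Other transitions keep the rest of the state.
function setSessionState(session: PatientSession, next: SessionState): PatientSession {
  if (next === "disconnected") {
    return {
      sessionState: next,
      predictions: [],
      isContextResponse: false,
      isPredicting: false,
    };
  }
  return { ...session, sessionState: next };
}
```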

🏗️ Architecture Notes

MQTT hardware broker: Mosquitto (local)
Hardware client: Seeed Wio Terminal (Arduino/C++)
STT: Deepgram Nova-2 via LiveKit Agents (Python)
Local LLM: Llama-3.2-1B-Instruct (WebGPU, in-browser via MLC WebLLM)
TTS: ONNX Kokoro (Web Worker, in-browser)


10 Jun 13:46

shawnsony07

v3.0.0

53489bc

Iris v3.0: WebRTC Telemedicine, Dual-Topology STT, & Predictive Local AI

🚀 New Features & Architecture

WebRTC Telemedicine Portals: Added WebRTC telemedicine integration via LiveKit. The /doctor portal lets caregivers and doctors securely join a room with patients, with real-time video, audio, and synchronous text feeds.
Dual-Topology Speech-To-Text (STT):
- Local STT (Patient-side): Runs entirely in the browser, in a dedicated Web Worker using Xenova's whisper-tiny.en, for private, ambient context listening.
- Cloud STT (Doctor-side): Added a Python LiveKit agent (worker/agent.py) that uses Deepgram STT and Silero VAD to transcribe the doctor's audio over remote connections.
Predictive Conversational Local AI: Integrated @mlc-ai/web-llm running a quantized Llama-3.2-1B-Instruct model entirely in the browser with WebGPU acceleration. It reads the doctor's speech from the data channel and generates three first-person responses for the patient in under a second.
Automated Bootstrapping: Added start.bat and end.bat scripts for one-click startup and shutdown of the Next.js frontend and the Python LiveKit background agent, including local port conflict management.

🐛 Bug Fixes & Improvements

Instant DataChannel Syncing: Fixed LiveKit token permissions (canPublishData) so text syncs in both directions between the patient's generated speech and the doctor's live captions.
Old Conversation Wiping: Updated the Zustand global store to wipe old text captions whenever a call disconnects, preventing old context from bleeding into new calls.
LLM Stability & Hallucination Fixes: Fixed cases where the model broke character and refused its role ("I am not an AAC device") by removing the complex persona prompt. Also added custom regex filters that strip parenthetical meta-commentary, such as "(This means yes)", that the 1B model tends to append.
STT Silence Filtering: Added strict filters that drop hallucinated [BLANK_AUDIO] tags so they no longer trigger the predictive engine.
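
The two text filters mentioned above could be as small as the sketch below. The exact patterns used in Iris are not given in these notes, so the regular expressions and function names here are assumptions chosen to match the described behavior.

```typescript
// Removes parenthetical asides such as "(This means yes)" from a model reply,
// then tidies any doubled spaces left behind.
function stripMetaCommentary(reply: string): string {
  return reply
    .replace(/\s*\([^)]*\)/g, "")
    .replace(/\s{2,}/g, " ")
    .trim();
}

// True when a transcript is empty or holds nothing except [BLANK_AUDIO] tags,
// in which case it should not reach the prediction engine.
function isBlankAudio(transcript: string): boolean {
  return transcript.replace(/\[BLANK_AUDIO\]/g, "").trim().length === 0;
}
```

For example, `stripMetaCommentary("Yes (This means yes)")` returns `"Yes"`, and `isBlankAudio("[BLANK_AUDIO]")` returns `true`.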


10 Jun 06:34

shawnsony07

v2.1.0

1d75a39

Iris v2.1.0: Interactive Neobrutalist Landing Page & Platform Stability

What's New in Iris v2.1.0

🎨 Revamped Neobrutalist Landing Page

Interactive Cursor Tracking: Built a custom <InteractiveEye /> logo that tracks the user's cursor dynamically with fluid spring physics.
Neobrutalist Design System: Applied bold, high-contrast layouts, thick borders, scrolling marquee text banners, and snappy physical button-press animations.
Responsiveness: Fully optimized layout for desktop and tablet displays to ensure clear readability.

📞 Twilio Emergency Lifeline

Serverless Backend Routing: Added a serverless endpoint (/api/twilio) integrating the Twilio API to handle critical notifications and emergency communication.

⚡ First-Load & Performance Optimizations

Background Model Pre-loading: The WebLLM model download is now initiated in the background during calibration rather than blocking the user beforehand.
Global Loading States: Leveraged global Zustand store integration to track and show download progress, providing a seamless loading transition.

🐛 Bug Fixes & Stability

TTS/STT Feedback Loop Prevention: Fixed an echo loop issue by introducing automatic microphone buffer purging when the app speaks, combined with a safety silence-debounce window on the TTS service.
TypeScript UI Fixes: Resolved interface build issues with GazeButton parameters.
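
The echo-loop fix combines two ideas: discard buffered microphone audio when speech output starts, and keep ignoring the microphone for a short window after it ends. A minimal sketch of that gating logic follows. The `EchoGate` class, its method names, and the 500 ms default window are assumptions, since the notes do not state the actual debounce length. Timestamps are passed in explicitly so the logic can be exercised without real timers.

```typescript
class EchoGate {
  private speaking = false;
  private mutedUntil = 0;
  private buffer: Float32Array[] = [];
  private readonly debounceMs: number;

  // debounceMs is an assumed default, not a value from the release notes.
  constructor(debounceMs: number = 500) {
    this.debounceMs = debounceMs;
  }

  // Speech output begins: purge anything already captured.
  onTtsStart(): void {
    this.speaking = true;
    this.buffer = [];
  }

  // Speech output ends: stay muted for the debounce window.
  onTtsEnd(nowMs: number): void {
    this.speaking = false;
    this.mutedUntil = nowMs + this.debounceMs;
  }

  // Returns true when the chunk was accepted for transcription.
  pushMicChunk(chunk: Float32Array, nowMs: number): boolean {
    if (this.speaking || nowMs < this.mutedUntil) {
      return false;
    }
    this.buffer.push(chunk);
    return true;
  }

  bufferedChunks(): number {
    return this.buffer.length;
  }
}
```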


02 Jun 08:30

shawnsony07

v2.0.0

d6fe6be

v2.0.0 - Precision Gaze & Caregiver Lifeline

v2.0.0 moves Iris from a local eye-tracking prototype to an accessible communication utility with a real-world emergency lifeline.

Here are the major highlights in this release:

📞 Caregiver Twilio SMS Integration

Emergency Lifeline: Integrated the official Twilio SDK and built a secure Next.js API endpoint (/api/twilio).
Instant Alerts: Triggering the Emergency block via eye tracker now instantly fires a real-world SMS alert directly to a caregiver's phone.

👁️ High-Visibility Gaze Cursor & Centered UI

Bolder Cursor: Upgraded the gaze cursor to be 50% larger with a vibrant red crosshair (#ff0000) and a solid black background ring, making tracking a breeze.
Layout Optimization: Swapped default grid block positions so Yes and No sit comfortably centered on the primary row for faster access.

🤖 Local AI Persona & Reliability Fixes

Hallucination Bypass: Added a hard-coded bypass for simple "Yes" and "No" triggers so they are produced instantly, without LLM hallucination.
Natural Human Prompting: Reworked the underlying WebLLM system prompt to bypass medical safety triggers. The local 1B model now speaks with a natural, helpful first-person human voice instead of saying "I am a robot".

🧼 UI & Visual Cleanups

Neobrutalist Styling: Upgraded the legacy top toolbar buttons (Debug and Recenter) to match the bold Neobrutalist borders and hard shadows.
Mockup Removal: Removed old static mockup overlays (like the static CPU/FACE chips) to make sure only live, active indicators are displayed.
