Skip to content

Iris v3.0: WebRTC Telemedicine, Dual-Topology STT, & Predictive Local AI

Choose a tag to compare

@shawnsony07 shawnsony07 released this 10 Jun 13:46
· 23 commits to main since this release

🚀 New Features & Architecture

  • WebRTC Telemedicine Portals: Added full peer-to-peer telemedicine integration via LiveKit. The /doctor portal now allows caregivers/doctors to securely join a room with patients, providing real-time video, audio, and synchronous text feeds.
  • Dual-Topology Speech-To-Text (STT):
    • Local STT (Patient-side): Runs entirely in the browser using a dedicated Web Worker running Xenova's whisper-tiny.en for private, ambient context listening.
    • Cloud STT (Doctor-side): Added a highly robust Python LiveKit agent (worker/agent.py) utilizing Deepgram STT and Silero VAD for frame-perfect transcription accuracy over remote connections.
  • Predictive Conversational Local AI: Integrated @mlc-ai/web-llm running a quantized Llama-3.2-1B-Instruct model completely via WebGPU hardware acceleration. It intercepts the doctor's speech over the data channel to intelligently predict and generate three logical, first-person responses for the patient in under a second.
  • Automated Bootstrapping: Added start.bat and end.bat scripts for seamless 1-click execution of the Next.js frontend, the Python LiveKit background agent, and local port conflict management.

🐛 Bug Fixes & Improvements

  • Instant DataChannel Syncing: Fixed LiveKit token permissions (canPublishData) to allow flawless, bi-directional text syncing between the patient's generated speech and the doctor's live captions.
  • Old Conversation Wiping: Updated the Zustand global store to cleanly wipe old text captions whenever a call is disconnected, preventing old context from bleeding into new calls.
  • LLM Stability & Hallucination Fixes: Fixed severe AI rebellions ("I am not an AAC device") by stripping out the complex persona prompt. Additionally implemented custom regex filters to strip out any unwanted parenthetical meta-commentary (This means yes) the 1B model tries to append.
  • STT Silence Filtering: Added strict filters to automatically drop hallucinated [BLANK_AUDIO] tags from triggering the predictive engine.