v2.0.29 — Voice commands + per-device voice training
Feature: Voice commands + per-device voice training
Hands-free Wemo control from two surfaces — the Windows / macOS / Linux desktop app and the Docker / Synology web UI (accessible from any phone / tablet / laptop browser on your LAN). One shared library, two thin wrappers, zero new native dependencies.
Uses the browser-native webkitSpeechRecognition / SpeechRecognition API — already shipped in Chromium, Edge, and Safari. Free, no API key, no licence.
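As a rough illustration of what using the browser-native API involves, the sketch below picks whichever constructor the browser exposes and configures it. The helper name `createRecognizer` and the default language tag are assumptions made for this example, not names taken from the shared library.

```javascript
// Hypothetical helper: returns a configured recogniser, or null when the
// browser exposes neither constructor (for example Firefox by default).
// `globalObj` is normally `window`; passing it in keeps the helper testable.
function createRecognizer(globalObj, lang = "en-US") {
  const Ctor = globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Ctor) return null;

  const recognizer = new Ctor();
  recognizer.continuous = true;     // keep listening across utterances
  recognizer.interimResults = true; // deliver partial transcripts as well as finals
  recognizer.lang = lang;           // BCP 47 language tag (assumed default)
  return recognizer;
}
```

In a browser this would be called as `createRecognizer(window)`, followed by `start()` on the returned object when it is not null.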
Command grammar (wake-word "dibby" required by default)
| Spoken | Action |
|---|---|
| dibby turn on bonus room pot light | Set device on |
| dibby turn off deck master | Set device off |
| dibby toggle dining room pot lights | Flip current state |
| dibby bonus room on / dibby deck master off | Terse form |
| dibby turn everything on / dibby all off | Bulk command across all known devices |
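The grammar in the table can be approximated with a small parser. The following is a minimal sketch and not the shipped implementation: the function name `parseCommand`, the `requireWakeWord` option, and the shape of the returned object are all assumptions made for illustration.

```javascript
const WAKE_WORD = "dibby";

// Hypothetical parser: turns a transcript into { action, scope, target } or null.
function parseCommand(transcript, { requireWakeWord = true } = {}) {
  let words = transcript.toLowerCase().trim().split(/\s+/).filter(Boolean);

  // Soft wake word: the utterance must start with "dibby" unless disabled.
  if (words[0] === WAKE_WORD) {
    words = words.slice(1);
  } else if (requireWakeWord) {
    return null;
  }
  if (words.length < 2) return null;

  let action = null;
  let rest = words;
  if (words[0] === "turn" && (words[1] === "on" || words[1] === "off")) {
    action = words[1];            // "turn on <device>" / "turn off <device>"
    rest = words.slice(2);
  } else if (words[0] === "toggle") {
    action = "toggle";            // "toggle <device>"
    rest = words.slice(1);
  } else {
    if (words[0] === "turn") rest = words.slice(1); // "turn everything on"
    const last = rest[rest.length - 1];
    if (last === "on" || last === "off") {
      action = last;              // terse form: "<device> on" / "<device> off"
      rest = rest.slice(0, -1);
    }
  }
  if (!action || rest.length === 0) return null;

  const target = rest.join(" ");
  const isBulk = target === "all" || target === "everything";
  return isBulk ? { action, scope: "all" } : { action, scope: "device", target };
}
```

The `target` string is still raw text at this point; resolving it to an actual device is a separate matching step.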
Device names are matched fuzzily against friendlyName: "deck mister" still hits "Deck Master" (normalised Levenshtein score ≤ 0.4, i.e. ≥ 60% similar). When no device scores within that threshold, the app surfaces a "didn't recognise that device" toast instead of firing the wrong Wemo.
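To make the threshold concrete, here is a minimal sketch of a normalised Levenshtein match. It assumes the score is the edit distance divided by the length of the longer string, which is one common way to arrive at a 0 to 1 score; the release notes do not say how the app normalises, and the function names are hypothetical.

```javascript
// Classic Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value of d[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of d[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Assumed normalisation: 0 means identical, 1 means nothing in common.
function matchScore(spoken, name) {
  const x = spoken.toLowerCase().trim();
  const y = name.toLowerCase().trim();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 0 : levenshtein(x, y) / longest;
}

// Returns the best device within the threshold, or null (the "toast" case).
function matchDevice(spoken, devices, maxScore = 0.4) {
  let best = null;
  for (const device of devices) {
    const score = matchScore(spoken, device.friendlyName);
    if (score <= maxScore && (best === null || score < best.score)) {
      best = { device, score };
    }
  }
  return best;
}
```

Under this normalisation, "deck mister" is one substitution away from "deck master" across 11 characters, so it lands comfortably inside the 0.4 limit.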
Per-device voice training (the accent-friendly part)
Fuzzy matching against friendlyName is great for typos but breaks down on accents, nicknames, and language mismatches. Each device now carries an optional voiceAliases: string[] field that competes with friendlyName on equal footing during matching, populated by recording phrases:
- Open the device's Info panel (desktop) or expand the device card (web UI).
- Click 🎤 add voice name.
- Say the phrase the way you actually say it — "deck light", "outside switch", "garage", whatever feels natural.
- The speech engine transcribes your recording and shows it back: "Heard: deck light — save?".
- On save the transcript is appended to that device's voiceAliases list. Multiple aliases per device are supported: Deck Master Switch can answer to "deck", "deck light", and "outside light" simultaneously.
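The save step above can be sketched as a small function that appends the transcript to the device record. The lower-casing, whitespace collapsing, and duplicate check are assumptions added for this example; the release notes only state that the transcript is appended to voiceAliases.

```javascript
// Assumed normalisation so "Deck  Light" and "deck light" are stored once.
function normalizePhrase(text) {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Hypothetical helper: mutates the device record in place and reports
// whether a new alias was actually added.
function addVoiceAlias(device, transcript) {
  const alias = normalizePhrase(transcript);
  if (alias === "") return false;
  if (!Array.isArray(device.voiceAliases)) device.voiceAliases = [];
  if (device.voiceAliases.includes(alias)) return false;
  device.voiceAliases.push(alias);
  return true;
}

// Example record; only friendlyName and voiceAliases come from the notes.
const deck = { friendlyName: "Deck Master Switch" };
addVoiceAlias(deck, "deck");
addVoiceAlias(deck, "Deck Light");
addVoiceAlias(deck, "outside light");
```

After these three calls `deck.voiceAliases` holds "deck", "deck light", and "outside light".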
Why this works for accents: the alias is stored as whatever the user's own STT engine actually returned when they spoke the phrase. If Chrome consistently transcribes a user's "deck light" as "tek light" because of accent, the stored alias becomes "tek light" — and that's exactly what comes back at command time too. The alias and the live command go through the same transcription pipeline, so they match cleanly even when the literal English doesn't.
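A short sketch of the "equal footing" idea: every device contributes its friendlyName plus all of its voiceAliases as candidate phrases, and the best-scoring phrase decides the device. The scoring function is passed in so any fuzzy scorer can be used; the exact-match scorer here is a stand-in for illustration, and all function names are hypothetical.

```javascript
// Every phrase a device can answer to, lower-cased for comparison.
function candidatePhrases(device) {
  const aliases = Array.isArray(device.voiceAliases) ? device.voiceAliases : [];
  return [device.friendlyName, ...aliases].map((p) => p.toLowerCase().trim());
}

// scoreFn(spoken, phrase) returns 0 for a perfect match and 1 for no match.
function resolveTarget(spoken, devices, scoreFn, maxScore = 0.4) {
  const heard = spoken.toLowerCase().trim();
  let best = null;
  for (const device of devices) {
    for (const phrase of candidatePhrases(device)) {
      const score = scoreFn(heard, phrase);
      if (score <= maxScore && (best === null || score < best.score)) {
        best = { device, phrase, score };
      }
    }
  }
  return best;
}

// Stand-in scorer for the example: exact match or nothing.
const exactOrMiss = (a, b) => (a === b ? 0 : 1);

const devices = [
  { friendlyName: "Deck Master Switch", voiceAliases: ["tek light"] },
  { friendlyName: "Garage" },
];
const hit = resolveTarget("tek light", devices, exactOrMiss);
```

Here `hit.device` is the Deck Master Switch record even though "tek light" resembles nothing in its friendlyName, because the stored alias and the live transcript came out of the same speech engine.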
Aliases survive every plugin / app upgrade because they live in the same dibby-wemo.json / devices.json file as the rest of the device record.
Privacy disclosure (shown once per browser)
Voice commands use your browser's built-in speech recognition. Chrome and Edge stream audio to Google/Microsoft to transcribe it; Safari uses on-device recognition. Dibby Wemo never records, stores, or transmits audio itself.
Dismissal is persisted in localStorage so the modal shows once per browser, not once per session.
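The once-per-browser behaviour can be sketched with three tiny helpers around a Web Storage object. The key name below is hypothetical; the release notes do not state which key the app uses.

```javascript
// Hypothetical key name, chosen for this example only.
const DISCLOSURE_KEY = "dibby.voicePrivacyDismissed";

// `storage` is normally window.localStorage; any object with the same
// getItem / setItem / removeItem methods works.
function shouldShowDisclosure(storage) {
  return storage.getItem(DISCLOSURE_KEY) !== "1";
}

function dismissDisclosure(storage) {
  storage.setItem(DISCLOSURE_KEY, "1");
}

// Clearing the key makes the modal appear again on the next activation.
function resetDisclosure(storage) {
  storage.removeItem(DISCLOSURE_KEY);
}
```

Because localStorage outlives the tab, a dismissal recorded this way persists across sessions, unlike sessionStorage.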
Firefox doesn't ship SpeechRecognition by default — the 🎤 button shows a disabled state with a tooltip recommending Chrome / Edge / Safari. No broken UI.
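The disabled-state fallback might look like the sketch below, which derives the mic button's state from feature detection. The function name and tooltip wording are assumptions for illustration.

```javascript
// Hypothetical helper: decides how the mic button should render.
// `globalObj` is normally `window`.
function micButtonState(globalObj) {
  const supported = Boolean(
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition
  );
  return supported
    ? { disabled: false, tooltip: "Start voice commands" }
    : { disabled: true, tooltip: "Voice commands need Chrome, Edge, or Safari" };
}
```

Rendering from this state, rather than hiding the button, keeps the toolbar layout identical across browsers.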
What's new in the UI
- Web UI (apps/desktop/resources/web/index.html):
  - 🎤 toggle button in the Devices toolbar next to ⟳ Scan
  - Live transcript bubble while listening (interim text in grey, finals in white with a ✓)
  - Per-card voice-alias chips with × delete + 🎤 add voice name link
  - Pulsing red glow on the toolbar mic button while the engine is active
- Electron desktop renderer:
  - VoiceCommandButton: Sidebar mic button with the same pulse animation
  - VoiceAliasManager: embedded in the Info tab of every device; lists chips, records new aliases, deletes existing ones
- Help doc — full Voice Commands section: enabling voice, the command grammar, training aliases, privacy story, how to reset the disclosure
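For the live transcript bubble, the interim and final text can be separated from a recognition result event roughly as follows. `event.results`, `event.resultIndex`, `isFinal`, and `transcript` are part of the Web Speech API; the function name `splitTranscript` is hypothetical.

```javascript
// Splits recognition results into finalised and still-changing text.
// `results` is event.results; `resultIndex` is event.resultIndex.
function splitTranscript(results, resultIndex = 0) {
  let finalText = "";
  let interimText = "";
  for (let i = resultIndex; i < results.length; i++) {
    const text = results[i][0].transcript; // top alternative for this result
    if (results[i].isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}
```

An `onresult` handler could call `splitTranscript(event.results, event.resultIndex)` and render `interimText` in grey and `finalText` in white.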
Backend additions
- docker/server.js: GET/POST/DELETE /api/devices/{host}/{port}/voice-aliases[/{index}], plus static handlers for /voice-commands.js and /voice-trainer.js with the correct Content-Type so the browser parses them as scripts (the fallback index.html route would otherwise mis-serve them as HTML).
- apps/desktop/src/main/ipc/devices.ipc.js: new IPC channels get-voice-aliases, add-voice-alias, remove-voice-alias. They mutate the device record in place via the existing DwmStore.saveDevices, so HomeKit bridge sync / scheduler / etc. keep seeing the same device identity.
- apps/desktop/src/preload/index.js: bridge exposes window.wemoAPI.{getVoiceAliases, addVoiceAlias, removeVoiceAlias}.
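The REST surface and the Content-Type fix can be illustrated with two pure functions. This is a sketch under assumptions, not the code in docker/server.js: it assumes GET and POST act on the collection while DELETE takes the trailing index, and the exact header values are chosen for the example.

```javascript
// Matches /api/devices/{host}/{port}/voice-aliases[/{index}]
const ALIAS_ROUTE = /^\/api\/devices\/([^/]+)\/(\d+)\/voice-aliases(?:\/(\d+))?$/;

// Hypothetical router helper: returns the operation to perform, or null.
function matchAliasRoute(method, pathname) {
  const m = ALIAS_ROUTE.exec(pathname);
  if (!m) return null;
  const host = m[1];
  const port = Number(m[2]);
  const hasIndex = m[3] !== undefined;

  // Assumed pairing of methods with the optional index segment.
  if (method === "GET" && !hasIndex) return { op: "list", host, port };
  if (method === "POST" && !hasIndex) return { op: "add", host, port };
  if (method === "DELETE" && hasIndex) {
    return { op: "remove", host, port, index: Number(m[3]) };
  }
  return null;
}

// The two script paths must be answered before the index.html fallback,
// otherwise the browser receives HTML where it expects JavaScript.
const SCRIPT_PATHS = new Set(["/voice-commands.js", "/voice-trainer.js"]);

function contentTypeFor(pathname) {
  return SCRIPT_PATHS.has(pathname)
    ? "text/javascript; charset=utf-8"
    : "text/html; charset=utf-8";
}
```

A request such as `DELETE /api/devices/192.168.1.20/49153/voice-aliases/2` would resolve to a remove operation on the third alias of that device; the host and port in this example are made up.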
Not in this release (deferred)
- Offline STT (whisper.cpp / vosk) — would add 100+ MB to every install. Re-evaluate when users specifically ask for offline mode.
- Voice authoring of DWM rules ("dibby schedule deck master on at sunset") — future release.
- Hardware-style always-on wake-word detection (picovoice / porcupine) — needs a paid licence or a 30 MB tflite model. Current soft wake-word ("dibby ..." prefix) is a reasonable compromise.
- Voice in the Homebridge plugin UI — separate iframe sandbox + different mic-permission model. Add later if requested.
Affected packages — unified version bump
- Desktop apps (Windows installer, macOS .dmg, Linux AppImage / deb / rpm): functional change (voice button in sidebar, alias manager in device info)
- Docker image ghcr.io/k0rb3nd4ll4s/dibby-wemo-manager:2.0.29: functional change (web UI gains voice toolbar + per-card training + REST endpoints)
- Synology .spk (apollolake / geminilake / denverton / broadwell / rtd1296): functional change (same as Docker)
- homebridge-dibby-wemo@2.0.29: version bump only, no functional change in this release
- node-red-contrib-dibby-wemo@2.0.29: version bump only
- Home Assistant (HACS) 2.0.29: version bump only
Upgrade
- Desktop: download the installer for your platform from this release's Assets after CI finishes attaching artifacts.
- Docker / Synology Container Manager: Stop → Build → Start (pulls :latest).
- Synology .spk: download the new .spk for your arch → Package Center → Manual Install (preserves data).
- Homebridge: npm install -g homebridge-dibby-wemo@2.0.29.
- HACS: ⋮ → Reload data → Dibby Wemo → ⋮ → Redownload → 2.0.29 → restart HA.