Voice Control
Voice control is the core feature of Ava, allowing you to control smart home devices by speaking.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ You speak │ -> │ Ava records │ -> │Home Assistant│ -> │ Ava plays │
│ wake word + │ │ sends audio │ │ speech │ │ voice │
│ command │ │ │ │ recognition │ │ reply │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Detailed Flow:
- Standby: Ava continuously listens for wake word (local processing, no internet)
- Wake Detection: When wake word detected, plays prompt sound, starts recording
- Audio Transmission: Recording sent to Home Assistant via ESPHome protocol
- Speech Recognition: Home Assistant's voice assistant performs speech-to-text
- Intent Processing: Home Assistant understands intent and executes action
- Speech Synthesis: Home Assistant generates voice response
- Playback: Ava receives and plays voice response
Ava supports two independent wake words. You can use one or both simultaneously.
Ava has two on-device wake word engines. Both run entirely locally — no audio leaves the device for wake detection.
| Dimension | Value |
|---|---|
| Architecture | TFLite binary classification |
| Model size | 50-80KB, uint8 quantized |
| Inference | 10ms frame, stride 3 |
| Frontend | microfeatures |
| Decision | 5-frame sliding window mean > threshold |
| Output | Scalar 0-1 probability |
| Wake-word swap | Full retraining required |
| CPU / memory | Minimal |
| False wake defense | Threshold only (zero-sum trade-off) |
| Interpretability | None |
| Best for | Low-end Android 5+ persistent background |
microWakeWord is the default engine. It uses TensorFlow Lite with tiny quantized models. Each wake word is a separate .tflite model file paired with a .json config. The detector runs a sliding window average over the last 5 frames and triggers when the average probability exceeds the cutoff.
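The sliding-window decision described above can be sketched in a few lines of Python. This is an illustrative model, not Ava's code: the class name `SlidingWindowDetector`, the sample probabilities, and the reset after a trigger are assumptions; the 5-frame window and the "mean above cutoff" rule come from the text.

```python
from collections import deque


class SlidingWindowDetector:
    """Hypothetical sketch: fire when the mean of the last N probabilities exceeds a cutoff."""

    def __init__(self, probability_cutoff: float, sliding_window_size: int = 5):
        self.cutoff = probability_cutoff
        self.window = deque(maxlen=sliding_window_size)

    def push(self, probability: float) -> bool:
        """Add one model output (0..1) and report whether the wake word fired."""
        self.window.append(probability)
        if len(self.window) < self.window.maxlen:
            return False  # not enough frames yet
        mean = sum(self.window) / len(self.window)
        if mean > self.cutoff:
            self.window.clear()  # assumption: reset so one utterance fires only once
            return True
        return False


detector = SlidingWindowDetector(probability_cutoff=0.6)
# A single high frame is not enough; several high frames in a row are.
outputs = [0.05, 0.05, 0.05, 0.05, 0.99, 0.9, 0.9, 0.9, 0.9]
fired = [detector.push(p) for p in outputs]
print(fired)
```

Averaging over several frames means one spurious high score cannot trigger a wake on its own, which is the only false-wake defense this engine has besides the cutoff itself.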
Built-in micro models (9):
| Model ID | Wake Word | Author |
|---|---|---|
| `hey_jarvis` | Hey Jarvis | Kevin Ahrendt |
| `alexa` | Alexa | Kevin Ahrendt |
| `hey_home_assistant` | Hey Home Assistant | Michael Hansen |
| `hey_mycroft` | Hey Mycroft | Kevin Ahrendt |
| `hey_luna` | Hey Luna | adamlonsdale |
| `hey_peppa_pig` | Hey Peppa Pig | Michael Hansen |
| `okay_computer` | Okay Computer | Michael Hansen |
| `okay_nabu` | OK Nabu | Kevin Ahrendt |
| `choo_choo_homie` | Choo Choo Homie | Michael Hansen |
Stop word: stop (Stop) by Kevin Ahrendt.
| Dimension | Value |
|---|---|
| Architecture | ONNX CTC phoneme decoding + edit distance |
| Model size | ~500KB |
| Inference | 80ms cycle, 1300ms window, 128×40 feature map |
| Frontend | Log-Mel + adaptive noise floor + spectral VAD |
| Decision | Voice gate → CTC confidence → edit distance ≤1 → 2-hit confirm → 2s cooldown |
| Output | Phoneme sequence with traceability |
| Wake-word swap | Manifest JSON hot-swap |
| CPU / memory | Significantly higher |
| False wake defense | Multi-layer independent gates |
| Interpretability | Phoneme-level debugging |
| Best for | Noise-sensitive, explainability-required deployments |
vsWakeWord uses ONNX Runtime with CTC (Connectionist Temporal Classification) decoding. Instead of a binary "is this the wake word?" classifier, it decodes the audio into a phoneme sequence and matches against target phonemes with edit distance ≤1. This makes it more robust to noise and accents, at the cost of higher CPU usage.
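To make the "decode to phonemes, then match with edit distance" idea concrete, here is a minimal Python sketch. It assumes the per-frame best phoneme IDs have already been taken from the model output, uses standard greedy CTC collapsing and Levenshtein distance, and leaves out the confidence and voice-gate checks. The function names and the frame sequence are made up for illustration; the blank ID and the target sequence are the ones in the `hey_jarvis` manifest.

```python
def ctc_greedy_decode(frame_ids, blank_id=1):
    """Collapse repeated IDs, then drop blanks (standard greedy CTC decoding)."""
    decoded = []
    previous = None
    for token in frame_ids:
        if token != previous and token != blank_id:
            decoded.append(token)
        previous = token
    return decoded


def edit_distance(a, b):
    """Levenshtein distance between two sequences, using a single rolling row."""
    row = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        prev_diag, row[0] = row[0], i
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            prev_diag, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev_diag + cost)
    return row[len(b)]


TARGET = [27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]  # "hey jarvis" phoneme IDs
# Hypothetical per-frame argmax IDs; the final phoneme (38) was not recognized.
frames = [1, 1, 27, 27, 1, 9, 15, 15, 2, 1, 24, 44, 44, 5, 3, 1, 36, 41, 15, 1, 1]

decoded = ctc_greedy_decode(frames)
distance = edit_distance(decoded, TARGET)
matched = distance <= 1  # max_edit_distance from the manifest
print(decoded, distance, matched)
```

Because one phoneme may be wrong or missing, a slightly slurred or accented pronunciation still matches, while an unrelated phrase ends up many edits away from the target.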
Built-in vs models (3):
| Model ID | Wake Word | Type |
|---|---|---|
| `hey_jarvis` | Hey Jarvis | Wake word |
| `ok_nabu` | OK Nabu | Wake word |
| `ok_stop` | Ok Stop | Stop classifier |
vsWakeWord manifest structure:
Each vs model is a pair: id.json (manifest) + id.ort (ONNX model). The manifest defines:
```json
{
  "name": "hey_jarvis",
  "format": "vs-wake-word-ctc-v1",
  "recommended_threshold": 0.61,
  "input": { "shape": [1, 128, 40], "feature": "log_mel" },
  "output": { "shape": [1, 49, 52], "meaning": "frame_level_phoneme_log_probabilities" },
  "feature_config": {
    "sample_rate": 16000, "window_ms": 1300, "frame_ms": 25, "hop_ms": 10,
    "n_fft": 512, "n_mels": 40, "f_min": 80.0, "f_max": 7600.0
  },
  "ctc": {
    "vocab_size": 52, "blank_id": 1, "max_edit_distance": 1,
    "wake_word_targets": [[27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]],
    "wake_word_target_phonemes": [["h", "e", "ɪ", " ", "d", "ʒ", "ɑ", "ː", "ɹ", "v", "ɪ", "s"]]
  },
  "runtime": {
    "required_hits": 2, "hit_mode": "consecutive",
    "cooldown_ms": 2000, "high_confidence_bypass": 6.8
  },
  "stop_classifier": false
}
```

Key fields:
- `wake_word_targets`: phoneme ID sequences to match (multiple pronunciations allowed)
- `wake_word_target_phonemes`: human-readable phoneme sequences for debugging
- `max_edit_distance`: how many phoneme substitutions/insertions/deletions are tolerated (≤1)
- `runtime.required_hits`: how many consecutive matches are needed to trigger (2 = double confirm)
- `runtime.high_confidence_bypass`: skip the hit counter if confidence is very high
- `stop_classifier`: `true` for stop-word models (different gating logic)
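How the `runtime` fields could interact is easiest to see in code. The following Python sketch is one plausible reading, not Ava's implementation: the class `HitGate`, the way confidence is compared against `high_confidence_bypass`, and the timestamps are assumptions.

```python
class HitGate:
    """Hypothetical sketch of required_hits / cooldown_ms / high_confidence_bypass."""

    def __init__(self, required_hits=2, cooldown_ms=2000, high_confidence_bypass=6.8):
        self.required_hits = required_hits
        self.cooldown_ms = cooldown_ms
        self.bypass = high_confidence_bypass
        self.hits = 0
        self.cooldown_until_ms = 0

    def update(self, now_ms, matched, confidence):
        """Return True when the wake word should fire for this inference cycle."""
        if now_ms < self.cooldown_until_ms:
            return False  # still cooling down after the last trigger
        if not matched:
            self.hits = 0  # "consecutive" mode: a miss resets the streak
            return False
        self.hits += 1
        # Assumption: the bypass compares a confidence score on the manifest's scale.
        if self.hits >= self.required_hits or confidence >= self.bypass:
            self.hits = 0
            self.cooldown_until_ms = now_ms + self.cooldown_ms
            return True
        return False


gate = HitGate()
events = [
    (0, True, 0.7),     # first hit: not enough yet
    (80, True, 0.8),    # second consecutive hit: fire
    (160, True, 0.9),   # inside the 2 s cooldown: ignored
    (2200, True, 7.0),  # single hit, but above the bypass value: fire
]
results = [gate.update(t, m, c) for t, m, c in events]
print(results)
```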
| Dimension | microWakeWord | vsWakeWord |
|---|---|---|
| Architecture | TFLite binary classification | ONNX CTC phoneme decoding + edit distance |
| Model size | 50-80KB, uint8 quantized | ~500KB |
| Inference | 10ms frame, stride 3 | 80ms cycle, 1300ms window, 128×40 feature map |
| Frontend | microfeatures | Log-Mel + adaptive noise floor + spectral VAD |
| Decision | 5-frame sliding window mean > threshold | Voice gate → CTC confidence → edit distance ≤1 → 2-hit confirm → 2s cooldown |
| Output | Scalar 0-1 probability | Phoneme sequence with traceability |
| Wake-word swap | Full retraining required | Manifest JSON hot-swap |
| CPU / memory | Minimal | Significantly higher |
| False wake defense | Threshold only (zero-sum trade-off) | Multi-layer independent gates |
| Interpretability | None | Phoneme-level debugging |
| Best for | Low-end Android 5+ persistent background | Noise-sensitive, explainability-required deployments |
Each engine stores wake words independently (microWakeWords / vsWakeWords). Switching engines auto-restores the last selection — no more lost models or silent failures.
Cross-engine ID mapping: micro's okay_nabu auto-maps to VS's ok_nabu. HA-configured wake words also resolve correctly across engines.
Service auto-restarts on engine switch, keeping the detector in sync with settings.
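The cross-engine ID mapping can be pictured as a small alias lookup. The sketch below is hypothetical: only the `okay_nabu` / `ok_nabu` pair is stated in this page, and the function name and fallback behaviour (returning `None` when the target engine has no equivalent model) are assumptions.

```python
# Hypothetical alias table; only the okay_nabu <-> ok_nabu pair is stated in the text.
MICRO_TO_VS = {"okay_nabu": "ok_nabu"}
VS_TO_MICRO = {vs: micro for micro, vs in MICRO_TO_VS.items()}

VS_MODELS = {"hey_jarvis", "ok_nabu", "ok_stop"}


def resolve_wake_word_id(model_id, engine, available):
    """Map a wake word ID to the given engine, or return None if it has no equivalent."""
    aliases = MICRO_TO_VS if engine == "vs" else VS_TO_MICRO
    candidate = aliases.get(model_id, model_id)
    return candidate if candidate in available else None


print(resolve_wake_word_id("okay_nabu", "vs", VS_MODELS))   # mapped through the alias
print(resolve_wake_word_id("hey_jarvis", "vs", VS_MODELS))  # same ID in both engines
print(resolve_wake_word_id("alexa", "vs", VS_MODELS))       # no vs model for this ID
```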
Ava Pro is based on brownard/Ava. This section covers only the wake word engine differences.
| Dimension | brownard/Ava (original) | Ava Pro (knoop7/Ava) |
|---|---|---|
| Engine count | 1 (microWakeWord) | 2 (microWakeWord + vsWakeWord) |
| microWakeWord engine | TFLite binary classification, sliding window threshold | Same engine, same 9 built-in models |
| vsWakeWord engine | Not available | ONNX CTC phoneme decoding + edit distance + multi-layer false wake gates |
| Built-in models | 9 micro (.tflite) | 9 micro (.tflite) + 3 vs (.ort) |
| Model format (micro) | .tflite + .json | .tflite + .json (identical, V2/V3 compatible) |
| Model format (vs) | N/A | .ort (ONNX) + .json manifest with CTC phoneme targets |
| Custom model loading | DocumentTreeWakeWordProvider (SAF folder picker in Settings) | APK assets injection (MT Manager / APK Editor) — see below |
| False wake defense | Threshold only (single layer) | Threshold (micro) or voice gate + CTC confidence + edit distance + 2-hit confirm + cooldown (vs) |
| Inference runtime | TensorFlow Lite | TensorFlow Lite + ONNX Runtime (reduced build, CPU EP only) |
| CPU / memory footprint | Minimal | Minimal (micro) or significantly higher (vs) |
The microWakeWord engine itself is identical between both apps — same models, same inference path. The speed difference comes from what happens around the engine, not inside it.
1. Stop word detection is no longer always-on.
The original runs the stop-word model continuously alongside the wake-word model: two models run inference on every audio chunk, 24/7, even when nothing is happening. Ava Pro (since 0.5.2) activates the stop-word model only when it is useful: when a timer alarm is ringing, or when a voice session is in progress (Listening / Processing / Responding). During idle standby, the stop model is skipped entirely. With one model running instead of two, this roughly halves the number of inference passes for the vast majority of the time, when the device is just waiting.
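The activation rule amounts to a single predicate. A minimal Python sketch, with the enum and function names invented for illustration:

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"


def should_run_stop_model(state: SessionState, alarm_ringing: bool) -> bool:
    """Run the stop-word model only while an alarm rings or a voice session is active."""
    return alarm_ringing or state is not SessionState.IDLE


print(should_run_stop_model(SessionState.IDLE, alarm_ringing=False))
print(should_run_stop_model(SessionState.IDLE, alarm_ringing=True))
print(should_run_stop_model(SessionState.RESPONDING, alarm_ringing=False))
```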
2. vsWakeWord skips inference during silence.
When using the vsWakeWord engine, Ava Pro runs a lightweight voice activity gate before feeding audio to the ONNX model. The gate analyzes audio energy and spectral characteristics in real time. If the input is silence or background noise (no human speech present), the entire ONNX inference is skipped — no model loading, no tensor computation, no phoneme decoding. The ONNX model only wakes up when the gate detects voice-like audio. On a quiet device sitting in a hallway, this means the ONNX engine effectively sleeps most of the time, while still catching the wake word the moment someone speaks.
3. Buffered audio replay on gate open.
When the voice gate transitions from closed to open (someone starts speaking), Ava Pro replays the last few seconds of buffered audio through the ONNX model in a single batch. This means the beginning of the wake word — which happened while the gate was still deciding — is not lost. The user experiences instant wake detection without waiting for the model to "warm up" from silence.
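The gate-plus-replay behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: the gate here uses only RMS energy (the real gate also looks at spectral characteristics), it closes on the first quiet chunk with no hangover, and the class name, threshold, and buffer length are made up.

```python
from collections import deque


def rms(chunk):
    """Root-mean-square level of one chunk of PCM samples."""
    return (sum(s * s for s in chunk) / len(chunk)) ** 0.5


class GatedReplay:
    """Hypothetical sketch: buffer audio while the gate is closed, replay it when it opens."""

    def __init__(self, energy_threshold, preroll_chunks):
        self.energy_threshold = energy_threshold
        self.preroll = deque(maxlen=preroll_chunks)
        self.gate_open = False

    def feed(self, chunk):
        """Return the chunks that should be sent to the model for this input."""
        voiced = rms(chunk) >= self.energy_threshold
        if not voiced:
            self.gate_open = False
            self.preroll.append(chunk)  # keep recent audio, skip inference
            return []
        if not self.gate_open:
            self.gate_open = True
            batch = list(self.preroll) + [chunk]  # replay buffered audio in one batch
            self.preroll.clear()
            return batch
        return [chunk]


silence = [0, 0, 0, 0]
speech = [1000, -1000, 1000, -1000]
gate = GatedReplay(energy_threshold=500, preroll_chunks=2)
batch_sizes = [len(gate.feed(c)) for c in [silence, silence, silence, speech, speech, silence]]
print(batch_sizes)
```

The first voiced chunk is delivered together with the buffered pre-roll, so audio from just before the gate opened still reaches the model.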
4. Incremental feature extraction.
vsWakeWord extracts log-Mel features from audio. Instead of recomputing the full feature window every chunk, Ava Pro shifts the existing feature buffer and only computes the new frames. With the manifest's 10ms hop, an 80ms chunk adds about 8 new frames, versus 128 for a full recompute of the 1300ms window: roughly a 16x reduction in FFT work.
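The shift-and-append idea can be illustrated with a small Python sketch. It is not Ava's extractor: the per-frame feature is a stand-in for the FFT and log-Mel step, and the class name is invented. The frame, hop, and window sizes follow the manifest (25 ms frames, 10 ms hop, 128 frames at 16 kHz). With these constants the sketch computes 8 new frames per 80 ms chunk in steady state (6 on the very first chunk, before any overlap is available).

```python
from collections import deque

FRAME = 400          # 25 ms at 16 kHz
HOP = 160            # 10 ms at 16 kHz
WINDOW_FRAMES = 128  # feature rows fed to the model


class IncrementalFeatures:
    """Hypothetical sketch: compute features only for frames not seen before."""

    def __init__(self, frame_feature):
        self.frame_feature = frame_feature           # stand-in for FFT + log-Mel
        self.samples = []                            # unconsumed tail of the stream
        self.features = deque(maxlen=WINDOW_FRAMES)  # oldest rows drop off automatically
        self.frames_computed = 0

    def push(self, chunk):
        """Append audio, compute the new frames, and return the current window."""
        self.samples.extend(chunk)
        while len(self.samples) >= FRAME:
            self.features.append(self.frame_feature(self.samples[:FRAME]))
            self.frames_computed += 1
            del self.samples[:HOP]  # advance by one hop, keep the overlap
        return list(self.features)


extractor = IncrementalFeatures(lambda frame: sum(abs(s) for s in frame) / len(frame))
per_chunk = []
for _ in range(20):
    before = extractor.frames_computed
    window = extractor.push([0] * 1280)  # one 80 ms chunk
    per_chunk.append(extractor.frames_computed - before)
print(per_chunk, len(window))
```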
5. Adaptive gain normalization.
vsWakeWord applies a smooth adaptive gain to normalize voice volume before inference. This is not about speed directly, but it means the model sees consistently-leveled audio regardless of distance or microphone sensitivity. Consistent input means the CTC confidence scores are more stable, which means the 2-hit confirmation gate reaches its threshold in fewer attempts — faster triggering with fewer false rejects.
6. ONNX Runtime is a stripped build.
The ONNX Runtime shipped with Ava Pro is a custom reduced build — only the CPU execution provider, no GPU/NNAPI delegates, no training APIs. This makes model loading and session creation faster, and the native library is smaller to load into memory. The tradeoff is no hardware acceleration, but for wake-word-sized models the CPU path is already fast enough and avoids the latency and compatibility issues of GPU/NNAPI on fragmented Android devices.
Net effect: On a typical device in idle standby, Ava Pro's wake-word CPU usage is lower than the original because stop-word inference is skipped. When someone speaks, vsWakeWord's voice gate + buffered replay + incremental features make detection feel instant despite the heavier model. The microWakeWord engine path matches the original's speed; the vsWakeWord path trades higher peak CPU for smarter gating and faster perceived response.
The original brownard/Ava ships a DocumentTreeWakeWordProvider that uses Android's Storage Access Framework (SAF). In Settings, you pick a folder, drop .tflite + .json files there, and the app loads them at runtime.
Ava Pro does not include this provider. The reasons are architectural:
- Dual engine factory. Ava Pro's `WakeWordDetectorFactory` dispatches to either microWakeWord or vsWakeWord. The original SAF provider only knows microWakeWord's `.tflite` format. vsWakeWord uses `.ort` (ONNX) + manifest JSON with CTC phoneme targets, a completely different model format. A single SAF folder cannot serve both engines without complex format detection and validation logic.
- ONNX Runtime safety. TFLite can load arbitrary `.tflite` files from user storage safely. ONNX Runtime is more sensitive to model format mismatches: an invalid `.ort` file could crash the native inference session or cause OOM on low-end devices. The asset-bundled approach guarantees models are validated at build time.
- Build-time validation. By bundling models in `assets/`, the build system catches format errors, missing manifest fields, and phoneme inventory mismatches before the app ships. SAF-loaded models have no such guarantee; a malformed JSON could silently disable wake detection.
- Scope. Ava Pro added vsWakeWord, dual wake words, voiceprint, visual feedback, and stop word optimization. Re-implementing SAF support for both engine formats safely was deprioritized in favor of these features.
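This page does not show what Ava Pro's build-time checks actually look like. As a rough idea of the kind of consistency checks a vsWakeWord manifest allows, here is a hypothetical Python sketch; the function name, the chosen checks, and the error messages are all assumptions, and it expects the `ctc`, `input`, `output`, and `feature_config` sub-keys used below to be present.

```python
REQUIRED_KEYS = ("name", "format", "input", "output", "feature_config", "ctc", "runtime")


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    if problems:
        return problems
    ctc = manifest["ctc"]
    vocab = ctc["vocab_size"]
    if manifest["output"]["shape"][-1] != vocab:
        problems.append("output shape does not match vocab_size")
    if manifest["input"]["shape"][-1] != manifest["feature_config"]["n_mels"]:
        problems.append("input shape does not match n_mels")
    for target in ctc["wake_word_targets"]:
        if any(not 0 <= pid < vocab for pid in target):
            problems.append("phoneme ID outside the vocabulary")
        if ctc["blank_id"] in target:
            problems.append("target contains the CTC blank ID")
    return problems


manifest = {
    "name": "hey_jarvis",
    "format": "vs-wake-word-ctc-v1",
    "input": {"shape": [1, 128, 40], "feature": "log_mel"},
    "output": {"shape": [1, 49, 52]},
    "feature_config": {"n_mels": 40},
    "ctc": {
        "vocab_size": 52,
        "blank_id": 1,
        "wake_word_targets": [[27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]],
    },
    "runtime": {"required_hits": 2},
}
ok_problems = validate_manifest(manifest)

# Same manifest with a phoneme ID that does not exist in a 52-entry vocabulary.
broken = dict(manifest, ctc=dict(manifest["ctc"], wake_word_targets=[[27, 99]]))
bad_problems = validate_manifest(broken)
print(ok_problems, bad_problems)
```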
Ava Pro loads microWakeWord models from assets/wakeWords/ and vsWakeWord models from assets/vswakeword/ inside the APK. You can inject custom models without building from source — just edit the APK directly.
Tools needed:
- Android file manager with APK editing: MT Manager (Chinese), APK Editor Pro, or Nexus APK Editor
- Or on desktop: `apktool` + `zipalign` + `apksigner`
Steps (MT Manager method):
- Download the Ava Pro lite APK from GitHub releases
- Open MT Manager, long-press the APK, select "View" (or "Extract")
- Navigate to `assets/wakeWords/` (for microWakeWord) or `assets/vswakeword/` (for vsWakeWord)
- Copy your custom model files into the directory:
  - microWakeWord: `my_word.tflite` + `my_word.json`
  - vsWakeWord: `my_word.ort` + `my_word.json`
- Save and repack the APK (MT Manager handles re-signing automatically)
- Uninstall the old Ava, install the modified APK
- Open Settings -> Voice Config -> Wake Word; your custom model appears in the list
Steps (desktop apktool method):
```bash
# Decompile
apktool d Ava-0.5.4.apk -o ava_decoded
# Add your model (microWakeWord)
cp my_word.tflite my_word.json ava_decoded/assets/wakeWords/
# For vsWakeWord
cp my_word.ort my_word.json ava_decoded/assets/vswakeword/
# Repack
apktool b ava_decoded -o Ava-custom.apk
# Align, then sign (zipalign must run before apksigner)
zipalign -v 4 Ava-custom.apk Ava-custom-aligned.apk
apksigner sign --ks ava-key.jks --ks-pass pass:1234 --out Ava-custom-signed.apk Ava-custom-aligned.apk
```

JSON format requirements:
For microWakeWord (same as V3 community format):
```json
{
  "type": "micro",
  "wake_word": "My Custom Word",
  "author": "Your Name",
  "model": "my_word.tflite",
  "trained_languages": ["en"],
  "version": 2,
  "micro": {
    "probability_cutoff": 0.6,
    "sliding_window_size": 5,
    "feature_step_size": 10,
    "tensor_arena_size": 30000,
    "minimum_esphome_version": "2024.7.0"
  }
}
```

For vsWakeWord (see manifest structure above for full template).
Note: This is an advanced operation intended for power users. Wake word customization is inherently an advanced topic: the models require training or downloading from trusted sources. The APK injection method works because Ava Pro reads models from the APK's `assets/` directory at runtime, same as the built-in models. No source code compilation is needed.

Compatibility: Both apps share the same microWakeWord model format (V2/V3 `.tflite` + `.json`). Models from the TaterTotterson/microWakeWords V3 directory work directly. Just drop the `.tflite` + `.json` pair into `assets/wakeWords/`.
Ava uses microWakeWords V3 format. The JSON config is identical to the V3 models from the community.
The microWakeWords repo by TaterTotterson maintains a large library of pretrained V3 models. The format is directly compatible with Ava.
- Browse the microWakeWordsV3 directory
- Find a wake word you like (e.g., `aleesa`, `angel`, `annika`, `arale`, `artamis`, etc.)
- Download both files: `name.json` and `name.tflite`
- Place them in Ava's `app/src/main/assets/wakeWords/` directory
- Rebuild the APK
The V3 JSON format is identical to Ava's built-in models:
```json
{
  "type": "micro",
  "wake_word": "ah_lehks_sah",
  "author": "Tater Totterson",
  "website": "https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon",
  "model": "ah_lehks_sah.tflite",
  "trained_languages": ["en"],
  "version": 2,
  "micro": {
    "probability_cutoff": 0.1,
    "sliding_window_size": 3,
    "feature_step_size": 10,
    "tensor_arena_size": 30000,
    "minimum_esphome_version": "2024.7.0"
  }
}
```

If you have a Mac with Apple Silicon (M1/M2/M3/M4), you can train a custom wake word with a local web UI:
- Install the trainer:
  - Download the signed macOS app from WakeWord Trainer releases
  - Or clone and run from source:
    ```bash
    git clone https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon.git
    cd microWakeWord-Trainer-AppleSilicon
    ./run.sh
    ```
  - Open `http://127.0.0.1:8789` in your browser
- Train the wake word:
  - Enter your wake phrase in the Trainer tab
  - Choose language (en, or other Piper-supported languages)
  - Optionally test pronunciation with Test TTS
  - Click Start training
  - The trainer uses Piper TTS to generate samples automatically
  - Personal samples are optional but improve accuracy
- Optionally capture real samples from devices:
  - Flash a device with Tater firmware (from the Firmware tab)
  - Enable `Capture Wake Audio` on the device
  - Set `Trainer App URL` to `http://<trainer-ip>:8789`
  - Review captured clips in the Captured Audio tab
  - Mark good clips as "This is good", bad ones as "False wake"
- Get the output files:
  - Successful training produces:
    - `trained_wake_words/<wake_word>.tflite`
    - `trained_wake_words/<wake_word>.json`
- Install into Ava:
  - Copy both files to Ava's `app/src/main/assets/wakeWords/` directory
  - Rebuild the APK
```bash
git clone https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon.git
cd microWakeWord-Trainer-AppleSilicon
./train_microwakeword_macos.sh "hey_my_custom_word"
```

If `personal_samples/*.wav` or `negative_samples/*.wav` exist in the folder, they are included automatically.
| Parameter | Description | Typical Value |
|---|---|---|
| `probability_cutoff` | Detection threshold (lower = easier to trigger, more false positives) | 0.1 - 0.97 |
| `sliding_window_size` | Frames to average before triggering | 3 - 9 |
| `feature_step_size` | Feature extraction step in ms | 10 |
| `tensor_arena_size` | TFLite arena size in bytes (must match model) | 21000 - 30000 |
| `minimum_esphome_version` | Minimum ESPHome version | 2024.7.0 |
Note: microWakeWord requires a trained model for each wake word. There is no hot-swap: you need a `.tflite` file. But the V3 community library has 100+ pretrained models you can download directly.
vsWakeWord supports manifest JSON hot-swap. To create a custom wake word:
- Train a CTC model using the vsWakeWord training pipeline (PyTorch → ONNX export)
- Create a manifest JSON with your phoneme targets:
  ```json
  {
    "name": "my_custom_word",
    "format": "vs-wake-word-ctc-v1",
    "recommended_threshold": 0.6,
    "input": { "name": "input", "shape": [1, 128, 40], "dtype": "float32", "feature": "log_mel" },
    "output": { "name": "log_probs", "shape": [1, 49, 52], "dtype": "float32" },
    "feature_config": {
      "sample_rate": 16000, "window_ms": 1300, "frame_ms": 25, "hop_ms": 10,
      "n_fft": 512, "n_mels": 40, "f_min": 80.0, "f_max": 7600.0, "log_floor": 1e-06
    },
    "ctc": {
      "vocab_size": 52, "blank_id": 1, "pad_id": 0, "word_sep_id": 2,
      "wake_word_targets": [[your_phoneme_ids]],
      "wake_word_target_phonemes": [["your", "phonemes"]],
      "max_edit_distance": 1
    },
    "runtime": {
      "required_hits": 2, "hit_mode": "consecutive",
      "cooldown_ms": 2000, "high_confidence_bypass": 6.8
    },
    "stop_classifier": false
  }
  ```
- Place both files (`id.json` + `id.ort`) in the `assets/vswakeword/` directory
- Rebuild the Ava APK
The 52-phoneme inventory uses IPA-style symbols. See an existing manifest (e.g., hey_jarvis.json) for the full inventory list.
Advantage: vsWakeWord's manifest-based approach lets you swap wake word targets without retraining the base model in some cases: just update the `wake_word_targets` phoneme IDs. However, for best accuracy, a model trained on your specific wake word is recommended.
- Open Ava app
- Go to Settings -> Voice Config
- Find Wake Word Engine and choose microWakeWord or vsWakeWord
- Find Wake Word 1 option
- Select your preferred wake word from the list
- Optionally configure Wake Word 2 for dual wake word mode
- New wake word takes effect after service restart (auto)
Adjust sensitivity to control how easily the wake word triggers:
- Higher sensitivity = easier to trigger, but more false positives
- Lower sensitivity = fewer false positives, but may miss quiet speech
Each wake word can have its own wake sound:
- Wake Word 1 Sound: Played when Wake Word 1 is detected
- Wake Word 2 Sound: Played when Wake Word 2 is detected
- Default Sound: Used if no custom sound is set
- None: Silent recording start
Ava provides clear visual feedback during wake and conversation:
Wake Instant:
- Colorful ripple expanding from screen center
- Android 13+: RuntimeShader with distorted halo + star particles
- Android 7: Soft circular diffusion
- Compatibility paths for other versions
Conversation (when Floating Subtitle is disabled):
- Full-screen edge glow that changes with state:
- Listening: Edge light breathes with microphone volume
- Processing: Slow breathing animation
- Speaking: Pulsates with TTS energy
Dual Wake-Word Color Coding:
- Wake Word 1 = green (default)
- Wake Word 2 = blue (default)
- Ripple and edge light match the triggered wake word
- Custom colors available in Settings → Extensions → Interface → Voice feedback colors
- 7 rainbow presets (red through purple) also available
Technical Notes:
- Edge glow uses pre-rendered Gaussian blur bitmaps for performance
- Ripple animation driven by system uptime (prevents Kiosk devices with "animation duration = 0" from killing the effect)
- Android 7.0/7.1 optimized to clean circular diffusion without Shader dependency
Ava Pro includes an experimental on-device smart voiceprint recognition feature. Once enabled, Ava learns frequent wakeword callers from short local samples. On subsequent wakeword triggers, it identifies who likely woke the device and publishes the result to a Home Assistant "Voiceprint Status" sensor.
This is not a heavyweight cloud voiceprint model. It is a lightweight local matcher purpose-built for Ava's fixed-wakeword scenario. It uses short local samples, audio feature extraction, quality gating, and conservative learning — no large model packages required.
- After you wake the device a few times, it learns who usually speaks
- All audio stays local, nothing is uploaded to cloud
- Two user slots available, each with a custom display name
- Go to Settings -> Voice Config
- Turn on Smart voiceprint recognition
- Set User 1 name and User 2 name
- Wake the device a few times for each user
- The voiceprint status sensor appears in Home Assistant
sensor.your_device_name_voice_print_status
Choose how the microphone captures audio:
| Mode | Description |
|---|---|
| Auto select | Lets the device choose. Best default |
| Speech boost | Tries to make speech clearer. May change tone |
| Normal pickup | Plain microphone input. Safest choice |
| Call boost | More like call audio. May over-process |
| Unprocessed | Closest to raw input. Good for testing |
| Option | Description |
|---|---|
| Noise suppression | Use device noise suppressor when available |
| Automatic gain control | Use device AGC when available |
| Hardware echo cancellation | Enable hardware echo cancellation when supported |
| Software echo cancellation | Cancel Ava's own playback from microphone so wake words work during music/TTS playback |
Software AEC is recommended when using music playback or TTS. It replaces the device echo canceler while enabled.
Boost PCM audio before wake word detection and streaming. Range: 0-24 dB.
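A decibel gain translates to a linear amplitude factor of 10^(dB/20). The Python sketch below shows that conversion applied to samples with clipping. It assumes 16-bit signed PCM and a hard clip at the int16 limits; the function name is invented, and Ava's actual processing may differ (for example, it could use a limiter instead of a hard clip).

```python
def apply_gain_db(samples, gain_db):
    """Scale 16-bit PCM samples by gain_db decibels, clipping to the int16 range."""
    if not 0 <= gain_db <= 24:
        raise ValueError("gain_db must be within 0-24 dB")
    factor = 10 ** (gain_db / 20)  # amplitude ratio for a dB value
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


boosted = apply_gain_db([100, -100, 5000], 20)  # 20 dB is a factor of 10
unchanged = apply_gain_db([100, -100, 5000], 0)
print(boosted, unchanged)
```

High gain values clip loud input, which is worth keeping in mind when raising the gain to help a distant speaker.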
| Preset | Description |
|---|---|
| Standard 16kHz Mono | Default for voice recognition |
| Broadcast 48kHz Mono | Higher quality capture |
| Stereo Input 48kHz | Stereo recording |
| Low Latency 16kHz | Reduced latency |
| Voice Call Optimized | For voice call use case |
| Unprocessed Raw 48kHz | Raw capture, no processing |
| Compact 8kHz | Lowest bandwidth |
| CD Quality 44.1kHz | High quality |
| Studio 96kHz | Very high quality |
| Ultra HD 192kHz | Maximum quality |
The Voice Channel switch disables all voice input, wake word, and voice assistant threads. Service restart required to apply.
Stop words interrupt the current conversation or stop Ava's response (e.g., timer alarm).
| Stop Word | Engine | Model ID | Description |
|---|---|---|---|
| Stop | microWakeWord | `stop` | Default stop word for micro engine |
| Ok Stop | vsWakeWord | `ok_stop` | Stop word for vs engine (`stop_classifier: true`) |
Stop-word detection only runs when actually needed (since 0.5.2):
- A timer alarm is actively ringing
- A voice session is in progress (Listening, Processing, Responding)
During idle standby with no alarm, the stop model is skipped — cutting CPU load and heat.
Continuous conversation lets you issue multiple commands without saying the wake word each time.
- Say wake word + first command
- After Ava responds, automatically enters listening mode
- Say next command directly (no wake word needed)
- After 10 seconds of silence, exits continuous conversation mode
| Mode | Description |
|---|---|
| Exit Keyword Stop | End when the assistant says goodbye or farewell phrases (default) |
| Question Mark Continue | Keep listening only if the reply ends with "?"; stop after other replies |
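The two end-condition modes can be expressed as one decision function. This Python sketch is illustrative only: the mode identifiers and the list of farewell phrases are hypothetical, since this page does not list the phrases Ava actually checks.

```python
# Hypothetical phrase list; the real exit phrases are not given on this page.
EXIT_PHRASES = ("goodbye", "bye", "see you")


def should_keep_listening(reply: str, mode: str) -> bool:
    """Decide whether to stay in continuous conversation after the assistant's reply."""
    text = reply.strip().lower()
    if mode == "question_mark_continue":
        return text.endswith("?")  # keep listening only after a question
    if mode == "exit_keyword_stop":
        return not any(phrase in text for phrase in EXIT_PHRASES)
    raise ValueError(f"unknown mode: {mode}")


print(should_keep_listening("Which room?", "question_mark_continue"))
print(should_keep_listening("Turned on the light.", "question_mark_continue"))
print(should_keep_listening("Turned on the light.", "exit_keyword_stop"))
print(should_keep_listening("Okay, goodbye!", "exit_keyword_stop"))
```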
- Go to Settings -> Interaction
- Turn on Continuous Conversation switch
- Choose end condition mode
Conversation subtitles display what you said and Ava's response on screen as a floating window.
- Go to Settings -> Interaction
- Turn on Floating Subtitle switch
Mute mode turns off the microphone, so Ava won't respond to any wake word.
Method 1: In Settings
- Go to Settings -> Voice Config
- Turn on Mute switch
Method 2: Home Assistant Control
```yaml
service: switch.turn_on
target:
  entity_id: switch.your_device_name_mute
```

| Setting | Location | Description | Default |
|---|---|---|---|
| Device Name | Voice Config | Name shown in HA | device_model_voice_assistant |
| Port | Voice Config | ESPHome communication port | 6053 |
| Wake Word Engine | Voice Config | microWakeWord or vsWakeWord | microWakeWord |
| Wake Word 1 | Voice Config | Primary wake word | Hey Jarvis |
| Wake Word 2 | Voice Config | Secondary wake word | None |
| Sensitivity | Voice Config | Wake word sensitivity | - |
| Wake Sound | Voice Config | Prompt sound when recording starts | Optional |
| Voice Channel | Voice Config | Master voice input switch | On |
| System Recording Mode | Voice Config | Audio capture mode | Auto select |
| Noise suppression | Voice Config | Device noise suppressor | - |
| Automatic gain control | Voice Config | Device AGC | - |
| Hardware echo cancellation | Voice Config | Hardware AEC | - |
| Software echo cancellation | Voice Config | Software AEC for playback | - |
| Voiceprint recognition | Voice Config | Speaker recognition | Off |
| Software Mic Gain | Voice Config | PCM boost 0-24 dB | 0 |
| Audio Config | Voice Config | Capture preset | Standard 16kHz Mono |
| Mute | Voice Config | Turn off microphone | Off |
| Continuous Conversation | Interaction | Issue commands continuously | Off |
| Floating Subtitle | Interaction | Display conversation text | Off |
```yaml
# Trigger wake
service: esphome.your_device_name_trigger_wake
data: {}
```

```yaml
# Enable mute
service: switch.turn_on
target:
  entity_id: switch.your_device_name_mute

# Disable mute
service: switch.turn_off
target:
  entity_id: switch.your_device_name_mute
```

```yaml
# Set volume
service: media_player.volume_set
target:
  entity_id: media_player.your_device_name
data:
  volume_level: 0.8  # 0.0 - 1.0
```

```yaml
# Send a text command
service: text.set_value
target:
  entity_id: text.your_device_name_voice_command
data:
  value: "Turn on the living room light"
```

- Check if microphone permission is granted
- Check if mute mode is enabled
- Check if Voice Channel is enabled
- Make sure device volume isn't muted
- Try speaking closer to device
- Try a different System Recording Mode
- Try a different wake word
- Adjust wake word sensitivity
- Make sure environment isn't too noisy
- Enable noise suppression
- Speak at moderate speed, pronounce clearly
- Try vsWakeWord engine — its CTC phoneme matching is more robust to noise and accents
- Enable Software echo cancellation in Settings -> Voice Config
- This cancels Ava's own playback from the microphone input
This is usually a Home Assistant issue:
- Check HA voice assistant configuration
- Make sure Whisper etc. components are working
- Check network latency
- Try speaking more clearly