Voice Control

Voice control is the core feature of Ava, allowing you to control smart home devices by speaking.

How It Works

┌─────────────┐    ┌─────────────┐    ┌────────────────┐    ┌─────────────┐
│  You speak  │ -> │ Ava records │ -> │ Home Assistant │ -> │ Ava plays   │
│ wake word + │    │ sends audio │    │     speech     │    │   voice     │
│   command   │    │             │    │  recognition   │    │   reply     │
└─────────────┘    └─────────────┘    └────────────────┘    └─────────────┘

Detailed Flow:

Standby: Ava continuously listens for wake word (local processing, no internet)
Wake Detection: When wake word detected, plays prompt sound, starts recording
Audio Transmission: Recording sent to Home Assistant via ESPHome protocol
Speech Recognition: Home Assistant's voice assistant performs speech-to-text
Intent Processing: Home Assistant understands intent and executes action
Speech Synthesis: Home Assistant generates voice response
Playback: Ava receives and plays voice response

Wake Words

Ava supports two independent wake words. You can use one or both simultaneously.

Wake Word Engines

Ava has two on-device wake word engines. Both run entirely locally — no audio leaves the device for wake detection.

microWakeWord (default)

Dimension	Value
Architecture	TFLite binary classification
Model size	50-80KB, uint8 quantized
Inference	10ms frame, stride 3
Frontend	microfeatures
Decision	5-frame sliding window mean > threshold
Output	Scalar 0-1 probability
Wake-word swap	Full retraining required
CPU / memory	Minimal
False wake defense	Threshold only (zero-sum trade-off)
Interpretability	None
Best for	Low-end Android 5+ persistent background

microWakeWord is the default engine. It uses TensorFlow Lite with tiny quantized models. Each wake word is a separate .tflite model file paired with a .json config. The detector runs a sliding window average over the last 5 frames and triggers when the average probability exceeds the cutoff.
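
The sliding-window decision described above can be sketched in a few lines of Python. This is a minimal illustration, not Ava's actual implementation: the class name `SlidingWindowDetector`, the 0.6 cutoff, and the choice to clear the window after a trigger are assumptions made for the example.

```python
from collections import deque


class SlidingWindowDetector:
    """Triggers when the mean of the last `window_size` probabilities exceeds `cutoff`."""

    def __init__(self, cutoff=0.6, window_size=5):
        self.cutoff = cutoff
        self.window = deque(maxlen=window_size)

    def update(self, probability):
        """Feed one per-frame probability; return True when the detector fires."""
        self.window.append(probability)
        if len(self.window) < self.window.maxlen:
            return False  # not enough frames collected yet
        if sum(self.window) / len(self.window) > self.cutoff:
            self.window.clear()  # assumption: avoid re-triggering on the same utterance
            return True
        return False


detector = SlidingWindowDetector(cutoff=0.6, window_size=5)
results = [detector.update(p) for p in [0.1, 0.9, 0.9, 0.9, 0.9, 0.9]]
```

Averaging means a single high-probability frame surrounded by low ones does not trigger, which is the only false-wake defense this engine has besides the cutoff itself.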

Built-in micro models (9):

Model ID	Wake Word	Author
`hey_jarvis`	Hey Jarvis	Kevin Ahrendt
`alexa`	Alexa	Kevin Ahrendt
`hey_home_assistant`	Hey Home Assistant	Michael Hansen
`hey_mycroft`	Hey Mycroft	Kevin Ahrendt
`hey_luna`	Hey Luna	adamlonsdale
`hey_peppa_pig`	Hey Peppa Pig	Michael Hansen
`okay_computer`	Okay Computer	Michael Hansen
`okay_nabu`	OK Nabu	Kevin Ahrendt
`choo_choo_homie`	Choo Choo Homie	Michael Hansen

Stop word: stop (Stop) by Kevin Ahrendt.

vsWakeWord

Dimension	Value
Architecture	ONNX CTC phoneme decoding + edit distance
Model size	~500KB
Inference	80ms cycle, 1300ms window, 128×40 feature map
Frontend	Log-Mel + adaptive noise floor + spectral VAD
Decision	Voice gate → CTC confidence → edit distance ≤1 → 2-hit confirm → 2s cooldown
Output	Phoneme sequence with traceability
Wake-word swap	Manifest JSON hot-swap
CPU / memory	Significantly higher
False wake defense	Multi-layer independent gates
Interpretability	Phoneme-level debugging
Best for	Noise-sensitive, explainability-required deployments

vsWakeWord uses ONNX Runtime with CTC (Connectionist Temporal Classification) decoding. Instead of a binary "is this the wake word?" classifier, it decodes the audio into a phoneme sequence and matches against target phonemes with edit distance ≤1. This makes it more robust to noise and accents, at the cost of higher CPU usage.
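
The matching idea (decode to phoneme IDs, then compare with an edit distance of at most 1) can be sketched as follows. This is an illustrative sketch under assumptions, not the engine's code: the function names are hypothetical, greedy per-frame decoding is assumed, and `blank_id=1` plus the target IDs are taken from the built-in hey_jarvis manifest on this page. The real engine also applies voice and confidence gates that are omitted here.

```python
def ctc_collapse(frame_ids, blank_id=1):
    """Greedy CTC decoding: merge repeated IDs, then drop blanks."""
    decoded, previous = [], None
    for phoneme_id in frame_ids:
        if phoneme_id != previous and phoneme_id != blank_id:
            decoded.append(phoneme_id)
        previous = phoneme_id
    return decoded


def edit_distance(a, b):
    """Levenshtein distance between two ID sequences."""
    previous_row = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        row = [i]
        for j, y in enumerate(b, 1):
            row.append(min(previous_row[j] + 1,              # deletion
                           row[j - 1] + 1,                   # insertion
                           previous_row[j - 1] + (x != y)))  # substitution
        previous_row = row
    return previous_row[-1]


def matches(frame_ids, targets, max_edit_distance=1, blank_id=1):
    """True if the decoded sequence is close enough to any target pronunciation."""
    decoded = ctc_collapse(frame_ids, blank_id)
    return any(edit_distance(decoded, t) <= max_edit_distance for t in targets)


HEY_JARVIS = [27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]
frames = [1, 27, 27, 1, 9, 15, 15, 2, 1, 24, 44, 5, 3, 36, 36, 41, 15, 38, 1]
is_match = matches(frames, [HEY_JARVIS])
```

Because the comparison runs on decoded phonemes rather than a single score, a near miss can be inspected: the decoded sequence shows which phoneme was substituted, inserted, or dropped.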

Built-in vs models (3):

Model ID	Wake Word	Type
`hey_jarvis`	Hey Jarvis	Wake word
`ok_nabu`	OK Nabu	Wake word
`ok_stop`	Ok Stop	Stop classifier

vsWakeWord manifest structure:

Each vs model is a pair: id.json (manifest) + id.ort (ONNX model). The manifest defines:

{
  "name": "hey_jarvis",
  "format": "vs-wake-word-ctc-v1",
  "recommended_threshold": 0.61,
  "input": { "shape": [1, 128, 40], "feature": "log_mel" },
  "output": { "shape": [1, 49, 52], "meaning": "frame_level_phoneme_log_probabilities" },
  "feature_config": {
    "sample_rate": 16000, "window_ms": 1300, "frame_ms": 25, "hop_ms": 10,
    "n_fft": 512, "n_mels": 40, "f_min": 80.0, "f_max": 7600.0
  },
  "ctc": {
    "vocab_size": 52, "blank_id": 1, "max_edit_distance": 1,
    "wake_word_targets": [[27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]],
    "wake_word_target_phonemes": [["h", "e", "ɪ", " ", "d", "ʒ", "ɑ", "ː", "ɹ", "v", "ɪ", "s"]]
  },
  "runtime": {
    "required_hits": 2, "hit_mode": "consecutive",
    "cooldown_ms": 2000, "high_confidence_bypass": 6.8
  },
  "stop_classifier": false
}

Key fields:

wake_word_targets — phoneme ID sequences to match (multiple pronunciations allowed)
wake_word_target_phonemes — human-readable phoneme sequences for debugging
max_edit_distance — how many phoneme substitutions/insertions/deletions are tolerated (≤1)
runtime.required_hits — how many consecutive matches needed to trigger (2 = double confirm)
runtime.high_confidence_bypass — skip the hit counter if confidence is very high
stop_classifier — true for stop-word models (different gating logic)
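
How the `runtime` fields might interact is sketched below. This is a hedged illustration only: the class `HitGate` and its method are hypothetical, the confidence value is assumed to be on the same (unspecified) scale as `high_confidence_bypass`, and the exact ordering of the checks inside Ava Pro is not documented here.

```python
class HitGate:
    """Consecutive-hit confirmation with a cooldown and a high-confidence bypass."""

    def __init__(self, required_hits=2, cooldown_ms=2000, high_confidence_bypass=6.8):
        self.required_hits = required_hits
        self.cooldown_ms = cooldown_ms
        self.bypass = high_confidence_bypass
        self.hits = 0
        self.last_trigger_ms = None

    def update(self, matched, confidence, now_ms):
        """Feed one inference result; return True when the wake word should fire."""
        in_cooldown = (self.last_trigger_ms is not None
                       and now_ms - self.last_trigger_ms < self.cooldown_ms)
        if in_cooldown:
            self.hits = 0
            return False
        if not matched:
            self.hits = 0  # "consecutive" mode: any miss resets the counter
            return False
        self.hits += 1
        if self.hits >= self.required_hits or confidence >= self.bypass:
            self.hits = 0
            self.last_trigger_ms = now_ms
            return True
        return False


gate = HitGate()
first = gate.update(True, 1.0, now_ms=0)    # one hit is not enough
second = gate.update(True, 1.0, now_ms=80)  # second consecutive hit fires
```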

Engine Comparison

Dimension	microWakeWord	vsWakeWord
Architecture	TFLite binary classification	ONNX CTC phoneme decoding + edit distance
Model size	50-80KB, uint8 quantized	~500KB
Inference	10ms frame, stride 3	80ms cycle, 1300ms window, 128×40 feature map
Frontend	microfeatures	Log-Mel + adaptive noise floor + spectral VAD
Decision	5-frame sliding window mean > threshold	Voice gate → CTC confidence → edit distance ≤1 → 2-hit confirm → 2s cooldown
Output	Scalar 0-1 probability	Phoneme sequence with traceability
Wake-word swap	Full retraining required	Manifest JSON hot-swap
CPU / memory	Minimal	Significantly higher
False wake defense	Threshold only (zero-sum trade-off)	Multi-layer independent gates
Interpretability	None	Phoneme-level debugging
Best for	Low-end Android 5+ persistent background	Noise-sensitive, explainability-required deployments

Engine Switching

Each engine stores its wake word selection independently (microWakeWords / vsWakeWords). Switching engines restores the last selection for that engine automatically, so models are not lost and detection does not fail silently.

Cross-engine ID mapping: micro's okay_nabu auto-maps to VS's ok_nabu. HA-configured wake words also resolve correctly across engines.

The service restarts automatically on an engine switch, which keeps the detector in sync with the settings.

Differences from Original brownard/Ava — Wake Word Engine Only

Ava Pro is based on brownard/Ava. This section covers only the wake word engine differences.

Dimension	brownard/Ava (original)	Ava Pro (knoop7/Ava)
Engine count	1 (microWakeWord)	2 (microWakeWord + vsWakeWord)
microWakeWord engine	TFLite binary classification, sliding window threshold	Same engine, same 9 built-in models
vsWakeWord engine	Not available	ONNX CTC phoneme decoding + edit distance + multi-layer false wake gates
Built-in models	9 micro (.tflite)	9 micro (.tflite) + 3 vs (.ort)
Model format (micro)	.tflite + .json	.tflite + .json (identical, V2/V3 compatible)
Model format (vs)	N/A	.ort (ONNX) + .json manifest with CTC phoneme targets
Custom model loading	DocumentTreeWakeWordProvider (SAF folder picker in Settings)	APK assets injection (MT Manager / APK Editor) — see below
False wake defense	Threshold only (single layer)	Threshold (micro) or voice gate + CTC confidence + edit distance + 2-hit confirm + cooldown (vs)
Inference runtime	TensorFlow Lite	TensorFlow Lite + ONNX Runtime (reduced build, CPU EP only)
CPU / memory footprint	Minimal	Minimal (micro) or significantly higher (vs)

Why Ava Pro runs faster than the original

The microWakeWord engine itself is identical between both apps — same models, same inference path. The speed difference comes from what happens around the engine, not inside it.

1. Stop word detection is no longer always-on.

The original runs the stop-word model continuously alongside the wake-word model: both models run inference on every audio chunk, around the clock, even when nothing is happening. Ava Pro (since 0.5.2) only activates the stop-word model when it is actually useful: when a timer alarm is ringing, or when a voice session is in progress (Listening / Processing / Responding). During idle standby, the stop model is skipped entirely. Because the device spends nearly all of its time in idle standby, this roughly halves the continuous wake-word inference load.

2. vsWakeWord skips inference during silence.

When using the vsWakeWord engine, Ava Pro runs a lightweight voice activity gate before feeding audio to the ONNX model. The gate analyzes audio energy and spectral characteristics in real time. If the input is silence or background noise (no human speech present), the entire ONNX inference is skipped — no model loading, no tensor computation, no phoneme decoding. The ONNX model only wakes up when the gate detects voice-like audio. On a quiet device sitting in a hallway, this means the ONNX engine effectively sleeps most of the time, while still catching the wake word the moment someone speaks.
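
A much simplified version of such a gate is sketched below. It is an assumption-laden illustration: it uses RMS energy against an adaptive noise floor only (the spectral analysis mentioned above is omitted), and the class name, the 3x ratio, and the smoothing factor are all hypothetical values chosen for the example.

```python
import math


class EnergyGate:
    """Opens when chunk RMS rises well above a slowly adapting noise floor."""

    def __init__(self, ratio=3.0, floor_alpha=0.05, initial_floor=100.0):
        self.ratio = ratio
        self.alpha = floor_alpha
        self.noise_floor = initial_floor

    def is_voice(self, samples):
        """Return True if the chunk looks like speech and inference should run."""
        if not samples:
            return False
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        if rms > self.noise_floor * self.ratio:
            return True
        # Adapt the floor only on non-voice chunks so speech does not raise it.
        self.noise_floor += self.alpha * (rms - self.noise_floor)
        return False


gate = EnergyGate()
quiet = gate.is_voice([50] * 1280)           # 80 ms of low-level noise at 16 kHz
loud = gate.is_voice([5000, -5000] * 640)    # 80 ms of loud signal
```

Only chunks for which `is_voice` returns True would be passed on to the ONNX model; everything else costs one RMS computation.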

3. Buffered audio replay on gate open.

When the voice gate transitions from closed to open (someone starts speaking), Ava Pro replays the last few seconds of buffered audio through the ONNX model in a single batch. This means the beginning of the wake word — which happened while the gate was still deciding — is not lost. The user experiences instant wake detection without waiting for the model to "warm up" from silence.
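
The replay behaviour can be pictured as a small ring buffer of recent chunks that is flushed into the model when the gate opens. The sketch below is illustrative only: `PreRollBuffer` and the buffer length are assumptions, and chunks are represented as opaque values.

```python
from collections import deque


class PreRollBuffer:
    """Keeps recent chunks while the gate is closed and replays them when it opens."""

    def __init__(self, max_chunks=16):
        self.chunks = deque(maxlen=max_chunks)
        self.gate_open = False

    def push(self, chunk, voice):
        """Return the list of chunks that should be fed to the model for this step."""
        if voice and not self.gate_open:
            # Gate just opened: replay the buffered pre-roll plus the current chunk.
            self.gate_open = True
            replay = list(self.chunks) + [chunk]
            self.chunks.clear()
            return replay
        if voice:
            return [chunk]
        # Gate closed: remember the chunk, feed nothing to the model.
        self.gate_open = False
        self.chunks.append(chunk)
        return []


buffer = PreRollBuffer(max_chunks=2)
silent = [buffer.push(c, voice=False) for c in ["a", "b", "c"]]
on_open = buffer.push("d", voice=True)
```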

4. Incremental feature extraction.

vsWakeWord extracts log-Mel features from audio. Instead of recomputing the full feature window on every chunk, Ava Pro shifts the existing feature buffer and computes only the new frames. With a 1300ms window, 80ms chunks, and the 10ms hop from the manifest, this means computing about 8 new frames instead of all 128 every cycle, roughly a 16x reduction in FFT work.
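
The frame arithmetic and the buffer shift can be checked with a short sketch. It assumes the `feature_config` values from the built-in manifest (1300 ms window, 25 ms frame, 10 ms hop), that frames are counted as (window_ms - frame_ms) / hop_ms + 1, and that each chunk contributes chunk_ms / hop_ms new frames; the app's actual chunk alignment may differ, and the function names are hypothetical.

```python
def frames_in_window(window_ms=1300, frame_ms=25, hop_ms=10):
    """Number of feature frames that fit in one analysis window."""
    return (window_ms - frame_ms) // hop_ms + 1


def new_frames_per_chunk(chunk_ms=80, hop_ms=10):
    """Number of frames that become available with each new audio chunk."""
    return chunk_ms // hop_ms


def shift_in(feature_window, new_frames):
    """Drop the oldest frames and append the new ones; window length stays fixed."""
    return feature_window[len(new_frames):] + new_frames


total = frames_in_window()          # frames in the model input
fresh = new_frames_per_chunk()      # frames actually recomputed per cycle
window = shift_in(list(range(total)), list(range(total, total + fresh)))
```

Under these assumptions the window holds 128 frames, matching the 128×40 model input, and only 8 of them are new on each 80 ms cycle.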

5. Adaptive gain normalization.

vsWakeWord applies a smooth adaptive gain to normalize voice volume before inference. This is not about speed directly, but it means the model sees consistently-leveled audio regardless of distance or microphone sensitivity. Consistent input means the CTC confidence scores are more stable, which means the 2-hit confirmation gate reaches its threshold in fewer attempts — faster triggering with fewer false rejects.
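
A smooth adaptive gain of this kind might look like the sketch below. The target level, smoothing factor, and gain cap are hypothetical values, and the class is an illustration of the general technique rather than the normalizer Ava Pro ships.

```python
import math


class AdaptiveGain:
    """Moves the gain gradually toward the value that would hit a target RMS."""

    def __init__(self, target_rms=3000.0, smoothing=0.1, max_gain=8.0):
        self.target_rms = target_rms
        self.smoothing = smoothing
        self.max_gain = max_gain
        self.gain = 1.0

    def process(self, samples):
        """Update the gain from this chunk and return the scaled 16-bit samples."""
        rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
        if rms > 0:
            desired = min(self.max_gain, self.target_rms / rms)
            self.gain += self.smoothing * (desired - self.gain)
        # Clip to the signed 16-bit PCM range after scaling.
        return [max(-32768, min(32767, round(s * self.gain))) for s in samples]


normalizer = AdaptiveGain(target_rms=3000.0, smoothing=0.5, max_gain=8.0)
first = normalizer.process([1000, -1000] * 4)
second = normalizer.process([1000, -1000] * 4)
```

The smoothing factor keeps the level from jumping between chunks, so a quiet speaker is brought up over a few cycles rather than all at once.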

6. ONNX Runtime is a stripped build.

The ONNX Runtime shipped with Ava Pro is a custom reduced build — only the CPU execution provider, no GPU/NNAPI delegates, no training APIs. This makes model loading and session creation faster, and the native library is smaller to load into memory. The tradeoff is no hardware acceleration, but for wake-word-sized models the CPU path is already fast enough and avoids the latency and compatibility issues of GPU/NNAPI on fragmented Android devices.

Net effect: On a typical device in idle standby, Ava Pro's wake-word CPU usage is lower than the original because stop-word inference is skipped. When someone speaks, vsWakeWord's voice gate + buffered replay + incremental features make detection feel instant despite the heavier model. The microWakeWord engine path matches the original's speed; the vsWakeWord path trades higher peak CPU for smarter gating and faster perceived response.

Why Ava Pro doesn't use SAF folder picker for custom wake words

The original brownard/Ava ships a DocumentTreeWakeWordProvider that uses Android's Storage Access Framework (SAF). In Settings, you pick a folder, drop .tflite + .json files there, and the app loads them at runtime.

Ava Pro does not include this provider. The reasons are architectural:

Dual engine factory. Ava Pro's WakeWordDetectorFactory dispatches to either microWakeWord or vsWakeWord. The original SAF provider only knows microWakeWord's .tflite format. vsWakeWord uses .ort (ONNX) + manifest JSON with CTC phoneme targets — a completely different model format. A single SAF folder cannot serve both engines without complex format detection and validation logic.
ONNX Runtime safety. TFLite can load arbitrary .tflite files from user storage safely. ONNX Runtime is more sensitive to model format mismatches — an invalid .ort file could crash the native inference session or cause OOM on low-end devices. The asset-bundled approach guarantees models are validated at build time.
Build-time validation. By bundling models in assets/, the build system catches format errors, missing manifest fields, and phoneme inventory mismatches before the app ships. SAF-loaded models have no such guarantee — a malformed JSON could silently disable wake detection.
Scope. Ava Pro added vsWakeWord, dual wake words, voiceprint, visual feedback, and stop word optimization. Re-implementing SAF support for both engine formats safely was deprioritized in favor of these features.

How to add custom wake words to Ava Pro (APK injection, no source build required)

Ava Pro loads microWakeWord models from assets/wakeWords/ and vsWakeWord models from assets/vswakeword/ inside the APK. You can inject custom models without building from source — just edit the APK directly.

Tools needed:

Android file manager with APK editing: MT Manager (Chinese), APK Editor Pro, or Nexus APK Editor
Or on desktop: apktool + zipalign + apksigner

Steps (MT Manager method):

Download the Ava Pro lite APK from GitHub releases
Open MT Manager, long-press the APK, select "View" (or "Extract")
Navigate to assets/wakeWords/ (for microWakeWord) or assets/vswakeword/ (for vsWakeWord)
Copy your custom model files into the directory:
- microWakeWord: my_word.tflite + my_word.json
- vsWakeWord: my_word.ort + my_word.json
Save and repack the APK (MT Manager handles re-signing automatically)
Uninstall the old Ava first (Android rejects an update signed with a different key), then install the modified APK
Open Settings -> Voice Config -> Wake Word, your custom model appears in the list

Steps (desktop apktool method):

# Decompile
apktool d Ava-0.5.4.apk -o ava_decoded

# Add your model
cp my_word.tflite my_word.json ava_decoded/assets/wakeWords/

# For vsWakeWord
cp my_word.ort my_word.json ava_decoded/assets/vswakeword/

# Repack
apktool b ava_decoded -o Ava-custom.apk

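# Align first, then sign (running zipalign after apksigner invalidates the signature)
zipalign -v 4 Ava-custom.apk Ava-custom-aligned.apk
# ava-key.jks and the password are placeholders for your own keystore
apksigner sign --ks ava-key.jks --ks-pass pass:1234 --out Ava-custom-signed.apk Ava-custom-aligned.apk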

JSON format requirements:

For microWakeWord (same as V3 community format):

{
  "type": "micro",
  "wake_word": "My Custom Word",
  "author": "Your Name",
  "model": "my_word.tflite",
  "trained_languages": ["en"],
  "version": 2,
  "micro": {
    "probability_cutoff": 0.6,
    "sliding_window_size": 5,
    "feature_step_size": 10,
    "tensor_arena_size": 30000,
    "minimum_esphome_version": "2024.7.0"
  }
}

For vsWakeWord (see manifest structure above for full template).

Note: This is an advanced operation intended for power users; the models must be trained or downloaded from trusted sources. The APK injection method works because Ava Pro reads models from the APK's assets/ directory at runtime, the same way it loads the built-in models. No source code compilation is needed.

Compatibility: Both apps share the same microWakeWord model format (V2/V3 .tflite + .json). Models from TaterTotterson/microWakeWords V3 directory work directly. Just drop the .tflite + .json pair into assets/wakeWords/.

Custom Wake Words

microWakeWord: Download Pretrained or Train Your Own

Ava uses microWakeWords V3 format. The JSON config is identical to the V3 models from the community.

Option 1: Download Pretrained Models (Easiest)

The microWakeWords repo by TaterTotterson maintains a large library of pretrained V3 models. The format is directly compatible with Ava.

Browse the microWakeWordsV3 directory
Find a wake word you like (e.g., aleesa, angel, annika, arale, artamis, etc.)
Download both files: name.json and name.tflite
Place them in Ava's app/src/main/assets/wakeWords/ directory
Rebuild the APK

The V3 JSON format is identical to Ava's built-in models:

{
  "type": "micro",
  "wake_word": "ah_lehks_sah",
  "author": "Tater Totterson",
  "website": "https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon",
  "model": "ah_lehks_sah.tflite",
  "trained_languages": ["en"],
  "version": 2,
  "micro": {
    "probability_cutoff": 0.1,
    "sliding_window_size": 3,
    "feature_step_size": 10,
    "tensor_arena_size": 30000,
    "minimum_esphome_version": "2024.7.0"
  }
}

Option 2: Train Your Own with macOS Trainer (Apple Silicon)

If you have a Mac with Apple Silicon (M1/M2/M3/M4), you can train a custom wake word with a local web UI:

Install the trainer:
- Download the signed macOS app from WakeWord Trainer releases
- Or clone and run from source:
```
git clone https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon.git
cd microWakeWord-Trainer-AppleSilicon
./run.sh
```
- Open http://127.0.0.1:8789 in your browser
Train the wake word:
- Enter your wake phrase in the Trainer tab
- Choose language (en, or other Piper-supported languages)
- Optionally test pronunciation with Test TTS
- Click Start training
- The trainer uses Piper TTS to generate samples automatically
- Personal samples are optional but improve accuracy
Optionally capture real samples from devices:
- Flash a device with Tater firmware (from the Firmware tab)
- Enable Capture Wake Audio on the device
- Set Trainer App URL to http://<trainer-ip>:8789
- Review captured clips in the Captured Audio tab
- Mark good clips as "This is good", bad ones as "False wake"
Get the output files:
- Successful training produces:
  - trained_wake_words/<wake_word>.tflite
  - trained_wake_words/<wake_word>.json
Install into Ava:
- Copy both files to Ava's app/src/main/assets/wakeWords/ directory
- Rebuild the APK

Option 3: Train with Direct Script

git clone https://github.com/TaterTotterson/microWakeWord-Trainer-AppleSilicon.git
cd microWakeWord-Trainer-AppleSilicon
./train_microwakeword_macos.sh "hey_my_custom_word"

If personal_samples/*.wav or negative_samples/*.wav exist in the folder, they are included automatically.

JSON Config Parameters

Parameter	Description	Typical Value
`probability_cutoff`	Detection threshold (lower = easier to trigger, more false positives)	0.1 - 0.97
`sliding_window_size`	Frames to average before triggering	3 - 9
`feature_step_size`	Feature extraction step in ms	10
`tensor_arena_size`	TFLite arena size in bytes (must match model)	21000 - 30000
`minimum_esphome_version`	Minimum ESPHome version	2024.7.0
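
Before repacking or rebuilding, a config file can be sanity-checked with a small script. The sketch below is a hedged example: which keys Ava actually requires is an assumption based on the parameters listed above, and `validate_micro_config` is a hypothetical helper, not part of Ava.

```python
import json

REQUIRED_MICRO_KEYS = {
    "probability_cutoff", "sliding_window_size",
    "feature_step_size", "tensor_arena_size",
}


def validate_micro_config(text):
    """Return a list of problems; an empty list means the config looks plausible."""
    cfg = json.loads(text)
    problems = []
    if cfg.get("type") != "micro":
        problems.append("type must be 'micro'")
    if not str(cfg.get("model", "")).endswith(".tflite"):
        problems.append("model must name a .tflite file")
    micro = cfg.get("micro", {})
    missing = REQUIRED_MICRO_KEYS - set(micro)
    if missing:
        problems.append("missing micro keys: " + ", ".join(sorted(missing)))
    cutoff = micro.get("probability_cutoff")
    if isinstance(cutoff, (int, float)) and not 0.0 < cutoff < 1.0:
        problems.append("probability_cutoff must be between 0 and 1")
    return problems


example = json.dumps({
    "type": "micro",
    "model": "my_word.tflite",
    "micro": {"probability_cutoff": 0.6, "sliding_window_size": 5,
              "feature_step_size": 10, "tensor_arena_size": 30000},
})
issues = validate_micro_config(example)
```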

Note: microWakeWord requires a trained model for each wake word. There is no hot-swap — you need a .tflite file. But the V3 community library has 100+ pretrained models you can download directly.

vsWakeWord: Manifest Hot-Swap

vsWakeWord supports manifest JSON hot-swap. To create a custom wake word:

Train a CTC model using the vsWakeWord training pipeline (PyTorch → ONNX export)

Create a manifest JSON with your phoneme targets:

{
  "name": "my_custom_word",
  "format": "vs-wake-word-ctc-v1",
  "recommended_threshold": 0.6,
  "input": { "name": "input", "shape": [1, 128, 40], "dtype": "float32", "feature": "log_mel" },
  "output": { "name": "log_probs", "shape": [1, 49, 52], "dtype": "float32" },
  "feature_config": {
    "sample_rate": 16000, "window_ms": 1300, "frame_ms": 25, "hop_ms": 10,
    "n_fft": 512, "n_mels": 40, "f_min": 80.0, "f_max": 7600.0, "log_floor": 1e-06
  },
  "ctc": {
    "vocab_size": 52, "blank_id": 1, "pad_id": 0, "word_sep_id": 2,
    "wake_word_targets": [[your_phoneme_ids]],
    "wake_word_target_phonemes": [["your", "phonemes"]],
    "max_edit_distance": 1
  },
  "runtime": {
    "required_hits": 2, "hit_mode": "consecutive",
    "cooldown_ms": 2000, "high_confidence_bypass": 6.8
  },
  "stop_classifier": false
}

Place both files (id.json + id.ort) in assets/vswakeword/ directory
Rebuild the Ava APK

The 52-phoneme inventory uses IPA-style symbols. See an existing manifest (e.g., hey_jarvis.json) for the full inventory list.

Advantage: vsWakeWord's manifest-based approach lets you swap wake word targets without retraining the base model in some cases — just update the wake_word_targets phoneme IDs. However, for best accuracy, a model trained on your specific wake word is recommended.
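
When editing `wake_word_targets` by hand, a few internal consistency checks catch most mistakes before the APK is rebuilt. The sketch below is an assumption-based example: `validate_vs_manifest` is a hypothetical helper, and the rule that the input shape equals [1, (window_ms - frame_ms) / hop_ms + 1, n_mels] is inferred from the built-in manifest values rather than documented.

```python
def validate_vs_manifest(manifest):
    """Return a list of consistency problems found in a vsWakeWord manifest dict."""
    problems = []
    fc, ctc = manifest["feature_config"], manifest["ctc"]
    frames = (fc["window_ms"] - fc["frame_ms"]) // fc["hop_ms"] + 1
    if manifest["input"]["shape"] != [1, frames, fc["n_mels"]]:
        problems.append("input shape does not match feature_config")
    if manifest["output"]["shape"][-1] != ctc["vocab_size"]:
        problems.append("output width does not match vocab_size")
    targets, phonemes = ctc["wake_word_targets"], ctc["wake_word_target_phonemes"]
    if len(targets) != len(phonemes):
        problems.append("targets and phoneme lists differ in count")
    for ids, symbols in zip(targets, phonemes):
        if len(ids) != len(symbols):
            problems.append("a target and its phoneme list differ in length")
        if any(not 0 <= i < ctc["vocab_size"] for i in ids):
            problems.append("a target contains an ID outside the vocabulary")
        if ctc["blank_id"] in ids:
            problems.append("a target contains the CTC blank ID")
    return problems


hey_jarvis = {
    "input": {"shape": [1, 128, 40]},
    "output": {"shape": [1, 49, 52]},
    "feature_config": {"window_ms": 1300, "frame_ms": 25, "hop_ms": 10, "n_mels": 40},
    "ctc": {
        "vocab_size": 52, "blank_id": 1,
        "wake_word_targets": [[27, 9, 15, 2, 24, 44, 5, 3, 36, 41, 15, 38]],
        "wake_word_target_phonemes": [["h", "e", "ɪ", " ", "d", "ʒ", "ɑ", "ː", "ɹ", "v", "ɪ", "s"]],
    },
}
issues = validate_vs_manifest(hey_jarvis)
```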

How to Change Wake Word

Open Ava app
Go to Settings -> Voice Config
Find Wake Word Engine and choose microWakeWord or vsWakeWord
Find Wake Word 1 option
Select your preferred wake word from the list
Optionally configure Wake Word 2 for dual wake word mode
The new wake word takes effect after the service restarts, which happens automatically

Wake Word Sensitivity

Adjust sensitivity to control how easily the wake word triggers:

Higher sensitivity = easier to trigger, but more false positives
Lower sensitivity = fewer false positives, but may miss quiet speech

Wake Sound

Each wake word can have its own wake sound:

Wake Word 1 Sound: Played when Wake Word 1 is detected
Wake Word 2 Sound: Played when Wake Word 2 is detected
Default Sound: Used if no custom sound is set
None: Silent recording start

Wake Visual Feedback

Ava provides clear visual feedback during wake and conversation:

Wake Instant:

Colorful ripple expanding from screen center
Android 13+: RuntimeShader with distorted halo + star particles
Android 7: Soft circular diffusion
Compatibility paths for other versions

Conversation (when Floating Subtitle is disabled):

Full-screen edge glow that changes with state:
- Listening: Edge light breathes with microphone volume
- Processing: Slow breathing animation
- Speaking: Pulsates with TTS energy

Dual Wake-Word Color Coding:

Wake Word 1 = green (default)
Wake Word 2 = blue (default)
Ripple and edge light match the triggered wake word
Custom colors available in Settings → Extensions → Interface → Voice feedback colors
7 rainbow presets (red through purple) also available

Technical Notes:

Edge glow uses pre-rendered Gaussian blur bitmaps for performance
Ripple animation driven by system uptime (prevents Kiosk devices with "animation duration = 0" from killing the effect)
Android 7.0/7.1 optimized to clean circular diffusion without Shader dependency

Voiceprint Recognition

Ava Pro includes an experimental on-device smart voiceprint recognition feature. Once enabled, Ava learns frequent wake word callers from short local samples. On subsequent wake word triggers, it identifies who likely woke the device and publishes the result to a Home Assistant "Voiceprint Status" sensor.

This is not a heavyweight cloud voiceprint model. It is a lightweight local matcher purpose-built for Ava's fixed-wakeword scenario. It uses short local samples, audio feature extraction, quality gating, and conservative learning — no large model packages required.

How It Works

After you wake the device a few times, it learns who usually speaks
All audio stays local, nothing is uploaded to cloud
Two user slots available, each with a custom display name

Setup

Go to Settings -> Voice Config
Turn on Smart voiceprint recognition
Set User 1 name and User 2 name
Wake the device a few times for each user
The voiceprint status sensor appears in Home Assistant

Home Assistant Entity

sensor.your_device_name_voice_print_status

Audio Configuration

System Recording Mode

Choose how the microphone captures audio:

Mode	Description
Auto select	Lets the device choose. Best default
Speech boost	Tries to make speech clearer. May change tone
Normal pickup	Plain microphone input. Safest choice
Call boost	More like call audio. May over-process
Unprocessed	Closest to raw input. Good for testing

Audio Processing

Option	Description
Noise suppression	Use device noise suppressor when available
Automatic gain control	Use device AGC when available
Hardware echo cancellation	Enable hardware echo cancellation when supported
Software echo cancellation	Cancel Ava's own playback from microphone so wake words work during music/TTS playback

Software AEC is recommended when using music playback or TTS. It replaces the device echo canceler while enabled.

Software Mic Gain

Boost PCM audio before wake word detection and streaming. Range: 0-24 dB.
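
A decibel boost on 16-bit PCM is a multiplication by 10^(dB/20) followed by clipping. The sketch below shows that arithmetic; the function name is hypothetical and Ava's own implementation may differ in rounding or limiter behaviour.

```python
def apply_gain_db(samples, gain_db):
    """Scale signed 16-bit PCM samples by `gain_db` decibels, clipping to range."""
    if not 0 <= gain_db <= 24:
        raise ValueError("gain_db must be within 0-24 dB")
    factor = 10 ** (gain_db / 20)
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


unchanged = apply_gain_db([1000, -1000], 0)
boosted = apply_gain_db([100], 20)       # 20 dB is a factor of 10
clipped = apply_gain_db([20000, -20000], 24)
```

At the top of the range (24 dB, a factor of about 15.8) loud input clips quickly, so high settings are best reserved for distant or weak microphones.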

Audio Config Presets

Preset	Description
Standard 16kHz Mono	Default for voice recognition
Broadcast 48kHz Mono	Higher quality capture
Stereo Input 48kHz	Stereo recording
Low Latency 16kHz	Reduced latency
Voice Call Optimized	For voice call use case
Unprocessed Raw 48kHz	Raw capture, no processing
Compact 8kHz	Lowest bandwidth
CD Quality 44.1kHz	High quality
Studio 96kHz	Very high quality
Ultra HD 192kHz	Maximum quality

Voice Channel

Turning off the Voice Channel switch disables all voice input, wake word detection, and voice assistant threads. A service restart is required for the change to take effect.

Stop Words

Stop words interrupt the current conversation or stop Ava's response (e.g., timer alarm).

Stop Word	Engine	Model ID	Description
Stop	microWakeWord	`stop`	Default stop word for micro engine
Ok Stop	vsWakeWord	`ok_stop`	Stop word for vs engine (`stop_classifier: true`)

Stop-word detection only runs when actually needed (since 0.5.2):

A timer alarm is actively ringing
A voice session is in progress (Listening, Processing, Responding)

During idle standby with no alarm, the stop model is skipped — cutting CPU load and heat.
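
The activation rule above reduces to a single predicate. In the sketch below the state names follow the ones listed in this section; the `IDLE` state, the enum, and the function name are assumptions made for illustration.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"


ACTIVE_STATES = {VoiceState.LISTENING, VoiceState.PROCESSING, VoiceState.RESPONDING}


def stop_model_enabled(state, timer_ringing):
    """Run the stop-word model only during a voice session or a ringing alarm."""
    return timer_ringing or state in ACTIVE_STATES


idle_quiet = stop_model_enabled(VoiceState.IDLE, timer_ringing=False)
idle_alarm = stop_model_enabled(VoiceState.IDLE, timer_ringing=True)
```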

Continuous Conversation

Continuous conversation lets you issue multiple commands without saying the wake word each time.

How It Works

Say wake word + first command
After Ava responds, automatically enters listening mode
Say next command directly (no wake word needed)
After 10 seconds of silence, exits continuous conversation mode

End Conditions

Mode	Description
Exit Keyword Stop	End when the assistant says goodbye or farewell phrases (default)
Question Mark Continue	Keep listening only if the reply ends with "?"; stop after other replies
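
The two modes can be read as a decision on the assistant's reply text. The sketch below is illustrative only: the mode identifiers, the function name, and the list of farewell phrases are hypothetical, since the actual phrase list used by Ava is not given here.

```python
EXIT_PHRASES = ("goodbye", "bye", "see you")  # hypothetical farewell phrases


def should_keep_listening(reply, mode):
    """Decide whether to stay in continuous conversation after a reply."""
    text = reply.strip().lower()
    if mode == "question_mark_continue":
        # Keep listening only when the assistant asked something.
        return text.endswith("?")
    if mode == "exit_keyword_stop":
        # Keep listening unless the assistant said a farewell phrase.
        return not any(phrase in text for phrase in EXIT_PHRASES)
    raise ValueError("unknown mode: " + mode)


after_question = should_keep_listening("Which room?", "question_mark_continue")
after_farewell = should_keep_listening("Okay, goodbye!", "exit_keyword_stop")
```

In both modes the 10-second silence timeout described above still applies.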

How to Enable

Go to Settings -> Interaction
Turn on Continuous Conversation switch
Choose end condition mode

Conversation Subtitles

Conversation subtitles display what you said and Ava's response on screen as a floating window.

How to Enable

Go to Settings -> Interaction
Turn on Floating Subtitle switch

Mute Mode

Mute mode turns off the microphone, so Ava won't respond to any wake words.

How to Enable

Method 1: In Settings

Go to Settings -> Voice Config
Turn on Mute switch

Method 2: Home Assistant Control

service: switch.turn_on
target:
  entity_id: switch.your_device_name_mute

Settings Summary

Setting	Location	Description	Default
Device Name	Voice Config	Name shown in HA	device_model_voice_assistant
Port	Voice Config	ESPHome communication port	6053
Wake Word Engine	Voice Config	microWakeWord or vsWakeWord	microWakeWord
Wake Word 1	Voice Config	Primary wake word	Hey Jarvis
Wake Word 2	Voice Config	Secondary wake word	None
Sensitivity	Voice Config	Wake word sensitivity	-
Wake Sound	Voice Config	Prompt sound when recording starts	Optional
Voice Channel	Voice Config	Master voice input switch	On
System Recording Mode	Voice Config	Audio capture mode	Auto select
Noise suppression	Voice Config	Device noise suppressor	-
Automatic gain control	Voice Config	Device AGC	-
Hardware echo cancellation	Voice Config	Hardware AEC	-
Software echo cancellation	Voice Config	Software AEC for playback	-
Voiceprint recognition	Voice Config	Speaker recognition	Off
Software Mic Gain	Voice Config	PCM boost 0-24 dB	0
Audio Config	Voice Config	Capture preset	Standard 16kHz Mono
Mute	Voice Config	Turn off microphone	Off
Continuous Conversation	Interaction	Issue commands continuously	Off
Floating Subtitle	Interaction	Display conversation text	Off

Home Assistant Services

Manually Trigger Wake

service: esphome.your_device_name_trigger_wake
data: {}

Control Mute

# Enable mute
service: switch.turn_on
target:
  entity_id: switch.your_device_name_mute

# Disable mute
service: switch.turn_off
target:
  entity_id: switch.your_device_name_mute

Set Volume

service: media_player.volume_set
target:
  entity_id: media_player.your_device_name
data:
  volume_level: 0.8  # 0.0 - 1.0

Send Voice Command (Text)

service: text.set_value
target:
  entity_id: text.your_device_name_voice_command
data:
  value: "Turn on the living room light"
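
The same service calls can be made from a script through Home Assistant's REST API, which accepts a POST to /api/services/<domain>/<service> with a long-lived access token. The sketch below only builds the request so it can be inspected; the base URL, the token value, and the helper name are placeholders, and the entity ID is the example one used above.

```python
import json
import urllib.request


def build_service_request(base_url, token, domain, service, data):
    """Build (but do not send) a Home Assistant service-call request."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_service_request(
    "http://homeassistant.local:8123/",  # placeholder address
    "YOUR_LONG_LIVED_TOKEN",             # placeholder token
    "text", "set_value",
    {"entity_id": "text.your_device_name_voice_command",
     "value": "Turn on the living room light"},
)
# To actually send it: urllib.request.urlopen(request)
```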

FAQ

Ava can't hear me?

Check if microphone permission is granted
Check if mute mode is enabled
Check if Voice Channel is enabled
Make sure device volume isn't muted
Try speaking closer to device
Try a different System Recording Mode

Wake word recognition inaccurate?

Try a different wake word
Adjust wake word sensitivity
Make sure environment isn't too noisy
Enable noise suppression
Speak at moderate speed, pronounce clearly
Try vsWakeWord engine — its CTC phoneme matching is more robust to noise and accents

Wake word doesn't work during music playback?

Enable Software echo cancellation in Settings -> Voice Config
This cancels Ava's own playback from the microphone input

Speech recognition results wrong?

This is usually a Home Assistant issue:

Check HA voice assistant configuration
Make sure the speech-to-text components (Whisper, etc.) are working
Check network latency
Try speaking more clearly
