v2026.6.0 - Intercom Native polish and Espressif GMF audio stack
Hotfix after initial 2026.6.0 publication
Published on May 30, 2026 after field testing the first 2026.6.0 build.
esp_afe now uses a compile-time split between single-mic and dual-mic targets:
- Single-mic AFE profiles use the official ESP-SR direct feed/fetch path instead of the GMF AFE element path.
- Dual-mic AFE profiles keep the GMF manager/element path and raw-output selection behavior.
- Spotpear full AFE TCP/UDP profiles keep AFE VAD restore disabled by default to avoid restoring an unstable VAD state at boot.
- The full MWW safe-start logic now uses the Intercom API idle condition instead of string-matching the state name.
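The sketch below shows why an explicit idle condition is sturdier than matching a state name. The IntercomState enum, its members and both helper functions are hypothetical and are not the Intercom API's actual types.

```python
from enum import Enum, auto


class IntercomState(Enum):
    """Hypothetical call states; the real Intercom API may differ."""
    IDLE = auto()
    RINGING = auto()
    IN_CALL = auto()
    HANGING_UP = auto()


def may_start_wake_word_by_name(state_name: str) -> bool:
    # Fragile: silently returns False if the state is renamed or re-cased.
    return state_name == "Idle"


def may_start_wake_word(state: IntercomState) -> bool:
    # Robust: compares against an explicit idle condition, not a label.
    return state is IntercomState.IDLE
```

A renamed label ("Idle" becoming "IDLE") breaks the first check without any compile-time warning. The second check cannot drift this way.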
This hotfix keeps the public YAMLs pointed at main and removes local debug/telemetry from production profiles.
This release is the next major step after the 2026.5.0 PBX-lite migration.
2026.5.x introduced the new call model: ESP devices as independent extensions, Home Assistant as a peer/bridge, unified phonebook, TCP/UDP routing and browser softphone support.
2026.6.0 keeps that model and rebuilds the audio foundation under it.
🏠 Home Assistant / Intercom Native
The Home Assistant side has been cleaned up around the unified PBX-lite event model.
🔁 Unified call event model
The integration and Lovelace card now use the unified intercom_native.call_event event shape for session, bridge and forward updates.
This gives automations and the card a more consistent view of:
- call scope
- event type
- call state
- hangup / decline / failure reason
- bridge and forward lifecycle
The older split event behavior is no longer the preferred model.
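The following hypothetical sketch shows how automation-side code might normalize intercom_native.call_event payloads into one typed record. The field names (scope, type, state, reason) and the example values are assumptions inferred from the list above, not the integration's documented schema. Inspect a real event in Home Assistant before relying on them.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CallEvent:
    """Assumed shape of one intercom_native.call_event payload."""
    scope: str             # e.g. "session", "bridge", "forward" (assumed values)
    event_type: str        # event type
    state: str             # call state after the event
    reason: Optional[str]  # hangup / decline / failure reason, if any

    @classmethod
    def from_data(cls, data: Mapping[str, Any]) -> "CallEvent":
        # Field names here are assumptions, not the integration's schema.
        return cls(
            scope=str(data["scope"]),
            event_type=str(data["type"]),
            state=str(data["state"]),
            reason=data.get("reason"),
        )


def is_bridge_teardown(event: CallEvent) -> bool:
    # One handler filters on scope instead of listening to split event types.
    return event.scope == "bridge" and event.reason is not None
```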
📵 Better unavailable-device handling
The card now handles unavailable ESP devices more explicitly instead of showing stale call controls as if the device were still reachable.
This should make dashboard state clearer when an ESP is offline, rebooting, being flashed, or temporarily disconnected from Home Assistant.
⚡ Safer fast hangup / redial behavior
The browser softphone path has been hardened for fast user actions.
If a call is ended and another call starts immediately after, browser audio cleanup no longer tears down the new call's microphone/audio path by mistake.
This fixes a class of "second call has no browser audio" style problems.
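A common guard against this class of bug is to tag each call's audio resources with a call id and make cleanup release only what that id owns. The Python sketch below shows the pattern with a hypothetical AudioSession class. The actual card is JavaScript and may implement the guard differently.

```python
import itertools
from typing import Optional


class AudioSession:
    """Hypothetical holder for the browser audio path of successive calls."""

    _ids = itertools.count(1)

    def __init__(self) -> None:
        self.current_id: Optional[int] = None
        self.open_streams: set[int] = set()

    def start_call(self) -> int:
        call_id = next(self._ids)
        self.current_id = call_id
        self.open_streams.add(call_id)  # stands in for mic/audio setup
        return call_id

    def cleanup(self, call_id: int) -> None:
        # Release only the resources owned by this call.
        self.open_streams.discard(call_id)
        # A late cleanup of an old call must not clear the newer active call.
        if self.current_id == call_id:
            self.current_id = None
```

If cleanup for call 1 arrives after call 2 has already started, it removes only call 1's stream and leaves call 2's audio path untouched.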
📱 Mobile notification answer flow
The documented mobile flow now supports real Answer / Decline actions:
- Answer opens the dashboard view containing intercom-card with ?intercom_answer=1; the card then requests microphone permission and starts the full-duplex browser/app audio path
- Decline stays in Home Assistant automation logic and calls intercom_native.decline
This is the supported way to answer an ESP-originated call from the Home Assistant Companion app.
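If you generate the notification action in code, the Answer target is the dashboard view URL with intercom_answer=1 added to its query string. The sketch below uses only the Python standard library. The /lovelace/intercom path is a placeholder for your own view.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def answer_url(view_url: str) -> str:
    """Add intercom_answer=1 to a dashboard view URL, keeping other params."""
    parts = urlsplit(view_url)
    query = dict(parse_qsl(parts.query))
    query["intercom_answer"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, answer_url("/lovelace/intercom") yields "/lovelace/intercom?intercom_answer=1".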
🧹 Versioned card cache behavior
The card is registered with a versioned frontend URL derived from the installed integration version.
After upgrading, hard-refresh the dashboard page or clear the Companion app cache if the card still shows an old version.
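Deriving the card URL from the integration version is standard cache busting: the URL changes on every upgrade, so browsers request the new file. The sketch reads the version from a custom integration's manifest. The base path /intercom_native/intercom-card.js and the v query parameter are hypothetical, not the integration's actual registration.

```python
import json


def card_url(manifest_json: str, base: str = "/intercom_native/intercom-card.js") -> str:
    """Build a cache-busting card URL from manifest.json (hypothetical path)."""
    version = json.loads(manifest_json)["version"]
    return f"{base}?v={version}"
```

A new URL does not help if the dashboard page itself is cached and still references the old URL. This is why a hard refresh or Companion app cache clear can still be needed.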
✅ Minimum Versions
This release requires:
- ESPHome: 2026.5.x or newer
- Home Assistant Core: 2026.5.0 or newer
HACS metadata now declares the Home Assistant minimum version accordingly.
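Both projects use year.month.patch versions, so a minimum-version check can compare the numeric components one at a time. The sketch below treats a literal x in the minimum as "any value from here on". It handles only plain numeric versions. A suffix such as 2026.6.0b1 raises ValueError instead of being parsed.

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum; 'x' in minimum matches anything after it."""
    have = installed.split(".")
    need = minimum.split(".")
    for i, n in enumerate(need):
        if n == "x":
            return True  # remaining components are unconstrained
        h = int(have[i]) if i < len(have) else 0  # missing parts count as 0
        if h != int(n):
            return h > int(n)
    return True  # all constrained components are equal
```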
⚠️ Breaking Changes
Custom YAMLs that still use the old audio component/package layout need to be migrated.
Main migration points:
- maintained YAMLs now use esp_audio_stack
- old i2s_audio_duplex packages are no longer the supported path
- some YAML options were renamed:
  - speaker_volume -> master_volume
  - mic_attenuation -> input_gain
  - frame_buffers_in_psram -> buffers_in_psram
  - audio_stack_in_psram -> audio_task_stack_in_psram
- Generic full profiles are split into AEC and AFE variants
- full audio/LVGL profiles include OTA maintenance handling
- old copied Lovelace card files should be replaced by the bundled card
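In a custom YAML, the option renames above can be applied mechanically once the file is loaded into a dictionary. The helper below is hypothetical and has four limits:
- It renames only the four keys listed in this release.
- It does not check whether value semantics changed, for example whether an attenuation value maps one-to-one onto input_gain.
- It does not migrate the i2s_audio_duplex package layout.
- It does not load the YAML file itself; ESPHome tags such as !secret need a tolerant loader.

```python
from typing import Any

# Option renames listed in the 2026.6.0 breaking changes.
RENAMES = {
    "speaker_volume": "master_volume",
    "mic_attenuation": "input_gain",
    "frame_buffers_in_psram": "buffers_in_psram",
    "audio_stack_in_psram": "audio_task_stack_in_psram",
}


def rename_options(node: Any) -> Any:
    """Return a copy of a parsed YAML tree with old option keys renamed."""
    if isinstance(node, dict):
        return {RENAMES.get(k, k): rename_options(v) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_options(item) for item in node]
    return node  # scalars are returned unchanged
```

After renaming, compare the result against the closest maintained profile rather than trusting it blindly.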
After upgrading, clear the ESPHome build caches once before compiling, for example by running this from the directory that contains your YAMLs:
find . -type d -name .esphome -prune -exec rm -rf {} +
🎧 Audio Stack Migration
The biggest internal change in 2026.6.0 is the migration from the old custom duplex audio path to the new esp_audio_stack backend, which replaces the maintained i2s_audio_duplex path.
The goal is not just a component rename. The new backend is built around Espressif / ESP-IDF audio components that are designed to work together:
- esp_driver_i2s for official I2S channel ownership
- esp_codec_dev for codec-backed devices
- gmf_io / io_codec_dev for codec IO
- esp_audio_effects for rate, bit-depth and layout conversion
- esp-sr for Acoustic Echo Cancellation
- gmf_ai_audio / esp_gmf_afe_manager for the full Audio Front-End pipeline
This means the project now carries less custom audio infrastructure and relies more directly on the Espressif audio ecosystem.
💡 Why This Matters
Earlier versions had custom code for a lot of low-level audio work:
- I2S lifecycle
- speaker/microphone glue
- AEC reference routing
- rate conversion
- bit-depth conversion
- channel layout conversion
- ring buffers
- processor feed/fetch timing
- codec-specific assumptions
That worked, but it created too much maintenance pressure and too many board-specific edge cases.
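Here is a concrete example of two of those jobs: a pure-Python stereo-to-mono downmix combined with a 32-bit to 16-bit reduction. It only illustrates the arithmetic that now lives in esp_audio_effects and is not that library's API. Firmware would do this in C on fixed-size frames. This version simply truncates the low 16 bits.

```python
from typing import Sequence


def stereo32_to_mono16(samples: Sequence[int]) -> list[int]:
    """Downmix interleaved L/R int32 samples to mono int16.

    Each L/R pair is averaged, then the top 16 bits are kept via an
    arithmetic right shift, a plain truncating bit-depth reduction.
    """
    if len(samples) % 2:
        raise ValueError("interleaved stereo needs an even sample count")
    mono = []
    for i in range(0, len(samples), 2):
        mixed = (samples[i] + samples[i + 1]) // 2  # average L and R
        mono.append(mixed >> 16)                    # 32-bit -> 16-bit
    return mono
```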
With esp_audio_stack, the project is closer to the native ESP-IDF audio model while still exposing normal ESPHome surfaces above it:
- microphone
- speaker
- media player
- mixer
- Voice Assistant
- Micro Wake Word
- intercom API
- Home Assistant entities
🧩 Supported Audio Shapes
The maintained profiles now cover these layouts through the new stack:
- single-bus codec boards
- single-bus no-codec boards
- dual-bus MEMS mic + I2S amplifier boards
- ES8311 stereo playback-reference boards
- ES7210 + ES8311 TDM reference boards
- dual-mic AFE boards
- lightweight AEC-only Generic S3 profiles
- full AFE profiles for larger flash/RAM layouts
Codec-backed devices use esp_codec_dev.
No-codec devices use official esp_driver_i2s channels directly, avoiding unnecessary codec/GMF IO dependencies on smaller builds.
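The split amounts to a simple decision function. The sketch below is a hypothetical model of the rule described above, not code from the component. The Board type and its fields are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Board:
    """Hypothetical board description; field names are illustrative only."""
    name: str
    has_codec: bool  # e.g. an ES8311 or ES7210 is present


def io_backend(board: Board) -> str:
    # Codec-backed boards go through esp_codec_dev; no-codec boards open
    # esp_driver_i2s channels directly and skip codec/GMF IO dependencies.
    return "esp_codec_dev" if board.has_codec else "esp_driver_i2s"
```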
🎙️ AEC and AFE Profiles
Profiles are now split more clearly into a lightweight AEC tier and a full AFE tier.
🪶 esp_aec
Use this for lightweight echo cancellation.
It is the default direction for:
- intercom-only devices
- Generic S3 full-experience profiles that need to fit smaller flash layouts
- users who want Acoustic Echo Cancellation without the full Audio Front-End cost
🧠 esp_afe
Use this for the full Espressif Audio Front-End path.
It adds:
- Acoustic Echo Cancellation
- Noise Suppression
- Automatic Gain Control
- Voice Activity Detection
- Speech Enhancement / Blind Source Separation on supported dual-mic boards
It is heavier, but it is the right direction for boards with enough flash/RAM and for full voice-device profiles.
📦 Generic Profile Split
Generic S3 full-experience YAMLs are now split by intended target:
- generic-s3-full-aec-*
  - lightweight path
  - intended for 4 MB-friendly builds
  - uses standalone esp_aec
  - uses the lighter previous_frame reference
- generic-s3-full-afe-*
  - full Audio Front-End path
  - intended for larger flash layouts
  - uses esp_afe
  - uses TYPE2-style software reference
This avoids pretending one Generic YAML can fit every board and every flash layout.
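A quick calculation shows why the AFE variant needs a larger flash layout. With the usual two-slot OTA partition layout, the firmware must fit in flash twice. Two copies of the roughly 2.1 MB AFE build reported under Validation already exceed 4 MB.

The helper below is a hypothetical sketch of that check. reserved_bytes stands in for the bootloader, partition table, NVS and other data partitions. Its default of 0 is optimistic, and PSRAM, which this check ignores, also matters for AFE.

```python
MB = 1024 * 1024


def fits_dual_ota(firmware_bytes: int, flash_bytes: int, reserved_bytes: int = 0) -> bool:
    """True if two OTA app slots of firmware_bytes fit in flash_bytes."""
    return 2 * firmware_bytes + reserved_bytes <= flash_bytes


def generic_family(flash_bytes: int, afe_firmware_bytes: int) -> str:
    """Hypothetical helper: 'afe' -> generic-s3-full-afe-*, 'aec' -> generic-s3-full-aec-*."""
    return "afe" if fits_dual_ota(afe_firmware_bytes, flash_bytes) else "aec"
```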
🔊 Better AEC Reference Handling
Echo cancellation quality depends heavily on the playback reference.
The new stack handles reference routing per topology:
- ES8311 boards can use stereo digital feedback
- ES7210 TDM boards can use a hardware TDM reference slot
- no-codec Generic AEC profiles can use previous_frame
- Generic AFE profiles can use TYPE2-style software reference
This is one of the main reasons for the audio migration. AEC quality depends on reference timing, channel layout and conversion path, not only on enabling a library.
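The sketch below shows how a software reference can be aligned. It models one plausible reading of a previous-frame reference: each microphone frame is paired with the frame sent to the speaker one period earlier. This gives a fixed one-frame delay instead of a hardware loopback. The class and that interpretation are assumptions for illustration, not the component's implementation.

```python
from typing import List


class PreviousFrameReference:
    """Hypothetical software AEC reference with a fixed one-frame delay."""

    def __init__(self, frame_len: int) -> None:
        self._frame_len = frame_len
        # Start with silence so the first mic frame has a valid reference.
        self._previous: List[int] = [0] * frame_len
        self._current: List[int] = [0] * frame_len

    def on_playback(self, frame: List[int]) -> None:
        # Called once per period with the frame just sent to the speaker.
        if len(frame) != self._frame_len:
            raise ValueError("frame length mismatch")
        self._previous = self._current
        self._current = list(frame)

    def reference(self) -> List[int]:
        # Assumption: the echo in this period's mic frame comes mostly
        # from what was played one period ago.
        return self._previous
```

If the real acoustic delay is not close to one frame, this pairing feeds the canceller a misaligned reference. That is why the notes stress reference timing over simply enabling the library.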
🧠 Runtime and Memory Improvements
The migration also cleaned up runtime behavior:
- large buffers and task stacks are allocated earlier
- repeated heap churn during call/media transitions has been reduced
- microphone and speaker wrapper loops wake on real events instead of spinning
- intercom_api parks its loop when idle
- intercom TX uses lower-copy reads where possible
- full profiles place selected buffers/stacks in PSRAM
- full LVGL/audio profiles enter OTA maintenance mode before flashing
This helps demanding full-experience devices where media playback, Piper TTS, Micro Wake Word, Voice Assistant, AFE/AEC and intercom all coexist.
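The difference between spinning and waking on real events can be shown with threading.Event. The worker blocks until there is data or a stop request, so an idle loop uses no CPU. This is a Python analogy only. On the ESP side the equivalent would be a FreeRTOS primitive such as a task notification or event group. The class below is hypothetical.

```python
import threading
from collections import deque
from typing import Deque, List


class EventDrivenWorker:
    """Hypothetical audio loop that parks until work arrives."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()
        self._wake = threading.Event()
        self._stop = False
        self.processed: List[bytes] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def submit(self, frame: bytes) -> None:
        self._queue.append(frame)
        self._wake.set()  # wake the parked loop

    def stop(self) -> None:
        self._stop = True
        self._wake.set()
        self._thread.join()

    def _run(self) -> None:
        while True:
            self._wake.wait()      # blocks without spinning while idle
            self._wake.clear()     # clear before draining: no lost wakeups
            stopping = self._stop  # read before draining so no frame is skipped
            while self._queue:
                self.processed.append(self._queue.popleft())
            if stopping:
                return
```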
🧭 Maintained Board Direction
Current maintained baseline:
- Waveshare ESP32-S3 Audio Board: full AFE, dual mic, TDM reference
- Spotpear Ball v2: codec-backed AFE/intercom profiles
- Generic S3 AEC: lightweight 4 MB-friendly full-experience profiles
- Generic S3 AFE: larger flash full AFE profiles
- Generic dual-bus: maintained intercom profiles
- Waveshare P4 Touch: present and improving, still board-specific/experimental
🧪 Validation
Before this release, the public YAMLs were switched to remote release mode so users can download only the YAML and let ESPHome fetch packages, assets and external components from main.
Validation performed:
- HACS validation passes
- hassfest validation passes
- generic-s3-full-afe-tcp.yaml compiles successfully with ESPHome 2026.5.1
- ESPHome fetches this repository from main
- Espressif managed components resolve and build correctly
Generic full AFE firmware size from the validation build is about 2.1 MB.
⬆️ Upgrade Notes
Recommended upgrade path:
- Update the Home Assistant integration through HACS.
- Restart Home Assistant.
- Hard-refresh the dashboard page containing intercom-card.
- Clear ESPHome build cache once.
- Recompile from the updated YAMLs.
- Flash the ESP firmware.
If you maintain custom YAMLs, start from the closest maintained profile and reapply only your board-specific changes.