
v2026.6.0 - Intercom Native polish and Espressif GMF audio stack


@n-IA-hane n-IA-hane released this 29 May 14:16
· 742 commits to 5ce07fa54837218bca50b69172fee0fd476859f0 since this release


Hotfix after initial 2026.6.0 publication

Published on May 30, 2026 after field testing the first 2026.6.0 build.

  • esp_afe now uses a compile-time split between single-mic and dual-mic targets.
  • Single-mic AFE profiles use the official ESP-SR direct feed/fetch path instead of the GMF AFE element path.
  • Dual-mic AFE profiles keep the GMF manager/element path and raw-output selection behavior.
  • Spotpear full AFE TCP/UDP profiles keep AFE VAD restore disabled by default to avoid restoring an unstable VAD state at boot.
  • The full MWW safe-start logic now uses the Intercom API idle condition instead of string-matching the state name.

This hotfix keeps the public YAMLs pointed at main and removes local debug/telemetry from production profiles.

This release is the next major step after the 2026.5.0 PBX-lite migration.

2026.5.x introduced the new call model: ESP devices as independent extensions, Home Assistant as a peer/bridge, unified phonebook, TCP/UDP routing and browser softphone support.

2026.6.0 keeps that model and rebuilds the audio foundation under it.

🏠 Home Assistant / Intercom Native

The Home Assistant side has been cleaned up around the unified PBX-lite event model.

🔁 Unified call event model

The integration and Lovelace card now use the unified intercom_native.call_event event shape for session, bridge and forward updates.

This gives automations and the card a more consistent view of:

  • call scope
  • event type
  • call state
  • hangup / decline / failure reason
  • bridge and forward lifecycle

The older split event behavior is no longer the preferred model.

📵 Better unavailable-device handling

The card now handles unavailable ESP devices more explicitly instead of showing stale call controls as if the device were still reachable.

This should make dashboard state clearer when an ESP is offline, rebooting, being flashed, or temporarily disconnected from Home Assistant.

⚡ Safer fast hangup / redial behavior

The browser softphone path has been hardened for fast user actions.

If a call is ended and another call starts immediately after, browser audio cleanup no longer tears down the new call's microphone/audio path by mistake.

This fixes a class of "second call has no browser audio" problems.

📱 Mobile notification answer flow

The documented mobile flow now supports real Answer / Decline actions:

  • Answer opens the dashboard view containing intercom-card with ?intercom_answer=1
  • the card requests microphone permission and starts the full-duplex browser/app audio path
  • Decline stays in Home Assistant automation logic and calls intercom_native.decline

This is the supported way to answer an ESP-originated call from the Home Assistant Companion app.

🧹 Versioned card cache behavior

The card is registered with a versioned frontend URL derived from the installed integration version.

After upgrading, hard-refresh the dashboard page or clear the Companion app cache if the card still shows an old version.

✅ Minimum Versions

This release requires:

  • ESPHome: 2026.5.x or newer
  • Home Assistant Core: 2026.5.0 or newer

HACS metadata now declares the Home Assistant minimum version accordingly.
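A quick way to check an installed version against these minimums is a small shell comparison. This is only a sketch: `version_ge` is a hypothetical helper name, it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS), and it is meant for plain numeric versions, not beta or dev suffixes.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings numerically per component. If the required
  # minimum ($2) sorts first (or both are equal), the installed version is new enough.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 2026.5.1 2026.5.0 && echo ok` prints `ok`. To feed it the real ESPHome version, something like `version_ge "$(esphome version | awk '{print $2}')" 2026.5.0` should work, assuming `esphome version` prints a single line of the form `Version: X.Y.Z`.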

⚠️ Breaking Changes

Custom YAMLs that still use the old audio component/package layout need to be migrated.

Main migration points:

  • maintained YAMLs now use esp_audio_stack
  • old i2s_audio_duplex packages are no longer the supported path
  • some YAML options were renamed:
    • speaker_volume -> master_volume
    • mic_attenuation -> input_gain
    • frame_buffers_in_psram -> buffers_in_psram
    • audio_stack_in_psram -> audio_task_stack_in_psram
  • Generic full profiles are split into AEC and AFE variants
  • full audio/LVGL profiles include OTA maintenance handling
  • old copied Lovelace card files should be replaced by the bundled card
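The four option renames listed above can be applied mechanically to a custom YAML. The following is a minimal sketch, not an official migration tool: `migrate_audio_options` is a hypothetical helper name, and it only rewrites key names. The release notes do not say whether value ranges or units changed, so review each value by hand afterwards.

```shell
# Hypothetical helper: rename the four YAML option keys listed above in one file.
# Only key names are rewritten; values are left untouched.
migrate_audio_options() {
  mig_file=$1
  mig_tmp=$(mktemp) || return 1
  # Each expression matches the old key at the start of a line (after indentation),
  # so commented-out lines and longer key names are not rewritten.
  sed -E \
    -e 's/^([[:space:]]*)speaker_volume:/\1master_volume:/' \
    -e 's/^([[:space:]]*)mic_attenuation:/\1input_gain:/' \
    -e 's/^([[:space:]]*)frame_buffers_in_psram:/\1buffers_in_psram:/' \
    -e 's/^([[:space:]]*)audio_stack_in_psram:/\1audio_task_stack_in_psram:/' \
    "$mig_file" > "$mig_tmp" || { rm -f "$mig_tmp"; return 1; }
  # Keep a backup of the original, then overwrite the file in place.
  cp "$mig_file" "$mig_file.bak" && cat "$mig_tmp" > "$mig_file"
  mig_status=$?
  rm -f "$mig_tmp"
  return "$mig_status"
}
```

Usage: `migrate_audio_options my-device.yaml` leaves the original in `my-device.yaml.bak`. The switch to esp_audio_stack and the package layout changes still need to be done manually.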

After upgrading, clear ESPHome build caches once before compiling.

find . -type d -name .esphome -prune -exec rm -rf {} +
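Because the command above deletes directories recursively, it can be worth previewing what it will match first. A minimal sketch, where `list_esphome_caches` is a hypothetical helper name and not part of ESPHome:

```shell
# Hypothetical helper: list every .esphome build-cache directory under a root
# (default: current directory) without deleting anything.
list_esphome_caches() {
  # -prune stops find from descending into a matched .esphome directory;
  # -print takes the place of the destructive -exec rm -rf.
  find "${1:-.}" -type d -name .esphome -prune -print
}
```

Run `list_esphome_caches` from the directory that holds your ESPHome configs, and run the deletion only if the printed paths are the ones you expect.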

🎧 Audio Stack Migration

The biggest internal change in 2026.6.0 is the migration from the old custom duplex audio path to the new esp_audio_stack backend.

This replaces the maintained i2s_audio_duplex path.
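To see which custom YAMLs still reference the old component, a recursive grep is usually enough. This is a sketch under assumptions: `find_legacy_audio_yaml` is a hypothetical helper name, and it only looks for the literal string i2s_audio_duplex, so it will not catch other parts of the old layout.

```shell
# Hypothetical helper: list YAML files under a root (default: current directory)
# that still mention the old i2s_audio_duplex component.
find_legacy_audio_yaml() {
  # -r recurses, -l prints matching file names only.
  # .esphome build caches are skipped so cached copies are not reported.
  grep -rl --include='*.yaml' --include='*.yml' --exclude-dir=.esphome \
    -e 'i2s_audio_duplex' "${1:-.}"
}
```

The function exits non-zero when nothing matches, so `find_legacy_audio_yaml . || echo "no legacy references found"` reads naturally in a script.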

The goal is not just a component rename. The new backend is built around Espressif / ESP-IDF audio components that are designed to work together:

  • esp_driver_i2s for official I2S channel ownership
  • esp_codec_dev for codec-backed devices
  • gmf_io / io_codec_dev for codec IO
  • esp_audio_effects for rate, bit-depth and layout conversion
  • esp-sr for Acoustic Echo Cancellation
  • gmf_ai_audio / esp_gmf_afe_manager for the full Audio Front-End pipeline

This means the project now carries less custom audio infrastructure and relies more directly on the Espressif audio ecosystem.

💡 Why This Matters

Earlier versions had custom code for a lot of low-level audio work:

  • I2S lifecycle
  • speaker/microphone glue
  • AEC reference routing
  • rate conversion
  • bit-depth conversion
  • channel layout conversion
  • ring buffers
  • processor feed/fetch timing
  • codec-specific assumptions

That worked, but it created too much maintenance pressure and too many board-specific edge cases.

With esp_audio_stack, the project is closer to the native ESP-IDF audio model while still exposing normal ESPHome surfaces above it:

  • microphone
  • speaker
  • media player
  • mixer
  • Voice Assistant
  • Micro Wake Word
  • intercom API
  • Home Assistant entities

🧩 Supported Audio Shapes

The maintained profiles now cover these layouts through the new stack:

  • single-bus codec boards
  • single-bus no-codec boards
  • dual-bus MEMS mic + I2S amplifier boards
  • ES8311 stereo playback-reference boards
  • ES7210 + ES8311 TDM reference boards
  • dual-mic AFE boards
  • lightweight AEC-only Generic S3 profiles
  • full AFE profiles for larger flash/RAM layouts

Codec-backed devices use esp_codec_dev.

No-codec devices use official esp_driver_i2s channels directly, avoiding unnecessary codec/GMF IO dependencies on smaller builds.

🎙️ AEC and AFE Profiles

Profiles are now split more clearly.

🪶 esp_aec

Use this for lightweight echo cancellation.

It is the default direction for:

  • intercom-only devices
  • Generic S3 full-experience profiles that need to fit smaller flash layouts
  • users who want Acoustic Echo Cancellation without the full Audio Front-End cost

🧠 esp_afe

Use this for the full Espressif Audio Front-End path.

It adds:

  • Acoustic Echo Cancellation
  • Noise Suppression
  • Automatic Gain Control
  • Voice Activity Detection
  • Speech Enhancement / Blind Source Separation on supported dual-mic boards

It is heavier, but it is the right direction for boards with enough flash/RAM and for full voice-device profiles.

📦 Generic Profile Split

Generic S3 full-experience YAMLs are now split by intended target:

  • generic-s3-full-aec-*

    • lightweight path
    • intended for 4 MB-friendly builds
    • uses standalone esp_aec
    • uses the lighter previous_frame reference
  • generic-s3-full-afe-*

    • full Audio Front-End path
    • intended for larger flash layouts
    • uses esp_afe
    • uses TYPE2-style software reference

This avoids pretending one Generic YAML can fit every board and every flash layout.

🔊 Better AEC Reference Handling

Echo cancellation quality depends heavily on the playback reference.

The new stack handles reference routing per topology:

  • ES8311 boards can use stereo digital feedback
  • ES7210 TDM boards can use a hardware TDM reference slot
  • no-codec Generic AEC profiles can use previous_frame
  • Generic AFE profiles can use TYPE2-style software reference

This is one of the main reasons for the audio migration. AEC quality depends on reference timing, channel layout and conversion path, not only on enabling a library.

🧠 Runtime and Memory Improvements

The migration also cleaned up runtime behavior:

  • large buffers and task stacks are allocated earlier
  • repeated heap churn during call/media transitions has been reduced
  • microphone and speaker wrapper loops wake on real events instead of spinning
  • intercom_api parks its loop when idle
  • intercom TX uses lower-copy reads where possible
  • full profiles place selected buffers/stacks in PSRAM
  • full LVGL/audio profiles enter OTA maintenance mode before flashing

This helps demanding full-experience devices where media playback, Piper TTS, Micro Wake Word, Voice Assistant, AFE/AEC and intercom all coexist.

🧭 Maintained Board Direction

Current maintained baseline:

  • Waveshare ESP32-S3 Audio Board: full AFE, dual mic, TDM reference
  • Spotpear Ball v2: codec-backed AFE/intercom profiles
  • Generic S3 AEC: lightweight 4 MB-friendly full-experience profiles
  • Generic S3 AFE: larger flash full AFE profiles
  • Generic dual-bus: maintained intercom profiles
  • Waveshare P4 Touch: present and improving, still board-specific/experimental

🧪 Validation

Before this release, the public YAMLs were switched to remote release mode so users can download only the YAML and let ESPHome fetch packages, assets and external components from main.

Validation performed:

  • HACS validation passes
  • hassfest validation passes
  • generic-s3-full-afe-tcp.yaml compiles successfully with ESPHome 2026.5.1
  • ESPHome fetches this repository from main
  • Espressif managed components resolve and build correctly

Generic full AFE firmware size from the validation build is about 2.1 MB.

⬆️ Upgrade Notes

Recommended upgrade path:

  1. Update the Home Assistant integration through HACS.
  2. Restart Home Assistant.
  3. Hard-refresh the dashboard page containing intercom-card.
  4. Clear ESPHome build cache once.
  5. Recompile from the updated YAMLs.
  6. Flash the ESP firmware.
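
Steps 4 to 6 can be scripted per device. The following is a sketch rather than a supported tool: `run` and `upgrade_device` are hypothetical helper names, and it assumes the `esphome` CLI is on the PATH and that the script runs from the directory holding your configs. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Steps 4-6 for one device config. Steps 1-3 happen in Home Assistant and the browser.
upgrade_device() {
  up_cfg=$1
  # Step 4: clear ESPHome build caches below the current directory.
  run find . -type d -name .esphome -prune -exec rm -rf {} + || return 1
  # Step 5: recompile from the updated YAML.
  run esphome compile "$up_cfg" || return 1
  # Step 6: flash the firmware.
  run esphome upload "$up_cfg"
}
```

Try `DRY_RUN=1 upgrade_device my-device.yaml` first to confirm the three commands, then run it without `DRY_RUN` to perform the upgrade.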

If you maintain custom YAMLs, start from the closest maintained profile and reapply only your board-specific changes.