Skip to content

Isolate receiver-side remote builds under per-build ESPHOME_DATA_DIR#578

Merged
bdraco merged 7 commits into
mainfrom
receiver-remote-build-isolated-data-dir
May 11, 2026
Merged

Isolate receiver-side remote builds under per-build ESPHOME_DATA_DIR#578
bdraco merged 7 commits into
mainfrom
receiver-remote-build-isolated-data-dir

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented May 11, 2026

What does this implement/fix?

User-reported Install failed with download_artifacts failed: receiver rejected (build_dir_missing) after a successful remote compile. Confirmed via traceback from the receiver:

FileNotFoundError: StorageJSON sidecar missing for
.esphome/.remote_builds/8Ja02_-QQeziVZemmcO2BZOnaLAaVSgq/acfloatmonitor32/acfloatmonitor32.yaml

Root cause

The receiver-side compile subprocess and the download-time reader resolved CORE.data_dir independently and disagreed:

  • Writer (subprocess): esphome computes CORE.data_dir = dirname(CORE.config_path)/.esphome, so a YAML at .esphome/.remote_builds/<id>/<device>/<device>.yaml lands its storage at <subtree>/.esphome/storage/<basename>.json.
  • Reader (dashboard process): pack_build_artifacts(configuration)load_build_artifacts(configuration)StorageJSON.load(ext_storage_path(configuration)), which uses the dashboard's CORE.data_dir (the sentinel YAML's parent + .esphome) and the full configuration string as the storage key.

Different paths → silent FileNotFoundErrorbuild_dir_missing reject → offloader's pick_build_path falls back to LOCAL on every install after the first. In HA-addon mode the symptom is different but the same class of bug — ESPHOME_DATA_DIR=/data is inherited by the subprocess, so two paired offloaders submitting kitchen.yaml collide on /data/storage/kitchen.yaml.json.

Fix

1. Pin ESPHOME_DATA_DIR on the receiver-side compile subprocess

<config_dir>/.esphome/.remote_builds/<dashboard_id>/<device>/

Every per-config artefact esphome writes (storage sidecar, idedata cache, build directory, PlatformIO project) now lands under one (dashboard_id, device)-keyed directory regardless of deployment mode (default / HA-addon / explicit ESPHOME_DATA_DIR env). The reader routes the configuration through remote_build_layout.parse_from_configuration and reads from the same per-build subtree — writer and reader agree on one path.

Why pin per-build rather than mirror esphome's natural dirname rule:

  • HA-addon mode would still collide on /data/storage/<basename>.json if we only fixed the read path.
  • Implicit-rule coupling is brittle; if upstream esphome changes the CORE.data_dir rule we silently drift. Setting the env explicitly means writer and reader share one declarative source.
  • Per-build isolation is correctness, not just performance: two offloaders submitting the same device name would otherwise share a build directory and one might end up flashing the other's .o files.

Same-offloader incremental builds for the same device still hit the warm build dir (same subtree across consecutive compiles). Cross-offloader collisions become from-scratch builds — the safe choice. PlatformIO's toolchain cache in ~/.platformio/packages/ is outside the data_dir and shared regardless, so heavy downloads aren't duplicated.

2. Centralise StorageJSON path resolution in helpers/storage_path.py

Audit of every ext_storage_path caller in the project shows all 11 sites today receive bare-basename configurations (the user's local YAMLs) — none of them ever see a remote-build configuration in production. So fixing just load_build_artifacts was sufficient today, but it relied on the invisible contract "no caller passes a remote-build path to ext_storage_path." A future caller forgetting that costs another silent build_dir_missing reject.

New module helpers/storage_path.py exposes:

  • resolve_data_dir(configuration) — per-build subtree for remote builds, CORE.data_dir otherwise.
  • resolve_storage_path(configuration) — sidecar path. Drop-in replacement for ext_storage_path everywhere in this project.
  • resolve_idedata_path(configuration, *, name) — idedata cache path.

The fork happens once, here. Every project-internal ext_storage_path caller is migrated:

  • controllers/firmware/controller.pyget_binaries / download / _verify_chip
  • controllers/config.pyget_compiled_device_info
  • controllers/devices/controller.py — storage write / version persist / archive listing
  • controllers/devices/helpers.py — wipe / remove / archive sidecar helpers
  • helpers/config_hash.py — build-info hash read
  • helpers/build_size.py
  • helpers/device_yaml.py — scanner-side load
  • helpers/build_artifacts.pyload_build_artifacts (already covered, now via the shared helper)

Local-only callers see identical behaviour because parse_from_configuration returns None for bare-basename inputs and the resolver falls through to upstream ext_storage_path's shape. A new caller that happens to receive a remote-build configuration is now correct by construction.

Why a separate module rather than co-locating in helpers/build_artifacts: build_artifacts is the flash-image discovery + idedata-manifest loader; the path resolution is a more general primitive several unrelated call sites need. Keeping the helper in its own module means those callers don't pull the artifacts-loader's transitive deps.

What lands

  • helpers/remote_build_layout.py — new RemoteBuildPath.data_dir(config_dir) accessor returning the per-build subtree. Shared by the writer-side env override and the reader-side path resolution.
  • helpers/storage_path.py — new module with resolve_data_dir / resolve_storage_path / resolve_idedata_path. Routes the configuration through the layout helper for remote-build paths, falls through to ext_storage_path for local YAMLs.
  • controllers/firmware/controller.py — env composition factored into _compose_subprocess_env(job). For receiver-side remote-build jobs (configuration parses through the layout helper) it pins ESPHOME_DATA_DIR; for local jobs it inherits unchanged. All existing ext_storage_path calls routed through resolve_storage_path.
  • helpers/build_artifacts.pyload_build_artifacts now reads through resolve_storage_path / resolve_idedata_path; the inline data_dir helpers it previously carried moved to storage_path as the public API.
  • 7 other production filesext_storage_path import + every call site switched to resolve_storage_path (or resolve_idedata_path where applicable). No behaviour change for any of them; all receive basenames.
  • tests/_storage_fixtures.pywrite_storage_json gains an optional data_dir= override so the per-build subtree shape can be modelled without rolling a new payload schema per test. Always keys the sidecar on Path(configuration).name regardless of branch — matches esphome's CORE.config_filename rule.
  • tests/controllers/firmware/test_subprocess_env.py — new file: three unit tests pinning the env override fork (local → no override, remote → per-build subtree, malformed remote path → falls through).
  • tests/test_storage_path.py — new file: seven unit tests pinning the resolver fork at the module-public seam (local vs remote, basename keyspace, idedata sibling).
  • tests/test_build_artifacts.py — covers the storage path inputs end-to-end via load_build_artifacts; redundant private-helper tests removed (the new test_storage_path.py is the focused entry point).
  • tests/e2e/test_install_round_trip.py_write_build_artifacts_on_disk updated to mirror the per-build write layout the receiver-side compile actually produces; the existing end-to-end download_artifacts round-trip test now exercises the new path.
  • 15 other test files — monkeypatches updated from ext_storage_pathresolve_storage_path so the rebound name matches the production import.

Testing

  • pytest tests/ -q → 3006 passed, 4 skipped.
  • The receiver-side e2e test test_remote_install_submit_then_lifecycle_then_download_on_one_session would fail under the previous (buggy) layout — it now reads where the writer landed.
  • New unit tests would fail any future regression on the env override or the resolver fork.

Related issue or feature (if applicable):

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed
  • Companion frontend PR: esphome/device-builder-dashboard-frontend#

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited.
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

A paired offloader's firmware/install silently rejected
build_dir_missing after a successful remote compile. The
receiver-side compile subprocess wrote storage / idedata / build
under esphome's default data_dir (computed from the YAML's
parent or the dashboard's ESPHOME_DATA_DIR env), but the
download-time reader resolved through ext_storage_path against
the dashboard process's own CORE.data_dir — different path,
silent FileNotFoundError, the offloader's pick_build_path
fell back to LOCAL on every install after the first.

Fix the read/write asymmetry by pinning the receiver-side
compile subprocess's ESPHOME_DATA_DIR explicitly to the
per-build subtree:

    <config_dir>/.esphome/.remote_builds/<dashboard_id>/<device>/

Every per-config artefact (storage sidecar, idedata cache,
build directory, PlatformIO project) now lands under one
(dashboard_id, device)-keyed directory regardless of
deployment mode (default / HA-addon / explicit
ESPHOME_DATA_DIR override). Reader-side
load_build_artifacts parses the configuration through
remote_build_layout.parse_from_configuration and reads from
the same per-build subtree, so writer and reader agree on
one path. The 6c TTL sweep already walks RemoteBuildPath.subtree
so the whole per-build state (now including the esphome data
dir) reclaims in one shutil.rmtree.

Tracked in issue #106 phase 7a-5; follow-up design notes for
remote CLEAN / RESET_BUILD_ENV under the new isolation model
are captured in the issue comment thread.
@bdraco bdraco added the bugfix Bug fix label May 11, 2026
Copilot AI review requested due to automatic review settings May 11, 2026 05:28
@bdraco bdraco added the bugfix Bug fix label May 11, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 11, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing receiver-remote-build-isolated-data-dir (fe8c5e9) with main (70c16a7)

Open in CodSpeed

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.16%. Comparing base (3ce9281) to head (fe8c5e9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #578   +/-   ##
=======================================
  Coverage   99.16%   99.16%           
=======================================
  Files          81       82    +1     
  Lines       10508    10536   +28     
=======================================
+ Hits        10420    10448   +28     
  Misses         88       88           
Flag Coverage Δ
py3.12 99.13% <100.00%> (+<0.01%) ⬆️
py3.14 99.16% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
esphome_device_builder/controllers/config.py 97.22% <100.00%> (+<0.01%) ⬆️
...e_device_builder/controllers/devices/controller.py 99.38% <100.00%> (+<0.01%) ⬆️
...home_device_builder/controllers/devices/helpers.py 99.34% <100.00%> (+<0.01%) ⬆️
..._device_builder/controllers/firmware/controller.py 100.00% <100.00%> (ø)
esphome_device_builder/helpers/build_artifacts.py 100.00% <100.00%> (ø)
esphome_device_builder/helpers/build_size.py 100.00% <100.00%> (ø)
esphome_device_builder/helpers/config_hash.py 100.00% <100.00%> (ø)
esphome_device_builder/helpers/device_yaml.py 98.76% <100.00%> (+<0.01%) ⬆️
...home_device_builder/helpers/remote_build_layout.py 100.00% <100.00%> (ø)
esphome_device_builder/helpers/storage_path.py 100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes receiver-side remote-build artifact resolution by isolating each remote build under a per-build ESPHOME_DATA_DIR subtree, ensuring the receiver’s compile subprocess and the later artifact reader agree on where ESPHome writes StorageJSON, idedata, and build outputs. This addresses download_artifacts failed: receiver rejected (build_dir_missing) after successful remote compiles and prevents cross-offloader collisions in HA add-on mode.

Changes:

  • Add RemoteBuildPath.data_dir(config_dir) as the single source of truth for the per-build ESPHOME_DATA_DIR.
  • Pin ESPHOME_DATA_DIR for receiver-side remote-build compile subprocesses via FirmwareController._compose_subprocess_env.
  • Update artifact loading to resolve the correct data dir for remote-build configurations and adjust tests/fixtures accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
esphome_device_builder/helpers/remote_build_layout.py Adds RemoteBuildPath.data_dir() accessor to standardize per-build data-dir selection.
esphome_device_builder/controllers/firmware/controller.py Factors subprocess env composition and pins ESPHOME_DATA_DIR for receiver remote-build jobs.
esphome_device_builder/helpers/build_artifacts.py Resolves data_dir consistently for remote-build configs and loads StorageJSON/idedata from that subtree.
tests/_storage_fixtures.py Extends write_storage_json with data_dir override to model per-build layout in tests.
tests/controllers/firmware/test_subprocess_env.py Adds unit tests validating env composition for local vs remote-build configurations.
tests/test_build_artifacts.py Adds coverage for _resolve_data_dir and _idedata_path_in behavior.
tests/test_remote_build_artifacts_download.py Updates monkeypatch target for renamed idedata-path helper.
tests/e2e/test_install_round_trip.py Updates e2e artifact-on-disk fixture to mirror the new per-build ESPHOME_DATA_DIR layout.

Comment thread tests/_storage_fixtures.py Outdated
Every project-internal caller now routes through
helpers.storage_path.resolve_storage_path (and its lower-level
resolve_data_dir / resolve_idedata_path siblings) instead of
calling upstream esphome.storage_json.ext_storage_path
directly. The new helper forks on whether the configuration
parses as a remote-build path: receiver-side remote builds
resolve to the per-build subtree the compile subprocess writes
into (ESPHOME_DATA_DIR pin from earlier in this PR), everything
else falls through to ext_storage_path's existing CORE.data_dir
behaviour.

Why a separate seam: the previous shape relied on the invisible
contract "no caller passes a remote-build path to
ext_storage_path." That happened to hold today but the cost of
a future caller forgetting is another silent build_dir_missing
reject. With one resolver every consumer is automatically
correct regardless of where the configuration came from.

Why a separate module rather than inside build_artifacts: path
resolution is a more general primitive several unrelated callers
need (config_hash, build_size, devices, firmware get_binaries /
download, get_compiled_device_info, device_yaml scanner, mdns
version persistence). Keeping the helper in its own module
means those callers do not pull the artifacts loader.

Callers migrated:
 * controllers/firmware/controller.py (get_binaries, download,
   _verify_chip)
 * controllers/config.py (get_compiled_device_info)
 * controllers/devices/controller.py (storage write, version
   persist, archive listing)
 * controllers/devices/helpers.py (wipe / remove / archive
   sidecar helpers)
 * helpers/config_hash.py (build-info hash read)
 * helpers/build_size.py
 * helpers/device_yaml.py (scanner-side load)

Tests updated to monkeypatch the new resolve_storage_path
binding in each module rather than the old ext_storage_path.
A new test_storage_path.py pins the resolver fork directly.
Per Copilot review on #578: the helper's docstring promised
basename-keyed sidecars (matching esphome's
``CORE.config_filename`` shape) but the default branch keyed on
the raw ``configuration`` argument. A future caller that passed
``subdir/kitchen.yaml`` would crash on the missing intermediate
``storage/subdir/`` directory. None of today's callers hit that
path (every existing test passes a bare basename), but the
behaviour and the docstring drifted; tighten both onto
``Path(configuration).name`` so the two branches behave
identically regardless of whether *configuration* arrives
basename or nested.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

Comment thread esphome_device_builder/controllers/firmware/controller.py Outdated
Comment thread tests/e2e/test_install_round_trip.py Outdated
Comment thread esphome_device_builder/controllers/firmware/controller.py
Comment thread esphome_device_builder/controllers/firmware/controller.py
bdraco added 2 commits May 11, 2026 01:02
Per second-round Copilot review on #578: after the centralization
moved the resolver from helpers.build_artifacts._resolve_data_dir
to helpers.storage_path.resolve_data_dir and every call site
from ext_storage_path() to resolve_storage_path(), several
comments and docstrings still pointed at the old names.

* controllers/firmware/controller.py: _compose_subprocess_env
  docstring, plus the get_binaries / download
  traversal-validation rationale (the basename collapse argument
  is now wrong-shaped — resolve_storage_path keys on
  Path(configuration).name, so the validator's job is to stop a
  traversal-shaped configuration from reaching the closure at
  all, not to compensate for ext_storage_path's missing
  sanitisation).
* tests/e2e/test_install_round_trip.py: _write_build_artifacts_on_disk
  docstring still cited the removed private helper.
* tests/test_build_artifacts.py: _write_idedata docstring's
  cross-reference.

No behaviour change; just keeps the docs honest so the security
rationale doesn't mislead future readers.
Commit 7bc1cd5 bundled two unrelated edits that came in from a
parallel branch's working tree: the compile→upload progress
reset in remote_runner._fetch_and_run_local_upload plus its
matching test. Those belong in their own PR; revert just those
two files so this branch stays focused on the stale-reference
docstring fixes.

No behaviour change to anything that was already in this PR —
the docstring / comment updates from 7bc1cd5 are untouched.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.

Comment thread esphome_device_builder/controllers/devices/helpers.py Outdated
bdraco added 2 commits May 11, 2026 01:16
…ing resolver

Per third-round Copilot review on #578: the docstring claimed
``resolve_storage_path(configuration) -> data_dir/storage/<configuration>.json``,
which was true for the old ``ext_storage_path`` but not for the
new resolver — ``resolve_storage_path`` collapses to
``Path(configuration).name`` so the sidecar lands at
``data_dir/storage/<basename>.json``.

Rewrite the traversal-analysis paragraph to:

* Distinguish the two callers' shapes: the archive path
  joins the raw configuration (no basename collapse — full
  traversal exposure), the storage path is already
  basename-collapsed by the resolver (so the segments can't
  escape ``data_dir/storage``).
* Spell out the remaining storage-side risk under the
  collapsing resolver: a value like ``../etc/passwd``
  collapses to ``passwd.json`` and lets a caller target an
  attacker-named sidecar inside ``data_dir/storage``.
* Keep the bottom line: rejecting non-basename inputs at the
  WS boundary closes both gaps regardless of which downstream
  helper consumes the value.

No behaviour change — the validator's actual rules are
unchanged.
That markdown is the planning doc for the compile→upload
progress-reset work that lives on a parallel branch (same
provenance as the remote_runner.py / test_remote_runner.py
leakage backed out in bede49b). Doesn't belong in #578.
@bdraco bdraco merged commit 62fecb2 into main May 11, 2026
13 checks passed
@bdraco bdraco deleted the receiver-remote-build-isolated-data-dir branch May 11, 2026 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants