Skip to content

Fix Bundle file not found on every remote-build submit#552

Merged
bdraco merged 1 commit into
mainfrom
fix-submit-job-bundle-path
May 10, 2026
Merged

Fix Bundle file not found on every remote-build submit#552
bdraco merged 1 commit into
mainfrom
fix-submit-job-bundle-path

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented May 10, 2026

What does this implement/fix?

Fixes "Bundle file not found" on every remote build the operator dispatches through remote_build/submit_job.

What was happening

SubmitJobReceiver._extract_and_queue was writing the assembled tarball to:

<config>/.esphome/.remote_builds/<dashboard_id>/<device_name>/bundle.tar.gz

…and then calling esphome.bundle.prepare_bundle_for_compile(bundle_path, target_dir) against that same <dashboard_id>/<device_name>/ directory.

The upstream helper preserves only .esphome / .pioenvs / .pio inside target_dir and wipes every other entry before its inner extract_bundle reads from bundle_path (esphome/bundle.py:752-758). So the just-written bundle.tar.gz got deleted between the executor write and the inner extract, and extract_bundle raised EsphomeError("Bundle file not found: …") at bundle.py:467. The receiver-side submit_job handler caught that, mapped it to a structured artifacts_end{accepted: false, reason: "extract_failed"} reject, and the user saw a failed build with this log line:

WARNING [remote_build.submit_job] submit_job from <dashboard_id>: extract failed for
job <job_id> (<config>.yaml): Bundle file not found:
<config>/.esphome/.remote_builds/<dashboard_id>/<device>/bundle.tar.gz

What changes

bundle_path moves from target_dir / "bundle.tar.gz" to target_dir.parent / f"{device_name}.tar.gz" — a sibling of the build subtree under the same <dashboard_id>/ parent. The wipe step can't reach it.

Path-traversal safety carries over unchanged: device_name is the canonical form _validate_configuration_filename returns (which already strips separators / ..), so a malicious sender still can't pick a shape that climbs out of the <dashboard_id>/ namespace. The constant rename (_BUNDLE_FILENAME_BUNDLE_SUFFIX) flags the shape change; the comment on the new constant explains why "outside target_dir" is load-bearing.

Why the existing tests didn't catch it

  • Unit tests (tests/test_remote_build_submit_job.py): every test that exercises the extract path stubs prepare_bundle_for_compile with a trivial pass-through that just asserts the bundle file exists at write time and returns a synthetic YAML path. They never modelled the upstream wipe semantics, so a regression in where the bundle lives relative to target_dir was invisible at the unit level.
  • E2E tests (tests/e2e/): the harness has tests for pair/session lifecycle, JOB_* fan-out, cancel_job, and (newly in Add e2e coverage for download_artifacts round-trip (6a) #549) download_artifacts — but no test drives the offloader→receiver submit_job round-trip with the real upstream prepare_bundle_for_compile. test_submit_job_fanout.py's module docstring explicitly calls out the gap: "A submit_job e2e test is the natural next follow-up once the receiver-side firmware controller can be swapped for a recording stub without coupling the harness to DeviceBuilder."

So both the unit suite and the e2e harness had a blind spot at the same junction. New regression in this PR:

  • test_submit_job_bundle_path_survives_prepare_bundle_wipe stubs prepare_bundle_for_compile with a wipe-emulating function (mirrors upstream's preserve set, deletes non-preserved entries in target_dir, then reads bundle_path). A regression that moves the bundle back inside target_dir trips the read_bytes() arm and the test fails. This catches the bug pattern without needing to build a real esphome bundle.

A real submit_job e2e test (constructing a real bundle via esphome.bundle.BundleBuilder, exercising upstream's prepare_bundle_for_compile end-to-end) is the broader fix for the e2e gap — separate follow-up.

Related issue or feature (if applicable):

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

  • No frontend change needed
  • Companion frontend PR: esphome/device-builder-dashboard-frontend#

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

submit_job was writing the assembled tarball to
<config>/.esphome/.remote_builds/<dashboard_id>/<device_name>/bundle.tar.gz
and then calling esphome.bundle.prepare_bundle_for_compile(
bundle_path, target_dir). Upstream preserves only .esphome /
.pioenvs / .pio inside target_dir and wipes every other entry
before extract_bundle reads back from bundle_path -- so the
just-written bundle.tar.gz got deleted between the executor
write and the inner extract, surfacing as

  submit_job from <dashboard_id>: extract failed for job
  <job_id> (<config>.yaml): Bundle file not found:
  <config>/.esphome/.remote_builds/<dashboard_id>/<device>/bundle.tar.gz

for every remote build the operator tried from the UI.

Move the bundle to a sibling of target_dir
(<dashboard_id>/<device_name>.tar.gz next to the
<dashboard_id>/<device_name>/ build subtree) so upstream's
wipe step can't reach it. The constant rename
(_BUNDLE_FILENAME -> _BUNDLE_SUFFIX) flags the shape change.
device_name is the canonical form
_validate_configuration_filename returns, so path-traversal
safety from the comment on the old constant carries over
unchanged.

Regression test: new
test_submit_job_bundle_path_survives_prepare_bundle_wipe in
test_remote_build_submit_job.py stubs
prepare_bundle_for_compile with a wipe-emulating function
(iterate target_dir, delete non-preserved, then read
bundle_path) so a regression that moves the bundle back
inside target_dir trips the read_bytes() arm and surfaces as
extract_failed on the ack. The existing unit tests stubbed
prepare_bundle_for_compile with a trivial pass-through
(assert exists + write yaml + return), which didn't model the
wipe semantics; that's why the bug lived.
Copilot AI review requested due to automatic review settings May 10, 2026 22:34
@github-actions github-actions Bot added the bugfix Bug fix label May 10, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 10, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing fix-submit-job-bundle-path (76a85e7) with main (ceace1b)

Open in CodSpeed

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.11%. Comparing base (ceace1b) to head (76a85e7).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #552   +/-   ##
=======================================
  Coverage   99.11%   99.11%           
=======================================
  Files          76       76           
  Lines        9993     9993           
=======================================
  Hits         9905     9905           
  Misses         88       88           
Flag Coverage Δ
py3.12 99.08% <100.00%> (ø)
py3.14 99.11% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ice_builder/controllers/remote_build/submit_job.py 100.00% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a receiver-side remote-build regression where the assembled bundle tarball was written inside target_dir, then deleted by upstream esphome.bundle.prepare_bundle_for_compile() (which wipes non-preserved entries inside target_dir before extracting), causing "Bundle file not found" on every submit_job.

Changes:

  • Move the assembled bundle tarball path to be a sibling of the per-device build subtree (<dashboard_id>/<device>.tar.gz) instead of a child of it.
  • Rename the constant from a fixed filename to a suffix (_BUNDLE_SUFFIX) and document why the sibling placement is required.
  • Add a unit test that emulates the upstream “wipe-then-read” behavior to prevent regressions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
esphome_device_builder/controllers/remote_build/submit_job.py Writes the assembled bundle tarball outside target_dir so upstream wipe semantics can’t delete it before extraction.
tests/test_remote_build_submit_job.py Adds a regression test that simulates upstream wiping target_dir and then reading bundle_path.

@bdraco bdraco merged commit 55d8904 into main May 10, 2026
17 checks passed
@bdraco bdraco deleted the fix-submit-job-bundle-path branch May 10, 2026 22:41
bdraco added a commit that referenced this pull request May 10, 2026
Closes the unit-vs-e2e gap that let PR #552's "Bundle file not
found" regression ship. The existing
tests/test_remote_build_submit_job.py stubbed
prepare_bundle_for_compile with a trivial pass-through, and
the existing e2e harness covered pair/session, JOB_* fan-out,
and cancel_job — but no test drove the offloader->receiver
submit_job round-trip with the real upstream
prepare_bundle_for_compile. Without that, a regression in the
bundle-write / extract seam between the two halves was
invisible to both layers.

Two new tests in tests/e2e/test_submit_job.py:

* test_submit_job_round_trip_extracts_real_bundle_and_queues_job:
  builds a minimal-but-valid esphome bundle in-test
  (manifest.json + kitchen.yaml in a gzipped tar — same shape
  upstream's extract_bundle accepts), drives
  PeerLinkClient.submit_job across the live peer-link Noise
  WS, asserts the ack landed accepted=true AND the extracted
  YAML actually exists on disk at the receiver-side path the
  dispatch resolved. The on-disk assertion is the load-bearing
  one: a regression that puts the bundle back inside
  target_dir would surface here as the upstream wipe step
  deleting the bundle before extract_bundle can read it,
  accepted would flip to false, and the test fails before the
  disk assertion ever fires.

* test_submit_job_round_trip_then_fanout_to_offloader_bus:
  stitches the fan-out leg onto the same round-trip — once
  the receiver queues the FirmwareJob, JOB_QUEUED +
  JOB_STARTED on the receiver bus drive JobFanout to push
  job_state_changed back through the same session, landing as
  OFFLOADER_JOB_STATE_CHANGED on the offloader bus. The
  existing test_submit_job_fanout.py covered the fan-out half
  in isolation by seeding the correlation directly; this one
  pins both halves end-to-end so a regression in either
  surfaces here.

The offloader-side WS command's `esphome bundle` CLI
subprocess is deliberately bypassed — that path is upstream
esphome's contract, covered by tests on build_yaml_bundle.
The bundle is built in-test and handed straight to
PeerLinkClient.submit_job so the e2e stays focused on the
receiver-side gap.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants