Skip to content

Route firmware/install through pick_build_path + REMOTE upload chain (7a-3)#568

Merged
bdraco merged 5 commits into
mainfrom
7a-3-install-integration
May 11, 2026
Merged

Route firmware/install through pick_build_path + REMOTE upload chain (7a-3)#568
bdraco merged 5 commits into
mainfrom
7a-3-install-integration

Conversation

@bdraco
Copy link
Copy Markdown
Member

@bdraco bdraco commented May 11, 2026

What does this implement/fix?

Phase 7a-3 of issue #106. Closes the transparent-install loop the 7a-2b runner (#560) left half-wired. The user-visible payoff: with a healthy paired build server, clicking Install on a device transparently dispatches the compile to that server and flashes locally with the receiver's bytes — no extra UI to discover, no separate Send-builds dance.

Two coupled pieces

(a) firmware/install scheduler integration

FirmwareController.install / install_bulk now route through helpers.build_scheduler.pick_build_path before queueing. When an APPROVED + peer-link-connected + idle pairing is available, the constructed FirmwareJob carries source=REMOTE + source_pin_sha256=<pin> + source_label=<receiver_label>; otherwise the job stays source=LOCAL and runs through the existing in-process subprocess pipeline. Silent fallback by design — the user doesn't pick a build location; the scheduler routes transparently.

Serial port values (/dev/ttyUSB0, COM3, etc.) force LOCAL. The REMOTE flash step uses esphome upload --file <staged_firmware.bin> which routes through upload_using_esptool as a single FlashImage(offset="0x0"). That works for OTA / web_server pushes but corrupts an ESP32 wired flash, which needs the multi-image bootloader / partitions / OTA-data set stitched from idedata.extra.flash_images. Until the runner can stage the full multi-image set in a way the non---file path resolves cleanly, serial REMOTE installs stay LOCAL.

New on RemoteBuildController:

  • build_scheduler_snapshot() — frozen BuildSchedulerInputs view of _pairings / _open_peer_links / _peer_queue_status plus the remote_builds_enabled master toggle (hardcoded True for now — the 7b Settings UI lands the proper toggle alongside the per-host / force-local / version-mismatch knobs).
  • get_pairing(pin_sha256) — sync StoredPairing lookup the install resolver uses to stamp source_label on the new job.

(b) Runner UPLOAD / INSTALL chain

remote_runner.run_remote_compile_job previously rejected anything that wasn't JobType.COMPILE. Now it also accepts UPLOAD and INSTALL:

  • Same compile dispatch via client.submit_job(target="compile") (per § Transparent install flow's load-bearing "receiver only ever compiles" policy).
  • On receiver-completed, pulls the artifact tarball back through client.download_artifacts(job_id=...).
  • Extracts firmware.bin to a per-job tmpdir via the shared extract_firmware_bin helper.
  • Spawns esphome upload --device <port> --file <staged> through FirmwareController._tracked_subprocess — the same plumbing the local subprocess path uses, so:
    • Output streams via _ingest_output_line (one event stream for follow_job / firmware-tasks regardless of which CPU produced the bytes).
    • The cancel handler's _terminate_current_process lands SIGTERM on the upload chain.
  • Failures (download_artifacts reject, malformed tarball, missing firmware.bin, non-zero esphome upload exit) route through _fail_locally and honour cancel intent if the user already clicked Stop.

Refactor — tarball helpers consolidated

The receiver-side pack helpers (_pack_build_artifacts / _PackedArtifacts, previously in artifacts_download.py) and the offloader-side unpack helpers (_unpack_artifacts_response / _UnpackArtifactsError / _read_artifacts_tarball + six private helpers, previously in remote_build/controller.py) moved to a new controllers/remote_build/artifacts_tarball.py module so the wire format is owned in one place. The runner imports the same extract_firmware_bin helper the WS unpacker uses instead of duplicating the tarfile-open logic. artifacts_download.py keeps the streaming flow + ArtifactsDownloadSender; controller.py keeps the WS-side dispatch + _download_artifacts_error_to_command_error. Test imports / monkeypatch paths follow.

Tracking

Filed #562 to upstream esphome.__main__.get_port_type to a stable public module so we can drop our local _is_serial_port. The matching upstream PR + dep-bump lands later; the local classifier is fine as a holding pattern.

Related issue or feature (if applicable):

Types of changes

  • Bugfix (non-breaking change which fixes an issue) — bugfix
  • New feature (non-breaking change which adds functionality) — new-feature
  • Enhancement to an existing feature — enhancement
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) — breaking-change
  • Refactor (no behaviour change) — refactor
  • Documentation only — docs
  • Maintenance / chore — maintenance
  • CI / workflow change — ci
  • Dependencies bump — dependencies

Frontend coordination

No new WS commands, no wire-shape changes. The install dialog's "Building on {receiver_label}" sub-line lands in a small lockstep frontend PR — job.source_label is the existing field added in #556, so the frontend just reads it.

Checklist

  • The code change is tested and works locally.
  • Pre-commit hooks pass (ruff, codespell, yaml/json/python checks).
  • Tests have been added or updated under tests/ where applicable.
  • components.json has not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
  • Architecture-level changes are reflected in docs/ARCHITECTURE.md and/or docs/API.md.

…(7a-3)

Closes the transparent-install loop the 7a-2b runner left
half-wired. Two coupled pieces land together:

**(a) firmware/install scheduler integration.** install /
install_bulk now route through pick_build_path before
queueing. APPROVED + connected + idle pairing → job carries
source=REMOTE + source_pin_sha256 + source_label. Serial
ports force LOCAL — the runner's flash step uses
"esphome upload --file" which routes through esptool as a
single FlashImage at offset 0x0; works for OTA, corrupts
ESP32 wired flash. New on RemoteBuildController:
build_scheduler_snapshot() + get_pairing(pin).

**(b) Runner UPLOAD / INSTALL chain.** run_remote_compile_job
now accepts UPLOAD / INSTALL too: same compile dispatch,
then on receiver-completed pull artifacts via
download_artifacts, stage firmware.bin to a tmpdir, spawn
"esphome upload --device --file" through _tracked_subprocess
so cancel SIGTERM lands on the upload chain just like the
local-only path.

**Refactor.** Tarball pack/unpack helpers consolidated into
controllers/remote_build/artifacts_tarball.py; the runner
and the WS unpacker now share extract_firmware_bin instead
of duplicating tarfile-open logic.

Tracking issue #562 for promoting esphome's get_port_type
out of __main__ so we can drop our local _is_serial_port.

Tests: build_scheduler_snapshot immutability, install
scheduler routing (LOCAL/REMOTE/serial-forces-LOCAL/no-
remote-build), runner UPLOAD/INSTALL chain (COMPLETED on
exit 0, FAILED on non-zero, FAILED on download error /
malformed tarball, CANCELLED on cancel during upload,
UPLOAD shares chain with INSTALL). 2944 tests pass.
Copilot AI review requested due to automatic review settings May 11, 2026 02:04
@github-actions github-actions Bot added the new-feature New feature label May 11, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 11, 2026

Merging this PR will not alter performance

✅ 12 untouched benchmarks


Comparing 7a-3-install-integration (125c704) with main (eb43d47)

Open in CodSpeed

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes phase 7a-3 of the transparent install flow by routing firmware/install (and bulk install) through the build scheduler so installs can be offloaded to an idle, connected, approved remote build peer, while keeping the actual flashing step local by downloading the receiver’s artifacts and running esphome upload --file ... on the offloader.

Changes:

  • Route FirmwareController.install() / install_bulk() through pick_build_path() and stamp REMOTE jobs with source_pin_sha256 + source_label when a suitable pairing is available (with serial ports forcing LOCAL).
  • Extend the REMOTE runner to support UPLOAD/INSTALL by downloading artifacts, extracting firmware.bin, and running a local esphome upload --file subprocess with the existing tracked-subprocess + output ingestion plumbing.
  • Consolidate artifact tarball pack/unpack helpers into controllers/remote_build/artifacts_tarball.py and update call sites/tests accordingly.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
esphome_device_builder/controllers/firmware/controller.py Routes install/bulk install through the scheduler and stamps jobs with LOCAL/REMOTE source metadata.
esphome_device_builder/controllers/firmware/helpers.py Factors serial-port detection into _is_serial_port() for reuse by validation and install routing.
esphome_device_builder/controllers/firmware/remote_runner.py Adds REMOTE UPLOAD/INSTALL chain: download artifacts → extract firmware.bin → local upload subprocess.
esphome_device_builder/controllers/remote_build/controller.py Adds build_scheduler_snapshot() + get_pairing() and switches WS artifact unpack to shared tarball helpers.
esphome_device_builder/controllers/remote_build/artifacts_tarball.py New module owning the pack/unpack/extract helpers for the artifacts tarball wire format.
esphome_device_builder/controllers/remote_build/artifacts_download.py Updates receiver-side download sender to use shared pack_build_artifacts() helpers/types.
tests/controllers/firmware/conftest.py Ensures test DeviceBuilder stubs include remote_build=None to match pre-start production shape.
tests/controllers/firmware/test_install.py Adds scheduler-routing tests for LOCAL/REMOTE decisions and serial-port force-local behavior.
tests/controllers/firmware/test_remote_runner.py Adds REMOTE UPLOAD/INSTALL runner tests including artifact download, tarball errors, upload failures, and cancel-during-upload.
tests/controllers/firmware/test_rename_lock.py Updates install-bulk test stub to include remote_build=None for scheduler integration.
tests/test_remote_build_artifacts_download.py Updates tests to use new shared tarball helper names/paths and types.
tests/test_remote_build_controller.py Adds tests for build_scheduler_snapshot() immutability semantics and get_pairing().

Comment thread esphome_device_builder/controllers/firmware/remote_runner.py
Comment thread esphome_device_builder/controllers/firmware/remote_runner.py Outdated
Comment thread esphome_device_builder/controllers/remote_build/artifacts_tarball.py Outdated
Five issues + the blockbuster mkdir CI failure:

1. **Blocking ``os.mkdir`` in async runner.** CI caught
   ``tempfile.TemporaryDirectory()`` running synchronously
   on the event loop. Use ``tempfile.mkdtemp`` + manual
   ``shutil.rmtree`` cleanup via ``run_in_executor`` so the
   blocking syscalls never touch the loop.

2. **``run_remote_compile_job`` is misleadingly named.** It
   now handles COMPILE / UPLOAD / INSTALL. Renamed to
   ``run_remote_job``; import in ``controller.py`` and
   every test reference follow.

3. **Successful REMOTE COMPILE never stamped ``exit_code``.**
   Local compiles always stamp the subprocess's exit, and
   the legacy ``follow_job`` framing coerces ``None`` to a
   failure code. Set ``job.exit_code = 0`` on the COMPILE
   success path so downstream consumers don't misread a
   successful remote compile as a failure.

4. **``extract_firmware_bin`` decompression-bomb vector.**
   The tarball comes from a peer-controlled source, and
   gzip can shrink huge zero-filled / sparse data to a tiny
   on-the-wire payload. Added a ``_check_member_size``
   per-member + cumulative cap against
   ``FIRMWARE_MAX_TOTAL_BYTES`` so a malformed tarball
   declaring multi-GiB members bails before ``extractfile``
   reads a single byte. ``read_artifacts_tarball`` gets the
   same guard. ``getmember("firmware.bin")`` replaces the
   linear-scan loop while we're in there.

5. **``_finalise_compile`` spelling inconsistency.** Sister
   helper ``_finalize_cancelled`` uses American spelling;
   renamed to ``_finalize_compile``, then generalised to
   ``_finalize_success`` since the UPLOAD/INSTALL success
   path now uses the same helper (DRY pass — see #6).

6. **DRY duplication on terminal-event fires.** The runner
   had three near-identical "mark COMPLETED + fire
   ``JOB_COMPLETED``" and two near-identical "mark FAILED
   + set error + fire ``JOB_FAILED``" sites. Folded the
   COMPILE branch, the UPLOAD-success branch, into
   ``_finalize_success``; folded the UPLOAD-failed branch
   and the receiver-reported-failed branch into the
   existing ``_fail_locally`` (which is also
   cancel-aware, so a Stop that races a failure now
   finalises CANCELLED on every path).

7. **``bus: Any`` typing.** Tightened
   ``_run_upload_subprocess`` to ``bus: EventBus`` matching
   the rest of the module.

8. **Cancel-before-spawn early check.** A Stop that
   landed between the receiver's completed frame and the
   subprocess spawn used to start the upload and
   immediately terminate it. ``_fetch_and_run_local_upload``
   now checks ``_cancel_requested`` before staging the
   tmpdir and routes through ``_finalize_cancelled``,
   skipping the wasted spawn.

9. **Env vars duplicated between LOCAL and remote upload
   paths.** Extracted to
   ``constants.ESPHOME_SUBPROCESS_ENV``; both call sites
   now build env as ``{**os.environ, **ESPHOME_SUBPROCESS_ENV}``
   so the two streams look identical to subscribers
   regardless of which CPU produced the bytes.

10. **No ``install_bulk`` scheduler-integration test.** Added
    two: bulk-routing-to-REMOTE-when-pairing-idle, and
    serial-port-forces-every-config-LOCAL.

2946 tests pass; full firmware coverage holds.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.16%. Comparing base (00d5e93) to head (125c704).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main     #568    +/-   ##
========================================
  Coverage   99.15%   99.16%            
========================================
  Files          80       81     +1     
  Lines       10378    10495   +117     
========================================
+ Hits        10290    10407   +117     
  Misses         88       88            
Flag Coverage Δ
py3.12 99.13% <100.00%> (+<0.01%) ⬆️
py3.14 99.16% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...e_device_builder/controllers/firmware/constants.py 100.00% <100.00%> (ø)
..._device_builder/controllers/firmware/controller.py 100.00% <100.00%> (ø)
...ome_device_builder/controllers/firmware/helpers.py 100.00% <100.00%> (ø)
...vice_builder/controllers/firmware/remote_runner.py 100.00% <100.00%> (ø)
...der/controllers/remote_build/artifacts_download.py 100.00% <100.00%> (ø)
...lder/controllers/remote_build/artifacts_tarball.py 100.00% <100.00%> (ø)
...ice_builder/controllers/remote_build/controller.py 99.70% <100.00%> (-0.03%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comment thread esphome_device_builder/controllers/remote_build/controller.py Outdated
Comment thread esphome_device_builder/controllers/firmware/controller.py
bdraco added 2 commits May 10, 2026 21:52
Codecov flagged two remaining uncovered lines:

1. ``controller.py:1572`` — ``_resolve_install_source``'s
   "scheduler picked a pin but ``get_pairing`` returned
   None" branch (TOCTOU window where an ``unpair`` lands
   between the scheduler snapshot read and the
   ``get_pairing`` lookup). Test:
   ``test_install_falls_back_to_local_when_scheduler_picked_pin_disappeared``.

2. ``remote_runner.py:595`` — the post-spawn
   in-context-manager cancel check that fires
   ``_terminate_current_process`` when a cancel landed
   *during* the staging executor hops (mkdtemp +
   write_bytes) between the pre-spawn early-check and
   the spawn-returns moment. Tested by patching
   ``Path.write_bytes`` to flip ``_cancel_requested`` as a
   side effect so the cancel lands in exactly that gap.

All three 7a-3 modules now 100% covered.
1. ``build_scheduler_snapshot`` docstring claimed
   ``StoredPairing`` is a frozen dataclass; it's actually a
   mutable ``@dataclass`` edited in place
   (``_apply_pair_status_result``, ``_commit_endpoint_rebind``).
   Rewrote the docstring to describe what the shallow copy
   actually buys (mapping-membership isolation) vs what it
   doesn't (the rows themselves remain shared). The scheduler
   today is the only consumer and runs sync on the same
   event-loop tick as the install handler, reading scalar
   fields no in-flight call mutates — so the current shape
   is sound; the doc now spells that out + flags the
   ``PairingSummary`` projection as the deliberate
   alternative if a future consumer needs a deeper
   snapshot.

2. ``extract_firmware_bin`` previously gated on
   ``tar.extractfile(member) is None`` only — which
   *follows* symlinks + hardlinks transparently and returns
   a readable stream for them. A hostile peer could write a
   ``firmware.bin`` symlink pointing at ``../../etc/passwd``
   and the runner would happily flash whatever the link
   resolved to on the receiver's filesystem. Added an
   explicit ``member.isfile()`` gate matching
   ``_read_tarball_member``'s shape on the general-purpose
   unpack path, then dropped the now-unreachable ``is None``
   belt-and-braces (cast-to-BufferedReader instead — same
   pattern the helper above already uses). Test:
   ``test_extract_firmware_bin_raises_when_firmware_is_a_symlink``.

3. ``_execute_remote_job`` docstring still said
   "COMPILE-only — UPLOAD / INSTALL go through 7a-3". 7a-3
   is this PR — UPLOAD / INSTALL are supported. Rewrote to
   describe the actual dispatch shape (compile-only via
   ``submit_job(target="compile")`` for both, then local
   ``esphome upload --file`` for UPLOAD / INSTALL on
   receiver-completed).

100% coverage holds across all three modules.
@bdraco bdraco merged commit 0ac3ede into main May 11, 2026
13 checks passed
@bdraco bdraco deleted the 7a-3-install-integration branch May 11, 2026 02:58
bdraco added a commit that referenced this pull request May 14, 2026
Four CLAUDE.md fixes:

1. Move ``_send_initial_queue_status`` to the bottom of the
   module (after ``handle_cancel_job``) so it sits with the
   other private helpers; CLAUDE.md says public API on top,
   underscore-prefixed helpers at the bottom.

2. Trim the ``register_peer_link_session`` docstring; drop
   the ``#568 regression`` cross-reference and replace the
   ``_peer_queue_status`` / ``pick_build_path`` mechanism
   restatement with a one-line caller-visible contract.

3. Trim the ``_send_initial_queue_status`` docstring to a
   single line (the previous version had
   "Single-session counterpart of broadcast_queue_status"
   framing CLAUDE.md explicitly calls out).

4. Drop the ``Same shape — discard any in-flight artifacts
   download`` comment; the parallel-block framing is
   exactly what CLAUDE.md asks comments to avoid, and the
   code below it (``discard_session(session.dashboard_id)``)
   reads obviously.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-feature New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants