Route firmware/install through pick_build_path + REMOTE upload chain (7a-3)#568
Merged
Conversation
…(7a-3) Closes the transparent-install loop the 7a-2b runner left half-wired. Two coupled pieces land together: **(a) firmware/install scheduler integration.** install / install_bulk now route through pick_build_path before queueing. APPROVED + connected + idle pairing → job carries source=REMOTE + source_pin_sha256 + source_label. Serial ports force LOCAL — the runner's flash step uses "esphome upload --file" which routes through esptool as a single FlashImage at offset 0x0; works for OTA, corrupts ESP32 wired flash. New on RemoteBuildController: build_scheduler_snapshot() + get_pairing(pin). **(b) Runner UPLOAD / INSTALL chain.** run_remote_compile_job now accepts UPLOAD / INSTALL too: same compile dispatch, then on receiver-completed pull artifacts via download_artifacts, stage firmware.bin to a tmpdir, spawn "esphome upload --device --file" through _tracked_subprocess so cancel SIGTERM lands on the upload chain just like the local-only path. **Refactor.** Tarball pack/unpack helpers consolidated into controllers/remote_build/artifacts_tarball.py; the runner and the WS unpacker now share extract_firmware_bin instead of duplicating tarfile-open logic. Tracking issue #562 for promoting esphome's get_port_type out of __main__ so we can drop our local _is_serial_port. Tests: build_scheduler_snapshot immutability, install scheduler routing (LOCAL/REMOTE/serial-forces-LOCAL/no- remote-build), runner UPLOAD/INSTALL chain (COMPLETED on exit 0, FAILED on non-zero, FAILED on download error / malformed tarball, CANCELLED on cancel during upload, UPLOAD shares chain with INSTALL). 2944 tests pass.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR completes phase 7a-3 of the transparent install flow by routing firmware/install (and bulk install) through the build scheduler so installs can be offloaded to an idle, connected, approved remote build peer, while keeping the actual flashing step local by downloading the receiver’s artifacts and running esphome upload --file ... on the offloader.
Changes:
- Route
FirmwareController.install()/install_bulk()throughpick_build_path()and stamp REMOTE jobs withsource_pin_sha256+source_labelwhen a suitable pairing is available (with serial ports forcing LOCAL). - Extend the REMOTE runner to support
UPLOAD/INSTALLby downloading artifacts, extractingfirmware.bin, and running a localesphome upload --filesubprocess with the existing tracked-subprocess + output ingestion plumbing. - Consolidate artifact tarball pack/unpack helpers into
controllers/remote_build/artifacts_tarball.pyand update call sites/tests accordingly.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| esphome_device_builder/controllers/firmware/controller.py | Routes install/bulk install through the scheduler and stamps jobs with LOCAL/REMOTE source metadata. |
| esphome_device_builder/controllers/firmware/helpers.py | Factors serial-port detection into _is_serial_port() for reuse by validation and install routing. |
| esphome_device_builder/controllers/firmware/remote_runner.py | Adds REMOTE UPLOAD/INSTALL chain: download artifacts → extract firmware.bin → local upload subprocess. |
| esphome_device_builder/controllers/remote_build/controller.py | Adds build_scheduler_snapshot() + get_pairing() and switches WS artifact unpack to shared tarball helpers. |
| esphome_device_builder/controllers/remote_build/artifacts_tarball.py | New module owning the pack/unpack/extract helpers for the artifacts tarball wire format. |
| esphome_device_builder/controllers/remote_build/artifacts_download.py | Updates receiver-side download sender to use shared pack_build_artifacts() helpers/types. |
| tests/controllers/firmware/conftest.py | Ensures test DeviceBuilder stubs include remote_build=None to match pre-start production shape. |
| tests/controllers/firmware/test_install.py | Adds scheduler-routing tests for LOCAL/REMOTE decisions and serial-port force-local behavior. |
| tests/controllers/firmware/test_remote_runner.py | Adds REMOTE UPLOAD/INSTALL runner tests including artifact download, tarball errors, upload failures, and cancel-during-upload. |
| tests/controllers/firmware/test_rename_lock.py | Updates install-bulk test stub to include remote_build=None for scheduler integration. |
| tests/test_remote_build_artifacts_download.py | Updates tests to use new shared tarball helper names/paths and types. |
| tests/test_remote_build_controller.py | Adds tests for build_scheduler_snapshot() immutability semantics and get_pairing(). |
13 tasks
Five issues + the blockbuster mkdir CI failure:
1. **Blocking ``os.mkdir`` in async runner.** CI caught
``tempfile.TemporaryDirectory()`` running synchronously
on the event loop. Use ``tempfile.mkdtemp`` + manual
``shutil.rmtree`` cleanup via ``run_in_executor`` so the
blocking syscalls never touch the loop.
2. **``run_remote_compile_job`` is misleadingly named.** It
now handles COMPILE / UPLOAD / INSTALL. Renamed to
``run_remote_job``; import in ``controller.py`` and
every test reference follow.
3. **Successful REMOTE COMPILE never stamped ``exit_code``.**
Local compiles always stamp the subprocess's exit, and
the legacy ``follow_job`` framing coerces ``None`` to a
failure code. Set ``job.exit_code = 0`` on the COMPILE
success path so downstream consumers don't misread a
successful remote compile as a failure.
4. **``extract_firmware_bin`` decompression-bomb vector.**
The tarball comes from a peer-controlled source, and
gzip can shrink huge zero-filled / sparse data to a tiny
on-the-wire payload. Added a ``_check_member_size``
per-member + cumulative cap against
``FIRMWARE_MAX_TOTAL_BYTES`` so a malformed tarball
declaring multi-GiB members bails before ``extractfile``
reads a single byte. ``read_artifacts_tarball`` gets the
same guard. ``getmember("firmware.bin")`` replaces the
linear-scan loop while we're in there.
5. **``_finalise_compile`` spelling inconsistency.** Sister
helper ``_finalize_cancelled`` uses American spelling;
renamed to ``_finalize_compile``, then generalised to
``_finalize_success`` since the UPLOAD/INSTALL success
path now uses the same helper (DRY pass — see #6).
6. **DRY duplication on terminal-event fires.** The runner
had three near-identical "mark COMPLETED + fire
``JOB_COMPLETED``" and two near-identical "mark FAILED
+ set error + fire ``JOB_FAILED``" sites. Folded the
COMPILE branch, the UPLOAD-success branch, into
``_finalize_success``; folded the UPLOAD-failed branch
and the receiver-reported-failed branch into the
existing ``_fail_locally`` (which is also
cancel-aware, so a Stop that races a failure now
finalises CANCELLED on every path).
7. **``bus: Any`` typing.** Tightened
``_run_upload_subprocess`` to ``bus: EventBus`` matching
the rest of the module.
8. **Cancel-before-spawn early check.** A Stop that
landed between the receiver's completed frame and the
subprocess spawn used to start the upload and
immediately terminate it. ``_fetch_and_run_local_upload``
now checks ``_cancel_requested`` before staging the
tmpdir and routes through ``_finalize_cancelled``,
skipping the wasted spawn.
9. **Env vars duplicated between LOCAL and remote upload
paths.** Extracted to
``constants.ESPHOME_SUBPROCESS_ENV``; both call sites
now build env as ``{**os.environ, **ESPHOME_SUBPROCESS_ENV}``
so the two streams look identical to subscribers
regardless of which CPU produced the bytes.
10. **No ``install_bulk`` scheduler-integration test.** Added
two: bulk-routing-to-REMOTE-when-pairing-idle, and
serial-port-forces-every-config-LOCAL.
2946 tests pass; full firmware coverage holds.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #568 +/- ##
========================================
Coverage 99.15% 99.16%
========================================
Files 80 81 +1
Lines 10378 10495 +117
========================================
+ Hits 10290 10407 +117
Misses 88 88
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This was referenced May 11, 2026
Codecov flagged two remaining uncovered lines: 1. ``controller.py:1572`` — ``_resolve_install_source``'s "scheduler picked a pin but ``get_pairing`` returned None" branch (TOCTOU window where an ``unpair`` lands between the scheduler snapshot read and the ``get_pairing`` lookup). Test: ``test_install_falls_back_to_local_when_scheduler_picked_pin_disappeared``. 2. ``remote_runner.py:595`` — the post-spawn in-context-manager cancel check that fires ``_terminate_current_process`` when a cancel landed *during* the staging executor hops (mkdtemp + write_bytes) between the pre-spawn early-check and the spawn-returns moment. Tested by patching ``Path.write_bytes`` to flip ``_cancel_requested`` as a side effect so the cancel lands in exactly that gap. All three 7a-3 modules now 100% covered.
1. ``build_scheduler_snapshot`` docstring claimed ``StoredPairing`` is a frozen dataclass; it's actually a mutable ``@dataclass`` edited in place (``_apply_pair_status_result``, ``_commit_endpoint_rebind``). Rewrote the docstring to describe what the shallow copy actually buys (mapping-membership isolation) vs what it doesn't (the rows themselves remain shared). The scheduler today is the only consumer and runs sync on the same event-loop tick as the install handler, reading scalar fields no in-flight call mutates — so the current shape is sound; the doc now spells that out + flags the ``PairingSummary`` projection as the deliberate alternative if a future consumer needs a deeper snapshot. 2. ``extract_firmware_bin`` previously gated on ``tar.extractfile(member) is None`` only — which *follows* symlinks + hardlinks transparently and returns a readable stream for them. A hostile peer could write a ``firmware.bin`` symlink pointing at ``../../etc/passwd`` and the runner would happily flash whatever the link resolved to on the receiver's filesystem. Added an explicit ``member.isfile()`` gate matching ``_read_tarball_member``'s shape on the general-purpose unpack path, then dropped the now-unreachable ``is None`` belt-and-braces (cast-to-BufferedReader instead — same pattern the helper above already uses). Test: ``test_extract_firmware_bin_raises_when_firmware_is_a_symlink``. 3. ``_execute_remote_job`` docstring still said "COMPILE-only — UPLOAD / INSTALL go through 7a-3". 7a-3 is this PR — UPLOAD / INSTALL are supported. Rewrote to describe the actual dispatch shape (compile-only via ``submit_job(target="compile")`` for both, then local ``esphome upload --file`` for UPLOAD / INSTALL on receiver-completed). 100% coverage holds across all three modules.
This was referenced May 11, 2026
bdraco
added a commit
that referenced
this pull request
May 14, 2026
Four CLAUDE.md fixes: 1. Move ``_send_initial_queue_status`` to the bottom of the module (after ``handle_cancel_job``) so it sits with the other private helpers; CLAUDE.md says public API on top, underscore-prefixed helpers at the bottom. 2. Trim the ``register_peer_link_session`` docstring; drop the ``#568 regression`` cross-reference and replace the ``_peer_queue_status`` / ``pick_build_path`` mechanism restatement with a one-line caller-visible contract. 3. Trim the ``_send_initial_queue_status`` docstring to a single line (the previous version had "Single-session counterpart of broadcast_queue_status" framing CLAUDE.md explicitly calls out). 4. Drop the ``Same shape — discard any in-flight artifacts download`` comment; the parallel-block framing is exactly what CLAUDE.md asks comments to avoid, and the code below it (``discard_session(session.dashboard_id)``) reads obviously.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this implement/fix?
Phase 7a-3 of issue #106. Closes the transparent-install loop the 7a-2b runner (#560) left half-wired. The user-visible payoff: with a healthy paired build server, clicking Install on a device transparently dispatches the compile to that server and flashes locally with the receiver's bytes — no extra UI to discover, no separate Send-builds dance.
Two coupled pieces
(a)
firmware/installscheduler integrationFirmwareController.install/install_bulknow route throughhelpers.build_scheduler.pick_build_pathbefore queueing. When an APPROVED + peer-link-connected + idle pairing is available, the constructedFirmwareJobcarriessource=REMOTE+source_pin_sha256=<pin>+source_label=<receiver_label>; otherwise the job stayssource=LOCALand runs through the existing in-process subprocess pipeline. Silent fallback by design — the user doesn't pick a build location; the scheduler routes transparently.Serial
portvalues (/dev/ttyUSB0,COM3, etc.) force LOCAL. The REMOTE flash step usesesphome upload --file <staged_firmware.bin>which routes throughupload_using_esptoolas a singleFlashImage(offset="0x0"). That works for OTA / web_server pushes but corrupts an ESP32 wired flash, which needs the multi-image bootloader / partitions / OTA-data set stitched fromidedata.extra.flash_images. Until the runner can stage the full multi-image set in a way the non---filepath resolves cleanly, serial REMOTE installs stay LOCAL.New on
RemoteBuildController:build_scheduler_snapshot()— frozenBuildSchedulerInputsview of_pairings/_open_peer_links/_peer_queue_statusplus theremote_builds_enabledmaster toggle (hardcodedTruefor now — the 7b Settings UI lands the proper toggle alongside the per-host / force-local / version-mismatch knobs).get_pairing(pin_sha256)— syncStoredPairinglookup the install resolver uses to stampsource_labelon the new job.(b) Runner UPLOAD / INSTALL chain
remote_runner.run_remote_compile_jobpreviously rejected anything that wasn'tJobType.COMPILE. Now it also acceptsUPLOADandINSTALL:client.submit_job(target="compile")(per § Transparent install flow's load-bearing "receiver only ever compiles" policy).client.download_artifacts(job_id=...).firmware.binto a per-job tmpdir via the sharedextract_firmware_binhelper.esphome upload --device <port> --file <staged>throughFirmwareController._tracked_subprocess— the same plumbing the local subprocess path uses, so:_ingest_output_line(one event stream for follow_job / firmware-tasks regardless of which CPU produced the bytes)._terminate_current_processlands SIGTERM on the upload chain.download_artifactsreject, malformed tarball, missingfirmware.bin, non-zeroesphome uploadexit) route through_fail_locallyand honour cancel intent if the user already clicked Stop.Refactor — tarball helpers consolidated
The receiver-side pack helpers (
_pack_build_artifacts/_PackedArtifacts, previously inartifacts_download.py) and the offloader-side unpack helpers (_unpack_artifacts_response/_UnpackArtifactsError/_read_artifacts_tarball+ six private helpers, previously inremote_build/controller.py) moved to a newcontrollers/remote_build/artifacts_tarball.pymodule so the wire format is owned in one place. The runner imports the sameextract_firmware_binhelper the WS unpacker uses instead of duplicating the tarfile-open logic.artifacts_download.pykeeps the streaming flow +ArtifactsDownloadSender;controller.pykeeps the WS-side dispatch +_download_artifacts_error_to_command_error. Test imports / monkeypatch paths follow.Tracking
Filed #562 to upstream
esphome.__main__.get_port_typeto a stable public module so we can drop our local_is_serial_port. The matching upstream PR + dep-bump lands later; the local classifier is fine as a holding pattern.Related issue or feature (if applicable):
Types of changes
bugfixnew-featureenhancementbreaking-changerefactordocsmaintenancecidependenciesFrontend coordination
No new WS commands, no wire-shape changes. The install dialog's "Building on {receiver_label}" sub-line lands in a small lockstep frontend PR —
job.source_labelis the existing field added in #556, so the frontend just reads it.Checklist
ruff,codespell, yaml/json/python checks).tests/where applicable.components.jsonhas not been hand-edited (regenerate viascript/sync_components.pyif a sync is needed).docs/ARCHITECTURE.mdand/ordocs/API.md.