Add git-backed YAML version history #1149
New controllers/version_history/ package: a subprocess git wrapper (GitRepo) plus an async controller wired into DeviceBuilder. GitRepo probes for the git binary (feature self-disables if absent), then either adopts an enclosing work tree — covering /config/esphome already being a repo, or sitting inside one such as /config — or initializes a fresh repo with a default .gitignore. It never rewrites a pre-existing .gitignore, never writes user.* into git config (commit identity is passed per-invocation with git -c), and skips the user's hooks/signing (--no-verify, commit.gpgsign=false). Commits are pathspec-scoped (git add -A -- <paths> + git commit -- <paths>) so an automatic commit can never sweep the user's unrelated staged edits into our history. The controller serializes all index ops behind a lock and runs them in an executor; every op is best-effort and swallows failures so history can't break a save. Read helpers (log/show/diff/deleted) back the upcoming history UI.
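The pathspec-scoped commit described above can be sketched as two argv lists. This is a minimal illustration, not the PR's GitRepo code: the function names and the identity values are hypothetical, while the git options shown (`-c`, `add -A`, `commit --no-verify`, and the `--` pathspec separator) are standard git.

```python
from __future__ import annotations

# Hypothetical identity values; the PR only states that identity is passed
# per invocation with `git -c`, not what the values are.
AUTHOR_NAME = "Device Builder"
AUTHOR_EMAIL = "device-builder@localhost"


def commit_prefix() -> list[str]:
    """Per-invocation overrides, so nothing is ever written to git config."""
    return [
        "git",
        "-c", f"user.name={AUTHOR_NAME}",
        "-c", f"user.email={AUTHOR_EMAIL}",
        "-c", "commit.gpgsign=false",  # skip the user's signing setup
    ]


def scoped_commit_argvs(message: str, paths: list[str]) -> list[list[str]]:
    """Return the `add` and `commit` argv lists, both limited to `paths`."""
    add = ["git", "add", "-A", "--", *paths]
    commit = [
        *commit_prefix(),
        "commit", "--no-verify",  # skip the user's commit hooks
        "-m", message,
        "--", *paths,  # only these paths are committed
    ]
    return [add, commit]
```

Because both commands end in `-- <paths>`, anything else the user has staged stays staged and out of the automatic commit.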
Wire the version-history controller into the YAML mutation flow with
two complementary triggers:
- Dashboard mutation sites commit immediately with a rich message:
editor save ("Edit X"), add-component ("Add <id> to X"), friendly-
name edit, create, clone, and import. _persist_yaml_mutation grew an
optional message and a _commit_history helper that no-ops when the
feature is disabled.
- A scanner-driven catch-all covers edits made outside the dashboard
(VS Code, the HA File Editor). It subscribes to DEVICE_ADDED /
UPDATED / REMOVED — which fire only on a real on-disk cache-key
change, not on mDNS/ping state ticks — and debounces before
committing. A dashboard save has already committed by the time the
debounced flush runs, so the catch-all becomes a no-op there;
pathspec-scoped, idempotent commits make the de-dup automatic.
Fresh-init now seeds the existing configs as the initial snapshot so
each device has a first version immediately. Test fixtures that build
a DevicesController with a mock _db default version_history to None.
delete_single and archive_single now commit the removal of the
top-level YAML to version history ("Delete X" / "Archive X"). The
file's pre-removal content remains recoverable via git history even
though its regenerable build artifacts (build tree, StorageJSON +
idedata + validated caches) are wiped as before. The archive/ folder
stays the browsable parked-configs list; git history is the
time-machine that also covers hard deletes and external rm.
Five commands on the version-history controller back the History pane and the restore-deleted view:
- version_history/list_versions: commit history for a config
- version_history/get_version: content at a commit
- version_history/get_diff: unified diff vs the working copy
- version_history/list_deleted: configs in history but absent on disk
- version_history/restore: revert to a commit (or, with no sha, the latest surviving version), recreating a deleted file
Restore writes through a new public DevicesController.apply_restored_yaml so it reuses the persist → scan → event pipeline and the restore is itself committed; the device row updates via normal events. Commit ids are validated as plain hex before reaching git. Reads are transient git queries, so they're list_*-style commands rather than subscribe_events state. Documented in docs/API.md.
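The "validated as plain hex" check might look like the sketch below. The helper name and the accepted length range (7 to 40 characters, abbreviated through full SHA-1) are assumptions; the PR does not state its exact pattern.

```python
import re

# Assumed bounds: abbreviated (7) through full SHA-1 (40) hex digits.
_SHA_RE = re.compile(r"[0-9a-fA-F]{7,40}")


def is_valid_sha(value: object) -> bool:
    """Accept only a plain hex commit id; reject refs, ranges, and options."""
    return isinstance(value, str) and _SHA_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` matters here: a value such as `abc1234 --output=x` starts with valid hex but is rejected as a whole.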
Merging this PR will not alter performance
Three async tests asserted on GitRepo.log_file() called synchronously, which runs subprocess (os.read) on the event-loop thread — tripping blockbuster on Linux CI. Route them through the executor-backed list_versions command instead.
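A minimal sketch of the pattern the fix relies on: a blocking call is handed to the loop's default executor so it never runs on the event-loop thread. The function names are illustrative stand-ins, not the project's; `time.sleep` takes the place of the subprocess-backed git read.

```python
import asyncio
import threading
import time


def blocking_log_query() -> int:
    """Stand-in for a subprocess-backed git read; blocks its calling thread."""
    time.sleep(0.01)
    return threading.get_ident()


async def runs_off_loop_thread() -> bool:
    """Run the blocking call in the default executor and compare thread ids."""
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    # None selects the loop's default ThreadPoolExecutor.
    worker_thread = await loop.run_in_executor(None, blocking_log_query)
    return worker_thread != loop_thread
```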
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #1149 | +/- |
|---|---|---|---|
| Coverage | 99.41% | 99.42% | +0.01% |
| Files | 210 | 213 | +3 |
| Lines | 15467 | 15829 | +362 |
| Hits | 15376 | 15738 | +362 |
| Misses | 91 | 91 | |
- Extract _commit_argv so the identity / no-hooks / no-signing commit prefix isn't duplicated between commit_paths and the seed commit.
- Add _in_executor to drop the repeated get_running_loop/run_in_executor boilerplate across the WS read commands and commit.
- Fix a debounce race: an external edit landing while the flush is committing was queued but, since the flush task wasn't done, no new flush got scheduled, so it sat until the next scanner event. The flush now drains in a loop; the final empty-check and return happen without an await between them, so nothing can be stranded. Regression test added.
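The drain-loop fix can be illustrated with a small debouncer. This is a sketch under assumptions: the class, its attribute names, and the delay are hypothetical, and the injected `commit` coroutine stands in for the real commit call.

```python
from __future__ import annotations

import asyncio


class DebouncedFlusher:
    """Collects changed configs and commits them after a quiet delay."""

    def __init__(self, commit, delay: float = 0.05) -> None:
        self._commit = commit  # async callable taking one configuration name
        self._delay = delay
        self._pending: set[str] = set()
        self._task: asyncio.Task | None = None

    def notify(self, configuration: str) -> None:
        self._pending.add(configuration)
        # A flush that is still running will pick this entry up in its loop.
        if self._task is None or self._task.done():
            self._task = asyncio.get_running_loop().create_task(self._flush())

    async def _flush(self) -> None:
        await asyncio.sleep(self._delay)
        # Drain until empty: entries queued while we were committing are
        # handled by the next iteration instead of waiting for a new event.
        while self._pending:
            batch, self._pending = self._pending, set()
            for configuration in sorted(batch):
                await self._commit(configuration)
        # No await sits between the final empty check and the return, so
        # notify() cannot slip an entry in after the check.
```

The key property is that `notify()` is synchronous: it can only run while `_flush` is suspended at an `await`, and the last `while` check is not followed by one.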
Add a test proving a flag-like commit message and a leading-dash filename can't be reparsed as git options (everything is passed via argv, never a shell).
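To show why argv passing rules out shell reparsing, the sketch below spawns a child process with a list of arguments and reads back exactly what the child received. A Python child stands in for git here so the example runs without a repo; the helper name is hypothetical.

```python
import json
import subprocess
import sys


def echo_argv(args: list[str]) -> list[str]:
    """Spawn a child without a shell and return the argv it actually saw."""
    child = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

With git itself, two further conventions keep such values inert: the message is the value of `-m`, and filenames come after `--`, so a leading dash is read as data rather than as an option.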
Match helpers.subprocess: the default close_fds=True makes the child iterate the fd table before exec — pure overhead on memory-pressured systems — and our git spawns don't rely on inherited fds being closed.
# Conflicts: # esphome_device_builder/controllers/devices/add_component.py
Add tests for the paths the suite was missing: controller stop() lifecycle (listener detach + flush cancel), git error resilience (discover_or_init tolerates OSError, commit_paths swallows a failed git commit), the not-found restore / get_version paths, the catch-all surviving one failing config in a batch, deleted_files filtering out nested includes and non-config files, and the disabled-repo read guards. Branch coverage of the package is now 98%.
Cover the last defensive branches: a pre-existing .gitignore preserved on fresh init, the seed-commit no-op when nothing is staged, deleted_files on a repo without commits, the _rel_to_toplevel out-of-tree fallback, and stop() with no flush pending.
The version-history restore tests mock apply_restored_yaml, so the real method body went uncovered. Add a hook test driving it directly: it writes the restored content back and records a 'Restore' commit.
PR Review: Add git-backed YAML version history
Solid, conservative implementation of git-backed version history. Merge-ready; one minor doc-accuracy nit.
🟢 Suggestions
1. Lock dict is bounded by distinct filenames ever written, not live device count (`esphome_device_builder/controllers/devices/controller.py`, L117-119)
The comment states the
For a normal dashboard this is genuinely negligible. But a long-lived process that churns through many create/delete cycles with unique filenames (e.g. repeated import → adopt → delete, or test-harness-style automation) grows the dict without bound: it tracks historical names, not the current fleet. Not a blocker. Either:
The accurate-comment option is fine; the current wording just understates the bound.
Automated review by Kōan79713a7
…failures
Address review on #1149:
- Make the security comment accurate: sha is regex-validated; the separately-guarded configuration goes through settings.rel_path (rejects ../absolute escaping the config dir) and only reaches git as a pathspec after --. Add a test that the read/restore commands reject a traversal configuration with INVALID_ARGS.
- stop() now drains the debounce queue (shared _flush_pending, reused by the timer path) and awaits the cancelled flush task, so an external edit landing in the debounce window isn't dropped on shutdown.
- Raise the catch-all failure log from debug to warning+exc_info so a persistently failing watcher is visible at default levels.
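A traversal guard of the kind attributed to settings.rel_path could be sketched as follows. `safe_rel_path` is a hypothetical stand-in, not the project's implementation, and raising `ValueError` is an assumption (the commit says the WS commands answer with INVALID_ARGS).

```python
from pathlib import Path


def safe_rel_path(config_dir: Path, configuration: str) -> Path:
    """Resolve `configuration` under `config_dir`; reject anything escaping it."""
    root = config_dir.resolve()
    # Joining an absolute path replaces the root entirely, and ".." segments
    # are collapsed by resolve(), so both cases are caught by the check below.
    candidate = (root / configuration).resolve()
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"invalid configuration: {configuration!r}")
    return candidate
```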
The feature only add/commits (plus read-only log/show/diff/rev-parse); no checkout/reset/stash/rm. Add a regression test asserting a user's uncommitted, unstaged on-disk edits are byte-for-byte preserved across an automatic commit.
The fresh-init seed (git add -A) could commit our own secrets — the peer-link identity key, receiver/offloader peer credentials, sidecars, locks — plus .DS_Store, whenever the config dir already had a (stock or foreign) .gitignore that didn't cover them, since we leave an existing .gitignore untouched. Fix: force our machine-state/secret patterns into the repo's local .git/info/exclude on both init and adopt. info/exclude is git's repo-local, never-committed ignore, so it protects the key regardless of the user's .gitignore and without mutating anything they track. Broaden the default .gitignore (written only on fresh init when none exists) to match: .device-builder*, .receiver_peers.json, .offloader_pairings.json, .DS_Store. secrets.yaml stays in the visible default gitignore (user's call to version) rather than the forced exclude. Also document that _persist_yaml_mutation now awaits a git commit inline (recovery guarantee; adds that latency to the save).
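Forcing patterns into the repo-local exclude file can be done idempotently, as in this sketch. The function name is hypothetical, and treating the forced set as identical to the four patterns listed above is an assumption.

```python
from pathlib import Path

# Assumed to mirror the patterns named in the commit message above.
FORCED_EXCLUDES = (
    ".device-builder*",
    ".receiver_peers.json",
    ".offloader_pairings.json",
    ".DS_Store",
)


def ensure_local_excludes(git_dir: Path, patterns=FORCED_EXCLUDES) -> list[str]:
    """Append missing patterns to <git_dir>/info/exclude; return those added."""
    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    existing = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    missing = [p for p in patterns if p not in present]
    if missing:
        # Keep the user's existing lines intact and start on a fresh line.
        prefix = "" if not existing or existing.endswith("\n") else "\n"
        with exclude.open("a", encoding="utf-8") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Running it on both init and adopt is safe because a second call finds every pattern present and writes nothing.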
A config dir often holds large non-config files (logs, databases, media) that have no business in git history — and history is forever. The seed now stages only top-level YAML configs plus our .gitignore, matching the dashboard's unit of versioning, instead of git add -A. secrets.yaml and the CORE sentinel are left out. Adopted repos are unaffected (they never seed). YAML glob patterns are now a constant.
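Selecting only top-level YAML for the seed might look like the sketch below. The glob patterns and names are assumptions; the commit only says the patterns are now a constant and that secrets.yaml is left out.

```python
from pathlib import Path

YAML_GLOBS = ("*.yaml", "*.yml")  # assumed patterns
SEED_SKIP = frozenset({"secrets.yaml"})


def seed_paths(config_dir: Path) -> list[str]:
    """Top-level YAML configs plus .gitignore; never logs, databases, or media."""
    names = {
        path.name
        for pattern in YAML_GLOBS
        for path in config_dir.glob(pattern)  # non-recursive: top level only
        if path.is_file() and path.name not in SEED_SKIP
    }
    if (config_dir / ".gitignore").is_file():
        names.add(".gitignore")
    return sorted(names)
```

The returned names would then be passed as the pathspec of the seed commit in place of a blanket `git add -A`.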
Address re-review on #1149:
- Attach a done-callback to the debounce flush task so a failure that escapes the per-config guard (a drain-logic bug, not a git error) is logged instead of dying as an unretrieved task exception; the catch-all is the only recorder for external edits.
- restore() now flushes the debounce queue first, so restoring over an external edit still in the debounce window leaves that just-overwritten version recoverable in history.
- Correct the _flush_pending guard comment: git failures are logged in commit() (which returns None); this guard isolates the rarer bad configuration so one entry can't strand the batch.
Address re-review on #1149 (the dead catch-all warning, root-caused): commit() / commit_paths() no longer swallow git errors and return the same None as 'nothing changed'. A genuine git failure now raises; None means only 'nothing to commit'. The best-effort swallow moves to the boundaries that must never break a save:
- DevicesController._commit_history (mutation path) catches + logs, so a git hiccup costs a recoverable history gap for that one save, never the save itself.
- _flush_pending (scanner catch-all) catches + WARNs. It is now reachable on a real git failure, so a persistently broken external-edit recorder is visible instead of dying silently.
Tests updated for the new contract. 100% branch coverage held. Surfacing a persistent-failure health signal to a future History pane is deferred: no frontend consumer yet; the failure is logged.
Address re-review on #1149 (two non-blocking suggestions):
- Serialize each YAML write with its history commit behind a per-file lock. commit_paths stages on-disk content, so without this a second concurrent writer to the same configuration could slip between the first save's write and commit and win the history slot, dropping a version. Regression test: two concurrent saves to one file each get their own content committed.
- Trim rationale out of the docstrings I'd added per the project style guide; load-bearing contract kept, narrative dropped.
Surfacing a persistent-failure signal to a future History pane stays deferred: no frontend consumer yet; failures log at ERROR/WARNING.
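The write-then-commit race and its per-file lock can be modelled without git. In this sketch the class and attribute names are hypothetical, a dict stands in for the disk, and a list stands in for history; the commit records whatever is on "disk" at commit time, as commit_paths is described as doing.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict


class SerializedSaver:
    """Holds one lock per file across the write and its history commit."""

    def __init__(self) -> None:
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.disk: dict[str, str] = {}
        self.history: list[tuple[str, str]] = []

    async def save(self, configuration: str, content: str) -> None:
        async with self._locks[configuration]:
            self.disk[configuration] = content  # the YAML write
            await asyncio.sleep(0)  # yield, as an awaited commit would
            # The commit stages whatever is on disk at this moment.
            self.history.append((configuration, self.disk[configuration]))
```

Without the lock, the second writer would overwrite the file during the first save's yield and both history entries would carry the second content.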
The test's record stub reads the file from inside the (package-frame) commit path; a blocking read there trips blockbuster on Linux CI and gets swallowed by _commit_history, leaving the assertion empty. Read via asyncio.to_thread so it runs off the event loop.
… win
Address re-review on #1149:
- _commit_history now swallows only genuine git/subprocess errors (OSError, CalledProcessError); a programming bug (Attr/Type) propagates instead of being mislabelled as a recoverable history gap.
- Initialize _yaml_write_locks in __init__ (drop the __dict__.setdefault smell; fixtures set it explicitly) and note the per-device bound is intentional, not a leak.
- A dashboard commit now calls version_history.discard_pending() so its rich message supersedes the catch-all's generic 'external edit' one when both target the same file within the debounce window.
Tests updated/added (git-error swallowed, programming bug propagates, discard_pending). 100% branch coverage held.
Fix a regression: _commit_history dropped the queued catch-all entry before the commit attempt, so a swallowed git failure left the save unversioned by both the rich commit and the fallback. Now the pending entry is dropped only on success; on failure it stays so the debounced flush still records the content. Also document in _commit_history that it doesn't take the per-file write lock — the editor-save path holds it across write+commit, while delete/archive (inherently racy delete-while-editing) run unserialised.
The catch-all flush swallowed bare Exception while the inline save path narrows to git/subprocess errors. Narrow _flush_pending to the same set so a programming bug propagates to the flush task's done-callback rather than being masked as a routine 'catch-all failed' warning. Both paths now use a shared GIT_COMMIT_ERRORS tuple (defined in git_repo where these are raised, re-exported from the package) instead of duplicating (OSError, subprocess.CalledProcessError). Tests: a git error keeps the batch going; a non-git bug surfaces via the callback.
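The narrowed handler can be sketched as below. The tuple matches the two exception types named above; the wrapper function, logger name, and message are hypothetical.

```python
import logging
import subprocess

# The shared tuple named in the commit message above.
GIT_COMMIT_ERRORS = (OSError, subprocess.CalledProcessError)

_LOGGER = logging.getLogger("version_history_sketch")


def best_effort_commit(commit, message: str):
    """Swallow only git/subprocess failures; let programming bugs propagate."""
    try:
        return commit(message)
    except GIT_COMMIT_ERRORS:
        _LOGGER.warning("History commit failed for %r", message, exc_info=True)
        return None
```

A `TypeError` or `AttributeError` raised inside `commit` is not in the tuple, so it reaches the caller instead of being logged as a routine history gap.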
___DASHBOARD_SENTINEL___.yaml (controllers/config/settings.py) is only assigned to CORE.config_path — it's never written to disk, so the seed glob can't match it. The filter was dead code duplicating that magic string; remove it. secrets.yaml stays filtered (real on disk, must not be committed under a foreign .gitignore).
- Route delete_single / archive_single's removal commit through the per-file _yaml_write_lock so it can't interleave with a concurrent editor save's commit on the same config (uniform with the editor-save path; closes the serialization-consistency gap). create/clone/import stay unserialised: new files can't race a concurrent same-config save.
- Track consecutive git-commit failures in the version-history controller and flag it 'degraded' after a threshold, logging a single escalation at the crossing and on recovery. Exposes a 'degraded' property so a persistent breakage is distinguishable from a one-off hiccup; surfacing it to the user is the History-pane follow-up.
Note: the catch-all except was already narrowed to GIT_COMMIT_ERRORS in an earlier commit (the review summary was stale on that point).
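The consecutive-failure tracking might be modelled like this. The class name and the threshold of 3 are assumptions; the commit does not give the actual value. The boolean return values tell the caller when to emit the single escalation or recovery log line.

```python
class CommitHealth:
    """Tracks a streak of failed commits and flags a degraded state."""

    def __init__(self, threshold: int = 3) -> None:  # threshold is assumed
        self._threshold = threshold
        self._failures = 0

    @property
    def degraded(self) -> bool:
        return self._failures >= self._threshold

    def record_failure(self) -> bool:
        """Return True only on the failure that crosses the threshold."""
        self._failures += 1
        return self._failures == self._threshold

    def record_success(self) -> bool:
        """Reset the streak; return True if this ended a degraded period."""
        recovered = self.degraded
        self._failures = 0
        return recovered
```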
The dict retains a lock per distinct filename ever written this process lifetime (delete/archive don't evict), not per live device. State that accurately and note why eviction is unsafe (could desync a lock a concurrent save awaits).
Pull request overview
Adds a backend “time machine” for device YAML files by recording each dashboard/external mutation into a git repo rooted at (or enclosing) the config directory, and exposes WS endpoints to list/diff/restore versions (including restoring deleted configs). This fits into the backend’s controllers model by introducing a dedicated VersionHistoryController and wiring it into existing device mutation flows.
Changes:
- Introduces controllers/version_history/ (git wrapper + async controller) and wires it into DeviceBuilder startup/shutdown.
- Hooks device mutation paths to commit rich, per-action messages and adds per-file write/commit serialization to avoid losing versions under concurrent saves.
- Documents and tests the new WS API and safety guarantees (pathspec-scoped commits, staged-work preservation, graceful disable when git is absent).
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_featured_components.py | Updates test controller stubs for new YAML write lock + optional version history. |
| tests/controllers/version_history/test_git_repo.py | Adds coverage for repo discovery/init and pathspec-scoped commit safety. |
| tests/controllers/version_history/test_controller.py | Adds async controller tests for debounce catch-all, read APIs, restore behavior, and degraded-state handling. |
| tests/controllers/version_history/__init__.py | Adds package marker docstring for version-history tests. |
| tests/controllers/devices/test_version_history_hooks.py | Verifies device mutation sites record version-history commits and error handling semantics. |
| tests/controllers/devices/test_branches_coverage.py | Adjusts persist assertions for new message= parameter. |
| tests/controllers/devices/conftest.py | Updates DevicesController test factory to initialize new attributes and default version history off. |
| esphome_device_builder/device_builder.py | Wires VersionHistoryController into lifecycle and command collection. |
| esphome_device_builder/controllers/version_history/git_repo.py | Implements conservative synchronous git wrapper (adopt/init, excludes, scoped commits, read helpers). |
| esphome_device_builder/controllers/version_history/controller.py | Implements async, serialized commit API + WS read/restore commands + debounce catch-all for external edits. |
| esphome_device_builder/controllers/version_history/__init__.py | Exports controller + git error tuple. |
| esphome_device_builder/controllers/devices/mutations_simple.py | Adds commit message for friendly-name edit mutation. |
| esphome_device_builder/controllers/devices/mutations_create.py | Records creation commit after creating config/metadata. |
| esphome_device_builder/controllers/devices/mutations_clone.py | Records clone commit after writing cloned config/metadata. |
| esphome_device_builder/controllers/devices/importable.py | Records import commit after writing imported YAML. |
| esphome_device_builder/controllers/devices/controller.py | Adds per-file write locks, commits on update/restore, and commit helper with best-effort git error handling. |
| esphome_device_builder/controllers/devices/archive.py | Serializes archive/delete with per-file lock and records removal commits. |
| esphome_device_builder/controllers/devices/add_component.py | Records commit message for add-component mutations. |
| docs/API.md | Documents version_history/* WS commands and behavior when git is unavailable. |
```python
try:
    toplevel = self._discover_toplevel()
    if toplevel is not None:
        self.toplevel = toplevel
        self.enabled = True
        self._ensure_local_excludes()
        _LOGGER.debug("Adopted existing git work tree at %s", toplevel)
        return
    self._init_repo()
except OSError as exc:
    _LOGGER.warning("Could not set up version-history git repo: %s", exc)
```
```python
"""Tests for the subprocess ``git`` wrapper behind version history.

The load-bearing guarantees these pin:
- A pre-existing repo is adopted, not re-initialised, and its
  ``.gitignore`` is left untouched.
- Commits are pathspec-scoped, so the user's unrelated staged edits
  never get folded into our automatic commit.
- A missing ``git`` binary disables the feature instead of crashing.
"""
```
What does this implement/fix?
Motivation
We've shipped 3–4 separate bugs where the Visual Editor / YAML form blanked out a device's entire config during the beta cycle and then persisted that empty (or near-empty) content over the user's real YAML — a save that silently destroys hours of work with no way back short of the user's own backups. Today the only on-disk copy is the one we just clobbered.
This adds the recovery path: every save is committed to git, so a config the UI wipes is one version_history/restore away from its previous content. It also covers the broader asks in esphome/discussions#3687 (per-file history, diff, restore) and external edits (VS Code, the HA File Editor), but the load-bearing reason to land it is the "the editor ate my YAML" class of bug.
Scope: initial version, backend only
This is the first step: just start saving history so we can restore later if something goes wrong. There is no frontend yet: no History pane, no UI button. The WS commands (version_history/*) exist and are tested so a config can be recovered today (via the API, or by hand with git in the config dir), and so a future UI has something to call. The per-file History / diff / restore UI is a separate follow-up PR in esphome/device-builder-frontend. Nothing here changes the dashboard's appearance; it only records and exposes history.
Summary
Implements git-backed per-YAML version history for the config directory, and makes the deletion model history-aware: a removed YAML stays restorable from git even after its build artifacts are gone.
New controllers/version_history/ package:
- GitRepo: a conservative subprocess git wrapper. Probes for the git binary (the whole feature self-disables if it's absent), then adopts an enclosing work tree (covering /config/esphome already being a repo, or sitting inside one such as /config) or initializes a fresh repo (seeding the existing configs as the first snapshot). On a fresh init it writes a .gitignore covering .esphome/, .device-builder*.json, and secrets.yaml (kept out of history since a local repo may later be pushed to a remote); a pre-existing .gitignore is left untouched. It never writes user.* into git config (commit identity is passed per-invocation with git -c) and skips the user's hooks / signing.
- VersionHistoryController: async, lock-serialised commit API + the read/restore WS commands. Wired into DeviceBuilder.
Commits are pathspec-scoped (git add -A -- <paths> + git commit -- <paths>), so an automatic commit can never sweep the user's unrelated staged edits into our history, which is the dominant safety concern for a pre-existing repo. There's a dedicated test for exactly that.
Hybrid commit triggers:
- Dashboard mutation sites commit immediately with a rich message.
- A scanner-driven catch-all covers external edits. It subscribes to device_added / device_updated / device_removed, which fire only on a real on-disk cache-key change, not on mDNS/ping state ticks. A dashboard save has already committed by the time the debounced flush runs, so the catch-all becomes a no-op there; idempotent commits make the de-dup automatic.
Deletion is history-aware: delete_single / archive_single commit the removal so the pre-removal content stays restorable. The archive/ folder remains the browsable parked-configs list; git history is the time-machine that also covers hard deletes and external rm.
WS API (documented in docs/API.md): version_history/list_versions, get_version, get_diff, list_deleted, restore. Restore reuses the normal persist → scan → event pipeline (via a new public DevicesController.apply_restored_yaml) and is itself committed; it recreates a deleted file as well as reverting an edit.
Best-effort throughout: a git failure is logged and never breaks a user's save.
Related issue or feature (if applicable):
Types of changes
- bugfix
- new-feature
- enhancement
- breaking-change
- refactor
- docs
- maintenance
- ci
- dependencies
Frontend coordination
New WS commands (version_history/*) need a companion frontend PR to surface the per-file History pane (list / diff / restore) and the restore-deleted view. The backend is usable without it; the commands degrade gracefully (empty lists) when git is unavailable.
Checklist
- ruff, codespell, yaml/json/python checks).
- tests/ where applicable.
- components.index.json / definitions/components/*.json have not been hand-edited (regenerate via script/sync_components.py if a sync is needed).
- docs/ARCHITECTURE.md and/or docs/API.md.