fix(verifier): accept socket-activated services and stop ConfigFile alias misattribution#31
Merged
Merged
Conversation
…lias misattribution Two coupled bugs surfaced together when running `core-ops apply` on a host with traefik that uses systemd socket activation: 1. `verify_workload` required the runtime `.service` unit to be Active. Socket-activated services are correctly Inactive until first connection; verification now accepts Inactive when at least one of the service's triggering `.socket` units is Active. A Failed service is never accepted even with Active sockets. 2. `desired_target_aliases` populated `runtime_unit -> managed_id` entries for every workload, including ConfigFile and SocketDropIn. The catch-all in `systemd_unit_for_quadlet_file` synthesises `<stem>.service` from any unrecognised extension, so `/etc/foo/foo.toml` produced `foo.service` and collided with the real container's runtime unit name. Last-write-wins re-pointed verification failures from the recoverable container to the non-recoverable config file path, which both misattributed the failure in the apply report and suppressed the recovery `StartUnit` action. ConfigFile and SocketDropIn workloads are now skipped when populating the alias map. Adds an accepted regression scenario at tests/fixtures/verification/scenarios/accepted-socket-activated-trigger.yaml backed by a single-revision history fixture: external `systemctl stop` of a socket-activated service must still verify as converged, and the apply output must not surface a `[!] config/etc/...` failure marker for the config file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04e42f51a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…write-wins Codex P2 review on #31: `Service=` is a single-valued systemd directive, and when a socket has drop-ins on disk at /etc/systemd/system/<sock>.d/*.conf, later assignments override earlier ones (an empty `Service=` resets to the default `<stem>.service`). The first-match resolver could pick the base unit's target even when a drop-in retargeted the socket — causing the verifier to bless socket-activation acceptance for the wrong service and fail the actual one. The resolver now walks the base socket contents followed by every SocketDropIn workload under `<socket>.d/` sorted lex by file name, taking the last non-empty assignment and treating empty `Service=` as a reset. Adds two unit tests covering the override case and the empty-reset case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
verify_workloadnow accepts a service inInactivestate when at least one of its triggering.socketunits isActive— socket-activated services are correctly Inactive until first connection.Failedis still always a verification failure.desired_target_aliasesno longer populatesruntime_unit -> managed_identries forConfigFileandSocketDropInworkloads. Without this skip, the catch-all insystemd_unit_for_quadlet_filesynthesised a<stem>.servicealias from any unrecognised extension (e.g./etc/foo/foo.toml→foo.service), which collided with the real container's runtime unit and last-write-wins re-pointed verification failures from the recoverable container onto the non-recoverable config-file path. That misattributed the failure in the apply report and suppressed the recoveryStartUnitaction.How this surfaced
On a host applying the matrix-homeserver source repo,
core-ops applyreported:— a config file flagged as failed because of a service unit's runtime state.
traefik.servicewas correctly Inactive (socket-activated, no traffic yet), but the verifier didn't model socket activation, and the alias map then routed the failure totraefik.tomlwhose stem coincidentally matched.Test plan
cargo test --all-targets— 447 / 447 passcargo clippy --all-targets -- -D warnings— cleancargo run --bin core-ops-release -- validate --base-ref master—Outcome: passed, declared bumppatchmatches requiredtests/unit/test_verification.rscover socket-activation acceptance (Inactive-with-Active-socket → success, Failed-state → still fails, all-sockets-Inactive → still fails) and alias-misroute regressions (ConfigFile and SocketDropIn cases)tests/fixtures/verification/scenarios/accepted-socket-activated-trigger.yamlexecuted against a live VM viacore-ops-verify run: all 9 assertions passed, teardown complete. The decisive step — second apply aftersystemctl stop frontend.service— emits3 unchanged / Outcome: converged(pre-fix, this would have been[!] config/etc/frontend/frontend.toml failed / Outcome: non-converging).Notes
Cargo.tomlbumped 2.1.0 → 2.1.1;tests/fixtures/provenance_state/valid-success.jsonfollows.changes/fix-socket-activated-verification.md(release_intent: patch).🤖 Generated with Claude Code