server: populate EdgeDevConfig.controllercert_confighash #152

Merged: milan-zededa merged 1 commit into lf-edge:master from eriknordmark:controllercert-confighash on May 11, 2026

@eriknordmark
Contributor

Summary

Adam never set EdgeDevConfig.controllercert_confighash on its config response. EVE's handleControllerCertsSha (pkg/pillar/cmd/zedagent/handlecertconfig.go:622-631) compares this field against the device's saved value and triggers a /certs refetch on mismatch — but with adam emitting the empty string, the trigger was dead. The only fast path to a /certs refetch was a signing-cert rotation that raised SenderStatusCertMiss from auth-envelope verification.

This made pure encrypt-cert rotation effectively unobservable on the device until the periodic controllerCertsTask fired (default CertInterval = 24h): the cipher decrypt path silently returns "Controller Certificate get fail" without triggering any refetch. See lf-edge/eve#5926 for the broader gap analysis.
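The comparison described in this summary can be modeled in a few lines. This is only a sketch of the equality check, not EVE's code: `certTracker`, `onConfig`, and the choice to record the new hash immediately are assumptions made for illustration.

```go
package main

import "fmt"

// certTracker is a hypothetical model of the device-side state; it is not
// EVE's actual data structure.
type certTracker struct {
	savedHash string // hash recorded by the device (assumed to start empty)
	refetches int    // number of /certs refetches scheduled so far
}

// onConfig compares the hash carried in a config response with the saved
// value and schedules a refetch only when the two differ.
func (t *certTracker) onConfig(received string) bool {
	if received == t.savedHash {
		return false
	}
	t.savedHash = received // assumption: the new hash is recorded right away
	t.refetches++
	return true
}

func main() {
	var dev certTracker
	// A controller that always sends "" never produces a mismatch.
	fmt.Println(dev.onConfig(""), dev.onConfig(""))
	// A controller that sends a real hash triggers on first sight and on rotation.
	fmt.Println(dev.onConfig("hash-v1"), dev.onConfig("hash-v1"), dev.onConfig("hash-v2"))
}
```

In this model the first line printed is `false false` and the second is `true false true`, which mirrors why an always-empty field leaves the trigger dead.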

Change

Compute base64.URLEncoding.EncodeToString(sha256(signing.pem || encrypt.pem)) and populate EdgeDevConfig.controllercert_confighash with the result. Any rotation of either file flips the hash; EVE compares it against its saved value and schedules a /certs refetch.
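A minimal sketch of that computation, assuming the two PEM files have already been read into memory. The function name `controllerCertHash` and its signature are hypothetical; only the SHA-256 digest over the concatenated bytes and the `base64.URLEncoding` step come from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// controllerCertHash returns the base64-URL-encoded SHA-256 digest of the
// signing PEM bytes followed by the encrypt PEM bytes. The name and
// signature are illustrative, not adam's actual helper.
func controllerCertHash(signingPEM, encryptPEM []byte) string {
	h := sha256.New()
	h.Write(signingPEM) // hash.Hash.Write never returns an error
	h.Write(encryptPEM)
	return base64.URLEncoding.EncodeToString(h.Sum(nil))
}

func main() {
	// Placeholder bytes stand in for the real file contents.
	before := controllerCertHash([]byte("signing-v1"), []byte("encrypt-v1"))
	after := controllerCertHash([]byte("signing-v1"), []byte("encrypt-v2"))
	// Rotating only the encrypt file is enough to change the value.
	fmt.Println(before != after)
}
```

Because the value is only ever compared for equality, no certificate parsing is needed; any byte change in either input yields a different string.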

The hash is opaque to EVE — only equality matters — so hashing the raw file bytes is sufficient and avoids cert parsing. The field is set before ConfigHash is computed so a cert-chain rotation also flips ConfigHash; otherwise adam would return 304 Not Modified for any pure cert rotation that didn't otherwise change the device config and EVE would never see the new field.
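The ordering argument can be illustrated with a toy model. Everything here is an assumption made for the sketch: `devConfig` stands in for EdgeDevConfig, `configHash` for however ConfigHash is actually derived, and `statusFor` for the server's 304 decision.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
)

// devConfig is a hypothetical stand-in for EdgeDevConfig.
type devConfig struct {
	Payload                  string // everything else in the device config
	ControllerCertConfigHash string
}

// configHash is a hypothetical stand-in for the ConfigHash computation:
// a digest that covers every field, including the cert hash.
func configHash(c devConfig) string {
	sum := sha256.Sum256([]byte(c.Payload + "\x00" + c.ControllerCertConfigHash))
	return hex.EncodeToString(sum[:])
}

// statusFor returns 304 when the device already holds this exact config.
func statusFor(c devConfig, deviceHash string) int {
	if configHash(c) == deviceHash {
		return http.StatusNotModified
	}
	return http.StatusOK
}

func main() {
	cfg := devConfig{Payload: "apps-unchanged", ControllerCertConfigHash: "certs-v1"}
	seen := configHash(cfg) // what the device reports on its next pull

	cfg.ControllerCertConfigHash = "certs-v2" // pure cert rotation
	fmt.Println(statusFor(cfg, seen))         // the field is covered, so not 304
}
```

If the field were filled in only after the digest was taken, the digest would still equal `seen` and the device would keep receiving 304 responses.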

The v1 API path passes an empty hash (v1 has no controller cert chain).

Test plan

  • go build ./... clean.
  • gofmt -l pkg/server/ clean.
  • go test ./pkg/server/... passes (no test files in this package today).
  • Manual: with this build of adam, run lf-edge/eden#1163's ctrl_encrypt_cert_change rotating only the encrypt cert (no signing rotation). Verify a freshly-deployed --use-encrypt-cert app reaches RUNNING — proves EVE refetched /certs via handleControllerCertsSha and decrypted with the new ECDH key.

🤖 Generated with Claude Code

EVE's handleControllerCertsSha (zedagent/handlecertconfig.go:622-631)
compares EdgeDevConfig.controllercert_confighash from each config pull
against the device's saved value, and triggers /certs refetch on a
mismatch. Adam never set this field, so the trigger was dead and the
only fast path to refetching /certs was the auth-envelope
SenderCertHash mismatch raised by signing-cert rotation. Pure
encrypt-cert rotation had no fast trigger: the cipher decrypt path
silently fails when the encrypt cert hash isn't in pubControllerCert,
and the periodic controllerCertsTask defaults to CertInterval=24h. See
lf-edge/eve#5926.

Compute a base64-URL-encoded sha256 over the raw bytes of signing.pem
and encrypt.pem (the two files getAllCerts already serves to /certs).
Any rotation of either file changes the bytes, flips the hash, and
EVE's existing handleControllerCertsSha path schedules an immediate
/certs refetch. The hash is opaque - EVE only compares it for
equality - so file-byte hashing is sufficient and avoids parsing.

Set the field on EdgeDevConfig before configProcess computes
ConfigHash so a cert-chain change also flips ConfigHash. Otherwise
adam would return 304 Not Modified for any pure cert rotation that
didn't otherwise change the device config, and EVE would never see
the new controllercert_confighash to compare against.

The v1 API has no controller cert chain (it's used only by very old
EVE versions that don't sign config envelopes); pass an empty hash so
its config response keeps the historical zero-value behavior.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eriknordmark added a commit to eriknordmark/eden that referenced this pull request May 9, 2026
End-to-end test that rotates the controller's ECDH encryption cert
twice and verifies that user-data cipher blocks tagged with the
encrypt cert hash keep decrypting on the device. Companion to
ctrl_cert_change.txt: that test rotates the signing cert and
incidentally exercises the encryption path because Eden historically
reused signing-key.pem for ECDH derivation; this test deploys apps
with --use-encrypt-cert so their cipher contexts reference the
controller's encrypt cert (Type=3) and rotates that cert specifically.

Each rotation rotates *both* signing and encrypt certs because EVE's
only fast trigger for /certs refetch is signing-cert mismatch in the
auth-envelope SenderCertHash. The cipher decrypt path silently fails
when the encrypt cert hash is unknown without triggering refetch,
adam doesn't populate EdgeDevConfig.controllercert_confighash so
handleControllerCertsSha is dead, and the periodic
controllerCertsTask defaults to CertInterval=24h. See lf-edge/eve#5926
for the design gap and lf-edge/adam#152 for the fix; once that adam
patch ships in an Eden-tracked release, the test could be extended
with an encrypt-cert-only rotation step.

Order within each rotation: change-encrypt-cert first, then
change-signing-cert. change-encrypt-cert re-encrypts the
encrypt-tagged cipher blocks and writes the new encrypt files on
adam's disk. change-signing-cert is a no-op for those cipher blocks
(its reencryptConfigs filter skips them), but its on-disk signing-key
swap is what makes adam sign the next auth envelope with a key the
device hasn't seen, triggering SenderStatusCertMiss. By the time the
device refetches /certs, both new certs are on adam's disk and arrive
in a single round-trip.

Verification proceeds in three layers:

1. Fresh-app deployment (eclient2 after first rotation, eclient3
   after second). The app encrypts with the just-rotated ECDH key;
   reaching RUNNING means EVE successfully fetched the new encrypt
   cert into pubControllerCert and decrypted the cipher block.

2. check_encrypt_cert.sh walks /run/zedagent/ControllerCert/ (or
   /persist/status/zedagent/ControllerCert/ on EVE versions where
   that pubsub is Persistent: true) and byte-matches a Type=3 entry
   against encrypt-new.pem. The script normalizes for adam's
   strings.TrimSpace before computing sha256 to match the bytes
   actually published.

3. Reboot survival: after both rotations and a reboot, all three
   apps come back RUNNING and check_encrypt_cert.sh re-confirms the
   latest rotated encrypt cert is still advertised.
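The normalization in layer 2 can be sketched as follows. This is an illustration in Go rather than the shell script itself; the function name `publishedDigest` and the hex output encoding are assumptions, while the TrimSpace-then-sha256 order comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// publishedDigest trims leading and trailing whitespace the way
// strings.TrimSpace does, then returns the hex SHA-256 of what remains.
// The name is hypothetical.
func publishedDigest(pemFile string) string {
	sum := sha256.Sum256([]byte(strings.TrimSpace(pemFile)))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder content; a real PEM file usually ends with a newline.
	onDisk := "-----BEGIN CERTIFICATE-----\nplaceholder-body\n-----END CERTIFICATE-----\n"
	published := strings.TrimSpace(onDisk)
	// Without normalization the trailing newline would change the digest.
	fmt.Println(publishedDigest(onDisk) == publishedDigest(published))
}
```

Hashing the untrimmed file would differ from the published bytes by a trailing newline, which is the mismatch the normalization avoids.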

Two rotations exercise both controllercerts.bak code paths inside
EVE's MaybeSaveControllerCerts: the first rotation runs with no .bak
yet, the second runs with .bak from the first rotation present.

Registered as test 24/26 in tests/workflow/smoke.tests.txt.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eriknordmark eriknordmark marked this pull request as ready for review May 10, 2026 18:33
Member

@uncleDecart uncleDecart left a comment

LGTM. @milan-zededa, do any code paths need to be fixed regarding multiple EVE instances?

@milan-zededa
Contributor

> LGTM. @milan-zededa, do any code paths need to be fixed regarding multiple EVE instances?

I believe the code is OK: h.signingCertPath and h.encryptCertPath are used for all devices, so every device is expected to receive the same cert hash.

@milan-zededa milan-zededa merged commit 331e0dc into lf-edge:master May 11, 2026
2 checks passed
eriknordmark added a commit to eriknordmark/eden that referenced this pull request May 11, 2026
eriknordmark added a commit to lf-edge/eden that referenced this pull request May 13, 2026
@eriknordmark eriknordmark deleted the controllercert-confighash branch May 15, 2026 20:16