Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keyring: support prepublishing keys #23577

Merged
merged 4 commits into from
Jul 19, 2024
Merged

Conversation

tgross
Copy link
Member

@tgross tgross commented Jul 12, 2024

When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their PublishTime. Update the periodic key rotation to prepublish keys at half the root_key_rotation_threshold window, and promote prepublished keys to active after the PublishTime.

This changeset also fixes three bugs in periodic root key rotation and garbage collection, none of which can be safely fixed without implementing prepublishing:

  • Periodic root key rotation would never happen because the default root_key_rotation_threshold of 720h exceeds the 72h maximum window of the FSM time table. We now compare the CreateTime against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref GC limits > 3 days are in effect infinite b/c of FSM timetable limit #16359)
  • Root key garbage collection could GC keys that were used to sign identities. We now wait until root_key_rotation_threshold + root_key_gc_threshold before GC'ing a key.
  • When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: #19669
Fixes: #23528
Fixes: #19368


Notes for reviewers:

  • This is strictly speaking an enhancement but because it's the only way to safely fix the 3 bugs listed (i.e. without orphaning Workload Identities), I'm planning on backporting it to supported Nomad Enterprise versions. Alternately, we could remove periodic key rotation entirely from those older versions and that would "fix" the bug there.
  • I'll need to manually backport the docs changes to 1.7.x and 1.6.x CE branches.

Copy link
Contributor

@pkazmierczak pkazmierczak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work! I left some questions/comments, but nothing major.

nomad/core_sched.go Show resolved Hide resolved
nomad/core_sched.go Show resolved Hide resolved
nomad/structs/keyring.go Show resolved Hide resolved
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI comment isn't a blocker but give it a quick think 😁

nomad/core_sched.go Outdated Show resolved Hide resolved
nomad/core_sched.go Show resolved Hide resolved
nomad/core_sched.go Outdated Show resolved Hide resolved
nomad/encrypter.go Show resolved Hide resolved
nomad/structs/keyring.go Outdated Show resolved Hide resolved
nomad/structs/keyring.go Outdated Show resolved Hide resolved
command/operator_root_keyring_rotate.go Show resolved Hide resolved
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  #16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: #19669
Fixes: #23528
Fixes: #19368
@tgross tgross force-pushed the keyring-rotation-and-prepublish branch from bb8c57d to cae6cc2 Compare July 19, 2024 15:34
@tgross tgross merged commit 2f43534 into main Jul 19, 2024
21 checks passed
@tgross tgross deleted the keyring-rotation-and-prepublish branch July 19, 2024 17:29
tgross added a commit that referenced this pull request Jul 19, 2024
Backports required documentation bits from #23577 because the PR was only
automatically backported to the 1.7.x+ent branch.
tgross added a commit that referenced this pull request Jul 19, 2024
Backports required documentation bits from #23577 because the PR was only
automatically backported to the 1.6.x+ent branch.
@tgross tgross modified the milestones: 1.8.x, 1.8.3 Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/ent/1.6.x+ent Changes are backported to 1.6.x+ent backport/ent/1.7.x+ent Changes are backported to 1.7.x+ent backport/1.8.x backport to 1.8.x release line theme/keyring type/bug type/enhancement
Projects
None yet
3 participants