GC limits > 3 days are in effect infinite b/c of FSM timetable limit #16359

Open
stswidwinski opened this issue Mar 7, 2023 · 2 comments
Labels
hcc/jira, stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/config, type/bug

Comments

@stswidwinski
Contributor

Nomad version

1.5.0 and anything prior.

Operating system and Environment details

Unix.

Issue

When a garbage collection (GC) threshold is set to a value larger than 3 days, the Nomad scheduler never garbage collects the affected objects. This leads to unbounded accumulation of data (an unbounded memory and disk leak) and of related resources (such as CSI volumes). The affected GC settings include at least:

  1. https://developer.hashicorp.com/nomad/docs/configuration/server#eval_gc_threshold
  2. https://developer.hashicorp.com/nomad/docs/configuration/server#batch_eval_gc_threshold
  3. https://developer.hashicorp.com/nomad/docs/configuration/server#deployment_gc_threshold
  4. https://developer.hashicorp.com/nomad/docs/configuration/server#job_gc_threshold
  5. https://developer.hashicorp.com/nomad/docs/configuration/server#acl_token_gc_threshold
  6. https://developer.hashicorp.com/nomad/docs/configuration/server#csi_plugin_gc_threshold
  7. https://developer.hashicorp.com/nomad/docs/configuration/server#csi_volume_claim_gc_interval

The expected behavior is that garbage collection limits can be set well above 3 days, to allow history to build up and to make debugging easier.

The details of the bug.

At the time of garbage collection, Nomad will derive an approximate raft index which is used as a watermark for garbage collection. The mapping of time to such an index is handled uniformly via:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L1133-L1143

This relies on the `fsm` and the `TimeTable` that is initialized within it. To be precise, the table is initialized here:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L170

With a hard-coded maximal time table limit:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/fsm.go#L27-L29

If the requested time falls outside that limit, the index lookup defaults to zero:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/timetable.go#L93-L106

Hence `thresholdIndex = 0`, so every check of the form `X.modifyIndex > thresholdIndex` evaluates to `true` and nothing is garbage collected. For instance, for evals:

https://github.com/hashicorp/nomad/blob/v1.5.0/nomad/core_sched.go#L282-L288

Repro.

The simplest way to reproduce this behavior is to modify the code so that the maximal time table limit of the `fsm` is something small, then observe that no GC occurs for evaluations that should be GCed. A unit test of the `fsm` or of garbage collection may also be used to confirm the behavior.

stswidwinski changed the title from "GC limits > 3 days are in effect infinite." to "GC limits > 3 days are in effect infinite (cause resources to never be GCed)" Mar 7, 2023
@tgross
Member

tgross commented Mar 7, 2023

Hi @stswidwinski! That's certainly a nasty bug. I'm pretty sure the reason we limit the time table to 72h is to avoid infinite growth of that table, but yeah, that definitely assumes that we're not setting thresholds greater than that. It'd probably be reasonable to have the configuration find the largest GC threshold and double it in the FSM configuration, but we'd want to document a warning that this could allow a good bit of memory growth.

tgross added the stage/accepted (Confirmed, and intend to work on. No timeline commitment though.) label Mar 7, 2023
tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Mar 7, 2023
tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Mar 7, 2023
tgross changed the title from "GC limits > 3 days are in effect infinite (cause resources to never be GCed)" to "GC limits > 3 days are in effect infinite b/c of FSM timetable limit" Mar 31, 2023
@tgross
Member

tgross commented May 18, 2023

Related: #17233

schmichael added a commit that referenced this issue Jan 11, 2024
tgross added a commit that referenced this issue Jul 12, 2024
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes two bugs in periodic root key rotation and garbage
collection, both of which can't be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  #16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: #19669
Fixes: #23528
tgross added a commit that referenced this issue Jul 12, 2024
When a root key is rotated, the servers immediately start signing Workload
Identities with the new active key. But workloads may be using those WI tokens
to sign into external services, which may not have had time to fetch the new
public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the
JWKS endpoint but will not be used for signing or encryption until their
`PublishTime`. Update the periodic key rotation to prepublish keys at half the
`root_key_rotation_threshold` window, and promote prepublished keys to active
after the `PublishTime`.

This changeset also fixes three bugs in periodic root key rotation and garbage
collection, none of which can be safely fixed without implementing
prepublishing:

* Periodic root key rotation would never happen because the default
  `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM
  time table. We now compare the `CreateTime` against the wall clock time instead
  of the time table. (We expect to remove the time table in future work, ref
  #16359)
* Root key garbage collection could GC keys that were used to sign
  identities. We now wait until `root_key_rotation_threshold` +
  `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after
  the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: #19669
Fixes: #23528
Fixes: #19368
tgross added a commit that referenced this issue Jul 18, 2024
tgross added a commit that referenced this issue Jul 19, 2024
tgross added a commit that referenced this issue Jul 19, 2024
…23651)

When a root key is rotated, the servers immediately start signing Workload Identities with the new active key. But workloads may be using those WI tokens to sign into external services, which may not have had time to fetch the new public key and which might try to fetch new keys as needed.

Add support for prepublishing keys. Prepublished keys will be visible in the JWKS endpoint but will not be used for signing or encryption until their `PublishTime`. Update the periodic key rotation to prepublish keys at half the `root_key_rotation_threshold` window, and promote prepublished keys to active after the `PublishTime`.

This changeset also fixes three bugs in periodic root key rotation and garbage collection, none of which can be safely fixed without implementing prepublishing:

* Periodic root key rotation would never happen because the default `root_key_rotation_threshold` of 720h exceeds the 72h maximum window of the FSM time table. We now compare the `CreateTime` against the wall clock time instead of the time table. (We expect to remove the time table in future work, ref #16359)
* Root key garbage collection could GC keys that were used to sign identities. We now wait until `root_key_rotation_threshold` + `root_key_gc_threshold` before GC'ing a key.
* When rekeying a root key, the core job did not mark the key as inactive after the rekey was complete.

Ref: https://hashicorp.atlassian.net/browse/NET-10398
Ref: https://hashicorp.atlassian.net/browse/NET-10280
Fixes: #19669
Fixes: #23528
Fixes: #19368

Co-authored-by: Tim Gross <tgross@hashicorp.com>