keyring: replication tries to replicate rotated-away keys #19367

Open
tgross opened this issue Dec 7, 2023 · 2 comments

tgross commented Dec 7, 2023

In #19340 @sbihel reported a behavior where the followers would try to replicate keys that had been previously rotated out, and this would fail:

[WARN] nomad.keyring.replicator: failed to fetch key from current leader, trying peers: key=128ba7c1-baa0-3bc6-c20f-833b97a1fbe2 error=
[ERROR] nomad.keyring.replicator: failed to fetch key from any peer: key=128ba7c1-baa0-3bc6-c20f-833b97a1fbe2 error="rpc error: no such key "128ba7c1-baa0-3bc6-c20f-833b97a1fbe2" in keyring"
[ERROR] nomad.keyring.replicator: failed to fetch key from any peer: rpc error: no such key "128ba7c1-baa0-3bc6-c20f-833b97a1fbe2" in keyring: key=128ba7c1-baa0-3bc6-c20f-833b97a1fbe2

#19340 covered another critical bug and was automatically closed once the fix for that bug was merged. This issue is a follow-up for the replication behavior described above.

tgross commented Dec 14, 2023

The specific error here occurs when the server we're replicating the key from tries to read the key material from its own keyring. That key material is no longer present, so replication cannot succeed. A missing key is not unexpected by itself: we already have to handle it when bootstrapping the keyring from one server to all the other servers, where some servers may get replication requests for keys they don't yet have.
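
The failure path described above can be modeled roughly as follows. This is a minimal Python sketch for illustration only, not Nomad's actual replicator code (Nomad is written in Go); the `Server` class, the `fetch_key` function, and the `KeyNotFound` exception are hypothetical names. It assumes the follower asks the leader first and then each peer, as the log messages suggest.

```python
class KeyNotFound(Exception):
    """Raised when a server has no key material for the requested key ID."""


class Server:
    """Hypothetical stand-in for a server and its on-disk keyring."""

    def __init__(self, name, keyring):
        self.name = name
        self.keyring = dict(keyring)  # key ID -> key material

    def get_key(self, key_id):
        try:
            return self.keyring[key_id]
        except KeyError:
            raise KeyNotFound(f'no such key "{key_id}" in keyring') from None


def fetch_key(key_id, leader, peers):
    """Try the leader first, then each peer; fail only if nobody has the key."""
    last_err = None
    for server in [leader, *peers]:
        try:
            return server.get_key(key_id)
        except KeyNotFound as err:
            # A single miss is expected during bootstrap, so keep trying.
            last_err = err
    raise KeyNotFound(f"failed to fetch key from any peer: {last_err}")
```

In this model a miss on one server is harmless, which matches the bootstrap case. The error only becomes permanent when no server holds the material, which is the situation for a rotated-away key.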

But for what is effectively an "orphaned" key, we're in a messy spot. We can't guarantee that the key is safe to remove from the metadata, because the operator may have had a bad recovery process and may still need to restore the on-disk keyring to the servers. As a workaround, the operator can remove the key via nomad operator root keyring remove if they know it's truly orphaned. Figuring out how to fix #19368 seems important for fixing this issue.
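
One way to picture the "orphaned" condition is a key ID that still appears in the replicated metadata while its material is absent from every server's on-disk keyring. The sketch below illustrates that idea under stated assumptions and is not a Nomad API: `find_orphan_candidates` and its inputs are hypothetical, and `hypothetical-active-key` is a made-up ID. It only reports candidates, because a key that looks orphaned may still be needed if the operator later restores the on-disk keyring.

```python
def find_orphan_candidates(metadata_key_ids, keyrings_by_server):
    """Return key IDs present in metadata but missing from every keyring.

    metadata_key_ids: iterable of key IDs recorded in replicated state.
    keyrings_by_server: mapping of server name -> set of key IDs on disk.
    """
    on_disk = set()
    for key_ids in keyrings_by_server.values():
        on_disk.update(key_ids)
    return sorted(set(metadata_key_ids) - on_disk)


if __name__ == "__main__":
    candidates = find_orphan_candidates(
        ["128ba7c1-baa0-3bc6-c20f-833b97a1fbe2", "hypothetical-active-key"],
        {"server-1": {"hypothetical-active-key"}, "server-2": set()},
    )
    for key_id in candidates:
        # Print, rather than run, the workaround command from the discussion,
        # so the operator can confirm the key is truly orphaned first.
        print(f"nomad operator root keyring remove {key_id}")
```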

tgross commented Jan 9, 2024

Ref #19669

tgross modified the milestones: 1.7.x, 1.8.x · Jun 4, 2024