-
Notifications
You must be signed in to change notification settings - Fork 244
OCPBUGS-33013: certsyncpod: Swap secret/cm directories atomically #2009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
b3eca57
to
a389cbd
Compare
8dec327
to
60f05a8
Compare
@tchap: This pull request references Jira Issue OCPBUGS-33013, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@tchap: This pull request references Jira Issue OCPBUGS-33013, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@tchap: This pull request references Jira Issue OCPBUGS-33013, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I actually have to make sure this can be merged as this is only supported on Linux 3.15 or later. /hold |
This patch should be OK for RHEL 8 or later based on https://access.redhat.com/articles/3078 The latest CI for OCP 4.21 actually uses RHEL 9.6. |
60f05a8
to
f6df27a
Compare
The PR using this change in cluster-kube-apiserver-operator seems to be passing on CI, I deem this ready. /unhold |
@tchap is there a must-gather from an incident i could take a look at ? |
@p0lyn0mial I think that I consulted this one: https://access.redhat.com/support/cases/#/case/03849958/discussion?attachmentId=a096R00003JpGgMQAV |
@vrutkovs do you have time to take a look at this issue ? I think that the issue might be real. I think the issue is when a two file cert is replaced. It can happen that the server picks up the update and notices the public/private key mismatch and crashes. Is there a way to repo this issue ? |
@p0lyn0mial Yeah, there is already openshift/cluster-kube-apiserver-operator#1917 that I used initially to run additional tests, but I will update it once I attend to all review issues. |
Pretty sure previous approach would break on dynamic configmaps (its already broken on static TLS secrets!). The difference in approach is important: previously we treated files one by one, now we treat an entire configmap/secret as a directory and make each element depend on the success of the entire resource update |
/approve to make sure we agree on the approach and need to sort out the details |
There is really only one last pending issue regarding code from my perspective, which is that @p0lyn0mial mentioned we should have |
Iiuc we want an implementation for unittests, which means it doesn't have to be truly atomic, just good enough to pass unittests on other platforms (mac primarily, let's hope no one uses Windows/BSD to run unit tests) |
8087421
to
2eb032a
Compare
I added the mock implementation and tested it on my Mac. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Let's bump kube-apiserver test PR and make sure it doesn't cause any sideeffects on e2e runs?
2eb032a
to
c8b5cec
Compare
I squshed all the commits into one, LGTM removed 😐 Updated the KASO PR to include the current version of this PR and also added the extra tests mentioned by @vrutkovs |
/lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: tchap, vrutkovs The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold until we are sure the testing KASO PR works just fine. |
eventRecorder.Warningf("CertificateUpdateFailed", "Failed to hash current content directory for %s: %s/%s: %v", typeName, o.Namespace, o.Name, err) | ||
return err | ||
} | ||
if !bytes.Equal(targetDirHashBefore, targetDirHashAfter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If two processes are writing to the same directory, hashing won’t guarantee atomicity, because the second process could start modifying the directory after the check.
A possible solution would be to make the second process also swap the directory atomically, perhaps using the same technique.
I don’t think the order matters, so atomically swapping the contents should be enough. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to happen twice - when both targetDirHashBefore
and targetDirHashAfter
are calculated and needs to end up in the consistent state (if the second write is a bit slower/faster, the hashes will mismatch). IMO it's too rare to guard against
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of times we calculate and compare the hashes doesn’t matter, because the second process can start writing or changing the target directory after the last comparison or calculation.
In this case, the second process is the installer pod.
We should change the installer pod to perform an atomic swap rather than rely on the hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, the desired procedure is:
- make a hash of source dir
- prepare a new dir from source dir
- hash source dir again to make sure it didn't change during previous step
- we swap atomically if hashes match
So the only chance of invalid swap here is if it initiates after step 1 and between step 3 and 4 - that would be a very rare event, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
step 3 and 4 - that would be a very rare event, wouldn't it?
This PR addresses a rare bug where kas can observe an old prv key together with a new pub key, which causes a crash :(
I think we should take this potential issue into account and fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aligned installerpod
to use the same atomic swap. I also refactored the PR a bit because I had to pull shared code into a separate package. The swapping function is now called SyncDirectory
since it replaces the content actually.
New changes are detected. LGTM label has been removed. |
5c2ed6a
to
ccd7851
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tchap for easier review please open a new pr with the SwapDirectories function + tests.
and then a new PR for the SyncDirectory function.
// | ||
// SyncDirectory is supposed to be used with secrets/configmaps, so typeName is expected to be "configmap" or "secret". | ||
// This does not affect the logic, but it's included in error messages. | ||
func SyncDirectory[C string | []byte]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing this method to: Write(targetDir string, files map[string][]byte, filePerm os.FileMode
After all we are writing files to the target directory in an atomic way.
Callers have to transform files to []byte which is easy since strings in go are []bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used Sync instead of Write because Write feels like if doesn't affect existing files. But I don't have a strong opinion really.
} | ||
defer func() { | ||
if err := fs.RemoveAll(tmpDir); err != nil { | ||
klog.Errorf("Failed to remove temporary directory %q during cleanup: %v", tmpDir, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should return the error the the callers not only log it.
pkg/operator/staticpod/internal/atomicfiles/swap_directories_linux.go
Outdated
Show resolved
Hide resolved
pkg/operator/staticpod/internal/atomicfiles/swap_directories_linux.go
Outdated
Show resolved
Hide resolved
pkg/operator/staticpod/internal/atomicfiles/swap_directories_linux.go
Outdated
Show resolved
Hide resolved
pkg/operator/staticpod/internal/atomicfiles/swap_directories_other.go
Outdated
Show resolved
Hide resolved
The function can be used to atomically sync a directory with the desired state. This uses atomicdir.swap implemented earlier.
@tchap: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Currently it can happen that cert-syncer replaces some of the secret/configmap files successfully and then fails. This can lead to problems when these are e.g. TLS cert/key files and the directory gets inconsistent. This may seem transient, but when cert-syncer is terminated in the middle, it can later fail to start as the whole kube-apiserver gets into a crash loop.
This introduces a new
staticpod.SwapDirectoriesAtomic
, which usesunix.Renameat2
withRENAME_EXCHANGE
flag set.