New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use k8s atomic writer for storage of secrets / configmaps #93
use k8s atomic writer for storage of secrets / configmaps #93
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gabemontero The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
||
corev1 "k8s.io/api/core/v1" | ||
kerrors "k8s.io/apimachinery/pkg/api/errors" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/client-go/kubernetes" | ||
"k8s.io/klog/v2" | ||
atomic "k8s.io/kubernetes/pkg/volume/util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this should be something like "volumeutil", "kvolumeutil" or just "kutil" to be more descriptive to the package it is pulling in, not what feature of that package you are using.
pkg/csidriver/driver.go
Outdated
// container image must be empty, or the directory does not exist, and is created for the Pod's container as | ||
// part of provisioning the container. | ||
if err := commonOSRemove(podPath, fmt.Sprintf("commonUpsertRanger key %s volid %s share id %s pod name %s", key, dv.GetVolID(), dv.GetSharedDataId(), dv.GetPodName())); err != nil { | ||
// NOTE: atomic_writer handles any pruning of secret/configmap keys where were present before, but are no longer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/where/that/
pkg/csidriver/driver.go
Outdated
err = aw.Write(podFile) | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err = aw.Write(podFile); err != nil {
return err
}
// has been mounted as a separate dir in our share, so skip | ||
if info.IsDir() { | ||
return nil | ||
dirFile, err := os.Open(dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a deferred close directory after this?
pkg/csidriver/driver.go
Outdated
err = os.RemoveAll(fileName) | ||
if err != nil { | ||
klog.V(0).Infof("commonOSRemove: %s", err.Error()) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err = os.RemoveAll(fileName); err != nil {
klog.V(0).Infof("commonOSRemove: %s", err.Error())
return err
}
If this is going to be loglevel 0, it should probably be more human-friendly text, or maybe you meant it to be loglevel 4 like the others?
pkg/csidriver/driver_test.go
Outdated
linkName, err := os.Readlink(info.Name()) | ||
if err == nil { | ||
searchName = linkName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if linkName, err := os.Readlink(info.Name()); err == nil {
searchName = linkName
}
if err := os.MkdirAll(podPath, os.ModePerm); err != nil { | ||
podFile := map[string]atomic.FileProjection{} | ||
aw, err := atomic.NewAtomicWriter(podPath, "shared-resource-csi-driver") | ||
if err != nil { | ||
return err | ||
} | ||
if payload.ByteData != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a payload
have both ByteData
and StringData
populated at the same time? Can payload
have both be nil
also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and yes
see the ConfigMap godoc wrt both populated
both nil means an empty secret/configmap
pkg/csidriver/driver.go
Outdated
err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())) | ||
if err != nil { | ||
klog.V(0).Infof("commonOSRemove: %s", err.Error()) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())); err != nil {
klog.V(0).Infof("commonOSRemove: %s", err.Error())
return err
}
This should probably be loglevel 4 or more human-friendly text.
pkg/csidriver/driver.go
Outdated
err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())) | ||
if err != nil { | ||
klog.V(0).Infof("commonOSRemove: %s", err.Error()) | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())); err != nil {
klog.V(0).Infof("commonOSRemove: %s", err.Error())
return err
}
Loglevel 4 or more human-friendly text
Compile errors I am not seeing locally:
and
probably need to bump the golang version in the Dockerfile and/or openshift/release and/or what I am using locally |
hmm .... it might even depend on which version of go 1.17 you use ... I'm getting some hits on internet searches. This repo is already using 1.17 ... my laptop is at 1.17.6 I saw were Luke emailed aos-devel about 1.18.1 looking some more ... |
weirdly the unit test at least is using 1.16 which definitely has the issue: �[36mINFO�[0m[2022-04-14T16:22:34Z] Tagging openshift/release:golang-1.16 into pipeline:root. e2e's are 1.17 ... though I cannot tell with dot rel off of 1.17 �[36mINFO�[0m[2022-04-14T16:22:35Z] Tagging ocp/builder:rhel-8-golang-1.17-openshift-4.11 into pipeline:ocp_builder_rhel-8-golang-1.17-openshift-4.11. |
always forget the ci-operator yam file ... that at least fixed the unit test |
removed permissions needs a little work ; my simple test did not catch what the more complicated e2e caught will pick back up after the holiday golang version at least sorted out |
8503d02
to
ec47d2e
Compare
hmm ... slow passes for me locally /test e2e-aws-csi-driver-slow |
reviewed ... the test waits 10 min and the relist is 10 min ... could have hit edge case. May push tweak to wait time based on next set of results. |
/test e2e-aws-csi-driver-disruptive |
ok e2e's all green doing one more run with the slow e2e interval tweaked @coreydaley have the updates from your comments in a separate commit PTAL / thanks |
testArgs.SearchString = "invoker" | ||
framework.ExecPod(testArgs) | ||
|
||
framework.CreateShareRelatedRBAC(testArgs) | ||
t.Logf("%s: wait up to 10 minutes for examining pod %s since the controller does not currently watch all clusterroles and clusterrolebindings and reverse engineer which ones satisfied the SAR calls, so we wait for relist on shares", time.Now().String(), testArgs.Name) | ||
t.Logf("%s: wait up to 11 minutes for examining pod %s since the controller does not currently watch all clusterroles and clusterrolebindings and reverse engineer which ones satisfied the SAR calls, so we wait for relist on shares", time.Now().String(), testArgs.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need the testArgs.TestDuration
or will it roll over from the previous one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rolls over from previous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks.
/lgtm |
thanks @coreydaley that said, /hold have to squash your review commit |
b717afa
to
73f2ef6
Compare
/hold cancel ok @coreydaley can you re-lgtm? thanks |
/lgtm |
@gabemontero: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/label docs-approved no docs impact - internal implementation detail - also see relative lack on docs in upstream k8s around using inotify, etc. ... considered a "linux" thing |
/label px-approved internal implementation detail |
launching clusterbot cluster with this PR for final verification of entitled build usage |
forgot, cannot do entitled builds with clusterbot clusters had to verify manually via local build ... will do so again after this merges and we have a level with it /label qe-approved |
verified openshift builds with entitlements at level 4.11.0-0.ci-2022-04-18-211945 |
Taking a glance at the latest shared resource epics, I cannot find a story related to this (@coreydaley @adambkaplan do let me know if that is just an oversight on my part), but an item we have discussed for completing productization of shared resource is switching actual storage of the secret/configmap keys to using k8s atomic writer to provide "atomic" access to data via some creative linux symlinking so that if the contents change, users can use
inotify
orfanotify
to coordinate access.This is how a Pod's serviceaccount secrets have been stored since the early days of k8s, and this approach was created after some early bugs where intermittent errors would occur when read and writes of that secret data were done at the same time.
The upstream Secrets Store CSI Driver also adopted this approach somewhat recently.
As I mentioned last time we discussed this in team meetings, they actually create a copy of atomic_writer.go in their tree (even though they still already vendor in k8s/k8s). In looking at the history of https://github.com/kubernetes-sigs/secrets-store-csi-driver/commits/main/pkg/util/fileutil/atomic_writer.go their updates from their initial copy are only superficial.
So I eschewed their copy approach and simply vendor in k8s/k8s like we do for openshift/builder and use the k8s/k8s atomic_writier.
Lastly, I moved our simple config map example to a locally created config map vs. one in the ocp config managed namespace. It made it eaiser for manual testing around removing / adding keys and confirming that atomic_writer "handles it".
/assign @coreydaley
/assign @adambkaplan