use k8s atomic writer for storage of secrets / configmaps #93

Merged
merged 5 commits into from Apr 18, 2022

gabemontero
Contributor

Taking a glance at the latest shared resource epics, I cannot find a story related to this (@coreydaley @adambkaplan do let me know if that is just an oversight on my part). However, an item we have discussed for completing productization of shared resource is switching the actual storage of the secret/configmap keys to the k8s atomic writer. It provides "atomic" access to the data via some creative Linux symlinking, so that if the contents change, users can use inotify or fanotify to coordinate access.

This is how a Pod's serviceaccount secrets have been stored since the early days of k8s, and this approach was created after some early bugs where intermittent errors would occur when reads and writes of that secret data were done at the same time.

The upstream Secrets Store CSI Driver also adopted this approach somewhat recently.

As I mentioned last time we discussed this in team meetings, they actually create a copy of atomic_writer.go in their tree (even though they already vendor in k8s/k8s). Looking at the history of https://github.com/kubernetes-sigs/secrets-store-csi-driver/commits/main/pkg/util/fileutil/atomic_writer.go, their updates since the initial copy are only superficial.

So I eschewed their copy approach and simply vendored in k8s/k8s, like we do for openshift/builder, and use the k8s/k8s atomic_writer.

Lastly, I moved our simple config map example to a locally created config map vs. one in the ocp config managed namespace. It made it easier for manual testing around removing / adding keys and confirming that atomic_writer "handles it".

/assign @coreydaley
/assign @adambkaplan

@openshift-ci
Contributor

openshift-ci bot commented Apr 14, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabemontero

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 14, 2022

corev1 "k8s.io/api/core/v1"
kerrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
"k8s.io/klog/v2"
atomic "k8s.io/kubernetes/pkg/volume/util"
Member

I think that this should be something like "volumeutil", "kvolumeutil" or just "kutil" to be more descriptive of the package it is pulling in, rather than the feature of that package you are using.

// container image must be empty, or the directory does not exist, and is created for the Pod's container as
// part of provisioning the container.
if err := commonOSRemove(podPath, fmt.Sprintf("commonUpsertRanger key %s volid %s share id %s pod name %s", key, dv.GetVolID(), dv.GetSharedDataId(), dv.GetPodName())); err != nil {
// NOTE: atomic_writer handles any pruning of secret/configmap keys where were present before, but are no longer
Member

s/where/that/

Comment on lines 280 to 282
err = aw.Write(podFile)
if err != nil {
	return err
}
Member

if err = aw.Write(podFile); err != nil {
	return err
}

// has been mounted as a separate dir in our share, so skip
if info.IsDir() {
return nil
dirFile, err := os.Open(dir)
Member

Should there be a deferred close directory after this?

err = os.RemoveAll(fileName)
if err != nil {
	klog.V(0).Infof("commonOSRemove: %s", err.Error())
	return err
}
Member

if err = os.RemoveAll(fileName); err != nil {
	klog.V(0).Infof("commonOSRemove: %s", err.Error())
	return err
}

If this is going to be loglevel 0, it should probably be more human-friendly text, or maybe you meant it to be loglevel 4 like the others?

Comment on lines 830 to 834
linkName, err := os.Readlink(info.Name())
if err == nil {
	searchName = linkName
}
Member

if linkName, err := os.Readlink(info.Name()); err == nil {
    searchName = linkName
}

if err := os.MkdirAll(podPath, os.ModePerm); err != nil {
podFile := map[string]atomic.FileProjection{}
aw, err := atomic.NewAtomicWriter(podPath, "shared-resource-csi-driver")
if err != nil {
return err
}
if payload.ByteData != nil {
Member

Can a payload have both ByteData and StringData populated at the same time? Can both also be nil?

Contributor Author

yes and yes

see the ConfigMap godoc wrt both populated

both nil means an empty secret/configmap
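
To illustrate why both cases are harmless, here is a rough sketch of flattening the two maps into one projection map. `FileProjection` is a local stand-in for the type in `k8s.io/kubernetes/pkg/volume/util`, and the `payload` struct and `buildProjection` helper are hypothetical, not the driver's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// FileProjection is a local stand-in for the type of the same name in
// k8s.io/kubernetes/pkg/volume/util; only the fields used here are included.
type FileProjection struct {
	Data []byte
	Mode int32
}

// payload is a hypothetical shape for the cached secret/configmap data.
type payload struct {
	ByteData   map[string][]byte
	StringData map[string]string
}

// buildProjection flattens both maps into one projection map. Ranging over a
// nil map is a no-op in Go, so an empty secret/configmap yields an empty result.
func buildProjection(p payload, mode int32) map[string]FileProjection {
	out := map[string]FileProjection{}
	for k, v := range p.ByteData {
		out[k] = FileProjection{Data: v, Mode: mode}
	}
	for k, v := range p.StringData {
		out[k] = FileProjection{Data: []byte(v), Mode: mode}
	}
	return out
}

func main() {
	p := payload{
		ByteData:   map[string][]byte{"cert.bin": {0x01, 0x02}},
		StringData: map[string]string{"app.conf": "debug=true"},
	}
	proj := buildProjection(p, 0o644)
	keys := make([]string, 0, len(proj))
	for k := range proj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)
	fmt.Println(len(buildProjection(payload{}, 0o644)))
}
```

Passing an empty map to the writer is what lets it prune keys that existed before but are gone now, so the "both nil" case needs no special handling in a sketch like this.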

Comment on lines 306 to 309
err = syscall.Unlink(filepath.Join(dir, dirEntry.Name()))
if err != nil {
	klog.V(0).Infof("commonOSRemove: %s", err.Error())
	return err
}
Member

if err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())); err != nil {
    klog.V(0).Infof("commonOSRemove: %s", err.Error())
    return err
}

This should probably be loglevel 4 or more human-friendly text.

Comment on lines 317 to 318
err = syscall.Unlink(filepath.Join(dir, dirEntry.Name()))
if err != nil {
	klog.V(0).Infof("commonOSRemove: %s", err.Error())
	return err
}
Member

if err = syscall.Unlink(filepath.Join(dir, dirEntry.Name())); err != nil {
    klog.V(0).Infof("commonOSRemove: %s", err.Error())
    return err
}

Loglevel 4 or more human-friendly text

@gabemontero
Contributor Author

gabemontero commented Apr 14, 2022

Compile errors I am not seeing locally:

Running e2e suite normal
# sigs.k8s.io/json/internal/golang/encoding/json
vendor/sigs.k8s.io/json/internal/golang/encoding/json/encode.go:1249:12: sf.IsExported undefined (type reflect.StructField has no field or method IsExported)
vendor/sigs.k8s.io/json/internal/golang/encoding/json/encode.go:1255:18: sf.IsExported undefined (type reflect.StructField has no field or method IsExported)
FAIL	github.com/openshift/csi-driver-shared-resource/test/e2e [build failed]
FAIL

and

 INFO[2022-04-14T16:24:29Z] env GOOS=linux GOARCH=amd64 go test -mod=vendor -count 1 ./cmd/... ./pkg/...
# sigs.k8s.io/json/internal/golang/encoding/json
vendor/sigs.k8s.io/json/internal/golang/encoding/json/encode.go:1249:12: sf.IsExported undefined (type reflect.StructField has no field or method IsExported)
vendor/sigs.k8s.io/json/internal/golang/encoding/json/encode.go:1255:18: sf.IsExported undefined (type reflect.StructField has no field or method IsExported)
FAIL

probably need to bump the golang version in the Dockerfile and/or openshift/release and/or what I am using locally

@gabemontero
Contributor Author

hmm .... it might even depend on which version of go 1.17 you use ... I'm getting some hits on internet searches. This repo is already using 1.17 ... my laptop is at 1.17.6

I saw where Luke emailed aos-devel about 1.18.1

looking some more ...

@gabemontero
Contributor Author

weirdly the unit test at least is using 1.16 which definitely has the issue:
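
For context on the compile error above: `reflect.StructField.IsExported` was added in Go 1.17, so any code that calls it (as the vendored `sigs.k8s.io/json` does) fails to build with a 1.16 toolchain. A tiny sketch that exercises the same method; the `sample` type and `exportedFields` helper are made up for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

type sample struct {
	Public  string
	private int
}

// exportedFields lists the exported field names of a struct value using
// reflect.StructField.IsExported, which requires Go 1.17 or newer.
func exportedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if sf := t.Field(i); sf.IsExported() {
			names = append(names, sf.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(exportedFields(sample{}))
}
```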

INFO[2022-04-14T16:22:34Z] Tagging openshift/release:golang-1.16 into pipeline:root.

e2e's are 1.17 ... though I cannot tell which dot release of 1.17

INFO[2022-04-14T16:22:35Z] Tagging ocp/builder:rhel-8-golang-1.17-openshift-4.11 into pipeline:ocp_builder_rhel-8-golang-1.17-openshift-4.11.

@gabemontero
Contributor Author

always forget the ci-operator yaml file ... that at least fixed the unit test

@gabemontero
Contributor Author

removed permissions needs a little work ; my simple test did not catch what the more complicated e2e caught

will pick back up after the holiday

golang version at least sorted out

@gabemontero
Contributor Author

hmm ... slow passes for me locally

/test e2e-aws-csi-driver-slow

@gabemontero
Contributor Author

hmm ... slow passes for me locally

/test e2e-aws-csi-driver-slow

reviewed ... the test waits 10 min and the relist is 10 min ... could have hit edge case. May push tweak to wait time based on next set of results.

@gabemontero
Contributor Author

/test e2e-aws-csi-driver-disruptive

@gabemontero
Contributor Author

ok e2e's all green

doing one more run with the slow e2e interval tweaked

@coreydaley have the updates from your comments in a separate commit

PTAL / thanks

testArgs.SearchString = "invoker"
framework.ExecPod(testArgs)

framework.CreateShareRelatedRBAC(testArgs)
t.Logf("%s: wait up to 10 minutes for examining pod %s since the controller does not currently watch all clusterroles and clusterrolebindings and reverse engineer which ones satisfied the SAR calls, so we wait for relist on shares", time.Now().String(), testArgs.Name)
t.Logf("%s: wait up to 11 minutes for examining pod %s since the controller does not currently watch all clusterroles and clusterrolebindings and reverse engineer which ones satisfied the SAR calls, so we wait for relist on shares", time.Now().String(), testArgs.Name)
Member

Does this need the testArgs.TestDuration or will it roll over from the previous one?

Contributor Author

rolls over from previous

Member

Ok, thanks.

@coreydaley
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2022
@gabemontero
Contributor Author

/lgtm

thanks @coreydaley

that said,

/hold

have to squash your review commit

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2022
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2022
@gabemontero
Contributor Author

/hold cancel

ok @coreydaley can you re-lgtm? thanks

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2022
@coreydaley
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2022
@openshift-ci
Contributor

openshift-ci bot commented Apr 18, 2022

@gabemontero: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@gabemontero
Contributor Author

/label docs-approved

no docs impact - internal implementation detail - also see the relative lack of docs in upstream k8s around using inotify, etc. ... considered a "linux" thing

@openshift-ci openshift-ci bot added the docs-approved Signifies that Docs has signed off on this PR label Apr 18, 2022
@gabemontero
Contributor Author

/label px-approved

internal implementation detail

@openshift-ci openshift-ci bot added the px-approved Signifies that Product Support has signed off on this PR label Apr 18, 2022
@gabemontero
Contributor Author

launching clusterbot cluster with this PR for final verification of entitled build usage

@gabemontero
Contributor Author

forgot, cannot do entitled builds with clusterbot clusters

had to verify manually via local build ... will do so again after this merges and we have a level with it

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 18, 2022
@openshift-merge-robot openshift-merge-robot merged commit 6e3b1e0 into openshift:master Apr 18, 2022
@gabemontero gabemontero deleted the atomic-writer branch April 18, 2022 20:05
@gabemontero
Contributor Author

verified openshift builds with entitlements at level 4.11.0-0.ci-2022-04-18-211945

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
4 participants