New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1888958: Store secrets in one place and utilize mutex #765
Conversation
|
@jcantrill: This pull request references Bugzilla bug 1888958, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/hold |
pkg/k8shandler/certificates.go
Outdated
| return | ||
| } | ||
|
|
||
| scriptsDir := utils.GetScriptsDir() | ||
| if err = GenerateCertificates(clusterRequest.Cluster.Namespace, scriptsDir, "elasticsearch", utils.DefaultWorkingDir); err != nil { | ||
| if err = GenerateCertificates(clusterRequest.Cluster.Namespace, scriptsDir, "elasticsearch", utils.GetWorkingDir()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the script could indicate by its return value that it didn't change anything, we could skip writing the secret and cut down on API server traffic quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated as we discussed
| func (clusterRequest *ClusterLoggingRequest) extractSecretToFile(secretName string, key string, toFile string) (err error) { | ||
| secret, err := clusterRequest.GetSecret(secretName) | ||
| func (clusterRequest *ClusterLoggingRequest) extractMasterCerts() (err error) { | ||
| secret, err := clusterRequest.GetSecret(constants.MasterCASecretName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the new code runs for the first time after an upgrade, it'll dump files "masterca" and "masterkey" on disk and keep them circulating. Just might account for that and skip them altogether.
pkg/k8shandler/certificates.go
Outdated
| // Pull master signing cert out from secret in logging.Spec.SecretName | ||
| if err = clusterRequest.readSecrets(); err != nil { | ||
| mutex.Lock() | ||
| defer mutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for 0-length files while populating per-component secrets was one of the bullet points, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe the changes to the script resolve this
|
@jcantrill Added error checking to reading of ES certs: |
pkg/k8shandler/certificates.go
Outdated
| } | ||
| results := map[string][]byte{} | ||
| for _, f := range files { | ||
| results[f.Name()] = utils.GetFileContents(path.Join(workDir, f.Name())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a check for GetFileContents() failing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The called function logs an error and returns nil. Adding check for nil and log message. Skipping adding the file if the content is nil
| map[string][]byte{ | ||
| var secrets = map[string][]byte{} | ||
| Syncronize(func() error { | ||
| secrets = map[string][]byte{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not checking for possible errors from GetWorkingDirFileContents()
I took a stab at adding that in https://github.com/syedriko/cluster-logging-operator/blob/bz_1888958/pkg/k8shandler/logstore.go#L155
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK not to check missing here. We need a bunch of secrets and we should fail and reconcile again. The cert gen code should regen if something is missing. If it does not that is a bug in the regen code IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we don't check we don't fail and stick empty buffers into secrets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the action we can take in this situation? What do we believe will change during a subsequent reconciliation that would resolve the issue? The best we can do is really generate a log message and/or update status. This particular file is probably moot as we want to remove curation from CLO but if we are to do anything then it should be applied to the other places where we take similar action
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can abort, refuse to update the secret and to potentially make our situation worse. GetWorkingDirFileContents() is not very nuanced in how it reports errors, but there can be a genuine I/O failure that will make GetWorkingDirFileContents() return nil while there's a good cert in the secret as well as on disk. By proceeding we make things worse by wiping out a good cert from the secret. We will recover in time, but this is a disruption we can avoid.
|
/test unit |
c2cbd29
to
1ac04cd
Compare
507e12a
to
f4f34e1
Compare
|
/test unit |
|
/refresh |
|
@jcantrill: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
de17373
to
1420be7
Compare
|
/hold cancel |
scripts/cert_generation.sh
Outdated
|
|
||
| if [ ! -s "${WORKING_DIR}/kibana-session-secret" ] ; then | ||
| info "Generating kibana session secret" | ||
| result=$(python -c "import string,random;print(''.join(random.choice(string.ascii_letters + string.digits) for i in range(32)))") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This achieves almost the same end, but with a shorter alphabet of letters, without the Python dependency
[root@e33867c18033 cluster-logging-operator]# dd if=/dev/urandom count=1 ibs=16 status=none | hexdump -e '"%02X"'
4306BCE91BC0C99B7AB482ECC3E73047
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to remove python dep
|
/test e2e-operator |
|
/retest |
1 similar comment
|
/retest |
5127e56
to
6214207
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jcantrill, syedriko The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jcantrill: All pull requests linked via external trackers have merged: Bugzilla bug 1888958 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherrypick release-4.6 |
|
@jcantrill: #765 failed to apply on top of branch "release-4.6": In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Bug 1888958: Store secrets in one place and utilize mutex
Description
This PR updates the certificate generation code to:
/assign @syedriko @alanconway
Links