Bug 1888958: Store secrets in one place and utilize mutex #765

Merged
merged 1 commit into from Nov 7, 2020

Conversation

jcantrill
Collaborator

Description

This PR updates the certificate generation code to:

  • Add an info message explaining why certs are being (re)generated
  • Add a mutex to the cert code so that changes are made in a single-threaded fashion
  • Move stashing of generated certs to the "master-certs" secret
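
The mutex bullet above can be illustrated with a minimal sketch. It assumes a package-level `sync.Mutex` and a small wrapper function; the names `certMutex` and `synchronize` are hypothetical and are not taken from the PR, which only states that a mutex serializes changes made by the cert code.

```go
package main

import (
	"fmt"
	"sync"
)

// certMutex guards the shared working directory that holds generated certs.
// The name is hypothetical; the PR only states that a mutex is used.
var certMutex sync.Mutex

// synchronize runs action while holding certMutex, so that at most one
// goroutine (re)generates or reads certificates at a time.
func synchronize(action func() error) error {
	certMutex.Lock()
	defer certMutex.Unlock()
	return action()
}

func main() {
	var wg sync.WaitGroup
	generated := 0 // stands in for files written by the cert script
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = synchronize(func() error {
				generated++ // safe: only one goroutine is in here at a time
				return nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("generated:", generated) // prints "generated: 10"
}
```

Wrapping the critical section in a function that takes a closure keeps the lock and unlock paired in one place, so callers cannot forget to release the mutex on an early return.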

/assign @syedriko @alanconway

Links

@openshift-ci-robot

@jcantrill: This pull request references Bugzilla bug 1888958, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1888958: Store secrets in one place and utilize mutex

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Oct 29, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2020
@jcantrill
Collaborator Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 29, 2020
return
}

scriptsDir := utils.GetScriptsDir()
- if err = GenerateCertificates(clusterRequest.Cluster.Namespace, scriptsDir, "elasticsearch", utils.DefaultWorkingDir); err != nil {
+ if err = GenerateCertificates(clusterRequest.Cluster.Namespace, scriptsDir, "elasticsearch", utils.GetWorkingDir()); err != nil {
Contributor

If the script could indicate by its return value that it didn't change anything, we could skip writing the secret and cut down on API server traffic quite a bit.
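
The suggestion above could look roughly like the following sketch. It assumes the generation step can report a `changed` flag; that flag, the `generateFunc` type, and `reconcileCerts` are all hypothetical names used for illustration, since the discussion only proposes that the script signal whether it modified anything.

```go
package main

import "fmt"

// generateFunc stands in for the call that runs the cert-generation script.
// Returning "changed" is an assumption; the review only suggests that the
// script could signal whether it modified anything.
type generateFunc func() (changed bool, err error)

// reconcileCerts writes the secret only when generation changed something,
// and reports whether a write happened.
func reconcileCerts(generate generateFunc, writeSecret func() error) (bool, error) {
	changed, err := generate()
	if err != nil {
		return false, fmt.Errorf("generating certificates: %w", err)
	}
	if !changed {
		return false, nil // nothing new on disk: skip the API call
	}
	if err := writeSecret(); err != nil {
		return false, fmt.Errorf("writing secret: %w", err)
	}
	return true, nil
}

func main() {
	writes := 0
	write := func() error { writes++; return nil }

	unchanged := func() (bool, error) { return false, nil }
	regenerated := func() (bool, error) { return true, nil }

	_, _ = reconcileCerts(unchanged, write)
	_, _ = reconcileCerts(regenerated, write)
	fmt.Println("secret writes:", writes) // prints "secret writes: 1"
}
```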

Collaborator Author

Updated as we discussed.

- func (clusterRequest *ClusterLoggingRequest) extractSecretToFile(secretName string, key string, toFile string) (err error) {
- secret, err := clusterRequest.GetSecret(secretName)
+ func (clusterRequest *ClusterLoggingRequest) extractMasterCerts() (err error) {
+ secret, err := clusterRequest.GetSecret(constants.MasterCASecretName)
Contributor

When the new code runs for the first time after an upgrade, it'll dump files "masterca" and "masterkey" on disk and keep them circulating. We might want to account for that and skip them altogether.
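
One way to skip those leftovers is sketched below. The two legacy key names come from the comment above; the helper name `filesToExtract` and the sample keys `ca.crt` and `ca.key` are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// legacyKeys lists the keys named in the review comment. Treating them as
// leftovers to skip is the reviewer's suggestion, not confirmed behavior.
var legacyKeys = map[string]bool{"masterca": true, "masterkey": true}

// filesToExtract returns the secret keys that should be written to the
// working directory, in sorted order, omitting legacy keys.
func filesToExtract(secretData map[string][]byte) []string {
	names := make([]string, 0, len(secretData))
	for key := range secretData {
		if legacyKeys[key] {
			continue
		}
		names = append(names, key)
	}
	sort.Strings(names)
	return names
}

func main() {
	data := map[string][]byte{
		"masterca":  []byte("old"),
		"masterkey": []byte("old"),
		"ca.crt":    []byte("new"), // hypothetical key name
		"ca.key":    []byte("new"), // hypothetical key name
	}
	fmt.Println(filesToExtract(data)) // prints "[ca.crt ca.key]"
}
```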

// Pull master signing cert out from secret in logging.Spec.SecretName
if err = clusterRequest.readSecrets(); err != nil {
mutex.Lock()
defer mutex.Unlock()
Contributor

Checking for 0-length files while populating per-component secrets was one of the bullet points, too.

Collaborator Author

I believe the changes to the script resolve this.

@syedriko
Contributor

}
results := map[string][]byte{}
for _, f := range files {
results[f.Name()] = utils.GetFileContents(path.Join(workDir, f.Name()))
Contributor

We could use a check for GetFileContents() failing here.

Collaborator Author

The called function logs an error and returns nil. I am adding a nil check with a log message, and skipping the file if its content is nil.
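
The nil check described above might be sketched as follows. The `read` parameter is a stand-in for a file reader that returns nil on failure, as the comment says the called function does; `collectFiles` and the sample file names are hypothetical.

```go
package main

import (
	"fmt"
	"log"
)

// collectFiles builds a map of file name to content. The read function
// mirrors the behavior described above: it returns nil on failure.
// Files whose content is nil are logged and left out of the result.
func collectFiles(names []string, read func(name string) []byte) map[string][]byte {
	results := map[string][]byte{}
	for _, name := range names {
		content := read(name)
		if content == nil {
			log.Printf("skipping %q: unable to read file contents", name)
			continue
		}
		results[name] = content
	}
	return results
}

func main() {
	onDisk := map[string][]byte{"ca.crt": []byte("cert-bytes")}
	read := func(name string) []byte { return onDisk[name] } // nil when absent

	results := collectFiles([]string{"ca.crt", "missing.key"}, read)
	fmt.Println(len(results)) // prints "1"
}
```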

map[string][]byte{
var secrets = map[string][]byte{}
Syncronize(func() error {
secrets = map[string][]byte{
Contributor

@syedriko syedriko Oct 30, 2020

We're not checking for possible errors from GetWorkingDirFileContents().
I took a stab at adding that in https://github.com/syedriko/cluster-logging-operator/blob/bz_1888958/pkg/k8shandler/logstore.go#L155

Collaborator Author

I think it is OK not to check for missing files here. We need a bunch of secrets, and we should fail and reconcile again. The cert gen code should regenerate if something is missing. If it does not, that is a bug in the regen code, IMO.

Contributor

But if we don't check, we don't fail, and we stick empty buffers into secrets.

Collaborator Author

What action can we take in this situation? What do we believe will change during a subsequent reconciliation that would resolve the issue? The best we can do is generate a log message and/or update status. This particular file is probably moot, as we want to remove curation from CLO, but if we are to do anything, then it should be applied to the other places where we take similar action.

Contributor

@syedriko syedriko Nov 3, 2020

We can abort and refuse to update the secret, rather than potentially make our situation worse. GetWorkingDirFileContents() is not very nuanced in how it reports errors, but there can be a genuine I/O failure that makes GetWorkingDirFileContents() return nil while there's a good cert in the secret as well as on disk. By proceeding, we make things worse by wiping out a good cert from the secret. We will recover in time, but this is a disruption we can avoid.
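
The abort-instead-of-overwrite approach argued for above can be sketched like this. It is an illustration under assumptions, not the code that was merged: `buildSecretData` and the `tls.crt` key are hypothetical, and `read` stands in for a reader that returns nil on failure.

```go
package main

import "fmt"

// buildSecretData reads every required file and fails as a whole if any of
// them is missing or empty, so the caller never writes a partial secret.
func buildSecretData(required []string, read func(name string) []byte) (map[string][]byte, error) {
	data := make(map[string][]byte, len(required))
	for _, name := range required {
		content := read(name)
		if len(content) == 0 {
			return nil, fmt.Errorf("refusing to update secret: no content for %q", name)
		}
		data[name] = content
	}
	return data, nil
}

func main() {
	// existing stands in for the secret currently stored in the cluster.
	existing := map[string][]byte{"tls.crt": []byte("good-cert")}
	read := func(string) []byte { return nil } // simulate an I/O failure

	data, err := buildSecretData([]string{"tls.crt"}, read)
	if err != nil {
		fmt.Println("keeping existing secret:", err)
	} else {
		existing = data
	}
	fmt.Println(string(existing["tls.crt"])) // the good cert is still in place
}
```

Returning an error for the whole batch lets the reconcile loop retry later, while the previously stored certificate stays untouched in the meantime.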

@jcantrill
Collaborator Author

/test unit

@jcantrill
Collaborator Author

/test unit

@jcantrill
Collaborator Author

/refresh

@openshift-ci-robot

@jcantrill: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/prow/unit | f4f34e14294d97a58a0d94b02a38e47755c85d46 | link | /test unit

Full PR test history. Your PR dashboard.


@jcantrill jcantrill force-pushed the 1888958 branch 2 times, most recently from de17373 to 1420be7 Compare November 3, 2020 21:59
@jcantrill
Collaborator Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 3, 2020

if [ ! -s "${WORKING_DIR}/kibana-session-secret" ] ; then
info "Generating kibana session secret"
result=$(python -c "import string,random;print(''.join(random.choice(string.ascii_letters + string.digits) for i in range(32)))")
Contributor

@syedriko syedriko Nov 3, 2020

This achieves almost the same end, but with a shorter alphabet (hex digits only) and without the Python dependency:

[root@e33867c18033 cluster-logging-operator]# dd if=/dev/urandom count=1 ibs=16 status=none | hexdump -e '"%02X"'
4306BCE91BC0C99B7AB482ECC3E73047
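
For comparison, the same shape of value (16 random bytes rendered as 32 uppercase hex characters) can be produced in Go with the standard library. This is only a sketch of the idea; the PR keeps this step in the shell script, and `newSessionSecret` is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newSessionSecret returns 32 uppercase hex characters derived from 16
// random bytes, matching the shape of the dd/hexdump output above.
func newSessionSecret() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return strings.ToUpper(hex.EncodeToString(buf)), nil
}

func main() {
	secret, err := newSessionSecret()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(secret)) // prints "32"
}
```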

Collaborator Author

Updated to remove the Python dependency.

@syedriko
Contributor

syedriko commented Nov 4, 2020

/test e2e-operator

@jcantrill
Collaborator Author

/retest

@jcantrill
Collaborator Author

/retest

@jcantrill jcantrill force-pushed the 1888958 branch 3 times, most recently from 5127e56 to 6214207 Compare November 5, 2020 18:42
@syedriko
Contributor

syedriko commented Nov 7, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, syedriko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 3af50bd into openshift:master Nov 7, 2020
@openshift-ci-robot

@jcantrill: All pull requests linked via external trackers have merged:

Bugzilla bug 1888958 has been moved to the MODIFIED state.

In response to this:

Bug 1888958: Store secrets in one place and utilize mutex


@jcantrill jcantrill deleted the 1888958 branch November 7, 2020 13:51
@jcantrill
Collaborator Author

/cherrypick release-4.6

@openshift-cherrypick-robot

@jcantrill: #765 failed to apply on top of branch "release-4.6":

Applying: Bug 1888958: Store secrets in one place and utilize mutex
Using index info to reconstruct a base tree...
M	Dockerfile
M	Makefile
A	pkg/certificates/certificates.go
M	pkg/k8shandler/certificates.go
M	pkg/k8shandler/curation.go
M	pkg/k8shandler/fluentd.go
M	pkg/k8shandler/logstore.go
M	pkg/k8shandler/visualization.go
M	pkg/utils/utils.go
A	test/functional/framework.go
M	test/helpers/framework.go
Falling back to patching base and 3-way merge...
Auto-merging test/helpers/framework.go
CONFLICT (content): Merge conflict in test/helpers/framework.go
CONFLICT (modify/delete): test/functional/framework.go deleted in HEAD and modified in Bug 1888958: Store secrets in one place and utilize mutex. Version Bug 1888958: Store secrets in one place and utilize mutex of test/functional/framework.go left in tree.
Auto-merging pkg/utils/utils.go
Auto-merging pkg/k8shandler/visualization.go
Auto-merging pkg/k8shandler/logstore.go
Auto-merging pkg/k8shandler/fluentd.go
Auto-merging pkg/k8shandler/curation.go
Auto-merging pkg/k8shandler/certificates.go
CONFLICT (content): Merge conflict in pkg/k8shandler/certificates.go
CONFLICT (modify/delete): pkg/certificates/certificates.go deleted in HEAD and modified in Bug 1888958: Store secrets in one place and utilize mutex. Version Bug 1888958: Store secrets in one place and utilize mutex of pkg/certificates/certificates.go left in tree.
Auto-merging Makefile
CONFLICT (content): Merge conflict in Makefile
Auto-merging Dockerfile
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Bug 1888958: Store secrets in one place and utilize mutex
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-4.6


pmoogi-redhat pushed a commit to pmoogi-redhat/cluster-logging-operator that referenced this pull request Apr 26, 2022
Bug 1888958: Store secrets in one place and utilize mutex
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. release/4.7

6 participants