
[WIP] - Add fix-certs cmd #444

Closed
wants to merge 2 commits into from

tnozicka
Contributor

@tnozicka tnozicka commented Apr 29, 2019

@openshift-ci-robot added the do-not-merge/work-in-progress label (a PR should not merge because it is a work in progress) Apr 29, 2019
@openshift-ci-robot added the needs-rebase (a PR cannot be merged because it has merge conflicts with HEAD) and size/XXL (a PR that changes 1000+ lines, ignoring generated files) labels Apr 29, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tnozicka

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (a PR has been approved by an approver from all required OWNERS files) Apr 29, 2019
@tnozicka changed the title from "[WIP] - Add rotate certs cmd" to "[WIP] - Add fix certs cmd" Apr 29, 2019
@tnozicka changed the title from "[WIP] - Add fix certs cmd" to "[WIP] - Add fix-certs cmd" Apr 29, 2019
@openshift-ci-robot removed the needs-rebase label (a PR cannot be merged because it has merge conflicts with HEAD) Apr 30, 2019
@tnozicka force-pushed the add-rotate-certs-cmd branch 2 times, most recently from 272ef65 to 40fc9d5, April 30, 2019 08:31
@tnozicka
Contributor Author

recovery-apiserver got its own PR: #448

@sttts
Contributor

sttts commented May 6, 2019

The commit message of the second commit is bad.

}

cmd := &cobra.Command{
Use: "fix-certs",
Contributor

description is missing. What is this command doing?

KubeApiserverImage string
PodManifestDir string
StaticPodResourcesDir string
Timeout time.Duration
Contributor

Timeout of what?

cmd.Flags().StringVar(&o.KubeApiserverImage, "kube-apiserver-image", o.KubeApiserverImage, "")
cmd.Flags().StringVar(&o.StaticPodResourcesDir, "static-pod-resources", o.StaticPodResourcesDir, "path to store a directory containing static pod resources for recovery apiserver manifest")
cmd.Flags().StringVar(&o.PodManifestDir, "pod-manifest-dir", o.PodManifestDir, "directory for the static pod manifest")
cmd.Flags().DurationVar(&o.Timeout, "timeout", o.Timeout, "timeout, 0 means infinite")
Contributor

Timeout of what?

@tnozicka
Contributor Author

tnozicka commented May 6, 2019

yes, this one is in progress still

PodManifestDir: o.PodManifestDir,
StaticPodResourcesDir: o.StaticPodResourcesDir,
}
defer apiserver.Destroy()
Contributor

shouldn't this be after creation?
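
The ordering concern can be illustrated with a small sketch: if the cleanup is deferred before Create() has succeeded, Destroy() also runs on the failure path. The resource type below is a hypothetical stand-in for recovery.Apiserver; only the Create/Destroy method names come from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for recovery.Apiserver.
type resource struct {
	failCreate bool
	destroyed  bool
}

func (r *resource) Create() error {
	if r.failCreate {
		return errors.New("create failed")
	}
	return nil
}

func (r *resource) Destroy() { r.destroyed = true }

// run registers the cleanup only after Create has succeeded.
func run(r *resource) error {
	if err := r.Create(); err != nil {
		return fmt.Errorf("failed to create: %v", err)
	}
	defer r.Destroy()
	// ... work that needs the resource ...
	return nil
}

func main() {
	ok := &resource{}
	err := run(ok)
	fmt.Println(err, ok.destroyed)

	bad := &resource{failCreate: true}
	err = run(bad)
	fmt.Println(err != nil, bad.destroyed)
}
```

Whether Destroy() should still run after a partially failed Create() depends on what Create() leaves behind, which this thread does not show.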

return fmt.Errorf("failed to create config client: %v", err)
}

operatorConfigInformers := operatorexternalinformers.NewSharedInformerFactory(operatorConfigClient, 10*time.Minute)
Contributor

missing the start of both shared informer factories.

APIVersion: "v1",
Kind: "namespace",
Name: "kube-system",
}) // fake
Contributor

what is fake?


certRotationWg := sync.WaitGroup{}
go func() {
certRotationWg.Add(1)
Contributor

Call Add outside of the goroutine. This can race.

@tnozicka
Contributor Author

tnozicka commented May 6, 2019

@sttts this is still WIP, I'll let you know when it is ready for review. I have not even gone through it myself yet :)

return fmt.Errorf("failed to find kube-apiserver certs dir: %v", err)
}

kubeControllerManagerManifest := path.Join(o.PodManifestDir, KubeControllerManagerStaticPodFileName)
Contributor

filepath

return fmt.Errorf("unknown object type %q", def.objectType)
}

dir := path.Join(def.toplevelDir, def.objectType, def.name)
Contributor

filepath


for name, bytes := range data {
filePath := path.Join(dir, name)
err := recovery.EnsureFileContent(filePath, bytes)
Contributor

does this use 0600 for secrets?

Contributor Author

keeps the perm of what was there

Contributor Author

and fallback is 600

@@ -11,3 +11,6 @@ COPY --from=builder /go/src/github.com/openshift/cluster-kube-apiserver-operator
COPY manifests/*.yaml /manifests
COPY manifests/image-references /manifests
LABEL io.openshift.release.operator true
# FIXME: entrypoint shouldn't be bash but the binary (needs fixing the chain)
Contributor

remove

Contributor Author

@mfojtik I'd expect a reason

@@ -11,3 +11,5 @@ COPY --from=builder /go/src/github.com/openshift/cluster-kube-apiserver-operator
COPY manifests/*.yaml /manifests
COPY manifests/image-references /manifests
LABEL io.openshift.release.operator true
# FIXME: entrypoint shouldn't be bash but the binary (needs fixing the chain)
Contributor

remove

func NewCreateCommand() *cobra.Command {
o := &CreateOptions{
Options: NewDefaultOptions(),
KubeApiserverImage: "", // TODO: set the public pullspec
Contributor

remove, error out in Validate() ?
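
A sketch of what erroring out in Validate() might look like, instead of carrying an empty default with a TODO. The createOptions type is a reduced, hypothetical copy of CreateOptions with only the field under discussion; the error wording is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// createOptions is a hypothetical, reduced copy of CreateOptions.
type createOptions struct {
	KubeApiserverImage string
}

// Validate rejects an unset image rather than silently using an empty default.
func (o *createOptions) Validate() error {
	if o.KubeApiserverImage == "" {
		return errors.New("--kube-apiserver-image must be set")
	}
	return nil
}

func main() {
	fmt.Println((&createOptions{}).Validate())
	fmt.Println((&createOptions{KubeApiserverImage: "example.invalid/image:tag"}).Validate())
}
```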

}

func (o *CreateOptions) Run() error {
ctx, cancel := watch.ContextWithOptionalTimeout(context.Background(), o.Timeout)
Contributor

context.TODO()

Contributor

in future we will pass the context via Run() that context will be wired to TERM signal handler

func (o *CreateOptions) Run() error {
ctx, cancel := watch.ContextWithOptionalTimeout(context.Background(), o.Timeout)
defer cancel()
// TODO: hook up signals
Contributor

			termHandler := server.SetupSignalHandler()
			ctx, shutdown := context.WithCancel(context.TODO())
			go func() {
				defer shutdown()
				<-termHandler
			}()

return nil
}

fmt.Printf("Waiting for recovery apiserver to come up.")
Contributor

use klog or at least add new line ;-)

return nil, err
}

return &clientcmdapiv1.Config{
Contributor

rest.CopyConfig

Contributor Author

from where? you are also commenting on the wrong PR, this one is just picking up #448

}, nil
}

func (s *Apiserver) GetKubeClientset() (*kubernetes.Clientset, error) {
Contributor

why do you need a clientset?

}

// TODO: add/prettify descriptions
cmd.Flags().StringVar(&o.KubeApiserverImage, "kube-apiserver-image", o.KubeApiserverImage, "")
Contributor

same as other pull, I don't want to choose images or directories.

defer cancel()
// TODO: hook up signals

apiserver := &recovery.Apiserver{
Contributor

this looks weird.

StaticPodResourcesDir: o.StaticPodResourcesDir,
}

err := apiserver.Create()
Contributor

We should not do this. Fixing the certs should use the admin.kubeconfig from its known location. It should not try to start a kube-apiserver.


operatorConfigInformers := operatorexternalinformers.NewSharedInformerFactory(operatorConfigClient, 10*time.Minute)
certRotationWg.Add(1)
go func() {
Contributor

This is an odd structure. Start is non-blocking. I don't see a reason to wrap it in a gofunc

certRotationWg.Add(1)
go func() {
defer certRotationWg.Done()
kubeApiserverInformersForNamespaces.Start(certRotationCtx.Done())
Contributor

same comment about gofunc

eventRecorder := events.NewKubeRecorder(kubeClient.CoreV1().Events(""), "fix-certs (CLI)", &corev1.ObjectReference{
APIVersion: "v1",
Kind: "namespace",
Name: "kube-system",
Contributor

use the openshift-kube-apiserver-operator namespace and set the namespace value or we end up in default and unfindable.

klog.Info("Waiting for certs to be refreshed...")
// FIXME: wait for valid certs
// time.Sleep(5*time.Minute)
time.Sleep(30 * time.Second)
Contributor

This won't do. Add a RunOnce at each level so we can be sure we run and avoid the go func/defer, wait, hope flow.

You'll also need a WaitForPrereqs func to sync the bits at least once.

}

timestamp := time.Now().Format(time.RFC3339)
kubeControllerManagerPod.Annotations["force-triggered-by-fix-certs-at"] = timestamp
Contributor

I expected to see the kube-apiserver pod here too. Let's do them all, just to be sure.

@openshift-ci-robot

@tnozicka: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-operator | faea844 | link | /test e2e-aws-operator |
| ci/prow/e2e-aws | faea844 | link | /test e2e-aws |
| ci/prow/verify | faea844 | link | /test verify |
| ci/prow/unit | faea844 | link | /test unit |
| ci/prow/images | faea844 | link | /test images |
| ci/prow/e2e-aws-upgrade | faea844 | link | /test e2e-aws-upgrade |


@deads2k
Contributor

deads2k commented May 6, 2019

merged in #461

@deads2k deads2k closed this May 6, 2019