New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1832923: Forget bootstrap etcd member IP after bootstrap #367
Bug 1832923: Forget bootstrap etcd member IP after bootstrap #367
Conversation
/test e2e-aws-upgrade |
@ironcladlou: The specified target(s) for
Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Latest failures are I think unrelated, and I have a successful 4.4 to 4.5 upgrade job. I want to do a little refactoring and add test coverage for this controller. |
@ironcladlou: This pull request references Bugzilla bug 1832923, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
23cbbc0
to
1a23919
Compare
@hexfusion @retroflexer @deads2k refactored, added a bunch of tests, re-running the Y upgrade test. Tomorrow I might go ahead and continue expanding the unit test while I'm in here to cover node=>configmap translation scenarios. |
/retest |
There's a problem here. In a real cluster the annotation is never cleaned up because the revision checks falsely detect a rollout because nodestates.targetrevision is empty. So, the unit tests are inaccurate. I need clarification on the semantics around these fields: https://github.com/openshift/api/blob/master/operator/v1/types.go#L208
|
Now I'm relying on latestAvailableRevision instead of nodestatus.targetRevision. |
97189b9
to
10c3202
Compare
Unrelated flakes; annotation was removed and etcd related things are reporting available. |
AWS failures look unrelated:
Dug into the artifacts and nothing seems amiss with etcd or the apiserver. /retest |
@hexfusion @retroflexer this is ready for review |
Successful 4.4 -> 4.5 run w/ the latest commit: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/12426 |
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, a few questions.
configmapClient: kubeClient.CoreV1(), | ||
} | ||
return factory.New().ResyncEvery(time.Minute).WithInformers( | ||
operatorClient.Informer(), | ||
configmapsInformer.Informer(), | ||
kubeInformers.InformersFor(operatorclient.TargetNamespace).Core().V1().ConfigMaps().Informer(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what is the difference between the former and the latter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functionally equivalent, both cause sync to trigger on configmap change in the openshift-etcd namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so why change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because configmapsInformer was only used in one place (so what's the point of a local to begin with), and also had a misleading name (it isn't an informer for all configmaps, it's only for the openshift-etcd namespace).
pkg/operator/etcdendpointscontroller/etcdendpointscontroller.go
Outdated
Show resolved
Hide resolved
After bootstrap completes, forget the bootstrap etcd member IP by clearing the annotation from the endpoint configmap. Otherwise the etcd client will continue failing to connect to the dead bootstrap node forever.
10c3202
to
c314c0b
Compare
/test images |
/retest |
/test all |
/refresh |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hexfusion, ironcladlou The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
// check to see if bootstrapping is complete | ||
bootstrapFinishedConfigMap, err := configMapClient.ConfigMaps("kube-system").Get("bootstrap") | ||
if err != nil { | ||
if errors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: prefer
if errors.IsNotFound(){
return
}
if err != nil{
different return
}
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@ironcladlou: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@ironcladlou: All pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#367. Bugzilla bug 1832923 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
After bootstrap completes, forget the bootstrap etcd member IP by clearing
the annotation from the endpoint configmap. Otherwise the etcd client will
continue failing to connect to the dead bootstrap node forever.