Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery #859
It looks plausible to me, but ask @sttts, @p0lyn0mial, or @sanchezl.
Now that I'm looking at it again, I'd rather keep them orthogonal if possible, thinking...
Latest commit tries to keep the observers orthogonal by making the
/retest
/test e2e-aws-operator
@ironcladlou: This pull request references Bugzilla bug 1836927, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
/retest
Review thread on pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (outdated, resolved)
/retest
/test e2e-aws
/test e2e-aws-operator
/test e2e-aws-upgrade
Review thread on pkg/operator/configobservation/etcdendpoints/observe_etcd_endpoints.go (outdated, resolved)
Force-pushed from 206d702 to c1f4eb3.
Kube service design asserts `endpoint` resources cannot exist without a corresponding `service` resource, and Kube will actively delete the endpoint when the service is deleted or if Kube detects the endpoint is a "stray".

The operator needs to:

1. Manage etcd endpoint state atomically.
2. Maintain exclusive ownership of the etcd endpoint state resource.

Altogether this makes the `endpoint` resource inappropriate for the task. The competition between the operator and the Kube endpoints controller to manage the endpoint has led to instability.

To resolve the problems, persist etcd endpoint state in a `configmap`. Maintain compatibility by continuing to write the `endpoint`, and update consuming components to prefer the `configmap` over the `endpoint`.

Also requires:
openshift/cluster-kube-apiserver-operator#859
openshift/cluster-openshift-apiserver-operator#364
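As a rough illustration of what "prefer the `configmap` over the `endpoint`" could look like on the consuming side, here is a minimal sketch. It is not the code from this PR: the function name `etcdStorageURLs`, the plain-string inputs, and the assumption that each configmap value holds one etcd member IP are all hypothetical. The only thing taken as given is etcd's standard client port, 2379.

```go
package main

import (
	"net"
	"sort"
)

// etcdStorageURLs is a hypothetical helper, not the operator's real API.
// It builds etcd client URLs from the configmap data when that yields any
// members, and falls back to the legacy endpoint IPs otherwise.
// Assumption: each configmap value is a single member IP address.
func etcdStorageURLs(configMapData map[string]string, endpointIPs []string) []string {
	ips := make([]string, 0, len(configMapData))
	for _, ip := range configMapData {
		if ip != "" {
			ips = append(ips, ip)
		}
	}

	// Fall back to the endpoint only when the configmap gave us nothing.
	if len(ips) == 0 {
		ips = append(ips, endpointIPs...)
	}

	urls := make([]string, 0, len(ips))
	for _, ip := range ips {
		// JoinHostPort brackets IPv6 literals correctly.
		urls = append(urls, "https://"+net.JoinHostPort(ip, "2379"))
	}

	// Map iteration order is random in Go, so sort for a stable result.
	sort.Strings(urls)
	return urls
}
```

Calling `etcdStorageURLs(cmData, legacyIPs)` with a populated `cmData` ignores `legacyIPs` entirely; the endpoint addresses are only used when the configmap is missing or empty.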
/retest
e2e test failures seem unrelated.
@deads2k @sanchezl @hexfusion @retroflexer this is ready for review.
/retest Please review the full test history for this PR and help us cut down flakes.
e2e is flaking and the problems are obscured by https://bugzilla.redhat.com/show_bug.cgi?id=1837992. Added the fix from #864 to see if that helps understand or resolve the problem.
Force-pushed from decb971 to 581b4e8.
Both the endpoint and configmap observers were missing a sort of the final etcd URL array, which caused endless churn from rolling out new apiserver revisions.
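To make the churn concrete, the sketch below shows why an unsorted URL list can look like a config change even when the set of members is identical. The helper names `sortedCopy` and `configChanged` are hypothetical stand-ins, not the operator's actual functions, and the order-sensitive comparison is an assumption about how a change would be detected.

```go
package main

import (
	"reflect"
	"sort"
)

// sortedCopy returns a sorted copy of urls, leaving the caller's slice untouched.
func sortedCopy(urls []string) []string {
	out := append([]string(nil), urls...)
	sort.Strings(out)
	return out
}

// configChanged is a hypothetical stand-in for an order-sensitive comparison
// between the previously observed URL list and the newly observed one.
func configChanged(previous, observed []string) bool {
	return !reflect.DeepEqual(previous, observed)
}
```

With `a = [".../10.0.0.2:2379", ".../10.0.0.1:2379"]` and `b` holding the same two URLs in the opposite order, `configChanged(a, b)` reports a change, while `configChanged(sortedCopy(a), sortedCopy(b))` does not. Sorting before comparing or writing the observed config removes the spurious difference.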
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: deads2k, ironcladlou
@ironcladlou: Some pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#354, openshift/cluster-kube-apiserver-operator#859. The following pull requests linked via external trackers have not merged:
xref openshift/cluster-etcd-operator#354