Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery #859

Merged

openshift-merge-robot merged 1 commit into openshift:master from ironcladlou:host-etcd-cm-migration

May 21, 2020

Contributor

ironcladlou commented May 15, 2020

xref openshift/cluster-etcd-operator#354

openshift-ci-robot added the do-not-merge/work-in-progress label

openshift-ci-robot requested review from mfojtik and soltysh

May 15, 2020 15:48

Contributor

deads2k commented May 15, 2020

It looks plausible to me, but ask @sttts, @p0lyn0mial, or @sanchezl.

Contributor Author

ironcladlou commented May 15, 2020

Now that I'm looking at it again, I'd rather keep them orthogonal if possible, thinking...

Contributor Author

ironcladlou commented May 15, 2020

Latest commit tries to keep the observers orthogonal by making the endpoint observer no-op when the configmap exists.
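A minimal sketch of the "no-op when the configmap exists" idea, for readers following along. This is not the code from the PR: the `configMapLookup` type, the function name, the `openshift-etcd`/`etcd-endpoints` names, and port 2379 are all assumptions made for illustration, and the real observers read from informer-backed listers rather than a plain function.

```go
package main

import "net"

// Assumed names, used only for this sketch.
const (
	etcdNamespace   = "openshift-etcd"
	endpointsCMName = "etcd-endpoints"
)

// configMapLookup is a hypothetical stand-in for a configmap lister.
// It returns the configmap data and whether the configmap exists.
type configMapLookup func(namespace, name string) (map[string]string, bool)

// observeLegacyEndpoints builds storage URLs from the legacy endpoint
// addresses. If the configmap exists it does nothing and reports
// handled=false, leaving the result to the configmap observer so the two
// observers never both produce a value.
func observeLegacyEndpoints(lookup configMapLookup, addresses []string) (urls []string, handled bool) {
	if _, found := lookup(etcdNamespace, endpointsCMName); found {
		return nil, false
	}
	for _, addr := range addresses {
		// JoinHostPort brackets IPv6 addresses correctly.
		urls = append(urls, "https://"+net.JoinHostPort(addr, "2379"))
	}
	return urls, true
}
```

With this shape, neither observer needs to know how the other computes its URLs; the only shared fact is whether the configmap is present.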

Contributor Author

ironcladlou commented May 18, 2020

/retest

Contributor Author

ironcladlou commented May 18, 2020

/test e2e-aws-operator
/test e2e-aws-upgrade

ironcladlou mentioned this pull request

Bug 1836927: Replace etcd endpoint representation with configmap openshift/cluster-etcd-operator#354

Merged

ironcladlou changed the title ~~WIP: Prefer the new etcd endpoints configmap for storage URL discovery~~ Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery

openshift-ci-robot added bugzilla/severity-high and removed do-not-merge/work-in-progress labels

openshift-ci-robot commented May 18, 2020

@ironcladlou: This pull request references Bugzilla bug 1836927, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.5.0) matches configured target release for branch (4.5.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added the bugzilla/valid-bug label

Contributor Author

ironcladlou commented May 18, 2020

/retest

deads2k reviewed

pkg/operator/configobservation/configobservercontroller/observe_config_controller.go (outdated)

deads2k reviewed

pkg/operator/configobservation/etcd/observe_etcd.go (outdated)

Contributor Author

ironcladlou commented May 18, 2020

/retest

Contributor

hexfusion commented May 18, 2020

/test e2e-aws

Contributor

hexfusion commented May 18, 2020

/test e2e-aws-operator

Contributor

hexfusion commented May 18, 2020

/test e2e-aws-upgrade

ironcladlou mentioned this pull request

Bug 1836927: Prefer etcd endpoints configmap for storage URL discovery openshift/cluster-openshift-apiserver-operator#364

Merged

ironcladlou commented

pkg/operator/configobservation/etcdendpoints/observe_etcd_endpoints.go (outdated)

ironcladlou force-pushed the host-etcd-cm-migration branch from 206d702 to c1f4eb3 Compare

May 18, 2020 20:57

ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request


          Replace etcd endpoint representation with configmap

79f228a

Kube service design asserts `endpoint` resources cannot exist without a
corresponding `service` resource, and Kube will actively delete the endpoint
when the service is deleted or if Kube detects the endpoint is a "stray".

The operator needs to:

1. Manage etcd endpoint state atomically.
2. Maintain exclusive ownership of the etcd endpoint state resource.

Altogether this makes the `endpoint` resource inappropriate for the task. The
competition between the operator and the Kube endpoints controller to manage the
endpoint has led to instability.

To resolve the problems, persist etcd endpoint state in a `configmap`.

Maintain compatibility by continuing to write the `endpoint`, and update
consuming components to prefer the `configmap` over the `endpoint`.

Also requires:
openshift/cluster-kube-apiserver-operator#859
openshift/cluster-openshift-apiserver-operator#364
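
The compatibility rule in the commit message above (keep writing the `endpoint`, but have consumers prefer the `configmap`) could be sketched roughly as follows. The `etcdState` type and `preferredAddresses` function are hypothetical, and the assumption that each configmap value holds one member address is mine, not something stated in the commit message.

```go
package main

import "sort"

// etcdState is a hypothetical snapshot of both representations of the etcd
// member addresses as a consuming operator might see them.
type etcdState struct {
	ConfigMapFound bool              // whether the configmap exists at all
	ConfigMapData  map[string]string // assumed: one member address per value
	EndpointAddrs  []string          // addresses from the legacy endpoint
}

// preferredAddresses returns the member addresses and the source they came
// from. The configmap wins whenever it exists; the endpoint is only a
// fallback for clusters where the configmap has not been written yet.
func preferredAddresses(s etcdState) (addrs []string, source string) {
	if s.ConfigMapFound {
		for _, addr := range s.ConfigMapData {
			addrs = append(addrs, addr)
		}
		source = "configmap"
	} else {
		addrs = append(addrs, s.EndpointAddrs...)
		source = "endpoint"
	}
	// Map iteration order is random in Go, so sort for a stable result.
	sort.Strings(addrs)
	return addrs, source
}
```

Tracking existence separately from the data means an existing but empty configmap is still treated as authoritative rather than silently falling back to the endpoint; whether that is the desired behavior is a design choice this sketch does not settle.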

ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request


          Replace etcd endpoint representation with configmap

f1124ad

ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request


          Replace etcd endpoint representation with configmap

c52f9e7

Contributor Author

ironcladlou commented May 18, 2020

/retest

Contributor Author

ironcladlou commented May 18, 2020

e2e test failures seem unrelated.

Contributor Author

ironcladlou commented May 19, 2020

@deads2k @sanchezl @hexfusion @retroflexer this is ready for review.

Contributor

openshift-bot commented May 20, 2020

/retest

Please review the full test history for this PR and help us cut down flakes.

19 similar comments

openshift-ci-robot removed the lgtm label

Contributor Author

ironcladlou commented May 21, 2020

e2e is flaking and the problems are obscured by https://bugzilla.redhat.com/show_bug.cgi?id=1837992. Added the fix from #864 to see if that helps understand or resolve the problem.


          Prefer the new etcd endpoints configmap for storage URL discovery

581b4e8

ironcladlou force-pushed the host-etcd-cm-migration branch from decb971 to 581b4e8 Compare

May 21, 2020 17:12

Contributor Author

ironcladlou commented May 21, 2020

Both the endpoint and configmap observers were missing a sort of the final etcd URL array, which caused endless churn as new apiserver revisions kept rolling out.
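
To illustrate why the missing sort matters, here is a small sketch under stated assumptions. `storageURLs` and `needsNewRevision` are hypothetical names, port 2379 is assumed, and the sketch assumes the unstable order came from iterating a Go map, which is one plausible source and not confirmed by this thread. If the observed URL list is compared against the current config by value, any reordering looks like a change and would trigger another rollout.

```go
package main

import (
	"net"
	"reflect"
	"sort"
)

// storageURLs builds the etcd URL list from member addresses keyed by an
// arbitrary member name. Go randomizes map iteration order, so without the
// final sort two calls on identical input may return different orderings.
func storageURLs(members map[string]string) []string {
	urls := make([]string, 0, len(members))
	for _, addr := range members {
		urls = append(urls, "https://"+net.JoinHostPort(addr, "2379"))
	}
	sort.Strings(urls) // the step that makes repeated observations stable
	return urls
}

// needsNewRevision is a hypothetical change check: an order-sensitive
// comparison between the current and the newly observed URL lists.
func needsNewRevision(current, observed []string) bool {
	return !reflect.DeepEqual(current, observed)
}
```

Because the comparison is order-sensitive, sorting at the point where the list is produced is the simplest way to keep the observed config from flapping.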

Contributor

deads2k commented May 21, 2020

/lgtm

openshift-ci-robot added the lgtm label

openshift-ci-robot commented May 21, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, ironcladlou

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [deads2k]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-robot merged commit b11d73a into openshift:master

openshift-ci-robot commented May 21, 2020

@ironcladlou: Some pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#354, openshift/cluster-kube-apiserver-operator#859. The following pull requests linked via external trackers have not merged:

openshift/cluster-openshift-apiserver-operator#364 is open
Bugzilla bug 1836927 has been moved to the MODIFIED state.
