Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery #859

Merged

Conversation

ironcladlou
Copy link
Contributor

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2020
@deads2k
Copy link
Contributor

deads2k commented May 15, 2020

it looks plausible to me, but ask @sttts or @p0lyn0mial @sanchezl

@ironcladlou
Copy link
Contributor Author

Now that I'm looking at it again, I'd rather keep them orthogonal if possible, thinking...

@ironcladlou
Copy link
Contributor Author

Latest commit tries to keep the observers orthogonal by making the endpoint observer no-op when the configmap exists.

@ironcladlou
Copy link
Contributor Author

/retest

@ironcladlou
Copy link
Contributor Author

/test e2e-aws-operator
/test e2e-aws-upgrade

@ironcladlou ironcladlou changed the title WIP: Prefer the new etcd endpoints configmap for storage URL discovery Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery May 18, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 18, 2020
@openshift-ci-robot
Copy link

@ironcladlou: This pull request references Bugzilla bug 1836927, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 18, 2020
@ironcladlou
Copy link
Contributor Author

/retest

@ironcladlou
Copy link
Contributor Author

/retest

@hexfusion
Copy link
Contributor

/test e2e-aws

@hexfusion
Copy link
Contributor

/test e2e-aws-operator

@hexfusion
Copy link
Contributor

/test e2e-aws-upgrade

ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request May 18, 2020
Kube service design asserts `endpoint` resources cannot exist without a
corresponding `service` resource, and Kube will actively delete the endpoint
when the service is deleted or if Kube detects the endpoint is a "stray".

The operator needs to:

1. Manage etcd endpoint state atomically.
2. Maintain exclusive ownership of the etcd endpoint state resource.

Altogether this makes the `endpoint` resource inappropriate for the task. The
competition between the operator and the Kube endpoints controller to manage the
endpoint has led to instability.

To resolve the problems, persist etcd endpoint state in a `configmap`.

Maintain compatibility by continuing to write the `endpoint`, and update
consuming components to prefer the `configmap` over the `endpoint`.

Also requires:
openshift/cluster-kube-apiserver-operator#859
openshift/cluster-openshift-apiserver-operator#364
ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request May 18, 2020
Kube service design asserts `endpoint` resources cannot exist without a
corresponding `service` resource, and Kube will actively delete the endpoint
when the service is deleted or if Kube detects the endpoint is a "stray".

The operator needs to:

1. Manage etcd endpoint state atomically.
2. Maintain exclusive ownership of the etcd endpoint state resource.

Altogether this makes the `endpoint` resource inappropriate for the task. The
competition between the operator and the Kube endpoints controller to manage the
endpoint has led to instability.

To resolve the problems, persist etcd endpoint state in a `configmap`.

Maintain compatibility by continuing to write the `endpoint`, and update
consuming components to prefer the `configmap` over the `endpoint`.

Also requires:
openshift/cluster-kube-apiserver-operator#859
openshift/cluster-openshift-apiserver-operator#364
ironcladlou added a commit to ironcladlou/cluster-etcd-operator that referenced this pull request May 18, 2020
Kube service design asserts `endpoint` resources cannot exist without a
corresponding `service` resource, and Kube will actively delete the endpoint
when the service is deleted or if Kube detects the endpoint is a "stray".

The operator needs to:

1. Manage etcd endpoint state atomically.
2. Maintain exclusive ownership of the etcd endpoint state resource.

Altogether this makes the `endpoint` resource inappropriate for the task. The
competition between the operator and the Kube endpoints controller to manage the
endpoint has led to instability.

To resolve the problems, persist etcd endpoint state in a `configmap`.

Maintain compatibility by continuing to write the `endpoint`, and update
consuming components to prefer the `configmap` over the `endpoint`.

Also requires:
openshift/cluster-kube-apiserver-operator#859
openshift/cluster-openshift-apiserver-operator#364
@ironcladlou
Copy link
Contributor Author

/retest

@ironcladlou
Copy link
Contributor Author

e2e test failures seem unrelated.

@ironcladlou
Copy link
Contributor Author

@deads2k @sanchezl @hexfusion @retroflexer this is ready for review.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

19 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label May 21, 2020
@ironcladlou
Copy link
Contributor Author

e2e is flaking and the problems are obscured by https://bugzilla.redhat.com/show_bug.cgi?id=1837992. Added the fix from #864 to see if that helps understand or resolve the problem.

@ironcladlou
Copy link
Contributor Author

Both the endpoint and configmap observer were missing a sort of the final etcd URL array, causing endless churn rolling out new apiserver revisions.

@deads2k
Copy link
Contributor

deads2k commented May 21, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, ironcladlou

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit b11d73a into openshift:master May 21, 2020
@openshift-ci-robot
Copy link

@ironcladlou: Some pull requests linked via external trackers have merged: openshift/cluster-etcd-operator#354, openshift/cluster-kube-apiserver-operator#859. The following pull requests linked via external trackers have not merged:

In response to this:

Bug 1836927: Prefer the new etcd endpoints configmap for storage URL discovery

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants