Bug 1817028: *: add an init container to stop the pod with bad revision #284

Merged
merged 2 commits into from Apr 7, 2020

Conversation

alaypatel07
Contributor

@alaypatel07 alaypatel07 commented Mar 26, 2020

Add an init container that fails the pod if it has stale revision data.
This can occur in particular during disaster recovery (DR), as explained in #278.

This is a two-pronged solution:

  1. Add an init container that fails on bad env data.
  2. Modify the clustermembercontroller so that it does not add the member
    unless all init containers have exited with status 0.
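
The second prong can be sketched in shell. This is only an illustration of the gating rule, not the operator's code, which is not shown in this thread. The function name `init_containers_succeeded` and the convention of passing one argument per init container (its exit code, or the literal `pending` if it has not terminated) are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical gate: decide whether an etcd pod may be added as a member.
# Each argument is one init container's exit code, or "pending" if that
# container has not terminated yet (a convention invented for this sketch).
init_containers_succeeded() {
    # With no init container results reported, refuse to add the member.
    # This conservative default is a choice of this sketch.
    if [ "$#" -eq 0 ]; then
        return 1
    fi

    for code in "$@"; do
        # Anything other than a clean exit blocks the member from being added.
        if [ "$code" != "0" ]; then
            return 1
        fi
    done
    return 0
}
```

A caller would add the member only when `init_containers_succeeded 0 0` returns success; a call such as `init_containers_succeeded 0 1` or `init_containers_succeeded 0 pending` returns a non-zero status.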

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2020
Contributor

@hexfusion hexfusion left a comment

Alay, this will always fail; see notes.

Three review threads on bindata/etcd/pod.yaml (outdated, resolved)
@alaypatel07 alaypatel07 changed the title [WIP]: add an init container to stop the pod with bad revision *: add an init container to stop the pod with bad revision Mar 27, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 27, 2020
@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 3 times, most recently from d514d93 to 0e89a3b Compare March 27, 2020 16:51
@hexfusion
Contributor

{"component":"entrypoint","file":"prow/entrypoint/run.go:168","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","time":"2020-03-27T20:04:41Z"}

prow blew up

/test all

@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 4 times, most recently from 2457453 to 53f8328 Compare March 28, 2020 01:32
@alaypatel07
Contributor Author

/retest

@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 4 times, most recently from edefabe to 3182726 Compare March 30, 2020 19:13
@hexfusion
Contributor

/test all

@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 3 times, most recently from db1daf5 to 281882c Compare April 1, 2020 15:48
@alaypatel07
Contributor Author

/retest

@hexfusion
Contributor

/test all

@hexfusion
Contributor

Sigh, why are we failing AWS?

@hexfusion
Contributor

        s: "promQL query: count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1 had reported incorrect results:\n[{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"metrics\",\"namespace\":\"openshift-console-operator\",\"service\":\"metrics\",\"severity\":\"warning\"},\"value\":[1585834134.713,\"51\"]}]",

We need to get to the bottom of this.

#!/bin/sh
set -euo pipefail

: "${NODE_NODE_ENVVAR_NAME_ETCD_URL_HOST?not set}"
Contributor

I can't read this. I expected to see the IP address injected through the downward API, followed by a bash comparison between the computed IP and the downward API IP.
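
The comparison the reviewer describes might look roughly like the sketch below. It is not the code that was merged in this PR. The function name `check_node_ip` and the variable name `RENDERED_NODE_IP` used in the note afterwards are hypothetical, and accepting a bracketed address is an assumption made here to cover IPv6 addresses rendered in URL form.

```shell
#!/bin/sh
# Hypothetical sketch: compare the node IP rendered into the pod's
# environment with the IP reported by the downward API.
#
# check_node_ip RENDERED_IP DOWNWARD_IP
# Returns 0 when the two agree, 1 (with a message on stderr) otherwise.
check_node_ip() {
    rendered_ip="${1:-}"
    downward_ip="${2:-}"

    # Fail if either value is missing, rather than comparing empty strings.
    if [ -z "$rendered_ip" ] || [ -z "$downward_ip" ]; then
        echo "node IP not set" >&2
        return 1
    fi

    # Accept the plain address, or the same address wrapped in square
    # brackets (assumed here to be how an IPv6 host might be rendered).
    if [ "$rendered_ip" = "$downward_ip" ] || [ "$rendered_ip" = "[$downward_ip]" ]; then
        return 0
    fi

    echo "expected node IP $downward_ip, got $rendered_ip" >&2
    return 1
}
```

In an init container this could be invoked as `check_node_ip "$RENDERED_NODE_IP" "$NODE_IP" || exit 1`, assuming `NODE_IP` is populated from `status.hostIP` through a downward API `fieldRef` in the pod spec. A non-zero exit would then fail the init container and keep the pod from starting with stale data.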

@alaypatel07
Contributor Author

/retest

@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 3 times, most recently from c4ae7d9 to f8e8a8c Compare April 6, 2020 22:47
One review thread on bindata/etcd/pod.yaml (outdated, resolved)
@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 4 times, most recently from 2d085b4 to a9fe664 Compare April 6, 2020 23:28
@alaypatel07 alaypatel07 force-pushed the fix-dr-2 branch 2 times, most recently from c9162e1 to f537710 Compare April 7, 2020 02:57
@openshift-ci-robot

@alaypatel07: The following tests failed, say /retest to rerun all failed tests:

Test name                   Commit   Details  Rerun command
ci/prow/e2e-aws-disruptive  52e4742  link     /test e2e-aws-disruptive
ci/prow/e2e-azure           52e4742  link     /test e2e-azure

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hexfusion
Contributor

/lgtm

Manually tested in DR.

@hexfusion
Contributor

/skip

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alaypatel07, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alaypatel07,hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 811ecfc into openshift:master Apr 7, 2020
@openshift-ci-robot

@alaypatel07: All pull requests linked via external trackers have merged: openshift/library-go#760, openshift/cluster-etcd-operator#284. Bugzilla bug 1817028 has been moved to the MODIFIED state.

In response to this:

Bug 1817028: *: add an init container to stop the pod with bad revision

@alaypatel07
Contributor Author

/cherrypick release-4.4

@openshift-cherrypick-robot

@alaypatel07: new pull request created: #293

In response to this:

/cherrypick release-4.4

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
7 participants