
Bug 1809665: The console should wait until it is out of rotation to shut down #385

Merged


smarterclayton
Contributor

@smarterclayton smarterclayton commented Feb 28, 2020

When a pod is marked deleted, endpoints are updated instantly, but the
change still has to propagate to load balancers, routers, and nodes. A
component that wishes to remain available during upgrades must wait
longer than the default propagation interval for these changes to avoid
having requests delivered to pods that are shutting down.

Change the console to wait 25s before terminating the serving
process and up to 40s on the node to ensure all front ends have
time to drain. The minimum interval here is how long an average
connection behind the router can take to drain once new connections
stop being created, i.e.:

wait = time to propagate endpoints (5s) +
       time for router reload (5s) +
       time for longest request to finish (15s)
     = 25s

This change should result in the console not being disrupted during
upgrade (0s disruption to the console route).
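
For reference, the arithmetic above can be written out as a small Go sketch. This is only an illustration, not the code changed in this PR: the type `drainBudget`, its field names, and the split of the 40s node-level limit into 25s of waiting plus 15s of headroom are assumptions made for the example. The text above only gives the three terms of the wait and the 40s total.

```go
package main

import (
	"fmt"
	"time"
)

// drainBudget holds the three terms of the wait formula from the PR
// description. The type and field names are hypothetical.
type drainBudget struct {
	endpointPropagation time.Duration // time to propagate endpoints
	routerReload        time.Duration // time for router reload
	longestRequest      time.Duration // time for longest request to finish
	// shutdownHeadroom is an assumption: the gap between the 25s wait and
	// the 40s node-level limit mentioned in the description.
	shutdownHeadroom time.Duration
}

// defaultBudget returns the values quoted in the PR description.
func defaultBudget() drainBudget {
	return drainBudget{
		endpointPropagation: 5 * time.Second,
		routerReload:        5 * time.Second,
		longestRequest:      15 * time.Second,
		shutdownHeadroom:    15 * time.Second,
	}
}

// preStopDelay is how long to keep serving after the pod is marked deleted.
func (b drainBudget) preStopDelay() time.Duration {
	return b.endpointPropagation + b.routerReload + b.longestRequest
}

// gracePeriod is the upper bound the node should allow before killing the pod.
func (b drainBudget) gracePeriod() time.Duration {
	return b.preStopDelay() + b.shutdownHeadroom
}

func main() {
	b := defaultBudget()
	fmt.Printf("wait before terminating: %v\n", b.preStopDelay())
	fmt.Printf("node-level grace period: %v\n", b.gracePeriod())
}
```

Keeping the terms separate makes it easier to revisit the wait if any one of them changes, for example a slower router reload.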

Downtime documented in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19447

@smarterclayton
Contributor Author

/hold

while testing

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Feb 28, 2020
@smarterclayton
Contributor Author

/test images

@smarterclayton
Contributor Author

/hold cancel

Verified that route ingresses remain available during upgrade when testing all three PRs together: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19806

This is ready for review.

@openshift-ci-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Mar 3, 2020
@smarterclayton changed the title from "The console should wait until it is out of rotation to shut down" to "Bug 1809665: The console should wait until it is out of rotation to shut down" Mar 3, 2020
@openshift-ci-robot added the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.) Mar 3, 2020
@openshift-ci-robot
Contributor

@smarterclayton: This pull request references Bugzilla bug 1809665, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1809665: The console should wait until it is out of rotation to shut down

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@smarterclayton
Contributor Author

/cherry-pick release-4.4

@openshift-cherrypick-robot

@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.4


@smarterclayton
Contributor Author

/cherry-pick release-4.3

@openshift-cherrypick-robot

@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.3


@smarterclayton
Contributor Author

The 4.4 bug is 1809667 and the 4.3 bug is 1809668.

Contributor

@benjaminapetersen benjaminapetersen left a comment

/approve
/lgtm

Looking into the e2e test failure to see if it's a flake.

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Mar 3, 2020
@benjaminapetersen
Contributor

/retest

Feb 28 23:36:09.600: INFO:  > ERROR: (gcloud.compute.instance-groups.list-instances) could not parse resource [] 
...
 fail [k8s.io/kubernetes/test/e2e/apimachinery/resource_quota.go:166]: Unexpected error:
    <*errors.errorString | 0xc0002c43f0>: {
        s: "timed out waiting for the condition",
}
timed out waiting for the condition 

@smarterclayton
Contributor Author

/retest

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benjaminapetersen, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Mar 3, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@smarterclayton
Contributor Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@smarterclayton: This pull request references Bugzilla bug 1809665, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@smarterclayton
Contributor Author

/bugzilla refresh

@openshift-ci-robot added the bugzilla/valid-bug label (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.) Mar 3, 2020
@openshift-ci-robot
Contributor

@smarterclayton: This pull request references Bugzilla bug 1809665, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh


@openshift-ci-robot removed the bugzilla/invalid-bug label (Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.) Mar 3, 2020
@smarterclayton
Contributor Author

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a9b8732 into openshift:master Mar 4, 2020
@openshift-ci-robot
Contributor

@smarterclayton: All pull requests linked via external trackers have merged. Bugzilla bug 1809665 has been moved to the MODIFIED state.

In response to this:

Bug 1809665: The console should wait until it is out of rotation to shut down


@openshift-cherrypick-robot

@smarterclayton: new pull request created: #387

In response to this:

/cherry-pick release-4.4


@openshift-cherrypick-robot

@smarterclayton: new pull request created: #388

In response to this:

/cherry-pick release-4.3


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.

6 participants