Bug 1845903: gcp-routes: decrease downfile poll to be faster than LB on recovery #1808
@sttts: This pull request references Bugzilla bug 1845903, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
/bugzilla refresh |
@sttts: This pull request references Bugzilla bug 1845903, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
/retest |
/cherry-pick release-4.5 |
@sttts: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.
/approve |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, squeed, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
/retest Please review the full test history for this PR and help us cut down flakes. |
/test e2e-aws-scaleup-rhel7 |
@sttts: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1808. Bugzilla bug 1845903 has been moved to the MODIFIED state.
@sttts: new pull request created: #1821
/cherry-pick release-4.4 |
@sttts: new pull request created: #1837
When the local kube-apiserver becomes ready, the GCP LBs pick up the endpoint and start routing traffic to its IP. In parallel, the gcp-routes service notices the local, green readyz and stops sending local traffic to that LB.
The second step must happen BEFORE the first. Otherwise, local requests still go to a GCP LB that has already picked up the endpoint, and we risk blackholing 1/3 of those requests (because GCP has no hairpinning support).
Currently this ordering is not guaranteed, because the gcp-routes script polls every 5s (without inotify, which isn't installed in the image as we know). In the worst case it takes 1*2s + 1.9999s + 5s until the gcp-routes script updates iptables: the 1.9999s because the readyz polling happens at an unfortunate time, and the 5s for the poll script to notice. Hence roughly 9s > 6s of the LB, and for about 3s we might lose 1/3 of the requests originating from the local host.
Compare: https://github.com/openshift/installer/pull/3512/files#diff-3aaac4ae7d381237a540f05371931b76R10
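The timing argument above can be checked with a few lines of arithmetic. This is only a sketch: the function and parameter names are hypothetical, and it assumes that "1*2s" means one successful readyz probe at a 2s interval and that the "1.9999s" term is bounded by one further probe interval. The 5s poll and the 6s LB pickup time are taken from the description.

```python
def worst_case_route_update_delay(readyz_interval_s: float,
                                  readyz_successes: int,
                                  downfile_poll_s: float) -> float:
    """Upper bound on the time until gcp-routes updates iptables."""
    # Time for the required number of successful readyz probes (assumed reading of "1*2s").
    detection = readyz_successes * readyz_interval_s
    # Probing may start just after readiness flips, costing up to one more interval (~1.9999s).
    unlucky_offset = readyz_interval_s
    # The poll script needs up to one full poll interval to notice the change.
    return detection + unlucky_offset + downfile_poll_s


def blackhole_window(lb_pickup_s: float, route_delay_s: float) -> float:
    """Seconds during which the LB already routes to us but local traffic still goes to the LB."""
    return max(0.0, route_delay_s - lb_pickup_s)


def max_safe_poll(lb_pickup_s: float, readyz_interval_s: float, readyz_successes: int) -> float:
    """Largest poll interval for which the worst-case delay does not exceed the LB pickup time."""
    return lb_pickup_s - (readyz_successes * readyz_interval_s + readyz_interval_s)


delay = worst_case_route_update_delay(readyz_interval_s=2.0, readyz_successes=1, downfile_poll_s=5.0)
window = blackhole_window(lb_pickup_s=6.0, route_delay_s=delay)
print(f"worst-case route update: {delay}s, blackhole window: {window}s")
print(f"poll interval must stay below {max_safe_poll(6.0, 2.0, 1)}s under these assumptions")
```

Under these assumptions the worst case is 9s against the LB's 6s, which reproduces the 3s window from the description, and it suggests the poll interval has to drop below roughly 2s for the local route update to win the race.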