Bug 1781763: [release-4.3] proxy: add handler with same ResyncPeriod as shared informer. #81
Works around a nasty bug where having multiple handlers with non-homogeneous resync times causes updates during connection disruption to be missed by one or more handlers.
@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retitle Bug 1781763: [release-4.3] proxy: add handler with same ResyncPeriod as shared informer.
@openshift-cherrypick-robot: An error was encountered adding this pull request to the external tracker bugs for bug 1781763 on the Bugzilla server at https://bugzilla.redhat.com:
Please contact an administrator to resolve this issue, then request a bug refresh with In response to this:
@danwinship can you review?
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: danwinship, openshift-cherrypick-robot
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/bugzilla refresh
@squeed: This pull request references Bugzilla bug 1781763, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
@openshift-cherrypick-robot: All pull requests linked via external trackers have merged. Bugzilla bug 1781763 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.2
@knobunc: new pull request created: #90 In response to this:
Baked in edges:

  $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.3.0-rc.0-x86_64 | grep Upgrades
    Upgrades: 4.2.13
  $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.3.0-rc.3-x86_64 | grep Upgrades
    Upgrades: 4.2.16, 4.3.0-rc.0, 4.3.0-rc.1, 4.3.0-rc.2

The wide 'from' regexp was appropriate for 4.3.0-rc.0, which had no 4.3 update sources. But rc.3 does have update sources, and we want to allow 4.3.0-rc.0 -> 4.3.0-rc.3, because it is not impacted by the 4.2->4.3 GCP update bug. The overly-strict regexp was from 6d3db09 (Blocking edges to candidate 4.3.0-rc.3, 2020-01-23, #34).

Also expand the referenced bugs for the blocked 4.2 -> 4.3 edges:

* Update hangs with [1]:

    Working towards 4.3.0...: 13% complete

  and machine-config going Degraded=True with:

    RequiredPoolsFailed: Unable to apply 4.3.0-...: timed out waiting for the condition during syncRequiredMachineConfigPools: pool master has not progressed to latest configuration: controller version mismatch for rendered-master-6c22... expected 23a6... has d780... retrying

  Fixed in 4.2 with MCO 31fed93 [2] and in 4.3 with MCO 25bb6ae [3].

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.14 | grep machine-config
      machine-config-operator https://github.com/openshift/machine-config-operator d780d197a9c5848ba786982c0c4aaa7487297046
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.16 | grep machine-config
      machine-config-operator https://github.com/openshift/machine-config-operator 31fed93186c9f84708f5cdfd0227ffe4f79b31cd

  So the 4.2 fix was in 4.2.16.
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.0 | grep machine-config
      machine-config-operator https://github.com/openshift/machine-config-operator 23a6e6fb37e73501bc3216183ef5e6ebb15efc7a
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.3 | grep machine-config
      machine-config-operator https://github.com/openshift/machine-config-operator 25bb6aeb58135c38a667e849edf5244871be4992

  So the 4.3 fix was new in rc.3.

* Updates hang with FailedCreatePodSandBox events in the openshift-ingress namespace like [4]:

    pod/router-default-...: Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_router-default-..._openshift-ingress_...(...): Multus: error adding pod to network "openshift-sdn": delegateAdd: error invoking DelegateAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'failed to run IPAM for ...: failed to run CNI IPAM ADD: failed to allocate for range 0: no IP addresses available in range set: <ip1>-<ip2>

  Fixed in 4.2 with MCO 9366460 [5] and in 4.3 with MCO 311a01e [6].

    $ git --no-pager log --first-parent --oneline -4 origin/release-4.2
    6e0df82c (origin/release-4.2) Merge pull request openshift#1347 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.2
    93664600 Merge pull request openshift#1362 from rphillips/fixes/1787581_4.2
    bd358bb7 Merge pull request openshift#1323 from openshift-cherrypick-robot/cherry-pick-1320-to-release-4.2
    31fed931 Merge pull request openshift#1358 from runcom/osimageurl-race-42

  So the 4.2 fix was after 4.2.16's 31fed93186.
    $ git --no-pager log --first-parent --oneline -8 origin/release-4.3
    3ad3a836 (origin/release-4.3) Merge pull request openshift#1399 from celebdor/haproxy-v4v6
    25503eee Merge pull request openshift#1353 from russellb/1211-4.3-backport
    67ab306b Merge pull request openshift#1426 from mandre/ssc43
    d74f56fe Merge pull request openshift#1410 from retroflexer/manual-cherry-pick-from-master
    207cc171 Merge pull request openshift#1406 from openshift-cherrypick-robot/cherry-pick-1396-to-release-4.3
    25bb6aeb Merge pull request openshift#1359 from runcom/osimageurl-race-43
    311a01e8 Merge pull request openshift#1361 from rphillips/fixes/1787581_4.3
    23a6e6fb Merge pull request openshift#1348 from openshift-cherrypick-robot/cherry-pick-1285-to-release-4.3

  So the 4.3 fix was between rc.0's 23a6e6fb37 and rc.3's 25bb6aeb58 (see 'release info' calls in the previous list entry for those commit hashes).

* Update CI fails with [7,8]:

    Could not reach HTTP service through <ip>:80 after 2m0s

  and authentication going Degraded=True with:

    RouteHealthDegradedFailedGet: RouteHealthDegraded: failed to GET route: dial tcp <ip>:443: connect: connection refused

  Fixed in 4.2 with SDN 677b3a8 [9] and in 4.3 with SDN 74a8aee [10].
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.16 | grep ' node '
      node https://github.com/openshift/sdn 770cb7bf922a721bc6c62af5490439d6174036fe
    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.2.14 | grep ' node '
      node https://github.com/openshift/sdn 770cb7bf922a721bc6c62af5490439d6174036fe
    $ git --no-pager log --first-parent --oneline -4 origin/release-4.2
    098a6410 (origin/release-4.2) Merge pull request #95 from danwinship/fork-k8s-client-go-4.2
    9955a65b Merge pull request #72 from juanluisvaladas/too_many_dns_queries_42
    677b3a80 Merge pull request #90 from openshift-cherrypick-robot/cherry-pick-81-to-release-4.2
    770cb7bf Merge pull request #73 from danwinship/egressip-cleanup-4.2

  So the fix landed after 4.2.16's 770cb7bf.

    $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.3.0-rc.0 | grep ' sdn '
      sdn https://github.com/openshift/sdn d4e36d5019ef0e130e0d246581508821a7322753
    $ git --no-pager log --first-parent --oneline -5 origin/release-4.3
    490a574e (origin/release-4.3) Merge pull request openshift#98 from openshift-cherrypick-robot/cherry-pick-96-to-release-4.3
    85ab1033 Merge pull request #78 from openshift-cherrypick-robot/cherry-pick-57-to-release-4.3
    d4e36d50 Merge pull request #85 from openshift-cherrypick-robot/cherry-pick-84-to-release-4.3
    dabc4ef5 Merge pull request #83 from dougbtv/backport-build-use-host-local
    74a8aee3 Merge pull request #81 from openshift-cherrypick-robot/cherry-pick-79-to-release-4.3

  So the fix landed before rc.0's d4e36d50.

* GCP update CI fails with [11]:

    Could not reach HTTP service through <ip>:80 after 2m0s

  in 4.2.16 -> 4.3.0-rc.0 [12], 4.2.16 -> 4.3.0-rc.3 [13,14,15], and 4.2.18 -> 4.3.1 [16]. This doesn't happen every time though; at least one 4.2.16 -> 4.3.0-rc.3 has passed on GCP [17]. We don't have a root-cause yet, but the final failure matches [8] discussed above.
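The "fix landed before/after release X" conclusions in the list above all follow the same check: find the release's commit and the fix's merge commit in the newest-first `git log --first-parent --oneline` output and compare positions. A minimal sketch of that comparison, assuming the log has already been reduced to a list of abbreviated hashes; the helper name `fix_included` is hypothetical and not part of any tool used in this thread:

```python
def fix_included(first_parent_log, release_commit, fix_commit):
    """Return True if fix_commit is at or before release_commit on the branch.

    first_parent_log holds abbreviated hashes, newest first, in the order
    printed by `git log --first-parent --oneline`.
    """
    def index_of(commit):
        for i, short in enumerate(first_parent_log):
            # Hashes may be abbreviated to different lengths, so accept
            # a prefix match in either direction.
            if commit.startswith(short) or short.startswith(commit):
                return i
        raise ValueError(f"{commit} not found in log")

    # Newest-first ordering: a larger index means an older commit.
    return index_of(fix_commit) >= index_of(release_commit)


# Hashes copied from the machine-config-operator release-4.2 log above.
mco_release_4_2 = ["6e0df82c", "93664600", "bd358bb7", "31fed931"]
mco_in_4_2_16 = "31fed93186c9f84708f5cdfd0227ffe4f79b31cd"

# 4.2.16 ships MCO 31fed93, so that fix is included...
race_fix_included = fix_included(mco_release_4_2, mco_in_4_2_16, "31fed93")
# ...while 9366460 merged afterwards and is not.
sandbox_fix_included = fix_included(mco_release_4_2, mco_in_4_2_16, "9366460")

# Hashes copied from the sdn release-4.3 log above; rc.0 ships d4e36d50.
sdn_release_4_3 = ["490a574e", "85ab1033", "d4e36d50", "dabc4ef5", "74a8aee3"]
sdn_fix_in_rc0 = fix_included(
    sdn_release_4_3, "d4e36d5019ef0e130e0d246581508821a7322753", "74a8aee"
)
```

This only works when both commits appear in the captured log window; a fix older than the last listed entry would raise `ValueError` here, which is why the commit message widens the `-N` count per branch.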
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1786993
[2]: openshift/machine-config-operator#1358 (comment)
[3]: openshift/machine-config-operator#1359 (comment)
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1787635
[5]: openshift/machine-config-operator#1362 (comment)
[6]: openshift/machine-config-operator#1361 (comment)
[7]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/214#1:build-log.txt%3A414
[8]: https://bugzilla.redhat.com/show_bug.cgi?id=1781763
[9]: openshift/sdn#90 (comment)
[10]: openshift/sdn#81 (comment)
[11]: https://bugzilla.redhat.com/show_bug.cgi?id=1785457
[12]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/216
[13]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/232
[14]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/233
[15]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/234
[16]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/286
[17]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-upgrade/230
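The difference between the wide and the narrowed 'from' regexp described in the commit message can be illustrated with a small sketch. The actual patterns are not shown in this thread, so `WIDE_FROM` and `NARROW_FROM` below are hypothetical stand-ins, and the full-string matching in `blocked` is an assumption about how such a pattern would be applied:

```python
import re

# Hypothetical patterns, for illustration only.
WIDE_FROM = r".*"          # blocks every incoming edge
NARROW_FROM = r"4\.2\..*"  # blocks only edges starting from a 4.2.z release


def blocked(from_pattern, source_version):
    """Return True if an edge from source_version is blocked by from_pattern."""
    return re.fullmatch(from_pattern, source_version) is not None


# Update sources baked into 4.3.0-rc.3, per the 'Upgrades:' output above.
sources = ["4.2.16", "4.3.0-rc.0", "4.3.0-rc.1", "4.3.0-rc.2"]

wide = {v: blocked(WIDE_FROM, v) for v in sources}
narrow = {v: blocked(NARROW_FROM, v) for v in sources}
```

With the wide pattern every source is blocked, including 4.3.0-rc.0; with the narrowed one only 4.2.16 is, which leaves 4.3.0-rc.0 -> 4.3.0-rc.3 open as the commit message intends.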
This is an automated cherry-pick of #79
/assign squeed