Bug 1959200: Adds back checking OF flows for CNI #555
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dcbw
/hold We also need to bump OVN with the relevant fix in this PR. https://bugzilla.redhat.com/show_bug.cgi?id=1959200 depends on:
@dcbw: This pull request references Bugzilla bug 1959200, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (anusaxen@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
ovn2.13-20.12.0-135 has the necessary fixes now; need that pulled into OCP.
@dcbw https://bugzilla.redhat.com/show_bug.cgi?id=1952846 has been verified, can you tag and pull that into this PR?
/hold cancel
/retest
/retest
/retest
/test ci/prow/images
/test images
/test images
/test images
/test images
/test images
/retest
/retest
/retest
Hybrid step: ns/e2e-statefulset-937 pod/ss-0 node/ip-10-0-230-171.ec2.internal - 24.72 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss-0_e2e-statefulset-937_3c9e8983-67c2-495c-936c-c16447924956_0(f902bb70527f34146fd8f98fdb0c5f6e4e9ea9c135db55ec754bad0105b6adb4): [e2e-statefulset-937/ss-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-937/ss-0 f902bb70527f34146fd8f98fdb0c5f6e4e9ea9c135db55ec754bad0105b6adb4] [e2e-statefulset-937/ss-0 f902bb70527f34146fd8f98fdb0c5f6e4e9ea9c135db55ec754bad0105b6adb4] failed to configure pod interface: error while waiting on OVS.Interface.external-ids:ovn-installed for pod: timed out while waiting for OVS port binding |
/test okd-e2e-gcp-ovn
/retest
/override ci/prow/e2e-metal-ipi-ovn-dualstack
@dcbw: Overrode contexts on behalf of dcbw: ci/prow/e2e-metal-ipi-ovn-dualstack In response to this:
/hold
Force-pushed: ac6f1a4 to 90626bf
/test all
We see at scale that this can happen:
1. CNI delete
2. OVN is so busy it takes 30 seconds to remove the old logical port
3. CNI ADD within 30 seconds
4. ovn-controller sees old logical switchport, binds and considers new pod up, but no traffic works
5. sometime later OVN gets updated, and ovn-controller updates the pod with the new flows and traffic finally works

To solve this problem we need to have a minimal check to ensure the right flows are present for the pod before we check if ovn_installed is true. This change adds back the checks for MAC address and OF port number.

Signed-off-by: Tim Rozet <trozet@redhat.com>
(cherry picked from commit 22bed6a)
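A minimal sketch of the ordering this commit message describes: confirm that flows exist for the pod's MAC address and OF port first, and only then consult ovn-installed. This is not the code from the commit. The `flowDumper` type, the `dl_src=` and `in_port=` query strings, and the function names are all hypothetical stand-ins so the example stays self-contained; a real implementation would query OVS.

```go
package main

import (
	"fmt"
	"strings"
)

// flowDumper returns the OpenFlow flows matching a query, one per line.
// Hypothetical abstraction: a real implementation would ask OVS for them.
type flowDumper func(query string) ([]string, error)

// hasFlow reports whether at least one non-empty flow line was returned.
func hasFlow(lines []string) bool {
	for _, l := range lines {
		if strings.TrimSpace(l) != "" {
			return true
		}
	}
	return false
}

// podFlowsReady reports whether flows exist for both the pod's MAC address
// and its OpenFlow port number. The query syntax is an assumption.
func podFlowsReady(dump flowDumper, mac string, ofPort int) (bool, error) {
	queries := []string{
		fmt.Sprintf("dl_src=%s", mac),
		fmt.Sprintf("in_port=%d", ofPort),
	}
	for _, q := range queries {
		flows, err := dump(q)
		if err != nil {
			return false, fmt.Errorf("dumping flows for %q: %w", q, err)
		}
		if !hasFlow(flows) {
			return false, nil
		}
	}
	return true, nil
}

// podInterfaceReady checks the flows first and only then asks whether
// ovn-installed is set, so a stale port binding alone is not enough.
func podInterfaceReady(dump flowDumper, ovnInstalled func() (bool, error), mac string, ofPort int) (bool, error) {
	ok, err := podFlowsReady(dump, mac, ofPort)
	if err != nil || !ok {
		return false, err
	}
	return ovnInstalled()
}
```

With this ordering, step 4 of the scenario (ovn-installed set from the old logical switchport) no longer reports the pod as ready, because no flows match the new pod's MAC address yet.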
-140 fixes ovn-installed status (rhbz#1952846), backports the lflow cache bounding, and fixes IPv6 ECMP symmetric flows (rhbz#1959008).
The runtime might call ovnkube to set up the pod sandbox before the pod informer has received the pod from the apiserver. Currently the code simply returns an error and expects kubelet to retry. Instead let's be nicer and wait a short bit of time for the pod to show up before erroring out.

Signed-off-by: Dan Williams <dcbw@redhat.com>
(cherry picked from commit adaed36)
If the pod's MAC annotation changes, that means the pod was deleted and re-created. We should cancel any outstanding pod sandbox ADD request that doesn't match the latest MAC address for the pod.

Signed-off-by: Dan Williams <dcbw@redhat.com>
/retest
/test all
/retest
/retest
@dcbw: The following tests failed, say
@dcbw: PR needs rebase.
ovn-org/ovn-kubernetes#2275 should fix this instead.
@dcbw: This pull request references Bugzilla bug 1959200. The bug has been updated to no longer refer to the pull request using the external bug tracker. In response to this:
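We see at scale that this can happen:
1. CNI delete
2. OVN is so busy it takes 30 seconds to remove the old logical port
3. CNI ADD within 30 seconds
4. ovn-controller sees old logical switchport, binds and considers new pod up, but no traffic works
5. sometime later OVN gets updated, and ovn-controller updates the pod with the new flows and traffic finally works
To solve this problem we need to have a minimal check to ensure the
right flows are present for the pod before we check if ovn_installed is
true. This change adds back the checks for MAC address and OF port
number.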
(cherry picked from commit 22bed6a)
4.8 backport of ovn-org/ovn-kubernetes#2220
@trozet