New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sdn pod teardown robust #14893
Make sdn pod teardown robust #14893
Conversation
@openshift/networking @dcbw PTAL |
[test] |
pkg/sdn/plugin/ovscontroller_test.go
Outdated
@@ -215,7 +215,7 @@ func TestOVSService(t *testing.T) { | |||
} | |||
|
|||
const ( | |||
containerID string = "bcb5d8d287fcf97458c48ad643b101079e3bc265a94e097e7407440716112f69" | |||
sandboxID string = "bcb5d8d287fcf97458c48ad643b101079e3bc265a94e097e7407440716112f69" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
belongs in the previous PR
pkg/sdn/plugin/ovscontroller.go
Outdated
func getPodDetailsByContainerID(flows []string, containerID string) (int, string, string, string, error) { | ||
note, err := getPodNote(containerID) | ||
func getPodDetailsBySandboxID(ovsif ovs.Interface, sandboxID string) (int, string, string, string, error) { | ||
flows, err := ovsif.DumpFlows() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should rebase the real pkg/util/ovs implementation on top of the fake one, so that, eg, when you do an AddFlow(), it calls ovs-ofctl and also updates its own internal array of flows, so that we can implement code like this without actually having to call dump-flows all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently DumpFlows is used in during startup, update pod and teardown case where podIP is empty. We can't optimize during startup, update pod may not be a frequent operation. May be we can introduce this idea when dumpFlows() is called frequently.
pkg/sdn/plugin/pod_linux.go
Outdated
ktypes.KubernetesPodNameLabel: req.PodName, | ||
}, | ||
} | ||
if id, err := m.getPodSandboxID(filter, req); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
req.containerId should already be the pod sandbox ID... I'm pretty sure we don't need to re-request it, since at Teardown time the runtime is calling us directly with the sandboxID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will update.
pkg/sdn/plugin/pod_linux.go
Outdated
filter := &kruntimeapi.PodSandboxFilter{ | ||
LabelSelector: map[string]string{ktypes.KubernetesPodUIDLabel: string(pod.UID)}, | ||
} | ||
podSandboxList, err := runtimeService.ListPodSandbox(filter) | ||
podSandboxID, err := m.getPodSandboxID(filter, req) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the other PR, I think the sandbox ID detection stuff for update() should go into node.go or update.go. Then we just pass the actual sandbox ID into the PodRequest which gets set up in node.go::Update(), and we'll get it already here.
cfdf928
to
7afe292
Compare
@dcbw @danwinship Updated, ptal |
pkg/sdn/plugin/ovscontroller.go
Outdated
return err | ||
} | ||
if _, ip, _, _, err := getPodDetailsBySandboxID(flows, sandboxID); err == nil { | ||
podIP = ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to return an error here if this fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may get an error here if the corresponding ovs flow rule doesn't exists which means tearDown already happened. Don't we want to ignore in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, in that case you need to return; right now it's just continuing on with podIP=""
8f88ae0
to
4c82dd3
Compare
4c82dd3
to
d9a1d53
Compare
re[test] |
a87117b
to
084e376
Compare
Evaluated for origin test up to 084e376 |
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2975/) (Base Commit: 22888b2) (PR Branch Commit: 084e376) |
Was this slated for 3.6? |
On Tue, Jul 11, 2017 at 5:31 PM, Clayton Coleman ***@***.***> wrote:
Was this slated for 3.6?
No, targeted for 3.6.1
… —
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#14893 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABM0hqV7pag4e3mCPZga9zjw83lmmQPyks5sNBPlgaJpZM4OF-MH>
.
|
084e376
to
ce9aa5a
Compare
Can you make another pull against the 3.6 branch (and an issue) and free this up to merge against master? |
/unassign |
/approve though you'll want to rebase when #14894 lands so that we don't need yet another approval from clayton/liggit/etc |
/approve |
/retest Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
I haven't seen the unit tests pass yet. Have they actually passed and I just missed it? |
|
We have seen bugs where podIP is not valid/empty for teardown operation, in this case ovs flows are left out. This change will fix this case by picking the podIP from ovs flows (using 'note' field that was derived from pod sandbox ID).
ce9aa5a
to
a4130be
Compare
@danwinship @dcbw fixed ovscontroller_test.go, ptal |
/retest |
/retest All the errors are flakes attempting to either get the RPM packages or to pull the openshift docker images. |
changes /lgtm |
/retest |
@dcbw FYI looks like that didn't work. I guess the command needs to be at the start of the line? |
/test all [submit-queue is verifying that this PR is safe to merge] |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, dcbw, deads2k, pravisankar The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test extended_conformance_gce |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue |
We have seen bugs where podIP is not valid/empty for teardown operation,
in this case ovs flows are left out. This change will fix this case
by picking the podIP from ovs flows (using 'note' field that was derived from
pod sandbox ID)