
Make sdn pod teardown robust #14893

Merged


pravisankar

@pravisankar pravisankar commented Jun 26, 2017

We have seen bugs where podIP is invalid or empty for the teardown operation,
and in that case the OVS flows are left behind. This change fixes that case
by picking the podIP from the OVS flows (using the 'note' field, which is derived from
the pod sandbox ID).

@pravisankar
Author

@openshift/networking @dcbw PTAL

@pravisankar
Author

[test]

@@ -215,7 +215,7 @@ func TestOVSService(t *testing.T) {
}

const (
containerID string = "bcb5d8d287fcf97458c48ad643b101079e3bc265a94e097e7407440716112f69"
sandboxID string = "bcb5d8d287fcf97458c48ad643b101079e3bc265a94e097e7407440716112f69"
Contributor


belongs in the previous PR

func getPodDetailsByContainerID(flows []string, containerID string) (int, string, string, string, error) {
note, err := getPodNote(containerID)
func getPodDetailsBySandboxID(ovsif ovs.Interface, sandboxID string) (int, string, string, string, error) {
flows, err := ovsif.DumpFlows()
Contributor


Maybe we should rebase the real pkg/util/ovs implementation on top of the fake one, so that, e.g., when you do an AddFlow(), it calls ovs-ofctl and also updates its own internal array of flows? Then we could implement code like this without having to call dump-flows all the time.

Author


Currently DumpFlows is used during startup, during pod update, and in the teardown case where podIP is empty. We can't avoid it at startup, and pod updates are probably not frequent. Maybe we can introduce this idea once dumpFlows() is called frequently.

ktypes.KubernetesPodNameLabel: req.PodName,
},
}
if id, err := m.getPodSandboxID(filter, req); err == nil {
Contributor


req.containerId should already be the pod sandbox ID... I'm pretty sure we don't need to re-request it, since at Teardown time the runtime is calling us directly with the sandboxID.

Author


Okay, I will update.

filter := &kruntimeapi.PodSandboxFilter{
LabelSelector: map[string]string{ktypes.KubernetesPodUIDLabel: string(pod.UID)},
}
podSandboxList, err := runtimeService.ListPodSandbox(filter)
podSandboxID, err := m.getPodSandboxID(filter, req)
Contributor


As I mentioned in the other PR, I think the sandbox ID detection stuff for update() should go into node.go or update.go. Then we just pass the actual sandbox ID into the PodRequest which gets set up in node.go::Update(), and we'll get it already here.
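A minimal sketch of the suggested split, assuming `Update()` in node.go lists the pod's sandboxes (filtered by the pod UID label, as in the code above) and stores the chosen ID on the request. The `sandbox` and `podRequest` types and the newest-ready selection policy are assumptions for illustration, not the CRI types or the merged code.

```go
package main

import (
	"fmt"
	"sort"
)

// sandbox is a pared-down stand-in for a CRI PodSandbox entry (assumed fields).
type sandbox struct {
	ID        string
	Ready     bool
	CreatedAt int64
}

// podRequest is a hypothetical request carrying the sandbox ID that Update()
// resolved, so the pod manager does not need to look it up again.
type podRequest struct {
	Namespace, Name, SandboxID string
}

// newestReadySandbox picks the most recently created ready sandbox.
func newestReadySandbox(list []sandbox) (string, error) {
	ready := make([]sandbox, 0, len(list))
	for _, s := range list {
		if s.Ready {
			ready = append(ready, s)
		}
	}
	if len(ready) == 0 {
		return "", fmt.Errorf("no ready sandbox")
	}
	sort.Slice(ready, func(i, j int) bool { return ready[i].CreatedAt > ready[j].CreatedAt })
	return ready[0].ID, nil
}

// buildUpdateRequest resolves the sandbox once, in the Update() path.
func buildUpdateRequest(namespace, name string, sandboxes []sandbox) (*podRequest, error) {
	id, err := newestReadySandbox(sandboxes)
	if err != nil {
		return nil, fmt.Errorf("pod %s/%s: %v", namespace, name, err)
	}
	return &podRequest{Namespace: namespace, Name: name, SandboxID: id}, nil
}
```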

@pravisankar
Author

@dcbw @danwinship Updated, ptal

return err
}
if _, ip, _, _, err := getPodDetailsBySandboxID(flows, sandboxID); err == nil {
podIP = ip
Contributor


need to return an error here if this fails

Author


We may get an error here if the corresponding OVS flow rule doesn't exist, which means teardown has already happened. Don't we want to ignore the error in this case?

Contributor


ok, in that case you need to return; right now it's just continuing on with podIP=""

@pravisankar pravisankar force-pushed the make-teardown-robust branch 2 times, most recently from 8f88ae0 to 4c82dd3 June 28, 2017 22:22
@pravisankar
Author

re[test]

@pravisankar pravisankar force-pushed the make-teardown-robust branch 2 times, most recently from a87117b to 084e376 July 5, 2017 17:36
@openshift-bot
Contributor

Evaluated for origin test up to 084e376

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2975/) (Base Commit: 22888b2) (PR Branch Commit: 084e376)

@smarterclayton
Contributor

Was this slated for 3.6?

@pravisankar
Author

pravisankar commented Jul 12, 2017 via email

@openshift-merge-robot openshift-merge-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 24, 2017
@openshift-merge-robot openshift-merge-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 26, 2017
@deads2k
Contributor

deads2k commented Jul 26, 2017

No, targeted for 3.6.1

Can you make another pull against the 3.6 branch (and an issue) and free this up to merge against master?

@deads2k
Contributor

deads2k commented Jul 26, 2017

/unassign

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2017
@openshift-merge-robot openshift-merge-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 28, 2017
@dcbw
Contributor

dcbw commented Aug 4, 2017

/approve
/lgtm

though you'll want to rebase when #14894 lands so that we don't need yet another approval from clayton/liggit/etc

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 4, 2017
@deads2k
Contributor

deads2k commented Aug 7, 2017

/approve

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2017
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@smarterclayton
Contributor

I haven't seen the unit tests pass yet. Have they actually passed and I just missed it?

@smarterclayton smarterclayton removed the lgtm Indicates that a PR is ready to be merged. label Aug 8, 2017
@danwinship
Contributor

[WARNING] `go test` had the following output to stderr:
# github.com/openshift/origin/pkg/sdn/plugin
pkg/sdn/plugin/ovscontroller_test.go:303: not enough arguments in call to oc.TearDownPod

@openshift-merge-robot openshift-merge-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 8, 2017
@pravisankar
Author

@danwinship @dcbw fixed ovscontroller_test.go, ptal

@pravisankar
Author

/retest

@dcbw
Contributor

dcbw commented Aug 9, 2017

/retest

All the errors are flakes attempting to either get the RPM packages or to pull the openshift docker images.

@dcbw
Contributor

dcbw commented Aug 9, 2017

changes /lgtm

@pravisankar
Author

/retest

@danwinship
Contributor

changes /lgtm

@dcbw FYI looks like that didn't work. I guess the command needs to be at the start of the line?
/lgtm
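The outcome here (`changes /lgtm` was ignored, while a bare `/lgtm` line applied the label) is consistent with commands being matched only at the start of a line. A minimal sketch of that kind of line-anchored matching; the pattern and `hasLGTM` are assumptions, not the bot's actual source.

```go
package main

import "regexp"

// lgtmRe matches "/lgtm" only when it occupies a whole line (trailing spaces
// or tabs allowed). (?m) makes ^ and $ match at line boundaries.
var lgtmRe = regexp.MustCompile(`(?m)^/lgtm[ \t]*$`)

// hasLGTM reports whether a comment contains a line-anchored /lgtm command.
func hasLGTM(comment string) bool {
	return lgtmRe.MatchString(comment)
}
```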

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 18, 2017
@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, dcbw, deads2k, pravisankar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@pravisankar
Author

/test extended_conformance_gce

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue

@openshift-merge-robot openshift-merge-robot merged commit 0dfa88a into openshift:master Aug 18, 2017
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
8 participants