Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdn: fix initialization order to prevent crash on node startup #13766

Merged
merged 1 commit into from Apr 26, 2017

Conversation

dcbw
Copy link
Member

@dcbw dcbw commented Apr 13, 2017

OsdnNode.Start()
(node.pm == nil at this point)
-> node.policy.Start() (which is multitenant policy)
-> mp.vnids.Start()
-> go vmap.watchNetNamespaces()
-> (net namespace event happens)
-> watchNetNamespaces()
-> vmap.policy.AddNetNamespace() (policy is multitenant)
-> mp.updatePodNetwork()
-> mp.node.podManager.UpdateLocalMulticastRules() (and podManager is still nil)

Create the PodManager earlier so it's not nil if we get early events.

Fixes: #13742

@openshift/networking

@smarterclayton
Copy link
Contributor

I assume this will be backported to 1.5?

@dcbw
Copy link
Member Author

dcbw commented Apr 13, 2017

@smarterclayton yeah, it should be

@dcbw dcbw force-pushed the sdn-multicast-init-order branch 2 times, most recently from 57a7167 to fc95b86 Compare April 13, 2017 20:48
// podManager must be created early because other goroutines
// may call into it before its started due to event watches
log.V(5).Infof("Creating openshift-sdn pod manager")
node.podManager = newPodManager(node.kClient, node.policy, node.mtu, node.oc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it's" (in the comment)

should this be in NewNodePlugin() rather than Start()?

also, maybe move node.localSubnetCIDR, err = node.getLocalSubnet() to before the node_iptables calls, and then add a comment between the two indicating that that's the point after which all of node's fields have been initialized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danwinship all great points. Done.

return err
}
if err := node.podManager.Start(cniserver.CNIServerSocketPath); err != nil {
// Kubelet has initialized, now we have a valid node.host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, ok, so then the comment above about "all OsdnNode fields have been initialized" is a lie then. Fix that in some way. Then LGTM.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danwinship PTAL, I think we can just move the kubelet init block above the "everything is initilaized" comment; no reason it has to be that far down I don't think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm... I think you're right, but given the fun we've had with startup ordering in the past (and that we're continuing to have now) I'd feel a little bit sketchy merging that to release-1.5 without it getting any testing in master first. How about we make that change for master but for release-1.5 leave it where it is and just clarify in the comment that everything except .host and .kubeletCniPlugin are initialized by that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't happening often on 1.5 i'm ok waiting.

@smarterclayton
Copy link
Contributor

[test]

OsdnNode.Start()
   (node.pm == nil at this point)
   -> node.policy.Start()  (which is multitenant policy)
   -> mp.vnids.Start()
   -> go vmap.watchNetNamespaces()
   -> (net namespace event happens)
   -> watchNetNamespaces()
   -> vmap.policy.AddNetNamespace() (policy is multitenant)
   -> mp.updatePodNetwork()
   -> mp.node.podManager.UpdateLocalMulticastRules() (and podManager is still nil)

Create the PodManager earlier so it's not nil if we get early events.

Fixes: openshift#13742
@dcbw
Copy link
Member Author

dcbw commented Apr 19, 2017

test failure was previous run...

[test]

@dcbw
Copy link
Member Author

dcbw commented Apr 19, 2017

re-[test] issue #13650

@dcbw
Copy link
Member Author

dcbw commented Apr 20, 2017

re-[test] issue #13827

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 20, 2017 via email

@dcbw
Copy link
Member Author

dcbw commented Apr 20, 2017

re-[test] issue #13831

@dcbw
Copy link
Member Author

dcbw commented Apr 20, 2017

Also flaked on #13814 and #9888

@dcbw
Copy link
Member Author

dcbw commented Apr 20, 2017

re-[test] issue #13271 issue #13814 issue #13812

@danwinship
Copy link
Contributor

[merge] once the tests stop flaking...

@dcbw
Copy link
Member Author

dcbw commented Apr 20, 2017

re-[test] issue #13812 issue #13846

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 23, 2017 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 24, 2017 via email

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 94fb0f4

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/465/) (Base Commit: 184b859)

@knobunc
Copy link
Contributor

knobunc commented Apr 25, 2017

@knobunc
Copy link
Contributor

knobunc commented Apr 26, 2017

[merge]

@eparis
Copy link
Member

eparis commented Apr 26, 2017

[test]

@eparis
Copy link
Member

eparis commented Apr 26, 2017

(if it comes back green lets merge by hand, since we already merged in older releases)

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 94fb0f4

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/956/) (Base Commit: c2e95f5)

@eparis eparis merged commit 6bb336b into openshift:master Apr 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants