Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More SDN code reorg #11137

Merged
merged 3 commits into from Oct 9, 2016
Merged

Conversation

danwinship
Copy link
Contributor

  1. Split EgressNetworkPolicy monitoring into its own file
  2. Split "VNID tracking" from "multitenant policy" (since the networkpolicy plugin will want the former but not the latter), so now vnids_node.go is just NetNamespace monitoring
  3. Simplify SDN setup, which still had vestiges of the old "FlowController" split in it, so now subnets.go is just HostSubnet monitoring/maintaining.

@openshift/networking PTAL

var err error
var subnet *osapi.HostSubnet
// Try every retryInterval and bail-out if it exceeds max retries
for i := 0; i < retries; i++ {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use existing wait.ExponentialBackoff() here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, clayton pushed back on that for the CNI stuff too. See https://github.com/openshift/origin/pull/9981/files#diff-6357e2d44bec4f49542401c788e87f51R426 for an example.

return err
}

err = node.SubnetStartNode()
if err != nil {
return err
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/danwinship/origin/blob/9df71e32e6c65fb5eed4177bfa36d14b8b84890e/pkg/sdn/plugin/node.go#L125
What happens to the pods when UpadePod() fails in case of network change? Networking for these pods will be broken. May be we can retry couple of times and log error if it didn't succeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think retrying is likely to help; if we can't update the pod networking, then something is just broken. (Eg, OVS isn't running.) But I think if things are broken enough that UpdatePod() fails, then startup is going to fail for other reasons anyway

var err error
var subnet *osapi.HostSubnet
// Try every retryInterval and bail-out if it exceeds max retries
for i := 0; i < retries; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, clayton pushed back on that for the CNI stuff too. See https://github.com/openshift/origin/pull/9981/files#diff-6357e2d44bec4f49542401c788e87f51R426 for an example.

ids: make(map[string]uint32),
namespaces: make(map[uint32]sets.String),
}
return &nodeVNIDMap{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just saving memory or someting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's part of a set of changes to get rid of plugin.go:getVNID(); the fields get initialized by VnidStartNode() now, which only gets called for multitenant, so if GetVNID() sees that they haven't been initialized later, it can just return 0 for the VNID, rather than plugin.go having to make that assumption itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


func (node *OsdnNode) watchServices() {
services := make(map[string]*kapi.Service)
RunEventQueue(node.kClient, Services, func(delta cache.Delta) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably update this to use eventqueue.NewEventQueueForStore() and make 'services' a cache.Store, but that could be done after.

return nil
}

func (node *OsdnNode) updatePodNetwork(namespace string, oldNetID, netID uint32) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updatePodNetwork() is only used by watchNetNamespaces() in vnids_node.go
I was expecting MultitenantStartNode() to watch for both NetNamespaces and Services to patch vnid when needed. I didn't understand the split between vnids_node.go and multitenant.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... did I explain that in the commit message? Maybe not... The idea is that there's going to be an openshift-ovs-networkpolicy network plugin soon as well, and it will use the VNID-tracking code, but it won't react to it in the same way. So vnids_node.go = NetNamespace watching, and multitenant.go = openshift-ovs-multitenant-specific policy, and there will later be networkpolicy.go as well.

The fact that updatePodNetwork() gets called directly from vnids_node.go is wrong, yeah, and that's going to change later. (Initially I was planning to just move code around in these commits, and not refactor anything, which is why this is like that. Although then I did end up making some code changes too, so maybe I should have fixed this...)

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2016
@danwinship
Copy link
Contributor Author

OK, repushed without the vnids_node/multitenant split (I'll rework that a bit and do it with the rest of the networkpolicy branch), but with a new patch to use utilwait.ExponentialBackoff() where we should.

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2016
@dcbw
Copy link
Contributor

dcbw commented Sep 30, 2016

LGTM

@dcbw
Copy link
Contributor

dcbw commented Oct 4, 2016

@pravisankar PTAL?

@dcbw
Copy link
Contributor

dcbw commented Oct 4, 2016

[merge]

@openshift-bot
Copy link
Contributor

[Test]ing while waiting on the merge queue

@pravisankar
Copy link

LGTM

@danwinship danwinship force-pushed the more-sdn-cleanup branch 2 times, most recently from 2ea77a3 to 6cd347a Compare October 5, 2016 21:18
@dcbw
Copy link
Contributor

dcbw commented Oct 6, 2016

flake is #11240 [merge]

The SDN initialization was being called from subnets.go for historical
reasons; have node.go call it directly instead. Also, don't bother
passing data to SetupSDN() that it could get from the OsdnNode
structure itself (another historical artifact).
@danwinship
Copy link
Contributor Author

flake is #11240, [test]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 54b856a

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9761/)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 54b856a

@openshift-bot
Copy link
Contributor

openshift-bot commented Oct 9, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9787/) (Image: devenv-rhel7_5157)

@openshift-bot openshift-bot merged commit 1bcada1 into openshift:master Oct 9, 2016
@danwinship danwinship deleted the more-sdn-cleanup branch October 10, 2016 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants