change the name of host-addresses annotation to host-cidrs #3915
Conversation
Force-pushed 778261e to 3405987 (Compare)
Some extra changes needed:
go-controller/pkg/node/node_ip_handler_linux.go:184
go-controller/pkg/ovn/controller/services/services_controller.go:473
go-controller/pkg/ovn/controller/services/services_controller.go:486
go-controller/pkg/ovn/controller/services/node_tracker.go:254
go-controller/pkg/ovn/egressservices_test.go:1465
go-controller/pkg/ovn/ovn.go:359
Do you think we need to rename hostAddress below (go-controller/pkg/ovn/controller/services/services_controller.go)? I am on the fence - up to you:
type nodeInfo struct {
// the node's Name
name string
// The list of physical IPs reported by the gatewayconf annotation
l3gatewayAddresses []net.IP
// The list of physical IPs and subnet masks the node has, as reported by the host-cidrs annotation
hostAddresses []net.IP
go-controller/pkg/ovn/controller/services/services_controller.go:473
go-controller/pkg/ovn/controller/services/services_controller.go:486
go-controller/pkg/ovn/controller/services/node_tracker.go:254
go-controller/pkg/ovn/egressservices_test.go:1465
go-controller/pkg/ovn/ovn.go:359
go-controller/pkg/util/node_annotations.go:719
go-controller/pkg/util/node_annotations.go:726
go-controller/pkg/util/node_annotations.go:735
go-controller/pkg/util/node_annotations.go:742
go-controller/pkg/util/node_annotations.go:775
go-controller/pkg/clustermanager/egressip_controller.go:1395
// does not contain EgressIPs that are assigned to interfaces.
for _, node := range nodes {
-	nodeHostAddressesSet, err := util.ParseNodeHostAddressesDropNetMask(node)
+	nodeHostAddressesSet, err := util.ParseNodeHostCIDRSDropNetMask(node)
nit, but shouldn't the s in CIDRS be lowercase here and in the rest of the vars? WDYT?
nodeHostAddressesSet, err := util.ParseNodeHostCIDRsDropNetMask(node)
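For context on what a "DropNetMask" helper does, here is a minimal sketch. The name `dropNetMask` and the plain-slice signature are illustrative only, not the actual ovn-kubernetes util function: it takes the CIDR strings stored in the host-cidrs annotation and returns just the IP portion of each entry.

```go
package main

import (
	"fmt"
	"net"
)

// dropNetMask strips the subnet mask from each CIDR string, returning
// only the IP portions (illustrative sketch, not the real util code).
func dropNetMask(cidrs []string) ([]string, error) {
	ips := make([]string, 0, len(cidrs))
	for _, c := range cidrs {
		// net.ParseCIDR returns the IP exactly as written before the slash.
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("failed to parse %q: %v", c, err)
		}
		ips = append(ips, ip.String())
	}
	return ips, nil
}

func main() {
	ips, err := dropNetMask([]string{"192.168.1.10/24", "fd00::5/64"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // prints [192.168.1.10 fd00::5]
}
```

This is why callers such as the EgressIP controller can keep comparing bare IPs even though the annotation now stores CIDRs.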
I like that. I did it.
@@ -307,7 +307,7 @@ func (c *addressManager) doesNodeHostAddressesMatch() bool {
	}
	// check to see if ips on the node differ from what we stored
	// in host-address annotation
change this to host-cidrs
changed it to doNodeHostCIDRsMatch
Force-pushed 70d8c71 to f663355 (Compare)
/retest
Force-pushed 0f09f68 to d3fbd46 (Compare)
Thanks Jacob - some extra changes:
Force-pushed d3fbd46 to c451a4b (Compare)
CP failures are unrelated to this PR and look like an upstream issue - retesting:
Double check the places you changed hostAddrs to hostCIDRs. Not all of them should be named CIDRs, as some are truly just IP addresses without a mask.
I took the liberty of addressing the comments and pushing a new revision, since @JacobTanenbaum is on PTO and this is urgent.
@qinqon fyi this flake keeps happening on several PRs: [Fail] Kubevirt Virtual Machines when live migrated [It] with pre-copy should keep connectivity https://github.com/ovn-org/ovn-kubernetes/actions/runs/6304894472/job/17118087308?pr=3915
commit: 8847618 changed the host-addresses annotation to include a subnet mask. This causes updates to experience a network disruption, as masters and nodes expect different data to be in the host-addresses annotation. Changing the name of the host-addresses annotation to host-cidrs will eliminate the network disruption, because nodes that are updating will not remove the host-addresses annotation, and once upgraded, masters will do the right thing with the host-cidrs annotation. Co-authored-by: Tim Rozet <trozet@redhat.com> Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com>
We are tracking this here: #3902. I am trying to reproduce it, but more or less I think the kubevirt pod that handles the webhook is in a bad state.
/lgtm let's bring this downstream and ensure 12mins becomes 0mins for stable upgrades :D
some questions inline just to clarify some parts.
thank you @JacobTanenbaum @trozet
// does not contain EgressIPs that are assigned to interfaces.
for _, node := range nodes {
-	nodeHostAddressesSet, err := util.ParseNodeHostAddressesDropNetMask(node)
+	nodeHostAddrsSet, err := util.ParseNodeHostCIDRsDropNetMask(node)
So when we upgrade from an older version, since the nodes roll out first, this new annotation will already be set by the time the masters roll out, and thus there won't be any lag/delay in processing ovnManagedEIPs existing in the cluster, right?
Node IP handler will update/sync this annotation under 2 conditions:
- it gets triggered by an address on the node being created/altered - we subscribe to events
- the 30s resync period
so the max delay can be ~30s
It does do a sync in-line when generating a new addr manager, so we are good I think:

func newAddressManagerInternal(nodeName string, k kube.Interface, config *managementPortConfig, watchFactory factory.NodeWatchFactory, useNetlink bool) *addressManager {
	mgr := &addressManager{
		nodeName:       nodeName,
		watchFactory:   watchFactory,
		addresses:      sets.NewString(),
		mgmtPortConfig: config,
		OnChanged:      func() {},
		useNetlink:     useNetlink,
	}
	mgr.nodeAnnotator = kube.NewNodeAnnotator(k, nodeName)
	mgr.sync() // <--- HERE
@@ -220,7 +220,7 @@ func (c *addressManager) handleNodePrimaryAddrChange() {
	}

// updateNodeAddressAnnotations updates all relevant annotations for the node including
// k8s.ovn.org/host-addresses, k8s.ovn.org/node-primary-ifaddr, k8s.ovn.org/l3-gateway-config.
Will the host-addresses annotation live on forever, or will it naturally vanish during upgrades?
It should live on forever until we clean it up in the future, when there's no longer an upgrade path from before IC.
@@ -560,16 +560,16 @@ func (oc *DefaultNetworkController) addExternalSwitch(prefix, interfaceID, nodeN
	return nil
}

-func (oc *DefaultNetworkController) addPolicyBasedRoutes(nodeName, mgmtPortIP string, hostIfAddr *net.IPNet, otherHostAddrs []string) error {
+func (oc *DefaultNetworkController) addPolicyBasedRoutes(nodeName, mgmtPortIP string, hostIfCIDR *net.IPNet, otherHostAddrs []string) error {
NOTE: This path of the code is not tested at all, which is why this bug was not caught when the match of the policy changed from hostIP to hostIP/16. No code should be able to modify core routing policies on ovn_cluster_router and get away with it. Tests should complain.
matchstr2 := fmt.Sprintf(`inport == "rtos-%s" && ip4.dst == nodePhysicalIP /* %s */`, nodeName, nodeName)
matchstr3 := fmt.Sprintf("ip4.src == source && ip4.dst == nodePhysicalIP")
matchstr6 := fmt.Sprintf("ip4.src == NO DELETE && ip4.dst == nodePhysicalIP /* %s-no */", nodeName)
All current unit tests are playing with dummy match expressions, trying to leverage the comment on the actual policy:
We need unit test coverage in OVN DB libovsdb test framework in gateway_test.go and gateway_init.go for the match to prevent this from happening in the future. I will not block this PR for that but I will open an issue for this.
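To make the concern above concrete, here is a sketch of how such a match string gets built and how the hostIP vs hostIP/16 bug slips in. The `rerouteMatch` helper is hypothetical (the real construction lives in the gateway code); it just shows that passing a CIDR where a bare IP is expected silently changes what the policy matches:

```go
package main

import "fmt"

// rerouteMatch builds a logical router policy match for traffic destined
// to a node's physical IP (hypothetical helper for illustration).
func rerouteMatch(nodeName, nodePhysicalIP string) string {
	return fmt.Sprintf(`inport == "rtos-%s" && ip4.dst == %s /* %s */`, nodeName, nodePhysicalIP, nodeName)
}

func main() {
	// Correct: the match targets exactly one host IP.
	fmt.Println(rerouteMatch("node1", "172.16.0.5"))
	// Buggy: a mask leaked in, so the match now covers a whole /16.
	fmt.Println(rerouteMatch("node1", "172.16.0.5/16"))
}
```

A libovsdb-based test that asserts on the full match string (not just the policy comment) would have caught the difference between the two outputs above.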
Hmm, not true, we do have some tests here: addNodeLogicalFlows.
However, for non-IC setups these should have clearly failed - we got lucky with upgrades for IC only because the DBs are wiped, so non-IC -> non-IC should have been stuck forever, but it didn't; unsure why.
@@ -25,7 +25,7 @@ type checkNodeAnnot func(v annotationChange, nodeName string) error

// commonNodeAnnotationChecks holds annotations allowed for ovnkube-node:<nodeName> users in non-IC and IC environments
var commonNodeAnnotationChecks = map[string]checkNodeAnnot{
-	util.OvnNodeHostAddresses: nil,
+	util.OVNNodeHostCIDRs: nil,
I don't know the order of upgrades for the new node-identity component. My question is: is there a chance that this new component shows up alongside 4.13 ovnkube-nodes? Or will the new component show up only after all ovnkube-nodes are on 4.14? Basically, do we need to handle the presence of an old ovnkube-node that could potentially update the old annotation and fail due to webhooks not knowing about that annotation?
Fix included in accepted release 4.15.0-0.nightly-2023-09-27-073353
commit: 8847618bbf2feeb42c30375a6de6f91afe011d91 changed the host-addresses annotation to include a subnet mask. This causes updates to experience a network disruption, as masters and nodes expect different data to be in the host-addresses annotation.
Changing the name of the host-addresses annotation to host-cidrs will eliminate the network disruption, because nodes that are updating will not remove the host-addresses annotation, and once upgraded, masters will do the right thing with the host-cidrs annotation.
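The upgrade-safety argument can be sketched with the two annotation keys side by side. The JSON values shown are an assumed illustration of the annotation contents, not copied from the real code; only the key names come from this PR:

```go
package main

import "fmt"

// Annotation keys as discussed in this PR.
const (
	oldHostAddressesAnnot = "k8s.ovn.org/host-addresses"
	newHostCIDRsAnnot     = "k8s.ovn.org/host-cidrs"
)

func main() {
	// During a rolling upgrade, an upgraded node publishes the new
	// annotation without deleting the old one. A not-yet-upgraded master
	// keeps reading the mask-free host-addresses value, while an upgraded
	// master reads host-cidrs, so neither sees unexpected data.
	nodeAnnotations := map[string]string{
		oldHostAddressesAnnot: `["192.168.1.10"]`,    // old format: bare IPs
		newHostCIDRsAnnot:     `["192.168.1.10/24"]`, // new format: CIDRs
	}
	fmt.Println("old master reads:", nodeAnnotations[oldHostAddressesAnnot])
	fmt.Println("new master reads:", nodeAnnotations[newHostCIDRsAnnot])
}
```

Keeping both keys populated is what avoids the disruption: no component ever finds its expected annotation missing or in an unexpected format mid-upgrade.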