[release-4.10] Bug 2104454: improve performance of service sync #1173
Conversation
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> (cherry picked from commit cf3a846)
@jcaamano: This pull request references Bugzilla bug 2104454, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> (cherry picked from commit 73e491c)
Running the LB predicate that matches on name takes a long time if the LB table has many LBs. For example, looking up ~40 LBs in a table with ~200k rows took approximately 3s. The service controller has a second-level cache and knows which LBs need to be added and which need to be updated. Avoid this lookup for LBs that are to be added. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com> (cherry picked from commit 0573fe5) (cherry picked from commit c4a539b)
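The optimization above can be sketched as follows. This is a minimal illustration with hypothetical names (`LB`, `lbCache`, `splitAddUpdate` are stand-ins, not the real controller types): the second-level cache already knows which LBs exist, so only those trigger the expensive name-predicate lookup; new LBs are inserted without searching the table at all.

```go
package main

import "fmt"

// LB is a simplified stand-in for the load balancer model.
type LB struct{ Name string }

// lbCache is a hypothetical second-level cache of LB names known to be in the DB.
type lbCache map[string]bool

// splitAddUpdate partitions the desired LBs: only LBs known to exist need the
// expensive DB lookup/update path; the rest can be created directly.
func splitAddUpdate(cache lbCache, desired []LB) (toAdd, toUpdate []LB) {
	for _, lb := range desired {
		if cache[lb.Name] {
			toUpdate = append(toUpdate, lb) // needs the per-name DB lookup
		} else {
			toAdd = append(toAdd, lb) // insert without scanning the table
		}
	}
	return
}

func main() {
	cache := lbCache{"svc-a": true}
	add, upd := splitAddUpdate(cache, []LB{{"svc-a"}, {"svc-b"}})
	fmt.Println(len(add), len(upd)) // 1 1
}
```

The win is that the lookup cost scales with the number of pre-existing LBs to update, not with the size of the whole Load_Balancer table.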
/cc
@jcaamano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
func (m *ModelClient) buildOps(ops []ovsdb.Operation, doWhenFound opModelToOpMapper, doWhenNotFound opModelToOpMapper, opModels ...OperationModel) (interface{}, []ovsdb.Operation, error) {
	if ops == nil {
		ops = []ovsdb.Operation{}
	}
	notfound := []interface{}{}
	for _, opModel := range opModels {
		if opModel.ExistingResult == nil && opModel.Model != nil {
commit1: LGTM, same as 4.11; no conflicts from cf3a846
@@ -105,17 +105,6 @@ func createOrUpdateLoadBalancerOps(nbClient libovsdbclient.Client, ops []libovsd

	// If LoadBalancer does not exist, create it
	if err == libovsdbclient.ErrNotFound {
		timeout := types.OVSDBWaitTimeout
this is partially from ovn-org/ovn-kubernetes@8f17235#diff-685368ea628f10afd438349e793caa717e04ec15ae38a15ac0be406686f0ae5bL97
Moves the wait to model_client.
true. Maybe I should just pick the whole thing.
		ops = append(ops, libovsdb.Operation{
			Op:      libovsdb.OperationWait,
			Timeout: &timeout,
			Table:   "Load_Balancer",
self-note: EnsureLBs-> CreateOrUpdateLoadBalancersOps -> createOrUpdateLoadBalancerOps -> modelClient.CreateOrUpdateOps -> modelClient.createOrUpdateOps -> buildFailOnDuplicateOps -> wait ops moved there.
@@ -304,15 +304,9 @@ func AddACLToNodeSwitch(nbClient libovsdbclient.Client, nodeName string, nodeACL
		Name: nodeName,
	}

	aclName := ""
So these names (which should be unique) were added for the wait ops here: 72c9cdb#diff-8f8adc7b3c118c97013725f0f4e69dd32f54e098c4e183f52a39049f34b2ed1bR288; refactored in 6d60741#diff-8f8adc7b3c118c97013725f0f4e69dd32f54e098c4e183f52a39049f34b2ed1bL306, so 4.11 doesn't have this function at all. Your buildFailOnDuplicateOps should take care of setting the names...?
buildFailOnDuplicateOps is basically in charge of building the Wait op. A separate field with the names is no longer needed.
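The idea behind that helper can be sketched like this. Note this is an illustration with simplified stand-ins for the libovsdb types (`Operation`, `OVSDBWaitTimeout`, and the `Until`/`Where` encoding are not the real wire format): before inserting a named row, prepend a wait op asserting that no row with that name exists yet, so the transaction fails instead of creating a duplicate.

```go
package main

import "fmt"

// Simplified stand-ins for the libovsdb operation types.
const OperationWait = "wait"
const OVSDBWaitTimeout = 20 // stand-in for types.OVSDBWaitTimeout

type Operation struct {
	Op      string
	Timeout *int
	Table   string
	Until   string
	Where   []string
}

// buildFailOnDuplicateOp sketches what buildFailOnDuplicateOps contributes:
// a guard wait op on the table keyed by name, using the standard timeout.
func buildFailOnDuplicateOp(table, name string) Operation {
	timeout := OVSDBWaitTimeout
	return Operation{
		Op:      OperationWait,
		Timeout: &timeout,
		Table:   table,
		Until:   "!=",
		Where:   []string{"name == " + name},
	}
}

func main() {
	op := buildFailOnDuplicateOp("Load_Balancer", "svc-a")
	fmt.Println(op.Op, op.Table, *op.Timeout)
}
```

Because the guard is built centrally from the model's name, callers no longer need to populate a separate Name field on each OperationModel just to support wait ops.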
@@ -132,12 +132,7 @@ func (oc *Controller) syncEgressFirewallRetriable(egressFirewalls []interface{})
	for i := range egressFirewallACLs {
		egressFirewallACL := egressFirewallACLs[i]
		egressFirewallACL.Direction = types.DirectionToLPort
		aclName := ""
		if egressFirewallACL.Name != nil {
			aclName = *egressFirewallACL.Name
Forgive me for the stupid question that is unrelated to your PR, but just to understand: we never had waits for ACLs, right (72c9cdb#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863R292)? Why do we have code where the name was explicitly set in many places for ACLs, which you are removing now?
I don't know but probably in preparation to actually have waits for ACLs as well in the future.
@@ -782,7 +771,6 @@ func (oc *Controller) createPolicyBasedRoutes(match, priority, nexthops string)
		},
	},
	{
		Name: logicalRouter.Name,
self-note: partial revert of 72c9cdb#diff-9c68f154bc9189eb61d17c17d1392ee36c13a497f4f742a98250c2e0a272d48cR61; different because of 8b14898#diff-9c68f154bc9189eb61d17c17d1392ee36c13a497f4f742a98250c2e0a272d48cL68 being present in 4.11
@@ -297,7 +297,6 @@ func UpdateNodeSwitchExcludeIPs(nbClient libovsdbclient.Client, nodeName string,

	opModels := []libovsdbops.OperationModel{
		{
			Name:           logicalSwitchDes.Name,
			Model:          &logicalSwitchDes,
			ModelPredicate: func(ls *nbdb.LogicalSwitch) bool { return ls.Name == nodeName },
			OnModelMutations: []interface{}{
commit2 LGTM: slightly confused around the ACL names, but mostly this is a combo of moving to buildFailOnDuplicateOps and removing the .Name that was added in the initial waitOps commit.
self note:
model.go: exactly same from 73e491c#diff-305e20a256590719c5d0ae03a702e7070b42539c3d38478a99b89893312be20dR322
model_client.go doesn't have 73e491c#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863L467 rest is the same.
func isGuardOp(op *ovsdb.Operation) bool {
	return op != nil && op.Op == ovsdb.OperationWait && op.Timeout != nil && *op.Timeout == types.OVSDBWaitTimeout
}
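With simplified stand-ins for the ovsdb types (the real ones live in libovsdb; the constants here are illustrative), the guard-op check can be exercised like this: a guard is precisely a wait op carrying the standard OVSDB wait timeout, and everything else, including nil, is not.

```go
package main

import "fmt"

// Simplified stand-ins for the ovsdb operation types used by isGuardOp.
const OperationWait = "wait"
const OVSDBWaitTimeout = 20 // stand-in for types.OVSDBWaitTimeout

type Operation struct {
	Op      string
	Timeout *int
}

// isGuardOp mirrors the helper in the diff: a guard op is a wait op
// configured with the standard OVSDB wait timeout.
func isGuardOp(op *Operation) bool {
	return op != nil && op.Op == OperationWait && op.Timeout != nil && *op.Timeout == OVSDBWaitTimeout
}

func main() {
	timeout := OVSDBWaitTimeout
	guard := Operation{Op: OperationWait, Timeout: &timeout}
	insert := Operation{Op: "insert"}
	fmt.Println(isGuardOp(&guard), isGuardOp(&insert), isGuardOp(nil))
}
```

The nil checks on both the operation and its Timeout pointer matter: ovsdb operations other than wait leave Timeout unset, so dereferencing unconditionally would panic.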
commit 3: LGTM same as e490188#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863
NOTE: didn't review tests; I am going to open a JIRA card to run unit tests in 4.10 CI
@@ -188,30 +193,32 @@ func (m *ModelClient) CreateOrUpdateOps(opModels ...OperationModel) (interface{}
If BulkOp is set, delete or mutate can happen across multiple models found.
*/
func (m *ModelClient) Delete(opModels ...OperationModel) error {
-	ops, err := m.DeleteOps(opModels...)
+	ops, err := m.DeleteOps(nil, opModels...)
awesome! :) this is exactly what I need for my cherry-pick :)
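The signature change above lets callers thread an existing ops slice through several builders and commit everything in one transaction. A hypothetical sketch of the pattern (stand-in `Operation` type and `deleteOps` helper, not the real libovsdbops API):

```go
package main

import "fmt"

// Operation is a simplified stand-in for ovsdb.Operation.
type Operation struct{ Op string }

// deleteOps appends delete operations for the given names to ops, mirroring
// the new DeleteOps(ops, opModels...) shape: passing nil starts a fresh list,
// passing an existing slice chains operations into one transaction.
func deleteOps(ops []Operation, names ...string) []Operation {
	if ops == nil {
		ops = []Operation{}
	}
	for range names {
		ops = append(ops, Operation{Op: "delete"})
	}
	return ops
}

func main() {
	ops := deleteOps(nil, "lb-1")        // fresh list
	ops = deleteOps(ops, "lb-2", "lb-3") // chained into the same transaction
	fmt.Println(len(ops))
}
```

Batching related deletes into a single transact call avoids one OVSDB round trip per builder, which is why a cherry-pick touching multiple objects wants this shape.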
@@ -122,6 +122,10 @@ func (m *ModelClient) WithClient(client client.Client) *ModelClient {
	return &cl
}

func onModelUpdatesNone() []interface{} {
COMMIT4: LGTM, except this bit which came from cleanup; exactly the same as c4a539b
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jcaamano, trozet, tssurya. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/label backport-risk-assessed
/assign @anuragthehatter
@anuragthehatter PTAL, this blocks a cherry-pick I need for https://bugzilla.redhat.com/show_bug.cgi?id=2105657 (CU bug)
/label cherry-pick-approved
@jcaamano: All pull requests linked via external trackers have merged: Bugzilla bug 2104454 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a backport of #1110, containing:
Conflicts: