Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.10] Bug 2104454: improve performance of service sync #1173

Merged
merged 4 commits into from Aug 1, 2022

Conversation

jcaamano
Copy link
Contributor

@jcaamano jcaamano commented Jul 6, 2022

This is a backport of #1110, containing:

  • CARRY: cf3a846 libovsdb: passthrough ops in ModelClient
  • CARRY: 73e491c libovsdb: refactor duplicate detection in modelClient
  • 1aaad50 libovsdbops: only one wait per txn
  • 0573fe5 Don't lookup LBs that don't exist in cache

Conflicts:

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit cf3a846)
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jul 6, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 6, 2022

@jcaamano: This pull request references Bugzilla bug 2104454, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.z) matches configured target release for branch (4.10.z)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 2070674 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 2070674 targets the "4.11.0" release, which is one of the valid target releases: 4.11.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

In response to this:

[release-4.10] Bug 2104454: improve performance of service sync

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2022
Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 73e491c)
Transactions is an all or nothing thing so we just need one wait per
transaction to check against duplicates.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 1aaad50)
(cherry picked from commit e490188)
Running the LB predicate that matches on name takes a long time if the
LB table has many LBs. For example, looking up ~40 LBs in a table with
~200k rows took aproximately 3s.

The service controller has a second level cache and knows which LBs need
to be added and which need to be updated. Avoid this lookup for LBs that
are to be added.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
(cherry picked from commit 0573fe5)
(cherry picked from commit c4a539b)
@tssurya
Copy link
Contributor

tssurya commented Jul 7, 2022

/cc

@openshift-ci openshift-ci bot requested a review from tssurya July 7, 2022 16:11
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 7, 2022

@jcaamano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack-ovn 3369d38 link false /test e2e-openstack-ovn
ci/prow/okd-e2e-gcp-ovn 3369d38 link false /test okd-e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

func (m *ModelClient) buildOps(ops []ovsdb.Operation, doWhenFound opModelToOpMapper, doWhenNotFound opModelToOpMapper, opModels ...OperationModel) (interface{}, []ovsdb.Operation, error) {
if ops == nil {
ops = []ovsdb.Operation{}
}
notfound := []interface{}{}
for _, opModel := range opModels {
if opModel.ExistingResult == nil && opModel.Model != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit1: LGTM, same as 4.11 no conflicts from cf3a846

@@ -105,17 +105,6 @@ func createOrUpdateLoadBalancerOps(nbClient libovsdbclient.Client, ops []libovsd

// If LoadBalancer does not exist, create it
if err == libovsdbclient.ErrNotFound {
timeout := types.OVSDBWaitTimeout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. Maybe I should just pick the whole thing.

ops = append(ops, libovsdb.Operation{
Op: libovsdb.OperationWait,
Timeout: &timeout,
Table: "Load_Balancer",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-note: EnsureLBs-> CreateOrUpdateLoadBalancersOps -> createOrUpdateLoadBalancerOps -> modelClient.CreateOrUpdateOps -> modelClient.createOrUpdateOps -> buildFailOnDuplicateOps -> wait ops moved there.

@@ -304,15 +304,9 @@ func AddACLToNodeSwitch(nbClient libovsdbclient.Client, nodeName string, nodeACL
Name: nodeName,
}

aclName := ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these names should be unique were added for the wait ops here: 72c9cdb#diff-8f8adc7b3c118c97013725f0f4e69dd32f54e098c4e183f52a39049f34b2ed1bR288; refactored 6d60741#diff-8f8adc7b3c118c97013725f0f4e69dd32f54e098c4e183f52a39049f34b2ed1bL306 so 4.11 doesn't have this function at all. Your buildFailOnDuplicateOps should take care of setting the names...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildFailOnDuplicateOps is basically in charge of building the Wait op. A spearate field with the names is no longer needed.

@@ -132,12 +132,7 @@ func (oc *Controller) syncEgressFirewallRetriable(egressFirewalls []interface{})
for i := range egressFirewallACLs {
egressFirewallACL := egressFirewallACLs[i]
egressFirewallACL.Direction = types.DirectionToLPort
aclName := ""
if egressFirewallACL.Name != nil {
aclName = *egressFirewallACL.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgive me for the stupid question that is unrelated to your PR, but just to understand, we never had waits for ACLs rights (72c9cdb#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863R292)? why do we have code where the name was explicitly set in many places for ACLs and you are removing them now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know but probably in preparation to actually have waits for ACLs as well in the future.

@@ -782,7 +771,6 @@ func (oc *Controller) createPolicyBasedRoutes(match, priority, nexthops string)
},
},
{
Name: logicalRouter.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -297,7 +297,6 @@ func UpdateNodeSwitchExcludeIPs(nbClient libovsdbclient.Client, nodeName string,

opModels := []libovsdbops.OperationModel{
{
Name: logicalSwitchDes.Name,
Model: &logicalSwitchDes,
ModelPredicate: func(ls *nbdb.LogicalSwitch) bool { return ls.Name == nodeName },
OnModelMutations: []interface{}{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit2 LGTM: slightly confused around the ACL names, but mostly this is a combo of moving to buildFailOnDuplicateOps and removing the .Name that was added in the initial waitOps commit.

self note:
model.go: exactly same from 73e491c#diff-305e20a256590719c5d0ae03a702e7070b42539c3d38478a99b89893312be20dR322
model_client.go doesn't have 73e491c#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863L467 rest is the same.


func isGuardOp(op *ovsdb.Operation) bool {
return op != nil && op.Op == ovsdb.OperationWait && op.Timeout != nil && *op.Timeout == types.OVSDBWaitTimeout
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit 3: LGTM same as e490188#diff-54373a7069b8883bb780babfb45422a0c8af7211107571b700db55cf17feb863
NOTE: didn't review tests, I am going to open a JIRA card to run unit tests in 4.10 CI

@@ -188,30 +193,32 @@ func (m *ModelClient) CreateOrUpdateOps(opModels ...OperationModel) (interface{}
If BulkOp is set, delete or mutate can happen accross multiple models found.
*/
func (m *ModelClient) Delete(opModels ...OperationModel) error {
ops, err := m.DeleteOps(opModels...)
ops, err := m.DeleteOps(nil, opModels...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! :) this is exactly what I need for my cherry-pick :)

@@ -122,6 +122,10 @@ func (m *ModelClient) WithClient(client client.Client) *ModelClient {
return &cl
}

func onModelUpdatesNone() []interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMMIT4: LGTM
except this bit which came from cleanup, exactly same as c4a539b

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 22, 2022
Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, trozet, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@trozet
Copy link
Contributor

trozet commented Jul 25, 2022

/label backport-risk-assessed

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Jul 25, 2022
@trozet
Copy link
Contributor

trozet commented Jul 25, 2022

/assign @anuragthehatter

@tssurya
Copy link
Contributor

tssurya commented Jul 29, 2022

@anuragthehatter PTAL, this blocks a cherry-pick I need for https://bugzilla.redhat.com/show_bug.cgi?id=2105657 (CU bug)

@anuragthehatter
Copy link

/label cherry-pick-approved

@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 1, 2022
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 2 against base HEAD 358801d and 8 for PR HEAD 3369d38 in total

@openshift-merge-robot openshift-merge-robot merged commit 464b78d into openshift:release-4.10 Aug 1, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 1, 2022

@jcaamano: All pull requests linked via external trackers have merged:

Bugzilla bug 2104454 has been moved to the MODIFIED state.

In response to this:

[release-4.10] Bug 2104454: improve performance of service sync

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants