[WIP] Scale fixes - only LBG #814

dceara · 2021-11-03T13:57:08Z

TODO: handle case when LBG are not supported by OVN. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Allow up to 1k pods per node. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

…builds). By default log DB transactions for LSP and Port_Binding. In ovn-controller log iface-id-ver mismatches. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

This includes ovsdb-idl changes from: dceara/ovs@0414012 Signed-off-by: Dumitru Ceara <dceara@redhat.com>

openshift-ci · 2021-11-03T14:02:11Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dceara
To complete the pull request process, please assign trozet after the PR has been reviewed.
You can assign the PR to them by writing /assign @trozet in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

If a single routeInfo had multiple external routes matching the same gateway IP, getRouteInfosForGateway() could add that routeInfo to the returned list multiple times, even though it was locked only once. deletePodGWRoutesForNamespace() iterates over the returned list and calls Unlock() on each element, so in that situation it issued multiple Unlocks for a single Lock, leaving the locking unbalanced. Signed-off-by: Dan Williams <dcbw@redhat.com>
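The locking problem described in this commit message can be illustrated with a minimal sketch. The `routeInfo` layout below (a mutex plus a map from route prefix to gateway IP) and the function signature are simplified assumptions made for illustration, not the actual ovn-kubernetes types. The point is only that each matching entry is appended once, so every `Lock()` is paired with exactly one `Unlock()` by the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// routeInfo is a hypothetical, simplified stand-in for the real structure.
type routeInfo struct {
	sync.Mutex
	podName string
	routes  map[string]string // route prefix -> gateway IP (assumed layout)
}

// getRouteInfosForGateway returns every routeInfo that has at least one
// route through gatewayIP. Each returned element is locked exactly once and
// appears exactly once, so the caller can safely Unlock() every element.
func getRouteInfosForGateway(all []*routeInfo, gatewayIP string) []*routeInfo {
	var matched []*routeInfo
	for _, ri := range all {
		ri.Lock()
		found := false
		for _, gw := range ri.routes {
			if gw == gatewayIP {
				found = true
				break // one match is enough; do not append per matching route
			}
		}
		if found {
			matched = append(matched, ri) // stays locked; the caller unlocks
		} else {
			ri.Unlock()
		}
	}
	return matched
}

func main() {
	ri := &routeInfo{podName: "pod-a", routes: map[string]string{
		"10.0.0.0/24": "192.168.1.1",
		"10.0.1.0/24": "192.168.1.1", // second route via the same gateway
	}}
	infos := getRouteInfosForGateway([]*routeInfo{ri}, "192.168.1.1")
	for _, info := range infos {
		info.Unlock() // balanced: one Unlock per Lock
	}
	fmt.Println(len(infos))
}
```

Had the function appended inside the inner loop, the same pointer would appear twice in the result and the second `Unlock()` would crash the program with an "unlock of unlocked mutex" fatal error.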

At least give a clue what's wrong instead of just: "failed: exit status 1"

When deciding if a service needs updating, we do a reflect.DeepEqual on the existing load balancers that are stored in a cache, vs the load balancers that would be built by the operation. This comparison was incorrect, because it was comparing the UUID field, which will never be equal. The end result is potentially a lot more pressure on OVN for transactions that essentially do nothing. Signed-off-by: Tim Rozet <trozet@redhat.com>
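As a rough illustration of why the cached and freshly built load balancers never compared equal, the sketch below uses a hypothetical `LB` struct (its fields are assumptions, not the real ovn-kubernetes type). The cached copy carries a database-assigned UUID, the newly built one does not, so a plain `reflect.DeepEqual` reports a difference even when nothing meaningful changed. One possible remedy is to blank the UUID on copies before comparing.

```go
package main

import (
	"fmt"
	"reflect"
)

// LB is a hypothetical, simplified load balancer record.
type LB struct {
	UUID     string // assigned by the database; absent on freshly built LBs
	Name     string
	Protocol string
	VIPs     map[string][]string // VIP -> backends
}

// lbsEqualIgnoringUUID compares two LB slices element by element, ignoring
// the UUID field. The comparison is order-sensitive, and the inputs are not
// modified because the UUID is cleared on struct copies only.
func lbsEqualIgnoringUUID(a, b []LB) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		x, y := a[i], b[i] // copies of the structs
		x.UUID, y.UUID = "", ""
		if !reflect.DeepEqual(x, y) {
			return false
		}
	}
	return true
}

func main() {
	vips := map[string][]string{"10.96.0.1:443": {"10.244.0.5:6443"}}
	cached := []LB{{UUID: "abc-123", Name: "svc_default/kubernetes", Protocol: "tcp", VIPs: vips}}
	built := []LB{{Name: "svc_default/kubernetes", Protocol: "tcp", VIPs: vips}}

	fmt.Println(reflect.DeepEqual(cached, built))    // false: UUIDs differ
	fmt.Println(lbsEqualIgnoringUUID(cached, built)) // true: same configuration
}
```

With a comparison like the second one, an unchanged service would be recognized as up to date and no transaction would need to be sent to OVN.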

There's no point in limiting the timeout to something lower than the CRI timeout; it just causes pods in large-scale clusters to hit the kubelet retry backoffs and take even longer to bring up. Continued improvements to OVN will reduce the need for longer timeouts in the near future. Signed-off-by: Dan Williams <dcbw@redhat.com>

This picks up OVN fix for setting lrp gw chassis: ovn-org/ovn@e6bcb88 Signed-off-by: Dumitru Ceara <dceara@redhat.com>

dceara · 2021-11-16T15:45:57Z

/test images

dceara · 2021-11-16T16:07:32Z

/test images

dceara · 2021-11-16T16:29:04Z

/test images

dceara · 2021-11-16T16:54:03Z

/retest

Cherry-picked from: ovn-org/ovn-kubernetes@f18feab Signed-off-by: Dan Williams <dcbw@redhat.com>

openshift-ci · 2021-11-17T12:06:04Z

@dceara: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2021-11-17T12:08:13Z

@dceara: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn | `d8b7b0b` | link | false | `/test e2e-vsphere-ovn` |
| ci/prow/e2e-aws-ovn-upgrade | `d8b7b0b` | link | true | `/test e2e-aws-ovn-upgrade` |
| ci/prow/e2e-ovn-hybrid-step-registry | `d8b7b0b` | link | false | `/test e2e-ovn-hybrid-step-registry` |
| ci/prow/okd-e2e-gcp-ovn | `d8b7b0b` | link | false | `/test okd-e2e-gcp-ovn` |
| ci/prow/e2e-azure-ovn | `d8b7b0b` | link | false | `/test e2e-azure-ovn` |
| ci/prow/e2e-gcp-ovn | `d8b7b0b` | link | true | `/test e2e-gcp-ovn` |
| ci/prow/e2e-aws-ovn-windows | `d8b7b0b` | link | true | `/test e2e-aws-ovn-windows` |
| ci/prow/e2e-aws-ovn | `d8b7b0b` | link | true | `/test e2e-aws-ovn` |
| ci/prow/e2e-aws-ovn-local-gateway | `d8b7b0b` | link | true | `/test e2e-aws-ovn-local-gateway` |
| ci/prow/e2e-vsphere-windows | `d8b7b0b` | link | false | `/test e2e-vsphere-windows` |
| ci/prow/e2e-openstack-ovn | `d8b7b0b` | link | false | `/test e2e-openstack-ovn` |
| ci/prow/e2e-metal-ipi-ovn-dualstack | `d8b7b0b` | link | true | `/test e2e-metal-ipi-ovn-dualstack` |
| ci/prow/4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade | `d8b7b0b` | link | false | `/test 4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade` |
| ci/prow/4.10-upgrade-from-stable-4.9-images | `d8b7b0b` | link | true | `/test 4.10-upgrade-from-stable-4.9-images` |
| ci/prow/okd-images | `d8b7b0b` | link | true | `/test okd-images` |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | `d8b7b0b` | link | true | `/test e2e-metal-ipi-ovn-ipv6` |
| ci/prow/images | `d8b7b0b` | link | true | `/test images` |

Full PR test history. Your PR dashboard.

dceara · 2021-11-18T11:52:02Z

Latest WIP scale testing PR: #839

dceara added 8 commits November 3, 2021 14:53

WIP: Use Load_Balancer_Groups.

6b31078

TODO: handle case when LBG are not supported by OVN. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

WIP: Handle case when OVN doesn't support LB Groups.

f2a4b01

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

ovn: bump to -21 scratch build with load balancer groups schema fix.

4caf269

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

Use Dockerfile.fedora.dev for kind.

5c259f4

Allow up to 1k pods per node. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

HACK: Always set iface-id-ver.

e549dbc

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

ovs: bump to -21 scratch build with DB transaction logging.

9e015e3

Signed-off-by: Dumitru Ceara <dceara@redhat.com>

ovs: ovn: Add more logging (bump ovs to -22, bump ovn to -24 scratch …

969b566

…builds). By default log DB transactions for LSP and Port_Binding. In ovn-controller log iface-id-ver mismatches. Signed-off-by: Dumitru Ceara <dceara@redhat.com>

ovn: bump to -24 scratch build.

b7683e7

This includes ovsdb-idl changes from: dceara/ovs@0414012 Signed-off-by: Dumitru Ceara <dceara@redhat.com>

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Nov 3, 2021

openshift-ci bot requested review from abhat and tssurya November 3, 2021 14:02

dcbw and others added 5 commits November 4, 2021 14:49

CARRY: services: print out nbctl error message when service sync fails

e60ab30

At least give a clue what's wrong instead of just: "failed: exit status 1"
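A minimal sketch of the idea behind this commit: when an external command fails, fold its captured stderr into the returned error so the log says more than "exit status 1". The helper name `wrapCmdError` and the message format are hypothetical and chosen only for illustration; they are not taken from the ovn-kubernetes code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errExit stands in for the error an exec call returns on a non-zero exit.
var errExit = errors.New("exit status 1")

// wrapCmdError is a hypothetical helper that attaches the command's stderr
// output to the underlying error. It returns nil when err is nil.
func wrapCmdError(cmd, stderr string, err error) error {
	if err == nil {
		return nil
	}
	msg := strings.TrimSpace(stderr)
	if msg == "" {
		return fmt.Errorf("%s failed: %w", cmd, err)
	}
	return fmt.Errorf("%s failed: %w: %s", cmd, err, msg)
}

func main() {
	// The stderr text here is made up for the example.
	err := wrapCmdError("ovn-nbctl", "  row not found\n", errExit)
	fmt.Println(err)
	// errors.Is still sees the original error through the %w wrapping.
	fmt.Println(errors.Is(err, errExit))
}
```

Wrapping with `%w` keeps the original error available to `errors.Is` and `errors.As`, so callers that inspect the exit error keep working.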

ovn: bump to -25 scratch build.

56b9506

This picks up OVN fix for setting lrp gw chassis: ovn-org/ovn@e6bcb88 Signed-off-by: Dumitru Ceara <dceara@redhat.com>

dceara force-pushed the scale-fixes-lbg branch from 6855b4f to 56b9506 on November 15, 2021 17:07

testing

d8b7b0b

Cherry-picked from: ovn-org/ovn-kubernetes@f18feab Signed-off-by: Dan Williams <dcbw@redhat.com>

dceara force-pushed the scale-fixes-lbg branch from 7586517 to d8b7b0b on November 17, 2021 12:05

openshift-ci bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Nov 17, 2021

dceara mentioned this pull request Nov 17, 2021

[WIP] Scale fixes lbg 2 #839

Closed

dceara closed this Nov 18, 2021
