[DownstreamMerge] Merge 2021-10-13 #796

Merged
402 commits merged into openshift:master on Oct 28, 2021

@trozet trozet commented Oct 13, 2021

Passes unit tests 😥

dcbw and others added 30 commits August 23, 2021 07:04
Add quotes around nexthop and dst-ip fields
The previous gomega matcher `ContainElement` doesn't
assert that objects which are deleted are also absent from the
final result. Example: item1 and item2 exist, then item1 is deleted. With
`ContainElement(item2)` we cannot fully assert that item1 is gone. With
`ConsistOf(item2)` we can.

Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
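
The difference between the two matchers can be sketched without gomega itself. The helpers below (`containsElement`, `consistsOf`) are hypothetical stand-ins written in plain Go that only mimic the matchers' semantics; they are not the library's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// containsElement mimics a "contains" matcher: it only checks that
// want appears somewhere in got and ignores everything else.
func containsElement(got []string, want string) bool {
	for _, g := range got {
		if g == want {
			return true
		}
	}
	return false
}

// consistsOf mimics an exact-membership matcher: got must hold exactly
// the wanted elements, in any order, with nothing extra.
func consistsOf(got []string, want ...string) bool {
	if len(got) != len(want) {
		return false
	}
	a := append([]string(nil), got...)
	b := append([]string(nil), want...)
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Buggy result: item1 should have been deleted but is still returned.
	stale := []string{"item1", "item2"}
	fmt.Println(containsElement(stale, "item2")) // true: the leftover goes unnoticed
	fmt.Println(consistsOf(stale, "item2"))      // false: the leftover item1 is caught
}
```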
Successor for PR 2331 [fix reserve joinSwitch LRP IPs]
Function was accidentally using the non-hashed names.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
This new version includes support for multiple DGPs per router, which is
needed for an upcoming commit.

Signed-off-by: Han Zhou <hzhou@ovn.org>
This patch sets gateway-chassis to the node itself for each logical
router port that connects node level switch to the cluster router. This
makes use of the OVN optimization [0] that avoids flood-filling
unrelated datapaths by ovn-controller on each node, which perfectly
fits the ovn-kubernetes deployments where each logical switch is bound
to a chassis, significantly improving scalability of ovn-controller (10
times faster for recompute and 70-80% memory savings for ovn-controller
+ OVS combined. See more details in [0]).

Note: this optimization is disabled for gateway_mode = local for now, because
that mode requires NAT on the cluster router, which is not supported when
multiple DGPs are used on the same router. Support may be added in the future.

[0] ovn-org/ovn@22298fd

Signed-off-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Update installation documentation and add a walkthrough for
an installation with kubeadm.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
[libovsdb] Modify matchers to take into account optional field values
Dockerfile.fedora: Update ovn build to ovn-21.06.0-15.fc33.
Use DGP to connect logical switches to the cluster router.
Adding a small but important fix to the CI documentation to export
missing variables.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
[libovsdb] Bump to main: optional field value fix for in-mem DB
Fixes backwards values for election timer
Currently we call addPodExternalGW from both ensurePod and
addLogicalPort. If ensurePod is called with addPort=false
from the UpdatePodHandler, it will keep trying to call
addPodExternalGW to add routes and policies repeatedly
for the same pod. During exgw pod creation, we see the same routes
and policies getting added 3 times in a row.

Note that ovn-org/ovn-kubernetes#2337
fixed this by adding a check into addGWRoutesForNamespace to
return if routes already exist for the pod, but this comes
later in the code flow. It's better not to call addPodExternalGW
at all unless needed. This would save time and help with
pod latency issues, especially at scale.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
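
A minimal sketch of the guard described above, assuming the decision can be reduced to comparing annotations on the update path. The `pod` type, the `gw` annotation key, and `shouldSyncExternalGW` are hypothetical names for illustration, not the actual ovn-kubernetes code.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for the Kubernetes Pod object.
type pod struct {
	Name        string
	Annotations map[string]string
}

// sameAnnotations reports whether both maps hold the same key/value pairs.
func sameAnnotations(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// shouldSyncExternalGW decides whether the update path should program
// external gateway routes. Assumption: when addPort is true the add path
// already does this work, so only annotation changes on updates qualify.
func shouldSyncExternalGW(oldPod, newPod *pod, addPort bool) bool {
	if addPort || oldPod == nil {
		return false
	}
	return !sameAnnotations(oldPod.Annotations, newPod.Annotations)
}

func main() {
	old := &pod{Name: "exgw", Annotations: map[string]string{"gw": "10.0.0.1"}}
	same := &pod{Name: "exgw", Annotations: map[string]string{"gw": "10.0.0.1"}}
	changed := &pod{Name: "exgw", Annotations: map[string]string{"gw": "10.0.0.2"}}

	fmt.Println(shouldSyncExternalGW(old, same, false))    // false: nothing changed
	fmt.Println(shouldSyncExternalGW(old, changed, false)) // true: annotation update
	fmt.Println(shouldSyncExternalGW(nil, old, true))      // false: add path handles it
}
```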
Fix ensurePod to call addPodExternalGW only for annotation updates
…low-up

Documentation: Improve CI documentation
Documentation: Update installation documentation
factory: split watcher creation and start
Geneve packets coming from external should skip conntrack and go
directly to the host. Due to a typo, IPv6 packets were sent back out to
external instead. Fix this.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
No reason to have an internal cache of ips anymore now that we have
libovsdb.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Shared GW: Fix wrong action for IPv6 incoming Geneve packets
Get rid of address sets lock and ips cache
When populating the cache, if for some reason the node's GW LRP
already has that IP, don't fail.

E0826 13:45:04.449574 1 namespace.go:587] Failed to get join switch port IP address for node ip-10-0-152-74.us-west-2.compute.internal: provided IP is already allocated
E0826 13:45:04.620345 1 master.go:1260] Failed to get join switch port IP address for node ip-10-0-152-74.us-west-2.compute.internal: provided IP is already allocated

Signed-off-by: Dan Williams <dcbw@redhat.com>
When a node gets deleted, the logical switch and
gateway router get removed first. However we do
not update the LBCache and this causes dependency
failures during deletion leaving behind wrong ip->vip
pairing in the lbs.

When the pods on this node go away, ensureLBs
gets triggered and it tries to update endpoints across
load balancers. This set load_balancer command is
batched with ls-lb-del & ls-lr-del commands to remove
association of the lbs from the removed switches and
routers. These commands fail together because the switches
and routers are not present in ovn.

This PR updates the LBcache as soon as we remove the
routers and switches.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
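
A rough model of why the stale cache matters, assuming the cache maps each load balancer to the switches and routers it is attached to. The `lbCache` type and its methods are hypothetical and much simpler than the real LBCache.

```go
package main

import (
	"fmt"
	"sort"
)

// lbCache is a hypothetical cache: load balancer name -> set of
// switch and router names the load balancer is attached to.
type lbCache map[string]map[string]bool

// forget drops the given switches and routers from every cache entry.
// This models updating the cache as soon as those objects are deleted.
func (c lbCache) forget(names ...string) {
	for _, targets := range c {
		for _, n := range names {
			delete(targets, n)
		}
	}
}

// detachOps lists the detach commands a later sync would batch for lb:
// one for every cached target that is no longer wanted.
func (c lbCache) detachOps(lb string, wanted map[string]bool) []string {
	var ops []string
	for t := range c[lb] {
		if !wanted[t] {
			ops = append(ops, "detach "+lb+" from "+t)
		}
	}
	sort.Strings(ops)
	return ops
}

func main() {
	cache := lbCache{"svc-lb": {"node1": true, "GR_node1": true, "node2": true}}
	wanted := map[string]bool{"node2": true}

	// Stale cache: the sync would batch detaches against objects that are gone.
	fmt.Println(len(cache.detachOps("svc-lb", wanted))) // 2

	// Cache updated at deletion time: nothing left to detach.
	cache.forget("node1", "GR_node1")
	fmt.Println(len(cache.detachOps("svc-lb", wanted))) // 0
}
```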

openshift-ci bot commented Oct 28, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderConstantinescu, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@astoycos

I ran this against upstream's CI here: https://github.com/astoycos/ovn-kubernetes/runs/4035376913?check_suite_focus=true. Signal is looking alright so far (I would expect the upgrade jobs to fail).

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.


openshift-ci bot commented Oct 28, 2021

@trozet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-e2e-gcp-ovn | 7ba605d | link | false | /test okd-e2e-gcp-ovn |
| ci/prow/e2e-vsphere-windows | 7ba605d | link | false | /test e2e-vsphere-windows |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


dcbw commented Oct 28, 2021

/override e2e-aws-ovn-windows

time="2021-10-28T21:55:05Z" level=info msg="Created Subscription: windows-machine-config-operator-v4-0-0-sub"
{"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h15m0s timeout","severity":"error","time":"2021-10-28T21:59:22Z"}
{"component":"entrypoint","file":"prow/entrypoint/run.go:255","func":"k8s.io/test-infra/prow/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2021-10-28T21:59:37Z"}
{"component":"entrypoint","error":"process timed out","file":"prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2021-10-28T21:59:37Z"}
error: failed to execute wrapped command: exit status 127 


openshift-ci bot commented Oct 28, 2021

@dcbw: /override requires a failed status context or a job name to operate on.
The following unknown contexts were given:

  • e2e-aws-ovn-windows

Only the following contexts were expected:

  • ci/prow/4.10-upgrade-from-stable-4.8-images
  • ci/prow/e2e-aws-ovn
  • ci/prow/e2e-aws-ovn-local-gateway
  • ci/prow/e2e-aws-ovn-windows
  • ci/prow/e2e-azure-ovn
  • ci/prow/e2e-gcp-ovn
  • ci/prow/e2e-metal-ipi-ovn-dualstack
  • ci/prow/e2e-metal-ipi-ovn-ipv6
  • ci/prow/e2e-openstack-ovn
  • ci/prow/e2e-ovn-hybrid-step-registry
  • ci/prow/e2e-vsphere-ovn
  • ci/prow/e2e-vsphere-windows
  • ci/prow/images
  • ci/prow/okd-e2e-gcp-ovn
  • ci/prow/okd-images
  • pull-ci-openshift-ovn-kubernetes-master-4.10-upgrade-from-stable-4.8-images
  • pull-ci-openshift-ovn-kubernetes-release-4.1-images
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-aws-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-aws-ovn-local-gateway
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-aws-ovn-windows
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-azure-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-gcp-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-metal-ipi-ovn-dualstack
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-openstack-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-ovn-hybrid-step-registry
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-vsphere-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-e2e-vsphere-windows
  • pull-ci-openshift-ovn-kubernetes-release-4.10-okd-e2e-gcp-ovn
  • pull-ci-openshift-ovn-kubernetes-release-4.10-okd-images
  • tide

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


dcbw commented Oct 28, 2021

/override ci/prow/e2e-aws-ovn-windows


openshift-ci bot commented Oct 28, 2021

@dcbw: Overrode contexts on behalf of dcbw: ci/prow/e2e-aws-ovn-windows


@openshift-merge-robot openshift-merge-robot merged commit 13f767b into openshift:master Oct 28, 2021
dcbw added a commit to dcbw/ovn-kubernetes-1 that referenced this pull request Oct 29, 2021
Created by inspecting the result of:

git diff 7df9969

where 7df9969 is the github.com/ovn-org/ovn-kubernetes commit that
openshift#796 was merged
from, and reverting the parts that don't need to be different
downstream.
stbenjam added a commit to stbenjam/ovn-kubernetes-1 that referenced this pull request Nov 1, 2021
deads2k added a commit that referenced this pull request Nov 2, 2021
trozet added a commit to trozet/ovn-kubernetes-1 that referenced this pull request Nov 2, 2021
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.