
Merge 2021-07-18 #609

Merged
57 commits merged on Jul 19, 2021

Conversation

@trozet (Contributor) commented Jul 16, 2021

bostrt and others added 30 commits April 8, 2021 10:28
Check /sys/class/net for ifindex when ip command is not available inside Pod

Signed-off-by: Robert Bost <rbost@redhat.com>
Signed-off-by: Christoph Stäbler <cstabler@redhat.com>
Since there is no config parser check that HybridOverlay.ClusterSubnets
is set when HybridOverlay.Enabled is true, we should verify that the
config contains cluster subnets before adding the lr-in-policy. Because
this field has no default value, the current code ends up creating
policies with nil dst fields.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
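As a rough illustration of the guard described in that commit message (not the actual ovn-kubernetes change), here is a minimal Go sketch; hybridOverlayConfig, maybeAddHybridOverlayPolicy and addPolicy are hypothetical names:

```go
package main

import "fmt"

// hybridOverlayConfig is a simplified stand-in for the relevant part of the
// real config (config.HybridOverlay); only the shape matters here.
type hybridOverlayConfig struct {
	Enabled        bool
	ClusterSubnets []string
}

// maybeAddHybridOverlayPolicy installs the logical-router policy only when at
// least one hybrid-overlay cluster subnet is configured, so no policy with a
// nil/empty dst is ever created. addPolicy is a hypothetical helper.
func maybeAddHybridOverlayPolicy(cfg hybridOverlayConfig, addPolicy func(subnet string) error) error {
	if !cfg.Enabled || len(cfg.ClusterSubnets) == 0 {
		return nil // hybrid overlay disabled or no subnets configured
	}
	for _, subnet := range cfg.ClusterSubnets {
		if err := addPolicy(subnet); err != nil {
			return fmt.Errorf("failed to add hybrid overlay policy for %s: %w", subnet, err)
		}
	}
	return nil
}

func main() {
	cfg := hybridOverlayConfig{Enabled: true} // ClusterSubnets left unset
	_ = maybeAddHybridOverlayPolicy(cfg, func(subnet string) error {
		fmt.Println("adding lr policy for", subnet)
		return nil
	})
}
```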
- Support in Makefile (e.g. OCI_BIN=podman make fedora)
- Support in kind.sh with flag "-ep podman"

Signed-off-by: Joel Takvorian <jtakvori@redhat.com>
ovnkube-trace now looks for ovnkube pods in all namespaces and picks
the namespace of the first matching pod it finds. A manual override
with -ovn-config-namespace still takes precedence.

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
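A minimal sketch of the autodetection flow described above, assuming client-go and an illustrative label selector (the real selector and code in ovnkube-trace may differ):

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
)

// findOvnNamespace mirrors the behaviour described above: an explicit
// -ovn-config-namespace override wins; otherwise list ovnkube pods across all
// namespaces and use the namespace of the first match.
func findOvnNamespace(client kubernetes.Interface, override string) (string, error) {
	if override != "" {
		return override, nil
	}
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(),
		metav1.ListOptions{LabelSelector: "name=ovnkube-master"}) // assumed selector
	if err != nil {
		return "", err
	}
	if len(pods.Items) == 0 {
		return "", fmt.Errorf("no ovnkube pods found in any namespace")
	}
	return pods.Items[0].Namespace, nil
}

func main() {
	// Seed a fake client with one ovnkube pod to exercise the detection path.
	client := fake.NewSimpleClientset(&v1.Pod{ObjectMeta: metav1.ObjectMeta{
		Name:      "ovnkube-master-abc12",
		Namespace: "ovn-kubernetes",
		Labels:    map[string]string{"name": "ovnkube-master"},
	}})
	ns, err := findOvnNamespace(client, "")
	fmt.Println(ns, err) // "ovn-kubernetes <nil>"
}
```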
For now, the document only features a short introduction to multicast.
Follow-up commits will add:
- how to configure the cluster to use multicast
- a description of the OVN north entities used to implement multicast

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Check /sys/class/net for ifindex when ip command is not available inside Pod
When a gw pod gets the external gateway annotation, it adds the specific
routes to the external gateway for existing pods, but it does not remove
the SNAT that was added when the pod was created.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Signed-off-by: Alexander Constantinescu <aconstan@redhat.com>
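A hedged sketch of the ordering the commit message implies, with hypothetical helper names (addGWRoute, deletePodSNAT) rather than the real ovn-kubernetes functions:

```go
package main

import "fmt"

// reconcileExternalGateway sketches the behaviour the commit describes: when
// a pod starts using an external gateway, add the per-pod routes toward the
// gateway and also remove the SNAT that was installed at pod creation. Both
// function parameters are hypothetical helpers, not real ovn-kubernetes APIs.
func reconcileExternalGateway(podIP string, addGWRoute, deletePodSNAT func(podIP string) error) error {
	if err := addGWRoute(podIP); err != nil {
		return fmt.Errorf("failed to add gateway route for %s: %w", podIP, err)
	}
	// This is the missing step called out above: without it the
	// creation-time SNAT is left behind.
	return deletePodSNAT(podIP)
}

func main() {
	addRoute := func(ip string) error { fmt.Println("gateway route added for", ip); return nil }
	delSNAT := func(ip string) error { fmt.Println("creation-time SNAT removed for", ip); return nil }
	_ = reconcileExternalGateway("10.128.3.160", addRoute, delSNAT)
}
```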
Improve logging message when address move fails

Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Right now we add the vipProtocol to the list of vipProtocols to remove
only if the loadbalancer is not the idling one. The problem is that we
update the loadbalancer just before this check, with the effect of not
removing the vip from the existing balancers. This causes the vip to
stay on both the idling and non-idling loadbalancers, with
unpredictable effects.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
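As a simplified, hedged sketch of the ordering problem described above (not the real load-balancer code), the removal set can be computed before the target is updated:

```go
package main

import "fmt"

type loadBalancer struct {
	name string
	vips map[string]bool
}

// assignVIP is an ordering sketch only, with simplified types: the set of
// load balancers the VIP must be removed from is computed *before* the target
// load balancer is updated, so the removal decision is not made against a
// balancer we have just modified and the VIP does not linger on both the
// idling and non-idling load balancers.
func assignVIP(vip string, target *loadBalancer, all []*loadBalancer) {
	var removeFrom []*loadBalancer
	for _, lb := range all {
		if lb != target && lb.vips[vip] {
			removeFrom = append(removeFrom, lb)
		}
	}
	target.vips[vip] = true // update only after the removal list is decided
	for _, lb := range removeFrom {
		delete(lb.vips, vip)
	}
}

func main() {
	idling := &loadBalancer{name: "idling", vips: map[string]bool{"172.30.9.247:80": true}}
	active := &loadBalancer{name: "active", vips: map[string]bool{}}
	assignVIP("172.30.9.247:80", active, []*loadBalancer{idling, active})
	fmt.Println(idling.vips, active.vips) // VIP now only on the active balancer
}
```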
…ging

nicstobridge.go: Improve logging message when address move fails
…more

If ovnkube-master is a bit slow and the service is already gone from
the apiserver and informer cache, we still want to clean it up. But
we don't want to pass a nil service to Eventf() which will cause
the following panic:

I0708 15:46:33.636318       1 services_controller.go:518] Deleting service cluster-density-503fcc1a-1066-44d9-bef7-43292b046b06-151/deployment-2pod-151-1
E0708 15:46:35.468114       1 utils.go:93] Error deleting VIP [172.30.55.216:80] on OVN LoadBalancer [01801284-6384-498c-8ef8-a08a6e63c77f 072075fd-eb15-47cc-bb17-05c8b4b49634 0a11b3f8-b1ba-4649-815b-8117442236d
5 10c04472-7cc0-4247-aa7b-2f166e8d4ab1 13a0737b-9859-4654-9a45-df034a2d1098 1532a693-b1e7-438e-b125-9e79953baf73 1b860b67-d31d-4633-98ba-5b62d2db1f2e 240b1487-b592-4c5d-a30c-1910c17512a0 2a25228b-7860-4342-bb88-
b75d0068cfa1 3417cdde-16a7-4db0-b717-319ad7459a32 41ebdb00-9cce-47bf-9849-e1a203f915b1 45a7e1bb-62d0-4ec9-871b-53320f031d69 45e40b3c-4cde-4c69-8980-6b067e1c6f45 4bebffa0-a635-4880-81e7-6da162f65ae0 4ce91155-325f
-4a85-95cc-3b8504a24a19 4eb3e323-8278-48be-a2d7-3288285cfe0e 4f576145-ef6e-4408-8781-a229e8c73506 587912ee-e688-4ef0-95cc-38c1e44a0108 5d5bfbf6-da93-4044-9263-c716aa699b0d 66625dce-c24c-4479-a576-8eef0fee4055 67
28cfbd-54ad-493d-9de5-f29189c6509b 706ade56-cdc8-4a20-8836-637837ad4d84 71927cfe-ac03-440c-ac9e-227aa6c330c6 71a14d97-0ee3-42b1-a47a-63589504c0a1 7b7777e0-a9d8-4417-a8ea-2aa66644d02b 81b048c3-4a70-4eaa-b17f-aa9a
f440d9fb 8432d426-a68f-4757-9aef-7501f70c46f7 8b4e2ab3-82e8-4baa-b7c0-abaf6a83d0fa 8f81fc2c-e2c6-4d64-ae5f-c86d25f5687f 93542f94-9dec-4cbf-9200-3ce926d3e560 97b89ca2-2335-46e6-88eb-ed74d084789f a46434bf-26b5-416
8-bb6c-69cc328e98ad a4648426-4d7f-4bf7-b40e-4df1fbc6cf10 a68ed82d-f2de-4cd9-aa1b-264e78a5ed26 a8f2055d-406f-47f7-b8c6-4abc7de7ffac aaa40b67-8c2b-4c29-bdf7-8d0802b89c49 ae11be0f-8053-419e-ad2e-0ba960870dc6 af7c56
6e-ed23-4e2a-8696-668a598e7a7b b5e67f77-2a3e-4366-aa0b-6705eb254282 bcd4c9eb-13c9-436b-a355-37f237969de5 c1d9ce94-7b19-49ef-8159-39450968e50e c7192d48-3db5-4a49-a849-ec8a814728e9 cefa01d7-5120-4b70-9cf0-dedb52b2
9753 d02c33af-6d30-40e4-a678-85b3db567ae1 d5b2f49c-1cdb-4086-83f5-9a6993d8b0ef d6ddb601-30ba-4b3f-aa63-62afff9b361c dbfddd53-5a09-41eb-bbf2-33916b240fc1 e8746562-a990-40a3-b142-466e0d77c6f4 ec44d001-484b-4fe2-b7
e3-94c060c22b0f ee2734c5-21a0-42ec-813c-be174c7b751f f21577cc-78b0-47dd-b342-0de071c77743 f2d00d0a-0f4b-4863-81c0-8172084bcf38 f2fb2f8b-33e3-40f5-8859-d2f0a99b376e f327ce83-d94f-4b56-9e34-d0fa3ae5de9f f4b120db-e
21f-4d74-87cc-abd2a88779c3]
I0708 15:46:35.468207       1 services_controller.go:223] Finished syncing service deployment-2pod-123-1 on namespace cluster-density-503fcc1a-1066-44d9-bef7-43292b046b06-123 : 15.673423702s
E0708 15:46:35.468297       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 8824 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x18f2520, 0x28a4910)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x18f2520, 0x28a4910)
        /usr/lib/golang/src/runtime/panic.go:965 +0x1b9
k8s.io/api/core/v1.(*Service).GetObjectKind(0x0, 0x7fa626221c00, 0x0)
        <autogenerated>:1 +0x5
k8s.io/client-go/tools/reference.GetReference(0xc0002bb490, 0x1d3f9a0, 0x0, 0x7fa5dc16fb38, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/reference/ref.go:59 +0x14d
k8s.io/client-go/tools/record.(*recorderImpl).generateEvent(0xc000f673c0, 0x1d3f9a0, 0x0, 0x0, 0xc031e556dbe7c809, 0x65dd7aa6698, 0x28e28c0, 0x1afbbca, 0x7, 0x1b1bd64, ...)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/record/event.go:327 +0x5d
k8s.io/client-go/tools/record.(*recorderImpl).Event(0xc000f673c0, 0x1d3f9a0, 0x0, 0x1afbbca, 0x7, 0x1b1bd64, 0x1d, 0xc0105b8000, 0x1ec3)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/record/event.go:349 +0xc5
k8s.io/client-go/tools/record.(*recorderImpl).Eventf(0xc000f673c0, 0x1d3f9a0, 0x0, 0x1afbbca, 0x7, 0x1b1bd64, 0x1d, 0x1b62102, 0x41, 0xc00df6c720, ...)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/record/event.go:353 +0xca
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).syncServices(0xc00176b4d0, 0xc00a217d60, 0x4e, 0x0, 0x0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:246 +0x682
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).processNextWorkItem(0xc00176b4d0, 0x203000)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:184 +0xcd
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).worker(...)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:173
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc002f319b0)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc002f319b0, 0x1d339c0, 0xc0024048a0, 0xc001987701, 0xc000210780)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc002f319b0, 0x3b9aca00, 0x0, 0xc001987701, 0xc000210780)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc002f319b0, 0x3b9aca00, 0xc000210780)
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/controller/services.(*Controller).Run
        /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/controller/services/services_controller.go:161 +0x3b1
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd3aa05]

Signed-off-by: Dan Williams <dcbw@redhat.com>
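A minimal sketch of the guard the commit message calls for, assuming a stand-in function and illustrative event reason strings; the panic comes from client-go dereferencing the nil object inside reference.GetReference, as shown in the trace above:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
	"k8s.io/klog/v2"
)

// emitDeletionEvent shows the guard in isolation: only call Eventf when we
// actually have a non-nil Service object, because client-go builds an object
// reference from it and panics on nil. Reason/message strings are
// illustrative, not the ones in the PR.
func emitDeletionEvent(recorder record.EventRecorder, svc *v1.Service, key string, err error) {
	if svc == nil {
		// Service already gone from the informer cache: log instead of
		// recording an event against a nil object.
		klog.Errorf("error while cleaning up deleted service %s: %v", key, err)
		return
	}
	recorder.Eventf(svc, v1.EventTypeWarning, "SyncServiceFailed",
		"error while syncing service %s: %v", key, err)
}

func main() {
	// FakeRecorder lets this sketch run without a full event broadcaster.
	recorder := record.NewFakeRecorder(10)
	emitDeletionEvent(recorder, nil, "default/example", fmt.Errorf("VIP deletion failed"))
}
```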
Due to lock contention, and because our namespace handler is
single-threaded, namespace additions may pile up and take a
while. Log how long they take, and also when namespaces are
added and deleted so we can better debug these issues.

Signed-off-by: Dan Williams <dcbw@redhat.com>
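A minimal sketch of the timing log described above; the handler and log wording are illustrative, not the exact ovn-kubernetes messages:

```go
package main

import (
	"time"

	"k8s.io/klog/v2"
)

// addNamespace wraps a namespace-add handler with the kind of timing log the
// commit describes, so slow, piled-up additions become visible in the logs.
func addNamespace(name string, handle func(string) error) error {
	start := time.Now()
	klog.Infof("Adding namespace %s", name)
	err := handle(name)
	klog.Infof("Finished adding namespace %s, took %v", name, time.Since(start))
	return err
}

func main() {
	_ = addNamespace("example", func(string) error {
		time.Sleep(50 * time.Millisecond) // stand-in for real per-namespace work
		return nil
	})
}
```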
…-patching-pods-2278

Add unit test for Kube.SetAnnotationsOnPod
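A hedged sketch of what such a unit test can look like with the fake clientset; setAnnotationsOnPod here is a stand-in for the real Kube.SetAnnotationsOnPod, whose signature may differ:

```go
package kube_test

import (
	"context"
	"encoding/json"
	"testing"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
)

// setAnnotationsOnPod is a self-contained stand-in for Kube.SetAnnotationsOnPod:
// it merge-patches the given annotations onto the pod.
func setAnnotationsOnPod(client kubernetes.Interface, pod *v1.Pod, annotations map[string]string) error {
	patch, err := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"annotations": annotations},
	})
	if err != nil {
		return err
	}
	_, err = client.CoreV1().Pods(pod.Namespace).Patch(context.TODO(), pod.Name,
		types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func TestSetAnnotationsOnPod(t *testing.T) {
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "test-pod", Namespace: "default"}}
	client := fake.NewSimpleClientset(pod)

	if err := setAnnotationsOnPod(client, pod, map[string]string{"k8s.ovn.org/pod-networks": "{}"}); err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	got, _ := client.CoreV1().Pods("default").Get(context.TODO(), "test-pod", metav1.GetOptions{})
	if got.Annotations["k8s.ovn.org/pod-networks"] != "{}" {
		t.Fatalf("annotation was not patched, got %v", got.Annotations)
	}
}
```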
Add information to the multicast doc explaining how to enable it
on a given namespace.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Show ACL examples of:
- allowing multicast traffic per namespace
- blocking all multicast traffic (lower priority than the ACL above)

Also show the port groups and address sets per namespace.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Show the multicast related changes to each node's logical switch:
- set snooping
- set querier
- specify src MAC address
- specify src IP address

Also show the changes introduced to the cluster router, by
activating the multicast relay option.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
Adds libovsdb clients to the OVN, egressip, services and unidle
controllers, along with the relevant test harness for each.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Add libovsdb clients to controllers & test harness
services: log errors and don't panic if the service doesn't exist any more
namespace: track how long namespace addition takes
Add hbo-lr-policy only if config.HybridOverlay.ClusterSubnets is set
When the master is restarted, all the nodes are processed both via a
fetch of existing nodes and via the add-node event. If a node already
had subnet annotations, we were updating the allocated-subnets metric
again. This commit checks whether new subnets were actually allocated
and only then updates the metrics associated with allocated subnets.

Signed-off-by: Aniket Bhat <anbhat@redhat.com>
Signed-off-by: Dan Williams <dcbw@redhat.com>
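A minimal sketch of the check described above, with a stand-in metrics hook rather than the real ovn-kubernetes metrics code:

```go
package main

import "fmt"

// recordAllocatedSubnetMetrics mirrors the check described above: the metric
// is updated only from subnets newly allocated in this call, so re-processing
// a node that already carried subnet annotations (e.g. after a master
// restart) does not bump the counter a second time. updateMetric stands in
// for the real metrics code.
func recordAllocatedSubnetMetrics(newlyAllocated []string, updateMetric func(delta int)) {
	if len(newlyAllocated) == 0 {
		return // node already had its subnets; nothing new to account for
	}
	updateMetric(len(newlyAllocated))
}

func main() {
	bump := func(delta int) { fmt.Println("allocated subnets metric +", delta) }
	recordAllocatedSubnetMetrics(nil, bump)                       // restart path: no-op
	recordAllocatedSubnetMetrics([]string{"10.128.2.0/23"}, bump) // fresh allocation
}
```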
In some scale scenarios it takes 20s to repair services on startup:

2021-07-08T15:46:57.601Z|01774|ovn_dbctl|INFO|Running command run --if-exists -- remove load_balancer 13a0737b-9859-4654-9a45-df034a2d1098 vips "\"172.30.9.247:80\""
I0708 15:46:57.658232       1 repair.go:132] Deleting non-existing Kubernetes vip 172.30.135.31:80 from OVN TCP load balancer 13a0737b-9859-4654-9a45-df034a2d1098
...
I0708 15:47:08.113637       1 repair.go:132] Deleting non-existing Kubernetes vip 10.0.141.143:30666 from OVN TCP load balancer aaa40b67-8c2b-4c29-bdf7-8d0802b89c49
2021-07-08T15:47:08.117Z|02031|ovn_dbctl|INFO|Running command run --if-exists -- remove load_balancer aaa40b67-8c2b-4c29-bdf7-8d0802b89c49 vips "\"10.0.141.143:30666\""
I0708 15:47:08.477075       1 repair.go:47] Finished repairing loop for services: 21.111423982s

A good chunk of that time is sequential calls to delete stale VIPs
in the repair loop. Batch them instead.

Signed-off-by: Dan Williams <dcbw@redhat.com>
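A hedged sketch of the batching idea, reusing VIPs and load-balancer UUIDs from the log above; the removal function is a placeholder, and only the grouping per load balancer is taken from the commit message:

```go
package main

import "fmt"

// batchStaleVIPs groups stale VIPs by load balancer so the repair loop can
// issue one removal per load balancer instead of one call per VIP.
func batchStaleVIPs(stale map[string]string, remove func(lb string, vips []string) error) error {
	byLB := map[string][]string{}
	for vip, lb := range stale {
		byLB[lb] = append(byLB[lb], vip)
	}
	for lb, vips := range byLB {
		if err := remove(lb, vips); err != nil {
			return fmt.Errorf("failed to remove %d stale VIPs from %s: %w", len(vips), lb, err)
		}
	}
	return nil
}

func main() {
	stale := map[string]string{
		"172.30.135.31:80":   "13a0737b-9859-4654-9a45-df034a2d1098",
		"10.0.141.143:30666": "aaa40b67-8c2b-4c29-bdf7-8d0802b89c49",
	}
	_ = batchStaleVIPs(stale, func(lb string, vips []string) error {
		fmt.Printf("removing %v from load balancer %s in one call\n", vips, lb)
		return nil
	})
}
```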
services: batch LoadBalancer VIP deletions when possible
Fix duplicate incrementing of subnet allocation metric
openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Jul 16, 2021
@trozet (Contributor Author) commented Jul 16, 2021

/retest

1 similar comment
@trozet (Contributor Author) commented Jul 16, 2021

/retest

fedepaol and others added 7 commits July 16, 2021 15:15
This happens when the pod was already created but a new event for the
pod is generated. I managed to see it after a manual ovnkube-master
restart.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
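A minimal sketch of the tolerant behaviour described above; errRouteExists and the add callback are illustrative stand-ins, not real ovn-kubernetes APIs:

```go
package main

import (
	"errors"
	"fmt"
)

// errRouteExists stands in for whatever "already present" error the real
// route-add path returns; the point of the commit is that re-adding routes
// for a pod that was already handled (e.g. after an ovnkube-master restart)
// must not be treated as a failure.
var errRouteExists = errors.New("route already exists")

func addGWRouteForPod(podIP, gw string, add func(podIP, gw string) error) error {
	if err := add(podIP, gw); err != nil {
		if errors.Is(err, errRouteExists) {
			return nil // duplicate event: the route is already in place
		}
		return fmt.Errorf("failed to add gateway route for pod %s via %s: %w", podIP, gw, err)
	}
	return nil
}

func main() {
	add := func(podIP, gw string) error { return errRouteExists }
	fmt.Println(addGWRouteForPod("10.128.3.160", "10.128.2.1", add)) // <nil>
}
```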
addGWRoutesForPod: don't fail if the routes are already added
ovnkube-trace: Autodetermine ovnNamespace
We need to pass the CA data itself between ovnkube-node and the cnishim
since the node is containerized and the shim is not, and the path could
be different between the two since they have different filesystem namespaces.

So we might as well just read the CA file and pass data around internally,
rather than using a file path.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Passing the Kube API authentication data via the CNI config file
has two problems:

1) the CA file path might be different for the cniserver (because
it's containerized) than for the cnishim running outside a container

2) it's better not to leak authentication info into the host
filesystem, even though the CNI config file should have restricted
permissions

To solve these two issues, pass the Kube API authentication data
back from the cniserver (running in ovnkube-node) to the cnishim
in the JSON response instead of writing it to a file on-disk.

This commit reverts parts of:
d397166
cni: cancel pod sandbox add requests if the pod's UID or MAC changes

Signed-off-by: Dan Williams <dcbw@redhat.com>
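A hedged sketch of the shape of such a response, with illustrative field names (the real cniserver/cnishim structs may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeAPIAuth is a simplified stand-in for the authentication section of the
// cniserver's JSON reply to the cnishim. The field names are illustrative;
// the point from the commits is that the CA certificate *contents* (and the
// rest of the auth data) travel in the response body rather than as a file
// path, since the server and shim live in different filesystem namespaces.
type kubeAPIAuth struct {
	ServerURL string `json:"serverURL,omitempty"`
	CAData    []byte `json:"caData,omitempty"` // PEM bytes, not a path on disk
	Token     string `json:"token,omitempty"`
}

type cniResponse struct {
	Result   json.RawMessage `json:"result,omitempty"`
	KubeAuth kubeAPIAuth     `json:"kubeAuth"`
}

func main() {
	resp := cniResponse{KubeAuth: kubeAPIAuth{
		ServerURL: "https://10.0.0.1:6443",
		CAData:    []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"),
		Token:     "<service-account-token>",
	}}
	out, _ := json.Marshal(resp)
	fmt.Println(string(out)) // what the cnishim would receive and use directly
}
```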
cni: pass Kube API auth via cnishim response, not CNI config file
trozet changed the title from Merge 2021-07-15 to Merge 2021-07-18 on Jul 18, 2021
@trozet (Contributor Author) commented Jul 18, 2021

No synthetic failures

/retest

@trozet (Contributor Author) commented Jul 19, 2021

@dcbw aws-ovn showed one instance of:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/609/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn/1416887054985334784

[sig-network] pods should successfully create sandboxes by other | 0s
2 failures to create the sandbox  ns/e2e-statefulset-5668 pod/ss-0 node/ip-10-0-177-63.us-west-2.compute.internal - 5.86 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss-0_e2e-statefulset-5668_2da4f8db-d5aa-4c3f-8c28-101be3f3b2fc_0(1f421d4e535e5a910460de2db32ed2607316320c6a8c1b9991e7661f25fded39): error adding pod e2e-statefulset-5668_ss-0 to CNI network "multus-cni-network": [e2e-statefulset-5668/ss-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-5668/ss-0 1f421d4e535e5a910460de2db32ed2607316320c6a8c1b9991e7661f25fded39] [e2e-statefulset-5668/ss-0 1f421d4e535e5a910460de2db32ed2607316320c6a8c1b9991e7661f25fded39] failed to configure pod interface: pod OVN annotations changed waiting for OVS port binding for 0a:58:0a:80:03:a0 [10.128.3.160/23] ' ns/e2e-statefulset-5668 pod/ss-0 node/ip-10-0-177-63.us-west-2.compute.internal - 9.86 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss-0_e2e-statefulset-5668_2da4f8db-d5aa-4c3f-8c28-101be3f3b2fc_0(954add4cb169d1ce25d77b6eb8fc408d5e8c517ed145137945095c93a549767d): error adding pod e2e-statefulset-5668_ss-0 to CNI network "multus-cni-network": [e2e-statefulset-5668/ss-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-5668/ss-0 954add4cb169d1ce25d77b6eb8fc408d5e8c517ed145137945095c93a549767d] [e2e-statefulset-5668/ss-0 954add4cb169d1ce25d77b6eb8fc408d5e8c517ed145137945095c93a549767d] failed to configure pod interface: pod OVN annotations changed waiting for OVS port binding for 0a:58:0a:80:03:a3 [10.128.3.163/23]

@trozet (Contributor Author) commented Jul 19, 2021

/retest

1 similar comment
@trozet (Contributor Author) commented Jul 19, 2021

/retest

@trozet (Contributor Author) commented Jul 19, 2021

another instance on vsphere-ovn:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/609/pull-ci-openshift-ovn-kubernetes-master-e2e-vsphere-ovn/1417112356160278528

ns/e2e-statefulset-2737 pod/ss-0 node/ci-op-54fj8rgh-fb62e-5sxj5-worker-rzsq9 - 10.65 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ss-0_e2e-statefulset-2737_9e952dcd-e239-4fe4-8590-075859dcadc2_0(03255bf81d9e3dd58ce4953f00f0361ded0bdd9d85771dfed687e5cc289ae3bd): error adding pod e2e-statefulset-2737_ss-0 to CNI network "multus-cni-network": [e2e-statefulset-2737/ss-0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[e2e-statefulset-2737/ss-0 03255bf81d9e3dd58ce4953f00f0361ded0bdd9d85771dfed687e5cc289ae3bd] [e2e-statefulset-2737/ss-0 03255bf81d9e3dd58ce4953f00f0361ded0bdd9d85771dfed687e5cc289ae3bd] failed to configure pod interface: pod OVN annotations changed waiting for OVS port binding for 0a:58:0a:80:03:58 [10.128.3.88/23]

They are definitely less frequent now @dcbw

@trozet (Contributor Author) commented Jul 19, 2021

/retest

@openshift-ci bot (Contributor) commented Jul 19, 2021

@trozet: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere-windows | 43abf64 | link | /test e2e-vsphere-windows |
| ci/prow/e2e-vsphere-ovn | 43abf64 | link | /test e2e-vsphere-ovn |
| ci/prow/e2e-openstack-ovn | 43abf64 | link | /test e2e-openstack-ovn |
| ci/prow/e2e-azure-ovn | 43abf64 | link | /test e2e-azure-ovn |
| ci/prow/e2e-gcp-ovn-upgrade | 43abf64 | link | /test e2e-gcp-ovn-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw (Member) commented Jul 19, 2021

/override ci/prow/e2e-gcp-ovn
/lgtm
failures are known issues and being worked on

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged) on Jul 19, 2021
@openshift-ci bot (Contributor) commented Jul 19, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw, trozet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot (Contributor) commented Jul 19, 2021

@dcbw: Overrode contexts on behalf of dcbw: ci/prow/e2e-gcp-ovn

In response to this:

/override ci/prow/e2e-gcp-ovn
/lgtm
failures are known issues and being worked on

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files), lgtm (Indicates that a PR is ready to be merged)