Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1823560: Upstream merge 2020-04-06 #136

Closed
wants to merge 282 commits into from

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented Apr 11, 2020

4.3 version of #123

@openshift/networking

dceara and others added 30 commits February 23, 2020 11:55
ovn-controller enables conditional monitoring by default for tables in
the Southbound Database it is interested in (e.g., Logical_Flow). This
effectively performs row level filtering for the SB DB tables such that
an ovn-controller would receive only updates about records that refer to
logical datapaths that are local to the chassis.

However, in an ovn-kubernetes cluster, all node logical switches are
connected to the ovn-cluster-router which forces all ovn-controllers to
register for updates about all logical switches.

The conditional monitoring implementation in OVSDB turns out to not
scale so well in such scenarios.

OVN 20.03 supports a new configuration knob (ovn-monitor-all=true) which
instructs ovn-controllers to not use row level filtering of records.
This was introduced by:
ovn-org/ovn@3c355e3

ovn-kubernetes per node gateway routers are bound to individual chassis
and should not generate openflow entries on other chassis. This is also
handled in OVN 20.03 by follow up commits:
ovn-org/ovn@166a33c
ovn-org/ovn@eebef50

Note regarding backwards compatibility:
When ovn-kubernetes is used with older OVN versions (i.e., <20.03)
setting "ovn-monitor-all=true" has no effect as the knob will just be
ignored by ovn-controller.

Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://groups.google.com/d/msg/ovn-kubernetes/OalZ5-2ljnA/FOGlAyK5DQAJ
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
cluster.go: Disable Southbound DB conditional monitoring.
I had a problem in an IPv6 cluster where certain traffic was getting
dropped.  A closer look revealed that duplicate MACs had been
configured in the OVN virtual network topology.  In particular, the
join bridge between the gateway router and distributed cluster router
had the same MAC on both of its ports.

The root cause was this function.  It assumed IPv4.

While working on a fix for this and writing test cases, I discovered
that the function would sometimes also break for IPv4.  It assumed
that the IPv4 address was always in 4-byte form.  Sometimes the IP
type actually has 16 bytes, even when holding an IPv4 address.  The
first fix is to ensure that we're always using the 4-byte
representation when working with the IPv4 address.

The second fix is to make this function aware of IPv6.  We have more
bytes in the address than we have bytes in a MAC, so it's not
technically possible to always guarantee this results in a unique mac.
However, the subnets we have hard coded internally in OVN are of a
pretty simple format like fd98::N, so using the first two and last two
bytes seems good enough.

Really it's better to just generate a random unique MAC, but this
patch attempts a compromise by covering some cases without
re-introducing the performance concern that was addressed when this
code was first introduced by commit
b453f73.

Signed-off-by: Russell Bryant <russell@ovn.org>
ipv6: Make IPAddrToHWAddr() aware of IPv6
…e/switch_to_klog

Switch from logrus to klog
add readiness probes for OVN/OVS daemons
ovs 2.12 [1] with hardware offload support ingress ratelimiting therefore
LinkSetVfTxRate is not needed. The patch remove the duplicate configuration
for SR-IOV VF. This kernel patch [2] add the support the mellanox driver.

Closes: openshift#1066

[1] - openvswitch/ovs@e7f6ba2
[2] - https://www.spinics.net/lists/netdev/msg589937.html
This is a follow up after some discussion on PR openshift#1091.  Even with this
limitation in place, we will still end up with some duplicate MACs in
a system.  The duplicates won't be on the same logical switch though,
so it's not harmful.  The cluster subnet doesn't change this
situation.
Includes fixes to regressions that were causing CI to fail before
downgrade.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Drop IPv6 cluster subnet limitation
remove LinkSetVfTxRate from setupSriovInterface
Otherwise unit tests using the fake kube client cannot correctly
delete an annotation using the MergePatch strategy and passing 'nil'
for the annotation's value.

Signed-off-by: Dan Williams <dcbw@redhat.com>
The annotator provides a better-encapsulated interface for setting
or removing multiple annotations on a node from disconnected functions
since it bundles them all up into a single set and/or delete call.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Use the node annotator to batch and set node annotations rather
than returning maps of annotations to set later.

Add a generalized startup waiter and use that from the gateway
and management port functions.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Dan Williams <dcbw@redhat.com>
…/fixing-log-file-writing

Fix: adding klog specific flags when logging to file
With recent changes to OVN to block untracked traffic like ARP, we now
need to explicitly allow it in the default ACLs.

Closes: openshift#1076

Signed-off-by: Tim Rozet <trozet@redhat.com>
ovn-controller enables conditional monitoring by default for tables in
the Southbound Database it is interested in (e.g., Logical_Flow). This
effectively performs row level filtering for the SB DB tables such that
an ovn-controller would receive only updates about records that refer to
logical datapaths that are local to the chassis.

However, in an ovn-kubernetes cluster, all node logical switches are
connected to the ovn-cluster-router which forces all ovn-controllers to
register for updates about all logical switches.

The conditional monitoring implementation in OVSDB turns out to not
scale so well in such scenarios.

OVN 20.03 supports a new configuration knob (ovn-monitor-all=true) which
instructs ovn-controllers to not use row level filtering of records.
This was introduced by:
ovn-org/ovn@3c355e3

ovn-kubernetes per node gateway routers are bound to individual chassis
and should not generate openflow entries on other chassis. This is also
handled in OVN 20.03 by follow up commits:
ovn-org/ovn@166a33c
ovn-org/ovn@eebef50

Note regarding backwards compatibility:
When ovn-kubernetes is used with older OVN versions (i.e., <20.03)
setting "ovn-monitor-all=true" has no effect as the knob will just be
ignored by ovn-controller.

Reported-by: Girish Moodalbail <gmoodalbail@nvidia.com>
Reported-at: https://groups.google.com/d/msg/ovn-kubernetes/OalZ5-2ljnA/FOGlAyK5DQAJ
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Doesn't check for OVS/OVN utilities; to be used for
components that don't need those.

Signed-off-by: Dan Williams <dcbw@redhat.com>
…nels

Heavily modified from work by Rajat Chopra in:

ovn-org/ovn-kubernetes#593

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Jocelyn Berrendonner <jocelynb@microsoft.com>
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
trozet and others added 14 commits April 6, 2020 09:06
Adds debug logging for EPs.

Signed-off-by: Tim Rozet <trozet@redhat.com>
identify default flows with 0xdeffl05 cookie and add healthcheck for it
Signed-off-by: Dan Williams <dcbw@redhat.com>
hybrid-overlay: pod IP/MAC might be nil
Signed-off-by: Girish Moodalbail <gmoodalbail@nvidia.com>
ovn-nbctl daemon mode by default puts its pidfile and socket
into /var/run/openvswitch which is likely an oversight in the
OVS/OVN split work debuting in OVN 2.12. The code was not looking
in /var/run/openvswitch if ovn-appctl was present, which was supposed
to be the magic way of figuring out whether to use /var/run/openvswitch
or /var/run/ovn.

Instead, first look for an existing OVN_NB_DAEMON environment variable,
which may have been set already per 'man ovn-nbctl' daemon mode
documentation, and return that verbatim to ensure the child ovn-nbctl
process uses it.

Otherwise, look for the pidfile and corresponding socket in both
directories (ovn first, then openvswitch) and pick the first one
that has both.

Signed-off-by: Dan Williams <dcbw@redhat.com>
@openshift-ci-robot
Copy link
Contributor

@dcbw: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

Upstream merge 2020-04-06

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dcbw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 11, 2020
@openshift-ci-robot
Copy link
Contributor

@dcbw: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-ovn 9308309 link /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn 9308309 link /test e2e-aws-ovn

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw dcbw changed the title Upstream merge 2020-04-06 Bug 1823560: Upstream merge 2020-04-06 Apr 13, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 13, 2020
@openshift-ci-robot
Copy link
Contributor

@dcbw: This pull request references Bugzilla bug 1823560, which is invalid:

  • expected the bug to target the "4.3.z" release, but it targets "4.4.0" instead
  • expected dependent Bugzilla bug 1823557 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is NEW instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1823560: Upstream merge 2020-04-06

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dcbw
Copy link
Contributor Author

dcbw commented Apr 13, 2020

/bugzilla refresh

@openshift-ci-robot
Copy link
Contributor

@dcbw: This pull request references Bugzilla bug 1823560, which is invalid:

  • expected dependent Bugzilla bug 1823557 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is NEW instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@trozet
Copy link
Contributor

trozet commented Apr 17, 2020

@dcbw can you close this? I don't think it is needed anymore.

@dcbw
Copy link
Contributor Author

dcbw commented Apr 17, 2020

Closed as this is longer needed.

@dcbw dcbw closed this Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet