-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "Software bridge management" design doc #640
Add "Software bridge management" design doc #640
Conversation
Thanks for your PR,
To skip the vendors CIs use one of:
|
Pull Request Test Coverage Report for Build 8518344220Details
💛 - Coveralls |
f514638
to
3753c5d
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
`accelerated-bridge-cni` and contains the required resource requests. | ||
* As a user, I want to define configuration for software bridges inside the `SriovNetworkNodePolicy` CR and expect that the | ||
operator will create required bridges, configure them, and attach uplinks (physical functions). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the OVN-K CNI use case? Will the api to set it remain the same?
Will OVN-K and OVS-CNI have any conflicts if OVN-K is installed in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the only difference for OVN-k use case is that sriov-network-operator will need to be configured to systemd
mode (check this doc for details). NIC configuration only flow is proposed to be compatible with existing OVN-k use-case. In this case NIC will be configured to switchdev mode early on boot by the systemd service, but the operator will not manage software bridges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current proposal assumes that "Fully automatic" flow can be used only for dedicated NICs (not used by OVN-k) which are fully controlled by the sriov-network-operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wizhaoredhat I will like to ask your team if you can validate that after all the work done as part of https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/doc/design/switchdev-refactoring.md the ovn-k HWoffload continue to work. there is one PR that is missing before we can try that #643
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Overall design looks good to me
_**Note**: multiple policies can match single PF (vf range use-case), bridge settings from the policy with the higher priority | ||
will be applied to PF._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mind recalling the priority mechanism?
_**Note**: multiple policies can match single PF (vf range use-case), bridge settings from the policy with the higher priority | |
will be applied to PF._ | |
_**Note**: multiple policies can match single PF (vf range use-case), bridge settings from the policy with the higher priority (which is the one with the lowest `.spec.priority` value) | |
will be applied to PF._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ExternallyManaged bool `json:"externallyManaged,omitempty"` | ||
TotalVfs int `json:"totalvfs,omitempty"` | ||
VFs []VirtualFunction `json:"Vfs,omitempty"` | ||
Bridge *Bridge `json:"bridge,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same type as we have in the SriovNetworkNodeState.Spec
and in the SriovNetworkNodeState.Spec
? If this information comes from an inspection on the host, it might reflect the output of a ovs-vsctl
or ip link
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, currently I think that it is possible to use the same type here and in the policy/state.
ovs-vsctl
orip link
just as a note, I think we will use direct requests to ovsd to configure ovs bridges and rely on netlink for linux bridges.
Usage of cmd tools is not expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree we should use netlink (if we miss something we can add to the golang netlink) and for ovsdb I think we can use the same package ovn-k use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree we should use netlink (if we miss something we can add to the golang netlink) and for ovsdb I think we can use the same package ovn-k use?
yes, I think we have options to avoid usage of cmd tools
``` | ||
|
||
Open questions: | ||
- do we want to support each OVS Bridge/Interface option explicitly or should we support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally go with a generic container for options, describing in the docs in which command they will be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on generic.
do we have any option which is mandatory for the use-cases in this doc ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any option which is mandatory for the use-cases in this doc ?
No. I assume that ovs-kernel (with TC based HW offloading) should work without extra options.
User will need to:
a) enable HW offload
b) create policy that contains Bridge.type = "ovs", without extra OVS options
c) create OVSNetwork CR
3753c5d
to
ff7c04c
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
* add bridge auto-selection logic to `ovs-cni` | ||
* define `OVSNetwork` CRD and implement controller for it | ||
|
||
##### Phase 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have a "bridge" plugin to handle bridge configurations ?
generic plugin handles today:
- sriov configuration
- kernel parameters
- loading drivers
im thinking, if we can get away from adding more responsibility onto it we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 though this is an implementation detail.
To me, in the design doc it's enough to refer to the config-daemon component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the mention of the generic_plugin from the doc. Let's discuss this during the implementation.
``` | ||
|
||
Open questions: | ||
- do we want to support each OVS Bridge/Interface option explicitly or should we support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on generic.
do we have any option which is mandatory for the use-cases in this doc ?
Open questions: | ||
- do we want to support each OVS Bridge/Interface option explicitly or should we support | ||
generic map[string]string struct which allows set any field in the Bridge/Interface tables? | ||
- need to clarify list of required args for OVS and Linux bridge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ykulazhenkov lets do some checks internally about such args.
generic map[string]string struct which allows set any field in the Bridge/Interface tables? | ||
- need to clarify list of required args for OVS and Linux bridge | ||
|
||
_**Note**: multiple policies can match single PF (vf range use-case), bridge settings from the policy with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen when multiple PFs match the policy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen when multiple PFs match the policy ?
A separate bridge will be created for each PF. I added additional note about this to the doc
|
||
_**Note 2:** we should create udev rule which will set `NM_UNMANAGED=1` for bridges created by the operator_ | ||
|
||
4. `sriov-network-config-daemon` should report information about software bridges in the status field of the `SriovNetworkNodePolicy` CR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should take extra care not to delete a bridge we did not create.
with OVS we could use some external ids to mark the bridge as created by the operator if we like.
more detail on how to ensure that is welcome :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If "fully auto" mode is used we should create a bridge with predictable name for each PF that match a policy that contains a bridge configuration. We can't use interface name to generate a predictable name for the bridge, because the limit for the interface length is only 15 symbols and PF name will already have length around this.
I propose to use PF's PCI address to generate predictable bridge name, like br-0000_d8_01.0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the information can be in the sriovNodeState for every node?
and the name is not that critical if we have a file with the connection between them.
this way we always know what are the bridges we control
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will report this information in the status field of the state object. I accidentally mentioned the wrong CR in this bullet. Daemon will report info about bridges in the status field of the Nodestate object
ff7c04c
to
a5ce0cd
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
a5ce0cd
to
8797f89
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some discussions with @ykulazhenkov regarding possible alternatives and i believe what is currently proposed is a good way forward so overall im OK with the proposed approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great document!
I left some comments nice work :)
`accelerated-bridge-cni` and contains the required resource requests. | ||
* As a user, I want to define configuration for software bridges inside the `SriovNetworkNodePolicy` CR and expect that the | ||
operator will create required bridges, configure them, and attach uplinks (physical functions). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wizhaoredhat I will like to ask your team if you can validate that after all the work done as part of https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/doc/design/switchdev-refactoring.md the ovn-k HWoffload continue to work. there is one PR that is missing before we can try that #643
* add `OVSNetwork` and `BridgeNetwork` CRDs as an API for end users to simplify creation of `NetworkAttachmentDefinition` CR | ||
for `ovs-cni` and `accelerated-bridge-cni` | ||
* extend `SriovNetworkNodePolicy` CRD to support configuration of software bridges (bridge-level configuration) | ||
* extend `SriovNetworkNodeState` CRD to support configuration of software bridges (bridge-level configuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also want to exterd the SriovNetworkNodeState.Status
to report the switch status/topology?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I added a note about this
|
||
_**Note 2:** we should create udev rule which will set `NM_UNMANAGED=1` for bridges created by the operator_ | ||
|
||
4. `sriov-network-config-daemon` should report information about software bridges in the status field of the `SriovNetworkNodePolicy` CR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the information can be in the sriovNodeState for every node?
and the name is not that critical if we have a file with the connection between them.
this way we always know what are the bridges we control
- need to clarify list of required args for OVS and Linux bridge | ||
|
||
_**Note 1**: multiple policies can match single PF (vf range use-case), bridge settings from the policy with | ||
the higher priority (which is the one with the lowest `.spec.priority` value) will be applied to PF._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be just tell the user that only 1 policy per PF can contain bridge section and block that via a webhook?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. I think it is better to preserve existing behavior. Currently we follow this rule for existing PF level options defined in the policy (linkType, eswitchMode, etc)
ExternallyManaged bool `json:"externallyManaged,omitempty"` | ||
TotalVfs int `json:"totalvfs,omitempty"` | ||
VFs []VirtualFunction `json:"Vfs,omitempty"` | ||
Bridge *Bridge `json:"bridge,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree we should use netlink (if we miss something we can add to the golang netlink) and for ovsdb I think we can use the same package ovn-k use?
EswitchMode string `json:"eSwitchMode,omitempty"` | ||
ExternallyManaged bool `json:"externallyManaged,omitempty"` | ||
TotalVfs int `json:"totalvfs,omitempty"` | ||
VFs []VirtualFunction `json:"Vfs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the virtual functions do we want to also add the representorName?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not directly related to the scope of this proposal, but I agree that this is a good improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the field to represent the change in the API as part of this proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
// only one bridge type can be set | ||
type Bridge struct { | ||
// +kubebuilder:validation:Enum=ovs;linux | ||
Type string `json:"type"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it use pointers so can we just check that only one exist and the other one is nil so we don't need to type also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can't. Structs contain optional arguments for bridges.
Minimal valid config is
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: connectx6dx-switched
namespace: sriov-network-operator
spec:
eSwitchMode: switchdev
nicSelector:
deviceID: 101d
vendor: 15b3
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 4
resourceName: connectx6dx-switched
bridge:
type: ovs
In this case OVS bridge with default configuration will be created for each PF that match the policy.
By default configuration I mean config which OVS uses if you exec ovs-vsctl add-br foo
(no extra args).
If HW offloading is enable on the node level (with old ConfigPool API), then such bridge will work nicely with OVS-kernel hw offloading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it work if the user create something like?
bridge:
Ovs: {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's drop the type field
} | ||
|
||
``` | ||
`SriovNetworkNodeState.spec` and `SriovNetworkNodeState.status` should be extended to contain same `Bridge` struct as `SriovNetworkNodePolicySpec` CRD. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can use the same object as the Policy as in the policy I can have nicSelector: vendor: 8086
but in the spec and status for the NodeState I will like to have a list of all the bridges we need to create with the configuration, name and uplink object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SchSeba I updated proposed API change for SriovNetworkNodeState object. Now it contains Bridges list.
I propose to use the same Bridges object in the spec and in the status. We may want to extend the one in the status field with additional info when we will have feedback from the users.
8797f89
to
81a3ed7
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did another round of review.
general point about expose everything from the cnis args to the user.
maybe we can remove the ones I add a comment on and add them later if we see a use-case?
adding is easier than removing I think :)
and please one critical comment if you can take a look
https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/640/files#r1518902408
// only one bridge type can be set | ||
type Bridge struct { | ||
// +kubebuilder:validation:Enum=ovs;linux | ||
Type string `json:"type"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it work if the user create something like?
bridge:
Ovs: {}
} | ||
|
||
// LinuxBrConfig optional configuration for Linux bridge and uplink interface | ||
type LinuxBrConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the other one can be just BridgeConfig as we are already inside the LinuxBridgeConfig section no?
EswitchMode string `json:"eSwitchMode,omitempty"` | ||
ExternallyManaged bool `json:"externallyManaged,omitempty"` | ||
TotalVfs int `json:"totalvfs,omitempty"` | ||
VFs []VirtualFunction `json:"Vfs,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the field to represent the change in the API as part of this proposal?
81a3ed7
to
10b6f2d
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
thanks @SchSeba
Ok, let's remove these fields for now. I removed them from the design
I replied to this comment and updated the design. PTAL |
10b6f2d
to
54974ff
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
54974ff
to
6e5fea4
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost ready just two small nits :)
6e5fea4
to
83098b7
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
83098b7
to
b47fd1c
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
b47fd1c
to
d04e84a
Compare
Thanks for your PR,
To skip the vendors CIs use one of:
|
I pushed small update to the document:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This document contains a proposal to add limited support for software bridges configuration to the sriov-network-operator.
This feature assumes integration with ovs-cni and accelerated-bridge-cni.
Depends on switchdev and systemd modes refactoring feature
cc @adrianchiris @zeeke @SchSeba