Introduce Process for API changes, experimental features and feature gates #321

adrianchiris · 2021-01-31T11:09:08Z

As Support for new features are added to sriov-network-device-plugin, these features can:

add and/or modify functionality of device plugin
introduce new user-facing APIs (and possibly deprecate an old user-facing API)
support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

Introduce feature gates to enable-disable support and functionality in device plugin
Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.
Mark an existing API as deprecated and instruct users to use the new API (eventually removing the old API)

Ive created this issue to trigger an open discussion on the issues and see if we can come with a clean way to handle it.

zshi-redhat · 2021-02-01T02:21:35Z

As Support for new features are added to sriov-network-device-plugin, these features can:

1. add and/or modify functionality of device plugin

2. introduce new user-facing APIs (and possibly deprecate an old user-facing API)

3. support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

1. Introduce feature gates to enable-disable support and functionality in device plugin

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

2. Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

adrianchiris · 2021-02-01T12:11:38Z

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

Yes, thats what i had in mind. similarly to what is done in K8s feature gates.
We can have the feature gates as part of configMap as well (eg a new featureGates field) if we want and cli flags will override those.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Im thinking the following:

maintain a table in docs for features which states:

Feature name
Feature description
When it was introduced
its stage (alpha, beta, ga with similar semantics as k8s feature gates stages)
whether or not this feature is deprecated, with the version it was deprecated on.

For deprecated features we shall also have a deprecation warning emitted in log.
a deprecated feature will be removed 2 releases or 6 months after it was marked as deprecated (the later of the two)

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

A new feature may or may not have a user facing API, hence i think we should have the separation.

martinkennelly · 2021-02-03T13:27:53Z

@adrianchiris I think this needs careful thought before introducing - How much work involving for introducing this.
Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features.
Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

adrianchiris · 2021-02-08T09:58:05Z

Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features.

Kubernetes is one example :) .
ovn-kubernetes has some knobs that disables/enables functionality which is introduced in new features as well as some form of feature gate:
https://github.com/ovn-org/ovn-kubernetes/blob/0a2aad51333400e28fbf51c1017b3064d61186cc/go-controller/pkg/config/config.go#L218

Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

My 2-cents here:

Alpha feature - removal should be by majority of maintainers (not deprecation here as it may be removed at the subsequent release).
Beta feature - deprecation should have full acceptance from all maintainers.
GA feature - deprecation should have full acceptance from all maintainers, although i suspect these will be deprecated if the feature is irrelevant, has alternatives and next to none consumers.

Not every feature will require a feature gate.
As an example:

Adding a new selector, should we add a feature gate to this selector ? IMO no as it extends an existing API and is not used if not specified
Adding Support to provide NUMA information to kubelet ? IMO yes as it satisfies a new kubelet API and affects pods scheduling. (and with PR Flag to not advertise NUMA information #320 ) it will also introduce new APIs for the user.

so it may be on a per-enhancement basis if a feature gate is to be added.

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates,
unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate.
And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

amorenoz · 2021-03-01T09:21:21Z

Overall, I think this brings a lot of benefits

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates,
unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate.
And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

I think just tracking the features for which there is a feature gate is reasonable. Other features should be documented elsewhere anyhow.

Some thoughts implementation-wise:
I guess this should mean we add a global Config struct that is accessible from all over the codebase?
Should it live in it's own sub-package to avoid adding inter-package dependencies or could we reuse e,g: factory?
What kind of logic will end up in such package? I'm thinking at least the struct initialization based on cli and configMap options (including solving potential feature interdependency). But this should be devicetype-agnostic, meaning the activation of a feature cannot depend on, say, the network-specific capabilities of the node. I cannot think of any feature that might need this, but just saying it out loud.

Also, the current implementation uses []ResourceConfig to unmarshall the configMap content, we'll have to add a global config space, right? Or are you thinking of making the feature gates per-pool?

adrianchiris · 2021-03-01T11:25:55Z

@amorenoz i think we have the same idea about implementation approach

exact details can be discussed over the PR if we have a general agreement on adding support for feature gates.

implementation points:

global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs. im thinking to avoid use factory.
Initialized globally with defaults, overriden by configMap and cli params (in that order)
devicetype/hardware/node-config agnostic, essentially a feature gate on/off value is derrived from:
- hardcoded defaults
- configMap
- Cli params
- other feature gates Dependencies
In regards to ConfigMap, im thinking to have a new featureGates top level member which will affect device plugin functionallity in a global manner. If a certain feature has a per resource knobs its fine, but the feature is enabled/disabled globally.

ipatrykx · 2021-04-07T09:29:43Z

global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs

@adrianchiris Does this mean a singleton basically?

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in featureGates itself? Just to avoid duplicating the values.

adrianchiris · 2021-04-07T09:55:43Z

@adrianchiris Does this mean a singleton basically?

Yep

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. > renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

I think it should be in the same Map, later on we can discuss if we need to separate the resource configuration from device plugin configuration.

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in
featureGates itself? Just to avoid duplicating the values.

Well a featureGate enables or disables a feature. A feature may or may not be associated with a set of Config parameters.
so i dont think you can roll all of it into featureGates.

ipatrykx · 2021-04-12T11:33:02Z

Just created a PR on this for initial discussion.

ipatrykx · 2021-04-12T15:49:35Z

@adrianchiris @zshi-redhat @amorenoz @martinkennelly

I was supposed to present this today, but we ran out of time. So, to speed things up, I am attaching this simple presentation here.

SRIOV DP featuregates.pdf

Also, the concerns that are specified in the last slide:

FeatureGate configuration precedence - should CLI args override ConfigMap parameters? When DP is running as a pod it seems to be easier to change ConfigMap than CLI args.
Deprecation message – should emitted deprecation message be generic (e.g. WARNING: feature ABC is deprecated), or should be feature-specific (provided in config for example)?
Should there be special idioms like e.g. ‘allAlphaFeatures’ that would disable/enable groups of features (in this case disable/anable all alpha features)?
Should there be a way to block the feature disabling/enabling (e.g. there is a GA feature that user should not be able to disable)?

ipatrykx mentioned this issue Apr 12, 2021

Introducing FeatureGates and global feature config #344

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce Process for API changes, experimental features and feature gates #321

Introduce Process for API changes, experimental features and feature gates #321

adrianchiris commented Jan 31, 2021 •

edited

Loading

zshi-redhat commented Feb 1, 2021

adrianchiris commented Feb 1, 2021 •

edited

Loading

martinkennelly commented Feb 3, 2021

adrianchiris commented Feb 8, 2021 •

edited

Loading

amorenoz commented Mar 1, 2021

adrianchiris commented Mar 1, 2021 •

edited

Loading

ipatrykx commented Apr 7, 2021 •

edited

Loading

adrianchiris commented Apr 7, 2021

ipatrykx commented Apr 12, 2021

ipatrykx commented Apr 12, 2021 •

edited

Loading

Introduce Process for API changes, experimental features and feature gates #321

Introduce Process for API changes, experimental features and feature gates #321

Comments

adrianchiris commented Jan 31, 2021 • edited Loading

zshi-redhat commented Feb 1, 2021

adrianchiris commented Feb 1, 2021 • edited Loading

martinkennelly commented Feb 3, 2021

adrianchiris commented Feb 8, 2021 • edited Loading

amorenoz commented Mar 1, 2021

adrianchiris commented Mar 1, 2021 • edited Loading

ipatrykx commented Apr 7, 2021 • edited Loading

adrianchiris commented Apr 7, 2021

ipatrykx commented Apr 12, 2021

ipatrykx commented Apr 12, 2021 • edited Loading

adrianchiris commented Jan 31, 2021 •

edited

Loading

adrianchiris commented Feb 1, 2021 •

edited

Loading

adrianchiris commented Feb 8, 2021 •

edited

Loading

adrianchiris commented Mar 1, 2021 •

edited

Loading

ipatrykx commented Apr 7, 2021 •

edited

Loading

ipatrykx commented Apr 12, 2021 •

edited

Loading