Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Process for API changes, experimental features and feature gates #321

Open
adrianchiris opened this issue Jan 31, 2021 · 10 comments

Comments

@adrianchiris
Copy link
Contributor

adrianchiris commented Jan 31, 2021

As Support for new features are added to sriov-network-device-plugin, these features can:

  1. add and/or modify functionality of device plugin
  2. introduce new user-facing APIs (and possibly deprecate an old user-facing API)
  3. support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

  1. Introduce feature gates to enable-disable support and functionality in device plugin
  2. Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.
  3. Mark an existing API as deprecated and instruct users to use the new API (eventually removing the old API)

Ive created this issue to trigger an open discussion on the issues and see if we can come with a clean way to handle it.

@zshi-redhat
Copy link
Collaborator

As Support for new features are added to sriov-network-device-plugin, these features can:

1. add and/or modify functionality of device plugin

2. introduce new user-facing APIs (and possibly deprecate an old user-facing API)

3. support new "south bound" facing APIs e.g kubelet, device-info-spec (this is similar to 1 IMO as the API is already defined and device plugin just implements a new functionality)

For these features it may be desired to:

1. Introduce feature gates to enable-disable support and functionality in device plugin

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

2. Mark these features (and their APIs) as experimental as they are not fully mature and their user-facing APIs may change in subsequent releases or removed altogether.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

@adrianchiris
Copy link
Contributor Author

adrianchiris commented Feb 1, 2021

so this feature gate will be a new cli cmd option, and the corresponding new feature will only be evaluated when this feature gate is true?

Yes, thats what i had in mind. similarly to what is done in K8s feature gates.
We can have the feature gates as part of configMap as well (eg a new featureGates field) if we want and cli flags will override those.

Is this to mark it as experimental or deprecated in the doc or in the code (e.g. warning message that user can be notified when configuring the feature)?

Im thinking the following:

maintain a table in docs for features which states:

  • Feature name
  • Feature description
  • When it was introduced
  • its stage (alpha, beta, ga with similar semantics as k8s feature gates stages)
  • whether or not this feature is deprecated, with the version it was deprecated on.

For deprecated features we shall also have a deprecation warning emitted in log.
a deprecated feature will be removed 2 releases or 6 months after it was marked as deprecated (the later of the two)

Since we are targeting new API process, do we really need featureGate? or just use the API created for that feature for enablement or disablement (e.g. if API is defined in cli or configMap, then it is enabled by user, vice versa)?

A new feature may or may not have a user facing API, hence i think we should have the separation.

@martinkennelly
Copy link
Member

@adrianchiris I think this needs careful thought before introducing - How much work involving for introducing this.
Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features.
Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

@adrianchiris
Copy link
Contributor Author

adrianchiris commented Feb 8, 2021

Do you know of any example project that implements feature gates like Kubernetes? Id like to understand the code changes needed to turn off and off existing features.

Kubernetes is one example :) .
ovn-kubernetes has some knobs that disables/enables functionality which is introduced in new features as well as some form of feature gate:
https://github.com/ovn-org/ovn-kubernetes/blob/0a2aad51333400e28fbf51c1017b3064d61186cc/go-controller/pkg/config/config.go#L218

Also, if we want to go down this route and we need a consensus on depreciating a feature, can we agree on full acceptance from all maintainers?

My 2-cents here:

Alpha feature - removal should be by majority of maintainers (not deprecation here as it may be removed at the subsequent release).
Beta feature - deprecation should have full acceptance from all maintainers.
GA feature - deprecation should have full acceptance from all maintainers, although i suspect these will be deprecated if the feature is irrelevant, has alternatives and next to none consumers.

Not every feature will require a feature gate.
As an example:

  • Adding a new selector, should we add a feature gate to this selector ? IMO no as it extends an existing API and is not used if not specified
  • Adding Support to provide NUMA information to kubelet ? IMO yes as it satisfies a new kubelet API and affects pods scheduling. (and with PR Flag to not advertise NUMA information #320 ) it will also introduce new APIs for the user.

so it may be on a per-enhancement basis if a feature gate is to be added.

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates,
unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate.
And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

@amorenoz
Copy link
Contributor

amorenoz commented Mar 1, 2021

Overall, I think this brings a lot of benefits

Downside of that is that the list of features tracked in docs and their state may be disjoint from the feature gates,
unless we say that we track a state (alpha, beta, GA, deprecated) of a feature only if it has a feature gate.
And a feature will have a feature gate if it is "substantial" . Evaluated per-feature depending of the impact on the project (API, behavioral)

I think just tracking the features for which there is a feature gate is reasonable. Other features should be documented elsewhere anyhow.

Some thoughts implementation-wise:
I guess this should mean we add a global Config struct that is accessible from all over the codebase?
Should it live in it's own sub-package to avoid adding inter-package dependencies or could we reuse e,g: factory?
What kind of logic will end up in such package? I'm thinking at least the struct initialization based on cli and configMap options (including solving potential feature interdependency). But this should be devicetype-agnostic, meaning the activation of a feature cannot depend on, say, the network-specific capabilities of the node. I cannot think of any feature that might need this, but just saying it out loud.

Also, the current implementation uses []ResourceConfig to unmarshall the configMap content, we'll have to add a global config space, right? Or are you thinking of making the feature gates per-pool?

@adrianchiris
Copy link
Contributor Author

adrianchiris commented Mar 1, 2021

@amorenoz i think we have the same idea about implementation approach

exact details can be discussed over the PR if we have a general agreement on adding support for feature gates.

implementation points:

  • global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs. im thinking to avoid use factory.
  • Initialized globally with defaults, overriden by configMap and cli params (in that order)
  • devicetype/hardware/node-config agnostic, essentially a feature gate on/off value is derrived from:
    • hardcoded defaults
    • configMap
    • Cli params
    • other feature gates Dependencies
  • In regards to ConfigMap, im thinking to have a new featureGates top level member which will affect device plugin functionallity in a global manner. If a certain feature has a per resource knobs its fine, but the feature is enabled/disabled globally.

@ipatrykx
Copy link
Contributor

ipatrykx commented Apr 7, 2021

  • global Config structs in its own sub-package to avoid storing a reference in all relevant internal structs

@adrianchiris Does this mean a singleton basically?

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in featureGates itself? Just to avoid duplicating the values.

@adrianchiris
Copy link
Contributor Author

@adrianchiris Does this mean a singleton basically?

Yep

Also, do you think that this new config should be added to the existing configMap, or should it be a separate configMap (e.g. > renaming existing configMap.yml to, let's say, discoveryConfig.yml and creating addtional featuresConfig.yml?

I think it should be in the same Map, later on we can discuss if we need to separate the resource configuration from device plugin configuration.

Additionally I am not sure about having both Config and featureGates. Couldn't all the relevant information be stored in
featureGates itself? Just to avoid duplicating the values.

Well a featureGate enables or disables a feature. A feature may or may not be associated with a set of Config parameters.
so i dont think you can roll all of it into featureGates.

@ipatrykx
Copy link
Contributor

Just created a PR on this for initial discussion.

@ipatrykx
Copy link
Contributor

ipatrykx commented Apr 12, 2021

@adrianchiris @zshi-redhat @amorenoz @martinkennelly

I was supposed to present this today, but we ran out of time. So, to speed things up, I am attaching this simple presentation here.

SRIOV DP featuregates.pdf

Also, the concerns that are specified in the last slide:

  • FeatureGate configuration precedence - should CLI args override ConfigMap parameters? When DP is running as a pod it seems to be easier to change ConfigMap than CLI args.

  • Deprecation message – should emitted deprecation message be generic (e.g. WARNING: feature ABC is deprecated), or should be feature-specific (provided in config for example)?

  • Should there be special idioms like e.g. ‘allAlphaFeatures’ that would disable/enable groups of features (in this case disable/anable all alpha features)?

  • Should there be a way to block the feature disabling/enabling (e.g. there is a GA feature that user should not be able to disable)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants