Conversation
| enable_software_inventory: | ||
| default: false |
There was a problem hiding this comment.
Is there a reason using an object with default: false wouldn't work for the regular case as well? I usually find that having multiple types are harder to implement and more confusing. In this case it seems like a RolloutBoolean can be an object or a boolean. Are we doing this for backwards compatibility reasons?
There was a problem hiding this comment.
Yes, the goal is backwards compatibility.
|
|
||
| ## How | ||
|
|
||
| We would create a new type of boolean value in our `AppConfig` called `RolloutBoolean`. |
There was a problem hiding this comment.
I have typically seen this sort of thing called a feature flag. Our users may also be more familiar with that term than RolloutBoolean.
There was a problem hiding this comment.
Could be, feature flags in my book is a binary thing, though. This is a way of rolling out a feature so that eventually you enable for all if you want.
There was a problem hiding this comment.
Feature flags are not always just a binary off/on thing. For example, in LaunchDarkly, feature flags can target a specific user, a subset of users, use conditional logic. They can also take a range of values (not just boolean, but also strings, numbers, etc)
There was a problem hiding this comment.
Note I've heard customers express concerns about turning on features for a bunch of hosts at once. They want to be able to roll out things like the software inventory queries slowly and monitor the performance impacts.
There was a problem hiding this comment.
Feature flags are not binary in what they target, but I think they do tend to be associated with temporary things though (i.e. roll out slowly but eventually it should be enabled everywhere) and typically to be more for the development team than a config option for the users.
If I understand correctly this proposal, this could be for permanent things - maybe team1 will never want feature X, but team2 wants it, is that correct? (and of course it seems to be strictly for users, not for us as a way to "hide" an experimental feature - they manage the flags however they want for how long they want)
So in my understanding, it might be confusing to call this "feature flags" though I certainly see the similarities.
concerns about turning on features for a bunch of hosts at once
I'm not sure the config approach of setting specific host IDs scales very well though. E.g. if I manage 20K hosts, it would be nicer to be able to say "enable for 10%" or "enable for hosts in this label", something like that, although that means having to manage some state for the percentage, and the label membership is dynamic. Thinking out loud, but to assist users with many hosts, we could have tooling in fleetctl (and/or the UI) to support rolling out in this way (x%, or hosts in "label", etc.) that would take a snapshot of matching host ids and set the config using host ids, so that there's no state to manage elsewhere and the label membership is just that - a snapshot at that moment in time. When you grow the rollout, it ensures the current host IDs stay in the set and only adds new ones.
Overall, I'm +1 too for the proposal - there's some complexity around allowing different types for the same config value, but nothing crazy.
I have some concerns about being sure to properly handle the now per-host application of the features in all cases, it might get very tricky depending on the feature (e.g. you can't just check if the feature is enabled and if so, do a batch SELECT of things related to that feature - you have to check if each affected host has the feature enabled).
My concerns would get clearer once we design this for a specific feature, but to give an hypothetical example, if the failing policies webhook integration was controlled like this, we'd have to add checks for each failed policy if it contains any host that has the feature enabled, ignore it if it only has hosts with the feature disabled, and only send the list of hosts with the feature enabled if it has a mix of both.
There was a problem hiding this comment.
So we should give people running Fleet tools to ... use a feature within the scope that is useful for them.
The wording from the doc above suggests that this is intended as a config option for users, to control whether certain features are enabled
it would be nicer to be able to say "enable for 10%" or "enable for hosts in this label"
I think a percentage rollout would be a great idea, especially for things that affect performance.
I have some concerns about being sure to properly handle the now per-host application of the features in all cases, it might get very tricky depending on the feature (e.g. you can't just check if the feature is enabled and if so, do a batch SELECT of things related to that feature - you have to check if each affected host has the feature enabled).
The best way I can think of implementing this right now is having a separate table you can join hosts with to check whether a feature is enabled eg host_features
There was a problem hiding this comment.
I intentionally left percentages out because of its added complexity. We can add it later on. If you want progressive rollout, either assign teams or specific hosts. If you want a range, we can make fleetctl auto generate the config based on a range, like @mna mentioned.
As for the wording, yup, it was left phrased like that so that this would serve both purposes: full AND partia rollouts. We can disagree on wether we want this, but the idea is: what's the simplest we could do that could serve the most use cases we've heard of?
There was a problem hiding this comment.
I can see percentages being useful, but I can also see customers requesting other ways to control this, I think we should ask around before settling. One crazy idea could be to control the target hosts via an osquery query
There was a problem hiding this comment.
One crazy idea could be to control the target hosts via an osquery query
Not so crazy! This is basically (iiuc) being able to target labels :)
|
+1. I think this is a good low-risk improvement. We can start with the shown |
|
Am using |
Hm, well, I didn't think of throwing this in docs because it didn't feel like a doc... but if it is approved and merged, it would technically serve as the doc for the feature, so it's not a bad idea at all! |
| enabled by default, everybody would love them all, and they would work flawlessly for all possible use cases. | ||
|
|
||
| We are in the real world, though, which is not ideal. So we should give people running Fleet tools to rollout features | ||
| slowly, so that they can update infrastructure if needed, or only use a feature within the scope that is useful for |
There was a problem hiding this comment.
I think this is a great idea, should we put guidelines or a DRY individual to make the final decision regarding which features can be configured?
I can imagine this feature left unused because of the extra work, or being overused and suddenly everything is behind a config.
There was a problem hiding this comment.
It'll be work to strike a balance. The default is probably NOT to use this unless there's a need.
|
|
||
| We would create a new type of boolean value in our `AppConfig` called `RolloutBoolean`. | ||
|
|
||
| `RolloutBoolean` will have a function `Enabled(h *fleet.Host) bool`. So instead of doing this: |
There was a problem hiding this comment.
nice! as a general comment, I think it would be useful to include which RolloutBoolean values are true in the error instance we store in Redis.
There was a problem hiding this comment.
Could you expand on your comment? I'm not sure I follow.
There was a problem hiding this comment.
yeah, sorry that was pretty abstract. I meant it would be nice to ultimately include which flags are enabled in fleetctl debug errors and/or fleetctl debug archive
| We would do: | ||
|
|
||
| ```go | ||
| if ac.HostSettings.EnableHostUsers.Enabled(host) { |
There was a problem hiding this comment.
very much a nit, but it would be nice to think about a naming convention for these booleans that reads better than EnableHostUsers.Enabled
There was a problem hiding this comment.
+1 for changing the name of this method, Enabled is a little too specific as well. I would propose something generic like Value or Get to retrieve the actual value of the boolean based on the host.
|
@chiiph this is looking great overall! What is the reasoning behind starting with three options (team, platform, and host) ?
I'm thinking we could start with just teams. And the following becomes the documented way for test performance of new features:
This also enables us tell the story of Fleet Premium being the paid, "enterprise edition" of Fleet. If you want to be able to test performance of new features, because you have performance concerns for your organization, sign up for Fleet Premium. |
Basically the complexity of doing all three is not much greater than doing just one, and we would cover more use cases that would lower the barrier of adoption/rollout of features we've seen struggles with being adopted in orgs. |
|
Thanks folks! The way we're going to proceed with this is to take this proposal as "here's what engineering thinks is possible/practical". Then Noah and I will work together to craft the "interface" portions of this and come back to Platform with prioritization for what gets implemented. |
| default: false | ||
| overrides: | ||
| platforms: | ||
| - linux: true |
There was a problem hiding this comment.
IDEA: To simplify first iteration, we should not allow users to add more than one overrides field.
You need to choose one of teams, hosts_ids or platforms, otherwise we would have to define a preference order between the three fields:
---
apiVersion: v1
kind: config
spec:
host_settings:
enable_software_inventory:
default: false
overrides:
teams:
- team1: true
host_ids:
- 3214: true
platforms:
- linux: false(E.g. What do we do if host 3214 is a linux host in team1?)
|
Merging this since we're going to implement it asap. |
Proposal for allow features to be rolled out in a more controlled way.