Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature flag #56

Merged
merged 40 commits into from
Mar 9, 2023
Merged

Add feature flag #56

merged 40 commits into from
Mar 9, 2023

Conversation

AndersonQ
Copy link
Member

@AndersonQ AndersonQ commented Feb 1, 2023

closes #55
relates elastic/elastic-agent#2185

Testing notes

See testing notes on elastic/elastic-agent#2218.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 1, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-03-09T11:50:52.991+0000

  • Duration: 3 min 6 sec

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@AndersonQ AndersonQ marked this pull request as ready for review February 1, 2023 14:01
@AndersonQ AndersonQ self-assigned this Feb 1, 2023
elastic-agent-client.proto Outdated Show resolved Hide resolved
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Feb 1, 2023
elastic-agent-client.proto Outdated Show resolved Hide resolved
// Expected returns the expected state and config for the unit.
func (u *Unit) Expected() (UnitState, UnitLogLevel, *proto.UnitExpectedConfig) {
// Expected returns the expected state, log leve, features and config for the unit.
func (u *Unit) Expected() (UnitState, UnitLogLevel, *proto.Features, *proto.UnitExpectedConfig) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing ExpectedConfig returned value, do we have PRs on all repos using client, so we're aligned? endpoint needs to be aware as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but they don't have to update right away. Anyway it's just a return value to ignore, it's easy to do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the Go client, only changes in the .proto file affect endpoint. That said this feature will not fully work unless we align with endpoint, cloudbeat, and APM as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure the features makes sense as a per unit return value here, since the feature flags are not actually per unit.

A separate Features() channel seems like a better approach, that way users of this don't need to filter out and separately handle the feature changes from the unit changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what we were talking with anderson in the mornign. providing a separate channel to watch inside checkin loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, what is done now I think breaks the client API contract #56 (comment)

We need to change this before we can merge it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's a per component configuration. Sneaking it into the current config seemed fine to me before I understood how the protocol is actually implemented. Also I didn't want to make any big change to try to make to 8.7.

However I do think a change in the feature flag is ultimately a change in config. Pursuing the idea of making it already ready for future features that might be behind a feature flag, I can imagine a scenario where we'd need to reload components (in the generic sense, not as a beat) to properly apply the feature flags. And I don't know if it'd be a good idea having 2 flows reloading components, one for config and another for feature flags

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the feature flags have been handled in the right place in the client code.

if u.configIdx != cfgIdx {
u.configIdx = cfgIdx
if !gproto.Equal(u.config.GetSource(), cfg.GetSource()) {
if !gproto.Equal(u.config.GetSource(), cfg.GetSource()) ||
!gproto.Equal(u.features, features) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be a source of bugs, because we will now send unchanged unit configurations to clients of this code and they will need to filter them out.

The intent of the elastic agent client is to only send unit modified events to a client of the package when the actual unit configuration changed. This breaks that contract, and will send unchanged unit configurations if only the feature flags have been modified.

Copy link

@brian-mckinney brian-mckinney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review applies only to the .proto changes. Adding the Features as a top level field will work better for Endpoint as well. 👍

@cmacknz
Copy link
Member

cmacknz commented Feb 8, 2023

The team spoke about what we should do in the Go client this morning and concluded:

  1. When a feature changes, we should trigger a UnitModified event for every unit.
  2. The UnitModified event should be extended to included the set of changes that triggered the event. The set of possible change triggers right now is a change to a Feature, a change to the log level, or a change to the Unit configuration. A single UnitModified event must be associated with at least one of these events, but may also include all of them together if all of them have changed.

We agreed that the changes to the .proto files are good as they are.

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API design is looking good, minus a few nits and missing tests.

elastic-agent-client.proto Outdated Show resolved Hide resolved
magefile.go Outdated Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
pkg/client/unit.go Outdated Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
case change := <-validClient.UnitChanged():
for _, t := range change.Triggers {
if t == TriggerFeature {
gotFQDN = change.Features.FQDN.Enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having a trigger per feature?

It seems like as soon as we support more than one feature we will force every user of this package to have to iterate over all of them to find what changed. Having something like TriggerFQDNFeature instead seems like it would be more convenient, at the cost of an additional enum entry per feature.

go func() {
for {
select {
case <-ctx.Done():
return
case change := <-client.UnitChanges():
case change := <-client.UnitChanged():
for _, t := range change.Triggers {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be testing that each of the trigger types works, not just the feature triggers.

pkg/client/unit.go Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
elastic-agent-client.proto Show resolved Hide resolved
pkg/client/artifact_test.go Outdated Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
elastic-agent-client.proto Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
pkg/client/client_v2.go Outdated Show resolved Hide resolved
errCh chan error
changesCh chan UnitChanged
featuresMu sync.Mutex
features Features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to need to add the featuresIdx here as well. I would start the initial index at 1.

There is a reason for starting it at 1, and that is backwards compatibility. With older clients, I would modify the Elastic Agent to read the observed features idx at 0, as the client doesn't understand features and we should not trigger an unsettle when it doesn't match, because it will never match. If it gets a 1, then it knows its a client that supports it but doesn't know about it. Or you add a boolean field in the observed message to signal that the client supports features.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, I'm not sure we need features (and related fields) here on the clientV2 struct at all, since features is a field on the Unit struct (similar to logLevel).

@AndersonQ AndersonQ force-pushed the 55-feature-flag branch 3 times, most recently from 30ce399 to 07faf31 Compare February 27, 2023 19:00
}

// Expected returns the expected state, log level, features and config for the unit.
func (u *Unit) Expected() Expected {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely falls outside the scope of this PR, but I think we're really overloading the Expected() call. This adds a lot of complexity to clients, which now have to figure out what's actually changed every time a unit comes across the V2 interface. The shipper already does this, as some additional logic is required to see if a unit update is just a log level change. In the future, we might want to break out some of the data passed in the Expected struct into separate channels. It's probably too late to do that here though, and I don't wanna delay this PR any more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be misunderstanding but I'm not sure much has changed from a "contract" perspective here? It looks like, before this PR Expected() was returning three values whereas it's returning four values now, but encapsulated in an Expected struct object. So callers will need to be updated whenever they bump up the version of their elastic-agent-client dependency.

That said, I see your point about Expected() returning too much information (even prior to this PR), which burdens the caller with having to figure out what's actually changed in a unit. I've created #62 to discuss and track this work.

@AndersonQ AndersonQ force-pushed the 55-feature-flag branch 2 times, most recently from eab0963 to 8ff7d95 Compare March 1, 2023 20:15
@ycombinator
Copy link
Contributor

There are a number of cleanup changes in this PR that aren't strictly related to the feature flag implementation. To make it easier to review this PR (for selfish reasons 😄), I've extracted the cleanup changes into their own PR: #61.

Once that cleanup PR is merged, I'll rebase this one on main so it ends up containing only FQDN-related changes.

@ycombinator ycombinator added enhancement New feature or request backport-v8.7.0 Automated backport with mergify labels Mar 8, 2023

const (
// UnitChangedAdded is when a new unit is added.
UnitChangedAdded UnitChangedType = 1
UnitChangedAdded UnitChangedType = iota // unit_added
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having added as a default value for ChangedType, on purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch. I'm not sure this is a good idea either. And, besides, this being an exported symbol, the change from 1 => 0 (and thereon) could break existing consumers of this package? I'll reset this to 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

TriggeredNothing Trigger = 0 // nothing_triggered

// TriggeredConfigChange indicates a change in config triggered the change.
TriggeredConfigChange Trigger = 1 << iota // config_change_triggered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you leave a comment that event can have multiple triggers and this is a mask so nobody accidentally changes this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@michalpristas michalpristas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it looks good. alex had a good comment about Expected though

@ycombinator
Copy link
Contributor

alex had a good comment about Expected though

++ addressed it in the same thread; tl;dr: see #62.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.7.0 Automated backport with mergify enhancement New feature or request Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add feature flags to the control protocol
9 participants