[Fleet] Add agent policy API to add settings not yet supported by UI #158699

juliaElastic · 2023-05-31T08:08:46Z

To increase supportability, Fleet should support a way to add custom settings to agent policies.
This will enable unblocking clients when a policy change is required to fix agents managed by Fleet.
Today this is not easy because Fleet UI/API only enables updating certain fields in agent policy.

Use cases:
Global parameters on the agent policy (monitoring, timeout settings)

Tasks:

Create API: POST /api/fleet/agent_policies/:id/custom_settings
- The request body should be key value pairs that are going to be appended to the agent policy
Make sure existing agent policy API updates don't remove custom settings
The custom settings should be visible in the View policy UI as yaml

Open question:

Do we need any whitelisting on which fields can be added as custom settings?
Do we allow to override existing fields in custom settings e.g. agent.download.sourceURI, outputs?

cc @joshdover

The text was updated successfully, but these errors were encountered:

elasticmachine · 2023-05-31T08:08:48Z

Pinging @elastic/fleet (Team:Fleet)

joshdover · 2023-05-31T08:22:14Z

The custom settings should be visible in the View policy UI as yaml

I think they also need to be visible at the bottom of the "settings" tab of the Agent policy. It's relevant there because these settings could override other things in the policy and create issues. I'd even argue they should be shown in a yellow callout banner w/ a link to API documentation on how to remove them.

/api/fleet/agent_policies/:id/custom_settings

Looking at this again I think the API should be /api/fleet/agent_policies/:id/overrides instead to make it more obvious this is out-of-the-norm option.

Do we need any whitelisting on which fields can be added as custom settings?

Do we allow to override existing fields in custom settings e.g. agent.download.sourceURI, outputs?

I think we should make this API quite flexible so it can be used to workaround any bugs in Fleet. If we restrict this too much it will lose it's value. That said, I think it may make sense to not support any keys with the inputs prefix as it's highly unlikely these overrides will work for overriding keys in that array and we have other ways to ship fixes out-of-band to that part of the config (a prerelease version of a package).

joshdover · 2023-05-31T08:22:23Z

cc @cmacknz

juliaElastic · 2023-05-31T08:41:47Z

I think they also need to be visible at the bottom of the "settings" tab of the Agent policy

Do you mean as a read only box or as a textarea to update? The latter increases the scope slightly to add yaml validation, something like we have in Fleet Server integration policy "Advanced YAML settings"

joshdover · 2023-05-31T09:10:36Z

Readonly is fine IMO

joshdover · 2023-05-31T09:11:51Z

I would even prefer we break this into two phases and do 0 UI changes in the first phase. Most important thing is to unblock fixing customer deployments. Next most important thing is helping users know when they're still using these overrides.

nchaulet · 2023-06-05T19:47:34Z

Do we need a new API for that could it be an extra property on the existing agent policy API?

POST /agent_policies
{
"overrides": {
"key": "val"
}
}

juliaElastic · 2023-06-06T12:56:44Z

Do we need a new API for that could it be an extra property on the existing agent policy API?

We could, though perhaps the intention was to keep the overrides as an escape hatch and not include in the main API as it is not intended for normal use.

nchaulet · 2023-06-07T15:04:34Z

We could, though perhaps the intention was to keep the overrides as an escape hatch and not include in the main API as it is not intended for normal use.

I think with or without a new API this should be pointed in the API doc. I do not have a strong argument for a new API or reuse the policy API for that. Reusing the policy API is maybe a little simpler to implement/maintains.

jlind23 · 2023-06-07T17:33:36Z

I think with or without a new API this should be pointed in the API doc.

I agree with that.

I do not have a strong argument for a new API or reuse the policy API for that

Aren't we mixing things here? I would rather add a new experimental API, that could lately be removed if needed. Experimental will also avoid us officially supporting it.

joshdover · 2023-06-08T07:13:53Z

One reason we were thinking that it should be a separate API is to prevent accidental removals of any custom settings. For example, if we use the existing API we have to make sure that PUT /agent_policies from our agent policy settings page doesn't forget to include the previous overrides. This is just one example, others would be impacted as well.

I do agree it would be more straightforward though to not handle this as a special case with a separate API. Another option could be to use the existing API but require that clients explicitly set an overrides key to null in order to remove these settings on PUTs.

nchaulet · 2023-06-08T15:07:10Z

One reason we were thinking that it should be a separate API is to prevent accidental removals of any custom settings. For example, if we use the existing API we have to make sure that PUT /agent_policies from our agent policy settings page doesn't forget to include the previous overrides. This is just one example, others would be impacted as well.

Actually all of the PUT API fleet API are in fact PATCH so if the user do not explicitly send a null value it will not be cleared.

lucabelluccini · 2023-06-13T09:48:20Z

To improve supportability:

Can we highlight a custom setting has been pushed to Elastic Agents in the diagnostic?
The idea is to immediately spot if a user is adding custom settings. As the custom settings will not be validated and they can be any setting we usually allow in a Beat product, it is crucial to spot if a user added them without manually checking every single rendered configuration on Elastic Agent.
Can we ensure we have a documentation and (when it will exist) a warning in the custom setting section of the policy UI to warn we do not test against any possible combination of settings which will be passed down to the components of Elastic Agent and things can break?
Clearly state the maturity (tech preview, etc...) of this feature in docs so that users will not abuse of this?

jlind23 · 2023-06-13T15:41:13Z

Can we highlight a custom setting has been pushed to Elastic Agents in the diagnostic?
The idea is to immediately spot if a user is adding custom settings. As the custom settings will not be validated and they can be any setting we usually allow in a Beat product, it is crucial to spot if a user added them without manually checking every single rendered configuration on Elastic Agent.

Custom settings will be set the same way as standard settings so i'm not sure this is something we should highlight. However we might probably want to create an audit log entry for this.

We should probably pull @kilfoyle and @karenzone to discuss the docs options here.

lucabelluccini · 2023-06-13T16:30:43Z

Custom settings will be set the same way as standard settings so i'm not sure this is something we should highlight.

Yes, but those settings are not getting validated (at least from what I understood from the issue) and they can break the behavior (we will not test any possible combination of them + they will possibly override some defaults we are using).

E.g. if a user adds:

queue.disk:
  max_size: 10GB

What will happen?
If this setting gets pushed down from EA to components:

Each component might generate a queue consuming up to 10GB (they're not preallocated but still could happen)?
When we upgrade, are the queues going to be copied over?

cmacknz · 2023-06-13T17:54:13Z

If a user adds queue.disk to the agent policy today, nothing will happen because the agent policy doesn't support queue configuration yet. This would just be ignored.

If elastic/beats#35615 is implemented as planned, then in theory they could try to configure the disk queue before it is supported properly (sharing the queue across upgrades for example), and the agent should probably refuse to configure it in that case.

In general if someone puts valid YAML into the agent policy for a feature or directive that isn't supported it will be silently ignored. This is the same behavior Beats has. This probably isn't ideal but fixing this is challenging, ideally we would generate an error or warning when this happens.

lucabelluccini · 2023-06-13T18:45:13Z

If a user adds queue.disk to the agent policy today, nothing will happen because the agent policy doesn't support queue configuration yet. This would just be ignored.

Thank you @cmacknz I agree - I am aware the queue settings are ignored at the moment.

But if we implement this feature - we are opening the door to users possibly expecting any setting which is injected via the YAML section will work.

We already had situations where:

users struggle to know if a setting is taken in account or not (due to the fact Beats doesn't validate the settings) - customers add a setting with a typo, it is silently accepted but it's actually not used because it is never parsed by the binaries/components
users might use valid settings (example queue.disk) to overcome to some current limitations with Elastic Agent and expect it to work in production, while we're introducing this setting to apply some "light" workaround

That's why I would like to make it clear in the diagnostics and Elastic Agent policy the custom settings added by the user are visible as it might be crucial to spot if the user injected some "unsupported" setting in the freeform YAML.

kilfoyle · 2023-06-13T21:51:29Z

Regarding the docs:

Can we ensure we have a documentation and (when it will exist) a warning in the custom setting section of the policy UI to warn we do not test against any possible combination of settings which will be passed down to the components of Elastic Agent and things can break?

As far as I know the only API docs we have currently are these generated API docs which will include the newly added API descriptions, but when the settings UI is updated we can certainly also add some warning text to the corresponding Fleet UI settings page.

nimarezainia · 2023-06-21T06:50:25Z

Would it be possible to get the full list of what possible configuration can be applied here?

My initial gut feel is that this is only a tool for Support/Engineering to resolve customer issues and not something that we want to publicize too much in our documentation. I would like to avoid unintended consequences of users applying a config to the policy.

juliaElastic · 2023-06-27T09:38:00Z

One use case to use this API is to configure the agent download timeout: elastic/elastic-agent#4580

To set download timeout:

Create an agent policy
Take the agent policy id, name, namespace (mandatory for overrides request)
Update agent policy with overrides

PUT kbn:/api/fleet/agent_policies/24292680-158e-11ee-b8ce-f39328f70950
{
   "name": "Test policy",
    "namespace": "default",
    "overrides": {
         "agent": {
              "download": {
                "timeout": "120s"
              }
            }
    }
}

Go to Fleet / Agent policy / Click policy / View policy
Verify that the download timeout is set under agent

TODO: Verify that the download timeout applies when the agent is being upgraded

juliaElastic added Team:Fleet Team label for Observability Data Collection Fleet team Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. labels May 31, 2023

joshdover changed the title ~~[Fleet] Add agent policy API to save custom settings~~ [Fleet] Add agent policy API to add settings not yet supported by UI May 31, 2023

nchaulet self-assigned this Jun 6, 2023

nchaulet mentioned this issue Jun 9, 2023

[Fleet] Allow to overrides agent policy #159414

Merged

2 tasks

cmacknz mentioned this issue Jun 13, 2023

Allow queue configuration to be specified under the output type elastic/beats#35615

Closed

2 tasks

nchaulet closed this as completed in #159414 Jun 14, 2023

cmacknz mentioned this issue Jun 21, 2023

Allow underlying beats module configuration overrides in Elastic Agent from Fleet integration configuration page #155339

Closed

juliaElastic mentioned this issue Aug 21, 2023

[Fleet] The agent upgrade download timeout should be configurable. elastic/elastic-agent#4580

Open

cmacknz mentioned this issue Jul 4, 2023

E2E test TestFleetManagedUpgrade fails when attempting to poll for the watcher to complete elastic/elastic-agent#2977

Open

cmacknz mentioned this issue Oct 11, 2023

Elastic Agent always starts http monitoring service even if metrics are disabled elastic/elastic-agent#2734

Closed

kilfoyle mentioned this issue Mar 5, 2024

Add custom settings page elastic/ingest-docs#962

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fleet] Add agent policy API to add settings not yet supported by UI #158699

[Fleet] Add agent policy API to add settings not yet supported by UI #158699

juliaElastic commented May 31, 2023 •

edited

Loading

elasticmachine commented May 31, 2023

joshdover commented May 31, 2023

joshdover commented May 31, 2023

juliaElastic commented May 31, 2023

joshdover commented May 31, 2023

joshdover commented May 31, 2023

nchaulet commented Jun 5, 2023

juliaElastic commented Jun 6, 2023

nchaulet commented Jun 7, 2023

jlind23 commented Jun 7, 2023

joshdover commented Jun 8, 2023

nchaulet commented Jun 8, 2023

lucabelluccini commented Jun 13, 2023

jlind23 commented Jun 13, 2023

lucabelluccini commented Jun 13, 2023

cmacknz commented Jun 13, 2023

lucabelluccini commented Jun 13, 2023

kilfoyle commented Jun 13, 2023

nimarezainia commented Jun 21, 2023 •

edited

Loading

juliaElastic commented Jun 27, 2023 •

edited

Loading

[Fleet] Add agent policy API to add settings not yet supported by UI #158699

[Fleet] Add agent policy API to add settings not yet supported by UI #158699

Comments

juliaElastic commented May 31, 2023 • edited Loading

elasticmachine commented May 31, 2023

joshdover commented May 31, 2023

joshdover commented May 31, 2023

juliaElastic commented May 31, 2023

joshdover commented May 31, 2023

joshdover commented May 31, 2023

nchaulet commented Jun 5, 2023

juliaElastic commented Jun 6, 2023

nchaulet commented Jun 7, 2023

jlind23 commented Jun 7, 2023

joshdover commented Jun 8, 2023

nchaulet commented Jun 8, 2023

lucabelluccini commented Jun 13, 2023

jlind23 commented Jun 13, 2023

lucabelluccini commented Jun 13, 2023

cmacknz commented Jun 13, 2023

lucabelluccini commented Jun 13, 2023

kilfoyle commented Jun 13, 2023

nimarezainia commented Jun 21, 2023 • edited Loading

juliaElastic commented Jun 27, 2023 • edited Loading

juliaElastic commented May 31, 2023 •

edited

Loading

nimarezainia commented Jun 21, 2023 •

edited

Loading

juliaElastic commented Jun 27, 2023 •

edited

Loading