[Elastic Agent] Report running processes and their health statuses #2156

jen-huang · 2021-06-23T23:12:09Z

This is related to elastic/kibana#75236 and elastic/kibana#99068, both of which are longer-term efforts around enabling more granular status reporting of "integrations" that are running on Elastic Agent. But Agent has no concept of integrations, only which inputs/processes are running.

Still, reporting that information is useful and would get us closer to our longer-term goals. In the short term, this would enable Endpoint to filter agents by which ones are running Endpoint without doing additional JOIN-like queries.

I'd like to propose that agents:

Report what inputs/processes are running
Report the health status of each
Store the above information in the local_metadata field

One thing to consider in deciding the data structure for storing this information is that in the future we will want to allow subprocesses to report their own additional meta information, such as the Endpoint process reporting an "isolated" status.
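To make the data-structure question concrete, here is a minimal sketch of one possible shape, not a settled design. The `processes` key, the `status` and `meta` fields, and the helper names are all assumptions made for illustration; only `local_metadata` and the "isolated" example come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ProcessReport:
    """One running input/process as reported by the agent (hypothetical shape)."""
    name: str
    status: str
    # Free-form extras a subprocess may attach, e.g. {"isolated": True}.
    meta: Dict[str, Any] = field(default_factory=dict)


def build_local_metadata(reports: List[ProcessReport]) -> Dict[str, Any]:
    """Key the reports by process name so a lookup needs no JOIN-like query."""
    return {
        "processes": {
            r.name: {"status": r.status, "meta": r.meta} for r in reports
        }
    }


def agents_running(agents: Dict[str, Dict[str, Any]], process_name: str) -> List[str]:
    """Return the ids of agents whose metadata lists the given process."""
    return [
        agent_id
        for agent_id, metadata in agents.items()
        if process_name in metadata.get("processes", {})
    ]
```

Keying by process name keeps the "which agents run Endpoint" filter to a single field check, and the open-ended `meta` dictionary leaves room for subprocess-specific fields without changing the outer structure.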

elasticmachine · 2021-06-23T23:12:11Z

Pinging @elastic/agent (Team:Agent)

mostlyjason · 2021-06-24T12:58:07Z

@kevinlog What kind of health status info do you want reported? I saw you have policy response data that seems to indicate whether it's running successfully. I suppose that only covers initialization, not if the endpoint becomes unhealthy later?

urso · 2021-06-24T18:34:47Z

@mostlyjason don't we already have another meta-issue regarding status reporting?

kevinlog · 2021-06-24T23:01:48Z

@mostlyjason

> What kind of health status info do you want reported? I saw you have policy response data that seems to indicate whether it's running successfully. I suppose that only covers initialization, not if the endpoint becomes unhealthy later?

Endpoint will periodically update its Policy Response if there are meaningful events that change Endpoint's compliance with how the user configured it, so it could change during its lifecycle.

@ferullo could give more details on when this may happen.

mostlyjason · 2021-06-28T19:42:42Z

@kevinlog Do we need another health status reporting mechanism if we already have policy response status? What additional use cases do you require that are not offered by the policy response status?

kevinlog · 2021-07-06T21:19:16Z

@mostlyjason sorry I missed this the first time.

> Do we need another health status reporting mechanism if we already have policy response status? What additional use cases do you require that are not offered by the policy response status?

I don't believe Endpoint needs another mechanism; I just think that Fleet users may want additional insight if a subprocess isn't running correctly. Policy compliance for Endpoint is big, so if that's in a "Failed" state, it would be good to bubble that up to Agent so that it can be reported in the UI. Otherwise, all Agents show as "Healthy".

I think we could do this in a generic way so that Integrations have the option to ship a "Success/Failure/Warning" status to let Fleet users know something isn't right. Then they could drill down to individual Agents or solutions to investigate.
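As a rough illustration of the generic status idea, the sketch below rolls per-integration statuses up into one agent-level health value by taking the worst one. The `Status` enum, the `agent_health` function, and the "degraded" label for warnings are assumptions for illustration, not existing Fleet or Agent behavior.

```python
from enum import IntEnum
from typing import Iterable


class Status(IntEnum):
    """Hypothetical integration statuses, ordered by severity."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


def agent_health(statuses: Iterable[Status]) -> str:
    """Roll up integration statuses: the worst one decides the agent's health."""
    worst = max(statuses, default=Status.SUCCESS)
    if worst == Status.FAILURE:
        return "unhealthy"
    if worst == Status.WARNING:
        return "degraded"
    return "healthy"
```

With this rule, a single failing integration is enough to stop the agent from showing as healthy, which is the bubbling-up behavior described above.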

Let me know if that makes sense

mostlyjason · 2021-07-12T14:11:55Z

++ sounds like a good idea to make policy responses a generic feature for all integrations. I haven't seen how it works currently, but conceptually it sounds good because it would provide a more structured error we could show on the agent details page, without the user having to dig through logs. It's also nice to have a uniform behavior if we don't have it already.

++ on having a failure response status put the agent into an unhealthy state so we keep our states consistent. Again, I'm not sure how that bubbles up but it sounds good conceptually.

As a general principle, I think we don't expose processes to users directly, but the policy response could contain an aggregate of failures across all processes. We could show this aggregate info on the agent details page without exposing the underlying processes in the schema; exposing them may result in a breaking change for users if we remove or change them in the future.
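One way to picture such an aggregate is sketched below: per-process results are collapsed into counts and failure messages, and the process names never reach the output. The input keys (`process`, `status`, `message`) and the output field names are hypothetical and are not taken from the actual policy response schema.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(process_results: List[Dict[str, str]]) -> Dict[str, Any]:
    """Aggregate per-process results without exposing process names."""
    counts = Counter(result["status"] for result in process_results)
    failure_messages = [
        result["message"]
        for result in process_results
        if result["status"] == "failure"
    ]
    return {
        "total": len(process_results),
        "failed": counts["failure"],
        "warnings": counts["warning"],
        "failure_messages": failure_messages,
    }
```

Because only counts and messages are surfaced, renaming or removing an underlying process later would not change the shape that users see.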

@jen-huang are you aligned on not exposing processes to users in the schema? How do you see this aligning with policy responses? Would it help to have a formal definition/design step for this issue?

botelastic · 2022-07-19T15:18:42Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

jen-huang · 2023-01-23T22:10:24Z

@pierrehilbert @nimarezainia Not sure if we have an appropriate meta issue that can supersede this one, so I am reopening for now but feel free to close and redirect.

pierrehilbert · 2023-01-26T15:57:23Z

We have this one: https://github.com/elastic/ingest-dev/issues/1367

jlind23 · 2024-05-14T07:07:42Z

Closing this as done.
cc @ycombinator

jen-huang added the Team:Elastic-Agent label Jun 23, 2021

nimarezainia added the 7.16-candidate label Jul 19, 2021

blakerouse mentioned this issue Aug 13, 2021

[Agent] Determining Agent Capabilities in fleet elastic/beats#27366

Closed

botelastic bot added the Stalled label Jul 19, 2022

botelastic bot closed this as completed Jan 15, 2023

jen-huang transferred this issue from elastic/beats Jan 23, 2023

jen-huang reopened this Jan 23, 2023

stale bot removed the Stalled label Jan 23, 2023

joshdover mentioned this issue Jan 26, 2023

Display a warning banner when Elastic Agents receives a high number of 429 from Elasticsearch elastic/kibana#140157

Closed

jlind23 closed this as not planned Feb 13, 2023

jlind23 reopened this Feb 13, 2023

jlind23 closed this as completed May 14, 2024
