[Elastic Agent] Report running processes and their health statuses #2156
Pinging @elastic/agent (Team:Agent)
@kevinlog What kind of health status info do you want reported? I saw you have policy response data that seems to indicate whether it's running successfully. I suppose that only covers initialization, not whether Endpoint becomes unhealthy later?
@mostlyjason don't we already have another meta-issue regarding status reporting?
Endpoint will periodically update its Policy Response if meaningful events change Endpoint's compliance with how the user configured it, so the response can change over Endpoint's lifecycle. @ferullo could give more details on when this may happen.
@kevinlog Do we need another health status reporting mechanism if we already have policy response status? What additional use cases do you have that the policy response status doesn't cover?
@mostlyjason sorry I missed this the first time.
I don't believe Endpoint needs another mechanism; I just think Fleet users may want additional insight if a subprocess isn't running correctly. Policy compliance is a big deal for Endpoint, so if it's in a "Failed" state, it would be good to bubble that up to Agent so it can be reported in the UI. Otherwise, all Agents appear "Healthy". I think we could do this in a generic way so that Integrations have the option to ship a "Success/Failure/Warning" status to let Fleet users know something isn't right. They could then drill down to individual Agents or solutions to investigate further. Let me know if that makes sense.
++ sounds like a good idea to make policy responses a generic feature for all integrations. I haven't seen how it works currently, but conceptually it sounds good because it would provide a more structured error we could show on the agent details page, without the user having to dig through logs. It's also nice to have uniform behavior if we don't have it already. ++ on having a failure response status put the agent into an unhealthy state so we keep our states consistent. Again, I'm not sure how that bubbles up, but it sounds good conceptually. As a general principle, I think we shouldn't expose processes to users directly, but the policy response could contain an aggregate of failures across all processes. We could show this aggregate info on the agent details page without exposing the underlying processes in the schema, which could cause breaking changes for users if we remove or change those processes in the future. @jen-huang are you aligned on not exposing processes to users in the schema? How do you see this aligning with policy responses? Would it help to have a formal definition/design step for this issue?
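To make the aggregation idea concrete, here is a minimal Python sketch. It is not code from Elastic Agent or Fleet. It assumes each process reports one of the Success/Failure/Warning statuses mentioned above. It also assumes that any Failure marks the agent unhealthy while a Warning does not, since the thread only specifies the Failure case. The names `Status`, `ProcessReport`, and `summarize` are hypothetical. The summary carries the worst status and the failure messages but deliberately leaves out process names, so the exposed shape would not change if processes were added or removed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical per-process status, ordered by severity so max() picks the worst."""
    SUCCESS = 0
    WARNING = 1
    FAILURE = 2


@dataclass(frozen=True)
class ProcessReport:
    """Hypothetical report that a single process hands to the agent."""
    name: str          # internal process name; never copied into the summary
    status: Status
    message: str = ""  # human-readable detail, e.g. why a policy failed


def summarize(reports):
    """Collapse per-process reports into one agent-level summary.

    Assumption: any FAILURE makes the agent unhealthy; WARNING is surfaced
    in 'status' but leaves the agent healthy.
    """
    worst = max((r.status for r in reports), default=Status.SUCCESS)
    failed = [r for r in reports if r.status == Status.FAILURE]
    return {
        "status": worst.name.lower(),
        "healthy": worst != Status.FAILURE,
        "failure_count": len(failed),
        # Only messages are aggregated, so process names stay out of the schema.
        "failures": [r.message for r in failed if r.message],
    }


if __name__ == "__main__":
    reports = [
        ProcessReport("endpoint", Status.FAILURE, "policy response: failed"),
        ProcessReport("filebeat", Status.SUCCESS),
    ]
    print(summarize(reports))
```

Ordering the statuses by severity means "worst status wins" reduces to a single `max()`. An agent with no reporting processes defaults to success.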
Hi! We're labeling this issue as
@pierrehilbert @nimarezainia Not sure if we have an appropriate meta issue that can supersede this one, so I am reopening for now, but feel free to close and redirect.
We have this one: https://github.com/elastic/ingest-dev/issues/1367
Closing this as done.
This is related to elastic/kibana#75236 and elastic/kibana#99068, both of which are longer-term efforts around enabling more granular status reporting of "integrations" that are running on Elastic Agent. However, Agent has no concept of integrations; it only knows which inputs/processes are running.
Still, reporting that information is useful and would get us closer to our longer-term goals. In the short term, this would enable Endpoint to filter agents by which ones are running Endpoint without doing additional JOIN-like queries.
I'd like to propose that agents:
local_metadata field

One thing to consider in deciding how this information should be stored is that, in the future, we will want to allow subprocesses to report their own additional meta information, such as the Endpoint process reporting an "isolated" status.
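One possible shape for this data is sketched below in Python, purely as an illustration. Each running process gets a small fixed core (name and status) plus an open-ended meta object, where a subprocess such as Endpoint could later add extra keys like isolated. The `processes` key, the `unknown` default status, and the helpers `build_local_metadata` and `runs_process` are all hypothetical and not part of the real Fleet agent schema. The second helper illustrates the short-term benefit described above: filtering for agents that run Endpoint by reading each agent's own metadata, without additional JOIN-like queries.

```python
import copy


def build_local_metadata(base, processes):
    """Return a copy of `base` with a hypothetical 'processes' list attached.

    Each entry has a fixed core (name, status) plus an open-ended 'meta'
    object, so a subprocess such as Endpoint could later add its own keys
    (e.g. 'isolated') without changing the core fields.
    """
    doc = copy.deepcopy(base)  # never mutate the caller's metadata
    doc["processes"] = [
        {
            "name": p["name"],
            "status": p.get("status", "unknown"),
            "meta": dict(p.get("meta", {})),
        }
        for p in processes
    ]
    return doc


def runs_process(local_metadata, name):
    """True if the agent's own metadata lists `name`; no JOIN-like lookup needed."""
    return any(p["name"] == name for p in local_metadata.get("processes", []))


if __name__ == "__main__":
    # "hostname" is a placeholder for whatever other fields the metadata already has.
    agents = {
        "agent-1": build_local_metadata({"hostname": "host-a"}, [
            {"name": "endpoint", "status": "healthy", "meta": {"isolated": False}},
            {"name": "filebeat", "status": "healthy"},
        ]),
        "agent-2": build_local_metadata({"hostname": "host-b"}, [
            {"name": "metricbeat", "status": "healthy"},
        ]),
    }
    print([aid for aid, md in agents.items() if runs_process(md, "endpoint")])
```

Keeping per-process extras under meta, rather than as siblings of name and status, means a new key reported by one subprocess cannot collide with the core fields every process shares.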