[Fleet] Support reporting policy status #82298

nchaulet · 2020-11-02T15:15:20Z

Description

The current way we report status (from agent events) is not optimal: we cannot report status per input or per integration.

Agents should be responsible for reporting their own status: agents will report their status during check-in.

We should allow agents to report status per input.

Status will be persisted on the agent status Saved Object.

Format details

We could use the following format:

{
  "agent_status": {
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message"
  },
  "policy_status": {
    "id": "policyId",
    "revision": policyRevision,
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message",
    "inputs": [
      {
        "id": "input-id",
        "status": "error"|"degraded"|"online",
        "message": "Human readable error message",
        "payload": {...additionalDataFromInput}
      }
    ]
  }
}
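To make the proposed shape easier to review, here is a minimal TypeScript sketch of it. The type names (`Status`, `InputStatus`, `PolicyStatus`, `CheckinStatus`), the `isStatus` guard, and the sample values are hypothetical and not part of the proposal. Treating `revision` as a number and `payload` as optional are also assumptions.

```typescript
// Hypothetical type names; only the field names come from the proposed format.
type Status = "error" | "degraded" | "online";

interface InputStatus {
  id: string;
  status: Status;
  message: string;
  // Assumption: payload is optional, free-form data supplied by the input.
  payload?: Record<string, unknown>;
}

interface PolicyStatus {
  id: string;
  // Assumption: the policy revision is numeric.
  revision: number;
  status: Status;
  message: string;
  inputs: InputStatus[];
}

interface CheckinStatus {
  agent_status: { status: Status; message: string };
  policy_status: PolicyStatus;
}

const STATUSES: readonly string[] = ["error", "degraded", "online"];

// Runtime guard for values arriving in an untyped check-in body.
function isStatus(value: unknown): value is Status {
  return typeof value === "string" && STATUSES.indexOf(value) !== -1;
}

// Sample document with made-up values.
const example: CheckinStatus = {
  agent_status: { status: "degraded", message: "One input is failing" },
  policy_status: {
    id: "policyId",
    revision: 3,
    status: "degraded",
    message: "One input is failing",
    inputs: [
      {
        id: "input-id",
        status: "error",
        message: "Permission denied reading log file",
        payload: { path: "/var/log/example.log" },
      },
    ],
  },
};
```

A guard like `isStatus` would let the receiving side reject unexpected status strings before persisting anything.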

In the case of a dynamic input, it would look something like this:

{
  "agent.id": "agent-uuid",
  "agent.host": "agent-host",
  "agent_status": {
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message"
  },
  "policy_status": {
    "id": "policyId",
    "revision": policyRevision,
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message",
    "inputs": [
      {
        "id": "agent-generated-input-id",
        "template_id": "fleet-id-in-case-of-dynamic-input-id",
        "status": "error"|"degraded"|"online",
        "message": "Human readable error message",
        "payload": {...additionalDataFromInput}
      }
    ]
  }
}
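One way a consumer might use `template_id` is to map agent-generated inputs back to the Fleet-defined input they were expanded from. The sketch below is only an illustration: `ReportedStatus`, `DynamicInputStatus`, and `groupByTemplate` are hypothetical names, and falling back to the input's own `id` when `template_id` is absent is an assumption.

```typescript
type ReportedStatus = "error" | "degraded" | "online";

// Hypothetical type; field names follow the dynamic-input format above.
interface DynamicInputStatus {
  id: string;
  template_id?: string;
  status: ReportedStatus;
  message: string;
}

// Groups reported inputs by the Fleet input they originate from.
// Assumption: static inputs carry no template_id and are keyed by their own id.
function groupByTemplate(
  inputs: DynamicInputStatus[]
): Record<string, DynamicInputStatus[]> {
  const groups: Record<string, DynamicInputStatus[]> = {};
  for (const input of inputs) {
    const key = input.template_id !== undefined ? input.template_id : input.id;
    const existing = groups[key];
    const group = existing !== undefined ? existing : [];
    group.push(input);
    groups[key] = group;
  }
  return groups;
}

// Made-up sample: two generated inputs from one template, plus one static input.
const grouped = groupByTemplate([
  { id: "generated-1", template_id: "fleet-input-a", status: "online", message: "" },
  { id: "generated-2", template_id: "fleet-input-a", status: "error", message: "Connection refused" },
  { id: "static-input", status: "online", message: "" },
]);
```

With this grouping, a UI could show one row per Fleet input and expand it to list the generated instances.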

elasticmachine · 2020-11-02T15:15:22Z

Pinging @elastic/ingest-management (Team:Ingest Management)

kevinlog · 2020-11-02T17:19:32Z

@nchaulet thanks for creating this ticket, I'm excited to see this feature.

I'm wondering if we should add the agent.id as it refers to the underlying beat/endpoint. I'm thinking of ways other beats or apps could extend this to add more Policy Status information at some point. So the Fleet app would support this entry point, but the other apps (such as Security) would be able to pull more details on that policy status starting with the agent.id.

These are just some early thoughts, but I'm thinking about how this could be extendible later.

FYI @paul-tavares @nnamdifrankie

nchaulet · 2020-11-02T17:25:52Z

@kevinlog For now this will probably be saved on the agent saved object, so the agent ID is probably not mandatory here.
I think in the future (or in parallel) we want the agent to also save this directly to ES; in that case we will have agent.id and probably other agent metadata as well.

ph · 2020-11-03T17:23:31Z

Is there any reason not to go the ES route directly?

nnamdifrankie · 2020-11-03T17:41:11Z

The change will impact Endpoint Agent and Fleet for now?

kevinlog · 2020-11-05T13:28:45Z

@nchaulet I've been thinking about this more and I have another question.

Just for clarification, the Elastic Agent would accept a status update from underlying subprocesses such as the Endpoint or Beats, correct? So, for instance, the Endpoint would report a status directly to the Agent and then the Agent would report that status. Is this correct?

FYI @ferullo we can chat more offline, but I'm thinking that we could take the policy response and report that as part of the status. That way, config errors in the Endpoint can be known at a higher level.

And then if we wanted to show more details in the app, we could use the input ID or agent ID to go and query another API.

nchaulet · 2020-11-05T14:38:16Z

> Just for clarification, the Elastic Agent would accept a status update from underlying subprocesses such as the Endpoint or Beats, correct? So, for instance, the Endpoint would report a status directly to the Agent and then the Agent would report that status. Is this correct?

Yes, that's correct. Endpoint would report a status (healthy, error, or degraded), a message, and a payload, and Elastic Agent will report that as part of the agent status.
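The thread does not say how the Agent would combine several subprocess reports into one status, so the following is only a sketch under an assumed "worst status wins" rule. `Health`, `SubprocessReport`, `SEVERITY`, and `rollUp` are hypothetical names, and `"online"` (from the proposed format) stands in for what is called "healthy" here.

```typescript
type Health = "online" | "degraded" | "error";

// Hypothetical shape of what a subprocess (Endpoint, a Beat) hands to the Agent.
interface SubprocessReport {
  id: string;
  status: Health;
  message: string;
  payload?: Record<string, unknown>;
}

// Assumed ordering: error is worse than degraded, which is worse than online.
const SEVERITY: Record<Health, number> = { online: 0, degraded: 1, error: 2 };

// Rolls subprocess reports up into a single status and message.
// Assumption: the first report with the highest severity supplies the message,
// and an agent with no reports is considered online.
function rollUp(reports: SubprocessReport[]): { status: Health; message: string } {
  let worst: SubprocessReport | undefined;
  for (const report of reports) {
    if (worst === undefined || SEVERITY[report.status] > SEVERITY[worst.status]) {
      worst = report;
    }
  }
  if (worst === undefined) {
    return { status: "online", message: "" };
  }
  return { status: worst.status, message: worst.message };
}

// Made-up sample: Endpoint reports a config error, a Beat is fine.
const rolledUp = rollUp([
  { id: "beat-input", status: "online", message: "" },
  { id: "endpoint-input", status: "error", message: "Policy failed to apply" },
]);
```

Keeping the per-subprocess reports alongside the rolled-up value would preserve the detail needed for per-input status.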

jen-huang · 2021-04-28T19:30:58Z

@nchaulet As we no longer have agent events with Fleet Server, is status reporting still a concern?

nchaulet added the Team:Fleet and v7.11.0 labels Nov 2, 2020

nchaulet self-assigned this Nov 2, 2020

ph unassigned nchaulet Feb 15, 2021

ph removed the v7.11.0 label Feb 15, 2021

nchaulet mentioned this issue Mar 7, 2022

[Fleet] Improve status reporting for Agents elastic/elastic-agent#120

Closed
