[Fleet] Support reporting policy status #82298

Open
nchaulet opened this issue Nov 2, 2020 · 8 comments
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@nchaulet
Member

nchaulet commented Nov 2, 2020

Description

The current way we report status (from agent events) is not optimal: we cannot get a status per input or per integration.

Agents should be responsible for reporting their own status, and they will do so during check-in.

We should allow agents to report a status per input.

The status will be persisted on the agent status Saved Object.

Format details

We could use the following format:

{
  "agent_status": {
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message"
  },
  "policy_status": {
    "id": "policyId",
    "revision": policyRevision,
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message",
    "inputs": [
      {
        "id": "input-id",
        "status": "error"|"degraded"|"online",
        "message": "Human readable error message",
        "payload": {...additionalDataFromInput}
      }
    ]
  }
}
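
As a rough illustration of the format above, here is a minimal TypeScript sketch of the payload shape and a runtime check that a check-in body matches it. Only the field names and the three status values come from the proposal; the type names (`CheckinStatus`, `PolicyStatus`, `InputStatus`), the `isCheckinStatus` helper, and the choice to treat `message` and `payload` as optional are assumptions made for the example, not part of Fleet.

```typescript
// Hypothetical types: only the field names and status values come from the proposal.
type Status = "error" | "degraded" | "online";

interface InputStatus {
  id: string;
  status: Status;
  message?: string; // assumed optional
  payload?: Record<string, unknown>; // assumed optional, free-form data from the input
}

interface PolicyStatus {
  id: string;
  revision: number;
  status: Status;
  message?: string;
  inputs: InputStatus[];
}

interface CheckinStatus {
  agent_status: { status: Status; message?: string };
  policy_status: PolicyStatus;
}

const STATUSES: string[] = ["error", "degraded", "online"];

function isStatus(value: unknown): value is Status {
  return typeof value === "string" && STATUSES.indexOf(value) !== -1;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns true only when the body has the required fields with the expected types.
function isCheckinStatus(body: unknown): body is CheckinStatus {
  if (!isRecord(body)) return false;
  const agent = body["agent_status"];
  const policy = body["policy_status"];
  if (!isRecord(agent) || !isStatus(agent["status"])) return false;
  if (!isRecord(policy) || typeof policy["id"] !== "string") return false;
  if (typeof policy["revision"] !== "number" || !isStatus(policy["status"])) return false;
  const inputs = policy["inputs"];
  if (!Array.isArray(inputs)) return false;
  return inputs.every(
    (input) => isRecord(input) && typeof input["id"] === "string" && isStatus(input["status"])
  );
}
```

A check-in handler could call `isCheckinStatus(request.body)` before persisting anything, so that a malformed status from an agent is rejected instead of being stored.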

In the case of a dynamic input, it would look something like this:

{
  "agent.id": "agent-uuid",
  "agent.host": "agent-host",
  "agent_status": {
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message"
  },
  "policy_status": {
    "id": "policyId",
    "revision": policyRevision,
    "status": "error"|"degraded"|"online",
    "message": "Human readable error message",
    "inputs": [
      {
        "id": "agent-generated-input-id",
        "template_id": "fleet-id-in-case-of-dynamic-input-id",
        "status": "error"|"degraded"|"online",
        "message": "Human readable error message",
        "payload": {...additionalDataFromInput}
      }
    ]
  }
}
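
One way a consumer might use `template_id` is to map agent-generated inputs back to the Fleet-side input they were expanded from. The sketch below is only an illustration under assumptions: the `DynamicInputStatus` type and `groupByTemplate` helper are hypothetical names, and it assumes that a static input carries no `template_id` and can be keyed by its own `id`.

```typescript
// Hypothetical type: field names follow the dynamic-input format above.
interface DynamicInputStatus {
  id: string; // agent-generated input id
  template_id?: string; // Fleet-side id, present only for dynamic inputs (assumption)
  status: "error" | "degraded" | "online";
  message?: string;
}

// Groups reported inputs by the Fleet-side id they originate from.
function groupByTemplate(
  inputs: DynamicInputStatus[]
): Record<string, DynamicInputStatus[]> {
  const groups: Record<string, DynamicInputStatus[]> = {};
  for (const input of inputs) {
    // Assumption: an input without template_id is static and keyed by its own id.
    const key = input.template_id !== undefined ? input.template_id : input.id;
    if (groups[key] === undefined) {
      groups[key] = [];
    }
    groups[key].push(input);
  }
  return groups;
}
```

With this grouping, the UI could show one row per Fleet-defined input and list the agent-generated instances underneath it.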
@nchaulet nchaulet added the Team:Fleet and v7.11.0 labels Nov 2, 2020
@nchaulet nchaulet self-assigned this Nov 2, 2020
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@kevinlog
Contributor

kevinlog commented Nov 2, 2020

@nchaulet thanks for creating this ticket, I'm excited to see this feature.

I'm wondering if we should add the agent.id as it refers to the underlying beat/endpoint. I'm thinking of ways other beats or apps could extend this to add more Policy Status information at some point. So the Fleet app would support this entry point, but the other apps (such as Security) would be able to pull more details on that policy status starting with the agent.id.

These are just some early thoughts, but I'm thinking about how this could be extended later.

FYI @paul-tavares @nnamdifrankie

@nchaulet
Member Author

nchaulet commented Nov 2, 2020

@kevinlog for now this will probably be saved on the agent saved object, so the agent ID is probably not mandatory here.
I think in the future (or in parallel) we want the agent to also save this directly to ES. In that case we will have agent.id and probably other agent metadata as well.

@ph
Contributor

ph commented Nov 3, 2020

Is there any reason not to go the ES route directly?

@nnamdifrankie
Contributor

Will this change impact Endpoint Agent and Fleet for now?

@kevinlog
Contributor

kevinlog commented Nov 5, 2020

@nchaulet I've been thinking about this more and I have another question.

Just for clarification, the Elastic Agent would accept a status update from underlying subprocesses such as the Endpoint or Beats, correct? So, for instance, the Endpoint would report a status directly to the Agent and then the Agent would report that status. Is this correct?

FYI @ferullo we can chat more offline, but I'm thinking that we could take the policy response and report that as part of the status. That way, config errors in the Endpoint can be known at a higher level.

And then if we wanted to show more details in the app, we could use the input ID or agent ID to go and query another API.

@nchaulet
Member Author

nchaulet commented Nov 5, 2020

> Just for clarification, the Elastic Agent would accept a status update from underlying subprocesses such as the Endpoint or Beats, correct? So, for instance, the Endpoint would report a status directly to the Agent and then the Agent would report that status. Is this correct?

Yes, that's correct: Endpoint would report a status (healthy, error or degraded), a message and a payload, and Elastic Agent would report that as part of the Agent status.
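
To make the flow concrete, here is a minimal sketch of how an agent might fold the statuses reported by its subprocesses into a single status for check-in. The thread does not specify any roll-up rule, so "the most severe status wins" is purely an assumption, as are the `UnitReport` type and the `rollUp` helper. The sketch uses the status values from the proposed format ("online" rather than "healthy").

```typescript
type Status = "error" | "degraded" | "online";

// Hypothetical shape of what a subprocess (Endpoint, a Beat) reports to the agent.
interface UnitReport {
  id: string;
  status: Status;
  message: string;
  payload?: Record<string, unknown>;
}

// Assumed ordering: higher number means more severe.
const SEVERITY: Record<Status, number> = { online: 0, degraded: 1, error: 2 };

// Returns the most severe status among the reports, with its message.
// On ties the earliest report wins; with no reports the result is "online".
function rollUp(reports: UnitReport[]): { status: Status; message: string } {
  let worst: UnitReport | undefined = undefined;
  for (const report of reports) {
    if (worst === undefined || SEVERITY[report.status] > SEVERITY[worst.status]) {
      worst = report;
    }
  }
  if (worst === undefined) {
    return { status: "online", message: "" };
  }
  return { status: worst.status, message: worst.message };
}
```

Each `UnitReport` would still be sent individually in `policy_status.inputs`; the rolled-up value is only a candidate for the top-level `status` and `message` fields.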

@ph ph unassigned nchaulet Feb 15, 2021
@ph ph removed the v7.11.0 label Feb 15, 2021
@jen-huang
Contributor

@nchaulet As we no longer have agent events with Fleet Server, is status reporting still a concern?


6 participants