Provide Kibana Alerting functionality for Fleet #79310

Closed
jeffvestal opened this issue Oct 2, 2020 · 15 comments
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@jeffvestal

Describe the feature:
As an administrator responsible for managing Elastic Agents with Fleet, I would like to easily enable alerting to be notified when an agent goes offline. Ideally I would be able to configure this with a Kibana Alerting flyout within the Fleet UI.

Describe a specific use case for the feature:
With the Elastic Agent being centrally managed within Kibana, it is important for end users to know when one or more agents go offline. We have other indicators down the line (e.g., log rate drop-offs), but operators/administrators need to be able to configure alerting for offline agents regardless of the modules they are running.

Currently there is only a visual indication when you navigate to Ingest Manager -> Fleet (screenshot below). Without OOTB alerting functionality we risk missing data, affecting other Solutions, and causing business disruptions for use cases that rely on timely delivery of data through our pipeline.

[Screenshot: Fleet UI - Agent offline]

cc: @mukeshelastic

@jeffvestal jeffvestal added the Team:Fleet Team label for Observability Data Collection Fleet team label Oct 2, 2020
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@mostlyjason
Contributor

@mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

@renzedj

renzedj commented May 31, 2023

> @mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

The agent status isn't an ongoing record of agent status, though. It's just the current status (e.g., I can't look at it and see that 12h ago an agent was offline for 30m), and IIRC from checking this out as an option, it doesn't write anything when an agent goes offline or misses a check-in.
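
To illustrate the gap described here, a minimal sketch of what a status history would make possible. It assumes something outside Fleet periodically polls the current status and stores timestamped samples; the `Sample` type, the `"offline"` status string, and the `offline_intervals` helper are all hypothetical and are not part of any Elastic API.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical sample shape: (time the status was polled, status string).
Sample = Tuple[datetime, str]


def offline_intervals(samples: List[Sample]) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) spans during which consecutive samples were 'offline'.

    A span starts at the first offline sample and ends at the next
    non-offline sample (or at the last sample if the agent never recovers).
    """
    ordered = sorted(samples)
    intervals: List[Tuple[datetime, datetime]] = []
    start = None
    for ts, status in ordered:
        if status == "offline" and start is None:
            start = ts
        elif status != "offline" and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        # Still offline at the final sample: close the span at that sample.
        intervals.append((start, ordered[-1][0]))
    return intervals


# Usage: four polled samples, ten or twenty minutes apart.
t0 = datetime(2023, 5, 31, 0, 0)
history = [
    (t0, "online"),
    (t0 + timedelta(minutes=10), "offline"),
    (t0 + timedelta(minutes=20), "offline"),
    (t0 + timedelta(minutes=40), "online"),
]
spans = offline_intervals(history)
```

With only the current status available, the 30-minute outage in `history` would be invisible once the agent came back online; with stored samples it can be recovered after the fact.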

@zez3

zez3 commented Aug 24, 2023

This would indeed be very useful.

@zez3

zez3 commented Nov 9, 2023

@jamiehynds any update on this?

@leandrojmp

Hello, is this still planned?

Having an OOTB alert of when an Agent is offline should be a Core Feature.

@mikefrommars

When using Elastic as a Security and Compliance tool I need to know when an Agent goes offline since that means I am no longer collecting logs via the agent.

Any updates on this? Has the feature been approved?

@X-Dean

X-Dean commented Feb 13, 2024

I would also like to have alerting capabilities for when the resource usage of an Agent is very high.

I had an issue not long ago when Azure made some changes on the network side: the agent could not connect to the event hub, but its CPU usage was at 100%. Even after Azure restored connectivity, the agent stayed at very high CPU usage and very few events could be ingested. A restart of the agent service was needed to bring things back to a normal working state.

@nyp-cgranata

+1 on this issue. In a similar situation as @mikefrommars.

@farbod-sec

This is a bedrock / foundational feature for SIEM/security and some o11y. It needs to be turnkey and OOTB. I have customers regularly asking about how to accomplish this.

To add on to what Vestal posted earlier, SIEMs also tend to place silent log alerting right next to agent heartbeat alerting, since the two go hand in hand for operators. It would be nice to have a single page to configure and monitor heartbeat + log rate.

@jen-huang
Contributor

cc @nimarezainia

@nimarezainia
Contributor

@farbod-sec please open an ER for the SIEM related enhancements you are referring to.

Regarding alerts on agents: please refer to the agent documentation: https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html#fleet-alerting

We are now exposing the various agent statuses required to build alerts. These were previously hidden, which prevented us from building any ML or alerts on top of them. I'm closing this request for now; if there are enhancements to be made as a follow-on, I would be happy to consider them.

@carlosaya

carlosaya commented May 8, 2024

@nimarezainia Am I missing something, or does the link you provided only state that a COUNT of the agents in each state is provided? This is a start (I guess), but we really need to know WHICH agents are offline so that we can raise individual alerts for each agent when it goes offline.

@nimarezainia
Contributor

@carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.
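
To make the distinction in this exchange concrete, a minimal sketch contrasting a count-based check with a per-agent check. It assumes two status snapshots are already available as plain dictionaries; the `Snapshot` shape, the agent IDs, and both helper functions are hypothetical illustrations and do not reflect how Fleet stores or exposes agent status.

```python
from typing import Dict, Set

# Hypothetical snapshot shape: agent ID -> status string.
Snapshot = Dict[str, str]


def offline_count(snapshot: Snapshot) -> int:
    """Count-based view: how many agents are offline, not which ones."""
    return sum(1 for status in snapshot.values() if status == "offline")


def newly_offline(previous: Snapshot, current: Snapshot) -> Set[str]:
    """Per-agent view: IDs that are offline now but were not offline before."""
    return {
        agent_id
        for agent_id, status in current.items()
        if status == "offline" and previous.get(agent_id) != "offline"
    }


# Usage: one agent recovers while another drops, so the count never changes.
previous = {"agent-a": "offline", "agent-b": "online"}
current = {"agent-a": "online", "agent-b": "offline"}

count_changed = offline_count(previous) != offline_count(current)
went_offline = newly_offline(previous, current)
```

In this example the offline count is 1 in both snapshots, so a check keyed on the count changing would stay silent, while the per-agent comparison reports `agent-b`.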

@carlosaya

> @carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.

@nimarezainia Thanks for the confirmation. Is there an issue I can keep an eye on for that feature?
