
Agent should collect and report CPU and memory usage of monitoring components #4082

Closed
ycombinator opened this issue Jan 12, 2024 · 7 comments · Fixed by #4326

@ycombinator
Contributor

Currently, Agent only collects and, therefore, reports CPU and memory usage for itself and its non-monitoring Beats components. It does not collect or report CPU and memory usage for the monitoring Beats, leading to undercounting.

So, for a default Fleet policy with the system integration and monitoring (both logs and metrics) enabled, Agent runs:

  • 1 metricbeat instance for the system integration: system/metrics-default,
  • 1 filebeat instance for the system integration: log-default,
  • 2 metricbeat instances for metrics monitoring: http/metrics-monitoring and beat/metrics-monitoring, and
  • 1 filebeat instance for logs monitoring: filestream-monitoring.

In the above scenario, Agent is currently only collecting and reporting CPU and memory usage for the metricbeat and filebeat instances for the system integration: system/metrics-default and log-default. It is not collecting and reporting CPU and memory usage for the metricbeat and filebeat instances for monitoring: http/metrics-monitoring, beat/metrics-monitoring, and filestream-monitoring.
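
A quick way to confirm the full set of components in a scenario like this is to ask the agent itself. This is a minimal sketch, assuming a recent agent version where elastic-agent status supports JSON output and lists components under a top-level components array:

# List every running component, including the monitoring ones
sudo elastic-agent status --output=json | jq '.components[].id'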

@ycombinator added the bug and Team:Elastic-Agent labels on Jan 12, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@fearful-symmetry
Contributor

fearful-symmetry commented Feb 12, 2024

@ycombinator when you say "collect and report", what are you imagining? That we make sure metrics for these end up in the "last 30s" log lines? Something else?

@ycombinator
Contributor Author

> @ycombinator when you say "collect and report", what are you imagining? That we make sure metrics for these end up in the "last 30s" log lines? Something else?

I meant: collect and report the monitoring Beats' resource usage metrics the same way we do today for the non-monitoring Beats. The issue description has an example that should help clarify things further, but let me know if it doesn't.

@fearful-symmetry
Contributor

@ycombinator I think we're on different pages here. So, the "last 30s" metrics for the monitoring beats are exposed via the agent logs:

grep -rh "last 30s" . | jq .component.id
"filestream-monitoring"
"system/metrics-default"
"beat/metrics-monitoring"
"http/metrics-monitoring"
"filestream-monitoring"
"system/metrics-default"
"beat/metrics-monitoring"
"http/metrics-monitoring"
"filestream-monitoring"
"system/metrics-default"
"beat/metrics-monitoring"
"http/metrics-monitoring"
"filestream-monitoring"
"system/metrics-default"
"beat/metrics-monitoring"
"http/metrics-monitoring"
"filestream-monitoring"

What specific monitoring interface or UX are you thinking of here? The Kibana/fleet agent resource usage GUI? Something else?

@ycombinator
Contributor Author

> What specific monitoring interface or UX are you thinking of here? The Kibana/fleet agent resource usage GUI? Something else?

[Screenshot, 2024-02-13: Fleet agent UI with the CPU and memory usage metrics circled]

Currently, the circled metrics do not include resource usage for the monitoring beats themselves. This results in undercounting the resources used by Agent and all of its component processes. See #4005 (comment) for the background discussion that led to this (and a few other) issues.
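
One way to see the undercounting from the Elasticsearch side is to check which component IDs actually ship resource-usage documents. A hedged sketch, where the metrics-elastic_agent.* index pattern and the component.id field are assumptions based on the default agent monitoring data streams:

# Which component IDs are shipping resource-usage metrics?
curl -s -u "elastic:$ES_PASSWORD" "$ES_URL/metrics-elastic_agent.*/_search" \
  -H 'Content-Type: application/json' -d '{
    "size": 0,
    "aggs": { "ids": { "terms": { "field": "component.id" } } }
  }' | jq '.aggregations.ids.buckets[].key'

Per the issue description, the monitoring components (http/metrics-monitoring, beat/metrics-monitoring, and filestream-monitoring) would be missing from this list until the fix lands.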

@cmacknz
Member

cmacknz commented Mar 21, 2024

Re-opening: this was reverted because the test was failing (#4451).

@pierrehilbert
Contributor

@fearful-symmetry I think we can close this one now that #4462 is merged, right?
