[Elastic Agent] Collect Elastic Agent metrics and send to Elasticsearch #22394
Pinging @elastic/ingest-management (Team:Ingest Management)
@michalpristas Can you write up a proposal here? More specifically, can we reuse libbeat's code to collect and expose the metrics? @ruflin Were you considering this similar to other Beats: expose an HTTP endpoint and use the Metricbeat module to collect that information?
@ph I think that in the future Elastic Agent will ship all the metrics data directly, including the data for its processes. So this is probably a good place to start trying it out.
@ruflin Agreed, but can it be done in two steps?
Sure, if this simplifies things. Let's make sure the HTTP endpoint is not officially supported so we can remove it.
@ravikesarwani FYI, I forgot to ping you on this one. @michalpristas Can you take a look at #22394 (comment)?
@ph Can you elaborate on this one? I also have two options in mind:
Or do we want to provide an option to not monitor agent metrics? EDIT: Exposing an HTTP endpoint for Metricbeat will only report data from the beat module. I think uptime and the number of goroutines may be valuable.
@michalpristas I have reworded the AC item "Collected metrics should be sent to a specific agent data stream". Concerning the behavior you are describing, isn't this just an implementation detail? I meant we have an option in the configuration file which is:
I'd like to hear what others think.
@ph Yeah, I was wondering whether there is any reason to keep collecting the agent's own metrics when these options are used to disable monitoring. I think I have something we can work with as a draft; I will test it on all OSes to see what gets reported where and then summarize it.
Apart from memory, CPU, throughput, system load and open file handles, the other key metrics are:
Here's an image from the Beats overview page in Stack Monitoring that could serve as a reference. @ph Should we update the original issue description with some of this information?
@ravikesarwani The event rate would be great to have for the Elastic Agent overall, but there is a technical issue. At the moment Elastic Agent doesn't send events to Elasticsearch directly; that is done by the Beats running under the Elastic Agent. We already collect these metrics from the Beats running under the Elastic Agent, but not from the Elastic Agent itself.
By default, the Agent metrics dashboard shows accumulated information for the Agent and all the Beats combined. If the user filters to see only Agent metrics, the event rate graph will be empty. If the user filters to one or more Beats, they will see the event rate information.
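To make the dashboard behaviour concrete, here is a small sketch assuming each Beat exposes a cumulative "events published" counter and the agent process exposes none. The counter layout, the process names and the helpers rates and filtered are hypothetical; the point is that a per-second rate is derived from counter deltas, so an agent-only filter has nothing to plot.

```go
package main

import "fmt"

// counters maps a process name to a cumulative "events published" counter.
// Only Beats publish events, so the agent process has no entry (assumption
// based on the discussion; the process names are hypothetical).
type counters map[string]uint64

// rates converts two counter snapshots taken dt seconds apart into events/sec.
// Processes missing from the earlier snapshot, or whose counter went
// backwards (for example after a restart), are skipped.
func rates(prev, cur counters, dt float64) map[string]float64 {
	out := make(map[string]float64)
	if dt <= 0 {
		return out
	}
	for name, c := range cur {
		p, ok := prev[name]
		if !ok || c < p {
			continue
		}
		out[name] = float64(c-p) / dt
	}
	return out
}

// filtered sums the rates of the selected processes, like a dashboard filter.
// ok is false when none of the selected processes report events at all,
// which is the "empty graph" case for an agent-only filter.
func filtered(r map[string]float64, names ...string) (total float64, ok bool) {
	for _, n := range names {
		if v, found := r[n]; found {
			total += v
			ok = true
		}
	}
	return total, ok
}

func main() {
	prev := counters{"filebeat": 1000, "metricbeat": 400}
	cur := counters{"filebeat": 1500, "metricbeat": 600}
	r := rates(prev, cur, 10) // snapshots taken 10 s apart

	all, _ := filtered(r, "elastic-agent", "filebeat", "metricbeat")
	agentOnly, ok := filtered(r, "elastic-agent")
	fmt.Println(all)           // 70
	fmt.Println(agentOnly, ok) // 0 false
}
```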
Example docs of collected metrics, using the http input:
Linux:
{
"_index": ".ds-metrics-elastic_agent.elastic_agent-default-000001",
"_id": "f0V-PHYB92whPQxjg6Wk",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2020-12-07T09:18:11.839Z",
"host": {
"os": {
"kernel": "4.4.0-31-generic",
"codename": "xenial",
"platform": "ubuntu",
"version": "16.04.1 LTS (Xenial Xerus)",
"family": "debian",
"name": "Ubuntu"
},
"id": "c0cc2a7efa902a719ada8ab6584b6bcb",
"containerized": false,
"ip": [
"172.17.0.1"
],
"mac": [
"08:00:ab:08:00:ab"
],
"hostname": "vagrant",
"architecture": "x86_64",
"name": "vagrant"
},
"agent": {
"type": "metricbeat",
"version": "8.0.0",
"ephemeral_id": "4470f14f-9bf1-4452-aee7-37e0db031c5c",
"id": "95a0ff4e-e36e-4c0a-a552-844799011648",
"name": "vagrant"
},
"event": {
"duration": 5269657,
"dataset": "elastic_agent.elastic_agent",
"module": "http"
},
"elastic_agent": {
"version": "8.0.0",
"id": "f0eb529c-3512-429f-b6aa-37264dddb402",
"process": "elastic-agent",
"snapshot": false
},
"metricset": {
"name": "json",
"period": 10000
},
"data_stream": {
"type": "metrics",
"dataset": "elastic_agent.elastic_agent",
"namespace": "default"
},
"system": {
"process": {
"cpu": {
"system": {
"ticks": 1745,
"time": {
"ms": 1745
}
},
"total": {
"ticks": 7291,
"time": {
"ms": 7291
},
"value": 7291
},
"user": {
"time": {
"ms": 5546
},
"ticks": 5546
}
},
"memory": {
"size": 74531072
},
"fd": {
"open": 21
},
"cgroup": {
"cpuacct": {
"id": "elastic-agent.service",
"total": {
"ns": 5728663070
}
},
"memory": {
"mem": {
"limit": {
"bytes": 9223372036854772000
},
"usage": {
"bytes": 462458880
}
},
"id": "elastic-agent.service"
}
}
}
},
"ecs": {
"version": "1.7.0"
},
"service": {
"address": "http://unix/stats",
"type": "http"
},
"http": {}
},
"fields": {
"@timestamp": [
"2020-12-07T09:18:11.839Z"
]
},
"sort": [
1607332691839
]
}
mac:
{
"_index": ".ds-metrics-elastic_agent.elastic_agent-default-000001",
"_id": "2QxlPHYBjGFDnaF_EkU-",
"_version": 1,
"_score": null,
"_source": {
"@timestamp": "2020-12-07T08:50:24.348Z",
"event": {
"dataset": "elastic_agent.elastic_agent",
"module": "http",
"duration": 3040126
},
"metricset": {
"name": "json",
"period": 10000
},
"system": {
"process": {
"cpu": {
"system": {
"ticks": 1745,
"time": {
"ms": 1745
}
},
"total": {
"ticks": 7291,
"time": {
"ms": 7291
},
"value": 7291
},
"user": {
"time": {
"ms": 5546
},
"ticks": 5546
}
},
"memory": {
"size": 74531072
}
}
},
"host": {
"mac": [
"ac:de:48:ac:de:48"
],
"name": "MacBook-Pro-2.local",
"hostname": "MacBook-Pro-2.local",
"architecture": "x86_64",
"os": {
"name": "Mac OS X",
"kernel": "18.7.0",
"build": "18G6032",
"platform": "darwin",
"version": "10.14.6",
"family": "darwin"
},
"id": "FC609F24-07E1-54EA-8E33-56F9D5A7A97E",
"ip": [
"127.0.0.2"
]
},
"agent": {
"ephemeral_id": "0cf156d9-4398-4c29-a52d-596ec7a93f5f",
"id": "e09c86a1-f5dd-4fe8-898c-70de832e2a9e",
"name": "MacBook-Pro-2.local",
"type": "metricbeat",
"version": "8.0.0"
},
"service": {
"address": "http://unix/stats",
"type": "http"
},
"http": {},
"data_stream": {
"dataset": "elastic_agent.elastic_agent",
"namespace": "default",
"type": "metrics"
},
"elastic_agent": {
"snapshot": false,
"version": "8.0.0",
"id": "02e6478a-72b9-4a5e-bd63-0f6be2ef4dba",
"process": "elastic-agent"
},
"ecs": {
"version": "1.6.0"
}
},
"fields": {
"@timestamp": [
"2020-12-07T08:50:24.348Z"
]
},
"sort": [
1607331024348
]
}
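The documents above can be consumed with a few typed structs. This sketch decodes only the system.process fields shown in the examples and checks that cpu.total.time.ms equals user plus system (5546 + 1745 = 7291 in both docs). The struct and function names are hypothetical; this is not the Metricbeat http module's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// cpuTime mirrors the system.process.cpu.<kind> objects in the example docs.
type cpuTime struct {
	Ticks int64 `json:"ticks"`
	Time  struct {
		MS int64 `json:"ms"`
	} `json:"time"`
}

// processMetrics captures only the fields this sketch checks.
type processMetrics struct {
	System struct {
		Process struct {
			CPU struct {
				User   cpuTime `json:"user"`
				System cpuTime `json:"system"`
				Total  cpuTime `json:"total"`
			} `json:"cpu"`
			Memory struct {
				Size int64 `json:"size"`
			} `json:"memory"`
		} `json:"process"`
	} `json:"system"`
}

// parse decodes an event's _source and reports whether total CPU time
// equals user + system time.
func parse(source []byte) (processMetrics, bool, error) {
	var m processMetrics
	if err := json.Unmarshal(source, &m); err != nil {
		return m, false, err
	}
	c := m.System.Process.CPU
	return m, c.Total.Time.MS == c.User.Time.MS+c.System.Time.MS, nil
}

func main() {
	// Trimmed copy of the _source.system field from the Linux example.
	src := []byte(`{"system":{"process":{
		"cpu":{"system":{"ticks":1745,"time":{"ms":1745}},
		       "total":{"ticks":7291,"time":{"ms":7291},"value":7291},
		       "user":{"ticks":5546,"time":{"ms":5546}}},
		"memory":{"size":74531072}}}}`)
	m, consistent, err := parse(src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(consistent)                         // true
	fmt.Println(m.System.Process.Memory.Size >> 20) // 71 (MiB)
}
```

Fields that appear only on Linux in the examples (fd, cgroup) are simply absent from this struct, so the same decoder works for both documents.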
I guess you could use the rename processor to move the fields under the root? Why is there a
There's beat.memstat because we're collecting beatMetrics over HTTP. I can use rename to move it up if this is a problem.
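Moving a nested field up to the root of the event, as suggested with the rename processor, amounts to a dotted-path move on the event map. The sketch below is a simplified stand-in written against plain Go maps, not the libbeat rename processor; the functions lookup and rename and the beat.memstats path are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// lookup walks a dotted path through nested maps and returns the parent map
// and the final key, creating missing intermediate maps when create is true.
func lookup(event map[string]any, path string, create bool) (map[string]any, string, bool) {
	parts := strings.Split(path, ".")
	cur := event
	for _, p := range parts[:len(parts)-1] {
		next, ok := cur[p].(map[string]any)
		if !ok {
			if !create || cur[p] != nil {
				return nil, "", false
			}
			next = map[string]any{}
			cur[p] = next
		}
		cur = next
	}
	return cur, parts[len(parts)-1], true
}

// rename moves the value at from to to. A missing source is ignored and an
// existing target is never overwritten; empty parent objects are left as-is.
func rename(event map[string]any, from, to string) error {
	src, key, ok := lookup(event, from, false)
	if !ok {
		return nil
	}
	val, ok := src[key]
	if !ok {
		return nil
	}
	dst, dkey, ok := lookup(event, to, true)
	if !ok {
		return errors.New("target path blocked by a non-object value")
	}
	if _, exists := dst[dkey]; exists {
		return fmt.Errorf("target %q already exists", to)
	}
	delete(src, key)
	dst[dkey] = val
	return nil
}

func main() {
	// Hypothetical event with memory stats nested under "beat".
	event := map[string]any{
		"beat": map[string]any{"memstats": map[string]any{"rss": 74531072}},
	}
	if err := rename(event, "beat.memstats", "memstats"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(event)
}
```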
Can you elaborate on the
What I mean by beatMetrics is the registry with the same name in the libbeat code. This registry is part of the Default registry under
I think we should be careful not to reuse too much of libbeat here, as Elastic Agent is not a Beat. If we need the stats inside beatMetrics, could we read them from beatMetrics but assemble our own event with the values? I think Elastic Agent must be in control of where and what metrics are exposed.
I agree with the agent having control of what is being published, but even with that in mind I worked on the assumption that we had agreed to use libbeat to collect metrics, at least at first. At the same time, we need to take into account that we're moving towards dashboards that display aggregated metrics of the agent and its processes, so we probably need a common format in mind to keep the aggregations from becoming cumbersome. I updated the rules, which cleaned up the resulting agent document, mainly removing what's not needed from
I updated the structure in #22394 (comment) again; the important part is collecting just memory, CPU and FD (on Linux).
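A rough illustration of the "agent assembles its own event" idea together with the memory/CPU/FD scope: read a flat snapshot from whatever registry holds the values and copy only an allow-list into a new event, adding fd only on Linux (the mac example doc has no fd section). The snapshot shape, the key names and the helpers allowedPrefixes and buildEvent are assumptions, not libbeat's monitoring API.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"strings"
)

// allowedPrefixes lists the metric groups kept in the agent's own event.
// "fd" is only added on Linux. The prefixes are hypothetical key names.
func allowedPrefixes(goos string) []string {
	p := []string{"system.process.cpu.", "system.process.memory."}
	if goos == "linux" {
		p = append(p, "system.process.fd.")
	}
	return p
}

// buildEvent copies only allow-listed keys from a flat snapshot (dotted key
// to value) into a new map, so the agent decides what gets published
// instead of forwarding the whole registry.
func buildEvent(snapshot map[string]any, goos string) map[string]any {
	prefixes := allowedPrefixes(goos)
	event := make(map[string]any)
	for k, v := range snapshot {
		for _, p := range prefixes {
			if strings.HasPrefix(k, p) {
				event[k] = v
				break
			}
		}
	}
	return event
}

func main() {
	// Hypothetical flat snapshot; the system.process values come from the
	// Linux example doc, the beat.* key is made up to show filtering.
	snapshot := map[string]any{
		"system.process.cpu.total.time.ms": 7291,
		"system.process.memory.size":       74531072,
		"system.process.fd.open":           21,
		"beat.info.uptime.ms":              1, // dropped: not allow-listed
	}
	event := buildEvent(snapshot, runtime.GOOS)
	keys := make([]string, 0, len(event))
	for k := range event {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, event[k])
	}
}
```

An allow-list keeps the published shape stable even if the underlying registry grows new entries, which also helps the aggregated dashboards mentioned earlier in the thread.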
This LGTM.
LGTM, seems OK for a first version.
@michalpristas Would it be possible to also include
@simitt Yes, I can do that.
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Pinging @elastic/stack-monitoring (Stack monitoring)
@michalpristas It seems that you have already done this in PR #22793.
AC
Implementation proposal:
??