[Elastic Agent] Collect Elastic Agent metrics and send to Elasticsearch #22394

Closed · ph opened this issue Nov 3, 2020 · 28 comments

ph (Contributor) commented Nov 3, 2020

Acceptance criteria (AC):

  • I should be able to see the CPU usage on the running machine. Am I using too much CPU?
  • I should be able to see disk usage on the running machine. Am I running out of disk space?
  • I should be able to see the memory usage of Elastic Agent. Is the Elastic Agent using too much memory?
  • I should be able to see the system memory. Am I running out of memory?
  • I should be able to see fd usage. Am I keeping too many files open?
  • Collected metrics should be sent to a specific agent data stream.
  • Data should follow ECS fields.

Implementation proposal:

??

elasticmachine (Collaborator) commented

Pinging @elastic/ingest-management (Team:Ingest Management)

ph (Contributor, Author) commented Nov 3, 2020

@michalpristas Can you write a proposal here? More specifically, can we reuse libbeat's code to collect and expose the metrics?

@ruflin Were you considering doing this the same way as other Beats: expose an HTTP endpoint and use the Metricbeat module to collect that information?
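For illustration, a minimal sketch of what that Metricbeat-side collection could look like, assuming the agent exposes a /stats endpoint over HTTP (the host/port and namespace below are placeholders, not a confirmed design):

metricbeat.modules:
  # Poll the agent's hypothetical /stats endpoint with the http module's
  # json metricset and forward whatever JSON it returns.
  - module: http
    metricsets: ["json"]
    period: 10s
    hosts: ["http://localhost:6791"]   # placeholder address for the agent endpoint
    path: "/stats"
    namespace: "agent_stats"           # illustrative namespace for the json metricset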

ph added the v7.11.0 label on Nov 3, 2020
ruflin (Member) commented Nov 5, 2020

@ph I think in the future, Elastic Agent will ship all the metrics data directly, also for its processes. So this is probably a good start to try it out.

ph (Contributor, Author) commented Nov 5, 2020

@ruflin Agreed, but can it be done in two steps?

  • Step 1: Expose HTTP and use Metricbeat.
  • Step 2: Have an internal pipeline for sending metrics.

ruflin (Member) commented Nov 6, 2020

Sure, if this simplifies things. Let's make sure the HTTP endpoint is not "officially" supported so we can remove it later.

ph (Contributor, Author) commented Nov 12, 2020

@ravikesarwani FYI, forgot to ping you on this one.

@michalpristas Can you take a look at #22394 (comment) ?

michalpristas (Contributor) commented Nov 25, 2020

@ph Can you elaborate on this one: "I should be able to query a data_stream query the collected metric"?
As a first step, I will use libbeat code to expose metrics stats using a socket/npipe. Then Metricbeat will collect these.

I also have two options in mind:

  • The first one is that we don't allow disabling monitoring, and we monitor all processes and agent metrics at all times.
  • The second one results in monitoring everything if enabled, and monitoring just the agent when disabled.

Or do we want to provide an option of not monitoring agent metrics?

EDIT: Exposing an HTTP endpoint for Metricbeat to use will only report data from the beat module. I think uptime and the number of goroutines may be valuable.
The other options I managed to accomplish by correctly configuring Metricbeat so it watches Elastic Agent and reports CPU, fd, memory... all of these resulting in a single index.
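Roughly, the kind of Metricbeat configuration this implies (only a sketch; the process name pattern and period are placeholders, not the final monitoring config the agent generates):

metricbeat.modules:
  # Watch the elastic-agent process itself and report CPU, memory, and
  # (on Linux) open file descriptors for it.
  - module: system
    metricsets: ["process"]
    period: 10s
    processes: ["elastic-agent"]   # regex matched against process names; placeholder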

ph (Contributor, Author) commented Nov 25, 2020

@michalpristas I have reworded the AC to "Collected metrics should be sent to a specific agent data stream."

Concerning the behavior you are describing, isn't this just an implementation detail? I mean, we have an option in the configuration file, which is:

# agent.monitoring:
#   # enabled turns on monitoring of running processes
#   enabled: true
#   # enables log monitoring
#   logs: true
#   # enables metrics monitoring
#   metrics: true

I would like to hear what others think.

michalpristas (Contributor) commented

@ph Yeah, I was wondering whether, when these options are used to disable monitoring, we still want to collect metrics of the agent itself. E.g. is there any reason to continue collecting agent metrics if we set agent.monitoring.metrics: false? There probably isn't; I just wanted to double-check.

I think I have something we can work with as a draft. I will test it on all OSes to see what gets reported where and then sum it up.

ravikesarwani commented

Apart from memory, CPU, throughput, system load, and open file handles, the other key metrics are:

  • Event rate: Am I keeping up or falling behind? This is really critical for logs, where the generated data can fluctuate a lot.
  • Any failure information (failure rates, output errors).

Here's an image from the Beats overview page in Stack Monitoring that can potentially be used as a data point.
[Image: BeatsOverview]

@ph Should we modify the original issue comments with some of this information?

blakerouse (Contributor) commented

@ravikesarwani The event rate would be great overall for an Elastic Agent, but it has a technical issue. At the moment, Elastic Agent doesn't send events to Elasticsearch directly; that is done by the Beats running under the Elastic Agent. We already collect these metrics from the Beats running under the Elastic Agent, but they are not metrics of the Elastic Agent itself.

ravikesarwani commented

The Agent metrics dashboard by default provides a view that shows accumulated information for the Agent and all the Beats metrics combined. If the user filters to see just Agent metrics, then the event rate graph would be empty. If the user filters to one or more Beats, they will see the event rate information.

michalpristas (Contributor) commented Dec 7, 2020

Example documents of metrics collected using the http input.
Note that I needed to place the collected metrics into a metrics field, as http did not let me put them at the root level.

Linux

{
	"_index": ".ds-metrics-elastic_agent.elastic_agent-default-000001",
	"_id": "f0V-PHYB92whPQxjg6Wk",
	"_version": 1,
	"_score": null,
	"_source": {
		"@timestamp": "2020-12-07T09:18:11.839Z",
		"host": {
			"os": {
				"kernel": "4.4.0-31-generic",
				"codename": "xenial",
				"platform": "ubuntu",
				"version": "16.04.1 LTS (Xenial Xerus)",
				"family": "debian",
				"name": "Ubuntu"
			},
			"id": "c0cc2a7efa902a719ada8ab6584b6bcb",
			"containerized": false,
			"ip": [
				"172.17.0.1"
			],
			"mac": [
				"08:00:ab:08:00:ab"
			],
			"hostname": "vagrant",
			"architecture": "x86_64",
			"name": "vagrant"
		},
		"agent": {
			"type": "metricbeat",
			"version": "8.0.0",
			"ephemeral_id": "4470f14f-9bf1-4452-aee7-37e0db031c5c",
			"id": "95a0ff4e-e36e-4c0a-a552-844799011648",
			"name": "vagrant"
		},
		"event": {
			"duration": 5269657,
			"dataset": "elastic_agent.elastic_agent",
			"module": "http"
		},
		"elastic_agent": {
			"version": "8.0.0",
			"id": "f0eb529c-3512-429f-b6aa-37264dddb402",
			"process": "elastic-agent",
			"snapshot": false
		},
		"metricset": {
			"name": "json",
			"period": 10000
		},
		"data_stream": {
			"type": "metrics",
			"dataset": "elastic_agent.elastic_agent",
			"namespace": "default"
		},
		"system": {
			"process": {
				"cpu": {
					"system": {
						"ticks": 1745,
						"time": {
							"ms": 1745
						}
					},
					"total": {
						"ticks": 7291,
						"time": {
							"ms": 7291
						},
						"value": 7291
					},
					"user": {
						"time": {
							"ms": 5546
						},
						"ticks": 5546
					}
				},
				"memory": {
					"size": 74531072
				},
				"fd": {
					"open": 21
				},
				"cgroup": {
					"cpuacct": {
						"id": "elastic-agent.service",
						"total": {
							"ns": 5728663070
						}
					},
					"memory": {
						"mem": {
							"limit": {
								"bytes": 9223372036854772000
							},
							"usage": {
								"bytes": 462458880
							}
						},
						"id": "elastic-agent.service"
					}
				}
			}
		},
		"ecs": {
			"version": "1.7.0"
		},
		"service": {
			"address": "http://unix/stats",
			"type": "http"
		},
		"http": {}
	},
	"fields": {
		"@timestamp": [
			"2020-12-07T09:18:11.839Z"
		]
	},
	"sort": [
		1607332691839
	]
}

macOS

{
	"_index": ".ds-metrics-elastic_agent.elastic_agent-default-000001",
	"_id": "2QxlPHYBjGFDnaF_EkU-",
	"_version": 1,
	"_score": null,
	"_source": {
		"@timestamp": "2020-12-07T08:50:24.348Z",
		"event": {
			"dataset": "elastic_agent.elastic_agent",
			"module": "http",
			"duration": 3040126
		},
		"metricset": {
			"name": "json",
			"period": 10000
		},
		"system": {
			"process": {
				"cpu": {
					"system": {
						"ticks": 1745,
						"time": {
							"ms": 1745
						}
					},
					"total": {
						"ticks": 7291,
						"time": {
							"ms": 7291
						},
						"value": 7291
					},
					"user": {
						"time": {
							"ms": 5546
						},
						"ticks": 5546
					}
				},
				"memory": {
					"size": 74531072
				}
			}
		},
		"host": {
			"mac": [
				"ac:de:48:ac:de:48"
			],
			"name": "MacBook-Pro-2.local",
			"hostname": "MacBook-Pro-2.local",
			"architecture": "x86_64",
			"os": {
				"name": "Mac OS X",
				"kernel": "18.7.0",
				"build": "18G6032",
				"platform": "darwin",
				"version": "10.14.6",
				"family": "darwin"
			},
			"id": "FC609F24-07E1-54EA-8E33-56F9D5A7A97E",
			"ip": [
				"127.0.0.2"
			]
		},
		"agent": {
			"ephemeral_id": "0cf156d9-4398-4c29-a52d-596ec7a93f5f",
			"id": "e09c86a1-f5dd-4fe8-898c-70de832e2a9e",
			"name": "MacBook-Pro-2.local",
			"type": "metricbeat",
			"version": "8.0.0"
		},
		"service": {
			"address": "http://unix/stats",
			"type": "http"
		},
		"http": {},
		"data_stream": {
			"dataset": "elastic_agent.elastic_agent",
			"namespace": "default",
			"type": "metrics"
		},
		"elastic_agent": {
			"snapshot": false,
			"version": "8.0.0",
			"id": "02e6478a-72b9-4a5e-bd63-0f6be2ef4dba",
			"process": "elastic-agent"
		},
		"ecs": {
			"version": "1.6.0"
		}
	},
	"fields": {
		"@timestamp": [
			"2020-12-07T08:50:24.348Z"
		]
	},
	"sort": [
		1607331024348
	]
}

ruflin (Member) commented Dec 7, 2020

I guess you could use the rename processor to move the fields under root?

Why is there a beat object in the stats for memstats for example?

michalpristas (Contributor) commented Dec 8, 2020

Why is there a beat object in the stats for memstats for example?

There's beat.memstats because we're collecting beatMetrics using http. I can use rename to move it up if this is disturbing.
I tried rename for moving the fields to the root, but either I'm using it wrong or it's not possible; looking at the code of rename, it should not be possible.

ruflin (Member) commented Dec 9, 2020

Can you elaborate on the beatMetrics using http?

michalpristas (Contributor) commented Dec 9, 2020

What I mean by beatMetrics is a registry with the same name in the libbeat code. This registry is part of the default registry, under the stats namespace, which is exposed using the /stats endpoint. This endpoint I consume using the http input.

ruflin (Member) commented Dec 9, 2020

I think we should be careful not to reuse too much of libbeat here, as Elastic Agent is not a Beat. If we need the stats inside beatMetrics, could we get them from beatMetrics but put together our own event with the values? I think Elastic Agent must be in control of where and what metrics are exposed.

michalpristas (Contributor) commented Dec 9, 2020

I agree with the agent having control of what is being published, but even with that in mind I worked with the assumption that we agreed on using libbeat to collect metrics, at least at first.

At the same time, we need to take into account that we're walking towards dashboards where aggregated metrics of the agent and its processes are displayed. We probably need to have a common format in mind so aggregations are not cumbersome.

I updated the rules, which cleaned up the resulting agent document: mainly removing what's not needed from the beatMetrics registry and renaming metrics.beat to metrics.process. This should be universal for each process.
Additionally, the Beats can provide more information in this document, as Ravi mentioned further up (e.g. output events and RW errors). The agent can jump in and use those as soon as it has the ability to collect them.
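For illustration only, that rename could be expressed with a Beats rename processor along these lines (where exactly it runs is an implementation detail, and the field paths are taken from the description above):

processors:
  # Move the libbeat-style metrics under a process-agnostic key so every
  # monitored process reports into the same structure.
  - rename:
      fields:
        - from: "metrics.beat"
          to: "metrics.process"
      ignore_missing: true
      fail_on_error: false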

ruflin changed the title from "[Elastic Agent] Collect Elastic Agent metrics and send to Elastic Search" to "[Elastic Agent] Collect Elastic Agent metrics and send to Elasticsearch" on Dec 10, 2020
michalpristas (Contributor) commented

Updated the structure in #22394 (comment) again; the important part is system.process.

Collecting just memory, CPU, and fd (on Linux).
Total system metrics such as free/used disk space or free RAM would need a different approach, either the Go APM agent or agent logic itself. These can be added later.

ruflin (Member) commented Dec 10, 2020

This LGTM.

ph (Contributor, Author) commented Dec 10, 2020

LGTM, seems OK for a first version.

simitt (Contributor) commented Dec 10, 2020

@michalpristas would it be possible to also include cgroup metrics from the beginning? Similar to what has been done in #21113. This is important when running in containers.

michalpristas (Contributor) commented

@simitt Yes, I can do that.
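If the collection goes through the system/process metricset sketched earlier, cgroup data could be switched on with its cgroups option (a sketch; treating this option as the mechanism the agent would actually use is an assumption):

metricbeat.modules:
  - module: system
    metricsets: ["process"]
    period: 10s
    processes: ["elastic-agent"]       # placeholder pattern, as in the earlier sketch
    process.cgroups.enabled: true      # include cgroup CPU/memory accounting on Linux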

michalpristas (Contributor) commented

@ruflin @simitt I updated the PR to match the proposed doc schema. It would be great if you could take a look:
#22793

botelastic added the needs_team label (indicates that the issue/PR needs a Team:* label) on Oct 27, 2021
jsoriano added the Team:Elastic-Agent-Control-Plane label (label for the Agent Control Plane team) on Oct 29, 2021
elasticmachine (Collaborator) commented

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

botelastic removed the needs_team label on Oct 29, 2021
elasticmachine (Collaborator) commented

Pinging @elastic/stack-monitoring (Stack monitoring)

jlind23 (Collaborator) commented Oct 29, 2021

@michalpristas It seems that you have already opened the PR for this: #22793.
What should we do with this issue then? Is there still something to deliver?

jlind23 closed this as completed on Feb 3, 2022