[Elastic Agent] Sending monitoring metrics to logs datastream #26758

Closed
mostlyjason opened this issue Jul 7, 2021 · 8 comments · Fixed by #26828
Labels: Team:Elastic-Agent (label for the Agent team)

@mostlyjason

It looks like Elastic Agent is sending its own monitoring metrics to the logs datastream. This information is not useful as a log message; it should be sent to the metrics datastream instead.

Example event:

{
  "_index": ".ds-logs-elastic_agent.metricbeat-default-2021.07.07-000196",
  "_type": "_doc",
  "_id": "DWZtgXoBysd6RRjof1WI",
  "_version": 1,
  "_score": null,
  "fields": {
    "elastic_agent.version": [
      "7.13.2"
    ],
    "monitoring.metrics.beat.cpu.system.ticks": [
      2191340
    ],
    "monitoring.metrics.metricbeat.system.socket.events": [
      8
    ],
    "monitoring.metrics.metricbeat.system.cpu.events": [
      3
    ],
    "host.hostname": [
      "unifi"
    ],
    "host.mac": [
    ],
    "monitoring.metrics.metricbeat.system.network_summary.events": [
      3
    ],
    "monitoring.metrics.libbeat.output.write.bytes": [
      382478
    ],
    "monitoring.metrics.metricbeat.system.network_summary.success": [
      3
    ],
    "host.os.version": [
      "18.04.5 LTS (Bionic Beaver)"
    ],
    "monitoring.metrics.metricbeat.system.service.events": [
      162
    ],
    "agent.name": [
      "unifi"
    ],
    "monitoring.metrics.beat.info.uptime.ms": [
      162121222
    ],
    "monitoring.metrics.metricbeat.system.service.success": [
      162
    ],
    "monitoring.metrics.beat.memstats.memory_alloc": [
      19120104
    ],
    "host.os.type": [
      "linux"
    ],
    "monitoring.metrics.metricbeat.system.entropy.success": [
      3
    ],
    "monitoring.metrics.metricbeat.system.raid.failures": [
      3
    ],
    "monitoring.metrics.metricbeat.system.uptime.success": [
      3
    ],
    "input.type": [
      "log"
    ],
    "monitoring.metrics.libbeat.pipeline.clients": [
      17
    ],
    "agent.hostname": [
      "unifi"
    ],
    "monitoring.metrics.libbeat.pipeline.events.total": [
      248
    ],
    "monitoring.metrics.libbeat.pipeline.events.active": [
      0
    ],
    "host.architecture": [
      "x86_64"
    ],
    "monitoring.metrics.metricbeat.system.socket.success": [
      8
    ],
    "agent.id": [
      "0d376a06-dccc-4fb9-94ea-d8c5ecea380a"
    ],
    "host.containerized": [
      false
    ],
    "monitoring.metrics.system.load.norm.15": [
      0.11
    ],
    "monitoring.metrics.beat.cpu.total.value": [
      4682010
    ],
    "monitoring.metrics.metricbeat.system.load.events": [
      3
    ],
    "log.logger": [
      "monitoring"
    ],
    "monitoring.metrics.libbeat.pipeline.queue.acked": [
      248
    ],
    "host.ip": [
    ],
    "agent.type": [
      "filebeat"
    ],
    "monitoring.metrics.metricbeat.system.diskio.events": [
      21
    ],
    "monitoring.metrics.beat.handles.open": [
      19
    ],
    "monitoring.metrics.metricbeat.system.diskio.success": [
      21
    ],
    "monitoring.metrics.beat.cpu.total.ticks": [
      4682010
    ],
    "elastic_agent.snapshot": [
      false
    ],
    "host.id": [
      "a4719423bdb94e1e80df8ea652c5dd59"
    ],
    "monitoring.metrics.libbeat.output.events.active": [
      0
    ],
    "monitoring.metrics.system.load.5": [
      0.12
    ],
    "monitoring.metrics.beat.memstats.memory_total": [
      613619360744
    ],
    "elastic_agent.id": [
      "3cace087-149e-4507-b0ee-5e6a3afdf14a"
    ],
    "monitoring.metrics.metricbeat.system.process.success": [
      18
    ],
    "host.os.codename": [
      "bionic"
    ],
    "monitoring.metrics.system.load.1": [
      0.27
    ],
    "monitoring.metrics.beat.memstats.rss": [
      39579648
    ],
    "monitoring.metrics.metricbeat.system.load.success": [
      3
    ],
    "log.origin.file.name": [
      "log/log.go"
    ],
    "@timestamp": [
      "2021-07-07T14:44:33.126Z"
    ],
    "host.os.platform": [
      "ubuntu"
    ],
    "log.file.path": [
      "/opt/Elastic/Agent/data/elastic-agent-686ba4/logs/default/metricbeat-json.log"
    ],
    "data_stream.dataset": [
      "elastic_agent.metricbeat"
    ],
    "agent.ephemeral_id": [
      "8361cc5c-d090-45a6-a564-3ccc134c261b"
    ],
    "monitoring.metrics.metricbeat.system.memory.success": [
      3
    ],
    "monitoring.metrics.metricbeat.system.uptime.events": [
      3
    ],
    "monitoring.metrics.metricbeat.system.network.events": [
      12
    ],
    "monitoring.metrics.metricbeat.system.cpu.success": [
      3
    ],
    "monitoring.metrics.libbeat.pipeline.events.published": [
      248
    ],
    "monitoring.metrics.beat.memstats.gc_next": [
      21159136
    ],
    "monitoring.metrics.metricbeat.system.process_summary.success": [
      3
    ],
    "host.os.name": [
      "Ubuntu"
    ],
    "log.level": [
      "info"
    ],
    "monitoring.metrics.beat.runtime.goroutines": [
      112
    ],
    "host.name": [
      "unifi"
    ],
    "monitoring.metrics.beat.cpu.system.time.ms": [
      376
    ],
    "monitoring.metrics.metricbeat.system.socket_summary.success": [
      3
    ],
    "monitoring.metrics.metricbeat.system.network.success": [
      12
    ],
    "log.offset": [
      220314
    ],
    "data_stream.type": [
      "logs"
    ],
    "monitoring.metrics.libbeat.output.events.total": [
      248
    ],
    "monitoring.metrics.beat.handles.limit.soft": [
      1024
    ],
    "ecs.version": [
      "1.8.0"
    ],
    "agent.version": [
      "7.13.2"
    ],
    "monitoring.metrics.libbeat.output.events.batches": [
      9
    ],
    "host.os.family": [
      "debian"
    ],
    "monitoring.metrics.metricbeat.system.raid.events": [
      3
    ],
    "monitoring.metrics.metricbeat.system.entropy.events": [
      3
    ],
    "monitoring.metrics.beat.cpu.user.time.ms": [
      490
    ],
    "monitoring.metrics.beat.cgroup.memory.mem.usage.bytes": [
      81920
    ],
    "monitoring.metrics.beat.cgroup.cpuacct.total.ns": [
      1159162390
    ],
    "monitoring.metrics.system.load.norm.1": [
      0.27
    ],
    "monitoring.metrics.system.load.norm.5": [
      0.12
    ],
    "monitoring.metrics.beat.cpu.user.ticks": [
      2490670
    ],
    "monitoring.ecs.version": [
      "1.6.0"
    ],
    "monitoring.metrics.metricbeat.system.process_summary.events": [
      3
    ],
    "monitoring.metrics.metricbeat.system.memory.events": [
      3
    ],
    "monitoring.metrics.libbeat.output.read.bytes": [
      62838
    ],
    "monitoring.metrics.system.load.15": [
      0.11
    ],
    "monitoring.metrics.beat.cpu.total.time.ms": [
      866
    ],
    "monitoring.metrics.beat.info.ephemeral_id": [
      "c8914934-42d8-4d7a-9ae5-bda5db39d2e0"
    ],
    "monitoring.metrics.libbeat.config.module.running": [
      17
    ],
    "host.os.kernel": [
      "4.15.0-147-generic"
    ],
    "monitoring.metrics.libbeat.output.events.acked": [
      248
    ],
    "monitoring.metrics.metricbeat.system.process.events": [
      18
    ],
    "log.origin.file.line": [
      144
    ],
    "monitoring.metrics.metricbeat.system.socket_summary.events": [
      3
    ],
    "data_stream.namespace": [
      "default"
    ],
    "message": [
      "Non-zero metrics in the last 30s"
    ],
    "monitoring.metrics.beat.handles.limit.hard": [
      4096
    ],
    "event.dataset": [
      "elastic_agent.metricbeat"
    ]
  },
  "sort": [
    1625669073126
  ]
}
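
As a rough illustration of what the event above shows, the following sketch flags a search hit that sits in a logs data stream but carries `monitoring.metrics.*` fields. The function name and the trimmed `sample_hit` are hypothetical and only mirror the shape of the example event; this is not part of Elastic Agent or any Elasticsearch client.

```python
def metrics_logged_as_logs(hit: dict) -> bool:
    """Return True when a hit is in a logs data stream yet carries monitoring.metrics.* fields."""
    fields = hit.get("fields", {})
    # In the example event every field value is a list, e.g. "data_stream.type": ["logs"].
    in_logs_stream = "logs" in fields.get("data_stream.type", [])
    has_metrics = any(name.startswith("monitoring.metrics.") for name in fields)
    return in_logs_stream and has_metrics


# Hypothetical, heavily trimmed copy of the event above.
sample_hit = {
    "fields": {
        "data_stream.type": ["logs"],
        "data_stream.dataset": ["elastic_agent.metricbeat"],
        "monitoring.metrics.beat.cpu.system.ticks": [2191340],
        "message": ["Non-zero metrics in the last 30s"],
    }
}

print(metrics_logged_as_logs(sample_hit))
```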

For confirmed bugs, please report:

  • Version: 7.13.2
  • Operating System: Linux
  • Steps to Reproduce: Install Elastic Agent and enable metrics monitoring in the agent policy

@mostlyjason added the bug and Team:Elastic-Agent labels Jul 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@michel-laterman self-assigned this Jul 7, 2021
@ruflin removed the bug label Jul 8, 2021
@ruflin
Member

ruflin commented Jul 8, 2021

Beats logs its metrics every 30s. This was built before there was even a metrics endpoint in Beats. Elastic Agent behaves as expected here, as it just "tails" the log file. Instead, we should likely disable the logging of these metrics, which I think can be done through a config option.

I removed the bug label as I don't consider this a bug.

@michel-laterman
Contributor

We'll add a config option to disable logging metrics. However, the default behaviour will not change, as we may need access to these metrics (through the log files) to help us debug issues.

@michel-laterman
Contributor

@ruflin, my PR adds a setting that can be used in stand-alone mode to stop the beats from emitting metrics. If we want this to be available in fleet mode, the setting should be passed as part of the policy and set through Kibana. I'm not sure which project (fleet/kibana) is responsible for that, or where to open the issue.
If we want a short-term workaround for fleet mode, we can enable the agent to pick it up as part of fleet.yml; however, this would still require a user to edit the file manually.

@ruflin
Member

ruflin commented Jul 14, 2021

I don't think we should build any short-term hacks around this, and I'm also not sure about the urgency. @jen-huang As soon as there is a config option, we could likely use this as the default in the policy?

@jen-huang

Each agent policy needs to explicitly declare which agent monitoring options are enabled; by default this is stored as ["logs", "metrics"] on the policy. We can certainly add this new monitoring option to be enabled by default, but it will only kick in for new agent policies.

We could add a migration for 7.15 to add it to existing policies though. But I think we will want some conditional logic there, I suppose log_metrics should be enabled when the policy also has metrics monitoring enabled?
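
The conditional raised in this comment could look roughly like the sketch below. It only illustrates the question being asked (add `log_metrics` when `metrics` is already enabled); the function name and the plain-list representation of the policy's monitoring options are assumptions for illustration, not Fleet or Kibana code.

```python
def migrate_monitoring_options(monitoring_enabled: list) -> list:
    """Return a copy of the options with "log_metrics" added when "metrics" is enabled."""
    # Hypothetical rule from the comment above: only policies that already
    # monitor metrics get the new option; other policies are left unchanged.
    if "metrics" in monitoring_enabled and "log_metrics" not in monitoring_enabled:
        return [*monitoring_enabled, "log_metrics"]
    return list(monitoring_enabled)


print(migrate_monitoring_options(["logs", "metrics"]))
print(migrate_monitoring_options(["logs"]))
```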

@michel-laterman
Contributor

michel-laterman commented Jul 15, 2021

They are decoupled: it's possible to enable log_metrics and not metrics.
However, we have an agent.logging.metrics.enabled setting that we may reuse to pass to beats instead of adding a new one.

The default settings will not change from the current behaviour (metrics appear in logs).

@michel-laterman
Contributor

We are reusing agent.logging.metrics.enabled instead of introducing another setting.
The default (true) has not changed. If it's set to false, the elastic-agent and all beats running under it will no longer write the metrics entries to their logs.
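
A minimal sketch of the semantics described above, assuming the agent configuration has already been parsed into nested dictionaries: the setting counts as enabled unless `agent.logging.metrics.enabled` is explicitly set to false. The helper name and the dict representation are hypothetical; this is not how Elastic Agent itself reads the option.

```python
def metrics_logging_enabled(config: dict) -> bool:
    """Resolve agent.logging.metrics.enabled, falling back to the default of True."""
    node = config
    for key in ("agent", "logging", "metrics", "enabled"):
        if not isinstance(node, dict) or key not in node:
            return True  # setting absent: the unchanged default (true) applies
        node = node[key]
    return bool(node)


# Hypothetical parsed configurations.
print(metrics_logging_enabled({}))
print(metrics_logging_enabled({"agent": {"logging": {"metrics": {"enabled": False}}}}))
```

In a stand-alone configuration file this would presumably correspond to setting `agent.logging.metrics.enabled: false`.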
