Monitoring data sent to the wrong endpoint (internal collection) #17937

Closed
fdartayre opened this issue Apr 23, 2020 · 4 comments · Fixed by #17991
Labels: bug, Feature:Stack Monitoring, libbeat, monitoring, Team:Services (Deprecated)
fdartayre commented Apr 23, 2020

Version: 7.6.2

Description:
When several hosts are configured in the Elasticsearch output, every host except the first appears to be appended to the monitoring hosts list.

For instance, with a configuration such as:

```yaml
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.elasticsearch:
  hosts: ["host1", "host2", "host3"]
  username: "elastic"
  password: "redacted"

monitoring:
  elasticsearch:
    hosts: ["host4"]
    username: "elastic"
    password: "redacted"
```

This gets logged:

```
2020-04-23T15:05:33.406+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host1:9200
2020-04-23T15:05:33.408+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host2:9200
2020-04-23T15:05:33.408+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host3:9200
2020-04-23T15:05:33.409+0200  INFO  [publisher] pipeline/module.go:110  Beat name: fred
2020-04-23T15:05:33.430+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host4:9200
2020-04-23T15:05:33.431+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host2:9200
2020-04-23T15:05:33.431+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://host3:9200
```

Most of the time the first host is picked and monitoring data is sent to the monitoring cluster, but not always: sometimes one of the appended output hosts is selected and monitoring data lands on the production cluster.

Steps to Reproduce:
A simple configuration to reproduce the issue "deterministically":

```yaml
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

output.elasticsearch:
  hosts: ["localhost", "localhost"]
  username: "elastic"
  password: "redacted"

monitoring:
  cluster_uuid: xxx
  elasticsearch:
    hosts: ["unresolved"]
    username: "elastic"
    password: "redacted"
```

The hostname of the monitoring cluster cannot be resolved, so the monitoring output fails over to the production host (note the failover(...) entries below) and the .monitoring-beats-7-* index is created on the production cluster instead.

Here's the log:

```
2020-04-23T15:11:37.699+0200  INFO  instance/beat.go:298  Setup Beat: metricbeat; Version: 7.6.2
2020-04-23T15:11:37.699+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://localhost:9200
2020-04-23T15:11:37.699+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://localhost:9200
2020-04-23T15:11:37.700+0200  INFO  [publisher] pipeline/module.go:110  Beat name: fred
2020-04-23T15:11:37.708+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://unresolved:9200
2020-04-23T15:11:37.708+0200  INFO  elasticsearch/client.go:174 Elasticsearch url: http://localhost:9200
2020-04-23T15:11:37.708+0200  INFO  [monitoring]  log/log.go:118  Starting metrics logging every 30s
2020-04-23T15:11:37.708+0200  INFO  instance/beat.go:439  metricbeat start running.
2020-04-23T15:11:37.709+0200  INFO  cfgfile/reload.go:175 Config reloader started
2020-04-23T15:11:37.711+0200  INFO  cfgfile/reload.go:235 Loading of config files completed.
2020-04-23T15:11:37.725+0200  INFO  elasticsearch/client.go:757 Attempting to connect to Elasticsearch version 7.6.1
2020-04-23T15:11:37.731+0200  INFO  [license] licenser/es_callback.go:50  Elasticsearch license: Platinum
2020-04-23T15:11:37.738+0200  INFO  [monitoring]  elasticsearch/elasticsearch.go:269  Successfully connected to X-Pack Monitoring endpoint.
2020-04-23T15:11:37.739+0200  INFO  [monitoring]  elasticsearch/elasticsearch.go:283  Start monitoring stats metrics snapshot loop with period 10s.
2020-04-23T15:11:37.739+0200  INFO  [monitoring]  elasticsearch/elasticsearch.go:283  Start monitoring state metrics snapshot loop with period 1m0s.
2020-04-23T15:11:38.712+0200  INFO  pipeline/output.go:95 Connecting to backoff(elasticsearch(http://localhost:9200))
2020-04-23T15:11:38.713+0200  INFO  pipeline/output.go:95 Connecting to backoff(elasticsearch(http://localhost:9200))
2020-04-23T15:11:38.722+0200  INFO  elasticsearch/client.go:757 Attempting to connect to Elasticsearch version 7.6.1
2020-04-23T15:11:38.723+0200  INFO  elasticsearch/client.go:757 Attempting to connect to Elasticsearch version 7.6.1
2020-04-23T15:11:38.726+0200  INFO  [license] licenser/es_callback.go:50  Elasticsearch license: Platinum
2020-04-23T15:11:39.006+0200  INFO  pipeline/output.go:105  Connection to backoff(elasticsearch(http://localhost:9200)) established
2020-04-23T15:11:39.010+0200  INFO  [license] licenser/es_callback.go:50  Elasticsearch license: Platinum
2020-04-23T15:11:39.030+0200  INFO  pipeline/output.go:105  Connection to backoff(elasticsearch(http://localhost:9200)) established
2020-04-23T15:11:47.742+0200  INFO  pipeline/output.go:95 Connecting to backoff(failover(publish(elasticsearch(http://unresolved:9200)),publish(elasticsearch(http://localhost:9200))))
2020-04-23T15:11:47.744+0200  WARN  transport/tcp.go:52 DNS lookup failure "unresolved": lookup unresolved: no such host
2020-04-23T15:11:49.639+0200  ERROR pipeline/output.go:100  Failed to connect to backoff(failover(publish(elasticsearch(http://unresolved:9200)),publish(elasticsearch(http://localhost:9200)))): cannot connect underlying Elasticsearch client: Get http://unresolved:9200: lookup unresolved: no such host
2020-04-23T15:11:49.639+0200  INFO  pipeline/output.go:93 Attempting to reconnect to backoff(failover(publish(elasticsearch(http://unresolved:9200)),publish(elasticsearch(http://localhost:9200)))) with 1 reconnect attempt(s)
2020-04-23T15:11:49.643+0200  INFO  elasticsearch/client.go:757 Attempting to connect to Elasticsearch version 7.6.1
2020-04-23T15:11:49.649+0200  INFO  [license] licenser/es_callback.go:50  Elasticsearch license: Platinum
2020-04-23T15:11:49.661+0200  INFO  pipeline/output.go:105  Connection to backoff(failover(publish(elasticsearch(http://unresolved:9200)),publish(elasticsearch(http://localhost:9200)))) established
```
@ycombinator added the Team:Services (Deprecated) and Feature:Stack Monitoring labels on Apr 23, 2020
elasticmachine commented:

Pinging @elastic/integrations-services (Team:Services)

elasticmachine commented:

Pinging @elastic/stack-monitoring (Stack monitoring)

ycombinator commented:

I am able to reproduce this bug locally. Thanks for finding it and filing it with great details, @fdartayre!


ycombinator commented Apr 25, 2020

Doing some more testing, here's what we're seeing:

| output.elasticsearch.hosts | monitoring.elasticsearch.hosts | Resulting monitoring hosts |
|---|---|---|
| o1 | (not set) | o1 |
| o1 | m1 | m1 |
| o1, o2 | m1 | m1, o2 |
| o1, o2, o3 | m1 | m1, o2, o3 |
| o1, o2, o3 | m1, m2 | m1, m2, o3 |
| o1 | m1, m2 | m1, m2 |
| o1 | m1, m2, m3 | m1, m2, m3 |
| o1, o2 | m1, m2, m3 | m1, m2, m3 |
| (not set) | m1 | m1 |

So the generalized (buggy) behavior is that the monitoring hosts list always ends up at least as long as the output hosts list: when the output list is longer, its trailing hosts are appended to the monitoring hosts (see the sketch below).
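
For illustration, here's a minimal Go sketch of a merge rule that reproduces every row of the table above. This is not the actual libbeat code; `buggyMonitoringHosts` is a hypothetical helper that merges the two lists position by position, letting the tail of a longer output list leak through:

```go
package main

import "fmt"

// buggyMonitoringHosts mimics the observed behavior: monitoring hosts
// override output hosts index by index, and any output hosts beyond the
// end of the monitoring list are kept.
func buggyMonitoringHosts(outputHosts, monitoringHosts []string) []string {
	if len(monitoringHosts) == 0 {
		return outputHosts // no monitoring hosts configured: reuse the output hosts
	}
	merged := append([]string{}, monitoringHosts...)
	if len(outputHosts) > len(merged) {
		merged = append(merged, outputHosts[len(merged):]...)
	}
	return merged
}

func main() {
	fmt.Println(buggyMonitoringHosts([]string{"o1", "o2", "o3"}, []string{"m1"}))
	// Output: [m1 o2 o3] -- the fourth row of the table above
}
```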


And here's what we want to see:

| output.elasticsearch.hosts | monitoring.elasticsearch.hosts | Resulting monitoring hosts |
|---|---|---|
| o1 | (not set) | o1 |
| o1 | m1 | m1 |
| o1, o2 | m1 | m1 |
| o1, o2, o3 | m1 | m1 |
| o1, o2, o3 | m1, m2 | m1, m2 |
| o1 | m1, m2 | m1, m2 |
| o1 | m1, m2, m3 | m1, m2, m3 |
| o1, o2 | m1, m2, m3 | m1, m2, m3 |
| (not set) | m1 | m1 |

So generally we want to take the monitoring hosts list as-is whenever it's specified, as in the sketch below.
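
In the same hedged sketch style as above (a hypothetical helper, not the actual fix in #17991), the desired rule is simply:

```go
package main

import "fmt"

// resolveMonitoringHosts sketches the intended behavior: a configured
// monitoring hosts list wins outright; the output hosts are used only
// when no monitoring hosts are given (internal collection fallback).
func resolveMonitoringHosts(outputHosts, monitoringHosts []string) []string {
	if len(monitoringHosts) > 0 {
		return monitoringHosts // take the monitoring list verbatim
	}
	return outputHosts
}

func main() {
	fmt.Println(resolveMonitoringHosts([]string{"o1", "o2", "o3"}, []string{"m1"}))
	// Output: [m1] -- matching every row of the desired table
}
```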
