Metricbeat elasticsearch module when output is Kafka #11519

Closed
ppf2 opened this issue Mar 28, 2019 · 20 comments
Labels
enhancement, Feature:Stack Monitoring, Metricbeat, needs_team, Stalled

Comments

@ppf2
Member

ppf2 commented Mar 28, 2019

I am linking this to one of the items on the main issue on monitoring ES via metricbeat (#7035).

One aspect we haven't talked about much (or documented) is what happens when the user's metricbeat is configured to route all events through Kafka (using output.kafka).

Per our guidelines today (https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html), the configuration of the Elasticsearch module requires the output to be output.elasticsearch.
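In other words, the documented pattern boils down to roughly the following (a minimal sketch, not copied from the docs verbatim; the monitoring cluster host and period are placeholders):

# elasticsearch module configuration (modules.d file or metricbeat.modules section)
- module: elasticsearch
  period: 10s
  hosts: ["http://localhost:9200"]   # the ES node(s) being monitored
  xpack.enabled: true                # collect data in the stack monitoring format

# metricbeat.yml: ship directly to the (remote) monitoring cluster
output.elasticsearch:
  hosts: ["https://monitoring-cluster:9200"]   # placeholder host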

What is the recommended setup here for output.kafka users?

  1. Will they send everything through output.kafka, and have separate Logstash Elasticsearch outputs downstream: one for regular events going to the production cluster, and one for routing Metricbeat ES stack module events to the .monitoring-es* indices on the remote monitoring cluster? (A sketch of the Metricbeat side of this option follows this list.)

  2. Or is there a way to reuse Logstash's xpack.monitoring.elasticsearch.hosts for the connection to route the metricbeat ES stack module events to the remote monitoring cluster?

  3. Or will they have to set up a second Metricbeat instance (with output.elasticsearch just for the ES stack modules) to route events directly to the remote monitoring cluster, while the original Metricbeat instance continues to send other events through Kafka?
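To make option 1 concrete on the Metricbeat side, the configuration would look roughly like this (a sketch only; the broker addresses and the "beats" topic name are made-up placeholders):

# elasticsearch module config stays the same as in the documented setup
- module: elasticsearch
  period: 10s
  hosts: ["http://localhost:9200"]
  xpack.enabled: true

# metricbeat.yml: route everything through Kafka instead of Elasticsearch
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # placeholder brokers
  topic: "beats"                          # placeholder topic

The downstream Logstash pipeline would then be responsible for splitting the monitoring events from the regular events and sending each to the right cluster.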

Until we figure out our story on this, it would be helpful to update https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html with some information on our current guidelines for when metricbeat does not have an output.elasticsearch (or maybe simply a statement that this is not currently supported, etc.). Thx!

@elasticmachine
Collaborator

Pinging @elastic/stack-monitoring

@cachedout
Contributor

There has been some discussion around this question and from what I have seen thus far, we lean toward option three in your list. @ycombinator, do you concur?

@cachedout cachedout self-assigned this Mar 29, 2019
@ycombinator
Contributor

Option 3 has been tested and is known to work, so I'd start by documenting that right now.

However, in theory, option 1 could also work, so I think it's worth testing it out and coming up with docs around that too.

@ppf2
Member Author

ppf2 commented Mar 29, 2019

Tested option 1 briefly, metricbeat (ES module) -> Kafka -> LS -> ES seems to work as well :)
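On the Logstash side, the pipeline consumes from Kafka with something along these lines (a sketch; the broker and topic are placeholders, and codec => json is needed because Beats writes JSON-encoded events to Kafka):

input {
  kafka {
    bootstrap_servers => "kafka1:9092"   # placeholder broker
    topics => ["beats"]                  # placeholder topic
    codec => json                        # decode the JSON events produced by Beats
  }
}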

For example, a conditional can be added to the output section of the LS config to route metricbeat ES module metrics to a separate monitoring cluster (for 6.5+ to 6.latest):

# route ES monitoring metrics collected by metricbeat elasticsearch module
# to ES monitoring cluster
# https example
if [metricset][module] == "elasticsearch" {
  elasticsearch {
    index => ".monitoring-es-6-mb-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  ... <where your non-monitoring events will go>
}

Instead of having a conditional statement with two ES outputs, the alternative would be to build out hosts, index, etc. as variables upstream in the pipeline and substitute them into a single elasticsearch output.
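As a rough sketch of the index part of that idea (the [@metadata][target_index] field name is made up for illustration; the hosts can't easily be parameterized the same way, so in practice that part would still need a conditional or separate pipelines):

filter {
  # decide the target index upstream and stash it in @metadata
  if [metricset][module] == "elasticsearch" {
    mutate { add_field => { "[@metadata][target_index]" => ".monitoring-es-6-mb" } }
  } else {
    mutate { add_field => { "[@metadata][target_index]" => "metricbeat-%{[beat][version]}" } }
  }
}

output {
  # single elasticsearch output; the index name comes from the metadata set above
  elasticsearch {
    index => "%{[@metadata][target_index]}-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
  }
}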

@ycombinator
Contributor

ycombinator commented Mar 29, 2019

Good stuff, @ppf2, thanks so much for testing this out!

I wonder if it's safe for the conditional to just test for [metricset][module] == "elasticsearch". After all, this could be true even if xpack.enabled is false in the corresponding metricbeat module config.

Imagine a case where the user has configured the elasticsearch module in the same Metricbeat instance twice for some reason, once with xpack.enabled: true and once without. Or that there are two Metricbeat instances feeding data to the same LS instance, one configured with xpack.enabled: true in the elasticsearch module and one without.

I wonder if there's another piece of data/metadata in the event that Logstash receives from Metricbeat that we could use to make this check more robust. If there isn't, I wonder if we should inject something to this effect from the Elasticsearch Metricbeat module when xpack.enabled is set to true.

@ppf2
Member Author

ppf2 commented Mar 29, 2019

How about this if clause instead?

# route ES monitoring metrics collected by metricbeat elasticsearch module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^.monitoring-es*/ {
  elasticsearch {
    index => ".monitoring-es-6-mb-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  ... <where your non-monitoring events will go>
}

@ycombinator
Contributor

ycombinator commented Mar 29, 2019

Perhaps we could generalize this a bit to work not just for ES stack monitoring data collected by Metricbeat but also other stack products' monitoring data? So something like:

# route monitoring metrics collected by metricbeat Elastic stack product module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^.monitoring-*/ {
  if [@metadata][id] {
    elasticsearch {
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
      document_id => "%{[@metadata][id]}"
      hosts => ["https://node1:9200"]
      cacert => "/path_to/ca.crt"
      user => "elastic"
      password => "password"
    }
  } else {
    elasticsearch {
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
      hosts => ["https://node1:9200"]
      cacert => "/path_to/ca.crt"
      user => "elastic"
      password => "password"
    }
  }
} else {
  ... <where your non-monitoring events will go>
}

@cachedout
Contributor

If we indeed have a solution that works and that we agree on, then in order to satisfy the original request we need to document the recommended setup.

@lcawl Any suggestions on a home for this sort of information in the docs? I'm happy to discuss over slack/zoom as well to give more context.

@cachedout
Contributor

Polite bump, @lcawl . Thanks!

@kkh-security-distractions

I implemented option 1 in a 3-node test cluster today. I set Metricbeat to output to Logstash, which then outputs to Elasticsearch. At first glance it seemed to work fine, but then I noticed that the shard count on the nodes page is way off. It keeps incrementing, going from the correct number to several thousand over time: I may start with 50, then every 10 seconds it climbs to 100, 150, 200 and so on. This was tested using 7.5.1 on RHEL.

My motivation for this is to come up with a fix so that I don't have to disable the system module in Metricbeat, as it provides very valuable insights into other performance characteristics of a given node.

@cachedout cachedout removed their assignment Jan 8, 2020
@cachedout
Contributor

@lcawl and @ycombinator I've removed myself as owner of this issue after switching teams. Would one of you like to pick it up?

@ycombinator
Contributor

@cachedout Sure.

@lcawl Can you take up the bit about documenting Option 1? I can answer @kkh-security-distractions's question.

@ycombinator
Contributor

ycombinator commented Jan 8, 2020

@kkh-security-distractions I assume you were using the Logstash fragment I had posted in my comment above:

# route monitoring metrics collected by metricbeat Elastic stack product module
# to ES monitoring cluster
# https example
if [@metadata][index] =~ /^.monitoring-*/ {
  elasticsearch {
    index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
    hosts => ["https://node1:9200"]
    cacert => "/path_to/ca.crt"
    user => "elastic"
    password => "password"
  }
} else {
  ... <where your non-monitoring events will go>
}

Unfortunately, this fragment is not quite right. It works for most stack monitoring data except data about shards, as you obviously found out the hard way 😞. Sorry about that.

I've now updated the comment with a better fragment; please try that out. Note that you will need to clear out your existing monitoring data (DELETE .monitoring-es-*-mb*) first.

@kkh-security-distractions

@ycombinator, yesterday when I was comparing the working versus the non-working setup, I did notice the rather odd "id" in the working setup, but I was not able to determine whether it was important or not ;)

I changed my pipeline according to your suggestions and it seems to have fixed the problem. The shard count now stays steady, as it should. I also added some stuff in the filter section to get rid of the ECS fields from Metricbeat, as they are not needed as far as I can see.
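For reference, that kind of cleanup can be done with a mutate filter along these lines (the field names here are just examples, not an exact list):

filter {
  mutate {
    # example only: drop event fields that are not needed downstream
    remove_field => [ "ecs", "[agent][ephemeral_id]" ]
  }
}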

I will leave it running and see tomorrow how it ends up. I suggest that someone improves the documentation on the main Metricbeat page to incorporate this setup. I think many people whose ingest goes through e.g. Kafka will appreciate this setup while still being able to keep the system module, which was my goal :)

Thanks for the input.

@lcawl
Contributor

lcawl commented Jan 9, 2020

@lcawl Can you take up the bit about documenting Option 1?

Sorry, I somehow missed the earlier notifications on this one.

From what I understand in this issue, this is a less common configuration option. Since
the basic setup steps (https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html) are already quite complex, I don't think it would be ideal to try to squeeze it in there. Instead, I think this would be appropriate for a separate piece of content describing a more advanced configuration scenario.

I had a chat with @dedemorton and I think our suggestion would be to put this in a blog, at least initially. If it becomes a common enough use case that we want to actively maintain it in the docs, we can revisit incorporating it.

@ycombinator
Contributor

ycombinator commented Jan 9, 2020

Sounds good and makes sense, @lcawl and @dedemorton. I'll start working on a blog post soon.

@ppf2
Member Author

ppf2 commented Jan 10, 2020

Blog post sounds good. Let's cross-link from the https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html page to the public blog post location (once it is published). Thx!

@ycombinator
Contributor

@ppf2 Blog post is live: https://www.elastic.co/blog/elastic-stack-monitoring-with-metricbeat-via-logstash-or-kafka.

@lcawl WDYT about the linking idea that @ppf2 mentioned in the previous comment?

@botelastic

botelastic bot commented Jan 5, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled and needs_team labels Jan 5, 2021
@botelastic

botelastic bot commented Jan 5, 2021

This issue doesn't have a Team:<team> label.

@botelastic botelastic bot closed this as completed Feb 4, 2021