Metricbeat elasticsearch module when output is Kafka #11519
Comments
Pinging @elastic/stack-monitoring
There has been some discussion around this question, and from what I have seen thus far we lean toward option three in your list. @ycombinator, do you concur?
Option 3 has been tested and is known to work, so I'd start out by documenting that right now. However, in theory option 1 could also work, so I think it's worth testing it out and coming up with docs around that too.
Tested option 1 briefly: metricbeat (ES module) -> Kafka -> LS -> ES seems to work as well :) For example, a conditional can be added to the output section of the LS config to route Metricbeat ES module metrics to a separate monitoring cluster (for 6.5+ to 6.latest):
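A minimal sketch of such a conditional, assuming the events still carry Metricbeat's standard `[metricset][module]` field when they arrive from Kafka (hosts and the index pattern are placeholders):

```
output {
  # Route Metricbeat ES-module events to the monitoring cluster,
  # everything else to the production cluster.
  if [metricset][module] == "elasticsearch" {
    elasticsearch {
      hosts => ["https://monitoring-cluster:9200"]    # placeholder
      index => ".monitoring-es-6-mb-%{+YYYY.MM.dd}"   # 6.x Metricbeat monitoring index pattern
    }
  } else {
    elasticsearch {
      hosts => ["https://production-cluster:9200"]    # placeholder
    }
  }
}
```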
Instead of having a conditional with two ES outputs, the alternative is to build out hosts, index, etc. as variables upstream in the pipeline and substitute them into a single elasticsearch output.
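A sketch of that alternative (the field and index names are illustrative). Note that per-event `%{...}` references work for settings like `index`, while `hosts` is resolved at startup, so it would typically be parameterized through environment variables rather than event fields:

```
filter {
  # Derive the target index upstream so the output section stays generic.
  if [metricset][module] == "elasticsearch" {
    mutate { add_field => { "[@metadata][target_index]" => ".monitoring-es-6-mb" } }
  } else {
    mutate { add_field => { "[@metadata][target_index]" => "metricbeat" } }
  }
}

output {
  elasticsearch {
    hosts => ["${ES_HOSTS}"]                                 # resolved from the environment
    index => "%{[@metadata][target_index]}-%{+YYYY.MM.dd}"
  }
}
```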
Good stuff, @ppf2, thanks so much for testing this out! I wonder if it's safe for the conditional to just test for the Elasticsearch module's fields, though. Imagine a case where the user has also configured the Elasticsearch module for their own regular metrics collection; those events would then get routed to the monitoring cluster as well. I wonder if there's another piece of data/metadata in the event that Logstash receives from Metricbeat that we could use to make this check more robust. If there isn't, I wonder if we should inject something to this effect from the Elasticsearch Metricbeat module when monitoring collection is enabled.
How about this if clause instead?
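Presumably something along these lines, keying off the index name that Metricbeat itself sets for monitoring events rather than off the module name (the exact metadata field is an assumption here):

```
output {
  if [@metadata][index] =~ /^\.monitoring-es/ {
    elasticsearch {
      hosts => ["https://monitoring-cluster:9200"]      # placeholder
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["https://production-cluster:9200"]      # placeholder
    }
  }
}
```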
Perhaps we could generalize this a bit so it works not just for ES stack monitoring data collected by Metricbeat but also for other stack products' monitoring data. Something like:
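Under the same assumptions as the sketch above, only the prefix check changes:

```
output {
  # Match monitoring indices from any stack product
  # (.monitoring-es, .monitoring-kibana, .monitoring-logstash, ...).
  if [@metadata][index] =~ /^\.monitoring-/ {
    elasticsearch {
      hosts => ["https://monitoring-cluster:9200"]      # placeholder
      index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["https://production-cluster:9200"]      # placeholder
    }
  }
}
```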
If indeed we have a solution that works and that we agree on, then in order to satisfy the original request we need to document the recommended setup. @lcawl, any suggestions on a home for this sort of information in the docs? I'm happy to discuss over Slack/Zoom as well to give more context.
Polite bump, @lcawl. Thanks!
I implemented option 1 in a 3-node test cluster today: Metricbeat outputs to Logstash, which then outputs to Elasticsearch. At first glance it seemed to work fine, but then I noticed that the shard count on the nodes page is way off. It keeps incrementing, going from the correct number to several thousand over time. So I may start with 50, then every 10 seconds it goes to 100, 150, 200, and so on. This was tested using 7.5.1 on RHEL. My motivation is to come up with a fix so I don't have to disable the system module in Metricbeat, as it provides very valuable insights into other performance characteristics of a given node.
@lcawl and @ycombinator I've removed myself as owner of this issue after switching teams. Would one of you like to pick it up?
@cachedout Sure. @lcawl Can you take up the bit about documenting Option 1? I can answer @kkh-security-distractions's question.
@kkh-security-distractions I assume you were using the Logstash fragment I had posted in my comment above.
Unfortunately, this fragment is not quite right. It works for most stack monitoring data except data about shards, as you obviously found out the hard way 😞. Sorry about that. I've now updated the comment with a better fragment; please try that out. Note that you will need to clear out your existing monitoring data (the .monitoring-* indices) first.
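The gist of the fix is that some monitoring documents, notably the shard documents, must keep the document ID that Metricbeat assigns, so that repeated collections overwrite the same documents instead of accumulating new ones. A sketch of the corrected fragment, where the `[@metadata][_id]` field name is an assumption:

```
output {
  if [@metadata][index] =~ /^\.monitoring-/ {
    if [@metadata][_id] {
      elasticsearch {
        hosts => ["https://monitoring-cluster:9200"]      # placeholder
        index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
        document_id => "%{[@metadata][_id]}"              # preserve IDs, e.g. for shard docs
      }
    } else {
      elasticsearch {
        hosts => ["https://monitoring-cluster:9200"]      # placeholder
        index => "%{[@metadata][index]}-%{+YYYY.MM.dd}"
      }
    }
  } else {
    elasticsearch {
      hosts => ["https://production-cluster:9200"]        # placeholder
    }
  }
}
```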
@ycombinator, yesterday when I was comparing a working versus a non-working setup, I did notice the rather odd "id" of the working setup, but I was not able to determine whether it was important or not ;) I changed my pipeline according to your suggestions and it seems to have fixed the problem. Shard count seems steady now, as it should be. I added some stuff in the filter section to get rid of the ECS fields from Metricbeat, as these are not needed as I see it. I will leave it running and see tomorrow how it ends up. I suggest that someone improve the documentation on the main Metricbeat page to show how to incorporate this setup. I think many people whose ingest goes through e.g. Kafka will appreciate this setup while still being able to keep the system module, which was my goal :) Thanks for the input.
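Such a filter might look roughly like this; which fields are actually safe to drop depends on the setup:

```
filter {
  # Hypothetical: drop Metricbeat bookkeeping fields that monitoring does not need.
  mutate {
    remove_field => ["ecs", "agent"]
  }
}
```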
Sorry, I somehow missed the earlier notifications on this one. From what I understand, this is a less common configuration option. I had a chat with @dedemorton, and I think our suggestion would be to put this in a blog post, at least initially. If it becomes a common enough use case that we want to actively maintain it in the docs, we can revisit incorporating it.
Sounds good and makes sense, @lcawl and @dedemorton. I'll start working on a blog post soon. |
Blog post sounds good. Let's cross-link from the https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html page to the public blog post location (once it is published). Thx!
@ppf2 Blog post is live: https://www.elastic.co/blog/elastic-stack-monitoring-with-metricbeat-via-logstash-or-kafka. @lcawl WDYT about the linking idea that @ppf2 mentioned in the previous comment?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I am linking this to one of the items on the main issue on monitoring ES via metricbeat (#7035).
One aspect we haven't talked about much (or documented) is what happens when the user's metricbeat is configured to route all events through Kafka (using output.kafka).
Per our guidelines today (https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html), the configuration of the Elasticsearch module requires the output to be output.elasticsearch.
What is the recommended setup here for output.kafka users? (A sketch of the base Metricbeat configuration in question follows the list below.)
1. Will they send everything through output.kafka and have separate Logstash ES outputs downstream, one for regular events to the production cluster and one for routing Metricbeat ES stack module events to .monitoring-es* indices on the remote monitoring cluster?
2. Or is there a way to reuse Logstash's xpack.monitoring.elasticsearch.hosts connection to route the Metricbeat ES stack module events to the remote monitoring cluster?
3. Will they have to set up a second Metricbeat instance (with output.elasticsearch just for the ES stack modules) to route events directly to the remote monitoring cluster, while the original Metricbeat instance continues to send other events through Kafka?
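For reference, a minimal sketch of the Metricbeat side of this scenario, per the docs page above (hosts, brokers, and topic are placeholders):

```
metricbeat.modules:
  - module: elasticsearch
    period: 10s
    hosts: ["http://localhost:9200"]
    xpack.enabled: true   # collect the stack monitoring metricsets

output.kafka:
  hosts: ["kafka:9092"]   # placeholder broker list
  topic: "metricbeat"     # placeholder topic
```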
Until we figure out our story on this, it would be helpful to update https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html with some information on our current guidelines for when Metricbeat does not have an output.elasticsearch (or maybe it's simply a not-currently-supported statement, etc.). Thx!