consul: more efficient tag and service filtering #3711
Comments
iksaif changed the title from "More efficient tag filtering for Prometheus" to "consul: more efficient tag filtering" on Jan 19, 2018
cc: @vguerci, @Thib17, @karthimohan, @mchataigner
I understand where you are coming from. However, as noted on several of those issues, Prometheus uses service discovery mechanisms that already exist; we don't invent new ones or variants thereof. If there are filtering options that Consul supports natively, then they can be considered for addition. Service-meta, as discussed on hashicorp/consul#2549, sounds like a better way to handle this (as would more efficient watching), and I am against adding short-term hacks to Prometheus which will ultimately lead to impedance mismatches. The place to tackle this problem is at its source. Put another way, we already have one custom way of doing filtering in Prometheus (relabelling); I don't think we should be adding more.
My point is that the mechanism is already here; we just do not use it.
A config option to pass a tag to the
Great! :) Will do that.
You might also want to check whether #3592 should be applied to the Consul SD code as well.
OK, I will experiment with that too, but I expect it to be beneficial only when services in the catalog are disappearing. What we see are not idle connections, but blocking queries to every defined service in the catalog.
iksaif changed the title from "consul: more efficient tag filtering" to "consul: more efficient tag and service filtering" on Jan 30, 2018
We just found out that the blocking query to the catalog (one per job) can also get very expensive, both in terms of CPU and bandwidth (mostly because, if you are big enough, the catalog is refreshed all the time). There are also issues with non-stale reads to other datacenters. I'll come up with a number of graphs this week and next, then propose ways to make that less of an issue for big users. For example, I think that we could get rid of the catalog call if we already have the service names.
Services could be added over time, and we should pick that up. I imagine that's even more true for big users.
I agree that the current (user-facing) behavior is the correct one. I'm trying to find a more resource-efficient implementation of the same behavior. Currently I know that we do ~1000 qps per DC on Consul, and some of these queries are non-stale queries to leaders.
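To make the query shape under discussion concrete, here is a minimal sketch of how a per-service blocking query with a server-side tag filter and a stale read could be expressed against Consul's HTTP catalog endpoint. The helper name `serviceWatchURL` and the example address are hypothetical; the `tag`, `index`, and `stale` query parameters are those of the Consul HTTP API, and this is not the code Prometheus actually uses.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// serviceWatchURL is a hypothetical helper that builds the URL for a
// blocking query on a single service in the Consul catalog.
//   - tag:       optional server-side tag filter (empty means no filter)
//   - stale:     allow any server, not only the leader, to answer
//   - lastIndex: index from the previous response; 0 means a plain read
func serviceWatchURL(base, service, tag string, stale bool, lastIndex uint64) string {
	q := url.Values{}
	if tag != "" {
		q.Set("tag", tag)
	}
	if lastIndex > 0 {
		q.Set("index", strconv.FormatUint(lastIndex, 10))
	}
	u := base + "/v1/catalog/service/" + url.PathEscape(service)
	encoded := q.Encode() // keys are emitted in sorted order
	if stale {
		// "stale" is a valueless flag, so it is appended by hand.
		if encoded != "" {
			encoded += "&"
		}
		encoded += "stale"
	}
	if encoded != "" {
		u += "?" + encoded
	}
	return u
}

func main() {
	// Assumed local agent address, for illustration only.
	fmt.Println(serviceWatchURL("http://localhost:8500", "web", "prometheus", true, 42))
}
```

With a tag filter, the server only wakes the blocking query when matching instances change, and with `stale` the read does not have to be served by the leader.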
kamaradclimber added a commit to criteo-forks/consul that referenced this issue Jan 30, 2018
For future reference, since one of my patches will include the ability to do stale reads, here is the impact on our cluster of not using stale reads (Prometheus being one of the top issuers of non-stale read queries):
iksaif pushed a commit to criteo-forks/prometheus that referenced this issue Mar 16, 2018
iksaif pushed a commit to iksaif/prometheus that referenced this issue Mar 21, 2018
brian-brazil added a commit that referenced this issue Mar 23, 2018
brian-brazil closed this Apr 3, 2018
sipian pushed a commit to sipian/prometheus that referenced this issue May 18, 2018
lock bot commented Mar 22, 2019
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
iksaif commented Jan 19, 2018 (edited)
This is related to:
The official answer so far has been "use relabeling", but doing that ends up opening one TCP connection per service (it looks like all the services are being watched). This happens because, if you don't specify a service name and instead drop targets using relabeling, shouldWatch() will always return true and all services will be watched (which is currently very inefficient with Consul, and clearly unnecessary).

According to https://www.consul.io/api/catalog.html#list-services, we still can only filter on dc and node-service. As discussed in hashicorp/consul#2549, they would like to also add service-meta here, but that hasn't been done (and that was almost a year ago).

Currently discovery/consul/consul.go simply watches the Catalog without any restrictions, but:
- Services() returns both the service names and tags, so we can start to do some filtering here based on the tags (example in https://github.com/hashicorp/consul/blob/b3292d13fb8bbc8b14b2a1e2bbae29c6e105b8f4/command/catalog/list/services/catalog_list_services.go#L85)
- catalog.Service() can also take an optional tag argument that could be used to do some filtering server-side, so we can use that too to get notified only when necessary.

I (or somebody else from my team) will try to get a PR ready next week, as it is quite annoying to have thousands of connections for nothing, and it isn't obvious how to achieve this with relabeling.
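The client-side half of this proposal can be sketched as follows. This is a minimal illustration, not the eventual patch: it assumes the catalog listing has the shape map[service name][]tags (which is what Services() is described as returning above), and the helper names `hasAllTags` and `servicesToWatch` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// hasAllTags reports whether every wanted tag is present in serviceTags.
func hasAllTags(serviceTags, wanted []string) bool {
	present := make(map[string]struct{}, len(serviceTags))
	for _, t := range serviceTags {
		present[t] = struct{}{}
	}
	for _, w := range wanted {
		if _, ok := present[w]; !ok {
			return false
		}
	}
	return true
}

// servicesToWatch is a hypothetical helper: given a catalog listing
// (service name -> tags), an optional list of configured service names,
// and an optional list of required tags, it returns the sorted names of
// the services that actually need a per-service watch.
func servicesToWatch(catalog map[string][]string, names, tags []string) []string {
	allowed := make(map[string]struct{}, len(names))
	for _, n := range names {
		allowed[n] = struct{}{}
	}
	var out []string
	for name, svcTags := range catalog {
		// An empty name list means "any service name is acceptable".
		if len(allowed) > 0 {
			if _, ok := allowed[name]; !ok {
				continue
			}
		}
		if !hasAllTags(svcTags, tags) {
			continue
		}
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	catalog := map[string][]string{
		"consul": {},
		"web":    {"prometheus", "http"},
		"db":     {"postgres"},
	}
	// Only "web" carries the required tag, so only one watch is needed.
	fmt.Println(servicesToWatch(catalog, nil, []string{"prometheus"}))
}
```

With a filter like this applied to the catalog listing, services that would later be dropped by relabeling never get a watch (and a connection) in the first place.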