X-Consul-Index behaves strangely on non-existing services with an existing service prefix #3712

pierresouchay · 2017-11-25T00:32:36Z

`consul version` for both Client and Server

Client: 0.9.3
Server: 0.9.3

Description of the Issue (and unexpected/desired result)

We have several large clusters (~5k hosts per clusters) connected thru WAN and have some scalability issues. During one of my investigations, I noticed a strange bug:

When requesting a non-existing service with a X-Consul-Index, consul blocks as long as the timeout occurs or the service is being declared => OK
When requesting an existing service with a X-Consul-Index fetched on this service, consul blocks as the timeout occurs or the service gets updated => OK
But, when requesting a non-existing service with a X-Consul-Index, if another service with a shorter name exists, Consul does not block and all updates of X-Consul-Index are getting catched.

Reproduction steps

Create this script called consul_check_services_changes.sh

#!/bin/bash
if test ${1:-none} = "none"
then
  echo "USAGE: $0 Consul_URL"
  echo "       Example: localhost:8500/v1/health/service/MY_SUPER_SERVICE"
  exit 1
fi
url_to_check=$1

headers=$(mktemp)
content=$(mktemp)
index=0
while true;
do
  url="${url_to_check}?wait=5m&index=${index}&pretty=true&stale"
  curl -fs --dump-header "$headers" -o "${content}.new" "${url}" || { "echo Failed to query ${url}"; exit 1; }
  if test $index -ne 0
  then
    diff -u "$content" "$content.new" && echo " diff: No Differences found in service"
  fi
  index=$(grep "X-Consul-Index" "$headers" | sed 's/[^0-9]*\([0-9][0-9]*\)[^0-9]*/\1/g')
  mv "$content.new" "$content"
  printf "X-Consul-Index: $index at $(date) \b"
done

Call it on a non-existing service:

./consul_check_services_changes.sh consul-agent:8500/v1/catalog/service/non-existing-service
X-Consul-Index: 37968409 at Sat 25 Nov 2017 01:27:09 CET

Blocks -> OK

Call it on a real service:

consul_check_services_changes.sh consul-agent:8500/v1/catalog/service/consul
X-Consul-Index: 37968524 at Sat 25 Nov 2017 01:27:31 CET

Blocks as long as the service is not updated -> OK

Call it on a service that does not exists but has an existing service as prefix

consul_check_services_changes.sh consul-agent:8500/v1/catalog/service/consul2
X-Consul-Index: 37968837 at Sat 25 Nov 2017 01:28:06 CET diff: No Differences found in service
X-Consul-Index: 37968864 at Sat 25 Nov 2017 01:28:07 CET diff: No Differences found in service
X-Consul-Index: 37968865 at Sat 25 Nov 2017 01:28:07 CET diff: No Differences found in service
X-Consul-Index: 37968868 at Sat 25 Nov 2017 01:28:07 CET diff: No Differences found in service
X-Consul-Index: 37968869 at Sat 25 Nov 2017 01:28:07 CET diff: No Differences found in service
X-Consul-Index: 37968877 at Sat 25 Nov 2017 01:28:08 CET diff: No Differences found in service
X-Consul-Index: 37968878 at Sat 25 Nov 2017 01:28:08 CET diff: No Differences found in service
X-Consul-Index: 37968879 at Sat 25 Nov 2017 01:28:08 CET diff: No Differences found in service
X-Consul-Index: 37968880 at Sat 25 Nov 2017 01:28:09 CET diff: No Differences found in service
X-Consul-Index: 37968887 at Sat 25 Nov 2017 01:28:09 CET diff: No Differences found in service
X-Consul-Index: 37968888 at Sat 25 Nov 2017 01:28:09 CET diff: No Differences found in service
X-Consul-Index: 37968896 at Sat 25 Nov 2017 01:28:12 CET diff: No Differences found in service
X-Consul-Index: 37968898 at Sat 25 Nov 2017 01:28:12 CET diff: No Differences found in service
X-Consul-Index: 37968914 at Sat 25 Nov 2017 01:28:12 CET diff: No Differences found in service
X-Consul-Index: 37968915 at Sat 25 Nov 2017 01:28:12 CET diff: No Differences found in service
X-Consul-Index: 37968919 at Sat 25 Nov 2017 01:28:13 CET diff: No Differences found in service
X-Consul-Index: 37968920 at Sat 25 Nov 2017 01:28:13 CET diff: No Differences found in service
X-Consul-Index: 37968921 at Sat 25 Nov 2017 01:28:14 CET diff: No Differences found in service
[...]

Basically, we are notified on each change of X-Consul-Index for such services. Thus I suspect a significant but in the implementation. (It should behave the same way as either and existing service or a non-existing service right ?

The text was updated successfully, but these errors were encountered:

slackpad · 2017-12-19T00:55:39Z

Hi @pierresouchay thanks for opening an issue - I was wondering when someone would notice this :-)

This is actually working as intended, though it is a little weird. Internally, Consul stores all of its data like the catalog using https://github.com/hashicorp/go-memdb which uses radix trees for prefix-based indexing. There's a watch mechanism whereby queries return a "watch set" with a set of channels to be notified if something changes (here's an example). When you query for a non-existent entry, you end up with a high-level node in the radix tree to watch on, up to possibly the root node if there's not much data in the index. Because blocking queries can always wake up without returning new data we decided not to optimize this further and add more complexity to the query code, though we did think about it. The results for a non-existent query are cheap to serve, and once the index gets populated with some real data you usually end up watching a more specific prefix node.

Your case is kind of pathological because you have some other service that's a prefix of the one you are watching, so indeed you'd get some churn in there. If you are really seeing a few QPS in practice it's probably not a huge deal, though it might be worth addressing if it's causing huge amounts of bogus requests.

weiwei04 · 2018-01-17T09:40:16Z

Hi @slackpad, unlike @pierresouchay, we rely on this intended behavior.

We have about 600+ different services and the number is still growing, the services from same department have same prefix in service names, so our LoadBalancer use prefix to block query those services instead of 1 block query per services. Can we rely on this intended behavior on future consul release?

For example:

services:
com.a.a0, com.a.a1, com.a.a2
com.b.b0, com.b.b1, com.b.b2

we block query com.a to get a0, a1, a2 service update, block query com.b to get b0, b1, b2 service update.

slackpad · 2018-01-25T01:33:41Z

Hi @weiwei04 that's an interesting use case. Because of the internal implementation this should work because everything is managed with a prefix tree, and we don't have plans to change that, but it's not an explicit part of the interface contract. It might be better as an enhancement request to make a more explicit new API to watch for changes for service by prefix.

- Do not require a call to the catalog if services are specified by name. - Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation). - Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: hashicorp/consul#3712 and because a catalog on a large cluster would basically change *all* the time. No need to discover targets in 1sec if we scrape them every minute. TODO get number: CPU/RAM/QPS/Latency for - filter by relabel (before) vs. filter by tag (after) - filter by service before after

Related to prometheus#3711 - Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and nore-meta are also used in `/catalog/service` requests. - Do not require a call to the catalog if services are specified by name. This is important because on large cluster `/catalog/services` changes all the time. - Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation). - Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: hashicorp/consul#3712 and because a catalog on a large cluster would basically change *all* the time. No need to discover targets in 1sec if we scrape them every minute. - Added plenty of unit tests. Benchmarks ---------- ```yaml scrape_configs: - job_name: prometheus scrape_interval: 60s static_configs: - targets: ["127.0.0.1:9090"] - job_name: "observability-by-tag" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 tag: marathon-user-observability # Used in After refresh_interval: 30s # Used in After+delay relabel_configs: - source_labels: [__meta_consul_tags] regex: ^(.*,)?marathon-user-observability(,.*)?$ action: keep - job_name: "observability-by-name" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - observability-cerebro - observability-portal-web - job_name: "fake-fake-fake" scrape_interval: "15s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - fake-fake-fake ``` Note: tested with ~1200 services, ~5000 nodes. | Resource | Empty | Before | After | After + delay | | -------- |:-----:|:------:|:-----:|:-------------:| |/service-discovery size|5K|85MiB|27k|27k|27k| |`go_memstats_heap_objects`|100k|1M|120k|110k| |`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB| |`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s| |`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%| |`process_open_fds`|16|*1236*|22|22| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5| |`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms| |Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps| Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: improve consul service discovery Related to #3711 - Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and nore-meta are also used in `/catalog/service` requests. - Do not require a call to the catalog if services are specified by name. This is important because on large cluster `/catalog/services` changes all the time. - Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation). - Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: hashicorp/consul#3712 and because a catalog on a large cluster would basically change *all* the time. No need to discover targets in 1sec if we scrape them every minute. - Added plenty of unit tests. Benchmarks ---------- ```yaml scrape_configs: - job_name: prometheus scrape_interval: 60s static_configs: - targets: ["127.0.0.1:9090"] - job_name: "observability-by-tag" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 tag: marathon-user-observability # Used in After refresh_interval: 30s # Used in After+delay relabel_configs: - source_labels: [__meta_consul_tags] regex: ^(.*,)?marathon-user-observability(,.*)?$ action: keep - job_name: "observability-by-name" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - observability-cerebro - observability-portal-web - job_name: "fake-fake-fake" scrape_interval: "15s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - fake-fake-fake ``` Note: tested with ~1200 services, ~5000 nodes. | Resource | Empty | Before | After | After + delay | | -------- |:-----:|:------:|:-----:|:-------------:| |/service-discovery size|5K|85MiB|27k|27k|27k| |`go_memstats_heap_objects`|100k|1M|120k|110k| |`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB| |`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s| |`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%| |`process_open_fds`|16|*1236*|22|22| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5| |`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms| |Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps| Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery. * consul: tweak `refresh_interval` behavior `refresh_interval` now does what is advertised in the documentation, there won't be more that one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also make sure we don't wait another 30s if we already waited 29s in the blocking call by substracting the number of elapsed seconds. Hopefully this will do what people expect it does and will be safer for existing consul infrastructures.

* consul: improve consul service discovery Related to prometheus#3711 - Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services` allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`). Tags and nore-meta are also used in `/catalog/service` requests. - Do not require a call to the catalog if services are specified by name. This is important because on large cluster `/catalog/services` changes all the time. - Add `allow_stale` configuration option to do stale reads. Non-stale reads can be costly, even more when you are doing them to a remote datacenter with 10k+ targets over WAN (which is common for federation). - Add `refresh_interval` to minimize the strain on the catalog and on the service endpoint. This is needed because of that kind of behavior from consul: hashicorp/consul#3712 and because a catalog on a large cluster would basically change *all* the time. No need to discover targets in 1sec if we scrape them every minute. - Added plenty of unit tests. Benchmarks ---------- ```yaml scrape_configs: - job_name: prometheus scrape_interval: 60s static_configs: - targets: ["127.0.0.1:9090"] - job_name: "observability-by-tag" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 tag: marathon-user-observability # Used in After refresh_interval: 30s # Used in After+delay relabel_configs: - source_labels: [__meta_consul_tags] regex: ^(.*,)?marathon-user-observability(,.*)?$ action: keep - job_name: "observability-by-name" scrape_interval: "60s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - observability-cerebro - observability-portal-web - job_name: "fake-fake-fake" scrape_interval: "15s" metrics_path: "/metrics" consul_sd_configs: - server: consul.service.par.consul.prod.crto.in:8500 services: - fake-fake-fake ``` Note: tested with ~1200 services, ~5000 nodes. | Resource | Empty | Before | After | After + delay | | -------- |:-----:|:------:|:-----:|:-------------:| |/service-discovery size|5K|85MiB|27k|27k|27k| |`go_memstats_heap_objects`|100k|1M|120k|110k| |`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB| |`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s| |`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%| |`process_open_fds`|16|*1236*|22|22| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*| |`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5| |`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms| |Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps| Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul. Being a little bit smarter about this reduces the overhead quite a lot. Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery. * consul: tweak `refresh_interval` behavior `refresh_interval` now does what is advertised in the documentation, there won't be more that one update per `refresh_interval`. It now defaults to 30s (which was also the current waitTime in the consul query). This also make sure we don't wait another 30s if we already waited 29s in the blocking call by substracting the number of elapsed seconds. Hopefully this will do what people expect it does and will be safer for existing consul infrastructures. Signed-off-by: sipian <cs15btech11019@iith.ac.in>

pierresouchay · 2018-09-09T16:16:56Z

Fixed in Prometheus and Consul

pierresouchay mentioned this issue Nov 25, 2017

Healthchecks resynced every ~2mins, causes consul index to grow #3259

Closed

slackpad added type/enhancement Proposed improvement or new feature theme/api Relating to the HTTP API interface thinking More time is needed to research by the Consul Contributors labels Dec 19, 2017

slackpad added this to the Unplanned milestone Dec 19, 2017

iksaif mentioned this issue Feb 8, 2018

consul: improve consul service discovery prometheus/prometheus#3814

Merged

pierresouchay mentioned this issue Feb 15, 2018

X-Consul-Index reflects global service churn when watching a single service causing issues with sleeping between blocking calls #3890

Closed

pierresouchay mentioned this issue May 18, 2018

When watching all services, consul-template is DOSing the Consul agent hashicorp/consul-template#1065

Open

pierresouchay closed this as completed Sep 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

X-Consul-Index behaves strangely on non-existing services with an existing service prefix #3712

X-Consul-Index behaves strangely on non-existing services with an existing service prefix #3712

pierresouchay commented Nov 25, 2017

slackpad commented Dec 19, 2017

weiwei04 commented Jan 17, 2018

slackpad commented Jan 25, 2018

pierresouchay commented Sep 9, 2018

X-Consul-Index behaves strangely on non-existing services with an existing service prefix #3712

X-Consul-Index behaves strangely on non-existing services with an existing service prefix #3712

Comments

pierresouchay commented Nov 25, 2017

consul version for both Client and Server

Description of the Issue (and unexpected/desired result)

Reproduction steps

slackpad commented Dec 19, 2017

weiwei04 commented Jan 17, 2018

slackpad commented Jan 25, 2018

pierresouchay commented Sep 9, 2018

`consul version` for both Client and Server