Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serf health check failures #1180

Closed
blakeblackshear opened this issue Aug 17, 2015 · 16 comments
Closed

Serf health check failures #1180

blakeblackshear opened this issue Aug 17, 2015 · 16 comments
Assignees

Comments

@blakeblackshear
Copy link

I am seeing intermittent serf failures across all my nodes. There doesn't seem to be any real consistency. Random nodes, random times, and very short lived failures.

2015/08/17 14:50:04 [INFO] consul: New leader elected: consul-10-0-22-12
2015/08/17 14:55:11 [INFO] agent.rpc: Accepted client: 127.0.0.1:44640
2015/08/17 15:00:05 [INFO] consul: New leader elected: consul-10-0-22-12
2015/08/17 15:01:04 [INFO] consul: New leader elected: consul-10-0-155-171
2015/08/17 15:07:39 [INFO] agent: Synced check 'updates_check'
2015/08/17 15:10:12 [INFO] serf: EventMemberFailed: redis-10-0-20-103 10.0.20.103
2015/08/17 15:10:23 [INFO] serf: EventMemberJoin: redis-10-0-20-103 10.0.20.103
2015/08/17 15:20:04 [INFO] consul: New leader elected: consul-10-0-155-171
2015/08/17 15:30:04 [INFO] consul: New leader elected: consul-10-0-155-171
2015/08/17 15:30:13 [INFO] serf: EventMemberFailed: logstash-10-0-149-216 10.0.149.216
2015/08/17 15:30:13 [INFO] serf: EventMemberJoin: logstash-10-0-149-216 10.0.149.216

Here are the ACL rules for my subnet:

Rule # Type Protocol Port Range Source Allow / Deny
100 ALL Traffic ALL ALL 0.0.0.0/0
* ALL Traffic ALL ALL 0.0.0.0/0

Every machine in the vpc has this security group:

Type Protocol Port Range Source
Custom TCP Rule TCP (6) 8301 sg-7a81181d
Custom UDP Rule UDP (17) 8301 sg-7a81181d

The servers have these rules defined as well:

Type Protocol Port Range Source
Custom TCP Rule TCP (6) 8300 sg-7a81181d

Here is my consul server config:

{
  "ui_dir": "/opt/consul/dist",
    "retry_join": [
    "10.0.22.12",
    "10.0.64.26",
    "10.0.155.171"
],
    "domain": "ngd.",
  "data_dir": "/opt/consul/data",
  "log_level": "INFO",
  "server": true,
  "client_addr": "127.0.0.1",
  "bind_addr": "10.0.155.171",
  "rejoin_after_leave": true,
  "bootstrap_expect": 3,
  "bootstrap": false,
  "datacenter": "us-east-1",
  "disable_remote_exec": true
}

What am I missing?

@blakeblackshear
Copy link
Author

I also seem to be getting new leader elections periodically:

2015/08/17 16:10:03 [WARN] raft: Heartbeat timeout reached, starting election
2015/08/17 16:10:03 [INFO] raft: Node at 10.0.155.171:8300 [Candidate] entering Candidate state
2015/08/17 16:10:03 [INFO] raft: Election won. Tally: 2
2015/08/17 16:10:03 [INFO] raft: Node at 10.0.155.171:8300 [Leader] entering Leader state
2015/08/17 16:10:03 [INFO] consul: cluster leadership acquired
2015/08/17 16:10:03 [INFO] consul: New leader elected: consul-10-0-155-171
2015/08/17 16:10:03 [INFO] raft: pipelining replication to peer 10.0.22.12:8300
2015/08/17 16:10:03 [INFO] raft: pipelining replication to peer 10.0.64.26:8300

@blakeblackshear
Copy link
Author

This is v0.5.2.

@highlyunavailable
Copy link
Contributor

Your network config looks fine to me - very similar to what I run. Some questions about your environment:

  1. Do you have any health checks that have extremely frequent data changes in their output? E.g. a check that does a curl that returns a timestamp. This causes a lot of gossip traffic which can block health traffic.
  2. Are the "failing" nodes under heavy CPU pressure at the time of failure?
  3. Do you use consul-template or consul watch and do you have any complicated/watch heavy templates or high numbers of watching keys?

Also can you post the telemetry from your leader?

@blakeblackshear
Copy link
Author

  1. No
  2. No (1-5%)
  3. I have one watch for all health checks running on one consul server, and my load balancers use consul-template to update the configs for a few services.

Here is the telemetry:

[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.heap_objects': 28242.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.num_goroutines': 127.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5100752.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17267.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.malloc_count': 244283840.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.free_count': 244255600.000
[2015-08-17 16:50:20 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47665496064.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.consul.rpc.accept_conn': Count: 1 Sum: 1.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.consul.rpc.query': Count: 2 Sum: 2.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.consul.rpc.request': Count: 2 Sum: 2.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 135 Sum: 0.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 131 Sum: 0.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 21 Min: 22.000 Mean: 28.381 Max: 40.000 Stddev: 6.996 Sum: 596.000
[2015-08-17 16:50:20 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 21 Min: 22.000 Mean: 30.381 Max: 38.000 Stddev: 8.188 Sum: 638.000
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.971 Mean: 1.910 Max: 5.028 Stddev: 1.241 Sum: 19.100
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 131 Min: 0.951 Mean: 1.167 Max: 2.230 Stddev: 0.179 Sum: 152.839
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 66 Min: 0.963 Mean: 1.142 Max: 1.829 Stddev: 0.151 Sum: 75.367
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.001 Mean: 0.007 Max: 0.024 Stddev: 0.004 Sum: 0.478
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 67 Min: 1.130 Mean: 1.302 Max: 1.992 Stddev: 0.130 Sum: 87.249
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 44 Min: 0.000 Mean: 27.909 Max: 72.000 Stddev: 21.622 Sum: 1228.000
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 135 Min: 1.106 Mean: 1.379 Max: 9.520 Stddev: 0.730 Sum: 186.144
[2015-08-17 16:50:20 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3192798.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.num_goroutines': 125.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.free_count': 244273072.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.heap_objects': 30298.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17268.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5520536.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.malloc_count': 244303360.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47668633600.000
[2015-08-17 16:50:30 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 134 Sum: 0.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.memberlist.tcp.accept': Count: 1 Sum: 1.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.memberlist.tcp.sent': Count: 1 Sum: 2736.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.consul.rpc.query': Count: 2 Sum: 2.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 135 Sum: 0.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 22 Min: 22.000 Mean: 27.955 Max: 38.000 Stddev: 6.743 Sum: 615.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 22 Min: 22.000 Mean: 30.727 Max: 38.000 Stddev: 8.154 Sum: 676.000
[2015-08-17 16:50:30 +0000 UTC][C] 'consul.consul.rpc.request': Count: 2 Sum: 2.000
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 134 Min: 1.112 Mean: 1.373 Max: 5.001 Stddev: 0.429 Sum: 183.975
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 135 Min: 0.893 Mean: 1.349 Max: 13.936 Stddev: 1.241 Sum: 182.061
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.024 Stddev: 0.004 Sum: 0.461
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.874 Mean: 1.953 Max: 4.805 Stddev: 1.202 Sum: 19.528
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3134871.000
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 68 Min: 0.948 Mean: 1.215 Max: 5.220 Stddev: 0.508 Sum: 82.643
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 62 Min: 1.096 Mean: 1.273 Max: 1.906 Stddev: 0.112 Sum: 78.917
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 44 Min: 1.000 Mean: 28.341 Max: 78.000 Stddev: 19.717 Sum: 1247.000
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:50:30 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.num_goroutines': 124.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.free_count': 244294256.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17269.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5563312.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.malloc_count': 244328384.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.heap_objects': 34124.000
[2015-08-17 16:50:40 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47671828480.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 136 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 15 Min: 22.000 Mean: 30.867 Max: 40.000 Stddev: 6.749 Sum: 463.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.tcp.connect': Count: 1 Sum: 1.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.msg.alive': Count: 2 Sum: 2.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.consul.health.service.query-tag.redis.master': Count: 2 Sum: 2.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.tcp.accept': Count: 1 Sum: 1.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 132 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 15 Min: 22.000 Mean: 27.333 Max: 38.000 Stddev: 7.807 Sum: 410.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.memberlist.tcp.sent': Count: 2 Min: 2736.000 Mean: 2737.500 Max: 2739.000 Stddev: 2.121 Sum: 5475.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.consul.rpc.query': Count: 2 Sum: 2.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.consul.health.service.query.redis': Count: 2 Sum: 2.000
[2015-08-17 16:50:40 +0000 UTC][C] 'consul.consul.rpc.request': Count: 2 Sum: 2.000
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 65 Min: 1.125 Mean: 1.302 Max: 3.517 Stddev: 0.290 Sum: 84.628
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.020 Stddev: 0.003 Sum: 0.462
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 66 Min: 0.929 Mean: 1.166 Max: 1.519 Stddev: 0.109 Sum: 76.948
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.969 Mean: 1.332 Max: 1.656 Stddev: 0.271 Sum: 13.316
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.memberlist.pushPullNode': Count: 1 Sum: 6.132
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 136 Min: 0.988 Mean: 1.206 Max: 1.562 Stddev: 0.107 Sum: 164.026
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 132 Min: 1.109 Mean: 1.281 Max: 1.841 Stddev: 0.100 Sum: 169.128
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 44 Min: 0.000 Mean: 26.227 Max: 84.000 Stddev: 21.400 Sum: 1154.000
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 16:50:40 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3196407.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17270.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5181784.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.malloc_count': 244342864.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.free_count': 244313248.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.heap_objects': 29614.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47675097088.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:50:50 +0000 UTC][G] 'consul.runtime.num_goroutines': 119.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 130 Sum: 0.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 22 Min: 22.000 Mean: 30.727 Max: 38.000 Stddev: 8.154 Sum: 676.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 22 Min: 22.000 Mean: 28.091 Max: 40.000 Stddev: 6.969 Sum: 618.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.consul.rpc.query': Count: 6 Sum: 6.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.consul.rpc.request': Count: 6 Sum: 6.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.consul.rpc.accept_conn': Count: 1 Sum: 1.000
[2015-08-17 16:50:50 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 134 Sum: 0.000
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 66 Min: 1.133 Mean: 1.262 Max: 1.472 Stddev: 0.075 Sum: 83.313
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.006 Max: 0.022 Stddev: 0.003 Sum: 0.439
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.941 Mean: 1.540 Max: 1.726 Stddev: 0.231 Sum: 15.401
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3266811.000
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 134 Min: 1.136 Mean: 1.299 Max: 2.334 Stddev: 0.169 Sum: 174.104
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 69 Min: 0.982 Mean: 1.151 Max: 1.386 Stddev: 0.092 Sum: 79.448
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 130 Min: 0.969 Mean: 1.173 Max: 1.409 Stddev: 0.090 Sum: 152.442
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 42 Min: 1.000 Mean: 29.810 Max: 87.000 Stddev: 21.281 Sum: 1252.000
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:50:50 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.heap_objects': 29913.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47678242816.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17271.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.num_goroutines': 129.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5095480.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.malloc_count': 244365440.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.runtime.free_count': 244335520.000
[2015-08-17 16:51:00 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.consul.rpc.query': Count: 11 Sum: 11.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 17 Min: 22.000 Mean: 29.471 Max: 37.000 Stddev: 6.539 Sum: 501.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 17 Min: 22.000 Mean: 28.588 Max: 38.000 Stddev: 8.117 Sum: 486.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.consul.health.service.query.orders': Count: 1 Sum: 1.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 137 Min: 0.000 Mean: 0.007 Max: 1.000 Stddev: 0.085 Sum: 1.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.consul.rpc.accept_conn': Count: 1 Sum: 1.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.raft.barrier': Count: 1 Sum: 1.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 132 Min: 0.000 Mean: 0.008 Max: 1.000 Stddev: 0.087 Sum: 1.000
[2015-08-17 16:51:00 +0000 UTC][C] 'consul.consul.rpc.request': Count: 9 Sum: 9.000
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 65 Min: 1.104 Mean: 1.429 Max: 6.863 Stddev: 0.754 Sum: 92.879
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.030 Stddev: 0.004 Sum: 0.496
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.987 Mean: 1.487 Max: 1.899 Stddev: 0.289 Sum: 14.870
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.consul.leader.reconcileMember': Count: 17 Min: 0.050 Mean: 0.080 Max: 0.206 Stddev: 0.038 Sum: 1.359
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3148746.000
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 64 Min: 0.936 Mean: 1.236 Max: 3.447 Stddev: 0.437 Sum: 79.105
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.leader.dispatchLog': Count: 1 Sum: 1.809
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.commitTime': Count: 1 Sum: 5.451
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.consul.leader.barrier': Count: 1 Sum: 5.487
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 132 Min: 0.913 Mean: 1.326 Max: 5.534 Stddev: 0.615 Sum: 175.059
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 137 Min: 1.065 Mean: 1.519 Max: 10.417 Stddev: 1.080 Sum: 208.133
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.consul.leader.reconcile': Count: 1 Sum: 1.904
[2015-08-17 16:51:00 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 46 Min: 1.000 Mean: 34.196 Max: 85.000 Stddev: 21.388 Sum: 1573.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.heap_objects': 27067.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47851905024.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.alloc_bytes': 4843056.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.free_count': 245444416.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.malloc_count': 245471488.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17326.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.num_goroutines': 119.000
[2015-08-17 16:59:50 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.consul.health.service.query.redis': Count: 2 Sum: 2.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.consul.health.service.query-tag.redis.master': Count: 2 Sum: 2.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.consul.rpc.request': Count: 4 Sum: 4.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 132 Sum: 0.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 134 Sum: 0.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 21 Min: 22.000 Mean: 30.381 Max: 38.000 Stddev: 8.188 Sum: 638.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 21 Min: 22.000 Mean: 28.429 Max: 40.000 Stddev: 7.033 Sum: 597.000
[2015-08-17 16:59:50 +0000 UTC][C] 'consul.consul.rpc.query': Count: 4 Sum: 4.000
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.956 Mean: 1.928 Max: 6.380 Stddev: 1.582 Sum: 19.278
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3101747.000
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 132 Min: 1.149 Mean: 1.299 Max: 3.994 Stddev: 0.249 Sum: 171.533
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 46 Min: 0.000 Mean: 35.261 Max: 79.000 Stddev: 22.805 Sum: 1622.000
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 66 Min: 0.926 Mean: 1.079 Max: 1.227 Stddev: 0.062 Sum: 71.215
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 66 Min: 1.068 Mean: 1.264 Max: 1.738 Stddev: 0.107 Sum: 83.448
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 134 Min: 0.957 Mean: 1.098 Max: 1.905 Stddev: 0.111 Sum: 147.184
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.006 Max: 0.021 Stddev: 0.003 Sum: 0.422
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 16:59:50 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.malloc_count': 245504464.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.free_count': 245475104.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.num_goroutines': 119.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5337808.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47858135040.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17328.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 17:00:00 +0000 UTC][G] 'consul.runtime.heap_objects': 29348.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.msg.alive': Count: 4 Sum: 4.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.msg.suspect': Count: 1 Sum: 1.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.serf.events.consul:new-leader': Count: 1 Sum: 1.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.raft.barrier': Count: 1 Sum: 1.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.consul.rpc.query': Count: 2 Sum: 2.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 84 Min: 0.000 Mean: 0.024 Max: 2.000 Stddev: 0.218 Sum: 2.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 44 Min: 22.000 Mean: 88.750 Max: 252.000 Stddev: 68.443 Sum: 3905.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.tcp.accept': Count: 2 Sum: 2.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.serf.events': Count: 1 Sum: 1.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.consul.rpc.request': Count: 2 Sum: 2.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 98 Min: 0.000 Mean: 0.020 Max: 1.000 Stddev: 0.142 Sum: 2.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 46 Min: 22.000 Mean: 91.913 Max: 223.000 Stddev: 69.093 Sum: 4228.000
[2015-08-17 17:00:00 +0000 UTC][C] 'consul.memberlist.tcp.sent': Count: 2 Sum: 5478.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.rpc.requestVote': Count: 3 Min: 0.054 Mean: 0.069 Max: 0.099 Stddev: 0.026 Sum: 0.207
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.leader.dispatchLog': Count: 2 Min: 1.857 Mean: 2.046 Max: 2.236 Stddev: 0.268 Sum: 4.092
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.commitTime': Count: 2 Min: 6.152 Mean: 6.381 Max: 6.610 Stddev: 0.324 Sum: 12.762
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.consul.leader.reconcile': Count: 1 Sum: 1.907
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.msgs.received': Count: 14 Sum: 910.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 42 Min: 1.094 Mean: 26.169 Max: 1006.168 Stddev: 154.923 Sum: 1099.109
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 1.111 Mean: 118.482 Max: 532.544 Stddev: 196.665 Sum: 1184.825
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 2 Min: 2996053.000 Mean: 3114318.500 Max: 3232584.000 Stddev: 167252.674 Sum: 6228637.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 84 Min: 1.143 Mean: 14.192 Max: 1003.915 Stddev: 109.311 Sum: 1192.096
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 98 Min: 0.910 Mean: 3.169 Max: 21.752 Stddev: 4.175 Sum: 310.588
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 47 Min: 0.957 Mean: 3.166 Max: 21.581 Stddev: 4.414 Sum: 148.818
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.memberlist.gossip': Count: 69 Min: 0.002 Mean: 0.109 Max: 3.622 Stddev: 0.453 Sum: 7.544
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.candidate.electSelf': Count: 4 Min: 1.264 Mean: 4.354 Max: 8.173 Stddev: 3.571 Sum: 17.415
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 32 Min: 0.000 Mean: 84.094 Max: 924.000 Stddev: 190.386 Sum: 2691.000
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.snapshot.appendLine': Count: 1 Sum: 0.013
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.consul.leader.barrier': Count: 1 Sum: 7.075
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.consul.leader.reconcileMember': Count: 17 Min: 0.054 Mean: 0.078 Max: 0.125 Stddev: 0.023 Sum: 1.320
[2015-08-17 17:00:00 +0000 UTC][S] 'consul.serf.msgs.sent': Count: 8 Sum: 520.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5854400.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.heap_objects': 30714.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47861350400.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.num_goroutines': 120.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.free_count': 245493344.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17329.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 17:00:10 +0000 UTC][G] 'consul.runtime.malloc_count': 245524064.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 23 Min: 22.000 Mean: 27.826 Max: 40.000 Stddev: 6.991 Sum: 640.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.consul.rpc.request': Count: 2 Sum: 2.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 129 Sum: 0.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 23 Min: 22.000 Mean: 31.043 Max: 38.000 Stddev: 8.110 Sum: 714.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.memberlist.tcp.sent': Count: 1 Sum: 2734.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.consul.rpc.query': Count: 2 Sum: 2.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 135 Sum: 0.000
[2015-08-17 17:00:10 +0000 UTC][C] 'consul.memberlist.tcp.connect': Count: 1 Sum: 1.000
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 63 Min: 1.096 Mean: 2.391 Max: 11.167 Stddev: 3.071 Sum: 150.629
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 44 Min: 0.000 Mean: 36.727 Max: 89.000 Stddev: 23.280 Sum: 1616.000
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.958 Mean: 2.331 Max: 11.009 Stddev: 3.071 Sum: 23.310
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.memberlist.pushPullNode': Count: 1 Sum: 22.985
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 67 Min: 0.933 Mean: 1.973 Max: 11.101 Stddev: 2.855 Sum: 132.212
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 129 Min: 0.930 Mean: 2.363 Max: 21.058 Stddev: 3.493 Sum: 304.868
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.028 Stddev: 0.005 Sum: 0.514
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3219006.000
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 135 Min: 1.118 Mean: 2.347 Max: 11.075 Stddev: 2.982 Sum: 316.871
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 17:00:10 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5090272.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.malloc_count': 245538992.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.free_count': 245510640.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17330.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.num_goroutines': 120.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.heap_objects': 28356.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47864561664.000
[2015-08-17 17:00:20 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 17 Min: 22.000 Mean: 29.706 Max: 38.000 Stddev: 6.762 Sum: 505.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 17 Min: 22.000 Mean: 28.588 Max: 38.000 Stddev: 8.117 Sum: 486.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 133 Sum: 0.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 134 Sum: 0.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.consul.health.service.query.orders': Count: 2 Sum: 2.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.consul.rpc.request': Count: 4 Sum: 4.000
[2015-08-17 17:00:20 +0000 UTC][C] 'consul.consul.rpc.query': Count: 4 Sum: 4.000
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.880 Mean: 1.592 Max: 3.236 Stddev: 0.630 Sum: 15.921
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3207401.000
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 134 Min: 0.965 Mean: 1.139 Max: 3.272 Stddev: 0.239 Sum: 152.656
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 64 Min: 1.118 Mean: 1.239 Max: 1.467 Stddev: 0.074 Sum: 79.327
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 67 Min: 0.950 Mean: 1.211 Max: 7.338 Stddev: 0.800 Sum: 81.110
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 44 Min: 1.000 Mean: 26.682 Max: 74.000 Stddev: 21.495 Sum: 1174.000
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 133 Min: 1.104 Mean: 1.245 Max: 1.707 Stddev: 0.071 Sum: 165.617
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.034 Stddev: 0.005 Sum: 0.524
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 17:00:20 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.free_count': 245529920.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.consul.session_ttl.active': 0.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.total_gc_pause_ns': 47867711488.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.total_gc_runs': 17331.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.num_goroutines': 125.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.alloc_bytes': 5328464.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.sys_bytes': 17905912.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.malloc_count': 245559472.000
[2015-08-17 17:00:30 +0000 UTC][G] 'consul.runtime.heap_objects': 29544.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.memberlist.tcp.sent': Count: 1 Sum: 2740.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.memberlist.udp.received': Count: 21 Min: 22.000 Mean: 30.381 Max: 38.000 Stddev: 8.188 Sum: 638.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.consul.rpc.query': Count: 6 Sum: 6.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.consul.rpc.request': Count: 6 Sum: 6.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.memberlist.tcp.accept': Count: 1 Sum: 1.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.22.12:8300': Count: 130 Sum: 0.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.raft.replication.appendEntries.logs.10.0.64.26:8300': Count: 136 Sum: 0.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.memberlist.udp.sent': Count: 21 Min: 22.000 Mean: 28.048 Max: 38.000 Stddev: 6.591 Sum: 589.000
[2015-08-17 17:00:30 +0000 UTC][C] 'consul.consul.rpc.accept_conn': Count: 1 Sum: 1.000
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.64.26:8300': Count: 67 Min: 1.110 Mean: 1.440 Max: 10.543 Stddev: 1.225 Sum: 96.490
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.serf.queue.Event': Count: 20 Sum: 0.000
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.serf.queue.Query': Count: 20 Sum: 0.000
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.memberlist.probeNode': Count: 10 Min: 0.841 Mean: 2.633 Max: 9.581 Stddev: 2.667 Sum: 26.326
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.runtime.gc_pause_ns': Count: 1 Sum: 3150983.000
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.raft.replication.heartbeat.10.0.22.12:8300': Count: 66 Min: 0.937 Mean: 1.124 Max: 4.196 Stddev: 0.392 Sum: 74.166
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.22.12:8300': Count: 130 Min: 0.962 Mean: 1.115 Max: 2.214 Stddev: 0.144 Sum: 144.941
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.raft.replication.appendEntries.rpc.10.0.64.26:8300': Count: 136 Min: 1.099 Mean: 1.305 Max: 3.091 Stddev: 0.216 Sum: 177.489
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.memberlist.gossip': Count: 70 Min: 0.002 Mean: 0.007 Max: 0.021 Stddev: 0.003 Sum: 0.477
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.raft.leader.lastContact': Count: 42 Min: 0.000 Mean: 29.571 Max: 84.000 Stddev: 24.620 Sum: 1242.000
[2015-08-17 17:00:30 +0000 UTC][S] 'consul.serf.queue.Intent': Count: 20 Sum: 0.000

@blakeblackshear
Copy link
Author

I do have a health check that fails consistently on my standby postgres server. It runs ever 10s and checks to see if the service is running. Its output does not change.

@djenriquez
Copy link

+1

@blakeblackshear
Copy link
Author

Here is a tcpdump of traffic on 8300 from the leader
https://gist.github.com/blakeblackshear/39281493d3b4b757f05d

@blakeblackshear
Copy link
Author

The consul servers are:

10.0.76.122
10.0.141.86
10.0.21.215

@slackpad slackpad self-assigned this Aug 25, 2015
@cruatta
Copy link
Contributor

cruatta commented Sep 7, 2015

We've been seeing similar issues on servers that are under relatively high load and doing a lot of network IO. Out of curiosity, what is your GOMAXPROCS set at?

@blakeblackshear
Copy link
Author

It should be set to 2, but looking at our configs, I am not certain that is the case now. I am starting to think this is related to performance because we aren't seeing this issue in production where we run larger instances. However, average CPU utilization is less than 5%, so I am leaning towards network IO as the suspect. The strange thing is that I am only seeing it in some networks but not others even though they are configured with the exact same Ansible roles. I can't seem to find the difference.

@cruatta
Copy link
Contributor

cruatta commented Sep 7, 2015

Pretty sure this and #1212 are related

@cruatta
Copy link
Contributor

cruatta commented Sep 7, 2015

My understanding of Raft and Serf is limited but would it be possible for a server doing a lot of network IO to appear unresponsive to enough of the peers in the cluster that it would flap between healthy and unhealthy states? This makes me really wish we had some kind of high level visualization tool for the Serf cluster to monitor the state of the underlying member list.

@blakeblackshear
Copy link
Author

I confirmed that this is set to 2. I never see this in our larger production environment, but it happens constantly where we are using t2.micro/t2.small instances. Even when all servers in the VPC are idle at night. Avg CPU usage is < 3%.

@vaidik
Copy link

vaidik commented Jun 2, 2016

Do we have any updates on this? Or any ways to further debug similar issues? We are also seeing a lot of Serf health failures.

@xiaods
Copy link

xiaods commented Sep 19, 2016

i feel need a tracing system to track the request. how about use appdash to tracing:

https://github.com/sourcegraph/appdash

@slackpad
Copy link
Contributor

I'm going to close this out as Consul 0.7 introduced Lifeguard enhancements to the gossip layer to combat these kinds of issues. We also did some work to allow you to relax the Raft timing if you want to run on smaller, slightly overloaded servers - https://www.consul.io/docs/guides/performance.html.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants