The OS is RHEL 7.7.
I am hosting the packages on a custom yum repo. The following packages are installed:
sensu-go-agent.x86_64 5.14.2-7022
sensu-go-backend.x86_64 5.14.2-7022
sensu-go-cli.x86_64 5.14.2-7022
gtarnaras changed the title from "sensuctl cannot get cluster health when a node an etcd node is down" to "sensuctl cannot get cluster health when an etcd node is down and quorum is lost" on Nov 18, 2019
Expected Behavior
Running sensuctl cluster health against a healthy node of the cluster should return the cluster status regardless of whether one or more nodes are down.
Current Behavior
I have a 3-node cluster that I am currently testing. When I terminate a node and then run sensuctl cluster health, I get:
Error: GET "/health": Get https://x.x.x.x:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Or, using the API:
++ curl -k https://x.x.x.x:8080/health
++ jq -r '.ClusterHealth[] | select(.Healthy==false) | .MemberID'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:24:57 --:--:--     0
It has been running for 25 minutes...
Since I cannot detect that the cluster is unhealthy, I cannot add new nodes, nor can I detect which node is unhealthy in order to delete it. That is, if a node is in a failed condition, I cannot restore the cluster.
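As a possible workaround while the endpoint hangs, a health probe can enforce its own timeout so that it fails fast instead of waiting for 25 minutes. The following is a minimal sketch, not Sensu's own tooling: the field names ClusterHealth, Healthy and MemberID are taken from the jq filter above, while the function names, the 10-second default and the sample payload shape are assumptions made for illustration.

```python
import json
import ssl
import urllib.request


def unhealthy_members(payload: dict) -> list:
    """Return the MemberID of every entry whose Healthy flag is false.

    Mirrors the jq filter from the report:
    .ClusterHealth[] | select(.Healthy==false) | .MemberID
    """
    return [
        member["MemberID"]
        for member in payload.get("ClusterHealth", [])
        if member.get("Healthy") is False
    ]


def fetch_health(url: str, timeout_s: float = 10.0) -> dict:
    """GET the health endpoint with a socket-level timeout (hypothetical helper)."""
    # Equivalent of `curl -k`: skip certificate verification.
    # Only reasonable for test setups with self-signed certificates.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # The timeout applies to blocking socket operations, so a backend that
    # accepts the connection but never sends headers raises an exception
    # instead of hanging indefinitely.
    with urllib.request.urlopen(url, timeout=timeout_s, context=ctx) as resp:
        return json.load(resp)
```

Calling fetch_health("https://x.x.x.x:8080/health") against a hanging backend would raise an exception after roughly the timeout, which a script can treat as "health unknown" and handle explicitly. This only bounds the wait; it does not make the backend report which member is down.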
Steps to Reproduce (for bugs)
Terminate one node of the 3-node cluster, then run:
sensuctl cluster health
Context
I am trying to build a robust Sensu cluster on AWS using autoscaling groups, and I am currently checking how sensuctl reacts to unexpected failures. I am trying to follow the "remove-first" practice described here: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#replace-a-failed-machine . I am able to run all other sensuctl commands, authenticate, etc., but I cannot get the health status.
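To illustrate why the "remove-first" order matters, here is a small sketch of the quorum arithmetic together with the order of operations. The majority formula is standard for etcd; the helper names are hypothetical, and the sensuctl subcommands member-remove and member-add are written as I understand them from the sensuctl cluster documentation, so they should be checked against sensuctl cluster --help for the installed version.

```python
def quorum(members: int) -> int:
    """Number of live members an etcd cluster of this size needs (a majority)."""
    return members // 2 + 1


def remove_first_plan(failed_member_id: str, new_name: str, new_peer_url: str) -> list:
    """Return the commands in remove-first order (hypothetical helper).

    Removing the failed member before adding its replacement keeps the
    required quorum from growing while one member is already down.
    """
    return [
        ["sensuctl", "cluster", "member-remove", failed_member_id],
        ["sensuctl", "cluster", "member-add", new_name, new_peer_url],
    ]


# Three members with one down: 2 alive, quorum(3) == 2, so the cluster still has quorum.
# Add first:    4 members, quorum(4) == 3, only 2 alive until the new node joins.
# Remove first: 2 members, quorum(2) == 2, 2 alive; adding afterwards gives 3 members.
```

The plan is returned as argument lists rather than executed, so it can be reviewed or passed to subprocess.run one step at a time. It still presupposes knowing the failed member's ID, which is exactly what the hanging health call prevents.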
Your Environment
P.S. I am using the embedded etcd version.