There is no cluster-level command to show all members' status/health in V3 #8117

Closed
armstrongli opened this issue Jun 16, 2017 · 4 comments

Comments

@armstrongli

Problem statement

In V2, the command etcdctl cluster-health checks the health of every member. In V3, there is one command with two sub-commands for checking member health: etcdctl endpoint status and etcdctl endpoint health.
However, the V3 commands are hard to use because they only check the endpoints passed via the --endpoints option. You first have to collect all the endpoints yourself and then pass them in --endpoints.
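
To illustrate the manual step described above, here is a minimal sketch of collecting the endpoints by hand. The function name member_endpoints is made up for this example. It assumes that the default (simple) output of etcdctl member list separates fields with ", " and puts the client URLs in the fifth field; check this against the output of your etcdctl version before relying on it.

```shell
# member_endpoints: read `etcdctl member list` output on stdin and print all
# client URLs as one comma-separated list, ready for --endpoints.
# ASSUMPTION: fields are separated by ", " and field 5 holds the client URLs.
# Members without client URLs (for example, not yet started) are skipped.
member_endpoints() {
  awk -F', ' '$5 != "" { printf "%s%s", sep, $5; sep = "," } END { print "" }'
}

# Possible usage against a live cluster (SEED is any reachable endpoint):
#   all=$(ETCDCTL_API=3 etcdctl --endpoints="$SEED" member list | member_endpoints)
#   ETCDCTL_API=3 etcdctl --endpoints="$all" endpoint status -w table
```

This is exactly the two-step dance the issue complains about: one call to discover the members, a second call to check them.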

Issue #8115 was created first to implement collecting the status/health of all members.

This issue is meant to cover the design rather than the implementation. I would like a single command that checks the whole cluster instead of only the endpoints provided, for example:

$ ETCDCTL_API=3 etcdctl cluster health -w table
+----------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                   ENDPOINT                   |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://test-node-vh8tr-6888.51.test.cn:4001 | d479908fcc05ba8e | 3.0.15  | 8.0 GB  | false     |      1309 |  229422110 |
| https://test-node-2o9hu-4886.51.test.cn:4001 | b620b4c395187fad | 3.1.8   | 8.0 GB  | false     |      1309 |  229422110 |
| https://test-node-873t0-8911.51.test.cn:4001 | 829de42e4f2097c4 | 3.1.8   | 8.0 GB  | true      |      1309 |  229422110 |
| https://test-node-ba93c-3871.51.test.cn:4001 | 2af130f12df09e15 | 3.0.15  | 8.0 GB  | false     |      1294 |  227711328 |
| https://test-node-xq2sm-5915.51.test.cn:4001 | 5902a07919e43cdf | 3.0.15  | 8.0 GB  | false     |      1309 |  229422110 |
+----------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

The confusing part is that there is already a command that can produce this:

$ ETCDCTL_API=3 etcdctl endpoint status --endpoints=https://test-node-vh8tr-6888.51.test.cn:4001,https://test-node-2o9hu-4886.51.test.cn:4001,https://test-node-873t0-8911.51.test.cn:4001,https://test-node-ba93c-3871.51.test.cn:4001,https://test-node-xq2sm-5915.51.test.cn:4001 -w table
....

I propose changing the logic of endpoint status and endpoint health so that they collect all the endpoints in the cluster and check each of them. This is implemented in PR #8116.

@heyitsanthony
Contributor

The current behavior shouldn't be changed, but I think it's reasonable to extend etcdctl to support a flag like etcdctl endpoint status --cluster. When passed --cluster, etcdctl will check the endpoint status on every endpoint seen in the cluster member list.

@xiang90
Contributor

xiang90 commented Jun 16, 2017

@heyitsanthony

--cluster can be confusing. Let's say we have a cluster with 3 members and 3 endpoints: a, b, c.

As a user with the --cluster option enabled, I would lazily pass only a as the endpoint and use it to check the cluster status. Then one day, the etcd member behind endpoint a dies, and my cluster status check stops working. I would be very confused. Why do my cluster commands return errors?! I still have two members running, b and c. So etcd cannot tolerate one member failure!

We have seen this so many times... So I would rather force users to pass in ALL endpoints when they want the check to be correct.

@heyitsanthony
Contributor

@xiang90 it can say it can't fetch the member list given the one endpoint that was provided. Is that still confusing?

@xiang90
Contributor

xiang90 commented Jun 16, 2017

@heyitsanthony

That would be better. Just do not report that the cluster is unhealthy or anything like that :P.
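
The behavior agreed on in this exchange could be sketched roughly as follows. This is a shell wrapper for illustration only, not the etcdctl implementation: the function name endpoint_status_cluster and the exit code 2 are made up for this example, and the parsing assumes that etcdctl member list prints ", "-separated fields with the client URLs in the fifth field.

```shell
# endpoint_status_cluster SEED
# Discover every member through SEED, then query the status of all
# discovered endpoints.
endpoint_status_cluster() {
  seed=$1

  # Step 1: discovery. A failure here only means SEED could not answer;
  # it says nothing about the health of the rest of the cluster, so the
  # message must not claim that the cluster is unhealthy.
  if ! members=$(ETCDCTL_API=3 etcdctl --endpoints="$seed" member list 2>/dev/null); then
    echo "cannot fetch the member list from $seed; try another endpoint" >&2
    return 2
  fi

  # Step 2: join the client URLs of all members.
  # ASSUMPTION: ", "-separated fields, client URLs in field 5.
  all=$(printf '%s\n' "$members" |
    awk -F', ' '$5 != "" { printf "%s%s", sep, $5; sep = "," }')
  if [ -z "$all" ]; then
    echo "member list from $seed contains no client URLs" >&2
    return 2
  fi

  # Step 3: the existing command, now covering every discovered endpoint.
  ETCDCTL_API=3 etcdctl --endpoints="$all" endpoint status -w table
}
```

Keeping the discovery failure separate from the status check is the point of the sketch: a dead seed endpoint produces a "cannot fetch the member list" error instead of a misleading health verdict.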

@heyitsanthony heyitsanthony self-assigned this Jun 20, 2017
heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jun 20, 2017
Queries the cluster for endpoints to use for the endpoint commands.

Fixes etcd-io#8117
heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jun 20, 2017
Query the cluster for endpoints when given --cluster for the endpoint commands.

Fixes etcd-io#8117
heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jun 21, 2017
Queries the cluster for endpoints to use for the endpoint commands.

Fixes etcd-io#8117
yudai pushed a commit to yudai/etcd that referenced this issue Oct 5, 2017
Queries the cluster for endpoints to use for the endpoint commands.

Fixes etcd-io#8117