
Watcher.Watch() hangs when some endpoints are not available #7247

Closed
cw9 opened this issue Jan 28, 2017 · 10 comments

@cw9

cw9 commented Jan 28, 2017

Hi, I've experienced a few times that Watcher.Watch() hangs forever. The hang happens on the following code:

watcher.Watch(
	context.Background(),
	key,
	clientv3.WithProgressNotify(),
	clientv3.WithCreatedNotify(),
)

I'm using the release-3.1 version of the client. Is this behavior expected, and what is the reason for the hang? The key being watched already exists, but I don't think that matters?

My current workaround plan is to add a timeout and retry to this call, roughly as sketched below; let me know if you have any concerns with this approach.
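
To be concrete, this is the kind of helper I have in mind (openWatch and the 10-second timeout are illustrative only, not part of clientv3):

import (
	"context"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// openWatch retries the Watch() call until it returns a channel within a
// timeout, cancelling and abandoning any call that hangs.
func openWatch(cli *clientv3.Client, key string) (clientv3.WatchChan, context.CancelFunc) {
	for {
		ctx, cancel := context.WithCancel(context.Background())
		done := make(chan clientv3.WatchChan, 1)
		go func() {
			// With WithCreatedNotify, this call can block until the watch
			// is actually established on the server.
			done <- cli.Watch(ctx, key,
				clientv3.WithProgressNotify(),
				clientv3.WithCreatedNotify())
		}()
		select {
		case wch := <-done:
			return wch, cancel
		case <-time.After(10 * time.Second):
			cancel() // give up on the hung call and retry
		}
	}
}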

@heyitsanthony
Contributor

It will block until the key is modified or until 10 minutes have elapsed for a progress notification.
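
For reference, consuming the returned channel looks roughly like this (wch is assumed to be the channel returned by the Watch call above, and handleEvent is a hypothetical handler):

for wresp := range wch {
	if wresp.IsProgressNotify() {
		// periodic progress notification, not a key modification
		continue
	}
	for _, ev := range wresp.Events {
		handleEvent(ev) // modifications to the watched key arrive here
	}
}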

@cw9
Author

cw9 commented Jan 28, 2017

Sorry, to be more clear: it's not the returned channel that is blocking; it's this Watch() call that never returns a watch channel and just hangs here, because I used context.Background().

I dug around a bit more and seem to have found something related. I was running a cluster of 5 etcd nodes, etcd[1-5], but my etcd1 was unhealthy:

etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

on the rest of the cluster:

etcdctl cluster-health
member 1ccd4d9efffe92ee is healthy: got healthy result from etcd2:2379
member 42b142598e45c3df is healthy: got healthy result from etcd3:2379
member c41764c24eb810c9 is healthy: got healthy result from etcd4:2379
member e5472c006de57d1f is healthy: got healthy result from etcd5:2379
cluster is healthy

So when I create the etcd client with etcd[2-5], Watch() works just fine, but when I create the client with only etcd1 in the endpoint list, Watch() just hangs.

When I create the client with endpoints etcd[1-5], the client randomly talks to one of the endpoints, so when it connects to etcd[2-5] it is fine, but when it connects to etcd1 it hangs. My question is: why does the simpleBalancer in the etcd client not try to contact the other instances? Does this mean the watch connection is sticky?

@cw9 cw9 changed the title Is Watcher.Watch() Blocking? Watcher.Watch() hangs when some endpoints are not available Jan 28, 2017
@heyitsanthony
Contributor

@cw9 OK, something may be wrong with the etcd1 member. Should the client endpoint be 127.0.0.1?

The balancer will pin an address if it can open a connection. If requests time out, it won't know about that; it will keep issuing requests on that endpoint so long as the connection is up. This isn't necessarily a problem isolated to watches. It seems like 127.0.0.1:2379 is accepting connections, then doing nothing. For example, this "hangs" in a similar manner:

$ nc -l -p 2379 &
$ ETCDCTL_API=3 etcdctl watch abc

The fix would probably involve some kind of endpoint poisoning in the balancer so the client can abandon malfunctioning nodes.

@xiang90
Contributor

xiang90 commented Jan 28, 2017

@cw9 @heyitsanthony

If you give etcd clientv3 a blackhole endpoint, it will hang on Watch, Put, or any other request that has no timeout. Watch is especially important here, since in most cases you do not really want to put a timeout on it.

An off-channel endpoint health-checking mechanism is required to break out of the RPC waiting, I assume. See https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

Not sure if gRPC-go already supports this or not.
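
Roughly, the kind of out-of-band probe I have in mind (assuming the endpoint exposes the standard grpc.health.v1.Health service, which may not be the case today; the dial options and timeout are illustrative):

import (
	"context"
	"time"

	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// endpointHealthy probes a single endpoint out of band so a client could
// skip blackholed members instead of blocking on RPCs to them.
func endpointHealthy(endpoint string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, endpoint, grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		return false
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(ctx,
		&healthpb.HealthCheckRequest{Service: ""})
	return err == nil && resp.Status == healthpb.HealthCheckResponse_SERVING
}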

@cw9
Author

cw9 commented Jan 30, 2017

Thanks for the explanation. I'll probably do something on my end to prevent this sort of blackholing from happening. Besides that, I'd like to confirm two things:

1. When there is a network partition, let's say the 5 nodes split into 3+2: if the client can still talk to any of the etcd boxes, will it be able to get watch updates that happened in the bigger partition if the original watch channel was connected to an etcd instance that is now in the smaller partition? I assume Get() calls will still succeed if they get sent to one of the boxes in the bigger partition?

2. If the client is connected to proxy etcd instances and one of the proxy instances loses its connection to the main etcd cluster, will the watch channel established with this box catch up / retry other proxy boxes?

@xiang90
Contributor

xiang90 commented Jan 30, 2017

When there is a network partition, let's say the 5 nodes split into 3+2: if the client can still talk to any of the etcd boxes, will it be able to get watch updates that happened in the bigger partition if the original watch channel was connected to an etcd instance that is now in the smaller partition?
I assume Get() calls will still succeed if they get sent to one of the boxes in the bigger partition?

By default, watchers connected to the minority will hang there until the network partition recovers. However, you can provide the requireLeader option (https://github.com/coreos/etcd/blob/master/clientv3/client.go#L306) to the watchers to ensure they are aborted, and then switched to the majority, when a partition happens.

You probably want to try this feature yourself to understand how it works. Let us know if you see any issues with it.
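
A minimal sketch of using it with a watch (assuming an existing *clientv3.Client named cli; handleEvent is a hypothetical handler and error handling is simplified):

import (
	"context"

	"github.com/coreos/etcd/clientv3"
)

func watchWithLeader(cli *clientv3.Client, key string) {
	// WithRequireLeader makes the watch fail instead of hanging when the
	// connected member loses its leader (e.g. it sits in the minority partition).
	ctx := clientv3.WithRequireLeader(context.Background())
	for wresp := range cli.Watch(ctx, key) {
		if err := wresp.Err(); err != nil {
			// e.g. "etcdserver: no leader"; recreate the watch so the
			// client can switch to a healthy member.
			return
		}
		for _, ev := range wresp.Events {
			handleEvent(ev)
		}
	}
}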

I assume Get() calls will still succeed if they get sent to one of the boxes in the bigger partition?

Correct. If you enable serializable reads, local reads are allowed.
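
For example (cli is an assumed *clientv3.Client):

// A serializable read is served locally by the connected member instead of
// going through quorum, at the cost of possibly stale data.
resp, err := cli.Get(context.TODO(), "foo", clientv3.WithSerializable())
if err == nil && len(resp.Kvs) > 0 {
	value := resp.Kvs[0].Value // latest value seen by this member
	_ = value
}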

will the watch channel established with this box catch up / retry other proxy boxes?

It should. But the gRPC proxy is an alpha feature; I have not tried it personally.

@heyitsanthony heyitsanthony added this to the unplanned milestone Jan 31, 2017
@cw9
Author

cw9 commented Jan 31, 2017

However, you can provide the requireLeader option (https://github.com/coreos/etcd/blob/master/clientv3/client.go#L306) to the watchers to ensure they are aborted, and then switched to the majority, when a partition happens.

Cool, this looks interesting, I'll definitely try it out.

It should. But the gRPC proxy is an alpha feature, I have not tried it personally.

Got it. I'm not using the proxy feature either; I'll make sure to check before onboarding to that.

@xiang90
Contributor

xiang90 commented Oct 4, 2017

@gyuho is this fixed on master?

@gyuho
Contributor

gyuho commented Oct 4, 2017

Closing via #8545.

@gyuho gyuho closed this as completed Oct 4, 2017
@xiang90
Contributor

xiang90 commented Oct 4, 2017

@cw9 please give it a try with current master + the current master client. Thank you!

adityadani added a commit to portworx/kvdb that referenced this issue Nov 24, 2018
1. Send an ErrWatchStopped to the caller only once.
- Currently ErrWatchStopped gets sent to the caller multiple times, causing a resubscribing watch to fail as well.

2. Use context with leader requirement for Watch API.

- By default, etcd watchers will hang in case of a network partition if they are connected to the minority.
- As mentioned here - etcd-io/etcd#7247 (comment), setting the leader requirement for watchers allows them to switch to the majority partition.
adityadani added a commit to portworx/kvdb that referenced this issue Nov 26, 2018