Athens should not check the status of all etcd endpoints on startup #1888

uhthomas · 2023-09-12T00:06:21Z

Describe the bug

etcd and its client are designed to be highly available and resilient. Manually checking the status of each endpoint defeats these goals: Athens will crash if only 2 of 3 endpoints of an etcd cluster are available. This behaviour makes rolling updates much harder.

Error Message

N/A - Athens will print an error that it cannot connect to a member of the etcd cluster.

To Reproduce
Steps to reproduce the behavior:

Have multiple Athens instances and an etcd cluster.
Restart both at the same time.
Observe that Athens will not connect to etcd unless all endpoints are available.

Expected behavior

Athens should connect to the etcd cluster and defer connection management to the etcd client. The client should automatically load balance and route to available members.

Environment (please complete the following information):

OS: Linux
Go version: N/A
Proxy version: e248d22
Storage: etcd and s3

Additional context

We run 5 Athens pods and 3 etcd pods in Kubernetes with high availability. When we update both images at the same time, the Athens deployment takes many minutes to progress because it crash loops until the etcd cluster is completely ready.
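As a rough illustration of why a 3-member cluster with one member down is still usable: etcd needs a majority of members (a quorum) to serve requests. The sketch below only does that arithmetic. The helper names `quorum` and `hasQuorum` are hypothetical, and it assumes the rolling update takes down one etcd pod at a time.

```go
package main

// quorum returns the number of members that must be available
// for a cluster of the given size to have a majority.
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether a cluster with the given number of
// members can still make progress with only `available` of them up.
func hasQuorum(members, available int) bool {
	return members > 0 && available >= quorum(members)
}
```

For the setup described here, `hasQuorum(3, 2)` is true, so the cluster can keep serving while one pod restarts, yet a startup check that requires all 3 endpoints still fails.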

Athens checks the status of all etcd endpoints when started, which can cause issues when some members of the etcd cluster are unavailable. It is perfectly okay for some members of an etcd cluster to be unavailable, as it's designed for high availability and fault tolerance. The management of the connections is instead deferred to the etcd client, which will handle failures and load balancing as expected. Fixes: gomods#1888

uhthomas mentioned this issue Sep 12, 2023

fix(pkg/stash): don't check status of all etcd endpoints on start #1889

Merged

manugupt1 closed this as completed in #1889 Sep 18, 2023
