Describe the bug
etcd and its client are designed to be highly available and resilient. Manually checking the status of each endpoint at startup defeats these goals: Athens will crash if only 2 of the 3 endpoints of an etcd cluster are available. This behaviour makes rolling updates much harder.
Error Message
N/A - Athens will print an error that it cannot connect to a member of the etcd cluster.
To Reproduce
Steps to reproduce the behavior:
Run multiple Athens instances and an etcd cluster.
Restart both at the same time.
Observe that Athens will not connect to etcd unless all endpoints are available.
Expected behavior
Athens should connect to the etcd cluster and defer connection management to the etcd client. It should automatically load balance and route to available members.
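To illustrate why the all-endpoints check is stricter than it needs to be, here is a minimal Go sketch that compares it with a majority check. It is not the Athens code and it does not use the etcd client. The names `probeFunc`, `fakeProbe`, `checkAll` and `checkQuorum` and the endpoint addresses are all hypothetical. The fix proposed in this issue is to drop the manual check entirely and let the etcd client manage connections, so `checkQuorum` is only there for contrast.

```go
package main

import (
	"errors"
	"fmt"
)

// probeFunc reports whether one endpoint is reachable.
// Hypothetical stand-in for a per-endpoint status call.
type probeFunc func(endpoint string) error

// fakeProbe returns a probe that fails for the listed endpoints.
// It exists only so the sketch runs without a real cluster.
func fakeProbe(down ...string) probeFunc {
	unavailable := make(map[string]bool, len(down))
	for _, ep := range down {
		unavailable[ep] = true
	}
	return func(endpoint string) error {
		if unavailable[endpoint] {
			return errors.New("connection refused")
		}
		return nil
	}
}

// checkAll models the strict behaviour described in this report:
// one unreachable endpoint is enough to fail startup.
func checkAll(endpoints []string, probe probeFunc) error {
	for _, ep := range endpoints {
		if err := probe(ep); err != nil {
			return fmt.Errorf("endpoint %s: %w", ep, err)
		}
	}
	return nil
}

// checkQuorum tolerates unavailable members as long as a majority
// of the configured endpoints responds.
func checkQuorum(endpoints []string, probe probeFunc) error {
	if len(endpoints) == 0 {
		return errors.New("no endpoints configured")
	}
	need := len(endpoints)/2 + 1
	reachable := 0
	for _, ep := range endpoints {
		if probe(ep) == nil {
			reachable++
		}
	}
	if reachable < need {
		return fmt.Errorf("only %d of %d endpoints reachable, need %d",
			reachable, len(endpoints), need)
	}
	return nil
}

func main() {
	// Hypothetical addresses; one of three members is restarting.
	endpoints := []string{"etcd-0:2379", "etcd-1:2379", "etcd-2:2379"}
	probe := fakeProbe("etcd-2:2379")

	fmt.Println("strict:", checkAll(endpoints, probe))
	fmt.Println("quorum:", checkQuorum(endpoints, probe))
}
```

With one of three members down, the strict check returns an error while the majority check returns nil. That is the situation during a rolling update of the etcd pods.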
Environment (please complete the following information):
Additional context
We run 5 Athens pods and 3 etcd pods in Kubernetes with high availability. We update both images at the same time, and the Athens deployment takes many minutes to progress because it crash-loops until the etcd cluster is completely ready.
uhthomas added a commit to uhthomas/athens that referenced this issue on Sep 12, 2023
Athens checks the status of all etcd endpoints when started, which can cause
issues when some members of the etcd cluster are unavailable. It is perfectly
okay for some members of an etcd cluster to be unavailable, as it's designed
for high availability and fault tolerance.
The management of the connections is instead deferred to the etcd client, which
will handle failures and load balancing as expected.
Fixes: gomods#1888