Athens should not check the status of all etcd endpoints on startup #1888

Closed
uhthomas opened this issue Sep 12, 2023 · 0 comments · Fixed by #1889

@uhthomas
Contributor

Describe the bug

etcd and its client are designed to be highly available and resilient. Manually checking the status of each endpoint defeats these goals: Athens will crash if only 2 of 3 endpoints of an etcd cluster are available. This behaviour makes rolling updates much harder.
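To make the arithmetic concrete: etcd needs a strict majority (⌊n/2⌋ + 1) of members to make progress, so a 3-member cluster keeps working with one member down. This stdlib-only Go sketch contrasts a startup check that demands every endpoint (the behaviour this issue describes) with the majority etcd itself requires. The function names and the `[]bool` health model are hypothetical illustrations, not Athens or etcd code.

```go
package main

import "fmt"

// quorum returns the number of members an etcd cluster of size n needs
// to make progress: a strict majority.
func quorum(n int) int {
	return n/2 + 1
}

// faultTolerance returns how many members can be unavailable while the
// cluster still has a quorum.
func faultTolerance(n int) int {
	return n - quorum(n)
}

// requireAllHealthy models a strict startup check (hypothetical):
// it passes only if every endpoint responds.
func requireAllHealthy(healthy []bool) bool {
	for _, ok := range healthy {
		if !ok {
			return false
		}
	}
	return true
}

// hasQuorum models what etcd itself needs (hypothetical helper):
// a majority of members must be up.
func hasQuorum(healthy []bool) bool {
	up := 0
	for _, ok := range healthy {
		if ok {
			up++
		}
	}
	return up >= quorum(len(healthy))
}

func main() {
	// A 3-member cluster mid rolling update: one member is restarting.
	members := []bool{true, true, false}
	fmt.Println("quorum:", quorum(len(members)))
	fmt.Println("fault tolerance:", faultTolerance(len(members)))
	fmt.Println("strict check passes:", requireAllHealthy(members))
	fmt.Println("majority available:", hasQuorum(members))
}
```

The strict check rejects a cluster that etcd considers perfectly usable. The issue's point is not that Athens should swap in a quorum check, but that it should stop checking endpoints itself and let the etcd client handle availability.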

Error Message

N/A. Athens prints an error saying that it cannot connect to a member of the etcd cluster.

To Reproduce
Steps to reproduce the behavior:

  1. Run multiple Athens instances and an etcd cluster.
  2. Restart both at the same time.
  3. Observe that Athens will not connect to etcd unless all endpoints are available.

Expected behavior

Athens should connect to the etcd cluster and defer connection management to the etcd client, which should automatically load-balance requests and route them to available members.

Environment (please complete the following information):

  • OS: Linux
  • Go version: N/A
  • Proxy version: e248d22
  • Storage: etcd and s3

Additional context

We run 5 Athens pods and 3 etcd pods in Kubernetes with high availability. We update both images at the same time, and the Athens deployment takes many minutes to progress because it crash loops until the etcd cluster is completely ready.

uhthomas added a commit to uhthomas/athens that referenced this issue Sep 12, 2023
Athens checks the status of all etcd endpoints when started, which can cause
issues when some members of the etcd cluster are unavailable. It is perfectly
okay for some members of an etcd cluster to be unavailable, as it's designed
for high availability and fault tolerance.

The management of the connections is instead deferred to the etcd client, which
will handle failures and load balancing as expected.

Fixes: gomods#1888