Sync etcd endpoints before lease acquisition #13082
If this looks ok, do you want me to add a call to
cc @eparis
What is the downside of autosync?
@smarterclayton not much of one that I can think of. They recommend running with an interval between 10 and 60 seconds. https://github.com/coreos/etcd/blob/02f4a9a034f65f588599eaf6d756b98b55640c98/client/client.go#L181-L195
Let's do that then. Do we need this with that?
No, I think only calling AutoSync should be sufficient. I have a standalone test driver I will try it in first to be sure.
61ade16 to 28872e8
@smarterclayton updated, ptal
// until it succeeds. Assuming it does, the client's list of endpoints is updated, and any
// unavailable members are removed from the list.
for {
	err := e.client.AutoSync(ctx, autoSyncInterval)
This scarily looks like a hotloop. :)
If the hotloop dies, should it cancel the lease?
You're right - if the inner call to Sync fails repeatedly, it would be a hotloop. At the very least, after an error, it should sleep for e.pauseInterval or maybe autoSyncInterval.
I pushed an update where it sleeps e.pauseInterval after an AutoSync error
Auto-sync etcd endpoints while trying to acquire the leader lease. If any etcd cluster member is down, the sync operation updates the client's member list to remove the unavailable members. Because the lease acquisition key "set" operation is a one-shot attempt, if the first member the etcd client tries to contact is unavailable, it does not try any other members and simply fails. With the sync, we should be able to work around this.
28872e8 to d960c59
lgtm [merge]
[merge]
[Test]ing while waiting on the merge queue
Evaluated for origin test up to d960c59
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/557/) (Base Commit: f984cc7)
[merge]
On Sun, Feb 26, 2017 at 4:13 PM OpenShift Bot wrote:
continuous-integration/openshift-jenkins/merge ABORTED (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/565/) (Base Commit: fa35acd)
Evaluated for origin merge up to d960c59
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/570/) (Base Commit: d757d20) (Image: devenv-rhel7_5984)
Auto-sync etcd endpoints while trying to acquire the leader lease. If
any etcd cluster member is down, the sync operation updates the
client's member list to remove the unavailable members.
Because the lease acquisition key "set" operation is a one-shot attempt,
if the first member the etcd client tries to contact is unavailable, it
does not try any other members and simply fails. With the sync, we
should be able to work around this.
Fixes bug 1426183
https://bugzilla.redhat.com/show_bug.cgi?id=1426183
cc @smarterclayton @deads2k @sttts @liggitt @mfojtik @derekwaynecarr