Zookeeper backend timeout and no re-auth #1989

Closed
afterwords opened this issue Oct 11, 2016 · 5 comments

@afterwords

I'm using Zookeeper as my HA backend in production and have run into the following issue:

When Zookeeper restarts or expires the Vault client session, Vault doesn't try to re-authenticate.

2016-10-11 00:20:36.295243 I | Authentication failed: zk: session has been expired by the server
2016/10/11 00:20:36.295456 [WRN] core: leadership lost, stopping active operation
2016-10-11 00:20:36.296029 I | Connected to xxx.xxx.xxx.xxx:2181
2016/10/11 00:20:36.298505 [INF] core: pre-seal teardown starting
2016/10/11 00:20:36.298674 [INF] core/stopClusterListener: stopping listeners
2016/10/11 00:20:36.298898 [INF] core/startClusterListener: shutting down listeners
2016-10-11 00:20:36.358971 I | Authenticated: id=1234567890, timeout=4000
2016/10/11 00:20:36.459922 [INF] core/startClusterListener: listeners successfully shut down
2016/10/11 00:20:36.459974 [INF] core/stopClusterListener: success
2016/10/11 00:20:36.460096 [INF] rollback: stopping rollback manager
2016/10/11 00:20:36.460749 [INF] core: pre-seal teardown complete
2016/10/11 00:20:36.586379 [ERR] core: failed to acquire lock error=zk: not authenticated

You can imagine this is pretty problematic for HA.

@jefferai
Member

Is this 0.6.2? I seem to remember something similar to this being fixed in it.

@afterwords
Author

This is 0.6.1. I'm reviewing the 0.6.2 release notes now, but I'm not seeing anything about it.

@afterwords
Author

afterwords commented Oct 11, 2016

Also, all of the other nodes had already timed out because they weren't getting any traffic. I ended up with a set of Vault nodes that all reported being in standby with no leader, and none of them could connect to Zookeeper.

@afterwords
Author

I just tested 0.6.2 in my sandbox, and it seems that you are correct, @jefferai. Thanks again.

@jefferai
Member

For completeness, this wasn't in the notes because it was solved via a normal upstream library sync -- see #1933
