Attempting to unseal Vault immediately after initializing fails with error 400 in HA mode #491
Comments
@lyptt In an HA setup, only the active Vault responds to queries; the standby Vaults redirect to the active instance. When there is no leader, Vault takes a few seconds to elect one, so a request that arrives during that period has no active instance to be forwarded to and you get an error. Does that help clarify?
We only have one Vault instance and three Consul servers so I'm not sure if this particularly applies to us?
@lyptt It does: the Consul backend is HA enabled, and Vault cannot know there are no other peer servers. There could be 3 Vaults as far as any particular instance knows, so it is only safe to proceed once the lock is acquired. We could add a flag to force-disable HA mode so that you can indicate there are no other peers to worry about, but the safest thing to do is to acquire the lock, which is what we do now.
In that case, shouldn't Vault either queue the request and defer it until the lock is acquired, or at least return a response that scripts or end users can use to determine that this is the case? Simply returning 400 Bad Request isn't particularly helpful, and it makes it seem like the end user has done something incorrect when the reality is that Vault simply isn't ready yet.
@lyptt Probably, but it is a 0.2 product and it takes time to mature. I actually think it is best to push the error back to the client quickly instead of trying to queue the request; otherwise, for a client, it is indistinguishable from an incredibly slow request. It should probably be a 503 or equivalent error code.
Yes absolutely, if the error could be changed then at least we can handle it better. Having to wait isn't too much of an issue as long as we can tell when we have to.
We've got an automated setup for Vault running in a testing environment - under these conditions we have to auto-unseal in order to be able to test our infrastructure in an automated fashion. We're using Vault in combination with three Consul servers to verify that high availability works effectively with the rest of our infrastructure.
Initializing Vault works fine, but everything fails once we attempt to unseal. Any attempt to interact with Vault at that point, even a simple query to verify the seal status, fails with Error 400. We found that adding a 10-second sleep after initializing Vault allows everything to proceed.
This seems a bit concerning. I'm assuming it's due to some kind of disparity between Vault and Consul. Any further advice would be extremely helpful.
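
For anyone scripting around the same window, one option is to poll instead of using a fixed sleep. The following is only a rough sketch, not anything Vault ships: it assumes Vault listens on `http://127.0.0.1:8200` and that a 200 from `GET /v1/sys/seal-status` means it is safe to continue with the unseal. The helper names `seal_status_code` and `wait_until_ready`, and the timeout and interval values, are made up for illustration.

```python
import time
import urllib.error
import urllib.request


def seal_status_code(addr="http://127.0.0.1:8200"):
    """Return the HTTP status of GET /v1/sys/seal-status, or None if unreachable.

    The address is an assumption; adjust it to your own listener.
    """
    try:
        with urllib.request.urlopen(addr + "/v1/sys/seal-status", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as the 400 described above) land here.
        return err.code
    except OSError:
        # Connection refused, timeouts, DNS failures, and similar.
        return None


def wait_until_ready(probe=seal_status_code, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns 200; return the number of attempts made.

    Raises TimeoutError if no 200 is seen before the deadline passes.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe() == 200:
            return attempts
        if clock() >= deadline:
            raise TimeoutError("Vault did not become ready within %.1fs" % timeout)
        sleep(interval)
```

Calling `wait_until_ready()` right after initializing, and only then sending the unseal requests, would replace the 10-second sleep with a bounded wait that ends as soon as Vault starts answering. If the error is later changed to a 503, the same loop should keep working, since it treats anything other than 200 as "not ready yet".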