Attempting to unseal Vault immediately after initializing fails with error 400 in HA mode #491

ghost · 2015-08-04T16:15:38Z

We've got an automated setup for Vault running in a testing environment - under these conditions we have to auto-unseal in order to be able to test our infrastructure in an automated fashion. We're using Vault in combination with three Consul servers to verify that high availability works effectively with the rest of our infrastructure.

Everything works fine with initializing Vault until it gets to the point where we attempt to unseal. Any attempt to interact with Vault, even with a simple query to verify the seal status, fails with Error 400. We found that adding a 10 second sleep after initializing Vault allows everything to proceed.

This seems a bit concerning. I'm assuming it's due to some kind of disparity between Vault and Consul. Any further advice would be extremely helpful.

armon · 2015-08-04T16:56:25Z

@lyptt In an HA setup, only the active Vault will respond to queries. The standby Vaults redirect to the active instance. It takes a few seconds for Vault to elect a leader when there is none, so in that period when a request comes in there is no active instance to forward to and you get an error. Does that help clarify?

ghost · 2015-08-04T17:10:50Z

We only have one Vault instance and three Consul servers so I'm not sure if this particularly applies to us?

armon · 2015-08-04T17:12:29Z

@lyptt It does. The Consul backend is HA enabled, and Vault cannot know there are no other peer servers. There could be 3 Vaults as far as each particular instance knows, so it is only safe to proceed once the lock is acquired. We could add a flag to force disable HA mode so that you can indicate there are no other peers to worry about, but the safest thing to do is to acquire the lock, which is what we do now.

ghost · 2015-08-04T17:16:35Z

In that case shouldn't Vault be either queueing the request and deferring until the lock is acquired, or at least returning a response that scripts or end users can use to usefully determine this is the case?

Simply returning 400 Bad Request isn't particularly helpful, and makes it seem like the end user has done something incorrect, when the reality is it's simply not ready yet.

armon · 2015-08-04T17:24:14Z

@lyptt Probably, but it is a 0.2 product and it takes time to mature. I actually think it is best to push the error back to the client quickly instead of trying to queue the request; a queued request is indistinguishable from an incredibly slow request for a client. It should probably be a 503 or equivalent error code.

ghost · 2015-08-04T17:26:25Z

Yes absolutely, if the error could be changed then at least we can handle it better. Having to wait isn't too much of an issue as long as we can tell when we have to.
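For anyone scripting around this in the meantime, one option is to replace the fixed 10 second sleep with a bounded polling loop. The following is only a rough sketch, not anything Vault ships: `wait_until_ready` and `seal_status_probe` are hypothetical helper names, the address `http://127.0.0.1:8200` is an assumed local default, and the probe simply treats any HTTP error from `/v1/sys/seal-status` as "not ready yet", since the 400 cannot currently be told apart from a genuine bad request.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or `timeout` seconds elapse.

    Returns the number of attempts made. Raises TimeoutError if the
    deadline passes without a successful probe.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(interval)


def seal_status_probe(addr="http://127.0.0.1:8200"):
    """Build a probe that passes once seal-status answers with HTTP 200.

    `addr` is an assumed address; point it at your own Vault instance.
    """
    def probe():
        try:
            url = f"{addr}/v1/sys/seal-status"
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Covers HTTP errors (including the 400 described above)
            # and connection failures: treat all of them as "retry".
            return False
    return probe
```

A setup script would call `wait_until_ready(seal_status_probe())` right after initializing and before attempting to unseal. If the status code does change to 503, the probe could be narrowed to retry only on that code and fail fast on anything else.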

sethvargo added bug labels Aug 7, 2015

ghost closed this as completed Jun 6, 2016
