Vault health check issues #1486

Closed
bmonkman opened this issue Jun 3, 2016 · 7 comments

@bmonkman
Contributor

bmonkman commented Jun 3, 2016

Heya, I'm trying Vault 0.6 beta 2 and I'm having issues with the "sealed" Consul health check.
The check is registered, and if I stop Vault it goes critical with "TTL expired", but it doesn't report the sealed state correctly or go into a warning state while Vault is sealed.

$ curl https://localhost:8200/v1/sys/health
{"initialized":true,"sealed":true,"standby":true,"server_time_utc":1464973008}
$ curl http://localhost:8589/v1/agent/checks | jq .
{
  "vault-sealed-check": {
    "ModifyIndex": 0,
    "CreateIndex": 0,
    "Node": "vault1.hostname.com",
    "CheckID": "vault-sealed-check",
    "Name": "Vault Sealed Status",
    "Status": "passing",
    "Notes": "Vault service is healthy when Vault is in an unsealed status and can become an active Vault server",
    "Output": "Vault Unsealed",
    "ServiceID": "vault:1.2.3.4:8200",
    "ServiceName": "vault"
  }
}
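
The mismatch shown above can be checked mechanically by comparing the two responses. The following is a minimal Python sketch, not part of Vault or Consul: the function name `find_sealed_mismatch` is hypothetical, and it assumes the JSON shapes shown in the two curl outputs above (a top-level `sealed` boolean from `/v1/sys/health`, and a `Status`/`Output` pair per check from `/v1/agent/checks`).

```python
import json


def find_sealed_mismatch(health_json, checks_json, check_id="vault-sealed-check"):
    """Compare Vault's own sealed flag with the Consul check for it.

    Returns a short description of the disagreement, or None when the
    two agree. Hypothetical helper; field names follow the output above.
    """
    health = json.loads(health_json)
    check = json.loads(checks_json)[check_id]

    sealed = health["sealed"]
    passing = check["Status"] == "passing"

    if sealed and passing:
        return "Vault is sealed but check is passing (%r)" % check["Output"]
    if not sealed and not passing:
        return "Vault is unsealed but check is %s (%r)" % (check["Status"], check["Output"])
    return None


# Trimmed copies of the two responses from this report.
HEALTH = '{"initialized":true,"sealed":true,"standby":true,"server_time_utc":1464973008}'
CHECKS = '{"vault-sealed-check": {"Status": "passing", "Output": "Vault Unsealed"}}'

print(find_sealed_mismatch(HEALTH, CHECKS))
```

Fed the two responses from this report, the helper flags the disagreement; if `sealed` were `false` it would return `None`.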

The Vault config is very straightforward, Consul is working fine, and I've granted the Consul ACL token sufficient access to write to the KV store and the vault service. (I even tried switching to a management token.)
I also tried stopping everything, gracefully leaving the Consul agent, and manually removing the service and checks using both the agent and catalog APIs, but no dice.

Here is the Vault config:

backend "consul" {
  address = "127.0.0.1:8589"
  path = "vault"
  token = "xxx"
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/ssl/vault.crt"
  tls_key_file = "/etc/vault.d/ssl/vault.key"
}

Any ideas?

Thanks!

@jefferai jefferai added this to the 0.6.0 milestone Jun 3, 2016
@sean-
Contributor

sean- commented Jun 3, 2016

@bmonkman Thanks for the report. It sounds like Consul correctly reports Vault as critical when the TTL check expires, but after you restart Vault and it comes up sealed, Consul still reports the process as unavailable (even though it is up, just sealed). Is that correct?

Can you answer a few additional questions regarding this?

  • How are you monitoring the output of the Vault service when it is in a critical state? Are you using the UI, or are you polling the Agent or the Catalog endpoints?
  • Does this problem "resolve itself" if you leave it in a critical state for ~10min?

With that information we should be able to get this figured out. Cheers.

@bmonkman
Contributor Author

bmonkman commented Jun 3, 2016

Not exactly. When I bring Vault back up, Consul reports it as available and unsealed, as in the health check response I posted above. (Note the string "Vault Unsealed", even though the previous command returned "sealed":true.)
Both Consul UI and the API are showing the same result.
The health check never enters a warning state or shows a message saying "Sealed", regardless of how long I leave it.
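
To make the expectation concrete, here is a small Python sketch of the mapping described in this comment. It is an illustration only, not Vault's implementation: the function name `expected_check` is hypothetical, and the `warning` status with a "Vault Sealed" message is an assumption based on what is expected in this thread; `passing` with "Vault Unsealed" matches the check output posted above.

```python
def expected_check(sealed):
    """Return the (status, output) the sealed check is expected to show.

    Assumption: a sealed Vault should surface as "warning" with a
    message saying it is sealed; an unsealed Vault as "passing".
    """
    if sealed:
        return ("warning", "Vault Sealed")
    return ("passing", "Vault Unsealed")


# With "sealed": true from /v1/sys/health, the observed result was
# ("passing", "Vault Unsealed"), which does not match the expectation.
observed = ("passing", "Vault Unsealed")
print(expected_check(True) == observed)
```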

@sean- sean- added bug labels Jun 3, 2016
@sean-
Contributor

sean- commented Jun 3, 2016

@bmonkman I can't fully context switch into this problem this second, but I did take a quick look and found something that's likely relevant. Would you be comfortable applying a quick patch and seeing if that fixes things? If so I'll toss it up as a gist, otherwise I'll have a chance to fully test and dig into this next week. LMK.

@bmonkman
Contributor Author

bmonkman commented Jun 3, 2016

Yeah no problem, Gist me!


@sean-
Contributor

sean- commented Jun 3, 2016

https://pastebin.com/raw/i90ERJ67

patch -p1 < gh1486.diff && make dev

If that works I'll merge the fix; otherwise, like I said, I'll dig into it early next week when I can actually context switch into this for more than 5 minutes.

@bmonkman
Contributor Author

bmonkman commented Jun 3, 2016

Okay, I found the issue. It is also affecting the active/standby tagging.
I'll submit a PR.

bmonkman pushed a commit to bmonkman/vault that referenced this issue Jun 3, 2016
sean- added a commit that referenced this issue Jun 3, 2016
#1486 : Fixed sealed and leader checks for consul backend
@sean-
Contributor

sean- commented Jun 3, 2016

Thank you for the fix!

@sean- sean- closed this as completed Jun 3, 2016