Actuator Health Endpoint returns 503 when app is working but Spring Cloud Vault points to standby #112

Closed
sworisbreathing opened this issue May 18, 2017 · 8 comments


@sworisbreathing
Contributor

Starting in Vault 0.6.2, requests to a standby Vault server are automatically forwarded to the active member and thus complete successfully (except, apparently, for requests to the system backend).

However, VaultHealthIndicator reports OUT_OF_SERVICE in this scenario. If you use Spring Boot Actuator endpoints to provide status information (for example, /management/health as a status check in a load balancer), the actuator endpoint returns HTTP 503 even though the application is communicating with Vault successfully and can service requests.

A fallback approach could query the /sys/leader endpoint and, if ha_enabled: true is returned in the response payload, fire a subsequent request to the leader_address to check its health.
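A minimal sketch of that fallback, assuming the /sys/leader payload has already been parsed into a map. The class and method names are hypothetical and not part of Spring Cloud Vault, and the /v1 prefix is the usual Vault HTTP API path rather than something this report spells out:

```java
import java.net.URI;
import java.util.Map;

public class LeaderFallback {

    // Returns the URI to health-check. If /sys/leader reported ha_enabled: true
    // and a non-empty leader_address, target the leader; otherwise keep the
    // configured endpoint. (Hypothetical helper, for illustration only.)
    static URI healthTarget(URI configured, Map<String, Object> leaderResponse) {
        boolean haEnabled = Boolean.TRUE.equals(leaderResponse.get("ha_enabled"));
        Object leader = leaderResponse.get("leader_address");
        String base = configured.toString();
        if (haEnabled && leader instanceof String && !((String) leader).isEmpty()) {
            base = (String) leader;
        }
        // Avoid a double slash when the base address ends with "/".
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return URI.create(base + "/v1/sys/health");
    }
}
```

Fetching /sys/leader and issuing the second request are left out here; the sketch only covers the decision about which node to ask.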

@mp911de
Member

mp911de commented May 18, 2017

Thanks for the ticket. From your report, I read that you're using Vault HA with direct server communication (no load balancer in between). I'm not sure adding cluster-awareness right now is a good path to follow. True cluster-awareness would mean that cluster state reflects down to the client so the endpoint gets updated and sends requests to the active host. I created spring-projects/spring-vault#98 to track Vault cluster efforts.

The vaultHealthIndicator bean is created conditionally, so you can provide your own instance to customize health check behavior (see VaultHealthIndicator for an implementation example).

@sworisbreathing
Contributor Author

sworisbreathing commented May 18, 2017

@mp911de thanks for the tip. I'll definitely try a custom VaultHealthIndicator that passes the standbyok request parameter.
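As a rough illustration of what such a check might look like, here is a plain-Java sketch (no Spring types, so it runs on its own). The class, enum, and method names are hypothetical. It assumes Vault's documented /sys/health status codes: 200 for an active node (or a standby when standbyok=true is passed), 429 for a standby, 501 for uninitialized, and 503 for sealed:

```java
import java.net.URI;

public class StandbyOkHealthCheck {

    // Stand-in for Spring Boot's health statuses (hypothetical, for illustration).
    enum Status { UP, OUT_OF_SERVICE, DOWN }

    // Builds the health URL with standbyok=true so that a standby node
    // answers with the same status code as an active node.
    // Assumes vaultBase has no trailing slash.
    static URI healthUri(URI vaultBase) {
        return URI.create(vaultBase.toString() + "/v1/sys/health?standbyok=true");
    }

    // Maps the HTTP status code of the health response to a health status.
    static Status fromHttpStatus(int code) {
        switch (code) {
            case 200:
                return Status.UP;              // active, or standby with standbyok
            case 429:
                return Status.OUT_OF_SERVICE;  // standby reported without standbyok
            default:
                return Status.DOWN;            // e.g. 501 uninitialized, 503 sealed
        }
    }
}
```

In a real application this logic would sit inside a HealthIndicator bean that replaces the conditionally created one; that wiring is omitted here.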

@mp911de
Member

mp911de commented May 19, 2017

Maybe there is even a less-invasive change possible for now (until we get to cluster support).

Starting with Vault 0.6.2, a standby node isn't an issue anymore. The health response gives us all required details to decide whether we're communicating with an instance that forwards requests or not.

The health check could adapt to the version: a response without a version number comes from a pre-0.6.1 server, a version number starting with "Vault v" means 0.6.1, and any other version number indicates 0.6.2 or higher. For a standby node, the check can then return out of service for versions before 0.6.2 and healthy for versions 0.6.2 and higher.
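A small sketch of that version rule, under the assumption that the version string from the health response is passed in as-is (null when absent). The names are hypothetical, and only the standby decision is covered, not sealed or uninitialized states:

```java
public class VersionAwareStandbyCheck {

    // Stand-in for Spring Boot's health statuses (hypothetical, for illustration).
    enum Status { UP, OUT_OF_SERVICE }

    // True when the reported version indicates Vault 0.6.2 or higher,
    // i.e. a standby node that forwards requests to the active node.
    static boolean forwardsRequests(String version) {
        if (version == null || version.isEmpty()) {
            return false; // no version in the response: pre-0.6.1
        }
        if (version.startsWith("Vault v")) {
            return false; // "Vault v..." format: 0.6.1
        }
        return true;      // any other version string: 0.6.2 or higher
    }

    // A standby node is healthy only if it forwards requests.
    static Status status(boolean standby, String version) {
        if (!standby) {
            return Status.UP;
        }
        return forwardsRequests(version) ? Status.UP : Status.OUT_OF_SERVICE;
    }
}
```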

Does this make sense?

@sworisbreathing
Contributor Author

Prior to 0.6.2, Vault's default behavior was to return a redirect to the leader for any operations sent to a standby node (except for some or maybe all of the /sys/... endpoints).

Assuming the underlying client library is set to automatically follow redirects (which I believe is the default behavior in both OkHttp and HttpClient), I'm not sure it would have been an issue on older Vault releases either.

Also, in 0.6.2 onwards you can configure Vault to use the older redirect behavior. Though I can't imagine why someone would want to do this, it would probably be difficult for a client to know ahead of time which is the case. In any case, it doesn't really matter - if the underlying http library follows redirects, then requests sent to a standby node should still succeed from an application perspective.

@mp911de
Member

mp911de commented May 22, 2017

I'm inclined to change standby state to OK. The status message already reports standby state and applications will continue working in a healthy state.

/cc @singram

@mp911de mp911de added this to the 1.0.2 milestone May 22, 2017
mp911de added a commit that referenced this issue May 27, 2017
We now accept Vault standby nodes as available. Requests to standby nodes are redirected by Vault to the master node. Communication with a standby node allows using Vault without functional restrictions.

Related pull request: gh-113.
Fixes gh-112.
@mp911de
Member

mp911de commented May 27, 2017

Changed Vault standby node health check result to Health.up().
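For readers following along, the effect of the change can be pictured roughly as below. This is a sketch and not the committed code: the class and field names are hypothetical, and treating uninitialized and sealed nodes as down is an assumption made for the illustration.

```java
public class StandbyAsUp {

    // Stand-in for Spring Boot's health statuses (hypothetical, for illustration).
    enum Status { UP, DOWN }

    // Health status plus a short state message, similar in spirit to a
    // health detail entry.
    static final class Result {
        final Status status;
        final String state;

        Result(Status status, String state) {
            this.status = status;
            this.state = state;
        }
    }

    // Standby is reported as UP, with the standby state kept in the message.
    // Uninitialized and sealed are assumed to be DOWN.
    static Result evaluate(boolean initialized, boolean sealed, boolean standby) {
        if (!initialized) {
            return new Result(Status.DOWN, "Vault uninitialized");
        }
        if (sealed) {
            return new Result(Status.DOWN, "Vault sealed");
        }
        if (standby) {
            return new Result(Status.UP, "Vault in standby");
        }
        return new Result(Status.UP, "Vault active");
    }
}
```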

@hellohelloye

I ran into the same issue: I implemented a custom health check and the app works fine, but the /actuator/health endpoint returns 503.

https://docs.spring.io/spring-boot/docs/current/reference/html/production-ready-features.html#writing-custom-healthindicators

Did you find the solution?

@mp911de
Member

mp911de commented Apr 1, 2020

Is this still an issue after upgrading? If so, please file a new ticket.

spencergibb pushed a commit that referenced this issue Sep 14, 2023