postgresql HA backend - vault auto seals when it cannot open connection to database due to exhausted local ports #11936
Comments
For the record, this seems to happen due to the connection pooling, which is too small by default (unset?). The following setting worked flawlessly for us.
IMHO this could become:
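The specific setting the commenter posted is not preserved in this transcript. As a hedged sketch only: Vault's `postgresql` storage stanza exposes `max_parallel` to cap concurrent requests, and later Vault releases added a `max_idle_connections` parameter (prompted by reports like this one) to keep idle connections pooled instead of churning ephemeral ports. Names and availability should be verified against the docs for your Vault version; the values below are illustrative.

```hcl
storage "postgresql" {
  connection_url = "postgres://vault:secret@127.0.0.1:5432/vault"
  ha_enabled     = "true"

  # Cap on concurrent outstanding requests to PostgreSQL.
  max_parallel = 128

  # Keep idle connections pooled rather than opening/closing a fresh
  # connection per operation; verify this parameter exists in your version.
  max_idle_connections = 32
}
```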
Hi @write0nly, This suggestion makes good sense to me, I'm all for it. I'm not sure when we'll get to it though, feel free to submit a PR if you get impatient.
Hi @write0nly - following up on Nick's comment, was this work that you'd be interested in taking up and filing a PR for? Please let us know how we can help. Thanks!
Thanks for this. We have a small setup with the mysql backend and we faced the same issue. In our case, the following configuration also works smoothly:
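The commenter's actual configuration is also missing from this transcript. As a hedged sketch, the `mysql` storage backend documents analogous pool-tuning parameters (`max_idle_connections`, `max_connection_lifetime`); the values below are illustrative and should be checked against the docs for your Vault version.

```hcl
storage "mysql" {
  address  = "127.0.0.1:3306"
  username = "vault"
  password = "..."
  database = "vault"

  # Pool tuning: keep idle connections around and recycle them periodically
  # instead of opening a new TCP connection per storage operation.
  max_idle_connections    = 32
  max_connection_lifetime = 300
}
```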
This issue was caught in QA/stress testing and is not really expected in a production environment; however, it could also be forced by users who can log in to the Vault server host.
Because Vault (when using the postgresql backend) makes fast, ephemeral connections to PostgreSQL to create and delete leases and entries in the DB, too many connections left in TIME_WAIT or CLOSE_WAIT can cause Vault to cycle through the entire ephemeral port range and eventually run out of source ports, printing the following type of error:
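On Linux, the buildup can be observed directly. A minimal sketch (the PostgreSQL port 5432 is an assumption; adjust for your setup):

```shell
# Count local connections to PostgreSQL stuck in TIME_WAIT.
ss -tan state time-wait '( dport = :5432 )' | wc -l

# Show the ephemeral port range the kernel draws source ports from;
# once TIME_WAIT sockets cover this whole range, new connections fail.
cat /proc/sys/net/ipv4/ip_local_port_range
```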
After some time of this, Vault errors out and seals itself. When Vault tries to expire leases on startup and has too many of them (say 200k), it exhausts the ports and then seals itself. This makes Vault unusable, because it keeps re-sealing itself over and over again.
If the operator is persistent and unseals Vault in a loop, Vault eventually reaches a stable point once the number of outstanding leases drops below roughly 10k, which can be checked by counting the remaining lease entries in the storage table.
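The exact query from the report is not preserved here. As a hedged sketch, assuming the backend's default `vault_kv_store` table and that lease entries live under the `sys/expire/` path prefix (both assumptions; verify against your schema):

```sql
-- Count outstanding lease entries in Vault's PostgreSQL storage table.
SELECT count(*)
FROM vault_kv_store
WHERE path LIKE 'sys/expire/%';
```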
Steps to reproduce the behavior:
1. Have a Vault cluster (tested on v1.7.2) using the postgresql storage backend. This was tested in a cluster but may also happen on a single Vault.
2. Run `vault write auth/approle/login role_id=... secret_id=...` in a loop millions of times, until there are 200k+ outstanding Vault tokens waiting to expire.
3. Stop all Vaults in the cluster, e.g. for an upgrade.
4. Make sure you have a large number of outstanding leases in the DB.
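The login loop from step 2 can be sketched as follows. `ROLE_ID` and `SECRET_ID` are placeholders for a real AppRole's credentials, and `VAULT_ADDR`/authentication are assumed to be set up in the environment:

```shell
# Generate a flood of short-lived tokens/leases against a running Vault.
for i in $(seq 1 200000); do
  vault write auth/approle/login \
    role_id="$ROLE_ID" secret_id="$SECRET_ID" >/dev/null
done
```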
Expected behavior
5. Vault gets unsealed and works normally, deleting expired leases in the background.
Observed behavior
6. Vault frantically tries to remove expired leases and delete lease entries from the tables, rapidly cycling through and exhausting all source ports to the PostgreSQL server. When no ports are available anymore, Vault starts erroring and then seals itself.