Instrument all connection pool PermitPools #10773
Closed
ianferguson wants to merge 21 commits into hashicorp:main
It would be great to get some 👀 on this and have it merged; it would be tremendously helpful!
This has many merge conflicts, so I'm going to close it, but I'd be happy to rebase this instrumentation PR if y'all are interested in accepting it in the future.
pull bot pushed a commit to sigtrap/vault that referenced this pull request (Nov 13, 2025): …0376) (hashicorp#10773) Co-authored-by: kpcraig <3031348+kpcraig@users.noreply.github.com>
We've run into cases where we believe that our Vault clusters are saturating the permit pool for our backend, leading to connection queueing and client timeouts.
Currently, we cannot verify this directly from metrics and have to infer it from behavior. Additionally, when we test an increase to the concurrent connection permits for our backend, it is hard to verify directly that the increased permit limit is being consumed as expected. Having insight at runtime into each permit pool's limit and current consumption will help us better understand which limits we are hitting within Vault as we operate it in our production environments.
This PR adds a package (helper/permits) that includes a metrics-instrumented wrapper of the SDK's PermitPool type (InstrumentedPermitPool).

Each instance of the instrumented pool adds 2 metric gauges:

- $prefix.permits: number of permits currently being consumed
- $prefix.permits-limit: the permit pool's maximum permit limit

If the permit pool size were not configurable in many places, I would've opted for a single "free permits" gauge that counts down to 0 as permits are used, rather than the 2 gauges included. But being able to see the current max size as a line on a graph, and being able to calculate the percentage of permits currently in use, is valuable if the permit maximum can change over time.
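To make the two gauges concrete, here is a minimal sketch of what an instrumented pool could look like. It is not the code from this PR: the GaugeSink interface, the in-memory sink, the channel-based semaphore, and the Utilization helper are all assumptions made for illustration. Only the gauge names ($prefix.permits and $prefix.permits-limit) come from the description above.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// GaugeSink is a hypothetical stand-in for the server's metrics sink.
type GaugeSink interface {
	SetGauge(key []string, val float32)
}

// memorySink keeps the last value written to each gauge (illustration only).
type memorySink struct {
	mu     sync.Mutex
	values map[string]float32
}

func newMemorySink() *memorySink {
	return &memorySink{values: map[string]float32{}}
}

func (m *memorySink) SetGauge(key []string, val float32) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[strings.Join(key, ".")] = val
}

func (m *memorySink) get(name string) float32 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.values[name]
}

// InstrumentedPermitPool is a sketch of a semaphore that reports its usage.
// A buffered channel stands in for the SDK's PermitPool here.
type InstrumentedPermitPool struct {
	sem    chan struct{}
	sink   GaugeSink
	prefix []string
}

func NewInstrumentedPermitPool(permits int, sink GaugeSink, prefix ...string) *InstrumentedPermitPool {
	if permits < 1 {
		permits = 1
	}
	p := &InstrumentedPermitPool{sem: make(chan struct{}, permits), sink: sink, prefix: prefix}
	// Publish the limit once, and start the usage gauge at zero.
	sink.SetGauge(p.key("permits-limit"), float32(permits))
	sink.SetGauge(p.key("permits"), 0)
	return p
}

// key builds "<prefix...>.<name>" without mutating the stored prefix.
func (p *InstrumentedPermitPool) key(name string) []string {
	k := make([]string, 0, len(p.prefix)+1)
	k = append(k, p.prefix...)
	return append(k, name)
}

// Acquire blocks until a permit is free, then updates the usage gauge.
func (p *InstrumentedPermitPool) Acquire() {
	p.sem <- struct{}{}
	p.sink.SetGauge(p.key("permits"), float32(len(p.sem)))
}

// Release returns a permit and updates the usage gauge.
func (p *InstrumentedPermitPool) Release() {
	<-p.sem
	p.sink.SetGauge(p.key("permits"), float32(len(p.sem)))
}

// Utilization derives the percentage in use from the two gauges.
func Utilization(inUse, limit float32) float32 {
	if limit <= 0 {
		return 0
	}
	return inUse / limit * 100
}

func main() {
	sink := newMemorySink()
	pool := NewInstrumentedPermitPool(4, sink, "example")
	pool.Acquire()
	pool.Acquire()
	pool.Acquire()
	pool.Release()
	inUse := sink.get("example.permits")
	limit := sink.get("example.permits-limit")
	// prints "in use: 2, limit: 4, utilization: 50%"
	fmt.Printf("in use: %v, limit: %v, utilization: %.0f%%\n", inUse, limit, Utilization(inUse, limit))
}
```

Publishing the limit as its own gauge is what allows the percentage to be computed on the dashboard side, even when different pools are configured with different sizes.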
I've instrumented the lease revocation/expiration pool and all storage backends that currently use the PermitPool in some form.
One storage backend (the oci backend) already included some similar metrics. The new implementation does not clobber or otherwise disrupt those metrics; it adds the new metrics alongside them. Current users of the existing metrics are not impacted, and the backend gains the standard named permit usage metrics going forward.
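The idea of emitting the standard gauges next to a pre-existing one can be sketched as follows. The metric name "legacy-in-flight" and the "example-backend" prefix are invented for this example and are not the oci backend's real metric names. The gaugeRecorder type is likewise a hypothetical in-memory sink.

```go
package main

import (
	"fmt"
	"strings"
)

// gaugeRecorder is a hypothetical in-memory sink keyed by the joined metric name.
type gaugeRecorder map[string]float32

func (g gaugeRecorder) SetGauge(key []string, val float32) {
	g[strings.Join(key, ".")] = val
}

// reportPermits keeps writing the backend's existing gauge under its old name
// and additionally writes the two standard gauges under the shared prefix.
func reportPermits(g gaugeRecorder, legacyKey []string, prefix string, inUse, limit int) {
	// Existing metric: name and meaning are left unchanged.
	g.SetGauge(legacyKey, float32(inUse))
	// Standard metrics added alongside it.
	g.SetGauge([]string{prefix, "permits"}, float32(inUse))
	g.SetGauge([]string{prefix, "permits-limit"}, float32(limit))
}

func main() {
	g := gaugeRecorder{}
	legacy := []string{"example-backend", "legacy-in-flight"} // assumed name
	reportPermits(g, legacy, "example-backend", 3, 10)
	for _, name := range []string{
		"example-backend.legacy-in-flight",
		"example-backend.permits",
		"example-backend.permits-limit",
	} {
		fmt.Println(name, g[name])
	}
}
```

Dashboards built on the old name keep working, while new dashboards can rely on the same two gauge names across every backend.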
I created helper/permits as a separate package to avoid requiring this server-side metrics implementation to show up in the SDK physical package. If there is a better way to do this, I am happy to change things.

Let me know if there's any other information I can provide, or if you'd like to take a different approach to adding visibility to the semaphore/permit pools used within Vault.