Vault Agent Auto Auth slow performance when used as cache/listener #7965

Open
nritholtz opened this issue Dec 3, 2019 · 2 comments
Labels: agent, bug

nritholtz commented Dec 3, 2019

Describe the bug

We are seeing consistently slow responses, with intermittent larger spikes, when using Vault Agent auto-auth as a caching proxy/listener in front of our Vault cluster. We timed the vault token lookup call and collected agent logs to find out where the delay occurs.

We found a delay of 3 to 12+ seconds inside the agent itself before the actual call is made to the Vault server. When we run vault token lookup directly against the Vault cluster, the response is instantaneous.

To Reproduce

We started a Vault agent with log level of DEBUG and the config HCL mentioned below.

A sample test was using this command against the agent locally:

# date; VAULT_ADDR=http://localhost:8200 vault token lookup; date
Tue Dec  3 20:45:32 UTC 2019
Key                  Value
---                  -----
creation_time        1575405791
creation_ttl         1h
display_name         xxxx
entity_id            xxxx
expire_time          2019-12-03T21:43:18.20604275Z
explicit_max_ttl     0s
issue_time           2019-12-03T20:43:11.976227337Z
last_renewal         2019-12-03T20:43:18.206044131Z
last_renewal_time    1575405798
meta                 xxxx
num_uses             0
orphan               true
path                 auth/aws/login
policies             xxx
renewable            true
ttl                  57m40s
type                 service
Tue Dec  3 20:45:37 UTC 2019

This is from the agent logs:

==> Vault server started! Log data will stream in below:

==> Vault agent configuration:

           Api Address 1: http://0.0.0.0:8200
                     Cgo: disabled
               Log Level: debug
                 Version: Vault v1.2.1

2019-12-03T20:43:11.307Z [INFO]  sink.file: creating file sink
2019-12-03T20:43:11.308Z [INFO]  sink.file: file sink configured: path=/tmp/file
2019-12-03T20:43:11.321Z [DEBUG] cache: auto-auth token is allowed to be used; configuring inmem sink
2019-12-03T20:43:11.323Z [INFO]  auth.handler: starting auth handler
2019-12-03T20:43:11.324Z [INFO]  auth.handler: authenticating
2019-12-03T20:43:11.324Z [INFO]  sink.server: starting sink server
2019-12-03T20:43:11.983Z [INFO]  auth.handler: authentication successful, sending token to sinks
2019-12-03T20:43:11.983Z [INFO]  auth.handler: starting renewal process
2019-12-03T20:43:11.983Z [INFO]  sink.file: token written: path=/tmp/file
2019-12-03T20:43:11.983Z [DEBUG] cache.leasecache: storing auto-auth token into the cache
2019-12-03T20:43:18.213Z [INFO]  auth.handler: renewed auth token
………..
2019-12-03T20:44:34.480Z [INFO]  cache: received request: method=GET path=/v1/auth/token/lookup-self
2019-12-03T20:44:34.480Z [DEBUG] cache: using auto auth token: method=GET path=/v1/auth/token/lookup-self
2019-12-03T20:44:34.480Z [DEBUG] cache.leasecache: forwarding request: method=GET path=/v1/auth/token/lookup-self
2019-12-03T20:44:40.779Z [INFO]  cache.apiproxy: forwarding request: method=GET path=/v1/auth/token/lookup-self
2019-12-03T20:44:40.782Z [DEBUG] cache.leasecache: pass-through response; secret without lease and token: method=GET path=/v1/auth/token/lookup-self
2019-12-03T20:44:40.782Z [INFO]  cache: stripping auto-auth token from the response: method=GET path=/v1/auth/token/lookup-self

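As you can see, the major delay is inside the agent, between the log entries cache.leasecache: forwarding request (20:44:34.480) and cache.apiproxy: forwarding request (20:44:40.779), a gap of about 6.3 seconds. The round trip to the Vault server itself then takes about 3 ms.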

Expected behavior
Reasonable response time on calls made against the agent.

Environment:

  • Vault Server Version (retrieve with vault status): v1.2.0
  • Vault CLI Version (retrieve with vault version): v1.2.1
  • Server Operating System/Architecture: Linux
  • VAULT_ADDR environment variable set to https://ourvaultplaceholder.com

Vault agent configuration file:

auto_auth {
  method {
    type = "aws"
    config = {
      type = "iam"
      role = "XXXXX"
      header_value = "https://ourvaultplaceholder.com"
    }
  }

  sink {
    type = "file"
    config = {
      path = "/tmp/file"
    }
  }
}

cache {
  use_auto_auth_token = true
}

listener "tcp" {
  address = "0.0.0.0:8200"
  tls_disable = true
}

Additional context
A major reason we are using the vault agent auto auth is to have a shared caching agent that will handle the heavy lifting in terms of auth for all the clients that are on the same machine as the agent, instead of going against the vault servers directly.

catsby added the agent and bug labels Dec 3, 2019

tiger-seo commented

I'd like to add that I observe the same issue, with a 3-second delay in Vault Agent, but only when running inside a Docker container. Running Vault Agent outside a Docker container shows no delay.

calvn self-assigned this Oct 12, 2020

cpl commented Oct 15, 2020

Edit: After inspecting all network traffic, I found the following issue:
#3709 (comment)
The client was trying to resolve SRV DNS records because the port (:443) was not specified in the address.

Hi. When I use the github.com/hashicorp/vault/api Go package to perform Sys().Capabilities() calls, I get a 20s delay (every time, exactly 20s) between the function call and the request going out to Vault. This happens while running inside Docker, and only when reaching Vault over https (through a TLS proxy), not when reaching it over http directly.

Using a custom HTTP Client does not solve the problem.
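
Based on the edit note at the top of this comment, one possible workaround is to make sure the Vault address always carries an explicit port before it is handed to the client. The Go sketch below is an illustration only: withExplicitPort is a hypothetical helper, the default ports for http and https are assumed, and whether an explicit port avoids the SRV lookup in a given Vault version rests on the linked comment rather than on anything verified here.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// withExplicitPort (hypothetical helper) returns addr unchanged if it already
// names a port, and otherwise appends the default port for its scheme.
func withExplicitPort(addr string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Port() != "" {
		return addr, nil
	}
	var port string
	switch u.Scheme {
	case "https":
		port = "443"
	case "http":
		port = "80"
	default:
		return "", fmt.Errorf("no default port known for scheme %q", u.Scheme)
	}
	u.Host = net.JoinHostPort(u.Hostname(), port)
	return u.String(), nil
}

func main() {
	// Placeholder address taken from the environment section of this issue.
	addr, err := withExplicitPort("https://ourvaultplaceholder.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // prints https://ourvaultplaceholder.com:443
}
```

The result could then be exported as VAULT_ADDR or assigned to the Address field of the api package's Config before creating a client.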
