Vault client slow with CNAME #3709

Closed
adamnoll opened this issue Dec 18, 2017 · 7 comments

adamnoll commented Dec 18, 2017

Environment:

  • Vault Version: v0.9.0
  • NOTE: Vault is running under the official docker container
  • Operating System/Architecture: Kubernetes 1.8.5

Vault Config File:
backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  token = "xxxx"
  disable_registration = "true"
}
default_lease_ttl = "168h"
max_lease_ttl = "720h"
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/vault/ssl/vault.pem"
  tls_key_file = "/vault/ssl/vault.key"
  tls_client_ca_file = "/vault/ssl/ca.pem"
}
listener "tcp" {
  address = "127.0.0.1:9000"
  tls_disable = 1
}

Startup Log Output:
==> Vault server started! Log data will stream in below:

2017/12/13 16:03:29.676774 [INFO ] core: vault is unsealed
2017/12/13 16:03:29.676828 [INFO ] core: entering standby mode
2017/12/13 16:03:30.334405 [INFO ] core: acquired lock, enabling active operation
2017/12/13 16:03:30.375861 [INFO ] core: post-unseal setup starting
2017/12/13 16:03:30.378161 [INFO ] core: loaded wrapping token key
2017/12/13 16:03:30.378177 [INFO ] core: successfully setup plugin catalog: plugin-directory=
2017/12/13 16:03:30.395767 [INFO ] core: successfully mounted backend: type=kv path=secret/
2017/12/13 16:03:30.395885 [INFO ] core: successfully mounted backend: type=system path=sys/
2017/12/13 16:03:30.395956 [INFO ] core: successfully mounted backend: type=pki path=pki/primary.prod/auth/
2017/12/13 16:03:30.395983 [INFO ] core: successfully mounted backend: type=aws path=prod.aws/
2017/12/13 16:03:30.396039 [INFO ] core: successfully mounted backend: type=pki path=pki/dev.prod/auth/
2017/12/13 16:03:30.396094 [INFO ] core: successfully mounted backend: type=pki path=pki/primary.prod/namespaces/support-case-triage/tiller/
2017/12/13 16:03:30.396114 [INFO ] core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2017/12/13 16:03:30.396284 [INFO ] core: successfully mounted backend: type=identity path=identity/
2017/12/13 16:03:30.416959 [INFO ] expiration: restoring leases
2017/12/13 16:03:30.417026 [INFO ] rollback: starting rollback manager
2017/12/13 16:03:30.423445 [INFO ] identity: entities restored
2017/12/13 16:03:30.425594 [INFO ] identity: groups restored
2017/12/13 16:03:30.427821 [INFO ] core: post-unseal setup complete
2017/12/13 16:03:30.427843 [INFO ] core/startClusterListener: starting listener: listener_address=0.0.0.0:8201
2017/12/13 16:03:30.427918 [INFO ] core/startClusterListener: serving cluster requests: cluster_listen_address=[::]:8201
2017/12/13 16:03:30.427941 [INFO ] core/startClusterListener: starting listener: listener_address=127.0.0.1:9001
2017/12/13 16:03:30.427979 [INFO ] core/startClusterListener: serving cluster requests: cluster_listen_address=127.0.0.1:9001
2017/12/13 16:03:30.462812 [INFO ] expiration: lease restore complete

Expected Behavior:
The vault client cli should be responsive whether or not the server address is a CNAME.

Actual Behavior:
When using the vault client cli with VAULT_ADDR containing a CNAME, any command takes roughly a minute to complete. The command will eventually succeed. Even vault status takes a significant amount of time to return. When calling the vault api with curl, the command returns almost instantly. Running vault under strace shows that the client is blocking a number of times. Pointing VAULT_ADDR to the target of the CNAME and setting VAULT_TLS_SERVER_NAME is an effective workaround.

Steps to Reproduce:
export VAULT_ADDR=https://xxx.yyy.zzz
where xxx.yyy.zzz is a CNAME to internal-12345.us-east-1.elb.amazonaws.com.
Run vault status. Observe that it takes from 20 seconds to several minutes for the command to eventually succeed. Now set:
export VAULT_ADDR=https://internal-12345.us-east-1.elb.amazonaws.com
export VAULT_TLS_SERVER_NAME=xxx.yyy.zzz
The vault command succeeds immediately.
Run curl https://xxx.yyy.zzz/v1/sys/seal-status. Observe that the server provides an immediate response.
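
For readers who want to see what the workaround amounts to at the TLS level, here is a minimal Go sketch using only the standard library. It is not the Vault client's code. It assumes that VAULT_TLS_SERVER_NAME corresponds roughly to setting the server name used for SNI and certificate verification, while the connection itself goes to the ELB hostname. The helper names tlsConfigFor and newClient are hypothetical; the hostnames are the placeholders from the report above.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// tlsConfigFor returns a TLS config that verifies the server certificate
// against serverName, regardless of which hostname is dialed.
// (Hypothetical helper, for illustration only.)
func tlsConfigFor(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}
}

// newClient builds an HTTP client that uses the TLS config above.
func newClient(serverName string) *http.Client {
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: tlsConfigFor(serverName)},
	}
}

func main() {
	// Dial the CNAME target directly, but validate the certificate for the alias.
	client := newClient("xxx.yyy.zzz")
	resp, err := client.Get("https://internal-12345.us-east-1.elb.amazonaws.com/v1/sys/seal-status")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```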

@jefferai
Member

If the Vault CLI is blocking, that almost certainly means it's waiting on DNS resolution. Unfortunately that's happening at the Go level so it's not something we can easily manage.

If you can build Vault on your own you may want to try playing with the netdns setting -- see https://golang.org/pkg/net/#hdr-Name_Resolution -- and try the cgo option, which uses system DNS resolution instead of Go's built-in resolution. If that solves the problem, you're likely hitting some pathological case in Go's DNS library and we need to get an issue filed there.

@jefferai jefferai added this to the 0.9.2 milestone Dec 18, 2017
@jefferai jefferai modified the milestones: 0.9.2, 0.9.3 Jan 17, 2018
@jefferai jefferai modified the milestones: 0.9.3, 0.9.4 Jan 28, 2018
@jefferai jefferai modified the milestones: 0.9.4, 0.10 Feb 14, 2018
@jbrwon2006

I have the same issue with CentOS 7 running on Fusion. No issue on the host OS.
From a packet capture, it appears to time out 4 times (5s each) on an SRV request for _http._tcp.{CNAME}.localdomain
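
Four timeouts of 5 seconds each add up to about 20 seconds, which matches the lower end of the delays in the original report. A minimal Go sketch for timing such an SRV query in isolation follows. It assumes the lookup is for service "http" over "tcp", as the captured query name suggests. The host vault.example.com and the helper names srvName and timeSRV are hypothetical. The ".localdomain" suffix in the capture presumably comes from the resolver's search list, which this sketch does not model.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// srvName returns the DNS name that an SRV lookup for service/proto at host
// queries, before any search-domain suffix is appended by the resolver.
func srvName(service, proto, host string) string {
	return "_" + service + "._" + proto + "." + host
}

// timeSRV performs one SRV lookup, bounded by limit, and reports how long it took.
func timeSRV(host string, limit time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	start := time.Now()
	_, _, err := net.DefaultResolver.LookupSRV(ctx, "http", "tcp", host)
	return time.Since(start), err
}

func main() {
	host := "vault.example.com" // hypothetical placeholder
	fmt.Println("querying", srvName("http", "tcp", host))
	elapsed, err := timeSRV(host, 5*time.Second)
	// An error is expected when no SRV record exists; the elapsed time is the interesting part.
	fmt.Printf("elapsed=%v err=%v\n", elapsed.Round(time.Millisecond), err)
}
```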

@jefferai jefferai modified the milestones: 0.10, 0.10.1 Apr 10, 2018
@jefferai
Member

Closing due to lack of feedback. If this is still an issue with 0.10, please write back (new version of Go, and it may have changed/fixed this). But the next steps after that would be to try a dynamic build using system DNS resolution to see if it helps.

@nickmaccarthy

I too am having an issue similar to this, but it seems to only affect Macs in our environment.

I am running Vault v0.10.1 ('756fdc4587350daf1c65b93647b2cc31a6f119cd') on my Mac, and 0.10.0 on the servers in AWS. Our Vault systems, like @adamnoll's, also sit behind an ELB, and we tied an A record (vault.company.com) to the ELB's CNAME. On a CentOS 7 VM on my Mac, which is NAT'd on my Mac's network, the vault client works fine. A co-worker's Windows 10 machine doesn't have this timeout issue either. It seems to be related only to the Mac version of the Vault binary.

@jefferai
Member

IIRC, on Mac, Go programs are always compiled dynamically, not statically (it also might be that they're only cross-compiled dynamically, I forget, but we do our releases via cross-compiling). If it's a dynamic binary Go defaults to using system DNS libraries for resolution as opposed to internal name lookup logic. Check out https://golang.org/pkg/net/#hdr-Name_Resolution but basically try setting export GODEBUG=netdns=go and see if this solves your issue.
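
To compare resolver behaviour without rebuilding Vault or changing GODEBUG, a small standalone Go program can time the same lookup with the process default resolver and with one that prefers Go's built-in DNS client (net.Resolver with PreferGo set). This is only a diagnostic sketch: the host vault.example.com and the names namedResolver, resolversToCompare and timeLookup are hypothetical, and which resolver "default" maps to depends on the platform and build.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// namedResolver pairs a label with a resolver configuration.
type namedResolver struct {
	name     string
	resolver *net.Resolver
}

// resolversToCompare returns the process default resolver and one that
// prefers Go's built-in DNS client over the system libraries.
func resolversToCompare() []namedResolver {
	return []namedResolver{
		{name: "default", resolver: net.DefaultResolver},
		{name: "pure-go", resolver: &net.Resolver{PreferGo: true}},
	}
}

// timeLookup resolves host with r, bounded by timeout, and reports the duration.
func timeLookup(r *net.Resolver, host string, timeout time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	start := time.Now()
	_, err := r.LookupHost(ctx, host)
	return time.Since(start), err
}

func main() {
	host := "vault.example.com" // hypothetical placeholder
	for _, nr := range resolversToCompare() {
		elapsed, err := timeLookup(nr.resolver, host, 5*time.Second)
		fmt.Printf("%-8s elapsed=%v err=%v\n", nr.name, elapsed.Round(time.Millisecond), err)
	}
}
```

A large gap between the two timings would point at the resolver choice rather than at Vault itself.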

@deverton
Contributor

My guess is the delay is caused by this line of code https://github.com/hashicorp/vault/blob/master/api/client.go#L560 which does a SRV lookup if the port is not specified.

@nickmaccarthy can you try specifying your Vault address including the port, e.g. :443, and see if it's quicker?
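
As an illustration of the condition described above, the following Go sketch checks whether an address carries an explicit port and, if not, appends the scheme's default. It is not the code from api/client.go. The helper names hasExplicitPort and withDefaultPort and the host vault.example.com are hypothetical, and the assumption is that an address with an explicit port skips the SRV lookup.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// hasExplicitPort reports whether addr (e.g. a VAULT_ADDR value) names a port.
func hasExplicitPort(addr string) (bool, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return false, err
	}
	return u.Port() != "", nil
}

// withDefaultPort returns addr unchanged if it already has a port; otherwise
// it appends 443 for https or 80 for http.
func withDefaultPort(addr string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Port() != "" {
		return u.String(), nil
	}
	switch u.Scheme {
	case "https":
		u.Host = net.JoinHostPort(u.Hostname(), "443")
	case "http":
		u.Host = net.JoinHostPort(u.Hostname(), "80")
	default:
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	return u.String(), nil
}

func main() {
	for _, addr := range []string{"https://vault.example.com", "https://vault.example.com:8200"} {
		fixed, err := withDefaultPort(addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", addr, fixed)
	}
}
```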

@nickmaccarthy

Hi @deverton , that worked! Specifying :443 at the end of VAULT_ADDR solved it. Thank you, I had worked around this by building a linux VM with vault. Thank you for finding that!

jsageryd added a commit to jsageryd/.config.d that referenced this issue Jan 16, 2019
The vault cli is terribly slow; 'vault status' takes around 30 seconds
to run. This seems to be related to how DNS resolving works in Go on
macOS. Compiling the vault binary from source instead of using the
binary provided by HashiCorp or the one from Homebrew in combination
with setting the Vault server port explicitly seems to solve it.

  brew uninstall vault
  go get github.com/hashicorp/vault
  export VAULT_ADDR=https://vault.b17g.services:443/

Details:

  deverton commented on 16 Jul 2018
    My guess is the delay is caused by this line of code
    https://github.com/hashicorp/vault/blob/838a449c9b48/api/client.go#L560
    which does a SRV lookup if the port is not specified.

    @nickmaccarthy can you try specifying your Vault address including the
    port, e.g. :443 and see if its quicker?

  nickmaccarthy commented on 16 Jul 2018
    Hi @deverton , that worked! Specifying :443 at the end of VAULT_ADDR
    solved it. Thank you, I had worked around this by building a linux VM
    with vault. Thank you for finding that!

hashicorp/vault#3709 (comment)
jsageryd added a commit to jsageryd/.config.d that referenced this issue Mar 10, 2019