DNS resolution issues when connected to a VPN #3536

Open
devth opened this Issue Oct 17, 2015 · 30 comments

devth commented Oct 17, 2015

Using 0.6.3:

± tf --version
Terraform v0.6.3

± TF=INFO tf plan
Refreshing Terraform state prior to plan...

openstack_lb_pool_v1.clusters_preprod_pool: Refreshing state... (ID: a7b1aac5-7e24-4010-be81-c9f278729468)
openstack_lb_vip_v1.clusters-preprod: Refreshing state... (ID: f46a09a0-4666-4f98-bd30-2b8ce871f411)

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

I upgraded to 0.6.4, ran with the same OpenStack env vars, and got:

± tf --version
Terraform v0.6.4

± tf plan
Refreshing Terraform state prior to plan...

Error refreshing state: 1 error(s) occurred:

* Post https://os-identity.vip.foo.com:5443/v2.0/tokens: dial tcp: lookup os-identity.vip.foo.com: no such host

Contributor

jtopjian commented Nov 1, 2015

Is https://os-identity.vip.foo.com a resolvable domain? Additionally, is Keystone running on port 5443? It usually runs on port 5000.

edit: err... I see. You simply swapped out versions of Terraform. Can you still confirm that the Keystone URL is resolvable? Is the domain an entry in /etc/hosts (or equivalent) and not necessarily a real domain name?

If Terraform isn't resolving it, that might be a problem with Terraform core and not specifically OpenStack.

Let me know 😄

Contributor

jtopjian commented Nov 1, 2015

I was also reading your description of the issue in #3345. Can you verify the problem either exists or doesn't exist with the latest, unmodified 0.6.6 binaries?

bluk commented Jan 30, 2016

I ran into this issue on a private OpenStack instance. I'm logging in over a VPN on OS X, and while I can hit the various endpoints in the browser (and resolve them correctly via ping and other utilities), Terraform does not seem to resolve the IP addresses of the various endpoints correctly (I think the DNS lookups fail, but it's hard to tell).

If I hardcode the IP addresses of the various OpenStack domain names in my /etc/hosts, I can get it to work. Notably, Packer does not have this issue when spinning up an instance and building an image. This is on Terraform 0.6.8 through 0.6.10.
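
Since it is hard to tell from Terraform's output alone whether DNS is the culprit, a small Go program can compare the two lookup paths directly. This is only a minimal sketch, not anything Terraform ships: the `lookup` and `verdict` helpers and the default host name are illustrative, and it assumes Go 1.9 or later for `net.Resolver.PreferGo`. In a binary built without cgo, both lookups may go through the same pure Go resolver, so building with CGO_ENABLED=1 is probably needed for a meaningful comparison.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// lookup resolves host with the given resolver, giving up after five seconds.
func lookup(r *net.Resolver, host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return r.LookupHost(ctx, host)
}

// verdict summarizes the two lookup outcomes in one line.
func verdict(goOK, defaultOK bool) string {
	switch {
	case goOK && defaultOK:
		return "both resolvers found the host"
	case !goOK && defaultOK:
		return "only the default resolver found the host"
	case goOK && !defaultOK:
		return "only the pure Go resolver found the host"
	default:
		return "neither resolver found the host"
	}
}

func main() {
	host := "os-identity.vip.foo.com" // placeholder name taken from the original report
	if len(os.Args) > 1 {
		host = os.Args[1]
	}

	// PreferGo asks for Go's built-in resolver, which reads /etc/resolv.conf.
	goAddrs, goErr := lookup(&net.Resolver{PreferGo: true}, host)
	fmt.Println("pure Go resolver:", goAddrs, goErr)

	// The default resolver may hand the query to the OS when cgo is available.
	defAddrs, defErr := lookup(net.DefaultResolver, host)
	fmt.Println("default resolver:", defAddrs, defErr)

	fmt.Println(verdict(goErr == nil, defErr == nil))
}
```

Running it as `go run main.go <hostname>` while connected to the VPN should show whether only the pure Go path fails.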

Contributor

jtopjian commented Jan 30, 2016

@bluk Thank you for the info!

To confirm: When you are logged in over a VPN, you are then using a VPN-specific DNS resolver in order to resolve hosts/domains that are only accessible over the VPN?

bluk commented Jan 30, 2016

@jtopjian Yes, it's a VPN specific DNS resolver.

Contributor

jtopjian commented Jan 30, 2016

@bluk OK, thanks. Does the VPN software update your /etc/resolv.conf file so that all DNS requests now go through your VPN? Or are lookups done by some other means?

btyler97 commented Mar 29, 2016

I noticed there hasn't been activity on this issue in a while, but I am experiencing the same problem. @jtopjian, I can confirm that the VPN software does NOT update /etc/resolv.conf (at least not in my case). The VPN software I am using is SonicWall Mobile Connect, and I'm on OS X El Capitan. I understand that the old NetExtender software does update /etc/resolv.conf; however, it has issues on El Capitan, so we're stuck with the Mobile Connect client.

Contributor

jtopjian commented Mar 29, 2016

@btyler97 Thanks for the information. To confirm: this is only happening when you're connected to the VPN? Are you able to use the OpenStack command line tools while you're connected to the VPN?

pryorda commented Mar 29, 2016

@jtopjian Here is a link on how DNS works with Mobile Connect. It might help with diagnosing the issue. https://support.software.dell.com/kb/sw11559

Contributor

jtopjian commented Mar 29, 2016

@pryorda Thanks!

At this point, I'm trying to make a confident determination that the issue everyone is seeing is only happening when they are connected to a VPN. If so, then I believe this issue isn't local to just the OpenStack provider, but possibly Terraform core and/or Golang.

I think the main reason this problem is manifesting within the OpenStack provider is that it's one of the few providers within Terraform that communicates with a non-public cloud provider. DNS resolution behavior might differ depending on how the DNS infrastructure that contains the OpenStack endpoint records is configured along with the VPN. The link @pryorda gave seems to support that theory.

btyler97 commented Mar 29, 2016

@jtopjian I can confirm that this is only an issue when connected via the VPN. After some late night research I'm also of the belief that the issue isn't local to the OpenStack provider. The Golang docs on the "net" package hint at a possible cause (https://golang.org/pkg/net/) under the "Name Resolution" heading. I tried setting the environment variables they suggest, but I'm probably doing something wrong, as I didn't notice any change. Unfortunately, I just don't have enough familiarity with Go to know whether they aren't applicable in this situation or whether I'm missing something.
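
For reference, the net package documentation describes the variable as GODEBUG=netdns=go or GODEBUG=netdns=cgo, optionally combined with a debug level, as in netdns=cgo+1. The sketch below shows how such a value breaks down into a resolver mode and a debug level. The helper names `netdnsValue` and `parseNetdns` are made up for illustration, and this is not the standard library's own parser. One possible reason for seeing no change is that a binary built with cgo disabled has no cgo resolver to switch to, though that is an assumption about how the binary in question was built.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// netdnsValue extracts the netdns setting from a GODEBUG string such as
// "gctrace=1,netdns=cgo+1". It returns "" when netdns is not set.
func netdnsValue(godebug string) string {
	for _, kv := range strings.Split(godebug, ",") {
		if strings.HasPrefix(kv, "netdns=") {
			return strings.TrimPrefix(kv, "netdns=")
		}
	}
	return ""
}

// parseNetdns splits a netdns value into a resolver mode ("go", "cgo", or "")
// and a debug level (0 when absent).
func parseNetdns(v string) (mode string, level int) {
	for _, part := range strings.Split(v, "+") {
		if n, err := strconv.Atoi(part); err == nil {
			level = n
		} else {
			mode = part
		}
	}
	return mode, level
}

func main() {
	mode, level := parseNetdns(netdnsValue(os.Getenv("GODEBUG")))
	fmt.Printf("requested resolver: %q, debug level: %d\n", mode, level)
}
```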

pryorda commented Mar 30, 2016

@jtopjian Here is what we found... The issue is that on Mac OS X, Go's native net DNS resolver goes directly to resolv.conf, and our VPN client does not update resolv.conf since it split-tunnels the queries based on DNS suffix. We fixed the issue by building with this command:

export CGO_ENABLED=1; XC_OS="darwin" XC_ARCH="amd64" make bin

A packet capture confirmed that DNS traffic was traversing the VPN rather than going directly to the servers in resolv.conf.

pryorda added a commit to pryorda/terraform that referenced this issue Mar 30, 2016

pryorda added a commit to pryorda/terraform that referenced this issue Mar 30, 2016

Contributor

jtopjian commented Mar 30, 2016

@pryorda @btyler97 Nice! Thank you for the investigation.

I'm going to label this as a Core bug to get some other eyes on it.

jtopjian changed the title from "Openstack identity broken for me in 0.6.4" to "DNS resolution issues when connected to a VPN" Mar 31, 2016

pryorda added a commit to pryorda/terraform that referenced this issue Mar 31, 2016

pryorda added a commit to pryorda/terraform that referenced this issue Jul 21, 2016

bacoboy commented Nov 21, 2016

It seems like you need this upstream change to Go's networking code for this to work as expected:
golang/go#12524

pryorda commented Dec 6, 2016

We tried that, and it doesn't work well with split-horizon DNS.

Contributor

apparentlymart commented Jun 2, 2017

I had mentioned this in passing in #14781, but want to put it here too for posterity:

Currently we use Go's native cross-compilation support to build the release binaries for all supported platforms, but that approach doesn't give us the OS-specific libraries and headers needed to use CGo on OS X, and thus we aren't able to use the libc resolver. In future we may be able to use xgo to work around this, but we won't have time to do this in the immediate term, unfortunately.

richid commented Jul 27, 2017

Just going to throw my $0.02 in here in case it helps someone else.

I currently have a Vault installation sitting in AWS in a VPC using a private Route53 Hosted Zone. This means that the zone is not publicly distributed and can only be accessed within the VPC with which it is associated. To access resources in this VPC I have EC2 instances in the VPC that are used as VPN connectors. I'm running OS X, and the VPN software does not update /etc/hosts but rather the OS-level DNS hooks, which can be inspected via scutil --dns.

When configuring Terraform's Vault provider I get the dial tcp: lookup vault.internal.company.com on 192.168.130.1:53: no such host error. The quick way around this for me was to run route get vault.internal.company.com (again, on OS X) and put that IP into my /etc/hosts file. I may be way off but it seems like if we just let the OS do the resolution (rather than do it explicitly) it should work. But I'm sure it's not that simple.
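
The /etc/hosts trick has an in-process analogue for anyone writing their own Go client against such an endpoint: pin the host to a known IP inside a custom dialer so that DNS is never consulted for it. The following is a minimal sketch under stated assumptions, not something Terraform or its providers expose. `pinAddr` and `newPinnedClient` are hypothetical helpers, and the IP address and port are invented for illustration. Because only the dial address is rewritten, TLS should still verify against the original host name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// pinAddr rewrites "host:port" to "ip:port" when host has a pinned IP.
// Any other address is returned unchanged.
func pinAddr(addr string, pins map[string]string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return addr
	}
	if ip, ok := pins[host]; ok {
		return net.JoinHostPort(ip, port)
	}
	return addr
}

// newPinnedClient returns an HTTP client whose connections to pinned hosts
// skip DNS resolution, much like an /etc/hosts entry would.
func newPinnedClient(pins map[string]string) *http.Client {
	dialer := &net.Dialer{Timeout: 10 * time.Second}
	transport := &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return dialer.DialContext(ctx, network, pinAddr(addr, pins))
		},
	}
	return &http.Client{Transport: transport}
}

func main() {
	// Hypothetical pin: the IP would come from something like "route get".
	pins := map[string]string{"vault.internal.company.com": "10.0.0.12"}
	client := newPinnedClient(pins)
	_ = client // use client.Get(...) against the pinned host as usual

	fmt.Println(pinAddr("vault.internal.company.com:8200", pins))
}
```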

Contributor

jason-riddle commented Sep 13, 2017

Not sure what happened, but it looks like this was resolved? I can't replicate this anymore.

Contributor

apparentlymart commented Sep 13, 2017

Nothing specific has changed within Terraform itself to support this, but we did switch to Go 1.9 for the latest two releases, so possibly there is some new behavior in Go 1.9 that is making this smoother.

I didn't see anything in the release notes specifically about this, but there were some DNS-related changes in the 1.9 timeframe that may have changed the situation here. Versions 0.10.3 and 0.10.4 were built with Go 1.9, while 0.10.2 was built with 1.8. If someone has the time to compare the behavior on 0.10.2 vs. 0.10.4, that could help confirm whether this got resolved by changes in Go 1.9.

qwylz commented Nov 22, 2017

I can confirm that this is still a major issue with Terraform 0.10.8. See man 5 resolver on macOS for complete background and this note from /etc/resolv.conf:

# Mac OS X Notice
#
# This file is not used by the host name and address resolution
# or the DNS query routing mechanisms used by most processes on
# this Mac OS X system.

Terraform really must be built so that it will use macOS's native resolver, as /etc/resolv.conf is not sufficient and is documented by Apple as not the supported method for doing DNS resolution. Yes, macOS is UNIXish, but it definitely has its own ways of doing various things that are not UNIXish.

This is a major problem in our environment, as access to our cloud provider is not allowed via their public Internet addresses. Instead, DNS queries for their management systems are answered by non-public DNS servers that hand out different, internal addresses that take our traffic over a private connection with the cloud provider. DNS queries for these domains will only get sent to the correct DNS servers when the macOS-native resolver is used. The DNS servers in /etc/resolv.conf are just plain-jane DNS servers that know nothing of the special addresses. As a result, Terraform on macOS is completely unusable for us.

Please enable the cgo netdns support so that the macOS-native resolver will be used.
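
To see what a resolver that only reads /etc/resolv.conf has to work with, the following minimal sketch lists the nameserver entries in that file. It is a simplified, hypothetical parser (the `nameservers` helper is not Go's own implementation) and it ignores options such as search and ndots. On macOS, comparing its output with scutil --dns should show the per-domain resolvers that it never sees.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// nameservers returns the addresses found on "nameserver" lines of
// resolv.conf-style text, skipping blank lines and comments.
func nameservers(conf string) []string {
	var servers []string
	scanner := bufio.NewScanner(strings.NewReader(conf))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || line[0] == '#' || line[0] == ';' {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Println("cannot read /etc/resolv.conf:", err)
		return
	}
	for _, s := range nameservers(string(data)) {
		fmt.Println("nameserver:", s)
	}
}
```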

ramarnat commented Dec 10, 2017

I have been able to work around this issue by rebuilding the aws and nomad providers (my use case requires them) as described in terraform-providers/terraform-provider-aws#1392

srikiraju commented Jan 24, 2018

This is a huge issue for us. We use OpenDNS, which rewrites /etc/resolv.conf to point to localhost for the Umbrella client, and this breaks Terraform. The workarounds are all painful to work with.

pryorda commented Feb 16, 2018

Noticing this when DNS resolves now. Do the providers get built with the same options as the terraform bin?

zoltan-toth-mw commented Apr 4, 2018

This is still a bug in Terraform v0.11.3 with the consul and rabbitmq providers.

* consul_keys.press_release_crawler_properties: 1 error(s) occurred:

* consul_keys.press_release_crawler_properties: consul_keys.press_release_crawler_properties: Failed to read Consul key 'config/application/data': Get http://consul..../v1/kv/config/application/data?dc=fhaid-dc: dial tcp: lookup consul....internal on 10.84.1.41:53: no such host

Meanwhile curl works fine:

 curl http://consul..../v1/kv/config/application/data?dc=fhaid-dc
[{"LockIndex":0,"Key":"config/application/data","Flags":0,"Value":"....

bitglue commented May 2, 2018

This will continue to be a problem for any Terraform binary (in fact, any Go program) which does not include cgo. Usually running with GODEBUG=netdns=9 in the environment will output something like:

go package net: built with netgo build tag; using Go's DNS resolver

This doesn't seem to work with Terraform, perhaps because it's the provider binaries doing the name resolution, and GODEBUG is not passed through to them?

Another way is to check with what libraries the binaries are linked. For example:

$ otool -L .terraform/plugins/darwin_amd64/terraform-provider-aws_v1.13.0_x4 
.terraform/plugins/darwin_amd64/terraform-provider-aws_v1.13.0_x4:

Nothing is listed, meaning this binary isn't linked with anything. In particular, it's not linked with libc, where the resolver is implemented, so it cannot use the macOS resolver.
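
A complementary check to otool, for binaries built with a recent Go toolchain, is to read the build settings embedded in the binary. The sketch below assumes Go 1.18 or later on both sides (the `debug/buildinfo` package and the recorded CGO_ENABLED setting do not exist before that), so it would likely report nothing useful for the provider binary shown above. `cgoSetting` and `describe` are illustrative helpers. Note that cgo being enabled does not by itself prove the cgo resolver is used, since build tags such as netgo can still select the pure Go one.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"runtime/debug"
)

// cgoSetting returns the CGO_ENABLED value recorded in the build info, if any.
func cgoSetting(info *debug.BuildInfo) (string, bool) {
	if info == nil {
		return "", false
	}
	for _, s := range info.Settings {
		if s.Key == "CGO_ENABLED" {
			return s.Value, true
		}
	}
	return "", false
}

// describe turns the recorded setting into a human-readable line.
func describe(value string, found bool) string {
	switch {
	case !found:
		return "no CGO_ENABLED setting recorded"
	case value == "1":
		return "built with cgo enabled"
	default:
		return "built with cgo disabled"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: cgocheck <path-to-go-binary>")
		return
	}
	info, err := buildinfo.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("cannot read build info:", err)
		return
	}
	fmt.Println(info.GoVersion+":", describe(cgoSetting(info)))
}
```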

Building Terraform from source will "fix" it. Though since most people install binary releases, ideally the release process would produce a cgo resolver. I haven't tried it, but apparently it's possible to cross-compile while using a cgo enabled net module without much difficulty.

Note this issue isn't limited to macOS, either. The net package will fall back to the cgo resolver under a number of conditions on non-macOS platforms where it can detect the native go resolver's behavior isn't compatible with expected semantics.

Contributor

apparentlymart commented May 4, 2018

Thanks for sharing that link to "gonative", @bitglue!

If I'm understanding correctly, that works because the binary distributions of Go for other platforms ship precompiled package library files (.a files) that already include the C library bindings, so they can be linked into the final executable without requiring access to the target system's C library headers, toolchain, etc.

If so, that seems like a nice way to get around the requirement of having the OS X SDK available at build time. The Terraform Core team at HashiCorp is currently focused on the configuration language improvements for the next major release, but I'll make a note to investigate this further and see what it'd take to weave this into our build process for a later release.

nikitashalnov commented Oct 12, 2018

Hi.
I ran into the same problem, investigated it, and want to share the results with anyone looking for an answer.
@bitglue is right that this is not actually a Terraform problem. The main issue is in Go itself.

Sometimes I get this error:

13:22 [master] n.shalnov:~/cloudflare/tf-fff.ru$ terraform plan

Error: Error loading state: Failed to open state file at gs://terraform-cf/cloudflare/semrush.ru/default.tfstate: Get https://storage.googleapis.com/terraform-cf/cloudflare/semrush.ru/default.tfstate: dial tcp: lookup storage.googleapis.com on 192.168.1.1:53: read udp 192.168.3.66:33635->192.168.1.1:53: i/o timeout

192.168.1.1 is my office DNS server (actually a MikroTik router). Flushing the DNS cache helps resolve this problem, but if someone in the office makes a request to the root name servers (e.g. dig something.com +trace), the MikroTik will save that answer and then answer with all the root servers and so on for every DNS request. You can see this in a tcpdump capture on port 53.

Moreover, it answers over UDP with packets longer than 520 bytes. That is not RFC compliant: the MikroTik must answer over TCP if a response is too large. As a result, the Go library used to resolve names cannot handle this response correctly.
Switching to the cgo resolver lets Go handle such responses.
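
For context, RFC 1035 limits classic DNS-over-UDP messages to 512 bytes and defines a TC (truncation) flag in the header that tells the client to retry over TCP. The sketch below shows how both conditions can be checked on a raw response captured with tcpdump. It is only an illustration with made-up helper names (`isTruncated`, `oversized`) and a hand-built header, not the code Go's resolver uses, and it ignores EDNS0, which can raise the UDP limit.

```go
package main

import "fmt"

// classicUDPLimit is the RFC 1035 size limit for DNS messages over UDP.
const classicUDPLimit = 512

// isTruncated reports whether the TC flag is set in a raw DNS message.
// The flag is bit 0x02 of the third header byte; a header is 12 bytes long.
func isTruncated(msg []byte) bool {
	if len(msg) < 12 {
		return false
	}
	return msg[2]&0x02 != 0
}

// oversized reports whether a UDP response of n bytes exceeds the classic limit.
func oversized(n int) bool {
	return n > classicUDPLimit
}

func main() {
	// Hand-built header: ID 0x1234, flags 0x8280 (response, TC set, RA set),
	// one question, no other records.
	header := []byte{0x12, 0x34, 0x82, 0x80, 0, 1, 0, 0, 0, 0, 0, 0}
	fmt.Println("truncated flag set:", isTruncated(header))
	fmt.Println("520-byte UDP reply oversized:", oversized(520))
}
```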

So, if you're facing the same issue with terraform, you can:

  • change your DNS server in /etc/resolv.conf
  • flush DNS cache on your DNS server
  • compile terraform with cgo (?)

For more info see:
golang/go#21160
https://golang.org/pkg/net/

bitglue commented Oct 12, 2018

Related golang issues: golang/go#12524 golang/go#12524

bacoboy commented Oct 12, 2018

I think we all agree that this is a Go problem, but since HashiCorp is giving us compiled binaries to use on a Mac, it seems that they should be compiling them in a way that works when people are VPNed into their (likely corporate) environments. The days of being directly attached to your production environments are going bye-bye as we all shift to cloud providers (and hence use more complicated DNS resolver chains).

5 more days to the 3rd anniversary of this ticket -- @mitchellh can we get some love on this already? You guys are enterprise-ready post 1.0 now ;) Let's put the last nail in this coffin already...

docwhat commented Mar 22, 2019

It looks like golang/go#12524 is moving again... so maybe there is hope?
