Error when using terraform inside WSL2 #8022

Open
b1ackhawk-uh60 opened this issue Feb 4, 2022 · 51 comments
Comments

@b1ackhawk-uh60

b1ackhawk-uh60 commented Feb 4, 2022

Version

Microsoft Windows [Version 10.0.19044.1503]

WSL Version

  • WSL 2 (checked)
  • WSL 1

Kernel Version

Kernel version: 5.10.60.1

Distro Version

Ubuntu 20.04

Other Software

Terraform v1.1.5

Repro Steps

Run terraform refresh, or any command that performs a refresh (such as plan or apply), in WSL2.

Expected Behavior

For terraform refresh to complete as usual.
This was still working 3 days ago.

Actual Behavior

Refresh does not complete and an error message is presented:
│ Error: Unable to list provider registration status, it is possible that this is due to invalid credentials or the service principal does not have permission to use the Resource Manager API, Azure error: resources.ProvidersClient#List: Failure sending request: StatusCode=0 -- Original Error: Get "https://management.azure.com/subscriptions/{my-subscription-id}/providers?api-version=2016-02-01": dial tcp: lookup management.azure.com on 172.30.96.1:53: cannot unmarshal DNS message

│ with provider["registry.terraform.io/hashicorp/azurerm"],
│ on main.tf line 10, in provider "azurerm":
│ 10: provider "azurerm" {

This happens consistently, but only under the following conditions:
running in WSL2 on my primary ISP (Xfinity), connected over either Wi-Fi or Ethernet.

I've tried swapping out my router for a different make/model - the issue still persists
I've tested on another computer, also set up with WSL2 (but running Ubuntu 18.04) - the issue persists
I've tested using different DNS providers - the issue persists
Also note that other tools (like the Azure CLI) seem to work fine from WSL2, and DNS for management.azure.com resolves fine (nslookup provides the expected results)

Conditions where the issue does not persist and terraform operates normally:
If I simply convert WSL2 to WSL1 - no issue, terraform operates normally
If I run terraform from Windows (on the same machine) instead of WSL2 - no issue, terraform operates normally
If I connect my computer via WiFi to my phones wireless hotspot - no issue, terraform operates normally
If I connect to VPN in Windows - no issue, terraform operates normally

So it seems to be some combination of WSL2 and my ISP.

Diagnostic Logs

No response

@sirredbeard
Contributor

172.30.96.1 looks like an IP address assigned to the WSL2 instance, not a remote IP on Azure. I wonder why it is resolving management.azure.com to your local IP address.

Any change when you sudo rm /etc/resolv.conf, wsl.exe --shutdown, and restart WSL?

Or set generateResolvConf = false in the [network] section of /etc/wsl.conf, manually enter a non-ISP DNS server (e.g. 1.1.1.1 or 8.8.8.8) in /etc/resolv.conf, and restart as above?

Connecting to a VPN can cause issues. I wonder if connecting to and then disconnecting from the VPN left your DNS in a broken state.

@b1ackhawk-uh60
Author

b1ackhawk-uh60 commented Feb 4, 2022

@sirredbeard thanks for the response.
Edit:
Deleting resolv.conf and letting WSL recreate it did not change anything.
However, turning off resolv.conf generation and manually creating my own file does work as a workaround.

To clarify, this machine did not have any VPN configured previously. I only installed my preferred VPN software and configured as a troubleshooting step because of the issue I was having. Also, this issue started happening on two different machines on the same day. They both worked fine previously.

I believe 172.30.96.1 is the gateway that WSL2 is NAT'd behind.
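To check this on an affected machine, a minimal sketch can compare the first nameserver WSL2 wrote into /etc/resolv.conf with the VM's IPv4 default gateway; if they match, DNS queries go to the Windows host that WSL2 is NAT'd behind rather than to a remote server. It assumes iproute2's ip command is available, and first_nameserver, default_gateway and dns_is_gateway are hypothetical helper names, not WSL tools.

```bash
# Print the first "nameserver" entry from a resolv.conf-style file.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Print this machine's IPv4 default gateway (requires iproute2).
default_gateway() {
  ip -4 route show default | awk '{ print $3; exit }'
}

# Succeed if the generated resolver is the same host as the default gateway.
# Usage: dns_is_gateway [RESOLV_CONF] [GATEWAY]
dns_is_gateway() {
  local ns gw
  ns="$(first_nameserver "${1:-/etc/resolv.conf}")"
  gw="${2:-$(default_gateway)}"
  [ -n "$ns" ] && [ "$ns" = "$gw" ]
}
```

For example, run dns_is_gateway && echo "WSL2 DNS is the Windows host" in a WSL2 shell.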

Also note that other tools (like the Azure CLI) seem to work fine from WSL2, and DNS for management.azure.com resolves fine (nslookup provides the expected results).

@Otimun

Otimun commented Feb 7, 2022

I can confirm I have the exact same issue. As mentioned, you can work around the error by editing /etc/resolv.conf and setting the nameserver to 1.1.1.1 or 8.8.8.8 instead of the IP address of your machine (in my case 172.18.176.1). But something has changed over the last few days that has broken DNS resolution for terraform.

│ Error: Unable to list provider registration status, it is possible that this is due to invalid credentials or the service principal does not have permission to use the Resource Manager API, Azure error: resources.ProvidersClient#List: Failure sending request: StatusCode=0 -- Original Error: Get "https://management.azure.com/subscriptions/(my-subscription id)/providers?api-version=2016-02-01": dial tcp: lookup management.azure.com on 172.18.176.1:53: cannot unmarshal DNS message

│ with provider["registry.terraform.io/hashicorp/azurerm"],
│ on main.tf line 28, in provider "azurerm":
│ 28: provider "azurerm" {

A normal nslookup or dig still works when using 172.18.176.1 as a name server.

Ethernet adapter vEthernet (WSL):

Connection-specific DNS Suffix  . :
Link-local IPv6 Address . . . . . : fe80::c87d:1b12:8bd:318%81
IPv4 Address. . . . . . . . . . . : 172.18.176.1
Subnet Mask . . . . . . . . . . . : 255.255.240.0
Default Gateway . . . . . . . . . :

The IP mentioned is the address of the WSL adapter on the Windows side.

@pduchnovsky

Same problem here. Everything was fine yesterday; today I get the error "cannot unmarshal DNS message".
I don't see any Windows updates in the past 24 hours, which is weird.

@sebastiansterk

Having the exact same problem. This is a huge blocker for us.

@sebastiansterk

sebastiansterk commented Feb 8, 2022

I was able to fix it (at least a workaround):

1. Turn off generation of /etc/resolv.conf

Using your Linux prompt, open /etc/wsl.conf (as root, e.g. with sudo) and paste the following content:

[network]
generateResolvConf = false

2. Restart WSL

In Powershell run:

wsl --shutdown

3. Create a custom /etc/resolv.conf

Delete /etc/resolv.conf:

sudo rm -f /etc/resolv.conf

Create a new /etc/resolv.conf (again as root) with the following content:

nameserver 8.8.8.8

4. Restart WSL

In Powershell run:

wsl --shutdown

Open WSL --> issue is fixed (at least for me)
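The four steps above can be scripted roughly as follows. This is a sketch, not part of the original workaround: use_static_resolv_conf is a hypothetical name, it assumes /etc/wsl.conf does not already contain a [network] section (if it does, add the key to that section by hand), and it makes both file edits before a single restart, whereas the original steps restart WSL in between. The paths are parameters so it can be tried on scratch files first; against the real files it must run as root, and restarting WSL with wsl --shutdown from PowerShell is still a manual step.

```bash
# Steps 1 and 3 of the workaround: disable resolv.conf generation and
# replace /etc/resolv.conf with a static file.
# Usage: use_static_resolv_conf [NAMESERVER] [WSL_CONF] [RESOLV_CONF]
use_static_resolv_conf() {
  local nameserver="${1:-8.8.8.8}"
  local wsl_conf="${2:-/etc/wsl.conf}"
  local resolv_conf="${3:-/etc/resolv.conf}"

  # Step 1: append the [network] setting unless it is already present.
  if ! grep -qE '^[[:space:]]*generateResolvConf[[:space:]]*=[[:space:]]*false' "$wsl_conf" 2>/dev/null; then
    printf '[network]\ngenerateResolvConf = false\n' >> "$wsl_conf"
  fi

  # Step 3: /etc/resolv.conf is normally a symlink managed by WSL;
  # rm -f removes the link itself, then a regular file takes its place.
  rm -f "$resolv_conf"
  printf 'nameserver %s\n' "$nameserver" > "$resolv_conf"
}
```

One way to run it with root privileges from a normal shell is sudo bash -c "$(declare -f use_static_resolv_conf); use_static_resolv_conf 8.8.8.8".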

@melsigl

melsigl commented Feb 8, 2022

I had the same issue as of today, and I can confirm that the workaround proposed by @sebastiansterk did work splendidly.

@0Downtime

0Downtime commented Feb 8, 2022

I am also having this issue, starting midday yesterday while working on some terraform code. Does anyone have any idea what the root cause is? @sebastiansterk, your fix also worked for me, thanks!

I don't know if anyone else can confirm this, but my firewall's DNS is pointed to 1.1.1.1 with forced DoT. I'm not sure whether that is a contributing factor.

@cheeseburger12

Thank you @sebastiansterk. Your workaround worked for me; I have been searching all day.

@hyzza

hyzza commented Feb 9, 2022

This particular workaround poses a problem for those who need to use a VPN in Windows and resolve internal VPN addresses from WSL. dnsmasq could solve this by routing requests by domain inside WSL, but that is quite a heavyweight solution.
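For reference, the per-domain routing mentioned above corresponds to dnsmasq's server=/domain/ip option: queries for the listed domains go to the VPN resolver and everything else goes to a default upstream. The sketch below only prints such a config; dnsmasq_split_conf is a hypothetical helper, the resolver IPs and domains are placeholders, and installing dnsmasq in WSL, disabling generateResolvConf and pointing /etc/resolv.conf at 127.0.0.1 are left out.

```bash
# Print a minimal dnsmasq config for split DNS.
# Usage: dnsmasq_split_conf DEFAULT_UPSTREAM VPN_RESOLVER DOMAIN...
dnsmasq_split_conf() {
  local upstream="$1" vpn_resolver="$2"
  shift 2
  echo "no-resolv"                  # do not take upstreams from /etc/resolv.conf
  echo "listen-address=127.0.0.1"   # serve only local clients
  echo "server=${upstream}"         # default upstream for all other names
  local domain
  for domain in "$@"; do
    echo "server=/${domain}/${vpn_resolver}"   # names under this domain use the VPN DNS
  done
}
```

For example, dnsmasq_split_conf 1.1.1.1 10.0.0.53 corp.example.com | sudo tee /etc/dnsmasq.d/split-dns.conf, where the output path is also an assumption about the local dnsmasq setup.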

@hyzza

hyzza commented Feb 9, 2022

A colleague of mine has found a pretty elegant solution to this:

echo -e "nameserver IP.OF.DNS.SERVER\noptions timeout:1" | sudo tee -a /etc/resolv.conf

where IP.OF.DNS.SERVER is the IP of a DNS server that allows DNS resolution over TCP, for example 8.8.8.8.

Alternatively, add the following to /etc/wsl.conf:

[boot]
command = printf 'nameserver IP.OF.DNS.SERVER\noptions timeout:1\n' >> /etc/resolv.conf

This way, the worst case is a 1-second delay when DNS resolution over TCP via the primary (Windows) DNS server is not successful.
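A guarded version of the same idea, as a sketch: it appends the fallback nameserver and an options timeout:1 line only if they are not already present, so it can be re-run safely. Per resolv.conf(5), the timeout must be given on an options line; a bare timeout line is not part of the documented syntax. add_fallback_nameserver is a hypothetical helper, the server is an example, and the path is a parameter so it can be tried on a copy. With generateResolvConf left at its default, WSL rewrites the file on restart, so an append like this only lasts until then.

```bash
# Append a fallback nameserver and a 1-second timeout to a resolv.conf-style
# file, skipping anything already present so the function can be re-run.
# Usage: add_fallback_nameserver [SERVER] [RESOLV_CONF]
add_fallback_nameserver() {
  local server="${1:-8.8.8.8}" resolv_conf="${2:-/etc/resolv.conf}"
  # glibc only uses the first three "nameserver" lines (MAXNS), so order matters.
  if ! grep -qxF "nameserver ${server}" "$resolv_conf" 2>/dev/null; then
    printf 'nameserver %s\n' "$server" >> "$resolv_conf"
  fi
  # resolv.conf(5): the timeout belongs on an "options" line.
  if ! grep -qE '^options.*timeout:' "$resolv_conf" 2>/dev/null; then
    printf 'options timeout:1\n' >> "$resolv_conf"
  fi
}
```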

@KaremCBC

KaremCBC commented Feb 9, 2022

I've had the same problem since yesterday, but unfortunately the solution from @sebastiansterk didn't work for me on 3 separate WSL2 instances.
Please help!

Update:
An az logout / az login was needed for the solution to work! Thanks @sebastiansterk

@bernardmaltais

I also had the issue. Changing the DNS to 8.8.8.8 solved it. It was driving me nuts.

@mohamed-elbeltagy

I have the exact same issue, setting DNS to 8.8.8.8 fixed it.

@simonesavi

Same problem. Setting DNS to 8.8.8.8 fixed it, but I can confirm that DNS resolution over the VPN stops working.

@vladimir-shopov

Changing the DNS server to Google's is not a solution, but a workaround. There are times when you need to use a private DNS server.

This seems to be yet another side effect of #5806. Wondering when Microsoft will finally understand the huge impact this particular bug has on all WSL2 users and fix it.

@msbenz

msbenz commented Feb 9, 2022

Same problem

Super jank (and very temporary) workaround until there's a true fix: grab an IP for management.azure.com and add an entry to /etc/hosts (in my case, it's currently 40.71.13.226)

echo "$(dig management.azure.com | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}$" | head -n 1) management.azure.com" | sudo tee -a /etc/hosts

Maintains all other DNS configs etc and allows terraform to auth/deploy.
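A slightly more defensive sketch of the same /etc/hosts pin: dig can return a CNAME as well as several A records, so this keeps only the first IPv4 address and removes any earlier pin for the name before adding the new one. pin_host is a hypothetical helper; it needs root for the real /etc/hosts, and WSL regenerates /etc/hosts at startup unless generateHosts = false is set in the [network] section of /etc/wsl.conf, so the pin does not survive a restart by default.

```bash
# Pin NAME to an IPv4 address in a hosts file, replacing any earlier pin.
# If IP is omitted, the first A record from dig is used.
# Usage: pin_host NAME [HOSTS_FILE] [IP]
pin_host() {
  local name="$1" hosts="${2:-/etc/hosts}" ip="${3:-}"
  if [ -z "$ip" ]; then
    # +short may print CNAME targets too; keep only the first IPv4 address.
    ip="$(dig +short "$name" A | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' | head -n 1)"
  fi
  [ -n "$ip" ] || { echo "pin_host: no address for $name" >&2; return 1; }
  # Drop lines whose last field is exactly NAME; write back in place so the
  # file keeps its owner and permissions.
  local tmp
  tmp="$(mktemp)"
  awk -v n="$name" '$NF != n' "$hosts" > "$tmp" && cat "$tmp" > "$hosts"
  rm -f "$tmp"
  printf '%s %s\n' "$ip" "$name" >> "$hosts"
}
```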

@pduchnovsky

pduchnovsky commented Feb 9, 2022

This is a persistent workaround.

Run the following, then restart WSL by running powershell.exe wsl --shutdown directly from WSL.

This adds 9.9.9.9 as an additional nameserver to /etc/resolv.conf at boot and sets the DNS timeout to 1 second.

It is fully automatic and does not break connections over a VPN on the Windows side.

sudo bash -c "cat >> /etc/wsl.conf <<EOF
[boot]
command = printf 'nameserver 9.9.9.9\noptions timeout:1\n' >> /etc/resolv.conf
EOF"
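The heredoc above appends a new [boot] section every time it runs, so running it twice, or on a wsl.conf that already has a [boot] section (for example one enabling systemd), leaves duplicate sections. A guarded sketch: add_boot_command is a hypothetical helper that refuses to touch a file that already has [boot] (merge by hand in that case) and makes sure the new section starts on its own line. Paste it into a root shell (e.g. after sudo -s) and call it with the boot command as one argument.

```bash
# Append a [boot] section with COMMAND to a wsl.conf-style file, unless the
# file already has a [boot] section.
# Usage: add_boot_command COMMAND [WSL_CONF]
add_boot_command() {
  local cmd="$1" conf="${2:-/etc/wsl.conf}"
  if grep -qx '\[boot\]' "$conf" 2>/dev/null; then
    echo "add_boot_command: $conf already has a [boot] section" >&2
    return 1
  fi
  # If the file does not end with a newline, add one first.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    echo >> "$conf"
  fi
  # %s copies COMMAND verbatim, so a literal \n stays for printf at boot.
  printf '[boot]\ncommand = %s\n' "$cmd" >> "$conf"
}
```

For example: add_boot_command "printf 'nameserver 9.9.9.9\noptions timeout:1\n' >> /etc/resolv.conf".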

@martinjoshua

Also experiencing this issue today.

@masonhuemmer

Same here. This has impacted our entire team.

@ImIOImI

ImIOImI commented Feb 10, 2022

Disabling resolv.conf generation and using a public DNS server didn't work for me. I suspect this is because we define private endpoints to reach private resources while on the VPN, and those addresses aren't resolved correctly by a public server.

@b1ackhawk-uh60
Author

b1ackhawk-uh60 commented Feb 10, 2022

In reply to @ImIOImI's comment above about private endpoints on the VPN:

Using a public DNS server would prevent you from resolving names on a private network. Alternatively, instead of a public DNS server, you could use your private network's DNS server for name resolution.

or

You could modify the host file in windows with an entry for management.azure.com as mentioned here (thanks @AaronFriel for mentioning this issue there):
golang/go#51127 (comment)

@bernardmaltais

bernardmaltais commented Feb 10, 2022

Here is something that could help some. I added the following to my alias file:

sudo bash -c "sed -i '/management.azure.com/d' /etc/hosts" ; sudo bash -c 'echo "$(dig management.azure.com | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}$") management.azure.com" >> /etc/hosts'

And simply call fixdns before running terraform commands.

Regarding @pduchnovsky's persistent [boot] workaround above: it works very well. Best workaround so far.

@ImIOImI

ImIOImI commented Feb 11, 2022

In reply to @bernardmaltais's fixdns comment above (whose /etc/hosts command at the time ended with an extra ')'):

This helped a lot. I think you've got an extra ) at the end of your bash command, though

@bernardmaltais

In reply to @ImIOImI's note about the extra ')' at the end of the bash command:
Thanks, fixed.

@angelbulas

I have also been experiencing this issue since yesterday.

@jsmith-speedeon

This is really weird.

Using the azure cli (az login, az group list, etc.) all works fine with the default DNS stuff. But terraform plan fails as everyone else is reporting.

With it set to either my local network's DNS resolver or my VPN's DNS resolver, everything works fine. This was all handled automatically before this week.

I wonder what changed to specifically break DNS resolution for terraform plan/apply/refresh.

@bernardmaltais

bernardmaltais commented Feb 11, 2022

Quoting @jsmith-speedeon: "Wonder what changed to specifically cause terraform plan/apply/refresh to break during DNS resolution."

Quoting @vladimir-shopov in reply: "This seems to be yet another side effect of #5806. Wondering when Microsoft will finally understand the huge impact this particular bug has on all WSL2 users and fix it."

@AaronFriel

AaronFriel commented Feb 11, 2022

@bernardmaltais I actually dug a bit deeper into this, and it appears that the Internet Connection Sharing DNS server does not use "message compression" (https://datatracker.ietf.org/doc/html/rfc1035#section-4.1.4) even when the upstream DNS server does. That causes the response size to be larger than the original, which isn't always correctly handled.
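One way to look at this on an affected machine is to ask the WSL2-generated resolver and a public resolver for the same name and compare the "MSG SIZE rcvd" line dig prints; per the comment above, the reply relayed by the Windows host should be noticeably larger because it is not compressed. For context, RFC 1035 limits plain UDP DNS messages to 512 bytes unless EDNS is used, though whether that is exactly what trips a given client is not established here. The sketch assumes dig (from the dnsutils package) is installed; msg_size and compare_reply_sizes are hypothetical helpers, 1.1.1.1 is just an example resolver, and byte counts will vary.

```bash
# Extract the byte count from dig's ";; MSG SIZE  rcvd: N" footer on stdin.
msg_size() {
  awk '/MSG SIZE/ { print $NF }'
}

# Compare the reply size for NAME from the WSL-generated resolver and PUBLIC.
# Usage: compare_reply_sizes NAME [PUBLIC_RESOLVER]
compare_reply_sizes() {
  local name="$1" public="${2:-1.1.1.1}" wsl_ns
  wsl_ns="$(awk '$1 == "nameserver" { print $2; exit }' /etc/resolv.conf)"
  printf 'via %s: %s bytes\n' "$wsl_ns" "$(dig "$name" @"$wsl_ns" | msg_size)"
  printf 'via %s: %s bytes\n' "$public" "$(dig "$name" @"$public" | msg_size)"
}
```

For example, compare_reply_sizes management.azure.com.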

I don't want to declare mission accomplished too soon, but I'm now tracking golang/go#51153, which may land as a fix in Go 1.18 and be backported to previous versions.

@tenletters10

I spent a few days troubleshooting before identifying DNS in WSL2 as the cause, and then found this thread. Same problem here: I need private DNS from a VPN and public DNS resolution at the same time. @b1ackhawk-uh60, I agree with the comments you have shared so far that just using some public DNS server is not a valid solution.

@AaronFriel

It looks like the Go team has a systemic fix slated for inclusion with 1.18 this month and the next point releases, but I can't speak to their release schedule.

golang/go#51153 (comment)

@rezarms

rezarms commented Feb 16, 2022

It started for me today, and the Azure CLI is working fine.
I had to change the nameserver in /etc/resolv.conf.

@HumanPrinter

In reply to @bernardmaltais's fixdns comment above:

@bernardmaltais This command works like a charm and is less intrusive than changing the resolv.conf. However, I'm having some trouble adding this to my bash_aliases file. Could you please share your entry including any escaped characters?

@rlees85

rlees85 commented Feb 21, 2022

Please be aware this does not just affect WSL! I have this problem on Linux as well. My DNS path also includes a step that goes via DoT/DoH, so I suspect this might be a common factor.

The workarounds posted 'work', but in my opinion changing the DNS path is not really a 'workaround' unless you are extremely desperate. The issue (which I suspect is with Go) needs proper attention.

@davidshen84

Is it something new? I am pretty sure my TF scripts worked in my WSL2 environment before, until today...

@rezarms

rezarms commented Feb 24, 2022

For me, setting generateResolvConf = false in /etc/wsl.conf didn't help.
After some hours, it gets reset.

@IskanderNovena

The Azure CLI has the same issue. When trying to log in with a Service Principal, I get an error stating that there are no subscriptions. When running the same command with the CLI in PowerShell, I get a normal response.
Command used:
az login --service-principal -u "<appId>" -p "<password>" --tenant "<tenantId>"

@megakid

megakid commented Mar 3, 2022

Same here.

@kaancfidan

In reply to @pduchnovsky's persistent [boot] workaround above:

Worked like a charm.

@rezarms

rezarms commented Mar 7, 2022

The workaround doesn't work for me. If I add generateResolvConf = false, then after shutting down and restarting WSL no resolv.conf file is created; if I remove the line, the workaround does nothing and I still get an autogenerated resolv.conf file.

@moneygit

moneygit commented Mar 8, 2022

The sebastiansterk workaround worked for me.
I had two machines, same Windows build and all: one working and one not.

@epomatti

Changing to Google DNS fixed my issue as well.

This isn't the first time the WSL2 default DNS has given me annoying issues.

@timmyreilly

Just adding another wrinkle, it was working for me for a second, but then I logged in with a Service Principal az login --service-principal --username $ARM_CLIENT_ID --password $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID and it broke again.

@ImIOImI

ImIOImI commented Apr 14, 2022

In reply to @HumanPrinter's question about adding the fixdns command to a bash_aliases file:

Obviously, I'm not Bernard, but here is the exact code I have in my .zshrc file. I didn't set it up as an alias.

fixdns() {
  command sudo bash -c "sed -i '/management.azure.com/d' /etc/hosts" ; sudo bash -c 'echo "$(dig management.azure.com | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}$") management.azure.com" >> /etc/hosts'
}

@williamohara

williamohara commented May 15, 2022

Confirmed that @sebastiansterk's solution worked for me. I was about ready to throw in the towel, give up technology altogether and move to the woods to live off the land. I had stepped away from coding my project for a while, and I guess a Windows update did it. Is there a ticket elsewhere for a resolution?

@ndamulelonemakh

ndamulelonemakh commented Aug 21, 2022

In reply to @sebastiansterk's four-step workaround above:

I can confirm that this worked for me as well. @sebastiansterk Thanks for saving me time:)

@surlypants

The only "fix" I have found is to downgrade to WSL1. Every other suggestion has provided only temporary, non-persistent relief (if any).

@AaronFriel

@surlypants A recent build of terraform should fix this, but terraform providers will need to be built on a recent version of Go.
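To see which Go release a provider binary was built with, go version FILE prints the toolchain version embedded in a Go binary. A sketch, assuming a Go toolchain is installed in WSL and that providers sit under .terraform/providers in the working directory (where terraform init installs them). It uses go1.18 as the threshold only because that is the release this thread expected to carry the fix; older patch releases may also have received a backport, so "older" here means "worth checking", not "broken". The function names are hypothetical.

```bash
# Print the Go toolchain version embedded in a Go binary (e.g. "go1.17.6").
# Relies on `go version FILE`, so a Go toolchain must be installed.
go_build_version() {
  go version "$1" | awk '{ print $NF }'
}

# Succeed if Go version $1 is at least $2, using GNU sort's version ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# List provider binaries under ./.terraform/providers with their Go version,
# flagging anything older than MIN (default go1.18).
check_providers() {
  local min="${1:-go1.18}" bin v
  find .terraform/providers -type f -name 'terraform-provider-*' 2>/dev/null |
    while IFS= read -r bin; do
      v="$(go_build_version "$bin")"
      if version_at_least "$v" "$min"; then
        echo "ok     $v  $bin"
      else
        echo "older  $v  $bin"
      fi
    done
}
```

The same check works for the Terraform binary itself, e.g. go_build_version "$(command -v terraform)".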

@migldasilva

Every solution that relies on the [boot] section of /etc/wsl.conf is available only on Windows 11 and Server 2022.

https://learn.microsoft.com/en-us/windows/wsl/wsl-config#boot-settings
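A rough way to check this from inside WSL is to read the Windows build number from a version string in the same format as the one at the top of this issue (the output of ver). Server 2022 is build 20348 and Windows 11 starts at build 22000, while Windows 10 builds are lower, so this sketch treats any build of 20348 or higher as supported; that threshold, and the hypothetical helper names, are assumptions based on the linked documentation, and Insider builds or newer WSL releases may not follow it.

```bash
# Extract the build number from e.g. "Microsoft Windows [Version 10.0.19044.1503]".
windows_build() {
  sed -nE 's/.*Version [0-9]+\.[0-9]+\.([0-9]+).*/\1/p' <<< "$1"
}

# Succeed if the build looks like Server 2022 (20348) or Windows 11 (22000+).
boot_section_supported() {
  local build
  build="$(windows_build "$1")"
  [ -n "$build" ] && [ "$build" -ge 20348 ]
}
```

For example: boot_section_supported "$(cmd.exe /c ver 2>/dev/null | tr -d '\r')" && echo "[boot] should be available". For the reporter's build 10.0.19044.1503 it returns failure.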

@newbenji

I'm using Terragrunt in a Docker container in WSL2
and had the issue with Terraform running slowly.
Passing --dns=1.1.1.1 to the container made it fast again.

@wdrury-uk

wdrury-uk commented Aug 15, 2023

Still have the same issue in 2023 on WSL2; I can't run terraform :(

I tried the fix above on WSL2 with Ubuntu 20.04.6 LTS.

After removing /etc/resolv.conf, I now have a broken symlink shown in red.
I can't cat resolv.conf, as it says "No such file or directory".

Why can't MS fix a bug from over 2 years ago? Just match the WSL nameservers to my Windows PC's nameserver :(
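The symptom above is consistent with /etc/resolv.conf still being the symlink WSL manages (on Ubuntu it typically points into /run, e.g. ../run/resolvconf/resolv.conf), whose target is no longer recreated once generateResolvConf = false is set. A small sketch to tell the cases apart; resolv_conf_state is a hypothetical helper. If it reports a broken symlink, removing the link itself with sudo rm /etc/resolv.conf and writing a regular file in its place, as in the workaround earlier in the thread, is the usual next step.

```bash
# Describe what a resolv.conf path currently is: a regular file, a working
# symlink, a dangling symlink (target missing), or absent.
# Usage: resolv_conf_state [PATH]
resolv_conf_state() {
  local f="${1:-/etc/resolv.conf}"
  if [ -L "$f" ]; then
    if [ -e "$f" ]; then
      echo "symlink -> $(readlink "$f")"
    else
      echo "broken symlink -> $(readlink "$f")"
    fi
  elif [ -f "$f" ]; then
    echo "regular file"
  else
    echo "missing"
  fi
}
```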
