[WSL2] [Interop] Keep a single shared /run/WSL/* socket #5065

Open
paulstelian97 opened this issue Apr 9, 2020 · 31 comments

@paulstelian97

  • Your Windows build number: (Type ver at a Windows Command Prompt)
    Microsoft Windows [Version 10.0.19041.172]

  • What you're doing and what's happening: (Copy&paste the full set of specific command-line steps necessary to reproduce the behavior, and their output. Include screen shots if that helps demonstrate the problem.)

Launch a new console of a WSL2 distro (either in Windows Terminal, wsl.exe directly or anything else). Then launch either tmux or gnome-terminal. Then close the original console. If I return to the tmux session and try to run a Windows executable, or if I attempt this in gnome-terminal, I get the following error:
<3>init: (11770) ERROR: UtilConnectToInteropServer:300: connect failed 2

This is because the $WSL_INTEROP variable points to a socket that was deleted when the original terminal was closed.
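
A quick way to confirm this state from the affected shell is to check whether the path in $WSL_INTEROP still exists. The sketch below is only an illustration: the helper name interop_socket_present is made up here and is not part of WSL.

```bash
# Hypothetical helper (the name is an assumption, not a WSL command):
# succeeds when the given path, or $WSL_INTEROP by default, still exists.
# [ -S ... ] would be stricter (socket files only); -e matches the checks
# used elsewhere in this thread.
interop_socket_present() {
    local sock="${1:-${WSL_INTEROP:-}}"
    [ -n "$sock" ] && [ -e "$sock" ]
}

# Example:
#   interop_socket_present || echo "stale or unset: ${WSL_INTEROP:-<unset>}" >&2
```

If the check fails while Windows executables also fail with "connect failed 2", the variable is most likely pointing at a socket that has already been removed.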

  • What's wrong / what should be happening instead:

Be able to use Interop even in that situation.

  • Strace of the failing command, if applicable: (If some_command is failing, then run strace -o some_command.strace -f some_command some_args, and link the contents of some_command.strace in a gist here).
    The only mildly relevant line sums up as:
    connect(fd, {AF_UNIX, sun_path="/run/WSL/11067_interop"}, 110) = -1 ENOENT.

  • For WSL launch issues, please collect detailed logs.

@paulstelian97 paulstelian97 changed the title [WSL2] [Interop] Keep a single shared /var/WSL/* socket [WSL2] [Interop] Keep a single shared /run/WSL/* socket Apr 9, 2020
@therealkenc
Collaborator

therealkenc commented Apr 10, 2020

Alright, gleaning the repro, you're in a state like this:

image

You can see /init listening with lsof. Here the client's /init parent is 173:

image

As a work-around, you can set the $WSL_INTEROP variable manually.

image
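
In shell terms, the manual workaround might look roughly like the sketch below. The PID 173 comes from the example above; the helper name use_interop_pid and its optional directory argument are assumptions added for illustration, not anything WSL provides.

```bash
# Hypothetical helper: point WSL_INTEROP at the socket that belongs to a
# given /init PID. The second argument (socket directory) is an assumption
# added so the function can be tried outside WSL; it defaults to /run/WSL.
use_interop_pid() {
    local pid="$1" dir="${2:-/run/WSL}"
    if [ -e "$dir/${pid}_interop" ]; then
        export WSL_INTEROP="$dir/${pid}_interop"
    else
        echo "no interop socket for PID $pid in $dir" >&2
        return 1
    fi
}

# Example, using the PID from the screenshot description:
#   use_interop_pid 173
```

On failure the function leaves the current value of WSL_INTEROP untouched.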

Perhaps the binfmt interpreter instance of /init could walk up the process tree itself, automating the aforementioned workaround and eliminating $WSL_INTEROP. [Just spitballing.]

@paulstelian97
Author

The workaround works, of course, but unless there is a way I can automate it somehow, it doesn't really matter.

@lvlts

lvlts commented Aug 27, 2020

I have this function in my .zshrc, and I call it on every shell init:

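fix_wsl2_interop() {
    # Try every number in the pstree output for this shell ($$) as a PID.
    for i in $(pstree -np -s $$ | grep -o -E '[0-9]+'); do
        if [[ -e "/run/WSL/${i}_interop" ]]; then
            # No break here: the last matching PID in the output wins.
            export WSL_INTEROP="/run/WSL/${i}_interop"
        fi
    done
}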

So far, so good, it works all the time.

Edit: fixed the regex to include 0 as @zakandrewking pointed out.
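
If pstree is not installed, the same parent walk can probably be done by reading the PPid field from /proc/<pid>/status. This is only a sketch under that assumption: the function name find_interop_socket and its two optional arguments are made up for illustration, and unlike the function above it stops at the first (nearest) ancestor that has a socket.

```bash
# Sketch: walk up the process tree via /proc instead of pstree.
# The function name and both optional arguments are assumptions:
#   $1 = PID to start from (default: this shell), $2 = socket directory.
find_interop_socket() {
    local pid="${1:-$$}" dir="${2:-/run/WSL}" key val ppid
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null; do
        if [ -e "$dir/${pid}_interop" ]; then
            export WSL_INTEROP="$dir/${pid}_interop"
            return 0
        fi
        # Look up the parent PID; give up if the process has gone away.
        [ -r "/proc/$pid/status" ] || return 1
        ppid=
        while read -r key val; do
            if [ "$key" = "PPid:" ]; then ppid=$val; break; fi
        done < "/proc/$pid/status"
        pid=$ppid
    done
    return 1
}

# Example: find_interop_socket || echo "no interop socket found" >&2
```

The walk ends when it reaches PID 1, whose parent PID is 0.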

@zakandrewking

thanks @lvlts, this helps me. I'd suggest a small change: grep -o -E '[0-9]+'

@paulstelian97
Author

Stole that, added in my own .bashrc. I'll just call it manually if I see issues.

@str4d

str4d commented Aug 31, 2020

If anyone is arriving here and needs a similar function for nushell (like I did):

alias fix-wsl2-interop --save [] {
  echo $nu.env | insert WSL_INTEROP $(
    pstree -np -s | grep -o -E '[0-9]+' | lines | each {
      build-string /run/WSL/ $it _interop
    } | split column "|" path | insert exists {
      get path | path exists
    } | where $it.exists | get path
  ) | config set_into env
}

@elucidsoft

This is freaking odd. I had a previous build with WSL 2 with the exact same source code, everything the same. I reinstalled Windows and recreated the setup, and encountered this issue without explanation. It's the same setup as before, so what made this happen this time vs. last time? I have no idea, as I literally set both environments up following the exact same procedures.

@paulstelian97
Author

@elucidsoft Are you using stuff like Gnome Terminal or other GUI apps? The /run/WSL/* sockets remain valid as long as the wsl.exe or bash.exe (or ubuntu.exe etc) commands that you manually call remain valid. I encounter the issue whenever using gnome-terminal since that one allows the original normal or wt.exe terminal executable to close and invalidate the socket.

@elucidsoft

I am using VSCode, but what's odd is that this never happened to me once in my previous setup, which should have been identical.

@fsackur

fsackur commented Nov 29, 2020

...and for any powershellers, a similar function for pwsh:

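function Reset-WslInterop
{
    param ($ProcessId = $PID)

    if (Test-Path /run/WSL/$ProcessId`_interop)
    {
        $env:WSL_INTEROP="/run/WSL/$ProcessId`_interop"
        return
    }

    # Stop when there is no parent left, rather than recursing with an empty PID
    $parent = (Get-Process -Id $ProcessId).Parent
    if ($null -eq $parent) { return }

    Reset-WslInterop $parent.Id
}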

@elucidsoft

Yes, I don't use Powershell but I did this in bash script as the solution.

@lyf2000

lyf2000 commented Feb 26, 2021

I encountered the same error on WSL2 when trying to run docker-compose build to rebuild the containers. The error was:

newschool-db uses an image, skipping
newschool-redis uses an image, skipping
Building newschool-api
<3>init: (671) ERROR: UtilConnectToInteropServer:300: connect failed 2
Traceback (most recent call last):
  File "docker/credentials/store.py", line 80, in _execute
  File "subprocess.py", line 411, in check_output
  File "subprocess.py", line 512, in run
subprocess.CalledProcessError: Command '['/mnt/c/Program Files/Docker/Docker/resources/bin/docker-credential-desktop.exe', 'list']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bin/docker-compose", line 3, in <module>
  File "compose/cli/main.py", line 67, in main
  File "compose/cli/main.py", line 126, in perform_command
  File "compose/cli/main.py", line 302, in build
  File "compose/project.py", line 468, in build
  File "compose/project.py", line 450, in build_service
  File "compose/service.py", line 1125, in build
  File "docker/api/build.py", line 261, in build
  File "docker/api/build.py", line 308, in _set_auth_headers
  File "docker/auth.py", line 302, in get_all_credentials
  File "docker/credentials/store.py", line 71, in list
  File "docker/credentials/store.py", line 93, in _execute
docker.credentials.errors.StoreError: Credentials store docker-credential-desktop.exe exited with "".
[668] Failed to execute script docker-compose

But later I just ran it as root, and that resolved the problem.
It's likely a solution only for specific cases, but I hope it helps someone.

@meymeynard

meymeynard commented Mar 16, 2021

@lyf2000 Is there any way this could run without the sudo?

@paulstelian97
Author

  1. Ensure your user is in the docker group.
  2. At the beginning of any session, run "sg docker" to launch a shell that has the proper group, since normal logins won't load these supplementary groups from /etc/group, I think.

@sfmontyo

sfmontyo commented Mar 24, 2021

So I get this issue repeatedly because I like to use gnome-terminal instead of Windows Terminal. Here are the steps to reproduce:

  1. Run Windows Terminal and use your favorite WSL2 distro.
  2. Within the WSL2 distro, start gnome-terminal. (You'll need to have an X server on your Windows host.)
  3. In the new gnome-terminal, execute the command:
/mnt/c/Windows/System32/cmd.exe /C "echo %USERPROFILE%"

NOTE: it works and prints out the contents of the Windows environment variable USERPROFILE.

Also note that the WSL_INTEROP bash environment variable points to /run/WSL/NNNN_interop or something like that.

  4. Close the original Windows Terminal. This frees up the WSL instance.
  5. Go back to the gnome-terminal and re-execute the above command. It will fail with the error:
ERROR: UtilConnectToInteropServer:300: connect failed 2
  6. Now start up a new Windows Terminal with a new WSL2 instance. In that terminal, see what the contents of WSL_INTEROP are:
echo $WSL_INTEROP
  7. Go back to the gnome-terminal and change the WSL_INTEROP variable to the new value from the new Windows Terminal.
  8. Re-execute the command and it works again, since it's now using the new filesystem socket.

@paulstelian97
Author

The fact that the socket is per-tty rather than, say, per-Windows session or something, is annoying. I would have preferred to always run Gnome Terminal and launch new terminals from that one always.

@JeppeKlitgaard

I'll leave my version for fish, shamelessly translated from lvlts' bash solution above.

# .config/fish/conf.d/wsl.fish

function fix_wsl2_interop
    for i in (pstree -np -s $fish_pid | grep -o -E '[0-9]+')
        if test -e "/run/WSL/"$i"_interop"
            set -x WSL_INTEROP "/run/WSL/"$i"_interop"
        end
    end
end

fix_wsl2_interop

@plato79

plato79 commented Apr 15, 2021

OK, the solutions listed here don't work for me.
I'm using Ubuntu 20.04, and when I check with pstree -p I can't see any values resembling the number in the "_interop" file name. I don't know the cause, but I think we should find another way to find this number.

@paulstelian97
Author

You should see if anything at all exists in /run/WSL/. If not, then it's because there's simply no open native terminal.

@joedborg

I've done it by adding this to my .zshenv:

export WSL_INTEROP="/run/WSL/$(ls -tr /run/WSL | head -n1)"

No idea how robust this is going to be, but the oldest socket seems to be the one for me.
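
Since it is not obvious whether the oldest or the newest socket is the right pick on a given machine, a small wrapper that makes the choice explicit may help when experimenting. This is only a sketch: the function name pick_interop_socket and its arguments are assumptions. It relies on ls -t sorting newest first and -r reversing that.

```bash
# Sketch: pick the oldest or newest entry in the socket directory by mtime.
# The function name and arguments are assumptions made for illustration.
#   $1 = oldest|newest (default: oldest), $2 = directory (default: /run/WSL)
pick_interop_socket() {
    local mode="${1:-oldest}" dir="${2:-/run/WSL}" name
    case "$mode" in
        oldest) name=$(ls -tr "$dir" 2>/dev/null | head -n1) ;;
        newest) name=$(ls -t "$dir" 2>/dev/null | head -n1) ;;
        *) echo "usage: pick_interop_socket oldest|newest [dir]" >&2; return 2 ;;
    esac
    # An empty or missing directory means there is nothing to pick.
    [ -n "$name" ] || return 1
    export WSL_INTEROP="$dir/$name"
}

# Example: pick_interop_socket oldest
```

Like the one-liner above, this does not check that anything is still listening on the chosen socket.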

@plato79

plato79 commented Apr 15, 2021

I think this would work.. Because if you think about it, whenever you open a new instance a new file is generated and it will be the file we need to assign to the environment variable. Thanks.

Edit: Just realized, I think you shouldn't reverse order. Because ls -t lists files from newest first. So reversing returns oldest first.

@joedborg

joedborg commented Apr 15, 2021

On my machine, the oldest one is the one that always seems to work. It's all the new ones that are created, per shell, that don't. Perhaps we have different issues that manifest the same?

@plato79

plato79 commented Apr 15, 2021

Well, using the oldest also works. I'm mostly using this for the code . command, though. If you use it for anything else, it could matter.

@joedborg

joedborg commented Apr 16, 2021

Yeah, as I mentioned, no idea how robust this will be, it's just a work around until we can get a real one in WSL (fingers crossed).

@paulstelian97
Author

I think any of the sockets work, so you just need a heuristic to select the one that will take the longest to be closed. Could be oldest, could be newest, could be any of the others. When you close one of the native terminals (wt, bash.exe, ubuntu.exe etc) the corresponding socket will be both closed and deleted from that path.

@joedborg

In my specific case, only the lowest numbered works.

@neerolyte

neerolyte commented May 8, 2021

I wanted a solution that would work automatically in existing shells, not just new ones (or by manually executing a fix). Others may want this solution too.

Running the function from above within a PROMPT_COMMAND means WSL_INTEROP will be reset every time the prompt is updated (after each command invocation).

I've also optimised the function a little to short-circuit the run if possible and avoid the extra process call (grep).

prompt_fix_wsl() {
	# return early if WSL is missing or already working
	[[ -n "$WSL_INTEROP" ]] || return
	! [[ -e "$WSL_INTEROP" ]] || return
	# keep the IFS change local so the caller's IFS is left untouched
	local pid pids IFS='-()'
	# parse pstree output into the pids array
	# shellcheck disable=SC2207
	pids=($(pstree --numeric-sort --show-pids --show-parents $$))
	for pid in "${pids[@]}"; do
		# skip tokens that are not purely numeric (process names etc.)
		[[ "$pid" =~ ^[0-9]+$ ]] || continue
		[[ -e "/run/WSL/${pid}_interop" ]] || continue
		export "WSL_INTEROP=/run/WSL/${pid}_interop"
		# stop looking for sockets
		return
	done
}

Setting it up like so:

function my_prompt_command {
	# .... my existing prompt command that's probably too long to justify the above optimisation ...
	prompt_fix_wsl
}

PROMPT_COMMAND=my_prompt_command

WSL_INTEROP is now restored every time a command is executed in bash.
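
If you would rather not wrap an existing prompt command in a new function, one alternative is to append the hook to whatever PROMPT_COMMAND already contains. The helper below is a sketch with a made-up name (add_prompt_hook); it assumes PROMPT_COMMAND is a plain string rather than an array.

```bash
# Hypothetical helper: add a function name to PROMPT_COMMAND exactly once,
# keeping whatever was already there. Assumes PROMPT_COMMAND is a string.
add_prompt_hook() {
    local hook="$1" cur="${PROMPT_COMMAND:-}"
    cur="${cur%;}"   # drop one trailing semicolon so we never produce ';;'
    case ";$cur;" in
        *";$hook;"*) ;;  # already installed, nothing to do
        *) PROMPT_COMMAND="${cur:+$cur;}$hook" ;;
    esac
}

# Example: add_prompt_hook prompt_fix_wsl
```

Calling it again from a re-sourced .bashrc leaves PROMPT_COMMAND unchanged.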

@marwatk

marwatk commented Jun 29, 2021

All of these workarounds assume you don't need to keep the original interop socket open (because it's in use). When using something like wsl-vpnkit to allow WSL2 networking under a VPN, it launches a persistent Windows process, but that process gets terminated when the shell that started it is closed.

While it's possible to detect and remediate the closure it would be much better if there was just a single socket instead of one per-tty.

@sleeperss

sleeperss commented Sep 15, 2021

None of these solutions work with https://github.com/shayne/wsl2-hacks, as the terminal PID isn't accessible anymore.

I came up with this solution:

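#!/usr/bin/bash

export WSL_INTEROP=
for socket in /run/WSL/*; do
   # Keep the socket only if something is still listening on it.
   if ss -elx | grep -qF -- "$socket"; then
      export WSL_INTEROP=$socket
   else
      # Stale socket file: remove it (quoted, in case of unusual names).
      rm -- "$socket"
   fi
done

if [[ -z $WSL_INTEROP ]]; then
   echo -e "\033[31mNo working WSL_INTEROP socket found!\033[0m"
fi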
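
For anyone who would rather not delete anything from /run/WSL, a read-only variant of the same idea could look like the sketch below. The function name find_live_interop_socket and its optional directory argument are assumptions; the check itself is the same "does ss list a listener on this path" test as in the script above.

```bash
# Sketch: print the first socket in a directory that still has a listener,
# without deleting anything. Name and directory argument are assumptions.
find_live_interop_socket() {
    local dir="${1:-/run/WSL}" listening sock
    # Capture the list of listening unix sockets once.
    listening=$(ss -lx 2>/dev/null) || return 1
    for sock in "$dir"/*_interop; do
        [ -e "$sock" ] || continue   # glob matched nothing
        if printf '%s\n' "$listening" | grep -qF -- "$sock"; then
            printf '%s\n' "$sock"
            return 0
        fi
    done
    return 1
}

# Example:
#   WSL_INTEROP=$(find_live_interop_socket) && export WSL_INTEROP
```

The example only overwrites WSL_INTEROP when a live socket was actually found.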

@ElMehdi-TouimiBenjelloun

Thanks, it works like a charm.

@techtheriac

export WSL_INTEROP="/run/WSL/$(ls -tr /run/WSL | head -n1)"

This works for me.
