[has a simply repro] TCP ephemeral ports exhausted after lots of early-closed non-blocking connections #3951
In the three months since philip-searle posted reproduction code on 18 Jan, no Microsoft employee has responded on the old issue, so I am opening a new issue to get attention.
After creating and then closing (before they are established) a large number of non-blocking connections in WSL, all TCP ephemeral ports are exhausted, and no new TCP connections can be established from WSL or Win32. Closing the related processes in WSL does not release these ports. All new TCP connections and listens will fail, and you must restart the
You can reliably reproduce this issue using the attached program (~80 lines of C): wsl-issue-2913-repro.c.txt
Output from strace looks normal to me and is attached as wsl-issue-2913-repro.strace.zip
The program performs these steps in a loop:
In Ubuntu in WSL, build and run the demo with these commands:
apt update
apt install gcc wget
wget -O wsl-issue-2913-repro.c https://github.com/Microsoft/WSL/files/2769821/wsl-issue-2913-repro.c.txt
gcc -o wsl-issue-2913-repro wsl-issue-2913-repro.c
./wsl-issue-2913-repro
On a Linux VM I can run the loop several hundred thousand times and see the ports being used cycle through the entire ephemeral range multiple times.
In addition, even if the program had a bug and failed to release the ports it occupies, those ports should be released automatically when the program exits.
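For readers without the attachment, the pattern described in this issue (open a non-blocking socket, start a connect, close it before the connection is established) can be sketched roughly as follows. This is an illustrative sketch and not the attached wsl-issue-2913-repro.c; the function names `set_nonblocking` and `abandon_connections` and the choice of which errors count as "expected" are assumptions.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: start `count` non-blocking connects to ip:port and
 * close each socket right away, without waiting for the handshake.
 * Returns how many iterations completed. It stops early if socket() fails
 * or connect() fails with anything other than EINPROGRESS or ECONNREFUSED
 * (for example the EINVAL reported in this issue). Returns 0 if `ip` is
 * not a valid IPv4 address.
 */
long abandon_connections(const char *ip, uint16_t port, long count)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return 0;

    long done = 0;
    while (done < count) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            break;

        int failed = 0;
        if (set_nonblocking(fd) < 0) {
            failed = 1;
        } else if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
                   errno != EINPROGRESS && errno != ECONNREFUSED) {
            failed = 1;
        }

        int saved = errno;   /* keep the interesting errno across close() */
        close(fd);           /* closed before the connection is established */
        errno = saved;

        if (failed)
            break;
        done++;
    }
    return done;
}
```

Calling something like `abandon_connections("127.0.0.1", 1234, 100000)` from a `main` and printing the return value and `errno` should show whether the loop ran to completion, as it does on a Linux VM, or stopped early.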
On philip-searle's Windows laptop it loops about 16,000 times and then EINVAL is returned from connect(). At this point the symptoms described in previous comments appear: Win32 programs such as web browsers fail to connect and the output of "netstat -anoq" in a command prompt shows many connections stuck in the "BOUND" state. The only way to get network connections working again is to restart the
On YihaoPeng's PC with Windows 1809
No ports are released after the program exits. If you run the program repeatedly (so that it immediately takes up any ephemeral port released by other programs), you will find that no TCP connections can be established anywhere in Windows. For example, the Edge browser will not be able to load any page.
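To put a number on the stuck sockets, the "BOUND" lines in the output of `netstat -anoq` can be counted. The following is a minimal sketch, assuming the netstat output is supplied as a text stream and that each line is shorter than the read buffer; `count_bound_lines` is a hypothetical helper, not part of any tool mentioned here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines of a text stream that contain the
 * string "BOUND". Assumes each line fits in the 1024-byte buffer; longer
 * lines would be read in pieces and could be miscounted.
 */
long count_bound_lines(FILE *in)
{
    char line[1024];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, "BOUND") != NULL)
            count++;
    }
    return count;
}
```

Passing `stdin` to `count_bound_lines` while piping in the output of `netstat -anoq` would give a count that can be compared before and after running the repro.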
Use the following commands to run the program repeatedly:
changed the title: [has a simply repro] TCP user port exhaust and network broken after lots of non-blocking connections (Apr 1, 2019)
Phillip's code below so it doesn't get lost.
Running the Sysinternals RamMap utility after the above program has terminated provides some insight - the wsl-issue-2913-repro process still exists as a zombie - no surprise considering it owns all those sockets visible in
There's actually no need to go to the extreme of port exhaustion to repro the underlying issue. In fact we don't need the above code at all, we can use anything that will call connect() and can be fed a target where there is nothing listening.
Zombie wget and curl (sticking with 127.0.0.1:1234):
Can also try for the same result (it's not a localhost thing):
If there is something listening on 127.0.0.1:1234 (I used netcat) there is no zombie process:
So it seems that WSL does not correctly handle failed connects. It looks as though a reference to the socket is kept internally, which turns the owning process into a zombie, and port exhaustion will eventually (or quickly) occur as per #2913.
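A single failed connect of the kind described here needs very little code. The sketch below makes one blocking `connect()` to a target and reports the resulting error; `try_connect_once` is a hypothetical name, and the expectation of `ECONNREFUSED` applies to a local port with nothing listening on ordinary Linux.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: make one blocking connect() to ip:port.
 * Returns 0 if the connection was established, EINVAL if `ip` is not a
 * valid IPv4 address, otherwise the errno value from the failing call.
 */
int try_connect_once(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    int result = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        result = errno;

    close(fd);  /* the socket is closed whether or not connect() succeeded */
    return result;
}
```

A `main` that calls `try_connect_once("127.0.0.1", 1234)` and then exits could be used in place of wget or curl for the RamMap check, on the assumption that any failed connect triggers the same behaviour.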
As mentioned in #2913 terminating all WSL processes running under the distro allows the sockets to be released and the zombies go away once WSL runs its cleanup.
Great follow up.
I was able to reproduce this here on 18865 with RamMap and your
This is going to 'splain some other ill-defined reports, if WSL has been leaking like a sieve all this time and no one noticed. Usually you don't try to connect to something that isn't there, and those NT processes don't show up in a bog-standard Resource Monitor look-see. But if, say, you did an
I am also experiencing this issue -- it's very frustrating to have to bounce
It appears that some process (
There's probably better ways to get some of this data...:
Ephemeral port usage
Unknown process running on loop
Task Manager view