WSL 2 uses half the number of cores on AMD Threadripper 3990X #5423

Open
AAlMutairi opened this issue Jun 16, 2020 · 81 comments

@AAlMutairi AAlMutairi commented Jun 16, 2020

Environment

Windows build number: Microsoft Windows [Version 10.0.19041.329]
Your Distribution version: Ubuntu: 20.04
WSL 2

Steps to reproduce

I am using an AMD Threadripper 3990X in my PC. When I run lscpu, I get the following:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3990X 64-Core Processor
.
.
.

Also when I use the command nproc, I get 64.

However, when I run a parallel job with either Open MPI or MPICH, MPI uses only 32 cores (half of the real cores). For this test I used the following code (copied from https://mpitutorial.com/tutorials/mpi-hello-world/):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Expected behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 64 processors
Hello world from processor Ubuntu, rank 18 out of 64 processors
Hello world from processor Ubuntu, rank 23 out of 64 processors
.
.
.

Actual behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 32 processors
Hello world from processor Ubuntu, rank 18 out of 32 processors
Hello world from processor Ubuntu, rank 23 out of 32 processors
.
.
.
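
For reference, the lscpu fields quoted in this report can be cross-checked with a little arithmetic. The 3990X is a 64-core, 128-thread part, so a guest that reports 1 socket x 32 cores x 2 threads = 64 logical CPUs is seeing half of the chip, and the 32 that MPI prints matches the Core(s) per socket figure. A minimal sketch in C; the function name `logical_cpus` is made up for illustration and is not part of any tool mentioned here:

```c
#include <assert.h>

/* Logical CPUs implied by lscpu's "Socket(s)", "Core(s) per socket"
   and "Thread(s) per core" fields. Hypothetical helper, for illustration only. */
int logical_cpus(int sockets, int cores_per_socket, int threads_per_core) {
    assert(sockets >= 1 && cores_per_socket >= 1 && threads_per_core >= 1);
    return sockets * cores_per_socket * threads_per_core;
}
```

With the values from the lscpu output, `logical_cpus(1, 32, 2)` gives 64, whereas the bare-metal figure for this CPU would be `logical_cpus(1, 64, 2)`, i.e. 128.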

@AAlMutairi AAlMutairi commented Jun 17, 2020

Not sure if it is relevant, but I am experiencing the same issue in Hyper-V too.

@WSLUser WSLUser commented Jun 19, 2020

It's the kernel config. Look at https://github.com/microsoft/WSL2-Linux-Kernel/tree/master/Microsoft. In the config for x86_64 you will see it's set to 64. This is standard from the Linux kernel. What you can do is update the config to match the number of cores you have. Ideally WSL would do a check on the number of CPU cores and update the config appropriately in .wslconfig. For now this is a manual process.

@AAlMutairi AAlMutairi commented Jun 19, 2020

@WSLUser, thanks for the answer. Just to confirm, you meant updating config-wsl, since I couldn't find .wslconfig. If this is the case, I believe the part of interest is the following:

CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256

Are you suggesting that despite the range, WSL 2 uses the default value as a max value?

My apologies if I misunderstood your suggestion.
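
One way to check which limit is actually in effect, sketched here under the assumption that the standard Linux sysfs files are available inside the guest: `/sys/devices/system/cpu/kernel_max` holds the highest CPU index the kernel build allows (NR_CPUS minus one), while `/sys/devices/system/cpu/possible` and `/sys/devices/system/cpu/online` hold CPU lists such as `0-63`. If `online` covers far fewer CPUs than `kernel_max` permits, the kernel config is probably not the limiting factor. The helper names below are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the CPUs named by a kernel "cpulist" string such as "0-63" or "0-3,8-11".
   Returns -1 if the string is malformed. Hypothetical helper. */
int count_cpu_list(const char *list) {
    assert(list != NULL);
    int total = 0;
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0) return -1;      /* expected a CPU index */
        long last = first;
        if (*end == '-') {                         /* a range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first) return -1;
        }
        total += (int)(last - first + 1);
        p = end;
        if (*p == ',') p++;                        /* next entry */
        else if (*p != '\0' && *p != '\n') return -1;
    }
    return total;
}

/* Read the first line of a sysfs cpulist file and count its CPUs.
   Returns -1 if the file cannot be read or parsed. */
int count_cpus_in_file(const char *path) {
    char buf[4096];
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? count_cpu_list(line) : -1;
}
```

Calling `count_cpus_in_file("/sys/devices/system/cpu/online")` and the same for `possible` gives the two counts to compare. `kernel_max` contains a single index rather than a list, so it should be read as a plain number and incremented by one.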

@WSLUser WSLUser commented Jun 19, 2020

@AAlMutairi AAlMutairi commented Jun 19, 2020

@WSLUser thank you again for the suggestion, and sorry for the misunderstanding on my part. I tried your method to change the number of processors. It works when I decrease the number of processors, but unfortunately it doesn't work past 64 processors (which is equivalent to 32 physical cores). It still seems to limit me to half the number of physical cores (64/2 = 32).

@sanastasiou sanastasiou commented Jun 19, 2020

@WSLUser Does this work also for multiple CPUs? i.e. Dual Xeon setup?

@WSLUser WSLUser commented Jun 19, 2020

Not sure. @craigloewen-msft would probably know better. In your case it appears the kernel config itself needs updating. You should be able to override the original value in .wslconfig as well. You should see the option in the release notes. And yes (sorry I didn't answer before), it's using the default value. So you'll overwrite it. I don't recommend going above 256.

@AAlMutairi AAlMutairi commented Jun 20, 2020

@WSLUser Thanks for all the help, I guess I will wait for the kernel to be updated.
@benhillis @therealkenc, would you be able to let us know if such a fix to the kernel will be added to the next build?
@sanastasiou Did you have the chance to try the .wslconfig method?

@sanastasiou sanastasiou commented Jun 20, 2020

@AAlMutairi not yet, not sure if it applies to dual-CPU setups as well. If it does, I'll try.

@AAlMutairi AAlMutairi commented Jun 29, 2020

Any updates or fixes to test?

@mozram mozram commented Jun 30, 2020

It also affects compiling when running make -j. Only half of the CPUs are used, whereas WSL 1 does not have this issue. Ryzen 2600, Ubuntu 20.04, WSL 2.

@sanastasiou sanastasiou commented Jun 30, 2020

This basically blocks any usage of WSL 2. Even if I check out my repo there, I lose 50% of my processing power. That's simply a no-go.

@AAlMutairi AAlMutairi commented Jun 30, 2020

@mozram, it is surprising that it was working for you in WSL 1. Unfortunately, neither works for me.

@AAlMutairi AAlMutairi commented Jun 30, 2020

@sanastasiou, hopefully any fix will work for both WSL 2 and Hyper-V, since the issue persists in both.

@AAlMutairi AAlMutairi commented Jul 8, 2020

Interestingly, even when I used MPICH on Windows, it only sees 32 physical cores. I guess this issue isn't limited to WSL or Hyper-V.

@AAlMutairi AAlMutairi commented Jul 13, 2020

Any updates?

@sanastasiou sanastasiou commented Jul 13, 2020

Changing WSL config has 0 effect whatsoever. 2nd CPU is not recognized.

@AAlMutairi AAlMutairi commented Jul 13, 2020

I tried contacting AMD customer support about the issue to ask whether they have any fixes, but to no avail.

@onomatopellan onomatopellan commented Jul 13, 2020

Ben said "I am already looking into this, AMD brought this to my attention as well."
So be patient.

@AAlMutairi AAlMutairi commented Jul 20, 2020

Just to help narrow down the issue: it seems to affect the 3990X alone, since John from the AMD community tested WSL2 on his 3970X and got the following results:
[Image: pastedImage_1]

It shows that all 32 physical cores (next to "Core(s) per socket") and all 64 logical cores (next to "CPU(s)") were detected. Not sure how helpful it is, but I thought it might help.

@sanastasiou sanastasiou commented Jul 20, 2020

Not quite true. I have a dual Xeon setup and it only detects one of them, so it doesn't affect only the 3990X.

@AAlMutairi AAlMutairi commented Jul 20, 2020

@sanastasiou, my apologies, I meant that within the AMD Threadripper line, only the 3990X is affected. By the way, did you test whether the same issue persists when you use Hyper-V? It does for me.

@ykim362 ykim362 (Member) commented Jul 22, 2020

I have the same issue with Intel Xeon. I have two 6242R CPUs (2 sockets), and only 1 socket is available from WSL 2.

@AAlMutairi AAlMutairi commented Jul 22, 2020

@ykim362 Which Windows version are you using? Do you have the same issue with Hyper-V?

@sanastasiou sanastasiou commented Jul 22, 2020

@AAlMutairi how do I enable/how can I check this with Hyper-V?

@AAlMutairi AAlMutairi commented Jul 22, 2020

@sanastasiou it is similar to WSL: you enable it through "Turn Windows features on or off" as shown here:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v#enable-the-hyper-v-role-through-settings

Then use "Hyper-V Quick Create" as shown here (the page is based on an older Windows version, but the steps are still the same):
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/quick-create-virtual-machine

I guess you can install an Ubuntu VM for now.

If you right-click on the VM, you can open its settings and change the number of cores and sockets you want. Then you can run it and test.

@ykim362 ykim362 (Member) commented Jul 22, 2020

@AAlMutairi I was able to configure the number of virtual cores (2 x physical cores) with Hyper-V (on Windows 10 Enterprise). But I am not sure whether it is really using all CPUs or just showing 2x more virtual CPUs. It was 40 logical cores (20 physical cores) by default, and even after I increased the number to 80 logical cores, it only shows 1 socket.

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 40
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
Stepping: 7
CPU MHz: 3092.733
BogoMIPS: 6185.46
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-79

@mistergitj mistergitj commented Aug 28, 2020

@benhillis benhillis (Member) commented Aug 28, 2020

I mean NUMA support for WSL2, specifically.

@AAlMutairi AAlMutairi commented Aug 28, 2020

@benhillis thanks for your reply. I am assuming from what you are saying that the issue I currently face with WSL2 is independent of the issue with Hyper-V, since Hyper-V has no issue with NUMA and yet it shows the same behaviour. Sorry if I misunderstood you.

@mistergitj unfortunately, even the AMD support team is not sure how to fix it.

@mistergitj mistergitj commented Aug 28, 2020

@mistergitj mistergitj commented Aug 28, 2020

@AAlMutairi AAlMutairi commented Aug 28, 2020

@benhillis I looked into disabling SMT. Initially that seemed to work. However, I ran some benchmarks to compare performance. The results showed that, despite the correct number of cores (64) being detected, the performance is the same as with 32 cores when SMT is enabled.
I do not think WSL liked SMT being disabled and re-enabled, since after re-enabling SMT the performance was much worse than it was originally. (Windows performance was not affected.)

@mistergitj , we will see what happens in future updates.

@mistergitj mistergitj commented Aug 30, 2020

@benhillis I do not understand what your problem is. As I understand NUMA, it is a function of the processor (must be Enabled in BIOS) and should have nothing to do with the application. Hyper-V needs to get involved to properly simulate the VM but I do not understand why/what WSL is having problems with concerning NUMA. It seems to me that an application may need to be NUMA Mode aware to more efficiently allocate/use memory. If you will explain your WSL/NUMA problem I will think on it. Thanks and enjoy, John.

@AAlMutairi AAlMutairi commented Sep 28, 2020

Looking at the updates released for Windows insider program and WSL2, it seems that this bug (and the bug of the two CPUs) is far from being solved, isn't it?

@AAlMutairi AAlMutairi commented Nov 23, 2020

@benhillis @therealkenc, it seems this issue still persists even with the new Windows 10 20H2. I am guessing there is no solution for the foreseeable future, right?

@ykim362 did upgrading help with your issue?

@ykim362 ykim362 (Member) commented Nov 23, 2020

@AAlMutairi No approach has solved this so far for me.

@AAlMutairi AAlMutairi commented Nov 29, 2020

Does this quietness mean that there isn't a fix inbound?

@alvarolb alvarolb commented Dec 20, 2020

Same problem here.

Dual Xeon E5-2699-v4 (22C/44T each), showing only one socket in WSL2, and so only using 44 threads. I tried adding more processors in the configuration, but it does nothing, as it seems that WSL is exposing only one socket.

Hope it can be fixed soon, as WSL2 is useless if we cannot use our CPUs at 100%.

Any progress on that?

@AAlMutairi AAlMutairi commented Dec 20, 2020

@alvarolb

I have been waiting for a fix for 6 months and there does not seem to be any progress on this front. The reason for this, as @benhillis pointed out in #5423 (comment), is processor groups.

This brings us to the first issue: the WSL(2) development team seems to think of it as something that needs to be addressed by the Windows platform developers, while the Windows platform team thinks it is the responsibility of the software developers (as shown here: https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups), which a lot of other software developers did successfully.

If you are wondering why, if it is a simple fix, the WSL2 team has not fixed it yet, the only reason I can come up with is that WSL(2) was never meant to replace proper Linux. It was meant for light code development and compiling. It was not meant for heavy computational work or HPC code development. Think of it as Linux Lite for people who do not want to set up a full virtual machine. Hence, this issue is not a priority.
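
To make the processor-group arithmetic concrete: per the Windows documentation linked above, a processor group holds at most 64 logical processors, and on Windows 10 software that is not group-aware stays within a single group. The sketch below only works out the resulting bounds; the function names are hypothetical, and how Windows actually splits the processors into groups also depends on the NUMA topology, so a dual-socket machine may end up with one group per socket rather than groups of exactly 64.

```c
#include <assert.h>

#define MAX_CPUS_PER_GROUP 64  /* Windows limit on logical processors per group */

/* Minimum number of processor groups needed to hold `logical_cpus`. */
int min_processor_groups(int logical_cpus) {
    assert(logical_cpus >= 1);
    return (logical_cpus + MAX_CPUS_PER_GROUP - 1) / MAX_CPUS_PER_GROUP;
}

/* Upper bound on the logical processors visible to code confined to one group. */
int max_cpus_in_one_group(int logical_cpus) {
    assert(logical_cpus >= 1);
    return logical_cpus < MAX_CPUS_PER_GROUP ? logical_cpus : MAX_CPUS_PER_GROUP;
}
```

For the 3990X with its 128 logical processors, `min_processor_groups(128)` is 2 and `max_cpus_in_one_group(128)` is 64, which is consistent with the 64 CPUs that lscpu reports inside WSL 2.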

@sanastasiou sanastasiou commented Dec 20, 2020

@AAlMutairi

> which a lot of other software developers did successfully.

Could you elaborate? You mean we can somehow fix this ourselves?

@AAlMutairi AAlMutairi commented Dec 20, 2020

@sanastasiou

Both Julia and Matlab overcame this issue. There is also some discussion about this topic here: https://stackoverflow.com/questions/61479400/c-thread-handling-with-two-processor-groups

Can we fix it ourselves? I am not sure; I have not tried it myself.

@alvarolb alvarolb commented Dec 21, 2020

OK, thanks for the response @AAlMutairi. I think the WSL(2) team should address this issue so that its lightweight VM supports multiple processor groups. It is not only useful for heavy computational work or HPC code development, but also for compiling on Linux. I personally use Linux for compiling large toolchains, dependencies, and code that benefits from multiple cores. I would love to see WSL2 use all cores.

@AAlMutairi AAlMutairi commented Mar 23, 2021

@benhillis @therealkenc, any updates on this bug? Any workarounds we can test?

@AAlMutairi AAlMutairi commented Mar 25, 2021

@benhillis @therealkenc, alternatively, can I run two separate instances of WSL2, one on each processor group, simultaneously? If I can't get 100% utilization from one of them, I might as well run two.

@pnthai88 pnthai88 commented May 8, 2021

I'm working on a new rack workstation with 4 sockets, 72 cores and 144 threads in total.
Benchmarks on W10 Enterprise 10H2 are fine; all cores and threads are recognized.
I tried wslconfig and recompiling the Linux kernel with NR_CPU_DEFAULT... 8192.
Neither works; it maxes out at 64 cores. I seriously do not want to use CentOS for development.
Yes, I can push my code to another HPC server running CentOS, but that costs a lot of time for debugging and rebuilding every time I make a change.
So sad the problem is not solved yet 🙂

@AAlMutairi AAlMutairi commented Jul 12, 2021

Has anyone checked whether this issue has been resolved in Windows 11? I know it is still in beta, but maybe this issue has been addressed.

@benhillis @therealkenc, has there been any update regarding this bug?

@stubmirror stubmirror commented Aug 5, 2021

Same issue on a dual 64-core EPYC system; it maxes out at 32 cores/64 threads.

@athena9 athena9 commented Sep 14, 2021

+1 having the same issue with a 3995wx. WSL2 on Win Enterprise.

@AAlMutairi AAlMutairi commented Nov 25, 2021

@benhillis @therealkenc any updates? #5423 (comment) suggested that this issue is understood and has been looked at since July 2020. Unfortunately, it has not been fixed yet. I recently upgraded to W11, since it fixes the NUMA issue of the TR 3990X (it now treats it as a single NUMA CPU). I hoped this would fix the issue in WSL2, but WSL2 still struggles with processor groups. Are there any plans to make WSL processor-group aware?

@silence48 silence48 commented Nov 29, 2021

I'm trying to figure out this same issue. I have tried it with an EPYC 64-core CPU as well as a Threadripper 3995WX, and both CPUs are limited to 32 cores/64 threads. I switched to a native Linux kernel to get done what I needed, but am I doing something wrong?

@athena9 athena9 commented Nov 30, 2021

> I have upgraded to the latest version (Ubuntu 20.4 LTS) and the problem is solved...

With what system?
I've had no luck with Ubuntu 20.04.3 LTS with 64 cores on Win 10.

@amircrypto001 amircrypto001 commented Nov 30, 2021

> > I have upgraded to the latest version (Ubuntu 20.4 LTS) and the problem is solved...
>
> With what system? I've had no luck with Ubuntu 20.04.3 LTS with 64 cores on Win 10.

My bad... the problem is still there!
