WSL 2 uses half the number of cores on AMD Threadripper 3990X #5423

Open
AAlMutairi opened this issue Jun 16, 2020 · 65 comments
Labels
bug

Comments


@AAlMutairi AAlMutairi commented Jun 16, 2020

Environment

Windows build number: Microsoft Windows [Version 10.0.19041.329]
Your Distribution version: Ubuntu: 20.04
WSL 2

Steps to reproduce

I am using an AMD Threadripper 3990X in my PC. When I run the command lscpu, I get the following:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3990X 64-Core Processor
.
.
.

Also when I use the command nproc, I get 64.

However, when I run a parallel job with either Open MPI or MPICH, MPI uses only 32 cores (half of the physical cores). For this test I used the following code (copied from https://mpitutorial.com/tutorials/mpi-hello-world/):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}

Expected behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 64 processors
Hello world from processor Ubuntu, rank 18 out of 64 processors
Hello world from processor Ubuntu, rank 23 out of 64 processors
.
.
.

Actual behavior

.
.
.
Hello world from processor Ubuntu, rank 10 out of 32 processors
Hello world from processor Ubuntu, rank 18 out of 32 processors
Hello world from processor Ubuntu, rank 23 out of 32 processors
.
.
.
Author

@AAlMutairi AAlMutairi commented Jun 17, 2020

Not sure if it is relevant, but I am experiencing the same issue in Hyper-V too.


@WSLUser WSLUser commented Jun 19, 2020

It's the kernel config. Look at https://github.com/microsoft/WSL2-Linux-Kernel/tree/master/Microsoft. In the config for x86_64 you will see it's set to 64. This is standard from the Linux kernel. What you can do is update the config to match the number of cores you have. Ideally WSL would do a check on the number of CPU cores and update the config appropriately in .wslconfig. For now this is a manual process.

Author

@AAlMutairi AAlMutairi commented Jun 19, 2020

@WSLUser, thanks for the answer. Just to confirm: you meant updating config-wsl, since I couldn't find .wslconfig? If that is the case, I believe the part of interest is the following:

CONFIG_NR_CPUS_RANGE_BEGIN=2
CONFIG_NR_CPUS_RANGE_END=512
CONFIG_NR_CPUS_DEFAULT=64
CONFIG_NR_CPUS=256

Are you suggesting that despite the range, WSL 2 uses the default value as a max value?

My apologies if I misunderstood your suggestion.

Author

@AAlMutairi AAlMutairi commented Jun 19, 2020

@WSLUser Thank you again for the suggestion, and sorry for the misunderstanding on my part. I tried your method of changing the number of processors. It works when I decrease the number of processors, but unfortunately it doesn't work past 64 processors (which is equivalent to 32 physical cores). It still seems to limit me to half the number of physical cores (64/2 = 32).


@sanastasiou sanastasiou commented Jun 19, 2020

@WSLUser Does this also work for multiple CPUs, e.g. a dual Xeon setup?


@WSLUser WSLUser commented Jun 19, 2020

Not sure. @craigloewen-msft would probably know better. In your case it appears the kernel config itself needs updating. You should be able to override the original value in .wslconfig as well. You should see the option in the release notes. And yes (sorry I didn't answer before), it's using the default value. So you'll overwrite it. I don't recommend going above 256.

Author

@AAlMutairi AAlMutairi commented Jun 20, 2020

@WSLUser Thanks for all the help, I guess I will wait for the kernel to be updated.
@benhillis @therealkenc, would you be able to let us know whether such a fix to the kernel will be added to the next build?
@sanastasiou Did you have the chance to try the .wslconfig method?


@sanastasiou sanastasiou commented Jun 20, 2020

@AAlMutairi Not yet. I'm not sure if it applies to dual-CPU setups as well. If it does, I'll try.

Author

@AAlMutairi AAlMutairi commented Jun 29, 2020

Any updates or fixes to test?


@mozram mozram commented Jun 30, 2020

It also affects compiling when running make -j. Only half of the CPUs are used, whereas WSL 1 does not have this issue. Ryzen 2600, Ubuntu 20.04, WSL 2.


@sanastasiou sanastasiou commented Jun 30, 2020

This basically blocks any usage of WSL 2. Even if I check out my repo there, I lose 50% of my processing power. That's simply a no-go.

Author

@AAlMutairi AAlMutairi commented Jun 30, 2020

@mozram, it is surprising that it was working for you in WSL 1. Unfortunately, neither works for me.

Author

@AAlMutairi AAlMutairi commented Jun 30, 2020

@sanastasiou, hopefully any fix will work for both WSL 2 and Hyper-V, since the issue persists in both.

Author

@AAlMutairi AAlMutairi commented Jul 8, 2020

Interestingly, even when I used MPICH on Windows, it only sees 32 physical cores. I guess this issue isn't limited to WSL or Hyper-V.

Author

@AAlMutairi AAlMutairi commented Jul 13, 2020

Any updates?


@sanastasiou sanastasiou commented Jul 13, 2020

Changing the WSL config has no effect whatsoever. The second CPU is not recognized.

Author

@AAlMutairi AAlMutairi commented Jul 13, 2020

I tried contacting AMD customer support about the issue to ask whether they have any fixes, but to no avail.


@onomatopellan onomatopellan commented Jul 13, 2020

Ben said "I am already looking into this, AMD brought this to my attention as well."
So be patient.

Author

@AAlMutairi AAlMutairi commented Jul 20, 2020

Just to help narrow down the issue: it seems to affect the 3990X alone, since John from the AMD community tested WSL2 on his 3970X and got the following results:
[image: pastedImage_1]

It shows that all 32 physical cores (next to "Core(s) per socket") and all 64 logical cores (next to "CPU(s)") were detected. I am not sure how helpful this is, but I thought it might help.


@sanastasiou sanastasiou commented Jul 20, 2020

Not quite true. I have a dual Xeon setup and it only detects one of them, so it doesn't affect only the 3990X.

Author

@AAlMutairi AAlMutairi commented Jul 20, 2020

@sanastasiou, my apologies, I meant that within the AMD Threadripper line, only the 3990X is affected. By the way, did you test whether the same issue persists when you use Hyper-V? It does for me.

Member

@ykim362 ykim362 commented Jul 22, 2020

I have the same issue with Intel Xeon. I have two 6242R CPUs (2 sockets), and only 1 socket is available from WSL 2.

Author

@AAlMutairi AAlMutairi commented Jul 22, 2020

@ykim362 Which version of Windows are you using? Do you have the same issue with Hyper-V?


@sanastasiou sanastasiou commented Jul 22, 2020

@AAlMutairi how do I enable/how can I check this with Hyper-V?

Author

@AAlMutairi AAlMutairi commented Jul 22, 2020

@sanastasiou It is similar to WSL in that you enable it through "Turn Windows features on or off", as shown here:
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v#enable-the-hyper-v-role-through-settings

Then use "Hyper-V Quick Create" as shown here (the guide is based on an older Windows version, but the steps are still the same):
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/quick-create-virtual-machine

I guess you can install an Ubuntu VM for now.

If you right-click the VM, you can open its settings and change the number of cores and sockets. Then you can run it and test.

Member

@ykim362 ykim362 commented Jul 22, 2020

@AAlMutairi I was able to configure the number of virtual cores (2 x physical cores) with Hyper-V (on Windows 10 Enterprise). But I am not sure whether it's really using all CPUs or just showing twice as many virtual CPUs. It was 40 logical cores (20 physical cores) by default, and even after I increased the number to 80 logical cores, it still shows only 1 socket.

lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0-79
Thread(s) per core: 2
Core(s) per socket: 40
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz
Stepping: 7
CPU MHz: 3092.733
BogoMIPS: 6185.46
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-79

Author

@AAlMutairi AAlMutairi commented Jul 22, 2020

I use Quantum ESPRESSO (it is the software I need WSL2 for), but you can use the code in the main post with a few modifications to test if you want. Or you can use the tool you need WSL2 for, if you want. There is of course also the option of installing Windows in a VM, running something like Cinebench, and deleting the VM afterwards.

Member

@ykim362 ykim362 commented Jul 22, 2020

@AAlMutairi I tried one of the open stress tests. Seems it's working correctly. The dark blue cells are 100%. 40 physical cores are 100%.

image

Author

@AAlMutairi AAlMutairi commented Jul 22, 2020

I am not sure it is working. If I understood correctly, Task Manager's core layout shows cores in pairs (i.e. adjacent cells are threads of the same core). What it shows is 100% usage of the logical cores on one of the two sockets.

Member

@ykim362 ykim362 commented Jul 23, 2020

@AAlMutairi I see. I ran a benchmark on Windows and could see all 80 logical cores at 100%. So Hyper-V might also use only 1 socket, even if it displays 2 sockets. Does Hyper-V need to be fixed first to resolve this issue?

Author

@AAlMutairi AAlMutairi commented Jul 23, 2020

@ykim362 I am not sure. The issue seems to be in both WSL and Hyper-V, but I am not sure what is causing it. Also, it would be worthwhile to see whether someone using Windows 10 Pro for Workstations or Windows Server 2019 is facing a similar issue, because I am considering the upgrade if it fixes the issue.


@WSLUser WSLUser commented Jul 23, 2020

So I added a number of NUMA options to my config (NUMA is disabled by default in WSL2), but this requires the latest Linux kernel (minimum 5.7.8). This might or might not help, since I'd expect these options to be enabled for distros running in Hyper-V, but maybe they aren't. Try it for WSL2 with https://github.com/microsoft/WSL2-Linux-Kernel/blob/271456763a85394e318b345744ad8fd2692678c2/Microsoft/config-wsl and this guide: https://wsl.dev/wsl2-kernel-zfs/. If it works better, check the distro's config in Hyper-V and compare. Note that I left the default of 64 cores in my PR, so change it to 256. CONFIG_NR_CPUS_DEFAULT is the relevant config option.

Author

@AAlMutairi AAlMutairi commented Jul 23, 2020

@WSLUser Yes, I looked into NUMA while testing Hyper-V, but it didn't help. The TR 3990X is a single-NUMA-node CPU. However, I am not sure whether that will help with the two-CPU issue.


@WSLUser WSLUser commented Jul 23, 2020

Yes which is why I said it might help or not. They were directly ported from Clear Linux kernel config for Hyper-V.


@mistergitj mistergitj commented Jul 23, 2020

aalmutairi, this is MisterJ from the AMD forum.
I think you do not have a Hyper-V problem! When the WSL/Linux problem is fixed, we will see if you have any more problems. You posted this Task Manager SS on the AMD forum and it clearly shows 128 logical processors all going 100%. I asked you to do this again to make sure of what is being displayed. I think it is the Task Manager on your VM running CB:
3990CB
Please run this test again as I requested.

I have read ykim362's posts, and it looks like he is experiencing the "only one socket supported" problem in Hyper-V. I recommend that he post in the MS TechNet forum 'Windows 10 IT Pro > Windows 10 Virtualization'.

Thanks and enjoy, John.

Author

@AAlMutairi AAlMutairi commented Jul 23, 2020

@mistergitj This was CB running on the host W10 (not the W10 VM). The Task Manager in the VM only shows 32 physical cores, and when I ran CB in the VM, the Task Manager on the main PC showed the results in this comment:
#5423 (comment)

Collaborator

@therealkenc therealkenc commented Jul 23, 2020

@AAlMutairi because I am considering the upgrade [to Win Pro] if it fixes the issue.

It will not.

The problem is understood. What will address your issue is a fixinbound tag and star alignment.

@WSLUser This might help or not

If the problem could be fixed with just a kconfig, it wouldn't exist.

@therealkenc therealkenc added the bug label Jul 23, 2020

@mistergitj mistergitj commented Jul 23, 2020

aalmutairi, please run the CB under Hyper-V again and post a SS. I will post on MS TechNet. Thanks and enjoy, John.


@mistergitj mistergitj commented Jul 23, 2020

Here is the link to my post on TechNet. Enjoy, John.

EDIT: BTW when the Task Manager shows zeros, as ykim362's above, resizing the Task Manager will fix it. Try changing the width and the 0% should change to 100%

EDIT: Found this. May be bad news.
EDIT: Created this Performance Monitor like the one posted in the link. Notice mine says 64 and 64.
Perfmon

Author

@AAlMutairi AAlMutairi commented Aug 27, 2020

@benhillis, sorry to bother you. You mentioned you were looking into this problem, and I was wondering whether you managed to find its source. Currently, I am experiencing the same behaviour in WSL2, Hyper-V and MS-MPI. I am not sure whether the source is the same for all of them or each problem has a different root. If the latter is correct, I will have to check with each team.


@mistergitj mistergitj commented Aug 27, 2020

Author

@AAlMutairi AAlMutairi commented Aug 28, 2020

@mistergitj I have seen that, and how it was considered answered despite the fact that it wasn't. However, @benhillis has been working on this problem for a while and I was hoping for his input here.

Member

@benhillis benhillis commented Aug 28, 2020

If your machine has more than one processor group, this is currently expected. NUMA support is on the backlog.


@mistergitj mistergitj commented Aug 28, 2020


@mistergitj mistergitj commented Aug 28, 2020

Member

@benhillis benhillis commented Aug 28, 2020

I mean NUMA support for WSL2, specifically.

Author

@AAlMutairi AAlMutairi commented Aug 28, 2020

@benhillis Thanks for your reply. I am assuming from what you are saying that the issue I currently face with WSL2 is independent of the issue with Hyper-V, since Hyper-V has no issue with NUMA and yet shows the same behaviour. Sorry if I misunderstood you.

@mistergitj Unfortunately, even the AMD support team is not sure how to fix it.


@mistergitj mistergitj commented Aug 28, 2020


@mistergitj mistergitj commented Aug 28, 2020

Author

@AAlMutairi AAlMutairi commented Aug 28, 2020

@benhillis I looked into disabling SMT. Initially that seemed to work. However, I ran some benchmarks to compare performance. The results showed that despite detecting the correct number of cores (64), the performance is the same as with 32 cores when SMT is enabled.
I do not think WSL liked the disabling and re-enabling of SMT, since after re-enabling SMT the performance was much worse than it was originally. (Windows performance was not affected.)

@mistergitj , we will see what happens in future updates.


@mistergitj mistergitj commented Aug 30, 2020

@benhillis I do not understand what your problem is. As I understand NUMA, it is a function of the processor (must be Enabled in BIOS) and should have nothing to do with the application. Hyper-V needs to get involved to properly simulate the VM but I do not understand why/what WSL is having problems with concerning NUMA. It seems to me that an application may need to be NUMA Mode aware to more efficiently allocate/use memory. If you will explain your WSL/NUMA problem I will think on it. Thanks and enjoy, John.

Author

@AAlMutairi AAlMutairi commented Sep 28, 2020

Looking at the updates released for Windows insider program and WSL2, it seems that this bug (and the bug of the two CPUs) is far from being solved, isn't it?

Author

@AAlMutairi AAlMutairi commented Nov 23, 2020

@benhillis @therealkenc , it seems this issue is still persistent even with the new Windows 10 20H2. I am guessing there is no solution for the foreseeable future, right?

@ykim362 did upgrading help with your issue?

Member

@ykim362 ykim362 commented Nov 23, 2020

@AAlMutairi No approach has solved this so far for me.

9 participants