
Argon containers use only weak cores on systems with heterogeneous CPUs #397

Open
conioh opened this issue Jul 7, 2023 · 19 comments
Labels: bug Something isn't working

@conioh

conioh commented Jul 7, 2023

NOTE: Edited on 2023-07-11 to correct inaccurate data about ARM64 machines.

Describe the bug

Argon containers use only weak cores on systems with heterogeneous CPUs.

We use Docker for our build environment. This is required for many reasons.

For example, it's quite common for our code to be incompatible with past and future Visual C++ versions, because new C++ features go in and bugfixes (or new bugs) modify the behavior of old code.

Generally, people can always build the master branch on their host machine, but they can't build an older version (required when servicing an issue with an older but still supported version). And people can't refrain from upgrading Visual Studio, because then the current code won't build. It's impractical to keep all the versions of the Visual C++ build tools installed.

So we have Docker images with the entire build environment, and a text file in the code repository pointing to the tag of the corresponding build-tools Docker image. That way, in order to build any commit from the product source repository, we only need to pull the referenced Docker image. This is also what our CI system does.
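
For illustration only, the flow is roughly the following (the tag file name here is made up; ours differs):

# Hypothetical sketch: the repository records the build-image tag in a text file,
# so building any commit starts by pulling the matching image.
$tag = (Get-Content .\build-tools-image.tag).Trim()
docker pull "some-builder:$tag"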

We make every effort to run Docker containers using process isolation (mostly by making sure our images start from base images compatible with the host), as our tests have shown that running under process isolation has no overhead compared to running directly on the host (sometimes it's even slightly faster¹), while running under Hyper-V isolation has a significant overhead, among other problems.
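
For reference, a minimal sketch of the kind of invocation we use with process isolation (the image name, mount and command are placeholders, mirroring the full command shown further below):

docker run --rm -it --isolation=process --mount "type=bind,src=$(Get-Location),dst=C:\whatever" -w C:\whatever some-builder:some-tag powershell.exe -Command "build command"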

Unfortunately we have recently discovered that when running on systems with heterogeneous CPUs (Intel Alder Lake and Raptor Lake CPUs, and ARM64 CPUs under certain conditions), the processes inside process-isolated containers utilize only the "weak" cores (E-cores on Intel, LITTLE cores on ARM). See: docker/for-win#13562

Using sophisticated debugging techniques we have also discovered that the issue is not with Docker/Moby but rather with Windows. See below in the reproduction section.

On all of our machines with Intel CPUs prior to Alder Lake, building inside process-isolated Docker containers runs as fast as on the host (or slightly faster¹), but on our Raptor Lake machines with Intel i9 13900H CPUs (6 P-cores = 12 logical cores, plus 8 E-cores) we get the following results:

           Host     Process isolation
Debug      500s     1000s
Release    1000s    2500s

(The numbers are rounded averages. For example, Debug on the host actually takes 490s-497s.)

This is pretty awful. We found out it is actually faster to build on an older model of the same computer, from two generations back, with an Intel i9 11900H CPU, since it doesn't have two kinds of cores and it actually uses all of them.

On the aforementioned Raptor Lake machine the 12 logical P-cores do nothing while the 8 E-cores do all the work. We assume Release is more compute-intensive due to the optimizations, and there we see a 2.5x factor (which matches the ~40% CPU utilization reported by some tools: 100% on 8 of the 20 logical cores and practically nothing on the other 12, i.e. 8/20 = 40%, and 1/0.4 = 2.5), while Debug is not as compute-bound, so there the slowdown factor is "only" 2x.

A workaround we've considered is using Hyper-V isolation. It kind of works but not really. A complete table with Hyper-V isolation would be:

           Host     Process isolation    Hyper-V isolation
Debug      500s     1000s                700s
Release    1000s    2500s                -ICE-

The Debug build has a significant overhead compared to running directly on the host and compared to what process isolation should have been, but it's still better than what process isolation does on this machine. But the Release build just crashed MSBuild.exe for lack of memory.

You see, Hyper-V isolation has the "nice" property of causing MSBuild.exe to crash with complaints about not enough memory when launching the container even with --memory 64GB, and refusing to run at all with, say, --memory 96GB:

> docker run --rm -it --mount "type=bind,src=$(Get-Location),dst=C:\whatever" -w C:\whatever --isolation=hyperv --cpu-count 20 --memory 96GB some-builder:some-tag powershell.exe -Command "build command"
docker: Error response from daemon: hcs::CreateComputeSystem a3270d2ab11086bbcaac433a660d5cbde47fea3d2f0349b9d2ed22a9839a76e9: The paging file is too small for this operation to complete.

That's a separate issue, with Hyper-V isolation being less than useful, but we're not here to solve that. The point is that using Hyper-V isolation isn't a valid workaround, on the grounds of not working. Not that it would be a good workaround even if it did work, on the grounds of being slow and there being no reason for process isolation not to work properly.


¹ One element we have discovered that makes processes inside process-isolated containers run slightly faster than directly on the host is less interference by certain security software. We assume there may be other causes. Generally we say that the performance of process-isolated Docker containers is approximately equal to that of running directly on the host, except in pathological cases. Like the one we have here, unfortunately.

To Reproduce

Execute the following PowerShell commands:

[E:\]
> $sieve_URI = "https://github.com/kimwalisch/primesieve/releases/download/v11.1/primesieve-11.1-win-x64.zip" #or arm64
[E:\]
> Invoke-WebRequest -Uri $sieve_URI -OutFile "sieve.zip"
[E:\]
> Expand-Archive -Path ".\sieve.zip" -DestinationPath ".\sieve\"
[E:\]
> CmDiag.exe CreateContainer -Type ServerSilo -Id 11111111-1111-1111-1111-111111111111 -FriendlyName Foo
The container was successfully created. Its ID is: 11111111-1111-1111-1111-111111111111
The container will continue running until it is terminated. A new instance of cmdiag has been spun
up in the background to keep the container alive.

[E:\]
> CmDiag.exe Map 11111111-1111-1111-1111-111111111111 -ReadOnly "$PWD\sieve" "C:\sieve"
[E:\]
> CmDiag.exe Console 11111111-1111-1111-1111-111111111111 powershell.exe
Executing: powershell.exe
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows

PS C:\Windows\system32> C:\sieve\primesieve.exe 1e14
Sieve size = 256 KiB
Threads = 20
0%

Take a look at your favorite CPU utilization tool. I used Sysinternals Process Explorer. You're welcome to use Task Manager, perfmon.msc or whatever floats your boat.

Dell XPS 9530 with i9 13900H, 64GB RAM, Micron SSD:

[screenshot: CPU utilization in Process Explorer]

If you're not sure which core is which, you can hover over the core. Process Explorer, unlike Task Manager, tells you which logical core belongs to which physical core. Or you can use Sysinternals Coreinfo:

coreinfo.exe output:
Logical Processor to Cache Map:
**------------------  Data Cache          0, Level 1,   48 KB, Assoc  12, LineSize  64
**------------------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**------------------  Unified Cache       0, Level 2,    1 MB, Assoc  10, LineSize  64
********************  Unified Cache       1, Level 3,   24 MB, Assoc  12, LineSize  64
--**----------------  Data Cache          1, Level 1,   48 KB, Assoc  12, LineSize  64
--**----------------  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----------------  Unified Cache       2, Level 2,    1 MB, Assoc  10, LineSize  64
----**--------------  Data Cache          2, Level 1,   48 KB, Assoc  12, LineSize  64
----**--------------  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--------------  Unified Cache       3, Level 2,    1 MB, Assoc  10, LineSize  64
------**------------  Data Cache          3, Level 1,   48 KB, Assoc  12, LineSize  64
------**------------  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
------**------------  Unified Cache       4, Level 2,    1 MB, Assoc  10, LineSize  64
--------**----------  Data Cache          4, Level 1,   48 KB, Assoc  12, LineSize  64
--------**----------  Instruction Cache   4, Level 1,   32 KB, Assoc   8, LineSize  64
--------**----------  Unified Cache       5, Level 2,    1 MB, Assoc  10, LineSize  64
----------**--------  Data Cache          5, Level 1,   48 KB, Assoc  12, LineSize  64
----------**--------  Instruction Cache   5, Level 1,   32 KB, Assoc   8, LineSize  64
----------**--------  Unified Cache       6, Level 2,    1 MB, Assoc  10, LineSize  64
------------*-------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------------*-------  Instruction Cache   6, Level 1,   64 KB, Assoc   8, LineSize  64
------------****----  Unified Cache       7, Level 2,    2 MB, Assoc  16, LineSize  64
-------------*------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
-------------*------  Instruction Cache   7, Level 1,   64 KB, Assoc   8, LineSize  64
--------------*-----  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*-----  Instruction Cache   8, Level 1,   64 KB, Assoc   8, LineSize  64
---------------*----  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*----  Instruction Cache   9, Level 1,   64 KB, Assoc   8, LineSize  64
----------------*---  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
----------------*---  Instruction Cache  10, Level 1,   64 KB, Assoc   8, LineSize  64
----------------****  Unified Cache       8, Level 2,    2 MB, Assoc  16, LineSize  64
-----------------*--  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
-----------------*--  Instruction Cache  11, Level 1,   64 KB, Assoc   8, LineSize  64
------------------*-  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------------*-  Instruction Cache  12, Level 1,   64 KB, Assoc   8, LineSize  64
-------------------*  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
-------------------*  Instruction Cache  13, Level 1,   64 KB, Assoc   8, LineSize  64

You can see that the first 6 pairs of logical processors (the hyper-threaded P-cores) each share their own per-core caches and have larger caches than the remaining 8 cores (the E-cores), which share an L2 cache per cluster of four.

This also happens on ARM devices such as the Samsung Galaxy Book Go (Snapdragon 7c Gen 2, 4GB RAM) and the Surface Pro X (SQ1, 8GB RAM; SQ2, 16GB RAM):

[screenshot: CPU utilization]

On both these devices the problem manifests itself only when using Argon containers and running on battery power. That is:

            AC power              Battery
Host        All cores utilized    All cores utilized
Container   All cores utilized    Only LITTLE cores

Only when running inside a container on battery power, the big cores spike for a moment, like on Alder/Raptor Lake, and then aren't utilized by the container anymore. If the device is connected to AC power they are utilized again, and if it is disconnected they are ignored once more.

(The small fluctuations after the spike are due to the 7c being extremely weak; even running Task Manager and a browser requires non-negligible CPU power. On the Surface Pro X it is smoother after the spike.)

The first six cores are the LITTLE ones and the other two are the big ones:

coreinfo.exe output:
Logical Processor to Cache Map:
*-------  Instruction Cache   0, Level 1,   32 KB, Assoc   4, LineSize  64
*-------  Data Cache          0, Level 1,   32 KB, Assoc   4, LineSize  64
*-------  Unified Cache       0, Level 2,   64 KB, Assoc   4, LineSize  64
********  Unified Cache       1, Level 3,    1 MB, Assoc  16, LineSize  64
-*------  Instruction Cache   1, Level 1,   32 KB, Assoc   4, LineSize  64
-*------  Data Cache          1, Level 1,   32 KB, Assoc   4, LineSize  64
-*------  Unified Cache       2, Level 2,   64 KB, Assoc   4, LineSize  64
--*-----  Instruction Cache   2, Level 1,   32 KB, Assoc   4, LineSize  64
--*-----  Data Cache          2, Level 1,   32 KB, Assoc   4, LineSize  64
--*-----  Unified Cache       3, Level 2,   64 KB, Assoc   4, LineSize  64
---*----  Instruction Cache   3, Level 1,   32 KB, Assoc   4, LineSize  64
---*----  Data Cache          3, Level 1,   32 KB, Assoc   4, LineSize  64
---*----  Unified Cache       4, Level 2,   64 KB, Assoc   4, LineSize  64
----*---  Instruction Cache   4, Level 1,   32 KB, Assoc   4, LineSize  64
----*---  Data Cache          4, Level 1,   32 KB, Assoc   4, LineSize  64
----*---  Unified Cache       5, Level 2,   64 KB, Assoc   4, LineSize  64
-----*--  Instruction Cache   5, Level 1,   32 KB, Assoc   4, LineSize  64
-----*--  Data Cache          5, Level 1,   32 KB, Assoc   4, LineSize  64
-----*--  Unified Cache       6, Level 2,   64 KB, Assoc   4, LineSize  64
------*-  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------*-  Data Cache          6, Level 1,   64 KB, Assoc   4, LineSize  64
------*-  Unified Cache       7, Level 2,  256 KB, Assoc   8, LineSize  64
-------*  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
-------*  Data Cache          7, Level 1,   64 KB, Assoc   4, LineSize  64
-------*  Unified Cache       8, Level 2,  256 KB, Assoc   8, LineSize  64

Expected behavior

All cores utilized.

Configuration:

  • Edition: Windows 11 Enterprise, Windows 11 Pro
  • Base Image being used: Windows Server Core in the original bug; live host image via CmDiag.exe in the reproduction (because it doesn't matter).
  • Container engine: Docker
  • Container Engine version: 23.0.5

Additional context

It certainly doesn't seem to be an issue with the specific container engine, runtime or base image.

Due to the behavior on the ARM64 devices it might be related to power management. Perhaps the container is somehow "confused" about the power state or configuration?
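
One way to check this speculation would be to compare what the host and the container report about the power configuration, for example:

powercfg /getactivescheme
Get-CimInstance -ClassName Win32_Battery | Select-Object BatteryStatus, EstimatedChargeRemaining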

@conioh conioh added the bug Something isn't working label Jul 7, 2023
@ntrappe-msft ntrappe-msft added the triage New and needs attention label Jul 10, 2023
@fady-azmy-msft
Contributor

@Howard-Haiyang-Hao are you familiar with this issue?

@Howard-Haiyang-Hao
Contributor

@fady-azmy-msft, this is the first time I'm hearing of this issue. I am in the process of investigating it and will keep you posted.

@Howard-Haiyang-Hao
Contributor

@conioh I have successfully replicated the issue locally and am collaborating with the feature teams to gain a deeper understanding of the underlying reasons behind this behavior.

@conioh
Author

conioh commented Aug 22, 2023

@Howard-Haiyang-Hao, that's great to hear.

Since opening the issue we have made the following findings, both obviously related to scheduling:

  1. If we restrict the process affinity to the P-cores only (no E-cores), the process is scheduled to the P-cores, but only to half of them, i.e. only one of every two hyper-threaded logical cores. It sometimes even switches between the two logical cores of the same physical core, but it never runs on both logical cores of a single physical core.

  2. If we set the process priority to above normal, the process is scheduled to all cores. (A sketch of both experiments follows below.)
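
For illustration, roughly how these experiments can be run from PowerShell (the process name is a placeholder, and the affinity mask assumes logical processors 0-11 are the P-cores, as on the i9 13900H above):

$p = Get-Process cl | Select-Object -First 1   # placeholder: a compiler process spawned by the build
$p.ProcessorAffinity = [IntPtr]0xFFF           # experiment 1: restrict to logical processors 0-11 (the P-cores)
$p.PriorityClass     = 'AboveNormal'           # experiment 2: raise the priority class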

I hope that helps.

@ntrappe-msft ntrappe-msft removed the triage New and needs attention label Aug 23, 2023
@microsoft-github-policy-service
Contributor

This issue has been open for 30 days with no updates.
@Howard-Haiyang-Hao, please provide an update or close this issue.

@conioh
Author

conioh commented Oct 5, 2023

@Howard-Haiyang-Hao, is there any update?

@Howard-Haiyang-Hao
Contributor

Howard-Haiyang-Hao commented Oct 5, 2023

Thanks @conioh for the updates. Here's the reason why we observed the scheduling behaviors:

In the host case, primesieve.exe was launched with the following parent chain:
… explorer.exe -> WindowsTerminal.exe -> cmd.exe -> primesieve.exe
Where I'm guessing at least one of the ancestors had a window in focus, thus receiving a high (Important) QoS.

In the container case, primesieve.exe has the following parent chain:
… wininit.exe -> services.exe -> cexecsvc.exe -> primesieve.exe
Where no process had a window in focus, thus it looks like a background service and mostly receives a low (Utility) QoS.

By setting the process priority to above normal, we can change the scheduler to treat the container process with high priority, overcoming the QoS behavior that we experienced.

I have initiated an email thread regarding the hyperthreaded core scheduling behaviors you described. I will keep you updated.

Thanks!

@conioh
Author

conioh commented Oct 16, 2023

@Howard-Haiyang-Hao, thanks for the information. That's very interesting.

I made a couple of tests and indeed I see that if I set the process inside the container to HighQoS it utilizes all the cores, and if I set the process outside the container to EcoQoS it utilizes only the E-cores. Unfortunately, unlike priority and affinity, thread QoS isn't inherited between processes as far as I can see, so setting my shell inside the container to HighQoS doesn't help its child processes (primesieve.exe in the example and msbuild.exe etc. in my real-world scenario), and I can't set the QoS on all the processes created as part of the build.

Also, although thread QoS in general explains this behavior, there still seems to be something missing here:

  1. On both ARM machines I tested (Surface Pro X and Samsung Galaxy Book Go), even with the default QoS the processes inside the container utilize all cores, unless on battery power.

  2. According to the documentation, both in Low and in Utility QoS, which you mentioned, the expected behavior is:

    On battery, selects most efficient CPU frequency and schedules to efficient core[s].

    But I'm getting this behavior on wall power rather than on battery power. This is in contrast with EcoQoS which I used in my test, for which the documentation says:

    Always selects most efficient CPU frequency and schedules to efficient cores.

    (Emphasis added by me.)

If I take the documentation to be the intended behavior it seems like the scheduler on ARM does the right thing, while the scheduler on x64 has a bug which reduces the performance on these (Low/Utility) threads even when it shouldn't.

Thank you.

@driver1998

Wait, ARM machines? I don't think Windows Containers supports ARM ;).

#224

@conioh
Author

conioh commented Dec 12, 2023

Wait, ARM machines? I don't think Windows Containers supports ARM ;).

#224

Obviously off-topic here, but you're confusing two meanings of the word "support":

  1. Agreeing to provide humans with assistance in performing a task, and actually providing that assistance, documentation, etc.
  2. In software, having code that performs a task.

Microsoft does not offer support (1) for using Windows containers on ARM64 Windows. They don't offer ARM64 base images for Docker for Windows and so on. Windows 11 for ARM64 supports (2) launching Windows process-isolation ("server silo"-based) containers. The Windows binaries have that code and it works. You can test it yourself with the simple command I provided above. Using that simple command is unsupported (1) by Microsoft.

@conioh
Author

conioh commented Jan 8, 2024

So it's been over two years since the Alder Lake CPUs were released, six months since I opened this issue and two months since an internal email thread on this issue was initiated. Is there any news?

@Howard-Haiyang-Hao
Contributor

@conioh, an internal discussion is underway regarding the treatment of processes running inside containers, aiming to handle them as regular processes instead of background processes. Implementing this change won't be straightforward, as it significantly affects system behaviors. The workaround you suggested, prioritizing the process, appears to address the issue. Regarding ARM containers, they are not officially supported at the moment. I have communicated your feedback to the feature team for an examination of the behaviors you highlighted.

@conioh
Author

conioh commented Jan 12, 2024

@Howard-Haiyang-Hao, treating container foreground processes (e.g. run via docker run -it or CmDiag.exe Console) as foreground processes, in contrast with true background processes (e.g. run via docker exec without -it or CmDiag.exe Exec -NoWait) may be the correct thing to do, but it's not the only problem.

Even true background processes, that legitimately run in utility QoS, should be scheduled to all cores when running on wall power.

Setting the process priority to above normal isn't a viable workaround since the entire system stops responding until the build ends. Since the build is highly parallelized and the compiler doesn't yield CPU, giving it above normal priority causes CPU starvation for all the other processes running at normal priority.

@Howard-Haiyang-Hao
Contributor

Thank you, @conioh. I've relayed the message to the feature team for further discussion.

@conioh
Author

conioh commented May 1, 2024

Is there any news?

@fjs4

fjs4 commented May 21, 2024

Apologies for the delayed response. If you'd like your app to opt out of the "background" QoS behavior, you can use the SetProcessInformation API with the ProcessPowerThrottling information class.

More information can be found here: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setprocessinformation

For example, you should be able to do something similar to the following to opt out of the background QoS:

#include <windows.h>

PROCESS_POWER_THROTTLING_STATE PowerThrottling;
RtlZeroMemory(&PowerThrottling, sizeof(PowerThrottling));
PowerThrottling.Version = PROCESS_POWER_THROTTLING_CURRENT_VERSION;

// Take manual control of execution-speed throttling, but leave the state bit
// clear: this opts the process out of throttling instead of forcing EcoQoS.
PowerThrottling.ControlMask = PROCESS_POWER_THROTTLING_EXECUTION_SPEED;
PowerThrottling.StateMask = 0;

SetProcessInformation(GetCurrentProcess(),
                      ProcessPowerThrottling,
                      &PowerThrottling,
                      sizeof(PowerThrottling));

This should opt the current process out of the heuristics the system would have normally applied to determine the QoS to use for the process. If you need to apply the opt out for a child process you could specify the handle to the child process in the API. Note that you'd need the PROCESS_SET_INFORMATION access right if operating on another process.
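
To make that concrete, one way to apply the opt-out to another process by handle would be something along these lines (a rough PowerShell sketch, not a tested recipe; the target process name is a placeholder, the constants are the ones from the SDK headers, and it assumes the handle carries the PROCESS_SET_INFORMATION right):

Add-Type -Namespace Win32 -Name Qos -MemberDefinition @"
    [StructLayout(LayoutKind.Sequential)]
    public struct PROCESS_POWER_THROTTLING_STATE {
        public uint Version;
        public uint ControlMask;
        public uint StateMask;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool SetProcessInformation(
        IntPtr hProcess,
        int ProcessInformationClass,
        ref PROCESS_POWER_THROTTLING_STATE ProcessInformation,
        int ProcessInformationSize);
"@

$state = New-Object 'Win32.Qos+PROCESS_POWER_THROTTLING_STATE'
$state.Version     = 1   # PROCESS_POWER_THROTTLING_CURRENT_VERSION
$state.ControlMask = 1   # PROCESS_POWER_THROTTLING_EXECUTION_SPEED
$state.StateMask   = 0   # bit left clear = opt out of throttling

$child = Get-Process cl | Select-Object -First 1   # placeholder: one of the build's child processes
[Win32.Qos]::SetProcessInformation($child.Handle, 4, [ref]$state, 12)   # 4 = ProcessPowerThrottling, 12 = size of the struct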

It wasn't clear from the discussion above if this is already something you tried, or if this would work for your needs. Please let me know if you have tried this approach, and it wasn't sufficient for some reason.

Thanks, and I hope this helps.

@conioh
Author

conioh commented May 21, 2024

@fjs4, we're aware of this API and it's not useful in this scenario.

First, MSBuild.exe creates many thousands of MSBuild.exe, cl.exe, link.exe, mspdbsrv.exe etc. child processes during the build. We can't control the thread QoS for all of them. Perhaps if thread QoS were inherited it would have been useful. But it's not, so it's not. As I've written above (emphasis added):

I made a couple of tests and indeed I see that if I set the process inside the container to HighQoS it utilizes all the cores, and if I set the process outside the container to EcoQoS it utilizes only the E-cores. Unfortunately, unlike priority and affinity, thread QoS isn't inherited between processes as far as I can see, so setting my shell inside the container to HighQoS doesn't help its child processes (primesieve.exe in the example and msbuild.exe etc. in my real-world scenario), and I can't set the QoS on all the processes created as part of the build.

(Second, but less critical, the code you've suggested doesn't simply "opt the current process out of the heuristics" but rather forcefully and explicitly sets the QoS to High. Why not Medium? Well, because there's no public API to set the QoS to anything but Eco or High for one. But do we really want High? Not necessarily. We want the default. The correct default that works correctly.)

Third, there's no justification to push this onto us. As I've already said above:

Even true background processes, that legitimately run in utility QoS, should be scheduled to all cores when running on wall power.

And further above:

Also, although thread QoS in general explains this behavior, there still seems to be something missing here:

  1. On both ARM machines I tested (Surface Pro X and Samsung Galaxy Book Go), even with the default QoS the processes inside the container utilize all cores, unless on battery power.

  2. According to the documentation, both in Low and in Utility QoS, which you mentioned, the expected behavior is:

    On battery, selects most efficient CPU frequency and schedules to efficient core[s].

    But I'm getting this behavior on wall power rather than on battery power. This is in contrast with EcoQoS which I used in my test, for which the documentation says:

    Always selects most efficient CPU frequency and schedules to efficient cores.

    (Emphasis added by me.)

If I take the documentation to be the intended behavior it seems like the scheduler on ARM does the right thing, while the scheduler on x64 has a bug which reduces the performance on these (Low/Utility) threads even when it shouldn't.

See also more details in microsoft/Windows-Dev-Performance#117

My employer pays good money to Microsoft and we expect Windows to function at least on some basic level.
Two and a half years since Alder Lake was launched and a year since I opened this issue, I think it's quite reasonable to expect someone to finally fix that BUG in the x86-64 scheduler.

@nbrinks

nbrinks commented Jun 14, 2024

I bumped into this issue recently at work.

My laptop exhibits this problem, but a co-worker's desktop does not. Both CPUs are Raptor Lake, with a mix of performance and efficiency cores.

Laptop:

  • Dell Precision 5680
  • 13th Gen Intel(R) Core(TM) i7-13800H

Desktop:

  • Dell Precision 3660
  • 13th Gen Intel(R) Core(TM) i9-13900K

My use case sounds very similar to @conioh's. I am attempting to run a build and compile software inside a container. Until this is resolved, running a build inside a Windows container on a laptop is not practical. Performance is significantly degraded (e.g. 90-150% longer build times in my experience).

@conioh
Author

conioh commented Jun 17, 2024

Hi @nbrinks.

Thank you so much for this.

I didn't mention it here, but we did check it on a desktop machine and we did encounter the same issue there. We don't have Alder Lake or Raptor Lake desktops at work, but one of my colleagues has a personal custom-built (not from a brand) desktop with an Intel i9 13900K and had the same problem there. To make sure, we tried it again and the problem is still there.

But the data you provided made me think. Now the differentiating parameter wasn't just ARM vs x64, and it also wasn't Dell vs non-Dell. I thought it might be that something in your desktop environment tinkered with the priority or affinity; if you set the process priority to above normal its threads are scheduled to all the cores, and similarly, if you set its affinity to only include P-cores, it also runs on them.

But that was pretty unlikely. I was about to ask you for a bunch of data, including a WPR trace to see what's going on on your desktop. To do that I needed to compile a list of the pieces of information I wanted, and while doing that I seem to have stumbled upon something interesting.

It's not a solution, because it works on one of our laptop models, doesn't work on our desktop, and we haven't tested it on a second laptop model yet, but it's interesting.

Basically, somehow between Microsoft, Intel and Dell, the power management settings ended up configured to schedule Utility QoS threads only to E-cores, contrary to what the documentation says.

It's a bit delicate, not something I consider production-ready (and indeed works on one machine but not on the other 🤷 ), and may require tweaking depending on your specific configuration, so I'm reluctant to share it publicly. If you'd like, you're welcome to contact me privately and we can see if it helps you. Slack might be the best (I'm just assuming you use it too), but other means are possible.

In the meanwhile, if you don't mind, could you share the contents of the following Registry keys?

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PolicyManager\current\device\knobs
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PolicyManager\default\knobs
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\PolicyManager\providers
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\PowerSettings
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\User

(This includes almost exclusively power management configuration related information, at least on my machine, and in particular excludes all non-power related settings under PolicyManager\default, but of course you're welcome to review the information before sharing it.)
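
If it helps, one way to dump them is with reg.exe (a sketch; this writes .reg files to the current directory):

reg export "HKLM\SOFTWARE\Microsoft\PolicyManager\current\device\knobs" knobs-current.reg /y
reg export "HKLM\SOFTWARE\Microsoft\PolicyManager\default\knobs" knobs-default.reg /y
reg export "HKLM\SOFTWARE\Microsoft\PolicyManager\providers" policy-providers.reg /y
reg export "HKLM\SYSTEM\CurrentControlSet\Control\Power\PowerSettings" power-settings.reg /y
reg export "HKLM\SYSTEM\CurrentControlSet\Control\Power\User" power-user.reg /y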
