regrtest: log "CPU usage" on Windows #78241
Comments
It would help to debug race conditions on Windows if regrtest logged the "CPU usage", as we do on other platforms (using os.getloadavg()). Links:
Annoyingly, it looks like Windows does not provide an API that gives an average value. There is a counter exposed called "System \ Processor Queue Length", which is the equivalent of Unix's load: https://blogs.technet.microsoft.com/askperf/2008/01/15/an-overview-of-processor-bottlenecks/ But we're going to have to average it ourselves if we want this information.
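Averaging the counter ourselves could follow the Unix-style calculation, which is an exponentially damped moving average. A minimal sketch in Python, assuming the queue length is sampled every 5 seconds and averaged over a 1-minute window. The names `update_load` and `average_samples` and both constants are hypothetical, chosen only for illustration:

```python
import math

SAMPLING_INTERVAL = 5.0  # seconds between samples (assumed value)
LOAD_WINDOW = 60.0       # averaging window in seconds (assumed: 1-minute load)


def update_load(load, queue_length,
                interval=SAMPLING_INTERVAL, window=LOAD_WINDOW):
    """Fold one queue-length sample into the running average."""
    # Older samples decay exponentially, as in the Unix-style calculation.
    decay = math.exp(-interval / window)
    return load * decay + queue_length * (1.0 - decay)


def average_samples(samples, interval=SAMPLING_INTERVAL, window=LOAD_WINDOW):
    """Return the running average after feeding every sample in order."""
    load = 0.0  # an idle system starts at zero load
    for queue_length in samples:
        load = update_load(load, queue_length, interval, window)
    return load
```

With a constant queue length the average converges to that value, and a single spike only moves it by a small fraction, which is the smoothing we want from a load average.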
Taking a shot at this; it should take a day or so.
There is an old ticket for this in psutil with some (possibly useful) references in it:
Also prior conversation:
Thanks a lot for that link Jeremy, it was really helpful. After reading up on it, my take is that _winapi is the most appropriate place for this: it is a non-public API that's used in the stdlib. I've used Windows APIs in a way that means we don't need to manually start up a thread and call a calc_load function; instead, a callback is invoked by Windows. Internally this uses a thread pool, but it means we don't have to worry about managing the thread ourselves. The load is stored as a global, but the _winapi module is already marked as "-1", indicating it has global state, so that shouldn't be a problem. https://docs.python.org/3/c-api/module.html#c.PyModuleDef.m_size Like Jeremy noted, using WMI does add a 5 MB overhead or so to the calling process. One more caveat is that the PdhAddEnglishCounterW function is only available in Vista+. I'm not sure if we still support Windows XP, but the alternative is to use PdhAddCounter, which breaks if the system language is not English because the counter paths are localized.
Opened a PR with a proof of concept to get feedback on whether this approach is reasonable. Giampaolo, on your psutil issue you specifically said "(possibly without using WMI)". Is there any particular problem with using WMI?
Performance. In general WMI is (a lot) slower than the Windows API counterpart (psutil never uses WMI except in unit tests). I don't know if this matters for this specific issue though, or whether a corresponding Windows API for doing the same thing exists.
Aah, yeah, I don't think there's a good way of doing it purely from the Windows API. There might be a way to enumerate all the processes and see if they're queued up, but I didn't look into it. In this case it should be fine: we just pay a bit of WMI cost to initialize the query, and all the updating and retrieval is done asynchronously to the Python code.
Not that it matters all that much, but from a terminology standpoint, WMI != PDH != Performance Counters. Performance counters (the objects, not the topic) are provided by DLLs registered in the HKLM\SYSTEM\CurrentControlSet\Services key. Their data is accessed via registry API functions using the HKEY_PERFORMANCE_DATA root key. PDH (Performance Data Helper) provides an abstraction layer that can access those values, among other things like a GUI or writing to log files. WMI (Windows Management Instrumentation) is yet another layer on top of PDH and raw Performance Counters. In the case of the "System" performance counter object, it is provided by a performance DLL (perfos.dll in the case of Win10 1803). If overhead (memory and/or CPU) is a concern, then accessing the counter data via the registry is the way to go.
I suppose there is no way to emulate os.getloadavg() on Windows because that would necessarily imply using a thread to call the necessary routine (WMI, PDH, whatever...) every X secs or something, correct?
Correct. Windows provides the building blocks for implementing getloadavg(), but does not provide an interface that does the averaging. That is left to each application. The best that an application can do for that is to use thread pools. You can think of thread pools as kernel-managed threads (different from user-managed threads via CreateThread()). As of Win10 1703, any process linked with DLLs automatically has a thread pool created for it (to parallelize the loading of said DLLs). Leveraging that feature would minimize the costs incurred to do the running average.
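For comparison, the plain user-managed thread variant (the one the thread pool approach is meant to avoid) can be sketched in Python. This is only an illustration under stated assumptions: `LoadTracker` and `sample_func` are hypothetical names, and `sample_func` stands in for whatever routine (WMI, PDH, registry) actually reads the queue length.

```python
import math
import threading


class LoadTracker:
    """Keep a running average of sample_func() using a background thread."""

    def __init__(self, sample_func, interval=5.0, window=60.0):
        self._sample_func = sample_func  # hypothetical reader of the counter
        self._interval = interval
        self._decay = math.exp(-interval / window)
        self._load = 0.0
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop takes one sample per interval until asked to stop.
        while not self._stop.wait(self._interval):
            sample = self._sample_func()
            with self._lock:
                self._load = (self._load * self._decay
                              + sample * (1.0 - self._decay))

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def getloadavg(self):
        with self._lock:
            return self._load
```

The cost being discussed is visible here: the process owns an extra thread for its whole lifetime just to wake up every few seconds.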
Is the function I used for the callback, RegisterWaitForSingleObject https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-registerwaitforsingleobject
The RegisterWaitForSingleObject() function does use the thread pool API: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/thread-pool-api However, PdhCollectQueryDataEx() also creates a user-space thread to handle its work of setting the event whenever the timeout elapses. I would think that it should be preferable to use a single thread pool worker to retrieve the queue length and calculate the average.
Roger, personally I don't think it's worth it to complicate the code in order to use the thread pool API, especially considering this is just a private API and the only consumer will be the test suite runner. Let's see what the core devs think when this gets reviewed.
Logging the current value can be an acceptable compromise. When we run tests in subprocesses (python3 -m test -jN), we can run a test every N milliseconds with no thread. It's OK if the average is not accurate. Most buildbots use -jN for performance but also to isolate the main processes from crashes and hard timeouts.
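One way to read this compromise is to refresh the value from the main regrtest loop, at most once per interval, and log whatever was read last. A rough sketch with no thread involved; `PolledValue`, `sample_func`, and `min_interval` are all hypothetical names, not existing regrtest code:

```python
import time


class PolledValue:
    """Cache a sampled value and refresh it at most once per min_interval."""

    def __init__(self, sample_func, min_interval=1.0, clock=time.monotonic):
        self._sample_func = sample_func  # hypothetical reader of the counter
        self._min_interval = min_interval
        self._clock = clock              # injectable so it can be tested
        self._last_time = None
        self._value = None

    def get(self):
        now = self._clock()
        # Re-sample only on the first call or once the interval has elapsed.
        if (self._last_time is None
                or now - self._last_time >= self._min_interval):
            self._value = self._sample_func()
            self._last_time = now
        return self._value
```

The main loop would call `get()` each time it wakes up to check on its worker processes, so the reading is only as fresh as the loop is frequent.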
Processor Queue Length: Corresponds to the number of threads waiting for processor time. A processor bottleneck develops when threads of a process require more processor cycles than are available. If more than a few processes attempt to utilize the processor's time, you might need to install a faster processor. Or, if you have a multiprocessor system, you could add a processor. When you examine processor usage, consider the type of work that the instance of SQL Server performs. If SQL Server performs many calculations, such as queries involving aggregates or memory-bound queries that require no disk I/O, 100 percent of the processor's time can be used. If this causes the performance of other applications to suffer, try changing the workload. For example, dedicate the computer to running the instance of SQL Server. Usage rates around 100 percent, where many client requests are being processed, may indicate that processes are queuing up, waiting for processor time, and causing a bottleneck. Resolve the problem by adding faster processors. -- Is it exactly the same thing as the load average on Unix? If not, I would prefer to use a different name than "loadavg" in regrtest. Maybe "PQL avg"? What is the impact of the number of CPUs on this value?
I don't think taking instantaneous values instead of averaging will work out too well. For reference I've attached a screenshot. It shows values sampled every second, first on an unloaded computer and then while running prime95 for CPU stress testing. The load tends to peak and fall.
Indeed it is: https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_load_calculation "An idle computer has a load number of 0 (the idle process isn't counted). Each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1." From what I can tell, the number of processors is dealt with the same way as on Linux; that is, a single-core processor is overloaded when the load is >1 and a quad-core processor is overloaded when the load is >4.
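The rule of thumb above amounts to comparing the load with the number of logical CPUs. A small sketch of that check; `is_overloaded` is a hypothetical helper, not something regrtest provides:

```python
import os


def is_overloaded(load, cpu_count=None):
    """Return True when the load exceeds the number of logical CPUs."""
    if cpu_count is None:
        # os.cpu_count() can return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    return load > cpu_count
```

For example, `is_overloaded(1.5, cpu_count=1)` is true, while `is_overloaded(3.0, cpu_count=4)` is false.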
I found a command to get the CPU usage in percent *per* CPU. Here with 2 CPUs:
vstinner@WIN C:\vstinner\python\master>wmic cpu get loadpercentage
This is the WMI solution we are trying to avoid. But then again, if it's solely for our tests, perhaps the best way to approach this is to start a Python thread that periodically runs this command? I also haven't seen it suggested, but perhaps GetProcessTimes (https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-getprocesstimes) (or GetThreadTimes) would provide enough data to derive the same information?
My intent is to get an idea of whether the whole system is busy, not whether the current Python process is busy. Most buildbots run tests with multiple worker processes (at least 2).
psutil exposes this functionality as "psutil.cpu_percent()": I'm not sure if it's worth it to copy all that stuff into Modules/_winapi.c and test/libregrtest/main.py though. It would probably be simpler to change the policy and allow (at least some) third-party libs in CPython's test suite. =)
This sounds like a very good solution to me; it avoids adding the complexity of the C code. We actually have two options here. To keep the results consistent with the Unix load, we can use To get CPU usage, we can use the command Victor posted. I'll make an alternative PR with that today just so we can contrast the two approaches.
I'm actually totally okay with this, as I'd really like to have JUnit XML output from the test suite, which is easiest to do with the existing third-party libraries. Can we formalize a way by which optional third-party libraries are allowed? Provided they aren't critical for the overall pass/fail state of the test suite (or the more strict alternative: pass/fail state of *each* test), I don't see any particular harm in certain site packages being used. (This is probably a discussion for python-dev, assuming the policy is written down somewhere.)
Processor Queue Length seems simpler and easier to read. I don't want to log 24 numbers per regrtest output line if a machine has 24 CPUs... The load average is a "raw" value to give an idea of whether the system is "loaded" or not. More precise metrics can be used later to debug a test failure, but manually.
PR 8287 seems short to me, and it seems like psutil doesn't expose Processor Queue Length, so I'm not sure why we are talking about depending on psutil?
I opened #8357 with this strategy. In my opinion it's a lot nicer and doesn't clutter up _winapi with half-baked, extremely specialized functions. It's a bit more involved than running a thread; details are on the PR.
I'm not sure if you're strictly interested in getting system load or if CPU utilization is also fine. FWIW, with psutil you would be able to get the system-wide CPU utilization over a given period of time:
>>> import psutil, time
>>> psutil.cpu_percent(interval=None) # non-blocking
0.0
>>> time.sleep(60)
>>> psutil.cpu_percent(interval=None) # average of the last 60 secs
23.6
...and you can do the same for the current process too (psutil.Process().cpu_percent()).
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.