-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
add os.cpu_count() #62114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
multiprocessing.cpu_count() implementation should be made available in the os module, where it belongs. |
Note that I think it might be interesting to return 1 if the actual value cannot be determined, since that's a sensible default, and likely what the caller will do anyway. This contrasts with the current multiprocessing's implementation which raised NotImplementedError. |
If you can't determine the number of CPUs, return a clear "can't determine" value, such as 0 or -1. Returning 1 will hide information, and it's an easy default for the caller to apply if they want to. |
I am interested to submit a patch on this. Should I move the implementation to os module and made the multiprocessing one as an alias ? or keep it in both places ? I prefer the idea of returning -1 instead of the current way of raising NotImplementedError in case we can not determine the number of CPU(s). |
Yes, you should move it, add a corresponding documentation to
Seriously, I don't see what this brings. Since the user can't do |
Seriously, return zero, and I can use it as: cpu_count = os.cpu_count() or 1 Why throw away information? |
Returning 0 or None sounds better to me than 1 or -1. |
I also vote +1 for returning None when the information is unknown. Just write "os.cpu_count() or 1" if you need 1 when the count is unknown ;-) |
See also bpo-17444, Trent Nelson wrote an implementation of os.cpu_count(). |
I still don't like it. And returning an exception is IMO useles, since the user can't do
os.cpu_count() or 1 is an ugly idiom.
I don't see exactly what this C implementation brings over the one in |
multiprocessing.cpu_count() creates a subprocess on BSD and Darwin to get the number of CPU. Calling sysctl() or sysctlnametomib() should be faster and use less memory. On Windows, GetSystemInfo() is called instead of reading an environment variable. I suppose that this function is more reliable. Trent's os.cpu_count() returns -1 if the count cannot be read, multiprocessing.cpu_count() raises NotImplementedError. |
Fair enough, I guess we should use it then. We just have to agree on the value to return when the number of CPUs |
Returning None sounds reasonable to me. |
As for why to not return 1, I can imagine code that checks cpu_count, and only if it returns the "don't know" result would it invoke some more expensive method of determining the CPU count on platforms that cpu_count doesn't support. Since the os module is the home for "close to the metal" (well, OS) functions, I agree that it does not make sense to throw away the information that cpu_count can't actually determine the CPU count. Contrawise, I could see the multiprocessing version returning 1, since it is a higher level API and os.cpu_count would be available for those wanting the "don't know" info. |
Based on the conversation and the particular inputs to the thread form neologix and ezio, I would like to submit this patch. It probably needs modification(s) as I am not sure what to do with the implementation that is already present in multiprocessing. This patch simply calls the os.cpu_count() from multiprocessing now and behaves as it would have previously. The test cases are also added to test_os similar to ones from multiprocessing. |
Thanks, but it would be better to reuse Trent's C implementation |
A few small points: Use Use And this I think will throw an exception in Python 3: |
I think the idiom I have added some other comments on Rietveld. |
I agree with Charles-François. An approach using C library functions is far superior to launching external commands. |
And I maintain it's an ugly idiom ;-) |
The user can also raise an error. For example, if I'm writing a |
In general I agree with you. Actually the os module should contains two functions: cpu_count() which fallbacks to 1 and is_cpu_counting_supported() for rare need. But this looks even more ugly and I choose single function even if in most cases I need use strange idiom. |
Appreciate everyone's feedback. I have modified the patch based on further messages in the thread. @neologix @ned: |
Yogesh, didn't you notice comments on Rietveld? |
@serhiy |
Now we have three cpu_count() functions: multiprocessing.cpu_count() raises an exception on failure, posix.cpu_count() returns -1, and os.cpu_count() returns None. It will be easy to get rid of Python wrapper in the os module and return None directly from C code. |
Returning None from C code sounds reasonable to me. Anyone else wants to pitch in with suggestions for/against this? |
Modified patch based on review by neologix |
Here is a new patch (cpu_count.patch) with a different approach:
So os.cpu_count() as a well defined behaviour, and shutil.cpu_count() is the convinient API. My patch is based on bpo-17914-4.patch and so also on Trent Nelson's code. I only tested my patch on Linux. It must be tested on other platforms. If nobody tests the patch on HPUX, it would be safer to remove HPUX support. It looks like Trent's code was not tested, I don't think that his code works on platforms other than Windows. test_os will fail if os.cpu_count() fails. The test should be fixed to handle failures, but I prefer to start with a failing test to check if the error case occurs on a buildbot. |
The return type of the C function sysconf() is long, but Python uses int: I opened the issue bpo-17964 to track this bug. (os.cpu_count() uses sysconf()). |
Do we really need cpu_count() in three places with different semantics? IMO just one version in Modules/posixmodule.c is enough (with
Well, HP-UX isn't officially supported, so that's reasonable.
That's reasonable. |
-1. This is simply too complicated for a simple API. Just let |
a. Let's say someone works on scheduling and power managment modules. It is important to know that the platform does not support providing cpu_count() instead of giving 1. This will ensure that they don't go about erroneously setting wrong options for scheduler and/or overclocking the CPU too much(or too little). b. On the other hand if another user just wants to use a cpu_count number from a his application to determine the number of threads to spawn he can set These are just 2 examples to demonstrate that it must be the end-user who decides what to do with the proper_value or reasonable_error_value given by cpu_count()
|
Modified patch based on further comments and review.
|
Just for giggles, here's the glibc default implementation on non Linux
platforms:
http://sourceware.org/git/?p=glibc.git;a=blob;f=misc/getsysstats.c;hb=HEAD
"""
int
__get_nprocs ()
{
/* We don't know how to determine the number. Simply return always 1. */
return 1;
}
""" And on Linux, 1 is returned as a fallback when you don't have the (The enum discussion enlighted me, endless discussions are so fun!) |
Do you want cpu_count() to return an enum? >>> os.cpu_count()
<CPUCount.Four: 4> |
Python's goal is not to emulate the suboptimal parts of other languages. We have dynamic typing, and so can return None from the same function that returns 1. And we have compact expressions like |
Well, I'm sure they could have returned -1 or 0, which are valid C Furthermore, you're missing the point: since the underlying libraries
That's not because it's feasible that it's a good idea. Dynamic typing is different from no typing: the return type of a What's looks more natural Even in dynamic typing, it's always a good thing to be consistent in Why? With a os.cpu_count() returning a number (or eventually raising an def cpu_count() -> int
[...] What does it become if you can return None instead? For example, there's some discussion to use such signatures or other Basically, you just implement: /*
** [annotation]
** @return int
*/
long cpu_count_impl(void)
{
long result = 1;
#ifdef _SC_NPROCESSORS_CONF
long = sysconf(_SC_NPROCESSORS_ONL);
[...]
#fi
return result;
} And the DSL processor takes care of the rest. What does this become if your return object isn't typed? Really, typing is of paramount importance, even in a dynamically typed language. And pretending to return a distinct value is pretty much useless, Now I hope I made my point, but honestly at this point I don't care |
Well, they can be wrong sometimes, too :-)
The patch doesn't seem to rely on the glibc, so we are fine here.
It's typed, just the type is "int or None". I'm sure some Anyway, I don't mind whether it's None or 0 or -42. But let's not hide |
Based on the last 3 messages by Ned, Charles and Antoine, I keep thinking that arguments made by Charles are very valid ones and that it would be better to return 1. I say this (partly from the 'type' argument, but), mainly, *if* its known that the underlying libraries are returning 1 on failure. *If* that is the case I see no reason try to return None (which, btw, will never happen if none of the calls return non-positive values ever). |
Indeed, as can I ;-)
sysconf(_SC_NPROCESSORS_CONF) is implemented with the above function For other Unix systems apparently they use the sysctl() syscall, and I
I recently started learning Haskell. You have Either and Maybe that
I liked your suggestion of making it an enum: >>> os.cpu_count()
<CPUCount.UnknownCount: 42> Nah, None is fine to me! |
Minor modifications based on review comments.
|
Typo fix |
+1 for returning None. |
New changeset 5e0c56557390 by Charles-Francois Natali in branch 'default': |
Alright, committed. I'm attaching a patch to replace several occurrences of |
In my patch cpu_count.patch, I changed posix_cpu_count():
Sorry for this late review. |
New changeset a85ac58e9eaf by Charles-Francois Natali in branch 'default': |
New changeset f9d815522cdb by Charles-Francois Natali in branch 'default': |
Indeed.
Ouch. This was overlooked when the type was changed to long.
Done in a subsequent commit (especially since the OS-X special case |
New changeset 6a0437adafbd by Charles-François Natali in branch 'default': |
In message "On Windows, GetSystemInfo() is called instead of reading an environment variable. I suppose that this function is more reliable." From my reading, and based on feedback from one of my customers, I believe he is correct and that GetSystemInfo() ought to be used on Windows. (It is available in pywin32 win32api.) |
Please open a new issue to suggest this enhancement, this issue is closed. |
Since this is closed I've created a new issue as requested: |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: