Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linux] cpu_count_physical is not correct #1620

Closed
sctlee opened this issue Nov 13, 2019 · 6 comments
Closed

[linux] cpu_count_physical is not correct #1620

sctlee opened this issue Nov 13, 2019 · 6 comments

Comments

@sctlee
Copy link

sctlee commented Nov 13, 2019

Platform

  • linux
  • psutil 5.6.2

Bug description
currently, psutil use '/sys/devices/system/cpu/cpu[0-9]*/topology/core_id' to calculate physical cores, but core id is only uniq in one socket.

def cpu_count_physical():
    """Return the number of physical cores in the system."""
    # Method #1
    core_ids = set()
    for path in glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/topology/core_id"):
        with open_binary(path) as f:
            core_ids.add(int(f.read()))
    result = len(core_ids)
    if result != 0:
        return result

Test results

[root@dce100 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    2
座:                 2
[root@dce100 ~]# cat /proc/cpuinfo | grep -E "^(physical|core|process)"
processor	: 0
physical id	: 0
core id		: 0
processor	: 1
physical id	: 0
core id		: 1
processor	: 2
physical id	: 1
core id		: 0
processor	: 3
physical id	: 1
core id		: 1
>>> import psutil
>>> psutil.cpu_count(logical=False)
2
@sctlee sctlee added the bug label Nov 13, 2019
@amanusk
Copy link
Collaborator

amanusk commented Nov 13, 2019

This is also related to #1392 and #1558, currently there is no support for getting the number of socket in the system reliably. The number of sockets can be used as a sanity check for number of physical CPUs and can be interesting by itself.
Perhaps we should take a look at how this is done in lscpu:
https://github.com/karelzak/util-linux/blob/9cca1ef2d5283a461c3a6d9aa1a036bb4ada6e70/sys-utils/lscpu.c#L1078

@DannyVanpoucke
Copy link

Could replacing the "set()" with an array resolve the problem? At least on one system I noticed the core_id to only be present for the physical cores...alternately this may provide a work-around to get the number of sockets as being the double-counting factor. (of course if some systems implement a core_id file for the logical cores as well this won't work)

@Keou0007
Copy link

Keou0007 commented Feb 4, 2020

I've noticed this has broken on my system lately.
I run dual socket systems, some with hyperthreading disabled in the bios. OS is CentOS 6.

Some code I wrote used psutil.cpu_count(logical=False) to ensure I was only spawning as many processes as physical cores, and not using hyperthreading. It used to work, but no longer does.

Now, psutil.cpu_count() will report the number of real or virtual cores available. Which is the number of physical cores if hyperthreading is disabled, else double that.
psutil.cpu_count(logical=False) will only the report the number of physical cores on a single socket. Which is in every case here, half the number of physical cores that actually exist.

@jandrovins
Copy link
Contributor

jandrovins commented Apr 14, 2020

I'm having a related problem.

FAIL: psutil.tests.test_linux.TestSystemCPUCountPhysical.test_against_lscpu
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vincent/OpenSourceDeveloping/psutil/psutil/tests/test_linux.py", line 689, in test_against_lscpu
    self.assertEqual(psutil.cpu_count(logical=False), len(core_ids))
AssertionError: 4 != 2

I have one socket of AMD A10-5757M APU with Radeon(tm) HD Graphics, which counts with two physical cores. I have hyperthreading activated.

I've tested Method two in two nodes each with two sockets running CentOS 8 and Centos 7, with kernels 4.18.0 and 3.10.0, respectively. Also, I tested in my PC running Arch with kernel 5.6.2, and I see no reason for Method two to fail. I think we can drop Method one, but someone who knows better the codebase should have the call.

def cpu_count_physical():
    """Return the number of physical cores in the system."""
    # Method #1
    core_ids = set()
    for path in glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/topology/core_id"):
        with open_binary(path) as f:
            core_ids.add(int(f.read()))
    result = len(core_ids)
    if result != 0:
        return result

    # Method #2
    mapping = {}
    current_info = {}
    with open_binary('%s/cpuinfo' % get_procfs_path()) as f:
        for line in f:
            line = line.strip().lower()
            if not line:
                # new section
                if (b'physical id' in current_info and
                        b'cpu cores' in current_info):
                    mapping[current_info[b'physical id']] = \
                        current_info[b'cpu cores']
                current_info = {}
            else:
                # ongoing section
                if (line.startswith(b'physical id') or
                        line.startswith(b'cpu cores')):
                    key, value = line.split(b'\t:', 1)
                    current_info[key] = int(value)

    result = sum(mapping.values())
    return result or None  # mimic os.cpu_count()

I'm looking forward to contribute. If you want me to make a pull request, please answer this post.

@giampaolo
Copy link
Owner

I didn't investigate this issue yet but if you think you understand the problem and know a solution please make a PR so we'll have more room for discussion.

@giampaolo
Copy link
Owner

Fixed in #1727.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants