
/sys/devices/system/cpu/online not container aware #301

Closed
ilhaan opened this issue Aug 21, 2019 · 10 comments

Comments

@ilhaan

ilhaan commented Aug 21, 2019

I have noticed that containers see all cores when I inspect /sys/devices/system/cpu/online, even though they have been restricted to a single core. /proc/cpuinfo seems to be reporting correctly.

For example, I have a container test:

root@server:~# lxc exec test -- grep -c "processor" /proc/cpuinfo 
1

This is the expected output. However:

root@server:~# lxc exec test -- cat /sys/devices/system/cpu/online
0-55

I tested this using the following snap-installed versions of LXD:

  • 3.0.4 (from the 3.0/stable channel)
  • 3.16 (from the 3.16/stable channel)

More info from my server:

root@server:~# lsb_release -a 
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic
root@server:~# uname -a 
Linux v45 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@server:~# snap --version 
snap    2.40
snapd   2.40
series  16
ubuntu  18.04
kernel  4.15.0-58-generic

It looks like a pull request addressing this was merged in May 2019, and the comments on that merge suggest it should be enabled by default.

I discovered this while setting up a Kubernetes cluster with Rancher, using LXD containers as nodes, and posted an issue in the Rancher repo. Rancher appears to use /sys/ to enumerate CPU and memory resources on its nodes, which is why I need LXCFS to virtualize this file.

I also posted about this on the LXD forum.

Please let me know if you need me to provide additional information.

@stgraber
Member

Yeah, there is a bug in the generation of the online value...

root@shell01:~# grep lxcfs /proc/mounts
lxcfs /proc/cpuinfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/diskstats fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/loadavg fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/meminfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/stat fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/swaps fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/uptime fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /sys/devices/system/cpu/online fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
root@shell01:~# cat /sys/devices/system/cpu/online
0-7
root@shell01:~# grep ^Cpus_allowed_list /proc/self/status
Cpus_allowed_list:	5-6
root@shell01:~# grep ^processor /proc/cpuinfo 
processor	: 0
processor	: 1
root@shell01:~# stat -f /sys/devices/system/cpu/online
  File: "/sys/devices/system/cpu/online"
    ID: 0        Namelen: 255     Type: fuseblk
Block size: 512        Fundamental block size: 512
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0
root@shell01:~# 

@brauner can you take a look? For some reason we appear to be hitting some fallback case where we get the host value rather than something which matches our task's cpuset configuration.

@ilhaan
Author

ilhaan commented Aug 26, 2019

@stgraber Thanks for looking into this. I just noticed the following:

root@test:~# grep lxcfs /proc/mounts
lxcfs /proc/cpuinfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/diskstats fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/loadavg fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/meminfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/stat fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/swaps fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/uptime fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /var/lib/lxcfs fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0

I don't see /sys/devices/system/cpu/online listed above. I tested this with both LXD 3.0.4 and 3.16 and got the same result.

@time-river
Contributor

I have also run into this problem; the following software uses /sys/devices/system/cpu/online:

  1. # lscpu
  2. nginx, when the pid /run/nginx.pid; directive is used

Note: nginx uses sysconf(_SC_NPROCESSORS_ONLN); to get the CPU count, and strace shows that this reads /sys/devices/system/cpu/online.

So, is there a plan to virtualize /sys/devices/system/cpu/online?
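
To make the difference concrete, here is a minimal C sketch (not from this thread) comparing the value of sysconf(_SC_NPROCESSORS_ONLN), which glibc derives from /sys/devices/system/cpu/online, with the container-aware count from sched_getaffinity(). Inside a cpuset-restricted container where the online file is not virtualized, the two will typically disagree:

/* Minimal sketch (not from this issue): compare the two CPU counts
 * that diverge inside a cpuset-restricted container. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Backed by /sys/devices/system/cpu/online, so it reports the host
     * value unless LXCFS virtualizes that file for the container. */
    long onln = sysconf(_SC_NPROCESSORS_ONLN);

    /* Container-aware: reflects the task's cpuset. */
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    printf("sysconf(_SC_NPROCESSORS_ONLN): %ld\n", onln);
    printf("sched_getaffinity() CPU count: %d\n", CPU_COUNT(&set));
    return 0;
}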

@stgraber
Member

There is support in LXCFS to virtualize /sys/devices/system/cpu/online, as can be seen in my listing above with a suitably recent LXCFS; however, there is an issue with the way the value is rendered.

@yinhongbo

@ilhaan @stgraber Please try the code on the latest master branch; I just submitted a PR that may solve your problem.

@ilhaan
Author

ilhaan commented Sep 13, 2019

@yinhongbo thanks for submitting the PR. I'm not sure how to test this in a snap-installed version of LXD. I'll try to figure it out, but I'd appreciate some pointers on how to do it. Until then, I'll try poking around the LXD 3.17 snap.

@adamszen

adamszen commented Sep 25, 2019

@ilhaan @stgraber Please try the code on the latest master branch; I just submitted a PR that may solve your problem.

Still not working. I start lxcfs (latest master branch):

# lxcfs -l /var/lib/lxcfs/
mount namespace: 5
hierarchies:
  0: fd:   6: cpuset,cpu,cpuacct,blkio,memory,devices,freezer,perf_event,hugetlb,pids,rdma

All mounts are ok:

# mount | egrep "cgroup|lxcfs"
none on /cgroup type cgroup (rw)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,allow_other)

On host:

# cat /proc/cpuinfo  | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11
# cat /sys/devices/system/cpu/online
0-11
# cat /cgroup/cpuset.cpus
0-11

I do:

# echo "1-8" >/cgroup/lxc/cpuset.cpus
# cat /cgroup/lxc/cpuset.cpus
1-8

Then I start the container (name: test):

# lxc-ls 
test

And on the host:

# echo "2" >/cgroup/lxc/test/cpuset.cpus
# cat /cgroup/lxc/test/cpuset.cpus
2

SSH into test and:

root@test:~# grep lxcfs /proc/mounts 
lxcfs /proc/cpuinfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/diskstats fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/loadavg fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/meminfo fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/stat fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/swaps fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /proc/uptime fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
lxcfs /sys/devices/system/cpu/online fuse.lxcfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
root@test:~# cat /proc/cpuinfo | grep processor
processor       : 0

OK, but:

root@test:~# cat /sys/devices/system/cpu/online
0-11

Not ok :(

I think it's something in the cgfs_get_value function in bindings.c - is this function reading the wrong file? (See the sketch after this comment for the kind of rendering I would expect.)
Regards!
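
For illustration only, here is a rough C sketch of how the virtualized value could be derived from the container's cpuset.cpus instead of the host file: count the CPUs in the cpuset list and render them as 0..N-1, matching how /proc/cpuinfo is renumbered above. This is not the actual LXCFS code; the helper names are hypothetical.

/* Rough sketch, not the actual LXCFS implementation.
 * count_cpus_in_list() and render_online() are hypothetical helpers. */
#include <stdio.h>
#include <string.h>

/* Count the CPUs in a cpuset list such as "2" or "1-8,11". */
static int count_cpus_in_list(const char *list)
{
    int count = 0;
    char buf[256];
    strncpy(buf, list, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        int lo, hi;
        if (sscanf(tok, "%d-%d", &lo, &hi) == 2)
            count += hi - lo + 1;
        else if (sscanf(tok, "%d", &lo) == 1)
            count += 1;
    }
    return count;
}

/* Render the container view: N usable CPUs appear as "0-(N-1)",
 * or just "0" when a single CPU is assigned. */
static void render_online(const char *cpuset_cpus, char *out, size_t len)
{
    int n = count_cpus_in_list(cpuset_cpus);
    if (n <= 1)
        snprintf(out, len, "0\n");
    else
        snprintf(out, len, "0-%d\n", n - 1);
}

int main(void)
{
    char out[64];
    render_online("2", out, sizeof(out));    /* container pinned to CPU 2 */
    printf("%s", out);                       /* -> "0" */
    render_online("1-8", out, sizeof(out));
    printf("%s", out);                       /* -> "0-7" */
    return 0;
}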

@yinhongbo


I think I know the reason for this problem. I am submitting a PR and will notify you once it is merged.

@adamszen

It works very well!

Sorry for the gif, but it's easier to show it this way ;)

screen-capture.gif

Thanks for this patch!

@stgraber
Member

stgraber commented Mar 3, 2020

Sounds like this got resolved with #307.

stgraber closed this as completed Mar 3, 2020
azat added a commit to azat-archive/jemalloc that referenced this issue Dec 17, 2021
A deterministic number of CPUs is important for the percpu arena to work correctly, since it uses the CPU index from sched_getcpu(); if that index is greater than the number of CPUs, bad things will happen, or an assertion will fail in a debug build:

    <jemalloc>: ../contrib/jemalloc/src/jemalloc.c:321: Failed assertion: "ind <= narenas_total_get()"
    Aborted (core dumped)

Number of CPUs can be obtained from the following places:
- sched_getaffinity()
- sysconf(_SC_NPROCESSORS_ONLN)
- sysconf(_SC_NPROCESSORS_CONF)

For sched_getaffinity() you may simply use taskset(1) to run the program on a different CPU; if that CPU is not the first one, percpu will work incorrectly, e.g.:

    $ taskset --cpu-list $(( $(getconf _NPROCESSORS_ONLN)-1 )) <your_program>

_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online; LXD/LXC virtualize the /sys/devices/system/cpu/online file [1], so when you run a container with a limited limits.cpus it will bind a randomly selected CPU to it.

  [1]: lxc/lxcfs#301

_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody plays with the dentries there.

So if all three of these are equal, percpu arenas should work correctly.

And a small note regarding _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF: musl uses sched_getaffinity() for both, so this will also increase the entropy.

Also note that you can check whether the percpu arena is really applied using abort_conf:true.

Refs: jemalloc#1939
Refs: ClickHouse/ClickHouse#32806

v2: move malloc_cpu_count_is_deterministic() into
malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
allocations for readdir()
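
For reference, here is a rough sketch of the kind of check the commit message describes: treat the CPU count as deterministic only when sched_getaffinity(), _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF all agree. This is an illustration of the idea, not jemalloc's actual implementation.

/* Rough sketch, not the actual jemalloc code: the CPU count is
 * "deterministic" only when all three sources agree. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool cpu_count_is_deterministic(void)
{
    long conf = sysconf(_SC_NPROCESSORS_CONF);  /* from /sys/devices/system/cpu/cpu* */
    long onln = sysconf(_SC_NPROCESSORS_ONLN);  /* from /sys/devices/system/cpu/online */

    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return false;

    /* Percpu arenas index by sched_getcpu(), so all three views of the
     * CPU count must match or an index can exceed narenas. */
    return conf == onln && onln == (long)CPU_COUNT(&set);
}

int main(void)
{
    printf("deterministic: %s\n", cpu_count_is_deterministic() ? "yes" : "no");
    return 0;
}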
azat added a commit to ClickHouse/jemalloc that referenced this issue Dec 17, 2021
azat added a commit to azat-archive/jemalloc that referenced this issue Dec 18, 2021
azat added a commit to azat-archive/jemalloc that referenced this issue Dec 21, 2021
Lapenkov pushed a commit to jemalloc/jemalloc that referenced this issue Dec 21, 2021
azat added a commit to ClickHouse/jemalloc that referenced this issue Dec 22, 2021
azat added a commit to ClickHouse/jemalloc that referenced this issue Dec 22, 2021
minaripenguin pushed a commit to minaripenguin/android_external_jemalloc-new that referenced this issue Jan 28, 2023