Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::thread::available_parallelism() return 0 #115868

Closed
lbl8603 opened this issue Sep 15, 2023 · 19 comments · Fixed by #115946
Closed

std::thread::available_parallelism() return 0 #115868

lbl8603 opened this issue Sep 15, 2023 · 19 comments · Fixed by #115946
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-linux Operating system: Linux O-MIPS Target: MIPS processors T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@lbl8603
Copy link

lbl8603 commented Sep 15, 2023

available_parallelism() return 0

I tried this code:

fn main() {
    let num = dbg!(std::thread::available_parallelism());
    println!("available_parallelism={:?}",num);
   println!("core num = {:?}",std::thread::available_parallelism());
}

rustc --version --verbose:

rustc 1.73.0-nightly (d8899c577 2023-07-11)
binary: rustc
commit-hash: d8899c577bc11308a0db96e8378373be93330a8f
commit-date: 2023-07-11
host: x86_64-unknown-linux-gnu
release: 1.73.0-nightly
LLVM version: 16.0.5

JKRM7E(W{4511P16FSRSR

cat /proc/cpuinfo


cat /proc/cpuinfo
system type		: MT7621
machine			: TGNET_RN1150W
processor		: 0
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
core			: 0
VPE			: 0
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 1
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
core			: 0
VPE			: 1
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 2
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
core			: 1
VPE			: 0
VCED exceptions		: not available
VCEI exceptions		: not available

processor		: 3
cpu model		: MIPS 1004Kc V2.15
BogoMIPS		: 583.68
wait instruction	: yes
microsecond timers	: yes
tlb_entries		: 32
extra interrupt vector	: yes
hardware watchpoint	: yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa			: mips1 mips2 mips32r1 mips32r2
ASEs implemented	: mips16 dsp mt
shadow register sets	: 1
kscratch registers	: 0
core			: 1
VPE			: 1
VCED exceptions		: not available
VCEI exceptions		: not available

uname -a
Linux Router 3.10.14 #2 SMP Fri Oct 19 18:02:45 CST 2018 mips GNU/Linux

@lbl8603 lbl8603 added the C-bug Category: This is a bug. label Sep 15, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 15, 2023
@Urgau
Copy link
Member

Urgau commented Sep 15, 2023

Looking at the relevant code, it appears that if lib::CPU_COUNT returned 0 it would indeed be a problem, but why would that be?

let count = libc::CPU_COUNT(&set) as usize;

Could you try to run this small program, and return us the output of it?

fn main() {
    let mut set: libc::cpu_set_t = unsafe { std::mem::zeroed() };
    unsafe {
        if dbg!(libc::sched_getaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &mut set)) == 0 {
            dbg!(libc::CPU_COUNT(&set));
        }
    }
}

If it prints out 0 for the CPU_COUNT could you also attach dbg!(&set).

@rustbot label +T-libs +I-prioritize +O-linux -needs-triage

@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. O-linux Operating system: Linux T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Sep 15, 2023
@saethlin saethlin added the O-MIPS Target: MIPS processors label Sep 15, 2023
@lbl8603
Copy link
Author

lbl8603 commented Sep 15, 2023

DLGM@P2ZANP}AUONG23YQVT

@saethlin
Copy link
Member

Can you post the output of running your test program with strace -f ./core_test? I'm just hoping you have some way to get strace onto this router 🙏

@Urgau
Copy link
Member

Urgau commented Sep 15, 2023

Right, that's what I thought. Let's try to pin point where the bug is by trying the same thing in C.

Could you try to run this small program, and return us the output of it?

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main() {
    cpu_set_t set;

    printf("sizeof(set) = %d\n", sizeof(set));

    int ret = sched_getaffinity(0, sizeof(set), &set);
    printf("sched_getaffinity = %d\n", ret);

    int count = CPU_COUNT(&set);
    printf("CPU_COUNT(&set) = %d\n", count);

    return 0;
}

@lbl8603
Copy link
Author

lbl8603 commented Sep 15, 2023

I think for any reason, when the value of 'CPU_COUNT' is 0, 'available_parallelism' should return Err

@lbl8603
Copy link
Author

lbl8603 commented Sep 15, 2023

Sorry, I don't know how to compile C program onto the router

@Urgau
Copy link
Member

Urgau commented Sep 15, 2023

I think for any reason, when the value of 'CPU_COUNT' is 0, 'available_parallelism' should return Err

Well yes, but we shouldn't get 0 for starter.

Sorry, I don't know how to compile C program onto the router

Hum, that's a bummer. Then I don't know how to debug this further.

@lbl8603
Copy link
Author

lbl8603 commented Sep 15, 2023

Can you provide the compiled binary for me to run? The system environment has been given

@Urgau
Copy link
Member

Urgau commented Sep 15, 2023

Can you provide the compiled binary for me to run? The system environment has been given

I'm sorry, I don't know your environment, and I haven't used any MIPS.

However if you able to compile with rustix as a dependency, we might be able to continue debugging.

Could you try to run this small program, and return us the output of it?

fn main() {
    dbg!(std::mem::size_of::<libc::cpu_set_t>());
    dbg!(std::mem::size_of::<rustix::process::CpuSet>());
    
    if let Ok(cpu_set) = dbg!(rustix::process::sched_getaffinity(None)) {
        dbg!(cpu_set.count());
    }
}
Cargo.toml if needed
[package]
name = "available_parallelism"
version = "0.1.0"
edition = "2021"

[dependencies]
libc = "0.2"
rustix = { version = "0.38", features = ["process"] }

@taiki-e
Copy link
Member

taiki-e commented Sep 15, 2023

Related: seanmonstar/num_cpus#69

Unlike num_cpus, which returns usize, std returns NoneZeroUsize, so this is a soundness issue.

return Ok(NonZeroUsize::new_unchecked(count));

@rustbot label +I-unsound

@rustbot rustbot added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Sep 15, 2023
@thomcc
Copy link
Member

thomcc commented Sep 15, 2023

std should probably just handle this by returning Err if NonZeroUsize::new fails rather than using NonZeroUsize::new_unchecked. It's a weird issue (obviously there is at least 1 CPU if code is running) but who knows.

@saethlin
Copy link
Member

Maybe it's worth saying this explicitly: As far as I can tell by reading the man pages, this is a bug in glibc or Linux. The man pages for sched_getaffinity and CPU_COUNT don't look like there is any wiggle room for a platform to behave like this.

Has anyone tried reporting this upstream? This issue is not the first time this problem has been encountered, but it looks to me like everyone else has papered over it.

@saethlin
Copy link
Member

If we're down for cursed things, this program will print the raw bytes in the cpuset which would at least let us know if the kernel is putting anything in there:

use libc::{cpu_set_t, sched_getaffinity, CPU_COUNT};
use std::mem::size_of;

fn main() {
    unsafe {
        let mut set: cpu_set_t = std::mem::zeroed();

        let ret = sched_getaffinity(0, size_of::<cpu_set_t>(), &mut set);
        dbg!(ret);
        // Assume the kernel doesn't deinit any bytes in the set
        let set_as_bytes = std::mem::transmute::<_, &[u8; std::mem::size_of::<cpu_set_t>()]>(&set);
        println!("{:?}", set_as_bytes);

        let count = CPU_COUNT(&set);
        dbg!(count);
    }
}

@thomcc
Copy link
Member

thomcc commented Sep 15, 2023

libc::CPU_COUNT is implemented in pure rust with straightforward code: https://docs.rs/libc/latest/src/libc/unix/linux_like/linux/mod.rs.html#3988-3999

I'm not sure I think there's a way for that code to fail to miss any bits.

@the8472
Copy link
Member

the8472 commented Sep 16, 2023

@lbl8603 you haven't mentioned which target-triple you're compiling for. Some are tier 2 and some are tier 3. Some use glibc, some musl. This could be relevant if it's a bug in libc.

@the8472
Copy link
Member

the8472 commented Sep 16, 2023

Linux Router 3.10.14

This kernel version has reached EOL and is below the minimum kernel version for the tier 2 mips-gnu targets.

@ryanavella
Copy link

ryanavella commented Sep 18, 2023

std should probably just handle this by returning Err if NonZeroUsize::new fails rather than using NonZeroUsize::new_unchecked. It's a weird issue (obviously there is at least 1 CPU if code is running) but who knows.

The docs for available_parallelism already mention that it returns an estimate, rather than an accurate count. I take this mean a lower bound, so a fallback of 1 would be consistent with my expectations.

But I'll admit the docs do also seem to contradict themselves a few paragraphs later:

This function will, but is not limited to, return errors in the following cases ... If the amount of parallelism is not known ...

@saethlin
Copy link
Member

Regardless of whether this kernel is EOL (we support plenty of EOL kernels) and whether the wobbliness of available_parallelism applies here (I'm sure it doesn't, and it would be terrible for us to retcon this as permitted behavior), I think the problem here is that your kernel is buggy.

A quick Google search for "mips sched_getaffinity" turns up this patch:
https://lkml.iu.edu/hypermail/linux/kernel/1509.0/02527.html
https://scm.linefinity.com/common/openwrt/commit/473ddd592c55c4fcafc27b61ee17905377d336b1

Just going off vibes, this looks like it fixes the bug we are talking about. As was pointed out correctly above, the kernel you're using is not supported by this target, and upgrading to a supported kernel would bring in the patch above. Can you confirm that upgrading to a supported kernel (at least 4.4)?

If an upgrade to at least Linux 4.4 fixes this issue, I do not think we should try to fall back to returning 1 if CPU_COUNT says 0. Might be nice to panic instead of executing UB though.

@the8472
Copy link
Member

the8472 commented Sep 18, 2023

Yeah, panic makes sense. We should notify users that the system is bugged and unsupported.
I'll make a PR.

rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 19, 2023
Rollup merge of rust-lang#115946 - the8472:panic-on-sched_getaffinity-bug, r=Mark-Simulacrum

panic when encountering an illegal cpumask in thread::available_parallelism

Fixes rust-lang#115868 by panicking instead of returning an invalid `NonZeroUsize`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Sep 22, 2023
…-bug, r=cuviper

Fall back to _SC_NPROCESSORS_ONLN if sched_getaffinity returns an empty mask

Followup to rust-lang#115946
A gentler fix for rust-lang#115868, one that doesn't panic, [suggested on zulip](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202023-09-19/near/391942927)

In that situation - on the buggy kernel versions - a zero-mask means no affinities have been set so `_SC_NPROCESSORS_ONLN` provides the right value.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Sep 22, 2023
Rollup merge of rust-lang#116038 - the8472:panic-on-sched_getaffinity-bug, r=cuviper

Fall back to _SC_NPROCESSORS_ONLN if sched_getaffinity returns an empty mask

Followup to rust-lang#115946
A gentler fix for rust-lang#115868, one that doesn't panic, [suggested on zulip](https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting.202023-09-19/near/391942927)

In that situation - on the buggy kernel versions - a zero-mask means no affinities have been set so `_SC_NPROCESSORS_ONLN` provides the right value.
@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-linux Operating system: Linux O-MIPS Target: MIPS processors T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants