HWLOC_DEBUG=1 breaks hwloc on MacOS. #564

Closed
FunMiles opened this issue Jan 31, 2023 · 5 comments

@FunMiles

I discovered this issue after creating a MacOS CMakeLists.txt based on the Windows one. The Windows CMakeLists.txt turns on HWLOC_DEBUG when the code is compiled in debug mode. The issue does not seem to be specific to the CMake build; it also occurs with autotools.
Notes:

  • I am not sure what the proper way to turn on the flag with configure is. I forced the flag by overriding the compiler (see below).
  • Some test programs other than the simple example below also crash, but this sample shows the crash faster.
  • Using HWLOC_DEBUG_VERBOSE=1 on code compiled without HWLOC_DEBUG does not trigger the same issue. It seems the HWLOC_DEBUG flag has to be defined for it to crash.
  • Since the issue seems to occur only with HWLOC_DEBUG turned on, it is not a show-stopper, but IMO that flag should either not make things crash or be absent from the code.

What version of hwloc are you using?

3.0.0a1-git
commit 7987eb4

Which operating system and hardware are you running on?

Darwin Michels-MacBook-Pro.local 22.3.0 Darwin Kernel Version 22.3.0: Thu Jan 5 20:53:49 PST 2023; root:xnu-8792.81.2~2/RELEASE_X86_64 x86_64

Details of the problem

Steps to reproduce:

  1. configure with HWLOC_DEBUG turned on: CC="gcc -DHWLOC_DEBUG" <path to hwloc>/configure --prefix=`pwd`/dbg
  2. compile and install: make -j 16 && make install
  3. create the following test code file test.cpp:
#include <cstdlib>
#include <iostream>
#include <thread>

#include <hwloc.h>

inline
int numCores()
{
        hwloc_topology_t topology;
        hwloc_cpuset_t cpuset;
        hwloc_obj_t obj;

        /* Allocate and initialize topology object. */
        hwloc_topology_init(&topology);

        /* ... Optionally, put detection configuration here to ignore
           some objects types, define a synthetic topology, etc....

           The default is to detect all the objects of the machine that
           the caller is allowed to access.  See Configure Topology
           Detection. */
        hwloc_topology_set_all_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_NONE);
        hwloc_topology_set_type_filter(topology, HWLOC_OBJ_CORE, HWLOC_TYPE_FILTER_KEEP_ALL);
        /* Perform the topology detection. */
        hwloc_topology_load(topology);

        /* Optionally, get some additional topology information
           in case we need the topology depth later. */
        auto topodepth = hwloc_topology_get_depth(topology);
        // Try to get the number of CPU cores from topology
        int depth = hwloc_get_type_depth(topology, HWLOC_OBJ_CORE);
        int nCores = -1;
        if (depth == HWLOC_TYPE_DEPTH_UNKNOWN)
                nCores = -std::thread::hardware_concurrency();
        else
                nCores = hwloc_get_nbobjs_by_depth(topology, depth);

        // Destroy topology object and return
        hwloc_topology_destroy(topology);
        return nCores;
}

inline
int default_num_threads() { return std::abs(numCores()); }

int main() {
    std::cout << "Number of threads: " << default_num_threads() << std::endl;
}
  4. compile the test code: clang++ -std=c++20 test.cpp -Ldbg/lib -lhwloc -Idbg/include -framework CoreFoundation -framework IOKit
  5. run the code: ./a.out and it crashes

Additional information

sysctl hw

hw.ncpu: 16
hw.byteorder: 1234
hw.memsize: 68719476736
hw.activecpu: 16
hw.perflevel0.physicalcpu: 8
hw.perflevel0.physicalcpu_max: 8
hw.perflevel0.logicalcpu: 16
hw.perflevel0.logicalcpu_max: 16
hw.perflevel0.l1icachesize: 32768
hw.perflevel0.l1dcachesize: 32768
hw.perflevel0.l2cachesize: 262144
hw.perflevel0.cpusperl2: 2
hw.perflevel0.l3cachesize: 16777216
hw.perflevel0.cpusperl3: 16
hw.perflevel0.name: Standard
hw.features.allows_security_research: 0
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.supplementalsse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.x86_64: 1
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.rdrand: 1
hw.optional.f16c: 1
hw.optional.enfstrg: 1
hw.optional.fma: 1
hw.optional.avx2_0: 1
hw.optional.bmi1: 1
hw.optional.bmi2: 1
hw.optional.rtm: 0
hw.optional.hle: 0
hw.optional.adx: 1
hw.optional.mpx: 0
hw.optional.sgx: 0
hw.optional.avx512f: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512bw: 0
hw.optional.avx512vl: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
hw.physicalcpu: 8
hw.physicalcpu_max: 8
hw.logicalcpu: 16
hw.logicalcpu_max: 16
hw.cputype: 7
hw.cpusubtype: 8
hw.cpu64bit_capable: 1
hw.cpufamily: 260141638
hw.cpusubfamily: 0
hw.cacheconfig: 16 2 2 16 0 0 0 0 0 0
hw.cachesize: 68719476736 32768 262144 16777216 0 0 0 0 0 0
hw.pagesize: 4096
hw.pagesize32: 4096
hw.busfrequency: 400000000
hw.busfrequency_min: 400000000
hw.busfrequency_max: 400000000
hw.cpufrequency: 2400000000
hw.cpufrequency_min: 2400000000
hw.cpufrequency_max: 2400000000
hw.cachelinesize: 64
hw.l1icachesize: 32768
hw.l1dcachesize: 32768
hw.l2cachesize: 262144
hw.l3cachesize: 16777216
hw.tbfrequency: 1000000000
hw.packages: 1
hw.use_kernelmanagerd: 1
hw.serialdebugmode: 0
hw.nperflevels: 1
hw.targettype: Mac
hw.cputhreadtype: 1

sysctl machdep

machdep.vectors.timer: 221
machdep.vectors.IPI: 222
machdep.pmap.hashwalks: 371183049
machdep.pmap.hashcnts: 379940942
machdep.pmap.hashmax: 16
machdep.pmap.kernel_text_ps: 4096
machdep.pmap.kern_pv_reserve: 16000
machdep.memmap.Conventional: 68608507904
machdep.memmap.RuntimeServices: 1511424
machdep.memmap.ACPIReclaim: 393216
machdep.memmap.ACPINVS: 790528
machdep.memmap.PalCode: 0
machdep.memmap.Reserved: 91496448
machdep.memmap.Unusable: 0
machdep.memmap.Other: 0
machdep.tsc.nanotime.tsc_base: 55802224609698
machdep.tsc.nanotime.ns_base: 369037310150163
machdep.tsc.nanotime.scale: 1789569706
machdep.tsc.nanotime.shift: 0
machdep.tsc.nanotime.generation: 31
machdep.tsc.frequency: 2400000000
machdep.tsc.deep_idle_rebase: 1
machdep.tsc.at_boot: 44521694
machdep.tsc.rebase_abs_time: 11086057180
machdep.misc.fast_uexc_support: 1
machdep.misc.panic_restart_timeout: 2147483647
machdep.misc.interrupt_latency_max: 0x0 0x49 0x33eda8
machdep.misc.timer_queue_trace:
machdep.misc.nmis: 0
machdep.xcpm.mode: 1
machdep.xcpm.pcps_mode: 0
machdep.xcpm.hard_plimit_max_100mhz_ratio: 50
machdep.xcpm.hard_plimit_min_100mhz_ratio: 8
machdep.xcpm.soft_plimit_max_100mhz_ratio: 50
machdep.xcpm.soft_plimit_min_100mhz_ratio: 8
machdep.xcpm.tuib_plimit_max_100mhz_ratio: 50
machdep.xcpm.tuib_plimit_min_100mhz_ratio: 8
machdep.xcpm.lpm_plimit_max_100mhz_ratio: 26
machdep.xcpm.tuib_enabled: 0
machdep.xcpm.lpm_enabled: 0
machdep.xcpm.power_source: 0
machdep.xcpm.bootplim: 0
machdep.xcpm.bootpst: 50
machdep.xcpm.tuib_ns: 0
machdep.xcpm.vectors_loaded_count: 1
machdep.xcpm.ratio_change_ratelimit_ns: 3000000
machdep.xcpm.ratio_changes_total: 29483176
machdep.xcpm.maxbusdelay: 4294967295
machdep.xcpm.maxintdelay: 0
machdep.xcpm.mid_applications: 0
machdep.xcpm.mid_relaxations: 0
machdep.xcpm.mid_mode: 1
machdep.xcpm.mid_cst_control_limit: 0
machdep.xcpm.mid_mode_active: 0
machdep.xcpm.mbd_mode: 1
machdep.xcpm.mbd_applications: 14
machdep.xcpm.mbd_relaxations: 32
machdep.xcpm.forced_idle_ratio: 100
machdep.xcpm.forced_idle_period: 30000000
machdep.xcpm.deep_idle_log: 0
machdep.xcpm.qos_txfr: 1
machdep.xcpm.deep_idle_count: 26
machdep.xcpm.deep_idle_last_stats: 0:03:25 CC7:99% C2:0% C3:0% C6:0% C7:0% C8:0% C9:0% C10:99%
machdep.xcpm.deep_idle_total_stats: 12:30:31 CC7:99% C2:0% C3:0% C6:0% C7:0% C8:0% C9:0% C10:99%
machdep.xcpm.cpu_thermal_level: 17
machdep.xcpm.gpu_thermal_level: 0
machdep.xcpm.io_thermal_level: 0
machdep.xcpm.io_control_engages: 0
machdep.xcpm.io_control_disengages: 0
machdep.xcpm.io_filtered_reads: 0
machdep.xcpm.pcps_rt_override_mode: 0
machdep.xcpm.io_cst_control_enabled: 1
machdep.xcpm.ring_boost_enabled: 0
machdep.xcpm.io_epp_boost_enabled: 1
machdep.xcpm.epp_override: 0
machdep.xcpm.perf_hints: 0
machdep.xcpm.pcps_rt_override_ns: 0
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.tsc_ccc.numerator: 200
machdep.cpu.tsc_ccc.denominator: 2
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 286531872
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.xsave.extended_state: 31 832 1088 0
machdep.cpu.xsave.extended_state1: 15 832 256 0
machdep.cpu.arch_perf.version: 4
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 4
machdep.cpu.cache.size: 256
machdep.cpu.max_basic: 22
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
machdep.cpu.family: 6
machdep.cpu.model: 158
machdep.cpu.extmodel: 9
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 13
machdep.cpu.feature_bits: 9221959987971750911
machdep.cpu.leaf7_feature_bits: 43804591 1073741824
machdep.cpu.leaf7_feature_bits_edx: 3154120192
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.signature: 591597
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT SGXLC MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 244
machdep.cpu.processor_flag: 5
machdep.cpu.core_count: 8
machdep.cpu.thread_count: 16
machdep.user_idle_level: 0
machdep.x2apic_enabled: 0
machdep.eager_timer_evaluations: 89913
machdep.eager_timer_evaluation_max: 1171210
machdep.x86_fp_simd_isr_uses: 0
machdep.uncore_sample_state: 0
machdep.uncore_sample_mask: 1
machdep.uncore_sample_ctl: 0
machdep.uncore_sample_interval_ms: 500
machdep.uncore_pcie_mmio_base: -536870912

@bgoglin
Contributor

bgoglin commented Jan 31, 2023

I can reproduce with your test case on an M1, now I need to learn how to debug with lldb instead of gdb :)

@bgoglin
Contributor

bgoglin commented Jan 31, 2023

Found the bug, it's indeed limited to recent Macs, where hybrid processors are described in the sysctl "perflevels" entries. I forgot to honor the API filtering for caches found in these perflevels, so they are always added even when explicitly filtered out. It's easy to fix, I'll fix master and stable branches tomorrow.

The assert is only enabled in debug mode, but you'll get those unwanted caches in non-debug too. If this is annoying for you, I'll release a 2.9.1 earlier than planned (that's pretty much the only bug found in 2.9 so far).

@bgoglin
Contributor

bgoglin commented Feb 1, 2023

@FunMiles Can you try the tarball at https://ci.inria.fr/hwloc/view/all/job/bgoglin/500/ with autotools, or just apply commit bgoglin@aa0ef16 on top of your cmake build?

@FunMiles
Author

FunMiles commented Feb 1, 2023

It works. Thanks. I'll move that under my commits for the CMakeLists PR that @JackBoosY requested in #565 (comment)

@bgoglin bgoglin closed this as completed in aa0ef16 Feb 1, 2023
bgoglin added a commit that referenced this issue Feb 1, 2023
Forgotten in f7c9aa8

Thanks to Michel Lesoinne for the report.

Closes #564

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit aa0ef16)
@bgoglin
Contributor

bgoglin commented Mar 29, 2023

The fix is included in hwloc 2.9.1rc1 released yesterday. Final 2.9.1 is expected early next week.
