Fix cpuinfo init on Linux without CPU sysfs lists #28230

Merged
tianleiwu merged 4 commits into microsoft:main from tianleiwu:tlwu/fix-cpuinfo-sysfs-fallback
Apr 29, 2026

Conversation

@tianleiwu tianleiwu commented Apr 25, 2026

Description

Fixes ONNX Runtime startup on Linux ARM64 environments where /sys/devices/system/cpu/possible and /sys/devices/system/cpu/present are unavailable, such as AWS Lambda ARM64/Graviton and restricted build sandboxes.

There are two related failure modes:

  1. PosixEnv may be constructed before ORT's default logger is registered. If cpuinfo_initialize() fails during that early construction path, the existing LOGS_DEFAULT(INFO) call can terminate the process with the error "Attempt to use DefaultLogger but none has been registered."
  2. The bundled pytorch/cpuinfo code treats missing Linux CPU possible/present sysfs cpulists as fatal on ARM Linux. The max-count helpers return UINT32_MAX, which wraps to 0 after 1 + UINT32_MAX in ARM Linux initialization and prevents cpuinfo from reaching the later /proc/cpuinfo and getauxval() based detection paths.
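
The wraparound in the second failure mode can be reproduced with 32-bit unsigned arithmetic. The sketch below is only an illustration: it uses Python's ctypes.c_uint32 to emulate the C arithmetic, and the function and variable names are hypothetical rather than taken from cpuinfo.

```python
import ctypes

# Sentinel that the max-count helpers return when the sysfs cpulist is missing.
UINT32_MAX = 0xFFFFFFFF


def processor_count_from_max_index(max_index: int) -> int:
    """Emulate the C expression `1 + max_index` evaluated as uint32_t."""
    # ctypes integer types do no overflow checking, so the value wraps modulo 2**32.
    return ctypes.c_uint32(1 + max_index).value


# A 4-CPU system reports a max index of 3, which gives a count of 4.
normal_count = processor_count_from_max_index(3)
# With the failure sentinel, the count wraps around to 0.
wrapped_count = processor_count_from_max_index(UINT32_MAX)
print(normal_count, wrapped_count)
```

A count of 0 is indistinguishable from "no processors", which is why initialization stops before the /proc/cpuinfo and getauxval() paths are reached.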

Root Cause

The immediate import crash is caused by unsafe early logging in onnxruntime/core/platform/posix/env.cc. Python bindings can reference Env::Default() during module load before logging is initialized, so a cpuinfo initialization failure must not use LOGS_DEFAULT() unless a default logger exists.
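
The guard pattern described here can be modeled in a few lines. This is a Python sketch of the control flow only: the real change is C++ in env.cc, and LoggingManagerModel and report_cpuinfo_failure are hypothetical names invented for this illustration, not ORT APIs.

```python
import sys


class LoggingManagerModel:
    """Hypothetical stand-in for a logging manager with an optional default logger."""

    _default_logger = None  # a callable taking one message string, or None

    @classmethod
    def has_default_logger(cls) -> bool:
        return cls._default_logger is not None

    @classmethod
    def default_logger(cls):
        if cls._default_logger is None:
            # Models the unguarded path that terminates the process.
            raise RuntimeError("Attempt to use DefaultLogger but none has been registered.")
        return cls._default_logger


def report_cpuinfo_failure(message: str, stream=sys.stderr) -> str:
    """Use the default logger if one exists, otherwise write to the given stream."""
    if LoggingManagerModel.has_default_logger():
        LoggingManagerModel.default_logger()(message)
        return "logger"
    # No logger registered yet: fall back to a plain stream write.
    print(message, file=stream)
    return "stderr"
```

The key design point is that the check happens before the logger is touched, so the early-construction path never reaches the code that raises.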

The cpuinfo initialization failure is more subtle. A count-only fallback is not enough: after cpuinfo computes max possible/present CPU counts, it calls cpuinfo_linux_detect_possible_processors() and cpuinfo_linux_detect_present_processors() to set CPUINFO_LINUX_FLAG_POSSIBLE and CPUINFO_LINUX_FLAG_PRESENT on each processor. ARM Linux initialization later marks processors valid only if those flags are set. If only the count fallback is provided, valid_processors can remain zero and cpuinfo can proceed into an invalid partial initialization state.
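
To see why a count-only fallback leaves valid_processors at zero, consider the following simulation. It is a simplified model, not cpuinfo code: only the two flag names come from the description above, while the bit values, the Processor class, and the helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass

# Illustrative bit values; only the flag names come from the PR description.
CPUINFO_LINUX_FLAG_PRESENT = 0x1
CPUINFO_LINUX_FLAG_POSSIBLE = 0x2
REQUIRED_FLAGS = CPUINFO_LINUX_FLAG_PRESENT | CPUINFO_LINUX_FLAG_POSSIBLE


@dataclass
class Processor:
    flags: int = 0


def mark_cpus(processors, nproc, flag):
    """Set `flag` on CPUs 0..nproc-1, as the flag fallback does."""
    for processor in processors[:nproc]:
        processor.flags |= flag


def count_valid(processors):
    """Count processors that carry both the possible and the present flag."""
    return sum(1 for p in processors if (p.flags & REQUIRED_FLAGS) == REQUIRED_FLAGS)


nproc = 4

# Count-only fallback: the array has the right size, but no flags are set.
count_only = [Processor() for _ in range(nproc)]
valid_with_count_only = count_valid(count_only)

# Count plus flag fallback: every CPU in 0..nproc-1 is marked possible and present.
with_flags = [Processor() for _ in range(nproc)]
mark_cpus(with_flags, nproc, CPUINFO_LINUX_FLAG_POSSIBLE)
mark_cpus(with_flags, nproc, CPUINFO_LINUX_FLAG_PRESENT)
valid_with_flags = count_valid(with_flags)

print(valid_with_count_only, valid_with_flags)
```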

Fix

  • Make PosixEnv logging safe when cpuinfo initialization fails before a default logger exists:
    • use logging::LoggingManager::HasDefaultLogger() before LOGS_DEFAULT()
    • fall back to std::cerr when no logger is registered
  • Add a cpuinfo patch for Linux missing sysfs CPU cpulists:
    • fall back to sysconf(_SC_NPROCESSORS_ONLN) - 1 for max possible/present processor detection
    • fall back to marking CPUs 0..nproc-1 for present/possible processor flag detection
    • preserve existing sysfs parsing behavior when the cpulist files are available
  • Wire the cpuinfo patch into the existing cpuinfo FetchContent flow for Linux and the existing ARM64/ARM64EC patch path.
  • Add a simulation test that validates:
    • safe early logging without a registered default logger
    • sysconf(_SC_NPROCESSORS_ONLN) count and present/possible flag fallback behavior
    • hiding /sys/devices/system/cpu/{possible,present} via LD_PRELOAD
    • optional ORT import with hidden sysfs when a built ORT package is importable
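
The fallback logic in the list above can be sketched as follows. This is a Python model of the behavior, assuming the usual kernel cpulist format (for example "0-3,8"); the actual patch is C code in src/linux/processors.c, and parse_cpulist and detect_cpus are hypothetical helper names.

```python
import os


def parse_cpulist(text: str) -> list:
    """Parse a kernel cpulist such as "0-3,8" into a list of CPU indices."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def detect_cpus(path: str) -> list:
    """Return CPU indices from a sysfs cpulist, or CPUs 0..nproc-1 if it is unreadable."""
    try:
        with open(path, encoding="ascii") as f:
            # Existing behavior is preserved when the cpulist file is available.
            return parse_cpulist(f.read())
    except OSError:
        # Fallback: ask the OS for the number of online processors.
        nproc = os.sysconf("SC_NPROCESSORS_ONLN")
        if nproc < 1:
            nproc = 1
        return list(range(nproc))


cpus = detect_cpus("/sys/devices/system/cpu/possible")
print("max index:", cpus[-1], "count:", len(cpus))
```

In the fallback branch the highest index is nproc - 1, which corresponds to the sysconf(_SC_NPROCESSORS_ONLN) - 1 value used for the max-count helpers.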

Testing

Ran from a clean branch/worktree:

python onnxruntime/test/common/test_cpuinfo_sysfs_fallback.py

Result:

  • safe logging simulation: PASS
  • sysconf count + flag fallback simulation: PASS
  • LD_PRELOAD sysfs-hiding simulation: PASS
  • ORT import integration: SKIP (onnxruntime.capi not built/importable in this workspace)

Also validated the cpuinfo patch directly:

cd build/cu128/Release/_deps/pytorch_cpuinfo-src
patch --dry-run -p1 < /path/to/cmake/patches/cpuinfo/fix_missing_sysfs_fallback.patch

And syntax-checked patched src/linux/processors.c in a temporary tree with cpuinfo headers.

Related Issue

Fixes #10038.

Copilot AI left a comment

Pull request overview

Fixes ONNX Runtime startup failures on Linux ARM64 environments where /sys/devices/system/cpu/{possible,present} are unavailable by (1) making early cpuinfo-init logging safe before a default logger exists, and (2) patching the bundled pytorch/cpuinfo to fall back to sysconf(_SC_NPROCESSORS_ONLN) for both CPU counts and per-CPU present/possible flags.

Changes:

  • Guard LOGS_DEFAULT(...) usage in PosixEnv so cpuinfo init failures won’t crash when logging hasn’t been initialized yet.
  • Patch pytorch/cpuinfo Linux processor detection to provide robust sysfs-missing fallbacks (counts + flags).
  • Add a standalone simulation script to validate the early-logging and sysfs-missing behaviors (incl. LD_PRELOAD sysfs hiding).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| onnxruntime/core/platform/posix/env.cc | Avoids crashing during early PosixEnv construction by falling back to std::cerr when no default logger exists. |
| cmake/external/onnxruntime_external_deps.cmake | Wires in the new cpuinfo patch during FetchContent dependency setup (Linux + ARM64/ARM64EC patch flow). |
| cmake/patches/cpuinfo/fix_missing_sysfs_fallback.patch | Adds sysconf(_SC_NPROCESSORS_ONLN)-based fallbacks for max CPU count and present/possible flags when sysfs cpulists are missing. |
| onnxruntime/test/common/test_cpuinfo_sysfs_fallback.py | Adds a manual/simulation validation script (compiles small programs + LD_PRELOAD shim). |

…s, fix docstring

- Convert test_cpuinfo_sysfs_fallback.py from standalone functions to proper
  unittest.TestCase so pytest/unittest discovery works correctly
- Add platform guards (sys.platform == 'linux') and tool detection
  (shutil.which) with unittest.SkipTest for non-Linux or missing compilers
- Remove unused get_ort_root() function
- Fix docstring: 'intercepts open/fopen' -> 'intercepts fopen' to match impl
- Fix CodeQL implicit string concatenation warning by extracting the -c
  script to a named variable
- Remove fix_missing_sysfs_fallback.patch from Windows ARM64/ARM64EC block
  since it only modifies Linux-specific sources (src/linux/processors.c);
  keep it in the Linux-only elseif block
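
The platform and tool guards mentioned in this commit message follow a common unittest pattern. A minimal sketch is shown below; the class name, the compiler candidates, and the single test are hypothetical and are not copied from test_cpuinfo_sysfs_fallback.py.

```python
import shutil
import sys
import unittest


def find_c_compiler():
    """Return the path of the first C compiler found on PATH, or None."""
    for name in ("cc", "gcc", "clang"):
        path = shutil.which(name)
        if path:
            return path
    return None


class SysfsFallbackSimulation(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Skip the whole class on non-Linux platforms or when no compiler exists.
        if sys.platform != "linux":
            raise unittest.SkipTest("requires Linux")
        cls.compiler = find_c_compiler()
        if cls.compiler is None:
            raise unittest.SkipTest("no C compiler found on PATH")

    def test_compiler_is_available(self):
        self.assertTrue(self.compiler)
```

Because the guards raise unittest.SkipTest from setUpClass, discovery through python -m unittest or pytest reports the tests as skipped instead of failed on unsupported machines.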
@yuslepukhin yuslepukhin left a comment

:shipit:

@tianleiwu tianleiwu enabled auto-merge (squash) April 29, 2026 16:47
@tianleiwu tianleiwu merged commit df2b677 into microsoft:main Apr 29, 2026
96 of 102 checks passed
Development

Successfully merging this pull request may close these issues.

Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash

5 participants