
Refactor: move RTLD_GLOBAL SO preload + simpler_log_init from C++ ChipWorker::init to Python#746

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/move-so-preload-to-python on May 12, 2026

ChaoWao (Collaborator) commented May 12, 2026

Summary

Continues the API-narrowing theme of #723 / #735. ChipWorker::init was the last place in C++ doing process-wide SO bootstrap: a dlopen of libsimpler_log.so (and, on sim, libcpu_sim_context.so) with RTLD_GLOBAL, plus the simpler_log_init call that seeds HostLogger. That work moves up into the Python ChipWorker wrapper, shrinking the C++ init signature from 8 args to 4:

- void init(host_lib, aicpu, aicore, simpler_log_lib, device_id, sim_context_lib = "", log_level = 1, log_info_v = 5);
+ void init(host_lib, aicpu, aicore, device_id);

Why it's safe

_task_interface.so has no undefined HostLogger / unified_log_* symbols — chip_worker.cpp reaches host_runtime.so purely via dlsym, and the binding code doesn't log. So the RTLD_GLOBAL preload only has to precede the _ChipWorker.init dlopen of host_runtime.so, not module import. The Python wrapper does:

  1. ctypes.CDLL(bins.simpler_log_path, mode=RTLD_GLOBAL) — once per process
  2. <handle>.simpler_log_init(log_level, log_info_v) — seed HostLogger
  3. on sim: ctypes.CDLL(bins.sim_context_path, mode=RTLD_GLOBAL)
  4. self._impl.init(host_path, aicpu_path, aicore_path, device_id)

A module-level _preloaded_globals: dict[str, ctypes.CDLL] makes the loads idempotent per path (the Python counterpart of the old std::once_flag). The ChipWorker.init wrapper's public signature (device_id, bins, log_level=None, log_info_v=None) is unchanged, so no callers need updating.

Changes

  • src/common/worker/chip_worker.{h,cpp}: drop the 4 init params; remove the g_simpler_log_* / g_sim_context_* globals, the ensure_*_loaded() helpers, the SimplerLogInitFn typedef + simpler_log_init_fn_ member, the simpler_log_init call, and the now-unused <mutex> include.
  • python/bindings/task_interface.cpp: the _ChipWorker.init def now takes 4 args.
  • python/simpler/task_interface.py: new _preloaded_globals registry + _preload_global(); ChipWorker.init does the preload + simpler_log_init via ctypes, then calls the 4-arg _impl.init.
  • tests/ut/py/test_chip_worker.py: the three _ChipWorker.init(...) fault-path tests drop the /nonexistent/libsimpler_log.so arg.
  • Docs (chip-level-arch, dynamic-linking, logging, python/simpler/__init__.py, python/simpler/_log.py): the init-flow ASCII art, load-order section, and config-flow table now show the preload in the Python wrapper.

Testing

  • pip install --no-build-isolation -e . (all 4 host_runtime.so builds and libsimpler_log.so compile)
  • pytest tests/ut/py — 119 passed, 7 skipped (torch-missing tests excluded as before)
  • examples/workers/l2/{hello_worker, worker_malloc} on a2a3sim and a5sim
  • Onboard ut + st (runs in CI on Linux)

gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the library preloading and logging initialization logic, moving the responsibility from the C++ ChipWorker implementation to the Python task_interface wrapper. The Python layer now uses ctypes.CDLL with RTLD_GLOBAL to load libsimpler_log.so and libcpu_sim_context.so (on simulation platforms) and seeds the HostLogger before calling the simplified C++ init method. Feedback suggests improving the robustness of the ctypes implementation by using RTLD_NOW for immediate symbol resolution, explicitly defining argtypes and restype for the C functions, and adding validation for mandatory library paths.
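The three review suggestions could look roughly like this. It is a hedged sketch, not what the merged code does (the page does not say which suggestions were adopted): load_simpler_log is a hypothetical helper name, and the C prototype `void simpler_log_init(int, int)` is an assumption, since the PR never states it.

```python
import ctypes
import os
from typing import Callable

# os.RTLD_NOW is POSIX-only; fall back to 0 where it does not exist.
_GLOBAL_NOW = ctypes.RTLD_GLOBAL | getattr(os, "RTLD_NOW", 0)


def load_simpler_log(path: str, loader: Callable[..., ctypes.CDLL] = ctypes.CDLL):
    """Hypothetical helper: validate, load eagerly and globally, type the entry point."""
    # Suggestion 3: fail fast on a missing mandatory path instead of letting
    # dlopen report a less specific error.
    if not path:
        raise ValueError("simpler_log_path is required")
    # Suggestion 1: RTLD_NOW resolves every symbol at load time, so a broken
    # libsimpler_log.so fails here rather than on first use.
    handle = loader(path, mode=_GLOBAL_NOW)
    # Suggestion 2: declare the signature so ctypes converts arguments
    # explicitly. Assumed prototype: void simpler_log_init(int, int);
    fn = handle.simpler_log_init
    fn.argtypes = [ctypes.c_int, ctypes.c_int]
    fn.restype = None
    return handle
```

Setting restype = None matters for a void function: without it ctypes assumes an int return and reads whatever is left in the return register.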

Review threads: two on python/simpler/task_interface.py (one marked outdated).
…pWorker::init to Python

Continues the API-narrowing theme of hw-native-sys#723 / hw-native-sys#735. ChipWorker::init was the
last place in C++ doing process-wide SO bootstrap (dlopen libsimpler_log.so
and, on sim, libcpu_sim_context.so with RTLD_GLOBAL, plus calling
libsimpler_log.so's simpler_log_init to seed HostLogger). That work moves up
into the Python `ChipWorker` wrapper, shrinking the C++ init signature from
8 args to 4.

Before:
  void ChipWorker::init(host_lib, aicpu, aicore, simpler_log_lib,
                        device_id, sim_context_lib = "",
                        log_level = 1, log_info_v = 5);
After:
  void ChipWorker::init(host_lib, aicpu, aicore, device_id);

### Why this is safe

`_task_interface.so` (the nanobind module that contains chip_worker.cpp) has
no undefined HostLogger / unified_log_* symbols — chip_worker.cpp reaches
host_runtime.so purely via dlsym, and the binding code itself doesn't log. So
the RTLD_GLOBAL preload only has to precede the `_ChipWorker.init` dlopen of
host_runtime.so, not module import. The Python wrapper does exactly that:

  1. ctypes.CDLL(bins.simpler_log_path, mode=RTLD_GLOBAL)   # once per process
  2. <handle>.simpler_log_init(log_level, log_info_v)       # seed HostLogger
  3. if bins.sim_context_path:                              # sim only
       ctypes.CDLL(bins.sim_context_path, mode=RTLD_GLOBAL)
  4. self._impl.init(host_path, aicpu_path, aicore_path, device_id)

A module-level `_preloaded_globals: dict[str, ctypes.CDLL]` makes the loads
idempotent per path — the Python counterpart of the C++ side's old
std::once_flag.

### Changes

src/common/worker/chip_worker.{h,cpp}:
- init() drops simpler_log_lib_path, sim_context_lib_path, log_level,
  log_info_v params.
- Remove the g_simpler_log_* / g_sim_context_* file-scope globals,
  ensure_simpler_log_loaded(), ensure_sim_context_loaded(), the
  SimplerLogInitFn typedef + simpler_log_init_fn_ member, and the
  simpler_log_init call. Drop the now-unused <mutex> include.
- init()'s body is just: dlopen host_runtime.so RTLD_LOCAL → dlsym → create
  device ctx → read executor binaries → simpler_init.

python/bindings/task_interface.cpp:
- `_ChipWorker.init` nanobind def: 4 args (host_lib_path, aicpu_path,
  aicore_path, device_id).

python/simpler/task_interface.py:
- New module-level `_preloaded_globals` registry + `_preload_global(path)`
  helper (ctypes.CDLL RTLD_GLOBAL, one per path).
- ChipWorker.init: preload libsimpler_log.so + call simpler_log_init via the
  ctypes handle, preload libcpu_sim_context.so when bins.sim_context_path is
  set, then call the 4-arg _impl.init. Wrapper's public signature
  (device_id, bins, log_level=None, log_info_v=None) is unchanged, so no
  caller updates needed.

tests/ut/py/test_chip_worker.py:
- The three `_ChipWorker.init(...)` fault-path tests drop the
  `/nonexistent/libsimpler_log.so` argument (no longer a parameter).

Docs (chip-level-arch, dynamic-linking, logging, python/simpler/__init__.py,
python/simpler/_log.py): updated the init-flow ASCII art / load-order section
/ configuration-flow table to show the preload happening in the Python
wrapper before the C++ _ChipWorker.init.

Verified locally on a2a3sim + a5sim:
- pip install --no-build-isolation -e .
- pytest tests/ut/py  (119 passed, 7 skipped; torch-missing tests excluded as before)
- examples/workers/l2/{hello_worker, worker_malloc} on both sims

Onboard ut + st coverage runs in CI (Linux).
ChaoWao force-pushed the refactor/move-so-preload-to-python branch from 0cb3027 to c3351ff on May 12, 2026 03:05
ChaoWao merged commit 76543e1 into hw-native-sys:main on May 12, 2026 (14 checks passed)
ChaoWao deleted the refactor/move-so-preload-to-python branch on May 12, 2026 03:43