
Refactor: collapse ChipWorker init/set_device → init(device_id, bins)#723

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/chip-worker-fold-set-device on May 11, 2026

Conversation

@ChaoWao ChaoWao (Collaborator) commented May 8, 2026

Summary

ChipWorker exposed init / set_device / reset_device / finalize as four public lifecycle methods even though set_device existed only to satisfy CANN's per-thread aclrtSetDevice attach requirement (#715), and reset_device had zero call sites. The split was already the source of one bug (#715) and forced every caller through a redundant ceremony.

This PR collapses the lifecycle to a single attach point at init.

What changed

C++ (src/common/worker/chip_worker.{h,cpp})

  • ChipWorker::init now takes device_id and attaches the calling thread internally via simpler_init
  • ChipWorker::set_device and ::reset_device deleted
  • device_set_ flag and device_set() getter collapse into initialized_
  • finalize() simplified — calls finalize_device_fn_ directly
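The collapsed C++ lifecycle can be sketched in miniature (illustrative Python, not the real implementation — names like `_attach_current_thread` only echo the members described above):

```python
# Hypothetical sketch of the collapsed ChipWorker lifecycle: init() is the
# single attach point, one initialized flag replaces device_set_ + initialized_,
# and finalize() tears down directly with no separate reset_device step.

class ChipWorker:
    def __init__(self):
        self.initialized = False   # replaces the old device_set_ / initialized_ pair
        self.device_id = None

    def init(self, device_id, bins):
        # the per-thread attach now happens inside init, not via set_device
        self._attach_current_thread(device_id)
        self.device_id = device_id
        self.bins = bins
        self.initialized = True

    def _attach_current_thread(self, device_id):
        # stand-in for the platform binder (rtSetDevice / sim bind)
        self._attached_device = device_id

    def finalize(self):
        # direct teardown -- callers never see a separate reset_device step
        self.initialized = False
        self._attached_device = None

w = ChipWorker()
w.init(0, bins={"aicpu": b"", "aicore": b""})
assert w.initialized and w._attached_device == 0
w.finalize()
assert not w.initialized
```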

C-API (pto_runtime_c_api.h + a2a3/a5 onboard+sim implementations)

  • set_device export removed
  • simpler_init signature extended to (ctx, device_id, log_level, log_info_v) and now returns int; folds the per-thread attach in alongside log config
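The extended C-ABI shape can be written down as a ctypes prototype — the signature `(ctx, device_id, log_level, log_info_v) -> int` is from this PR, but the stand-in implementation below is purely illustrative (the real symbol is dlsym'd from host_runtime.so):

```python
import ctypes

# Prototype of the extended simpler_init entry described above, expressed
# with CFUNCTYPE so the calling convention can be exercised against a
# Python stand-in rather than a real shared object.
SimplerInitFn = ctypes.CFUNCTYPE(
    ctypes.c_int,      # return code (new -- the old simpler_init returned void)
    ctypes.c_void_p,   # ctx
    ctypes.c_int,      # device_id (new: the per-thread attach is folded in here)
    ctypes.c_int,      # log_level
    ctypes.c_int,      # log_info_v
)

# Hypothetical stand-in, only to show the shape of a success / failure return:
@SimplerInitFn
def fake_simpler_init(ctx, device_id, log_level, log_info_v):
    return 0 if device_id >= 0 else -1

assert fake_simpler_init(None, 0, 1, 5) == 0
assert fake_simpler_init(None, -1, 1, 5) == -1
```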

DeviceRunner attach surface (a2a3 + a5, onboard + sim)

  • attach_current_thread(int) is the centralized binder. Onboard: rtSetDevice. Sim: pto_cpu_sim_bind_device + pto_cpu_sim_acquire_device.
  • device_id_ stays a plain int. All ChipWorker callers run on the same thread that called init(), so the standard thread-spawn happens-before edge is sufficient — no atomic, no CAS, no per-op re-attach.
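The synchronization argument above can be demonstrated in miniature: a value written before a thread is spawned is visible inside that thread through the thread-spawn happens-before edge alone. (Python's GIL makes this trivially safe anyway; the sketch only illustrates the C++ reasoning for keeping `device_id_` a plain int.)

```python
import threading

# Illustration of the happens-before claim: attach_current_thread writes a
# plain field on the init thread; a thread spawned afterwards observes it
# with no atomic, lock, or re-attach.

class Runner:
    def __init__(self):
        self.device_id = -1

    def attach_current_thread(self, device_id):
        self.device_id = device_id  # plain pre-spawn write

runner = Runner()
runner.attach_current_thread(0)     # init-time attach on the spawning thread

seen = []
t = threading.Thread(target=lambda: seen.append(runner.device_id))
t.start()                            # spawn establishes the happens-before edge
t.join()
assert seen == [0]                   # the worker thread sees the pre-spawn write
```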

Python

  • _ChipWorker.init / ChipWorker.init take device_id (and the wrapper takes it before bins, since it's the simpler, more fundamental "where" parameter)
  • _ChipWorker.set_device / .reset_device bindings removed; device_set property removed
  • bootstrap_context no longer calls set_device — the device is attached at init time; its device_id parameter now serves only as a defensive consistency check
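The new Python-side call shape can be sketched as follows (only the signatures echo this PR; the bodies are hypothetical stand-ins):

```python
from types import SimpleNamespace

# Sketch: one init(device_id, bins) call replaces the old init(bins) +
# set_device(device_id) pair, and bootstrap_context's device_id argument
# becomes a consistency check rather than the attach point.

def init(device_id, bins):
    # single call: attach + configure
    return SimpleNamespace(device_id=device_id, bins=bins)

def bootstrap_context(worker, device_id):
    # defensive check only -- the device was already attached by init()
    if worker.device_id != device_id:
        raise ValueError(
            f"device mismatch: init() attached {worker.device_id}, "
            f"bootstrap_context got {device_id}")
    return {"device_id": device_id}

worker = init(0, bins=())
assert bootstrap_context(worker, 0) == {"device_id": 0}
```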

Drive-by fix: ProfilerBase's constructor/destructor are now private with friend Derived (bugprone-crtp-constructor-accessibility). Newer clang-tidy versions on macOS would otherwise reject any TU that transitively includes profiler_base.h.

Callers updated

python/simpler/worker.py, simpler_setup/scene_test.py, tests/ut/py/test_chip_worker.py, tests/ut/py/test_worker/test_platform_comm.py, and test_bootstrap_context_{sim,hw}.py collapse init + set_device into a single init(device_id, bins) call.

Docs

docs/chip-level-arch.md, docs/dynamic-linking.md, docs/getting-started.md, plus the worker_malloc / hello_worker examples reflect the new flow.

Test plan

  • `pip install --no-build-isolation -e .` — clean build (a2a3sim + a5sim) on macOS
  • `pytest tests/ut/py` — 116 passed, 7 skipped (the skipped tests are unrelated ones that need `torch`, which is not installed in the local env)
  • `python examples/workers/l2/{hello_worker,worker_malloc}/main.py -p {a2a3sim,a5sim} -d 0` — passes
  • Onboard CI: `ut-a2a3` / `ut-a5` / `st-onboard-*` (Linux, hardware) — needs CI run
  • `ctest --test-dir build/ut_cpp` (Linux) — needs CI run

@ChaoWao ChaoWao force-pushed the refactor/chip-worker-fold-set-device branch from efc0058 to 5528c02 Compare May 8, 2026 09:28

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the ChipWorker API by integrating NPU device attachment into the init method and removing the explicit set_device and reset_device calls. Device operations such as malloc, free, and copy now internally handle thread attachment to ensure safety across different calling threads. Documentation, examples, and tests have been updated to reflect these changes. Feedback identifies potential data races in the initialization and attachment logic, along with a resource leak in the free_tensor error path.

Review threads:
  • src/common/worker/chip_worker.cpp (outdated)
  • src/a2a3/platform/sim/host/device_runner.cpp
  • src/a5/platform/sim/host/device_runner.cpp
  • src/a2a3/platform/onboard/host/device_runner.cpp (outdated)
  • src/a5/platform/onboard/host/device_runner.cpp (outdated)
@ChaoWao ChaoWao force-pushed the refactor/chip-worker-fold-set-device branch 2 times, most recently from 00889af to f48e496 Compare May 9, 2026 08:40
@ChaoWao ChaoWao changed the title from "Refactor: collapse ChipWorker init/set_device, fold per-thread attach into device-ops" to "Refactor: collapse ChipWorker init/set_device → init(device_id, bins)" May 9, 2026
ChipWorker exposed init / set_device / reset_device / finalize as four
public lifecycle methods even though set_device existed only to satisfy
CANN's per-thread aclrtSetDevice attach requirement (PR hw-native-sys#715), and
reset_device had zero call sites. The split was the source of one bug
(hw-native-sys#715) and forced every caller through a redundant ceremony.

Collapse into a single attach point:

- ChipWorker::init now takes device_id and attaches the calling thread
  internally via simpler_init; ChipWorker::set_device and ::reset_device
  are deleted, and device_set_ / device_set() collapse into initialized_.
- C-API set_device export is removed; simpler_init grows a device_id
  parameter and returns int. The per-thread attach is centralized inside
  DeviceRunner::attach_current_thread (sim DeviceRunner gains the same
  method, driving pto_cpu_sim_bind_device + idempotent acquire_device).
  All ChipWorker callers run on the same thread that called init, so
  device_id_ stays a plain int — no atomic / no CAS needed.
- Python `ChipWorker.init` wrapper takes (device_id, bins) — device_id
  first since it's the simpler, more fundamental "where" parameter.
  bootstrap_context's device_id arg becomes a defensive consistency
  check rather than the attach point.
- Callers updated: python/simpler/worker.py, simpler_setup/scene_test.py,
  tests/ut/py/test_chip_worker.py, tests/ut/py/test_worker/test_platform_comm.py
  and test_bootstrap_context_{sim,hw}.py.
- Docs (chip-level-arch, dynamic-linking, getting-started) and the
  worker_malloc / hello_worker examples reflect the new flow.

Drive-by: ProfilerBase's constructor/destructor are now private with
`friend Derived` (bugprone-crtp-constructor-accessibility); newer
clang-tidy versions on macOS would otherwise reject any TU that
transitively includes profiler_base.h.
@ChaoWao ChaoWao force-pushed the refactor/chip-worker-fold-set-device branch from f48e496 to 9939bd3 Compare May 9, 2026 09:32
@ChaoWao ChaoWao merged commit 359fc3d into hw-native-sys:main May 11, 2026
14 checks passed
@ChaoWao ChaoWao deleted the refactor/chip-worker-fold-set-device branch May 11, 2026 03:38
ChaoWao added a commit to ChaoWao/simpler-fork that referenced this pull request May 11, 2026
The C-ABI run_runtime() carried three parameters that are not actually
per-run state:

  - aicpu_binary / aicore_binary: executor binaries pinned to the
    (platform, runtime) tuple, read off disk once during ChipWorker::init
    and never changed during the worker's lifetime.
  - device_id: already bound to the DeviceRunner via attach_current_thread
    (called from simpler_init at init time, see hw-native-sys#715 / hw-native-sys#723).

Every run_runtime() call paid for these by copying the AICPU + AICore byte
arrays into fresh std::vector<uint8_t>'s inside the dlsym'd SO and
forwarding device_id through prepare_run_context just to re-attach the
same thread the runner already knows about.
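The per-run cost described above, and the init-time shape that replaces it, can be sketched as a hypothetical Python stand-in for the C ABI (the real code moves `std::vector<uint8_t>`s by value into the runner):

```python
# Sketch: executor binaries and device_id move from per-run arguments to
# init-time state. set_executors takes ownership once; run() then carries
# only the true per-run parameters.

class DeviceRunner:
    def __init__(self, device_id):
        self._device_id = device_id
        self._aicpu = self._aicore = None

    def set_executors(self, aicpu: bytes, aicore: bytes):
        # ownership transferred once at init (by-value + move on the C++ side)
        self._aicpu, self._aicore = aicpu, aicore

    def device_id(self):
        # getter so the c_api can ask the runner what it is attached to
        return self._device_id

    def run(self, rt, block_dim, aicpu_thread_num):
        # no per-call binary copies, no device_id forwarding
        assert self._aicpu is not None and self._aicore is not None
        return (self._device_id, block_dim, aicpu_thread_num)

r = DeviceRunner(device_id=0)
r.set_executors(b"\x7fELF-aicpu", b"\x7fELF-aicore")   # once, at init
assert r.run(rt=None, block_dim=8, aicpu_thread_num=4) == (0, 8, 4)
```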

Layer A: collapse these three into init-time setup.

- pto_runtime_c_api.h: declare `int bind_executors(ctx, aicpu_ptr,
  aicpu_size, aicore_ptr, aicore_size)`; shrink run_runtime() to drop
  device_id and the four executor params. The file-header dlsym list and
  run_runtime doxygen both note that executor / device are init-time-only.
- 4 platform pto_runtime_c_api.cpp (a2a3 + a5, onboard + sim): implement
  bind_executors as a thin wrapper that constructs vectors from the byte
  ranges and moves them into DeviceRunner::set_executors. run_runtime no
  longer constructs per-call vectors, no longer passes device_id; it reads
  runner->device_id() and calls runner->run(rt, block_dim, aicpu_thread_num).
- 4 DeviceRunner.{h,cpp}: add `set_executors(vector<uint8_t> aicpu,
  vector<uint8_t> aicore)` (by-value + move), drop `device_id` /
  `aicpu_so_binary` / `aicore_kernel_binary` from `DeviceRunner::run`,
  `ensure_device_initialized`, `ensure_binaries_loaded`. Onboard already
  had `aicore_kernel_binary_` as a member; add the parallel `aicpu_binary_`
  to it and to all four DeviceRunner variants. Add a public
  `device_id()` getter so the c_api can ask the runner what to attach to.
- chip_worker.h/cpp: add `BindExecutorsFn` typedef + `bind_executors_fn_`
  member; drop `aicpu_binary_` / `aicore_binary_` members (the bytes are
  now owned by the DeviceRunner). `init()` reads binaries into local
  vectors and hands them off via `bind_executors_fn_(...)` right after
  `simpler_init_fn_(...)`; the local vectors then die at the end of the
  scope. `run()` passes neither device_id nor binaries down — they are
  resolved by run_runtime() inside the SO. init's rollback path mirrors
  finalize's teardown exactly minus finalize_device_fn_.
- docs/chip-level-arch.md, docs/dynamic-linking.md: refresh ABI listings
  and the init / run flow diagrams to show bind_executors as a separate
  init-time step and to remove the device_id + binary args from
  run_runtime.

Verified locally on a2a3sim + a5sim:

  - pip install --no-build-isolation -e .  (all 4 host_runtime.so compile)
  - pytest tests/ut/py  (116 passed, 7 skipped)
  - examples/workers/l2/{hello_worker,worker_malloc} on both sims
  - tests/ut/py/test_worker/test_bootstrap_context_sim.py (5 passed,
    exercises init → bind_executors → run end-to-end through sim)

Onboard ut + st coverage runs in CI (Linux). This is a prerequisite for
later run_runtime decomposition; the API is now narrow enough that a
register/run split won't drag the executor/device args along.
ChaoWao added a commit to ChaoWao/simpler-fork that referenced this pull request May 11, 2026
… device_init

Continues the API-narrowing theme of hw-native-sys#723. Two strands woven together:

(1) Logger ownership moves entirely to libsimpler_log.so.

Before: simpler_init in host_runtime.so reached cross-SO into libsimpler_log
        for HostLogger setup, then host_runtime.so cached log_level /
        log_info_v on every DeviceRunner so run_runtime could later forward
        them to AICPU. Log state lived in three places (HostLogger, runner
        member, KernelArgs) all seeded off the same C-ABI argument.
After:  libsimpler_log.so exports its own simpler_log_init(level, info_v) C
        entry, called from ChipWorker::init BEFORE host_runtime.so is even
        dlopened. HostLogger gains level()/info_v() raw getters. Every
        consumer (host_runtime, AICPU forwarding, CANN dlog sync) reads
        from HostLogger::get_instance() directly. Log state is owned in
        exactly one place; no log argument ever travels through the
        host_runtime.so C ABI.

(2) host_runtime.so's init surface collapses to one entry: device_init.

Before: simpler_init(ctx, device_id, log_level, log_info_v) +
        bind_executors(ctx, aicpu_*, aicore_*) — two adjacent init-time
        entries always called back-to-back from ChipWorker::init.
After:  device_init(ctx, device_id, aicpu_*, aicore_*) — single entry that
        attaches the calling thread, takes ownership of executor binaries,
        and (onboard) syncs CANN dlog from HostLogger. Log args gone
        because (1) put them on a separate SO.
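The single-ownership log state described in (1) can be sketched as a Python analogue (hypothetical stand-ins; only the names `HostLogger`, `simpler_log_init`, `level()`, `info_v()` echo the commit):

```python
# Sketch: simpler_log_init seeds one HostLogger singleton, and every consumer
# reads the raw getters -- log state lives in exactly one place instead of
# being cached on runners or threaded through the C ABI.

class HostLogger:
    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self._level, self._info_v = 1, 5   # defaults until seeded

    def level(self):
        return self._level

    def info_v(self):
        return self._info_v

def simpler_log_init(level, info_v):
    # the one entry that writes log state, called before anything else loads
    inst = HostLogger.get_instance()
    inst._level, inst._info_v = level, info_v
    return 0

simpler_log_init(3, 7)
# every consumer reads the single source of truth:
assert (HostLogger.get_instance().level(),
        HostLogger.get_instance().info_v()) == (3, 7)
```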

### Changes

libsimpler_log.so:
- HostLogger gains `int level() const` / `int info_v() const` raw getters.
- New C export `simpler_log_init(int log_level, int log_info_v)` validates
  and forwards to HostLogger setters.

host_runtime.so C ABI (`pto_runtime_c_api.h` + 4 platform impls):
- `simpler_init` and `bind_executors` removed.
- New `device_init(ctx, device_id, aicpu_ptr, aicpu_size, aicore_ptr,
  aicore_size)`: attach + executor takeover + (onboard) dlog sync. Onboard's
  dlog_setlevel reads HostLogger::get_instance().level() — no log arg.
- Header doc updated; dlsym list updated.

4 × DeviceRunner.{h,cpp}:
- Drop `log_level_` / `log_info_v_` members + `set_log_level` /
  `set_log_info_v` setters (no DeviceRunner-side log cache).
- run() reads `HostLogger::get_instance().level()` / `.info_v()` directly
  when populating KernelArgs (onboard) or forwarding to the AICPU sim SO
  via dlsym (sim).

ChipWorker:
- ensure_simpler_log_loaded keeps `g_simpler_log_handle` (was already
  global) — ChipWorker now dlsym's `simpler_log_init` from it and calls it
  BEFORE host_runtime.so is opened, so any LOG_* macro firing during
  host_runtime.so's dlopen-time constructors already sees the right level.
- `simpler_init_fn_` + `bind_executors_fn_` replaced by `device_init_fn_`.
- init()'s rollback path collapsed: one rc check + one rollback instead of
  two.

Docs:
- `docs/chip-level-arch.md`, `docs/dynamic-linking.md`: ABI listings,
  lifecycle diagrams.
- `docs/logging.md`, `docs/testing.md`: log configuration flow.
- `python/simpler/__init__.py`, `python/simpler/_log.py`: module docstrings.
- worker_malloc README.

Verified locally on a2a3sim + a5sim:
- pip install --no-build-isolation -e .  (all 4 host_runtime.so + libsimpler_log compile)
- pytest tests/ut/py  (116 passed, 7 skipped)
- examples/workers/l2/{hello_worker, worker_malloc} on both sims
- tests/ut/py/test_worker/test_bootstrap_context_sim.py (5 passed,
  exercises init → device_init → run end-to-end through sim)

Onboard ut + st coverage runs in CI (Linux).
ChaoWao added a commit to ChaoWao/simpler-fork that referenced this pull request May 12, 2026
…pWorker::init to Python

Continues the API-narrowing theme of hw-native-sys#723 / hw-native-sys#735. ChipWorker::init was the
last place in C++ doing process-wide SO bootstrap (dlopen libsimpler_log.so
and, on sim, libcpu_sim_context.so with RTLD_GLOBAL, plus calling
libsimpler_log.so's simpler_log_init to seed HostLogger). That work moves up
into the Python `ChipWorker` wrapper, shrinking the C++ init signature from
8 args to 4.

Before:
  void ChipWorker::init(host_lib, aicpu, aicore, simpler_log_lib,
                        device_id, sim_context_lib = "",
                        log_level = 1, log_info_v = 5);
After:
  void ChipWorker::init(host_lib, aicpu, aicore, device_id);

### Why this is safe

`_task_interface.so` (the nanobind module that contains chip_worker.cpp) has
no undefined HostLogger / unified_log_* symbols — chip_worker.cpp reaches
host_runtime.so purely via dlsym, and the binding code itself doesn't log. So
the RTLD_GLOBAL preload only has to precede the `_ChipWorker.init` dlopen of
host_runtime.so, not module import. The Python wrapper does exactly that:

  1. ctypes.CDLL(bins.simpler_log_path, mode=RTLD_GLOBAL)   # once per process
  2. <handle>.simpler_log_init(log_level, log_info_v)       # seed HostLogger
  3. if bins.sim_context_path:                              # sim only
       ctypes.CDLL(bins.sim_context_path, mode=RTLD_GLOBAL)
  4. self._impl.init(host_path, aicpu_path, aicore_path, device_id)

A module-level `_preloaded_globals: dict[str, ctypes.CDLL]` makes the loads
idempotent per path — the Python counterpart of the C++ side's old
std::once_flag.

### Changes

src/common/worker/chip_worker.{h,cpp}:
- init() drops simpler_log_lib_path, sim_context_lib_path, log_level,
  log_info_v params.
- Remove the g_simpler_log_* / g_sim_context_* file-scope globals,
  ensure_simpler_log_loaded(), ensure_sim_context_loaded(), the
  SimplerLogInitFn typedef + simpler_log_init_fn_ member, and the
  simpler_log_init call. Drop the now-unused <mutex> include.
- init()'s body is just: dlopen host_runtime.so RTLD_LOCAL → dlsym → create
  device ctx → read executor binaries → simpler_init.

python/bindings/task_interface.cpp:
- `_ChipWorker.init` nanobind def: 4 args (host_lib_path, aicpu_path,
  aicore_path, device_id).

python/simpler/task_interface.py:
- New module-level `_preloaded_globals` registry + `_preload_global(path)`
  helper (ctypes.CDLL RTLD_GLOBAL, one per path).
- ChipWorker.init: preload libsimpler_log.so + call simpler_log_init via the
  ctypes handle, preload libcpu_sim_context.so when bins.sim_context_path is
  set, then call the 4-arg _impl.init. Wrapper's public signature
  (device_id, bins, log_level=None, log_info_v=None) is unchanged, so no
  caller updates needed.

tests/ut/py/test_chip_worker.py:
- The three `_ChipWorker.init(...)` fault-path tests drop the
  `/nonexistent/libsimpler_log.so` argument (no longer a parameter).

Docs (chip-level-arch, dynamic-linking, logging, python/simpler/__init__.py,
python/simpler/_log.py): updated the init-flow ASCII art / load-order section
/ configuration-flow table to show the preload happening in the Python
wrapper before the C++ _ChipWorker.init.

Verified locally on a2a3sim + a5sim:
- pip install --no-build-isolation -e .
- pytest tests/ut/py  (119 passed, 7 skipped; torch-missing tests excluded as before)
- examples/workers/l2/{hello_worker, worker_malloc} on both sims

Onboard ut + st coverage runs in CI (Linux).
ChaoWao added a commit that referenced this pull request May 12, 2026
…pWorker::init to Python (#746)