
Performance optimization on asset writes #5329

Merged
kellyguo11 merged 30 commits into isaac-sim:develop from
AntoineRichard:antoiner/perf/articulation-resolve-caching
Apr 30, 2026

Conversation

@AntoineRichard
Collaborator

Description

Improves performance across the board for asset writes.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Micro benchmarks

  ┌─────────────────────────────────┬────────┬───────┬─────────┐
  │             Method              │ Before │ After │ Speedup │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_link_pose_to_sim     │ 94     │ 81    │ 1.16x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_com_pose_to_sim      │ 111    │ 90    │ 1.23x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_link_velocity_to_sim │ 128    │ 124   │ 1.03x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_com_velocity_to_sim  │ 110    │ 108   │ 1.02x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_state_to_sim        │ 180    │ 127   │ 1.42x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_position_to_sim     │ 98     │ 108   │ ~1.0x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_joint_position_target       │ 94     │ 107   │ ~1.0x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_stiffness_to_sim    │ 171    │ 152   │ 1.13x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_damping_to_sim      │ 177    │ 151   │ 1.17x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_armature_to_sim     │ 170    │ 153   │ 1.11x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_masses                      │ 177    │ 152   │ 1.16x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_coms                        │ 350    │ 272   │ 1.29x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_inertias                    │ 350    │ 280   │ 1.25x   │
  └─────────────────────────────────┴────────┴───────┴─────────┘

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Add benchmark scripts for both PhysX and Newton backends covering
ArticulationData property access and Articulation setter/writer methods.
Includes torch_list, torch_tensor, warp_mask modes and fill-ratio
variants (5%, 95%, 100%). Add design spec and implementation plan for
four performance optimizations targeting Python dispatch overhead.
Slim 8 warp kernels (index + mask variants for root link pose, root COM
pose, root COM velocity, root link velocity) to remove output parameters
that are always passed as None. Reduces arg marshalling overhead and
simplifies generated GPU code.
Add single-slot caches keyed on tensor.data_ptr() to _resolve_env_ids,
_resolve_joint_ids, and _resolve_body_ids in both PhysX and Newton
backends. Eliminates redundant wp.array wrapper allocations when the
same tensor is passed across training steps.
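The single-slot pattern described above can be sketched with a plain-Python stand-in (illustrative only: `data_ptr` models torch's `Tensor.data_ptr()` and `wrap` models `wp.from_torch`; the real resolvers live in the PhysX/Newton asset classes):

```python
class ResolverCache:
    """Single-slot cache: remembers only the most recent tensor's pointer."""

    def __init__(self):
        self._cached_ptr = None      # pointer key of the last-seen tensor
        self._cached_wrapped = None  # wrapper built for that tensor

    def resolve(self, tensor, data_ptr, wrap):
        ptr = data_ptr(tensor)
        if ptr == self._cached_ptr:
            # Hot path: same tensor as last step -> reuse the existing wrapper.
            return self._cached_wrapped
        # Cold path: build a fresh wrapper and remember it.
        self._cached_wrapped = wrap(tensor)
        self._cached_ptr = ptr
        return self._cached_wrapped
```

In an RL loop the same `env_ids` tensor is typically reused every step, so the hot path dominates. Note that a later commit in this PR removed these caches again, because tensor views can share a data pointer while having different shapes.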
Pre-compute and cache float32 views of structured warp arrays used by
PhysX TensorAPI calls. Eliminates per-call wp.array wrapper allocation
in root pose and velocity writers.
Replace per-call wp.clone(device='cpu') with wp.copy into pre-allocated
pinned CPU buffers for all joint property and body property writers.
Pinned memory enables DMA fast path and eliminates CPU malloc overhead.
Full-index env_ids also use a pre-allocated pinned buffer.
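The buffer-reuse idea can be sketched with a plain-Python stand-in, where a `bytearray` plays the role of the pinned `wp.array` (names are illustrative, not the Isaac Lab implementation):

```python
class PropertyWriter:
    def __init__(self, nbytes):
        # Allocated once; in the real code this is pinned (page-locked) host
        # memory so the device-to-host copy can take the DMA fast path.
        self._staging = bytearray(nbytes)

    def write(self, payload: bytes):
        # Reuse the staging buffer instead of allocating a new one per call
        # (the old path cloned a fresh CPU buffer on every write).
        self._staging[: len(payload)] = payload
        return self._staging  # handed to the TensorAPI call in the real code
```

Allocating the staging buffer once moves the cost out of the per-call path entirely.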
Add missing _cached_*_ptr/_wp fields, _root_*_f32 view caches, and
pinned CPU buffers to benchmark mock articulation constructors so they
match the new attributes added by the optimization commits.
Add AssetBaseCfg.disable_shape_checks, which controls whether
assert_shape_and_dtype and assert_shape_and_dtype_mask run validation.
When None (default), follows Python's __debug__ flag. When True,
disables checks entirely for ~3us savings per setter/writer call.
The flag is resolved once at init to a bool _check_shapes field.
Also add --no_shape_checks CLI flag to articulation benchmarks.
Cache the flattened float32 views of wrench composer force and torque
arrays to avoid creating new wp.array wrapper objects every simulation
step.
Add fused write_joint_state_data kernels (index + mask variants) for
both Newton and PhysX backends. write_joint_state_to_sim_mask and the
deprecated write_joint_state_to_sim now resolve indices and launch a
single kernel instead of two separate calls, saving ~50-80us per call.
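The fusion can be illustrated with a plain-Python stand-in, where nested loops play the role of the 2D kernel grid (`write_joint_state_fused` and all names here are illustrative, not the actual warp kernel):

```python
def write_joint_state_fused(joint_pos, joint_vel, pos_data, vel_data, env_ids, joint_ids):
    """One pass writes both position and velocity per (env, joint) pair,
    replacing two separate dispatches with a single one."""
    for i in env_ids:          # first launch dimension: environments
        for j in joint_ids:    # second launch dimension: joints
            joint_pos[i][j] = pos_data[i][j]
            joint_vel[i][j] = vel_data[i][j]
```

Halving the number of launches is where the quoted ~50-80us per call comes from: the per-launch dispatch overhead is paid once instead of twice.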
- Add CHANGELOG entries and version bumps for isaaclab (4.6.7),
  isaaclab_newton (0.5.17), and isaaclab_physx (0.5.19).
- Fix _resolve_env_ids int64 cache GC hazard: move data_ptr() capture
  after dtype conversion so the cache key matches the actual tensor.
- Add write_joint_state_to_sim_index to Newton for API parity with PhysX.
- Update Newton deprecated write_joint_state_to_sim to delegate to the
  new fused index method with correct docstring and warning message.
- Fix misleading "Recreated in _create_simulation_bindings" comment in
  PhysX _create_buffers.
int64→int32 conversion creates a temporary tensor whose backing memory
can be reclaimed. Return wp.from_torch directly without caching.
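The fixed control flow can be sketched with plain-Python stand-ins (`needs_convert`, `convert`, and `wrap` model the int64 check, `.to(torch.int32)`, and `wp.from_torch`; `id()` stands in for `data_ptr()`):

```python
def resolve_ids(ids, cache, needs_convert, convert, wrap):
    if needs_convert(ids):
        # The converted tensor is a temporary owned only by this frame;
        # caching a pointer into it would dangle once it is reclaimed,
        # so wrap and return it directly without touching the cache.
        return wrap(convert(ids))
    ptr = id(ids)  # stands in for tensor.data_ptr()
    if cache.get("ptr") == ptr:
        return cache["wp"]
    wrapped = wrap(ids)
    cache["ptr"], cache["wp"] = ptr, wrapped
    return wrapped
```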
- Fix "This method expect" → "This method expects" across all PhysX
  articulation docstrings.
- Rename kernel param from_mask → full_data in write_joint_state_data
  for consistency with caller.
- Add invalidation notes to cached view helper docstrings.
- Replace "infrequent path" with "partial resets" in _get_cpu_env_ids.
- Remove stale timing estimate from disable_shape_checks docstring and
  document the False case.
@github-actions Bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels Apr 20, 2026
…ernels

Slim 8 PhysX warp kernels (4 root + 4 body variants) to remove output
parameters for deprecated state buffers that are always passed as None.
Update all 12 call sites across Articulation, RigidObject, and
RigidObjectCollection to remove the None entries from outputs lists.
Apply single-slot data_ptr() caching to _resolve_env_ids and
_resolve_body_ids in PhysX DeformableObject, Newton RigidObject,
and Newton RigidObjectCollection.
Cache .view(wp.float32) results in PhysX RigidObject, DeformableObject,
and wrench .flatten().view() in RigidObject/Collection write_data_to_sim.
Pre-allocate pinned CPU buffers for RigidObject set_masses/set_coms/
set_inertias.
The _check_shapes attribute was only set in __init__, but
assert_shape_and_dtype can be called before __init__ completes (e.g.
during _initialize_impl callbacks). Add a class-level default so the
attribute always exists.
Add micro-benchmark scripts for Newton RigidObject setter/writer methods
(torch_list, torch_tensor, warp_mask modes with fill-ratio variants) and
RigidObjectData property access, following the same patterns as the
existing Newton articulation benchmarks.
Add micro-benchmark scripts for Newton RigidObjectCollection setter/
writer methods and CollectionData property access, following the same
patterns as the existing Newton articulation benchmarks.

@isaaclab-review-bot (Bot) left a comment


🤖 Isaac Lab Review Bot

Summary

This PR delivers a solid performance optimization for asset write operations by introducing single-slot caching for _resolve_* methods, pre-allocated pinned CPU buffers for PhysX TensorAPI writes, cached .view(wp.float32) wrappers for structured warp arrays, and a fused write_joint_state_to_sim_index kernel. The implementation is well-structured and aligns with the claimed speedups (1.16x-1.42x on hot paths).

Architecture Impact

Low risk. Changes are internal implementation optimizations that maintain public API compatibility:

  • _resolve_env_ids, _resolve_joint_ids, _resolve_body_ids now use single-slot pointer caching to avoid repeated wp.from_torch allocations
  • PhysX TensorAPI writes now use pre-allocated pinned CPU buffers instead of per-call wp.clone(..., device="cpu")
  • The new disable_shape_checks config option provides a documented escape hatch for production workloads
  • Kernel output signature changes are internal (removed unused None outputs for state arrays)

Implementation Verdict

Minor fixes needed — Clean implementation with good performance wins. A few issues require attention before merge.

Test Coverage

Adequate for performance optimization PR. The extensive micro-benchmarking framework added in benchmark/assets/ serves as validation infrastructure. The PR does not claim to fix bugs, so no regression tests are required. Existing test coverage through isaaclab_physx and isaaclab_newton test suites validates correctness. The CI test failures (test_deformable_object.py::test_set_nodal_state_with_applied_transform) are unrelated to this PR.

CI Status

  • pre-commit — Formatting issues in benchmark files (code will need ./isaaclab.sh -f)
  • license-check — Pre-existing infrastructure issue (ovphysx has license: LicenseRef-NVIDIA-Omniverse), unrelated to PR
  • isaaclab_physx tests — 4 failures in test_deformable_object.py (deformable object tests), unrelated to this PR's articulation/rigid object changes
  • ✅ Core tests, build, wheel, docs all passing

Findings

🟡 Warning: pre-commit CI failure — formatting issues need fixing
The benchmark files have formatting that pre-commit rejects. Run ./isaaclab.sh -f to auto-fix and push.

🔵 Improvement: source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation.py — Cache invalidation not explicit
The cached view wrappers (_root_link_pose_w_f32, etc.) are set to None in _create_buffers(), but this only happens during initialization. If _create_buffers() is ever called again (unlikely but defensive), consider adding explicit self._root_link_pose_w_f32 = None invalidation at the top of the method for all cached views.

🔵 Improvement: source/isaaclab/isaaclab/assets/asset_base.py:277 — __debug__ check could be more explicit
The instance-level _check_shapes flag correctly overrides the class-level default, but the resolution logic uses the implicit __debug__ global. Consider adding a brief comment noting that __debug__ is True in normal Python and False with -O for maintainability:

# __debug__ is True normally, False with `python -O`
if self.cfg.disable_shape_checks is None:
    self._check_shapes = __debug__
else:
    self._check_shapes = not self.cfg.disable_shape_checks

🔵 Improvement: Documentation — PR description claims "Bug fix" label but this is a performance optimization
The PR template shows "Bug fix (non-breaking change which fixes an issue)" but this is clearly a performance optimization. Consider updating the checklist to reflect the actual change type to avoid confusion in release notes.

🔵 Minor: source/isaaclab_newton/benchmark/assets/benchmark_rigid_object_collection.py — pre-commit formatting
Based on the pre-commit output, the formatting around object.__setattr__ calls and multiline torch.rand calls needs adjustment. This will be auto-fixed by ./isaaclab.sh -f.

Recommendations

  1. Run ./isaaclab.sh -f to fix pre-commit failures and push
  2. Consider updating PR labels/description to reflect "performance optimization" rather than "bug fix"
  3. The core implementation is sound — ship after CI passes


@isaaclab-review-bot (Bot) left a comment


🤖 Isaac Lab Review Bot

Summary

This PR implements four performance optimizations for asset write methods: (1) single-slot data_ptr() caching in _resolve_* methods to avoid repeated wp.from_torch allocations, (2) pre-allocated pinned CPU buffers for PhysX TensorAPI writes, (3) cached .view(wp.float32) results for structured warp arrays, and (4) fused write_joint_state_* kernels that combine position+velocity writes. The implementation is solid and well-organized across both Newton and PhysX backends.

Architecture Impact

Changes are self-contained within the asset classes (Articulation, RigidObject, RigidObjectCollection, DeformableObject) and their kernels. No cross-module API changes—all existing callers remain compatible. The new disable_shape_checks config flag in AssetBaseCfg is optional with sensible defaults.

Implementation Verdict

Ship it — Clean implementation with comprehensive benchmark scripts. The optimizations are well-motivated by measured speedups (1.02x–1.42x on micro-benchmarks). Minor polish items below.

Test Coverage

This PR adds extensive benchmark scripts covering all modified methods, which is valuable for regression detection. However:

  • 🟡 Warning: No unit tests are added to verify the correctness of the new caching logic or fused kernels. The _resolve_* caching in particular has subtle semantics (GC hazards with int64 tensors) that would benefit from targeted tests.
  • The existing test suite should cover functional correctness, but explicit tests for the cache invalidation logic and fused kernel output equivalence would strengthen confidence.

Recommendation: Consider adding a few unit tests that verify:

  1. _resolve_env_ids returns cached result for same tensor, fresh result for new tensor
  2. int64 tensors bypass the cache correctly
  3. write_joint_state_to_sim_index produces identical results to separate position+velocity calls

CI Status

Unable to check CI status programmatically (GitHub API limitations).

Findings

🔵 Improvement: Newton write_joint_state_to_sim deprecation message is inconsistent
The Newton write_joint_state_to_sim deprecation message says "use write_joint_state_to_sim_index instead" which is good, but the PhysX version should be updated similarly. Currently PhysX still calls the index method but doesn't have the fused kernel optimization.

🔵 Improvement: _check_shapes class-level default uses __debug__
The class-level default _check_shapes: bool = __debug__ is correct, but the comment "Class-level default for shape validation. Overridden per-instance in __init__" could be clearer that this exists to prevent AttributeError when assert_shape_and_dtype is called before __init__ completes.

🔵 Improvement: Benchmark scripts are comprehensive but large
The benchmark scripts are well-written but quite large (600-1300 lines each). Consider extracting common patterns (fill-ratio generators, benchmark definitions) into a shared module to reduce duplication across the 6 new benchmark files.

🔵 Improvement: Minor docstring enhancement opportunity
The disable_shape_checks docstring in AssetBaseCfg is excellent. Consider adding a brief note about the measured overhead savings (you mention ~3us in a commit message but it's not in the user-facing docs).

Notes

  • The GC hazard fix for int64 tensors is correctly handled—skipping the cache when dtype conversion creates a temporary.
  • Good use of pinned memory for CPU buffers to enable DMA fast path.
  • The removal of dead state-buffer output parameters from warp kernels is clean and reduces unnecessary argument marshalling.


@isaaclab-review-bot (Bot) left a comment


Performance Optimization Review – Changes Requested

Summary

This PR introduces single-slot caching for _resolve_env_ids, _resolve_body_ids, and _resolve_joint_ids methods to reduce repeated wp.from_torch wrapper allocations. The optimization intent is sound, but CI is failing due to missing attribute initializations.


🚨 Critical Issues (Must Fix)

1. Missing Cache Attribute Initialization in RigidObject Classes

The _cached_env_ids_ptr, _cached_env_ids_wp, _cached_body_ids_ptr, and _cached_body_ids_wp attributes are accessed in _resolve_env_ids and _resolve_body_ids before they're initialized.

Failing Tests:

  • AttributeError: 'RigidObject' object has no attribute '_cached_env_ids_ptr' (Newton)
  • AttributeError: 'RigidObject' object has no attribute '_root_link_pose_w_f32' (PhysX)

Root Cause: The cache attributes are initialized in _create_buffers(), but _resolve_env_ids() can be called before buffers are created, or the initialization block is missing entirely in some classes.

Fix Required:

  1. Ensure all asset classes (RigidObject, RigidObjectCollection, Articulation) in both PhysX and Newton backends initialize these cache attributes in __init__ or at the start of _create_buffers().
  2. Verify the initialization order—_create_buffers() must run before any method calls that invoke _resolve_* methods.

2. Pre-commit / Linting Failures

E501 Line too long (121 > 120)
  --> source/isaaclab/isaaclab/assets/asset_base.py:73:121

Two linting errors remain. Please run pre-commit run --all-files locally before pushing.


⚠️ Correctness Concerns

3. Single-Slot Cache Invalidation Edge Cases

The caching strategy uses a single-slot cache keyed on tensor.data_ptr(). This is efficient for the common case (same tensor reused across steps), but has edge cases:

Potential Issue – ABA Problem:

ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp  # Returns stale warp array if tensor was reallocated

If a tensor is deallocated and a new tensor is allocated at the same memory address (ABA problem), the cache will incorrectly return the old warp wrapper. This is unlikely in typical RL loops but can occur with dynamic tensor allocation patterns.

Mitigation Suggestion (optional): Consider also comparing env_ids.numel() and env_ids.device as secondary validation, or document this limitation clearly.
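A minimal sketch of that secondary validation, with plain values standing in for `data_ptr()` and the `(numel, device)` metadata:

```python
def cache_lookup(cache, ptr, meta):
    # Treat the cache as valid only if both the pointer and the shape/device
    # metadata match; a reallocated tensor at the same address with a
    # different shape (the ABA case) then misses instead of aliasing.
    if cache.get("ptr") == ptr and cache.get("meta") == meta:
        return cache["wp"]
    return None
```

This narrows but does not fully close the ABA window (a same-shape reallocation at the same address still hits), which is why documenting the limitation is offered as the alternative.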

4. Cache Not Cleared on Reset

The caches are never explicitly cleared (e.g., during reset() or environment reconfiguration). If environment counts change dynamically, stale cached warp arrays could cause shape mismatches.


📝 Style & Documentation

5. Benchmark Files Added Without Documentation

New benchmark scripts (benchmark_articulation.py, benchmark_rigid_object.py, etc.) are added but:

  • No --help documentation in the arg parser descriptions
  • No docstring explaining benchmark methodology or how to interpret results
  • Consider adding a README.md in the benchmark directory

✅ What Looks Good

  • The caching pattern itself is appropriate for this use case
  • The int64→int32 conversion handling (skipping cache) is correct
  • Consistent application across PhysX and Newton backends
  • Benchmark infrastructure is valuable for ongoing perf tracking

Required Actions Before Merge

  1. Fix missing attribute initialization in all RigidObject classes
  2. Fix linting errors (line length)
  3. Add test coverage or verify existing tests pass after fixes
  4. Consider adding cache invalidation on reset() (optional but recommended)

…culation-resolve-caching

# Conflicts:
#	source/isaaclab/docs/CHANGELOG.rst
#	source/isaaclab_newton/docs/CHANGELOG.rst
#	source/isaaclab_physx/docs/CHANGELOG.rst
Add single-slot cache attributes (_cached_*_ptr, _cached_*_wp), view
caches (_root_*_f32, _*_wrench_*_f32), and pinned CPU buffers to test
fixture functions that bypass __init__/_create_buffers via object.__new__.
Convert PhysX articulation test _ALL_INDICES from torch to warp arrays
to match the actual class. Fix line-length lint in asset_base.py and
auto-format benchmark file.
The underlying _nodal_pos_w.data / _nodal_vel_w.data arrays are
reassigned on every property read (from get_simulation_nodal_positions),
so a cached .view(wp.float32) goes stale and writes old data to the
binding. Inline the .view() call instead.
DeformableObjectData was reassigning _nodal_pos_w.data and
_nodal_vel_w.data to new arrays on every read, violating the
TimestampedBufferWarp contract (which allocates stable buffers at
init). Use wp.copy into the pre-allocated buffers instead, keeping
pointers stable so cached .view(wp.float32) wrappers remain valid.
Re-enable the nodal view caching in DeformableObject writers.
@AntoineRichard AntoineRichard marked this pull request as ready for review April 21, 2026 12:19
Add 5%, 95%, 100% fill-ratio variants for env_ids subsetting to the
PhysX RigidObject and RigidObjectCollection benchmarks, matching the
pattern already used in the Newton benchmarks and PhysX Articulation.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR introduces a suite of performance optimizations for asset write methods across PhysX and Newton backends (articulations, rigid objects, rigid object collections, deformable objects). The main techniques are: (1) single-slot data_ptr()-keyed caches in _resolve_env_ids/_resolve_joint_ids/_resolve_body_ids to avoid repeated wp.from_torch wrapper allocations; (2) lazy .view(wp.float32) caches for structured warp buffers (root pose, root velocity, wrench force/torque); (3) pre-allocated pinned CPU buffers for PhysX model-property writes (stiffness, damping, masses, COMs, inertias); (4) fused kernel launches combining joint position+velocity+prev_vel+acc writes into a single 2D dispatch; and (5) a new disable_shape_checks config flag that follows Python's __debug__ by default. All cache invalidations are handled correctly in _create_buffers, and the DeformableObjectData stale-view bug is fixed by switching to copy-pull into pre-allocated buffers.

Confidence Score: 5/5

Safe to merge — all optimizations are correctly implemented with proper cache invalidation; remaining comments are minor P2 style/design suggestions.

All P0/P1 correctness, cache-invalidation, and data-consistency issues reviewed are sound. Caches are reset in _create_buffers, warp .view() wrappers are backed by stable pre-allocated buffers, the wrench composer uses in-place zero fills, and fused kernels produce the same outputs as the prior multi-launch paths. Benchmark data confirms measurable speedups. All remaining comments are P2 quality/documentation suggestions that do not block merge.

articulation.py (physx) — partial-env CPU copy overhead and cache documentation; asset_base_cfg.py — shape-check footgun documentation.

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation.py Core optimization target: adds single-slot tensor caches, pinned CPU buffers, .view(wp.float32) wrench caches, and fused kernel launches. Full-buffer CPU copy on partial env_id model updates is slightly wasteful; cache docs could be sharper.
source/isaaclab/isaaclab/assets/asset_base_cfg.py Adds `disable_shape_checks: bool
source/isaaclab_physx/isaaclab_physx/assets/kernels.py Adds shared write kernels (set_root_link_pose_to_sim, set_root_com_pose_to_sim, set_root_com_velocity_to_sim, etc.); from_mask/full-data branching logic is consistent and correct.
source/isaaclab_physx/isaaclab_physx/assets/articulation/kernels.py New write_joint_state_data fuses position+velocity+prev_vel+acc writes into a single 2D kernel launch; logic and index mapping are correct.
source/isaaclab_physx/isaaclab_physx/assets/rigid_object/rigid_object.py Mirrors articulation optimizations: single-slot tensor caches, pinned CPU buffers for mass/COM/inertia, wrench view caches; all correctly invalidated in _create_buffers.
source/isaaclab_physx/isaaclab_physx/assets/deformable_object/deformable_object_data.py Fixes stale-view issue by copy-pulling into pre-allocated buffers (nodal_pos_w, nodal_vel_w) rather than returning raw PhysX views; correct approach.
source/isaaclab/isaaclab/assets/asset_base.py Adds disable_shape_checks per-instance flag, _check_shapes class-level default, and assert_shape_and_dtype_mask; logic is correct and well-documented.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[write_*_to_sim_index called] --> B[_resolve_env_ids / _resolve_joint_ids]
    B --> C{Single-slot ptr cache hit?}
    C -- Yes --> D[Return cached wp.array wrapper]
    C -- No --> E[wp.from_torch + cache update]
    D --> F[wp.launch fused kernel]
    E --> F
    F --> G{Model property write?}
    G -- Yes --> H[_get_cpu_env_ids]
    G -- No --> I[PhysX TensorAPI call]
    H --> J[wp.copy full buffer to pinned CPU
then root_view.set_* indices=cpu_env_ids]
    F --> K{View-typed buffer?}
    K -- Yes --> L[Lazy .view cached getter]
    L --> I
    J --> M[Timestamp invalidation]
    I --> M

Reviews (1): Last reviewed commit: "Add fill-ratio benchmark variants to Phy..."

Comment thread source/isaaclab/isaaclab/assets/asset_base_cfg.py
@isaaclab-review-bot (Bot) dismissed their stale review April 21, 2026 16:13

Dismissing: bot incorrectly used CHANGES_REQUESTED instead of COMMENT. Bot policy is COMMENT-only.


@isaaclab-review-bot (Bot) left a comment


Code Review: Performance Optimization for Asset Writes

Summary

This PR introduces multiple performance optimizations for setter and writer methods in asset classes. The changes are well-structured and follow a consistent pattern across PhysX and Newton backends. The documented speedups (1.02x–1.42x) are plausible given the eliminated overhead sources.


✅ Strengths

1. Clean Configuration Design
The disable_shape_checks tri-state config (None/True/False) is well-designed:

  • None → follow __debug__ (sensible default)
  • True → force disable (production optimization)
  • False → force enable (useful for debugging even with -O)

2. Consistent Caching Strategy
Single-slot data_ptr() caches in _resolve_* methods are a good trade-off:

  • Low memory overhead (one slot)
  • High hit rate for common RL loops where the same tensor is reused
  • Graceful fallback when tensors differ

3. Proper Cache Invalidation
Caches are reset to None in _create_buffers(), ensuring they're invalidated on re-initialization.

4. Pinned Memory for CPU Transfers
Pre-allocating pinned CPU buffers eliminates per-call malloc overhead for PhysX "model" property updates.

5. Fused Kernels
The new write_joint_state_to_sim_index method and the write_joint_state_data kernel reduce kernel-launch overhead for the common position+velocity write pattern.


⚠️ Issues & Suggestions

1. Single-Slot Cache Thread Safety (Low Risk)

The caching pattern:

ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp
result = wp.from_torch(env_ids, dtype=wp.int32)
self._cached_env_ids_ptr = ptr
self._cached_env_ids_wp = result

is not thread-safe if two threads call these methods concurrently. While IsaacLab likely runs single-threaded, consider documenting this assumption or using a thread-local cache if multi-threaded access is ever expected.

2. Cache Staleness with Tensor Reallocation (Medium Risk)

data_ptr() identifies the memory address, not the tensor content. If a user:

  1. Calls write_joint_position_to_sim_index(env_ids=my_tensor)
  2. Modifies my_tensor in-place
  3. Calls again with the same my_tensor

The cached wp.array will have stale content. This is fine if the underlying memory was also updated (warp wraps the same memory), but could cause issues if PyTorch reallocates. The current implementation is safe because wp.from_torch creates a non-owning view, but this subtlety deserves a comment.

Suggestion: Add a comment in the caching code explaining this relies on wp.from_torch being a non-owning view:

# Safe: wp.from_torch creates a non-owning view; if the tensor's memory
# is modified, the warp array reflects those changes automatically.

3. Missing _resolve_joint_ids int64 Handling (Bug)

In _resolve_env_ids, there's explicit int64→int32 conversion:

if env_ids.dtype == torch.int64:
    return wp.from_torch(env_ids.to(torch.int32), dtype=wp.int32)

But _resolve_joint_ids and _resolve_body_ids lack this handling. If a user passes int64 joint_ids, it may fail or produce incorrect results.

Fix: Add the same int64 check to _resolve_joint_ids and _resolve_body_ids.

4. Pinned CPU Buffer Size Assumption (Minor)

The pinned CPU buffers are sized for the full simulation:

self._cpu_joint_stiffness = wp.zeros((N, J), dtype=wp.float32, device="cpu", pinned=True)

This is fine, but the _get_cpu_env_ids method still calls wp.clone for partial indices:

if env_ids.ptr == self._ALL_INDICES.ptr:
    return self._cpu_env_ids_all
return wp.clone(env_ids, device="cpu")  # <-- Still allocates for partial resets

For a more aggressive optimization, you could pre-allocate a smaller pinned buffer for partial indices and copy into it. However, partial resets are typically infrequent enough that this may not matter.
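That suggestion can be sketched with plain Python, where a list stands in for a pinned `wp.array` sized for the worst case (names are illustrative, not the actual `_get_cpu_env_ids` implementation):

```python
class EnvIdStaging:
    def __init__(self, max_envs):
        # Allocated once, sized for a full reset; partial index sets reuse a
        # prefix of it instead of triggering a per-call allocation.
        self._staging = [0] * max_envs

    def get_cpu_env_ids(self, env_ids):
        n = len(env_ids)
        self._staging[:n] = env_ids  # copy into the prefix
        # The slice below is a copy in Python; the real code would hand the
        # pinned buffer plus a count to the TensorAPI call instead.
        return self._staging[:n]
```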

5. Benchmark Test Fixture Duplication

The test fixtures in test_articulation_iface.py and benchmark_articulation.py now have significant overlap in the mock setup. Consider extracting a shared helper to reduce duplication and ensure they stay in sync.

6. Newton Kernel Output Parameters Removal

Removing the None output parameters from Newton kernels:

outputs=[
    self.data.root_link_pose_w,
-   None,  # self.data._root_link_state_w.data,
-   None,  # self.data._root_state_w.data,
],

is a nice cleanup. Verify these outputs were truly dead code rather than vestiges of a planned feature.


📝 Minor Suggestions

  1. Docstring Typos Fixed — Good catch on expectexpects throughout.

  2. CHANGELOG Tilde Alignment — The CHANGELOG headers have inconsistent underline lengths (~~~ vs ~~~~). RST requires the underline to be at least as long as the title. Double-check these render correctly.

  3. Benchmark Fill Ratios — The fill-ratio benchmarks (5%, 95%, 100%) are a nice addition for understanding partial-reset performance. Consider adding these results to the PR description.


🔍 Questions

  1. Wrench Composer Caching Safety: The wrench composer views (_get_inst_wrench_force_f32, etc.) assume the composed_force/torque arrays don't get reallocated. Can you confirm this is guaranteed by the WrenchComposer implementation?

  2. Deformable Object Cache Safety: The comment says "Safe because DeformableObjectData uses wp.copy into stable buffers." Is this stable-buffer contract documented/enforced anywhere?


Verdict

This is a well-executed performance optimization PR. The changes are systematic, well-documented in CHANGELOGs, and follow consistent patterns across backends. The main concerns are:

  • Minor bug: missing int64 handling in joint/body ID resolvers
  • Documentation: cache safety assumptions should be explicitly commented

With the int64 fix addressed, this is ready to merge.

Remove the single-slot tensor index wrapper caches from PhysX and Newton assets because tensor views can share data pointers with different shapes. Add regression coverage for tensor-view index resolution and move changelog entries to fresh extension versions.
# Conflicts:
#	source/isaaclab_newton/docs/CHANGELOG.rst
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Apr 29, 2026
This rewrites the RigidObject + RigidObjectData write/read paths to follow
the canonical PhysX (PR isaac-sim#5329) and Newton conventions, replacing the
previous lazy-buffer + invalidate-cache machinery with the upstream
patterns.

Data class:
* Eagerly allocate every per-instance TimestampedBuffer in _create_buffers
  (called from __init__).  Drop _ensure_root_buffers /
  _ensure_derived_buffers / _ensure_body_prop_buffers and the per-property
  guard calls.  num_instances / num_bodies / body_names are now passed via
  the constructor instead of set post-construction.
* Bridge the wheel's pull-only binding.read(target) API to PhysX's
  pointer-stable view contract via a lazily-allocated pinned-host wp.array
  for CPU-only bindings (RIGID_BODY_MASS / COM_POSE / INERTIA).
* Fix GRAVITY_VEC_W returning (0, 0, -1) instead of (0, 0, 0) when
  cfg.gravity is the zero vector.

RigidObject writers/setters (PhysX PR isaac-sim#5329 cache-update pattern):
* Every set_*_index / write_*_to_sim_index method scatters the user data
  into the cached _body_*/_root_* buffer via the matching scatter kernel,
  then pushes the cache to the wheel via binding.write(cache, indices=...).
* Pre-allocated pinned-host CPU staging buffers (_cpu_body_mass /
  _cpu_body_coms / _cpu_body_inertia / _cpu_env_ids_all) are reused across
  calls to avoid per-call wp.clone(..., device="cpu") allocation.
* The cache becomes the single source of truth post-write; the previous
  _invalidate_caches / _invalidate_root_caches helpers are removed.

Mask path (Newton-style native mask):
* Add seven mask kernels in isaaclab_ovphysx.assets.kernels:
  set_root_link_pose_to_sim_mask, set_root_com_pose_to_sim_mask,
  set_root_com_velocity_to_sim_mask, set_root_link_velocity_to_sim_mask,
  write_2d_data_to_buffer_with_mask, write_body_inertia_to_buffer_mask,
  write_body_com_pose_to_buffer_mask.
* Each *_mask method scatters into the cache only where the boolean mask
  is True, then pushes the cache via the wheel's native
  binding.write(cache, mask=mask_u8) -- no torch.nonzero conversion.
* _resolve_env_mask / _resolve_body_mask now return wp.bool masks
  (defaulting to pre-allocated _ALL_TRUE_*_MASK), not indices.

Initialization:
* Add USD prim-scan validation to RigidObject._initialize_impl mirroring
  PhysX.  RuntimeError is now raised cleanly when cfg.prim_path resolves
  to no UsdPhysics.RigidBodyAPI prim, multiple rigid-body prims, or a prim
  with an enabled UsdPhysics.ArticulationRootAPI, instead of crashing with
  an obscure TypeError deep in property accessors.
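The scatter-then-write pattern the commit describes can be sketched in miniature (plain numpy; `FakeBinding`, `write_to_sim`, and the 1-D buffers are hypothetical stand-ins for the warp scatter kernels and the wheel's binding API):

```python
import numpy as np

class FakeBinding:
    """Hypothetical stand-in for the wheel's binding.write(cache, indices=...)."""
    def __init__(self, num_instances):
        self.sim = np.zeros(num_instances, dtype=np.float32)

    def write(self, cache, indices=None):
        # push the cache (or a slice of it) down to the simulator state
        if indices is None:
            self.sim[:] = cache
        else:
            self.sim[indices] = cache[indices]

def write_to_sim(cache, binding, data, env_ids):
    # 1) scatter user data into the persistent cache (a warp kernel in the real code)
    cache[env_ids] = data
    # 2) push the cache to the simulator; the cache is the single source
    #    of truth after the write, so no invalidation step is needed
    binding.write(cache, indices=env_ids)

cache = np.zeros(4, dtype=np.float32)
binding = FakeBinding(4)
write_to_sim(cache, binding, np.array([1.0, 2.0], dtype=np.float32), np.array([1, 3]))
print(binding.sim)  # [0. 1. 0. 2.]
```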

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This follow-up review examines the new commits addressing the previous review concerns. The PR continues to implement performance optimizations for asset writes with comprehensive benchmarking infrastructure.

Previous Concerns Status

  • int64 handling in _resolve_joint_ids/_resolve_body_ids: NOT addressed in the diff - the previous concern about missing int64→int32 conversion in joint/body ID resolvers remains unaddressed in the visible changes.
  • Cache safety documentation: NOT visible in the truncated diff - cannot confirm if comments were added.
  • Thread safety documentation: NOT visible - cannot confirm if documented.

New Findings

🟡 Warning: test_articulation_iface.py:519-562 — New index resolution tests only cover tensor views, not int64 dtype edge case

The new TestArticulationIndexResolution tests verify that tensor views are handled correctly:

resolved_full = art._resolve_env_ids(env_ids)
resolved_view = art._resolve_env_ids(env_ids[:2])

However, these tests don't cover the int64 dtype case that was flagged in the previous review. Consider adding a test like:

def test_resolve_env_ids_handles_int64_dtype(self, backend):
    art, _ = get_articulation(backend, num_instances=4, device="cpu")
    env_ids_i64 = torch.arange(4, dtype=torch.int64, device="cpu")
    resolved = art._resolve_env_ids(env_ids_i64)
    assert resolved.dtype == wp.int32

🔵 Improvement: The new disable_shape_checks config addition is clean and well-documented in asset_base_cfg.py:78-91

The tri-state design with clear docstrings explaining None/True/False behavior is exactly what was praised in the previous review.

Implementation Verdict

Minor fixes needed — the int64 handling gap from the previous review should be addressed before merge.

Test Coverage

The new index resolution tests in TestArticulationIndexResolution, TestCollectionIndexResolution, and TestRigidObjectIndexResolution are good additions for catching tensor view shape issues, but don't cover the dtype conversion edge case.

CI Status

The Check for Broken Links failure appears unrelated to this PR's changes (documentation link checker). Pre-commit and wheel build passed.

# Conflicts:
#	source/isaaclab/docs/CHANGELOG.rst
#	source/isaaclab_physx/docs/CHANGELOG.rst

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits don't appear to address the previously flagged concerns:

  1. int64 handling in _resolve_joint_ids/_resolve_body_ids: Still not addressed - the PhysX backend resolvers for joint_ids and body_ids still lack the int64→int32 conversion that exists in _resolve_env_ids.

  2. Index resolution tests: The new tests verify tensor view shapes work correctly but don't test the int64 dtype edge case.
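The fix itself is small. A hedged numpy sketch of what an int64-tolerant resolver could look like (`resolve_ids` and its signature are illustrative only; the real resolvers take torch tensors and return warp int32 arrays):

```python
import numpy as np

def resolve_ids(ids, count):
    """Resolve user-supplied indices to int32, mirroring the downcast
    that _resolve_env_ids performs but the joint/body resolvers skip."""
    if ids is None:
        return np.arange(count, dtype=np.int32)
    ids = np.asarray(ids)
    if ids.dtype == np.int64:
        # simulator-side bindings expect 32-bit indices
        ids = ids.astype(np.int32)
    return ids

print(resolve_ids(np.arange(4, dtype=np.int64), 4).dtype)  # int32
```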

No new issues introduced by the latest commits. The code changes are consistent with the previous review.

Implementation Verdict

Minor fixes needed — previous int64 handling gap should be addressed before merge.


@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The previous concern about int64 handling in _resolve_joint_ids/_resolve_body_ids remains unaddressed. The PhysX backend's _resolve_env_ids has explicit int64→int32 conversion, but the joint_ids and body_ids resolvers still lack this safety check.

Looking at the new commits, no changes were made to the resolver methods in the PhysX articulation implementation. The code changes are primarily:

  • Newton benchmark additions (new files)
  • Test infrastructure improvements
  • No modifications to the flagged resolver methods

Implementation Verdict

Minor fixes needed — the int64 dtype handling gap in joint/body ID resolvers should still be addressed for API consistency.

@kellyguo11 kellyguo11 merged commit e57269f into isaac-sim:develop Apr 30, 2026
70 of 73 checks passed

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team
