
Performance optimization on asset writes #5329

Merged
kellyguo11 merged 30 commits into isaac-sim:develop from
AntoineRichard:antoiner/perf/articulation-resolve-caching
Apr 30, 2026

Conversation

@AntoineRichard
Collaborator

Description

Improves performance across the board for asset writes.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Micro benchmarks

  ┌─────────────────────────────────┬────────┬───────┬─────────┐
  │             Method              │ Before │ After │ Speedup │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_link_pose_to_sim     │ 94     │ 81    │ 1.16x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_com_pose_to_sim      │ 111    │ 90    │ 1.23x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_link_velocity_to_sim │ 128    │ 124   │ 1.03x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_root_com_velocity_to_sim  │ 110    │ 108   │ 1.02x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_state_to_sim        │ 180    │ 127   │ 1.42x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_position_to_sim     │ 98     │ 108   │ ~1.0x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_joint_position_target       │ 94     │ 107   │ ~1.0x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_stiffness_to_sim    │ 171    │ 152   │ 1.13x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_damping_to_sim      │ 177    │ 151   │ 1.17x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ write_joint_armature_to_sim     │ 170    │ 153   │ 1.11x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_masses                      │ 177    │ 152   │ 1.16x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_coms                        │ 350    │ 272   │ 1.29x   │
  ├─────────────────────────────────┼────────┼───────┼─────────┤
  │ set_inertias                    │ 350    │ 280   │ 1.25x   │
  └─────────────────────────────────┴────────┴───────┴─────────┘

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Add benchmark scripts for both PhysX and Newton backends covering
ArticulationData property access and Articulation setter/writer methods.
Includes torch_list, torch_tensor, warp_mask modes and fill-ratio
variants (5%, 95%, 100%). Add design spec and implementation plan for
four performance optimizations targeting Python dispatch overhead.
Slim 8 warp kernels (index + mask variants for root link pose, root COM
pose, root COM velocity, root link velocity) to remove output parameters
that are always passed as None. Reduces arg marshalling overhead and
simplifies generated GPU code.
Add single-slot caches keyed on tensor.data_ptr() to _resolve_env_ids,
_resolve_joint_ids, and _resolve_body_ids in both PhysX and Newton
backends. Eliminates redundant wp.array wrapper allocations when the
same tensor is passed across training steps.
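The single-slot pattern described above can be sketched with a plain-Python stand-in (illustrative only: `data_ptr` models torch's `Tensor.data_ptr()` and `wrap` models `wp.from_torch`; the real resolvers live in the PhysX/Newton asset classes):

```python
class ResolverCache:
    """Single-slot cache: remembers only the most recent tensor's pointer."""

    def __init__(self):
        self._cached_ptr = None      # pointer key of the last-seen tensor
        self._cached_wrapped = None  # wrapper built for that tensor

    def resolve(self, tensor, data_ptr, wrap):
        ptr = data_ptr(tensor)
        if ptr == self._cached_ptr:
            # Hot path: same tensor as last step -> reuse the existing wrapper.
            return self._cached_wrapped
        # Cold path: build a fresh wrapper and remember it.
        self._cached_wrapped = wrap(tensor)
        self._cached_ptr = ptr
        return self._cached_wrapped
```

In an RL loop the same `env_ids` tensor is typically reused every step, so the hot path dominates. Note that a later commit in this PR removed these caches again, because tensor views can share a data pointer while having different shapes.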
Pre-compute and cache float32 views of structured warp arrays used by
PhysX TensorAPI calls. Eliminates per-call wp.array wrapper allocation
in root pose and velocity writers.
Replace per-call wp.clone(device='cpu') with wp.copy into pre-allocated
pinned CPU buffers for all joint property and body property writers.
Pinned memory enables DMA fast path and eliminates CPU malloc overhead.
Full-index env_ids also use a pre-allocated pinned buffer.
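The buffer-reuse idea can be sketched with a plain-Python stand-in, where a `bytearray` plays the role of the pinned `wp.array` (names are illustrative, not the Isaac Lab implementation):

```python
class PropertyWriter:
    def __init__(self, nbytes):
        # Allocated once; in the real code this is pinned (page-locked) host
        # memory so the device-to-host copy can take the DMA fast path.
        self._staging = bytearray(nbytes)

    def write(self, payload: bytes):
        # Reuse the staging buffer instead of allocating a new one per call
        # (the old path cloned a fresh CPU buffer on every write).
        self._staging[: len(payload)] = payload
        return self._staging  # handed to the TensorAPI call in the real code
```

Allocating the staging buffer once moves the cost out of the per-call path entirely.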
Add missing _cached_*_ptr/_wp fields, _root_*_f32 view caches, and
pinned CPU buffers to benchmark mock articulation constructors so they
match the new attributes added by the optimization commits.
Add AssetBaseCfg.disable_shape_checks, which controls whether
assert_shape_and_dtype and assert_shape_and_dtype_mask run validation.
When None (default), follows Python's __debug__ flag. When True,
disables checks entirely for ~3us savings per setter/writer call.
The flag is resolved once at init to a bool _check_shapes field.
Also add --no_shape_checks CLI flag to articulation benchmarks.
Cache the flattened float32 views of wrench composer force and torque
arrays to avoid creating new wp.array wrapper objects every simulation
step.
Add fused write_joint_state_data kernels (index + mask variants) for
both Newton and PhysX backends. write_joint_state_to_sim_mask and the
deprecated write_joint_state_to_sim now resolve indices and launch a
single kernel instead of two separate calls, saving ~50-80us per call.
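The fusion can be illustrated with a plain-Python stand-in, where nested loops play the role of the 2D kernel grid (`write_joint_state_fused` and all names here are illustrative, not the actual warp kernel):

```python
def write_joint_state_fused(joint_pos, joint_vel, pos_data, vel_data, env_ids, joint_ids):
    """One pass writes both position and velocity per (env, joint) pair,
    replacing two separate dispatches with a single one."""
    for i in env_ids:          # first launch dimension: environments
        for j in joint_ids:    # second launch dimension: joints
            joint_pos[i][j] = pos_data[i][j]
            joint_vel[i][j] = vel_data[i][j]
```

Halving the number of launches is where the quoted ~50-80us per call comes from: the per-launch dispatch overhead is paid once instead of twice.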
- Add CHANGELOG entries and version bumps for isaaclab (4.6.7),
  isaaclab_newton (0.5.17), and isaaclab_physx (0.5.19).
- Fix _resolve_env_ids int64 cache GC hazard: move data_ptr() capture
  after dtype conversion so the cache key matches the actual tensor.
- Add write_joint_state_to_sim_index to Newton for API parity with PhysX.
- Update Newton deprecated write_joint_state_to_sim to delegate to the
  new fused index method with correct docstring and warning message.
- Fix misleading "Recreated in _create_simulation_bindings" comment in
  PhysX _create_buffers.
int64→int32 conversion creates a temporary tensor whose backing memory
can be reclaimed. Return wp.from_torch directly without caching.
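The fixed control flow can be sketched with plain-Python stand-ins (`needs_convert`, `convert`, and `wrap` model the int64 check, `.to(torch.int32)`, and `wp.from_torch`; `id()` stands in for `data_ptr()`):

```python
def resolve_ids(ids, cache, needs_convert, convert, wrap):
    if needs_convert(ids):
        # The converted tensor is a temporary owned only by this frame;
        # caching a pointer into it would dangle once it is reclaimed,
        # so wrap and return it directly without touching the cache.
        return wrap(convert(ids))
    ptr = id(ids)  # stands in for tensor.data_ptr()
    if cache.get("ptr") == ptr:
        return cache["wp"]
    wrapped = wrap(ids)
    cache["ptr"], cache["wp"] = ptr, wrapped
    return wrapped
```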
- Fix "This method expect" → "This method expects" across all PhysX
  articulation docstrings.
- Rename kernel param from_mask → full_data in write_joint_state_data
  for consistency with caller.
- Add invalidation notes to cached view helper docstrings.
- Replace "infrequent path" with "partial resets" in _get_cpu_env_ids.
- Remove stale timing estimate from disable_shape_checks docstring and
  document the False case.
@github-actions Bot added the documentation (Improvements or additions to documentation) and isaac-lab (Related to Isaac Lab team) labels Apr 20, 2026
…ernels

Slim 8 PhysX warp kernels (4 root + 4 body variants) to remove output
parameters for deprecated state buffers that are always passed as None.
Update all 12 call sites across Articulation, RigidObject, and
RigidObjectCollection to remove the None entries from outputs lists.
Apply single-slot data_ptr() caching to _resolve_env_ids and
_resolve_body_ids in PhysX DeformableObject, Newton RigidObject,
and Newton RigidObjectCollection.
Cache .view(wp.float32) results in PhysX RigidObject, DeformableObject,
and wrench .flatten().view() in RigidObject/Collection write_data_to_sim.
Pre-allocate pinned CPU buffers for RigidObject set_masses/set_coms/
set_inertias.
The _check_shapes attribute was only set in __init__, but
assert_shape_and_dtype can be called before __init__ completes (e.g.
during _initialize_impl callbacks). Add a class-level default so the
attribute always exists.
Add micro-benchmark scripts for Newton RigidObject setter/writer methods
(torch_list, torch_tensor, warp_mask modes with fill-ratio variants) and
RigidObjectData property access, following the same patterns as the
existing Newton articulation benchmarks.
Add micro-benchmark scripts for Newton RigidObjectCollection setter/
writer methods and CollectionData property access, following the same
patterns as the existing Newton articulation benchmarks.

@isaaclab-review-bot (Bot) left a comment


🤖 Isaac Lab Review Bot

Summary

This PR delivers a solid performance optimization for asset write operations by introducing single-slot caching for _resolve_* methods, pre-allocated pinned CPU buffers for PhysX TensorAPI writes, cached .view(wp.float32) wrappers for structured warp arrays, and a fused write_joint_state_to_sim_index kernel. The implementation is well-structured and aligns with the claimed speedups (1.16x-1.42x on hot paths).

Architecture Impact

Low risk. Changes are internal implementation optimizations that maintain public API compatibility:

  • _resolve_env_ids, _resolve_joint_ids, _resolve_body_ids now use single-slot pointer caching to avoid repeated wp.from_torch allocations
  • PhysX TensorAPI writes now use pre-allocated pinned CPU buffers instead of per-call wp.clone(..., device="cpu")
  • The new disable_shape_checks config option provides a documented escape hatch for production workloads
  • Kernel output signature changes are internal (removed unused None outputs for state arrays)

Implementation Verdict

Minor fixes needed — Clean implementation with good performance wins. A few issues require attention before merge.

Test Coverage

Adequate for performance optimization PR. The extensive micro-benchmarking framework added in benchmark/assets/ serves as validation infrastructure. The PR does not claim to fix bugs, so no regression tests are required. Existing test coverage through isaaclab_physx and isaaclab_newton test suites validates correctness. The CI test failures (test_deformable_object.py::test_set_nodal_state_with_applied_transform) are unrelated to this PR.

CI Status

  • pre-commit — Formatting issues in benchmark files (code will need ./isaaclab.sh -f)
  • license-check — Pre-existing infrastructure issue (ovphysx has license: LicenseRef-NVIDIA-Omniverse), unrelated to PR
  • isaaclab_physx tests — 4 failures in test_deformable_object.py (deformable object tests), unrelated to this PR's articulation/rigid object changes
  • ✅ Core tests, build, wheel, docs all passing

Findings

🟡 Warning: pre-commit CI failure — formatting issues need fixing
The benchmark files have formatting that pre-commit rejects. Run ./isaaclab.sh -f to auto-fix and push.

🔵 Improvement: source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation.py — Cache invalidation not explicit
The cached view wrappers (_root_link_pose_w_f32, etc.) are set to None in _create_buffers(), but this only happens during initialization. If _create_buffers() is ever called again (unlikely but defensive), consider adding explicit self._root_link_pose_w_f32 = None invalidation at the top of the method for all cached views.

🔵 Improvement: source/isaaclab/isaaclab/assets/asset_base.py:277 — __debug__ check could be more explicit
The instance-level _check_shapes flag correctly overrides the class-level default, but the resolution logic uses the implicit __debug__ global. Consider adding a brief comment noting that __debug__ is True in normal Python and False with -O for maintainability:

# __debug__ is True normally, False with `python -O`
if self.cfg.disable_shape_checks is None:
    self._check_shapes = __debug__
else:
    self._check_shapes = not self.cfg.disable_shape_checks

🔵 Improvement: Documentation — PR description claims "Bug fix" label but this is a performance optimization
The PR template shows "Bug fix (non-breaking change which fixes an issue)" but this is clearly a performance optimization. Consider updating the checklist to reflect the actual change type to avoid confusion in release notes.

🔵 Minor: source/isaaclab_newton/benchmark/assets/benchmark_rigid_object_collection.py — pre-commit formatting
Based on the pre-commit output, the formatting around object.__setattr__ calls and multiline torch.rand calls needs adjustment. This will be auto-fixed by ./isaaclab.sh -f.

Recommendations

  1. Run ./isaaclab.sh -f to fix pre-commit failures and push
  2. Consider updating PR labels/description to reflect "performance optimization" rather than "bug fix"
  3. The core implementation is sound — ship after CI passes


@isaaclab-review-bot (Bot) left a comment


🤖 Isaac Lab Review Bot

Summary

This PR implements four performance optimizations for asset write methods: (1) single-slot data_ptr() caching in _resolve_* methods to avoid repeated wp.from_torch allocations, (2) pre-allocated pinned CPU buffers for PhysX TensorAPI writes, (3) cached .view(wp.float32) results for structured warp arrays, and (4) fused write_joint_state_* kernels that combine position+velocity writes. The implementation is solid and well-organized across both Newton and PhysX backends.

Architecture Impact

Changes are self-contained within the asset classes (Articulation, RigidObject, RigidObjectCollection, DeformableObject) and their kernels. No cross-module API changes—all existing callers remain compatible. The new disable_shape_checks config flag in AssetBaseCfg is optional with sensible defaults.

Implementation Verdict

Ship it — Clean implementation with comprehensive benchmark scripts. The optimizations are well-motivated by measured speedups (1.02x–1.42x on micro-benchmarks). Minor polish items below.

Test Coverage

This PR adds extensive benchmark scripts covering all modified methods, which is valuable for regression detection. However:

  • 🟡 Warning: No unit tests are added to verify the correctness of the new caching logic or fused kernels. The _resolve_* caching in particular has subtle semantics (GC hazards with int64 tensors) that would benefit from targeted tests.
  • The existing test suite should cover functional correctness, but explicit tests for the cache invalidation logic and fused kernel output equivalence would strengthen confidence.

Recommendation: Consider adding a few unit tests that verify:

  1. _resolve_env_ids returns cached result for same tensor, fresh result for new tensor
  2. int64 tensors bypass the cache correctly
  3. write_joint_state_to_sim_index produces identical results to separate position+velocity calls

CI Status

Unable to check CI status programmatically (GitHub API limitations).

Findings

🔵 Improvement: Newton write_joint_state_to_sim deprecation message is inconsistent
The Newton write_joint_state_to_sim deprecation message says "use write_joint_state_to_sim_index instead" which is good, but the PhysX version should be updated similarly. Currently PhysX still calls the index method but doesn't have the fused kernel optimization.

🔵 Improvement: _check_shapes class-level default uses __debug__
The class-level default _check_shapes: bool = __debug__ is correct, but the comment "Class-level default for shape validation. Overridden per-instance in __init__" could be clearer that this exists to prevent AttributeError when assert_shape_and_dtype is called before __init__ completes.

🔵 Improvement: Benchmark scripts are comprehensive but large
The benchmark scripts are well-written but quite large (600-1300 lines each). Consider extracting common patterns (fill-ratio generators, benchmark definitions) into a shared module to reduce duplication across the 6 new benchmark files.

🔵 Improvement: Minor docstring enhancement opportunity
The disable_shape_checks docstring in AssetBaseCfg is excellent. Consider adding a brief note about the measured overhead savings (you mention ~3us in a commit message but it's not in the user-facing docs).

Notes

  • The GC hazard fix for int64 tensors is correctly handled—skipping the cache when dtype conversion creates a temporary.
  • Good use of pinned memory for CPU buffers to enable DMA fast path.
  • The removal of dead state-buffer output parameters from warp kernels is clean and reduces unnecessary argument marshalling.


@isaaclab-review-bot (Bot) left a comment


Performance Optimization Review – Changes Requested

Summary

This PR introduces single-slot caching for _resolve_env_ids, _resolve_body_ids, and _resolve_joint_ids methods to reduce repeated wp.from_torch wrapper allocations. The optimization intent is sound, but CI is failing due to missing attribute initializations.


🚨 Critical Issues (Must Fix)

1. Missing Cache Attribute Initialization in RigidObject Classes

The _cached_env_ids_ptr, _cached_env_ids_wp, _cached_body_ids_ptr, and _cached_body_ids_wp attributes are accessed in _resolve_env_ids and _resolve_body_ids before they're initialized.

Failing Tests:

  • AttributeError: 'RigidObject' object has no attribute '_cached_env_ids_ptr' (Newton)
  • AttributeError: 'RigidObject' object has no attribute '_root_link_pose_w_f32' (PhysX)

Root Cause: The cache attributes are initialized in _create_buffers(), but _resolve_env_ids() can be called before buffers are created, or the initialization block is missing entirely in some classes.

Fix Required:

  1. Ensure all asset classes (RigidObject, RigidObjectCollection, Articulation) in both PhysX and Newton backends initialize these cache attributes in __init__ or at the start of _create_buffers().
  2. Verify the initialization order—_create_buffers() must run before any method calls that invoke _resolve_* methods.

2. Pre-commit / Linting Failures

E501 Line too long (121 > 120)
  --> source/isaaclab/isaaclab/assets/asset_base.py:73:121

Two linting errors remain. Please run pre-commit run --all-files locally before pushing.


⚠️ Correctness Concerns

3. Single-Slot Cache Invalidation Edge Cases

The caching strategy uses a single-slot cache keyed on tensor.data_ptr(). This is efficient for the common case (same tensor reused across steps), but has edge cases:

Potential Issue – ABA Problem:

ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp  # Returns stale warp array if tensor was reallocated

If a tensor is deallocated and a new tensor is allocated at the same memory address (ABA problem), the cache will incorrectly return the old warp wrapper. This is unlikely in typical RL loops but can occur with dynamic tensor allocation patterns.

Mitigation Suggestion (optional): Consider also comparing env_ids.numel() and env_ids.device as secondary validation, or document this limitation clearly.
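A minimal sketch of that secondary validation, with plain values standing in for `data_ptr()` and the `(numel, device)` metadata:

```python
def cache_lookup(cache, ptr, meta):
    # Treat the cache as valid only if both the pointer and the shape/device
    # metadata match; a reallocated tensor at the same address with a
    # different shape (the ABA case) then misses instead of aliasing.
    if cache.get("ptr") == ptr and cache.get("meta") == meta:
        return cache["wp"]
    return None
```

This narrows but does not fully close the ABA window (a same-shape reallocation at the same address still hits), which is why documenting the limitation is offered as the alternative.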

4. Cache Not Cleared on Reset

The caches are never explicitly cleared (e.g., during reset() or environment reconfiguration). If environment counts change dynamically, stale cached warp arrays could cause shape mismatches.


📝 Style & Documentation

5. Benchmark Files Added Without Documentation

New benchmark scripts (benchmark_articulation.py, benchmark_rigid_object.py, etc.) are added but:

  • No --help documentation in the arg parser descriptions
  • No docstring explaining benchmark methodology or how to interpret results
  • Consider adding a README.md in the benchmark directory

✅ What Looks Good

  • The caching pattern itself is appropriate for this use case
  • The int64→int32 conversion handling (skipping cache) is correct
  • Consistent application across PhysX and Newton backends
  • Benchmark infrastructure is valuable for ongoing perf tracking

Required Actions Before Merge

  1. Fix missing attribute initialization in all RigidObject classes
  2. Fix linting errors (line length)
  3. Add test coverage or verify existing tests pass after fixes
  4. Consider adding cache invalidation on reset() (optional but recommended)

…culation-resolve-caching

# Conflicts:
#	source/isaaclab/docs/CHANGELOG.rst
#	source/isaaclab_newton/docs/CHANGELOG.rst
#	source/isaaclab_physx/docs/CHANGELOG.rst
Add single-slot cache attributes (_cached_*_ptr, _cached_*_wp), view
caches (_root_*_f32, _*_wrench_*_f32), and pinned CPU buffers to test
fixture functions that bypass __init__/_create_buffers via object.__new__.
Convert PhysX articulation test _ALL_INDICES from torch to warp arrays
to match the actual class. Fix line-length lint in asset_base.py and
auto-format benchmark file.
The underlying _nodal_pos_w.data / _nodal_vel_w.data arrays are
reassigned on every property read (from get_simulation_nodal_positions),
so a cached .view(wp.float32) goes stale and writes old data to the
binding. Inline the .view() call instead.
DeformableObjectData was reassigning _nodal_pos_w.data and
_nodal_vel_w.data to new arrays on every read, violating the
TimestampedBufferWarp contract (which allocates stable buffers at
init). Use wp.copy into the pre-allocated buffers instead, keeping
pointers stable so cached .view(wp.float32) wrappers remain valid.
Re-enable the nodal view caching in DeformableObject writers.
@AntoineRichard AntoineRichard marked this pull request as ready for review April 21, 2026 12:19
Add 5%, 95%, 100% fill-ratio variants for env_ids subsetting to the
PhysX RigidObject and RigidObjectCollection benchmarks, matching the
pattern already used in the Newton benchmarks and PhysX Articulation.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR introduces a suite of performance optimizations for asset write methods across PhysX and Newton backends (articulations, rigid objects, rigid object collections, deformable objects). The main techniques are: (1) single-slot data_ptr()-keyed caches in _resolve_env_ids/_resolve_joint_ids/_resolve_body_ids to avoid repeated wp.from_torch wrapper allocations; (2) lazy .view(wp.float32) caches for structured warp buffers (root pose, root velocity, wrench force/torque); (3) pre-allocated pinned CPU buffers for PhysX model-property writes (stiffness, damping, masses, COMs, inertias); (4) fused kernel launches combining joint position+velocity+prev_vel+acc writes into a single 2D dispatch; and (5) a new disable_shape_checks config flag that follows Python's __debug__ by default. All cache invalidations are handled correctly in _create_buffers, and the DeformableObjectData stale-view bug is fixed by switching to copy-pull into pre-allocated buffers.

Confidence Score: 5/5

Safe to merge — all optimizations are correctly implemented with proper cache invalidation; remaining comments are minor P2 style/design suggestions.

All P0/P1 correctness, cache-invalidation, and data-consistency issues reviewed are sound. Caches are reset in _create_buffers, warp .view() wrappers are backed by stable pre-allocated buffers, the wrench composer uses in-place zero fills, and fused kernels produce the same outputs as the prior multi-launch paths. Benchmark data confirms measurable speedups. All remaining comments are P2 quality/documentation suggestions that do not block merge.

articulation.py (physx) — partial-env CPU copy overhead and cache documentation; asset_base_cfg.py — shape-check footgun documentation.

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation.py Core optimization target: adds single-slot tensor caches, pinned CPU buffers, .view(wp.float32) wrench caches, and fused kernel launches. Full-buffer CPU copy on partial env_id model updates is slightly wasteful; cache docs could be sharper.
source/isaaclab/isaaclab/assets/asset_base_cfg.py Adds `disable_shape_checks: bool
source/isaaclab_physx/isaaclab_physx/assets/kernels.py Adds shared write kernels (set_root_link_pose_to_sim, set_root_com_pose_to_sim, set_root_com_velocity_to_sim, etc.); from_mask/full-data branching logic is consistent and correct.
source/isaaclab_physx/isaaclab_physx/assets/articulation/kernels.py New write_joint_state_data fuses position+velocity+prev_vel+acc writes into a single 2D kernel launch; logic and index mapping are correct.
source/isaaclab_physx/isaaclab_physx/assets/rigid_object/rigid_object.py Mirrors articulation optimizations: single-slot tensor caches, pinned CPU buffers for mass/COM/inertia, wrench view caches; all correctly invalidated in _create_buffers.
source/isaaclab_physx/isaaclab_physx/assets/deformable_object/deformable_object_data.py Fixes stale-view issue by copy-pulling into pre-allocated buffers (nodal_pos_w, nodal_vel_w) rather than returning raw PhysX views; correct approach.
source/isaaclab/isaaclab/assets/asset_base.py Adds disable_shape_checks per-instance flag, _check_shapes class-level default, and assert_shape_and_dtype_mask; logic is correct and well-documented.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[write_*_to_sim_index called] --> B[_resolve_env_ids / _resolve_joint_ids]
    B --> C{Single-slot ptr cache hit?}
    C -- Yes --> D[Return cached wp.array wrapper]
    C -- No --> E[wp.from_torch + cache update]
    D --> F[wp.launch fused kernel]
    E --> F
    F --> G{Model property write?}
    G -- Yes --> H[_get_cpu_env_ids]
    G -- No --> I[PhysX TensorAPI call]
    H --> J[wp.copy full buffer to pinned CPU
then root_view.set_* indices=cpu_env_ids]
    F --> K{View-typed buffer?}
    K -- Yes --> L[Lazy .view cached getter]
    L --> I
    J --> M[Timestamp invalidation]
    I --> M

Reviews (1): Last reviewed commit: "Add fill-ratio benchmark variants to Phy..."

Comment thread source/isaaclab/isaaclab/assets/asset_base_cfg.py
@isaaclab-review-bot (Bot) dismissed their stale review April 21, 2026 16:13

Dismissing: bot incorrectly used CHANGES_REQUESTED instead of COMMENT. Bot policy is COMMENT-only.


@isaaclab-review-bot (Bot) left a comment


Code Review: Performance Optimization for Asset Writes

Summary

This PR introduces multiple performance optimizations for setter and writer methods in asset classes. The changes are well-structured and follow a consistent pattern across PhysX and Newton backends. The documented speedups (1.02x–1.42x) are plausible given the eliminated overhead sources.


✅ Strengths

1. Clean Configuration Design
The disable_shape_checks tri-state config (None/True/False) is well-designed:

  • None → follow __debug__ (sensible default)
  • True → force disable (production optimization)
  • False → force enable (useful for debugging even with -O)

2. Consistent Caching Strategy
Single-slot data_ptr() caches in _resolve_* methods are a good trade-off:

  • Low memory overhead (one slot)
  • High hit rate for common RL loops where the same tensor is reused
  • Graceful fallback when tensors differ

3. Proper Cache Invalidation
Caches are reset to None in _create_buffers(), ensuring they're invalidated on re-initialization.

4. Pinned Memory for CPU Transfers
Pre-allocating pinned CPU buffers eliminates per-call malloc overhead for PhysX "model" property updates.

5. Fused Kernels
The new write_joint_state_to_sim_index method and the write_joint_state_data kernel reduce kernel-launch overhead for the common position+velocity write pattern.


⚠️ Issues & Suggestions

1. Single-Slot Cache Thread Safety (Low Risk)

The caching pattern:

ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp
result = wp.from_torch(env_ids, dtype=wp.int32)
self._cached_env_ids_ptr = ptr
self._cached_env_ids_wp = result

is not thread-safe if two threads call these methods concurrently. While IsaacLab likely runs single-threaded, consider documenting this assumption or using a thread-local cache if multi-threaded access is ever expected.

2. Cache Staleness with Tensor Reallocation (Medium Risk)

data_ptr() identifies the memory address, not the tensor content. If a user:

  1. Calls write_joint_position_to_sim_index(env_ids=my_tensor)
  2. Modifies my_tensor in-place
  3. Calls again with the same my_tensor

The cached wp.array will have stale content. This is fine if the underlying memory was also updated (warp wraps the same memory), but could cause issues if PyTorch reallocates. The current implementation is safe because wp.from_torch creates a non-owning view, but this subtlety deserves a comment.

Suggestion: Add a comment in the caching code explaining this relies on wp.from_torch being a non-owning view:

# Safe: wp.from_torch creates a non-owning view; if the tensor's memory
# is modified, the warp array reflects those changes automatically.

3. Missing _resolve_joint_ids int64 Handling (Bug)

In _resolve_env_ids, there's explicit int64→int32 conversion:

if env_ids.dtype == torch.int64:
    return wp.from_torch(env_ids.to(torch.int32), dtype=wp.int32)

But _resolve_joint_ids and _resolve_body_ids lack this handling. If a user passes int64 joint_ids, it may fail or produce incorrect results.

Fix: Add the same int64 check to _resolve_joint_ids and _resolve_body_ids.

4. Pinned CPU Buffer Size Assumption (Minor)

The pinned CPU buffers are sized for the full simulation:

self._cpu_joint_stiffness = wp.zeros((N, J), dtype=wp.float32, device="cpu", pinned=True)

This is fine, but the _get_cpu_env_ids method still calls wp.clone for partial indices:

if env_ids.ptr == self._ALL_INDICES.ptr:
    return self._cpu_env_ids_all
return wp.clone(env_ids, device="cpu")  # <-- Still allocates for partial resets

For a more aggressive optimization, you could pre-allocate a smaller pinned buffer for partial indices and copy into it. However, partial resets are typically infrequent enough that this may not matter.
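That suggestion can be sketched with plain Python, where a list stands in for a pinned `wp.array` sized for the worst case (names are illustrative, not the actual `_get_cpu_env_ids` implementation):

```python
class EnvIdStaging:
    def __init__(self, max_envs):
        # Allocated once, sized for a full reset; partial index sets reuse a
        # prefix of it instead of triggering a per-call allocation.
        self._staging = [0] * max_envs

    def get_cpu_env_ids(self, env_ids):
        n = len(env_ids)
        self._staging[:n] = env_ids  # copy into the prefix
        # The slice below is a copy in Python; the real code would hand the
        # pinned buffer plus a count to the TensorAPI call instead.
        return self._staging[:n]
```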

5. Benchmark Test Fixture Duplication

The test fixtures in test_articulation_iface.py and benchmark_articulation.py now have significant overlap in the mock setup. Consider extracting a shared helper to reduce duplication and ensure they stay in sync.

6. Newton Kernel Output Parameters Removal

Removing the None output parameters from Newton kernels:

outputs=[
    self.data.root_link_pose_w,
-   None,  # self.data._root_link_state_w.data,
-   None,  # self.data._root_state_w.data,
],

is a nice cleanup. Verify these outputs were truly dead code rather than vestiges of a planned feature.


📝 Minor Suggestions

  1. Docstring Typos Fixed — Good catch on expectexpects throughout.

  2. CHANGELOG Tilde Alignment — The CHANGELOG headers have inconsistent underline lengths (~~~ vs ~~~~). RST requires the underline to be at least as long as the title. Double-check these render correctly.

  3. Benchmark Fill Ratios — The fill-ratio benchmarks (5%, 95%, 100%) are a nice addition for understanding partial-reset performance. Consider adding these results to the PR description.


🔍 Questions

  1. Wrench Composer Caching Safety: The wrench composer views (_get_inst_wrench_force_f32, etc.) assume the composed_force/torque arrays don't get reallocated. Can you confirm this is guaranteed by the WrenchComposer implementation?

  2. Deformable Object Cache Safety: The comment says "Safe because DeformableObjectData uses wp.copy into stable buffers." Is this stable-buffer contract documented/enforced anywhere?


Verdict

This is a well-executed performance optimization PR. The changes are systematic, well-documented in CHANGELOGs, and follow consistent patterns across backends. The main concerns are:

  • Minor bug: missing int64 handling in joint/body ID resolvers
  • Documentation: cache safety assumptions should be explicitly commented

With the int64 fix addressed, this is ready to merge.

Remove the single-slot tensor index wrapper caches from PhysX and Newton assets because tensor views can share data pointers with different shapes. Add regression coverage for tensor-view index resolution and move changelog entries to fresh extension versions.
# Conflicts:
#	source/isaaclab_newton/docs/CHANGELOG.rst
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Apr 29, 2026
This rewrites the RigidObject + RigidObjectData write/read paths to follow
the canonical PhysX (PR isaac-sim#5329) and Newton conventions, replacing the
previous lazy-buffer + invalidate-cache machinery with the upstream
patterns.

Data class:
* Eagerly allocate every per-instance TimestampedBuffer in _create_buffers
  (called from __init__).  Drop _ensure_root_buffers /
  _ensure_derived_buffers / _ensure_body_prop_buffers and the per-property
  guard calls.  num_instances / num_bodies / body_names are now passed via
  the constructor instead of set post-construction.
* Bridge the wheel's pull-only binding.read(target) API to PhysX's
  pointer-stable view contract via a lazily-allocated pinned-host wp.array
  for CPU-only bindings (RIGID_BODY_MASS / COM_POSE / INERTIA).
* Fix GRAVITY_VEC_W returning (0, 0, -1) instead of (0, 0, 0) when
  cfg.gravity is the zero vector.

RigidObject writers/setters (PhysX PR isaac-sim#5329 cache-update pattern):
* Every set_*_index / write_*_to_sim_index method scatters the user data
  into the cached _body_*/_root_* buffer via the matching scatter kernel,
  then pushes the cache to the wheel via binding.write(cache, indices=...).
* Pre-allocated pinned-host CPU staging buffers (_cpu_body_mass /
  _cpu_body_coms / _cpu_body_inertia / _cpu_env_ids_all) are reused across
  calls to avoid per-call wp.clone(..., device="cpu") allocation.
* The cache becomes the single source of truth post-write; the previous
  _invalidate_caches / _invalidate_root_caches helpers are removed.

Mask path (Newton-style native mask):
* Add seven mask kernels in isaaclab_ovphysx.assets.kernels:
  set_root_link_pose_to_sim_mask, set_root_com_pose_to_sim_mask,
  set_root_com_velocity_to_sim_mask, set_root_link_velocity_to_sim_mask,
  write_2d_data_to_buffer_with_mask, write_body_inertia_to_buffer_mask,
  write_body_com_pose_to_buffer_mask.
* Each *_mask method scatters into the cache only where the boolean mask
  is True, then pushes the cache via the wheel's native
  binding.write(cache, mask=mask_u8) -- no torch.nonzero conversion.
* _resolve_env_mask / _resolve_body_mask now return wp.bool masks
  (defaulting to pre-allocated _ALL_TRUE_*_MASK), not indices.

Initialization:
* Add USD prim-scan validation to RigidObject._initialize_impl mirroring
  PhysX.  RuntimeError is now raised cleanly when cfg.prim_path resolves
  to no UsdPhysics.RigidBodyAPI prim, multiple rigid-body prims, or a prim
  with an enabled UsdPhysics.ArticulationRootAPI, instead of crashing with
  an obscure TypeError deep in property accessors.
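The scatter-then-write pattern the commit describes can be sketched in miniature (plain numpy; `FakeBinding`, `write_to_sim`, and the 1-D buffers are hypothetical stand-ins for the warp scatter kernels and the wheel's binding API):

```python
import numpy as np

class FakeBinding:
    """Hypothetical stand-in for the wheel's binding.write(cache, indices=...)."""
    def __init__(self, num_instances):
        self.sim = np.zeros(num_instances, dtype=np.float32)

    def write(self, cache, indices=None):
        # push the cache (or a slice of it) down to the simulator state
        if indices is None:
            self.sim[:] = cache
        else:
            self.sim[indices] = cache[indices]

def write_to_sim(cache, binding, data, env_ids):
    # 1) scatter user data into the persistent cache (a warp kernel in the real code)
    cache[env_ids] = data
    # 2) push the cache to the simulator; the cache is the single source
    #    of truth after the write, so no invalidation step is needed
    binding.write(cache, indices=env_ids)

cache = np.zeros(4, dtype=np.float32)
binding = FakeBinding(4)
write_to_sim(cache, binding, np.array([1.0, 2.0], dtype=np.float32), np.array([1, 3]))
print(binding.sim)  # [0. 1. 0. 2.]
```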

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This follow-up review examines the new commits addressing the previous review concerns. The PR continues to implement performance optimizations for asset writes with comprehensive benchmarking infrastructure.

Previous Concerns Status

  • int64 handling in _resolve_joint_ids/_resolve_body_ids: NOT addressed in the diff - the previous concern about missing int64→int32 conversion in joint/body ID resolvers remains unaddressed in the visible changes.
  • Cache safety documentation: NOT visible in the truncated diff - cannot confirm if comments were added.
  • Thread safety documentation: NOT visible - cannot confirm if documented.

New Findings

🟡 Warning: test_articulation_iface.py:519-562 — New index resolution tests only cover tensor views, not int64 dtype edge case

The new TestArticulationIndexResolution tests verify that tensor views are handled correctly:

resolved_full = art._resolve_env_ids(env_ids)
resolved_view = art._resolve_env_ids(env_ids[:2])

However, these tests don't cover the int64 dtype case that was flagged in the previous review. Consider adding a test like:

def test_resolve_env_ids_handles_int64_dtype(self, backend):
    art, _ = get_articulation(backend, num_instances=4, device="cpu")
    env_ids_i64 = torch.arange(4, dtype=torch.int64, device="cpu")
    resolved = art._resolve_env_ids(env_ids_i64)
    assert resolved.dtype == wp.int32

🔵 Improvement: The new disable_shape_checks config addition is clean and well-documented in asset_base_cfg.py:78-91

The tri-state design with clear docstrings explaining None/True/False behavior is exactly what was praised in the previous review.

Implementation Verdict

Minor fixes needed — the int64 handling gap from the previous review should be addressed before merge.

Test Coverage

The new index resolution tests in TestArticulationIndexResolution, TestCollectionIndexResolution, and TestRigidObjectIndexResolution are good additions for catching tensor view shape issues, but don't cover the dtype conversion edge case.

CI Status

The Check for Broken Links failure appears unrelated to this PR's changes (documentation link checker). Pre-commit and wheel build passed.

# Conflicts:
#	source/isaaclab/docs/CHANGELOG.rst
#	source/isaaclab_physx/docs/CHANGELOG.rst

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The new commits don't appear to address the previously flagged concerns:

  1. int64 handling in _resolve_joint_ids/_resolve_body_ids: Still not addressed - the PhysX backend resolvers for joint_ids and body_ids still lack the int64→int32 conversion that exists in _resolve_env_ids.

  2. Index resolution tests: The new tests verify tensor view shapes work correctly but don't test the int64 dtype edge case.
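The fix itself is small. A hedged numpy sketch of what an int64-tolerant resolver could look like (`resolve_ids` and its signature are illustrative only; the real resolvers take torch tensors and return warp int32 arrays):

```python
import numpy as np

def resolve_ids(ids, count):
    """Resolve user-supplied indices to int32, mirroring the downcast
    that _resolve_env_ids performs but the joint/body resolvers skip."""
    if ids is None:
        return np.arange(count, dtype=np.int32)
    ids = np.asarray(ids)
    if ids.dtype == np.int64:
        # simulator-side bindings expect 32-bit indices
        ids = ids.astype(np.int32)
    return ids

print(resolve_ids(np.arange(4, dtype=np.int64), 4).dtype)  # int32
```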

No new issues introduced by the latest commits. The code changes are consistent with the previous review.

Implementation Verdict

Minor fixes needed — previous int64 handling gap should be addressed before merge.


@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Follow-up Review

The previous concern about int64 handling in _resolve_joint_ids/_resolve_body_ids remains unaddressed. The PhysX backend's _resolve_env_ids has explicit int64→int32 conversion, but the joint_ids and body_ids resolvers still lack this safety check.

Looking at the new commits, no changes were made to the resolver methods in the PhysX articulation implementation. The code changes are primarily:

  • Newton benchmark additions (new files)
  • Test infrastructure improvements
  • No modifications to the flagged resolver methods

Implementation Verdict

Minor fixes needed — the int64 dtype handling gap in joint/body ID resolvers should still be addressed for API consistency.

@kellyguo11 kellyguo11 merged commit e57269f into isaac-sim:develop Apr 30, 2026
70 of 73 checks passed

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team
