Performance optimization on asset writes #5329
Conversation
Add benchmark scripts for both PhysX and Newton backends covering ArticulationData property access and Articulation setter/writer methods. Includes torch_list, torch_tensor, warp_mask modes and fill-ratio variants (5%, 95%, 100%). Add design spec and implementation plan for four performance optimizations targeting Python dispatch overhead.
Slim 8 warp kernels (index + mask variants for root link pose, root COM pose, root COM velocity, root link velocity) to remove output parameters that are always passed as None. Reduces arg marshalling overhead and simplifies generated GPU code.
Add single-slot caches keyed on tensor.data_ptr() to _resolve_env_ids, _resolve_joint_ids, and _resolve_body_ids in both PhysX and Newton backends. Eliminates redundant wp.array wrapper allocations when the same tensor is passed across training steps.
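The single-slot pattern described in this commit can be sketched in plain Python; the `MockTensor`, `Wrapper`, and `Resolver` classes below are illustrative stand-ins for torch tensors, the non-owning `wp.from_torch` view, and the asset's resolver method, not the actual IsaacLab code:

```python
class MockTensor:
    """Illustrative stand-in for a torch tensor (not the real API)."""

    def __init__(self, ptr):
        self._ptr = ptr

    def data_ptr(self):
        return self._ptr


class Wrapper:
    """Stands in for the non-owning wp.array view that wp.from_torch returns."""

    def __init__(self, tensor):
        self.tensor = tensor


class Resolver:
    def __init__(self):
        self._cached_env_ids_ptr = None
        self._cached_env_ids_wp = None
        self.allocations = 0  # counts wrapper allocations for the demo

    def resolve_env_ids(self, env_ids):
        ptr = env_ids.data_ptr()
        if self._cached_env_ids_ptr == ptr:
            return self._cached_env_ids_wp  # hit: reuse the cached wrapper
        self.allocations += 1
        wrapper = Wrapper(env_ids)  # miss: allocate and cache a fresh wrapper
        self._cached_env_ids_ptr = ptr
        self._cached_env_ids_wp = wrapper
        return wrapper


r = Resolver()
t = MockTensor(0x1000)
a = r.resolve_env_ids(t)
b = r.resolve_env_ids(t)  # same data_ptr, so the cached wrapper is returned
c = r.resolve_env_ids(MockTensor(0x2000))  # different tensor, fresh wrapper
```

The payoff is that an RL loop passing the same `env_ids` tensor every step pays for the wrapper allocation only once.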
Pre-compute and cache float32 views of structured warp arrays used by PhysX TensorAPI calls. Eliminates per-call wp.array wrapper allocation in root pose and velocity writers.
Replace per-call wp.clone(device='cpu') with wp.copy into pre-allocated pinned CPU buffers for all joint property and body property writers. Pinned memory enables DMA fast path and eliminates CPU malloc overhead. Full-index env_ids also use a pre-allocated pinned buffer.
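As a rough analogy for the pinned-buffer pattern (the real code uses warp's pinned allocation and `wp.copy`; here a preallocated `bytearray` stands in for the pinned CPU staging buffer, and `StagedWriter` is a hypothetical name):

```python
class StagedWriter:
    """Reuses one preallocated staging buffer instead of allocating per call."""

    def __init__(self, nbytes):
        self._staging = bytearray(nbytes)  # allocated once at init

    def write(self, payload):
        # Copy into the stable buffer; the downstream API sees the same
        # buffer object every call, so no per-call allocation happens.
        self._staging[: len(payload)] = payload
        return self._staging


w = StagedWriter(8)
buf1 = w.write(b"abcd")
buf2 = w.write(b"wxyz")  # same buffer object, new contents
```

With real pinned memory the same shape of reuse additionally enables the DMA fast path for host-to-device transfers.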
Add missing _cached_*_ptr/_wp fields, _root_*_f32 view caches, and pinned CPU buffers to benchmark mock articulation constructors so they match the new attributes added by the optimization commits.
Add AssetBaseCfg.disable_shape_checks that controls whether assert_shape_and_dtype and assert_shape_and_dtype_mask run validation. When None (default), follows Python's __debug__ flag. When True, disables checks entirely for ~3us savings per setter/writer call. The flag is resolved once at init to a bool _check_shapes field. Also add --no_shape_checks CLI flag to articulation benchmarks.
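The tri-state resolution described above can be sketched as a small helper; `resolve_check_shapes` is an illustrative name, not the actual IsaacLab code:

```python
def resolve_check_shapes(disable_shape_checks):
    """Map the tri-state config value to a single _check_shapes bool (sketch)."""
    if disable_shape_checks is None:
        # __debug__ is True in normal runs and False under `python -O`.
        return __debug__
    # True means "disable the checks", so the stored flag is the inverse.
    return not disable_shape_checks


assert resolve_check_shapes(True) is False   # checks forced off
assert resolve_check_shapes(False) is True   # checks forced on
assert resolve_check_shapes(None) is __debug__
```

Resolving once at init keeps the per-call cost down to a single attribute read on the hot path.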
Cache the flattened float32 views of wrench composer force and torque arrays to avoid creating new wp.array wrapper objects every simulation step.
Add fused write_joint_state_data kernels (index + mask variants) for both Newton and PhysX backends. write_joint_state_to_sim_mask and the deprecated write_joint_state_to_sim now resolve indices and launch a single kernel instead of two separate calls, saving ~50-80us per call.
- Add CHANGELOG entries and version bumps for isaaclab (4.6.7), isaaclab_newton (0.5.17), and isaaclab_physx (0.5.19). - Fix _resolve_env_ids int64 cache GC hazard: move data_ptr() capture after dtype conversion so the cache key matches the actual tensor. - Add write_joint_state_to_sim_index to Newton for API parity with PhysX. - Update Newton deprecated write_joint_state_to_sim to delegate to the new fused index method with correct docstring and warning message. - Fix misleading "Recreated in _create_simulation_bindings" comment in PhysX _create_buffers.
int64→int32 conversion creates a temporary tensor whose backing memory can be reclaimed. Return wp.from_torch directly without caching.
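The hazard and the fix can be illustrated with a small mock; `MockTensor` and `resolve_fixed` are stand-ins for the real tensor type and resolver, not the actual implementation:

```python
class MockTensor:
    """Illustrative stand-in for a torch tensor (not the real API)."""

    def __init__(self, dtype, ptr):
        self.dtype = dtype
        self._ptr = ptr

    def data_ptr(self):
        return self._ptr

    def to_int32(self):
        # dtype conversion allocates a brand-new temporary tensor
        return MockTensor("int32", self._ptr + 0x10)


def resolve_fixed(cache, env_ids):
    if env_ids.dtype == "int64":
        # The conversion result is a temporary whose backing memory may be
        # reclaimed once this call returns, so it must never be cached.
        return ("wrapper", env_ids.to_int32())
    ptr = env_ids.data_ptr()
    if cache.get("ptr") == ptr:
        return cache["wp"]
    wrapper = ("wrapper", env_ids)
    cache["ptr"], cache["wp"] = ptr, wrapper
    return wrapper


cache = {}
resolve_fixed(cache, MockTensor("int64", 0x1000))
assert "ptr" not in cache  # the int64 path never populates the cache
t32 = MockTensor("int32", 0x2000)
w1 = resolve_fixed(cache, t32)
w2 = resolve_fixed(cache, t32)
assert w1 is w2  # int32 tensors are still cached normally
```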
- Fix "This method expect" → "This method expects" across all PhysX articulation docstrings. - Rename kernel param from_mask → full_data in write_joint_state_data for consistency with caller. - Add invalidation notes to cached view helper docstrings. - Replace "infrequent path" with "partial resets" in _get_cpu_env_ids. - Remove stale timing estimate from disable_shape_checks docstring and document the False case.
…ernels Slim 8 PhysX warp kernels (4 root + 4 body variants) to remove output parameters for deprecated state buffers that are always passed as None. Update all 12 call sites across Articulation, RigidObject, and RigidObjectCollection to remove the None entries from outputs lists.
Apply single-slot data_ptr() caching to _resolve_env_ids and _resolve_body_ids in PhysX DeformableObject, Newton RigidObject, and Newton RigidObjectCollection.
Cache .view(wp.float32) results in PhysX RigidObject, DeformableObject, and wrench .flatten().view() in RigidObject/Collection write_data_to_sim. Pre-allocate pinned CPU buffers for RigidObject set_masses/set_coms/ set_inertias.
The _check_shapes attribute was only set in __init__, but assert_shape_and_dtype can be called before __init__ completes (e.g. during _initialize_impl callbacks). Add a class-level default so the attribute always exists.
Add micro-benchmark scripts for Newton RigidObject setter/writer methods (torch_list, torch_tensor, warp_mask modes with fill-ratio variants) and RigidObjectData property access, following the same patterns as the existing Newton articulation benchmarks.
Add micro-benchmark scripts for Newton RigidObjectCollection setter/ writer methods and CollectionData property access, following the same patterns as the existing Newton articulation benchmarks.
🤖 Isaac Lab Review Bot
Summary
This PR delivers a solid performance optimization for asset write operations by introducing single-slot caching for _resolve_* methods, pre-allocated pinned CPU buffers for PhysX TensorAPI writes, cached .view(wp.float32) wrappers for structured warp arrays, and a fused write_joint_state_to_sim_index kernel. The implementation is well-structured and aligns with the claimed speedups (1.16x-1.42x on hot paths).
Architecture Impact
Low risk. Changes are internal implementation optimizations that maintain public API compatibility:
- `_resolve_env_ids`, `_resolve_joint_ids`, `_resolve_body_ids` now use single-slot pointer caching to avoid repeated `wp.from_torch` allocations
- PhysX TensorAPI writes now use pre-allocated pinned CPU buffers instead of per-call `wp.clone(..., device="cpu")`
- The new `disable_shape_checks` config option provides a documented escape hatch for production workloads
- Kernel output signature changes are internal (removed unused `None` outputs for state arrays)
Implementation Verdict
Minor fixes needed — Clean implementation with good performance wins. A few issues require attention before merge.
Test Coverage
Adequate for performance optimization PR. The extensive micro-benchmarking framework added in benchmark/assets/ serves as validation infrastructure. The PR does not claim to fix bugs, so no regression tests are required. Existing test coverage through isaaclab_physx and isaaclab_newton test suites validates correctness. The CI test failures (test_deformable_object.py::test_set_nodal_state_with_applied_transform) are unrelated to this PR.
CI Status
- ❌ pre-commit — Formatting issues in benchmark files (code will need `./isaaclab.sh -f`)
- ❌ license-check — Pre-existing infrastructure issue (`ovphysx` has license `LicenseRef-NVIDIA-Omniverse`), unrelated to PR
- ❌ isaaclab_physx tests — 4 failures in `test_deformable_object.py` (deformable object tests), unrelated to this PR's articulation/rigid object changes
- ✅ Core tests, build, wheel, docs all passing
Findings
🟡 Warning: pre-commit CI failure — formatting issues need fixing
The benchmark files have formatting that pre-commit rejects. Run ./isaaclab.sh -f to auto-fix and push.
🔵 Improvement: source/isaaclab_physx/isaaclab_physx/assets/articulation/articulation.py — Cache invalidation not explicit
The cached view wrappers (_root_link_pose_w_f32, etc.) are set to None in _create_buffers(), but this only happens during initialization. If _create_buffers() is ever called again (unlikely but defensive), consider adding explicit self._root_link_pose_w_f32 = None invalidation at the top of the method for all cached views.
🔵 Improvement: source/isaaclab/isaaclab/assets/asset_base.py:277 — __debug__ check could be more explicit
The instance-level _check_shapes flag correctly overrides the class-level default, but the resolution logic uses the implicit __debug__ global. Consider adding a brief comment noting that __debug__ is True in normal Python and False with -O, for maintainability:

```python
# __debug__ is True normally, False with `python -O`
if self.cfg.disable_shape_checks is None:
    self._check_shapes = __debug__
```

🔵 Improvement: Documentation — PR description claims "Bug fix" label but this is a performance optimization
The PR template shows "Bug fix (non-breaking change which fixes an issue)" but this is clearly a performance optimization. Consider updating the checklist to reflect the actual change type to avoid confusion in release notes.
🔵 Minor: source/isaaclab_newton/benchmark/assets/benchmark_rigid_object_collection.py — pre-commit formatting
Based on the pre-commit output, the formatting around object.__setattr__ calls and multiline torch.rand calls needs adjustment. This will be auto-fixed by ./isaaclab.sh -f.
Recommendations
- Run `./isaaclab.sh -f` to fix pre-commit failures and push
- Consider updating PR labels/description to reflect "performance optimization" rather than "bug fix"
- The core implementation is sound — ship after CI passes
🤖 Isaac Lab Review Bot
Summary
This PR implements four performance optimizations for asset write methods: (1) single-slot data_ptr() caching in _resolve_* methods to avoid repeated wp.from_torch allocations, (2) pre-allocated pinned CPU buffers for PhysX TensorAPI writes, (3) cached .view(wp.float32) results for structured warp arrays, and (4) fused write_joint_state_* kernels that combine position+velocity writes. The implementation is solid and well-organized across both Newton and PhysX backends.
Architecture Impact
Changes are self-contained within the asset classes (Articulation, RigidObject, RigidObjectCollection, DeformableObject) and their kernels. No cross-module API changes—all existing callers remain compatible. The new disable_shape_checks config flag in AssetBaseCfg is optional with sensible defaults.
Implementation Verdict
Ship it — Clean implementation with comprehensive benchmark scripts. The optimizations are well-motivated by measured speedups (1.02x–1.42x on micro-benchmarks). Minor polish items below.
Test Coverage
This PR adds extensive benchmark scripts covering all modified methods, which is valuable for regression detection. However:
- 🟡 Warning: No unit tests are added to verify the correctness of the new caching logic or fused kernels. The `_resolve_*` caching in particular has subtle semantics (GC hazards with int64 tensors) that would benefit from targeted tests.
- The existing test suite should cover functional correctness, but explicit tests for the cache invalidation logic and fused kernel output equivalence would strengthen confidence.
Recommendation: Consider adding a few unit tests that verify:
- `_resolve_env_ids` returns the cached result for the same tensor and a fresh result for a new tensor
- int64 tensors bypass the cache correctly
- `write_joint_state_to_sim_index` produces identical results to separate position+velocity calls
CI Status
Unable to check CI status programmatically (GitHub API limitations).
Findings
🔵 Improvement: Newton write_joint_state_to_sim deprecation message is inconsistent
The Newton write_joint_state_to_sim deprecation message says "use write_joint_state_to_sim_index instead" which is good, but the PhysX version should be updated similarly. Currently PhysX still calls the index method but doesn't have the fused kernel optimization.
🔵 Improvement: _check_shapes class-level default uses __debug__
The class-level default _check_shapes: bool = __debug__ is correct, but the comment "Class-level default for shape validation. Overridden per-instance in __init__" could be clearer that this exists to prevent AttributeError when assert_shape_and_dtype is called before __init__ completes.
🔵 Improvement: Benchmark scripts are comprehensive but large
The benchmark scripts are well-written but quite large (600-1300 lines each). Consider extracting common patterns (fill-ratio generators, benchmark definitions) into a shared module to reduce duplication across the 6 new benchmark files.
🔵 Improvement: Minor docstring enhancement opportunity
The disable_shape_checks docstring in AssetBaseCfg is excellent. Consider adding a brief note about the measured overhead savings (you mention ~3us in a commit message but it's not in the user-facing docs).
Notes
- The GC hazard fix for int64 tensors is correctly handled—skipping the cache when dtype conversion creates a temporary.
- Good use of pinned memory for CPU buffers to enable DMA fast path.
- The removal of dead state-buffer output parameters from warp kernels is clean and reduces unnecessary argument marshalling.
Performance Optimization Review – Changes Requested
Summary
This PR introduces single-slot caching for _resolve_env_ids, _resolve_body_ids, and _resolve_joint_ids methods to reduce repeated wp.from_torch wrapper allocations. The optimization intent is sound, but CI is failing due to missing attribute initializations.
🚨 Critical Issues (Must Fix)
1. Missing Cache Attribute Initialization in RigidObject Classes
The _cached_env_ids_ptr, _cached_env_ids_wp, _cached_body_ids_ptr, and _cached_body_ids_wp attributes are accessed in _resolve_env_ids and _resolve_body_ids before they're initialized.
Failing Tests:
- `AttributeError: 'RigidObject' object has no attribute '_cached_env_ids_ptr'` (Newton)
- `AttributeError: 'RigidObject' object has no attribute '_root_link_pose_w_f32'` (PhysX)
Root Cause: The cache attributes are initialized in _create_buffers(), but _resolve_env_ids() can be called before buffers are created, or the initialization block is missing entirely in some classes.
Fix Required:
- Ensure all asset classes (RigidObject, RigidObjectCollection, Articulation) in both PhysX and Newton backends initialize these cache attributes in `__init__` or at the start of `_create_buffers()`.
- Verify the initialization order — `_create_buffers()` must run before any method calls that invoke `_resolve_*` methods.
2. Pre-commit / Linting Failures
```
E501 Line too long (121 > 120)
  --> source/isaaclab/isaaclab/assets/asset_base.py:73:121
```

Two linting errors remain. Please run `pre-commit run --all-files` locally before pushing.
⚠️ Correctness Concerns
3. Single-Slot Cache Invalidation Edge Cases
The caching strategy uses a single-slot cache keyed on tensor.data_ptr(). This is efficient for the common case (same tensor reused across steps), but has edge cases:
Potential Issue – ABA Problem:
```python
ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp  # Returns stale warp array if tensor was reallocated
```

If a tensor is deallocated and a new tensor is allocated at the same memory address (the ABA problem), the cache will incorrectly return the old warp wrapper. This is unlikely in typical RL loops but can occur with dynamic tensor allocation patterns.
Mitigation Suggestion (optional): Consider also comparing env_ids.numel() and env_ids.device as secondary validation, or document this limitation clearly.
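The suggested secondary validation amounts to widening the cache key; `MockTensor` and `cache_key` below are an illustrative sketch, not the actual implementation:

```python
class MockTensor:
    """Illustrative stand-in for a torch tensor (not the real API)."""

    def __init__(self, ptr, numel, device="cuda:0"):
        self._ptr, self._numel, self.device = ptr, numel, device

    def data_ptr(self):
        return self._ptr

    def numel(self):
        return self._numel


def cache_key(t):
    # The pointer alone is ambiguous under reallocation; adding the element
    # count and device makes an ABA collision much less likely.
    return (t.data_ptr(), t.numel(), t.device)


a = MockTensor(0x1000, 64)
b = MockTensor(0x1000, 32)  # ABA reuse: same address, different length
assert a.data_ptr() == b.data_ptr()
assert cache_key(a) != cache_key(b)
```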
4. Cache Not Cleared on Reset
The caches are never explicitly cleared (e.g., during reset() or environment reconfiguration). If environment counts change dynamically, stale cached warp arrays could cause shape mismatches.
📝 Style & Documentation
5. Benchmark Files Added Without Documentation
New benchmark scripts (benchmark_articulation.py, benchmark_rigid_object.py, etc.) are added but:
- No `--help` documentation in the arg parser descriptions
- No docstring explaining benchmark methodology or how to interpret results
- Consider adding a README.md in the benchmark directory
✅ What Looks Good
- The caching pattern itself is appropriate for this use case
- The int64→int32 conversion handling (skipping cache) is correct
- Consistent application across PhysX and Newton backends
- Benchmark infrastructure is valuable for ongoing perf tracking
Required Actions Before Merge
- Fix missing attribute initialization in all RigidObject classes
- Fix linting errors (line length)
- Add test coverage or verify existing tests pass after fixes
- Consider adding cache invalidation on `reset()` (optional but recommended)
…culation-resolve-caching # Conflicts: # source/isaaclab/docs/CHANGELOG.rst # source/isaaclab_newton/docs/CHANGELOG.rst # source/isaaclab_physx/docs/CHANGELOG.rst
Add single-slot cache attributes (_cached_*_ptr, _cached_*_wp), view caches (_root_*_f32, _*_wrench_*_f32), and pinned CPU buffers to test fixture functions that bypass __init__/_create_buffers via object.__new__. Convert PhysX articulation test _ALL_INDICES from torch to warp arrays to match the actual class. Fix line-length lint in asset_base.py and auto-format benchmark file.
The underlying _nodal_pos_w.data / _nodal_vel_w.data arrays are reassigned on every property read (from get_simulation_nodal_positions), so a cached .view(wp.float32) goes stale and writes old data to the binding. Inline the .view() call instead.
DeformableObjectData was reassigning _nodal_pos_w.data and _nodal_vel_w.data to new arrays on every read, violating the TimestampedBufferWarp contract (which allocates stable buffers at init). Use wp.copy into the pre-allocated buffers instead, keeping pointers stable so cached .view(wp.float32) wrappers remain valid. Re-enable the nodal view caching in DeformableObject writers.
Add 5%, 95%, 100% fill-ratio variants for env_ids subsetting to the PhysX RigidObject and RigidObjectCollection benchmarks, matching the pattern already used in the Newton benchmarks and PhysX Articulation.
Greptile Summary

This PR introduces a suite of performance optimizations for asset write methods across PhysX and Newton backends (articulations, rigid objects, rigid object collections, deformable objects). The main techniques are: (1) single-slot […]

Confidence Score: 5/5

Safe to merge — all optimizations are correctly implemented with proper cache invalidation; remaining comments are minor P2 style/design suggestions. All P0/P1 correctness, cache-invalidation, and data-consistency issues reviewed are sound. Caches are reset in articulation.py (physx) […] — partial-env CPU copy overhead and cache documentation; asset_base_cfg.py — shape-check footgun documentation.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[write_*_to_sim_index called] --> B[_resolve_env_ids / _resolve_joint_ids]
    B --> C{Single-slot ptr cache hit?}
    C -- Yes --> D[Return cached wp.array wrapper]
    C -- No --> E[wp.from_torch + cache update]
    D --> F[wp.launch fused kernel]
    E --> F
    F --> G{Model property write?}
    G -- Yes --> H[_get_cpu_env_ids]
    G -- No --> I[PhysX TensorAPI call]
    H --> J[wp.copy full buffer to pinned CPU<br/>then root_view.set_* indices=cpu_env_ids]
    F --> K{View-typed buffer?}
    K -- Yes --> L[Lazy .view cached getter]
    L --> I
    J --> M[Timestamp invalidation]
    I --> M
```
Reviews (1): Last reviewed commit: "Add fill-ratio benchmark variants to Phy..."
Dismissing: bot incorrectly used CHANGES_REQUESTED instead of COMMENT. Bot policy is COMMENT-only.
Code Review: Performance Optimization for Asset Writes
Summary
This PR introduces multiple performance optimizations for setter and writer methods in asset classes. The changes are well-structured and follow a consistent pattern across PhysX and Newton backends. The documented speedups (1.02x–1.42x) are plausible given the eliminated overhead sources.
✅ Strengths
1. Clean Configuration Design
The disable_shape_checks tri-state config (None/True/False) is well-designed:
- `None` → follow `__debug__` (sensible default)
- `True` → force disable (production optimization)
- `False` → force enable (useful for debugging even with `-O`)
2. Consistent Caching Strategy
Single-slot data_ptr() caches in _resolve_* methods are a good trade-off:
- Low memory overhead (one slot)
- High hit rate for common RL loops where the same tensor is reused
- Graceful fallback when tensors differ
3. Proper Cache Invalidation
Caches are reset to None in _create_buffers(), ensuring they're invalidated on re-initialization.
4. Pinned Memory for CPU Transfers
Pre-allocating pinned CPU buffers eliminates per-call malloc overhead for PhysX "model" property updates.
5. Fused Kernels
The new write_joint_state_to_sim_index method and write_joint_state_data kernel reduce kernel launch overhead for the common position+velocity write pattern.
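The fusion idea can be illustrated with plain loops standing in for kernel launches; `write_fused` is a hypothetical name, not the actual kernel:

```python
def write_fused(q_dst, qd_dst, q_src, qd_src, indices):
    """One fused pass writes positions and velocities per selected index."""
    for i in indices:  # a single "launch" touches both output arrays
        q_dst[i] = q_src[i]
        qd_dst[i] = qd_src[i]


q = [0.0] * 4
qd = [0.0] * 4
write_fused(q, qd, [1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [0, 2])
```

Replacing two separate launches (one per array) with one fused launch halves the fixed per-launch dispatch cost, which dominates at small batch sizes.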
⚠️ Issues & Suggestions
1. Single-Slot Cache Thread Safety (Low Risk)
The caching pattern:
```python
ptr = env_ids.data_ptr()
if self._cached_env_ids_ptr == ptr:
    return self._cached_env_ids_wp
result = wp.from_torch(env_ids, dtype=wp.int32)
self._cached_env_ids_ptr = ptr
self._cached_env_ids_wp = result
```

This is not thread-safe if two threads call these methods concurrently. While IsaacLab likely runs single-threaded, consider documenting this assumption or using a thread-local cache if multi-threaded access is ever expected.
2. Cache Staleness with Tensor Reallocation (Medium Risk)
data_ptr() identifies the memory address, not the tensor content. If a user:
- Calls `write_joint_position_to_sim_index(env_ids=my_tensor)`
- Modifies `my_tensor` in-place
- Calls again with the same `my_tensor`
The cached wp.array will have stale content. This is fine if the underlying memory was also updated (warp wraps the same memory), but could cause issues if PyTorch reallocates. The current implementation is safe because wp.from_torch creates a non-owning view, but this subtlety deserves a comment.
Suggestion: Add a comment in the caching code explaining this relies on wp.from_torch being a non-owning view:
```python
# Safe: wp.from_torch creates a non-owning view; if the tensor's memory
# is modified, the warp array reflects those changes automatically.
```

3. Missing _resolve_joint_ids int64 Handling (Bug)
In _resolve_env_ids, there's explicit int64→int32 conversion:
```python
if env_ids.dtype == torch.int64:
    return wp.from_torch(env_ids.to(torch.int32), dtype=wp.int32)
```

But `_resolve_joint_ids` and `_resolve_body_ids` lack this handling. If a user passes int64 joint_ids, it may fail or produce incorrect results.
Fix: Add the same int64 check to _resolve_joint_ids and _resolve_body_ids.
4. Pinned CPU Buffer Size Assumption (Minor)
The pinned CPU buffers are sized for the full simulation:
```python
self._cpu_joint_stiffness = wp.zeros((N, J), dtype=wp.float32, device="cpu", pinned=True)
```

This is fine, but the _get_cpu_env_ids method still calls wp.clone for partial indices:

```python
if env_ids.ptr == self._ALL_INDICES.ptr:
    return self._cpu_env_ids_all
return wp.clone(env_ids, device="cpu")  # <-- still allocates for partial resets
```

For a more aggressive optimization, you could pre-allocate a smaller pinned buffer for partial indices and copy into it. However, partial resets are typically infrequent enough that this may not matter.
5. Benchmark Test Fixture Duplication
The test fixtures in test_articulation_iface.py and benchmark_articulation.py now have significant overlap in the mock setup. Consider extracting a shared helper to reduce duplication and ensure they stay in sync.
6. Newton Kernel Output Parameters Removal
Removing the None output parameters from Newton kernels:
```diff
 outputs=[
     self.data.root_link_pose_w,
-    None,  # self.data._root_link_state_w.data,
-    None,  # self.data._root_state_w.data,
 ],
```

This is a nice cleanup. Verify these were truly dead code and not vestigial from a planned feature.
📝 Minor Suggestions
- Docstring Typos Fixed — Good catch on `expect` → `expects` throughout.
- CHANGELOG Tilde Alignment — The CHANGELOG headers have inconsistent underline lengths (`~~~` vs `~~~~`). RST requires the underline to be at least as long as the title. Double-check these render correctly.
- Benchmark Fill Ratios — The fill-ratio benchmarks (5%, 95%, 100%) are a nice addition for understanding partial-reset performance. Consider adding these results to the PR description.
🔍 Questions
- Wrench Composer Caching Safety: The wrench composer views (`_get_inst_wrench_force_f32`, etc.) assume the composed_force/torque arrays don't get reallocated. Can you confirm this is guaranteed by the WrenchComposer implementation?
- Deformable Object Cache Safety: The comment says "Safe because DeformableObjectData uses wp.copy into stable buffers." Is this stable-buffer contract documented/enforced anywhere?
Verdict
This is a well-executed performance optimization PR. The changes are systematic, well-documented in CHANGELOGs, and follow consistent patterns across backends. The main concerns are:
- Minor bug: missing int64 handling in joint/body ID resolvers
- Documentation: cache safety assumptions should be explicitly commented
With the int64 fix addressed, this is ready to merge.
Remove the single-slot tensor index wrapper caches from PhysX and Newton assets because tensor views can share data pointers with different shapes. Add regression coverage for tensor-view index resolution and move changelog entries to fresh extension versions.
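The view-aliasing problem that motivated the removal can be demonstrated with a mock: in torch, a prefix view `t[:n]` shares `t`'s data pointer while having a different length. The `MockTensor` below mimics that behavior and is not the real tensor API:

```python
class MockTensor:
    """Mimics torch view aliasing: t[:n] shares t's storage pointer."""

    def __init__(self, ptr, numel):
        self._ptr, self._numel = ptr, numel

    def data_ptr(self):
        return self._ptr

    def numel(self):
        return self._numel

    def view_prefix(self, n):
        # Like torch's t[:n]: same storage start, shorter logical length.
        return MockTensor(self._ptr, n)


env_ids = MockTensor(0x1000, 8)
prefix = env_ids.view_prefix(2)
# Identical key under a data_ptr()-only cache, yet different lengths; the
# cache would hand back an 8-element wrapper for a 2-element request.
assert prefix.data_ptr() == env_ids.data_ptr()
assert prefix.numel() != env_ids.numel()
```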
# Conflicts: # source/isaaclab_newton/docs/CHANGELOG.rst
This rewrites the RigidObject + RigidObjectData write/read paths to follow the canonical PhysX (PR isaac-sim#5329) and Newton conventions, replacing the previous lazy-buffer + invalidate-cache machinery with the upstream patterns.

Data class:
* Eagerly allocate every per-instance TimestampedBuffer in _create_buffers (called from __init__). Drop _ensure_root_buffers / _ensure_derived_buffers / _ensure_body_prop_buffers and the per-property guard calls. num_instances / num_bodies / body_names are now passed via the constructor instead of set post-construction.
* Bridge the wheel's pull-only binding.read(target) API to PhysX's pointer-stable view contract via a lazily-allocated pinned-host wp.array for CPU-only bindings (RIGID_BODY_MASS / COM_POSE / INERTIA).
* Fix GRAVITY_VEC_W returning (0, 0, -1) instead of (0, 0, 0) when cfg.gravity is the zero vector.

RigidObject writers/setters (PhysX PR isaac-sim#5329 cache-update pattern):
* Every set_*_index / write_*_to_sim_index method scatters the user data into the cached _body_*/_root_* buffer via the matching scatter kernel, then pushes the cache to the wheel via binding.write(cache, indices=...).
* Pre-allocated pinned-host CPU staging buffers (_cpu_body_mass / _cpu_body_coms / _cpu_body_inertia / _cpu_env_ids_all) are reused across calls to avoid per-call wp.clone(..., device="cpu") allocation.
* The cache becomes the single source of truth post-write; the previous _invalidate_caches / _invalidate_root_caches helpers are removed.

Mask path (Newton-style native mask):
* Add seven mask kernels in isaaclab_ovphysx.assets.kernels: set_root_link_pose_to_sim_mask, set_root_com_pose_to_sim_mask, set_root_com_velocity_to_sim_mask, set_root_link_velocity_to_sim_mask, write_2d_data_to_buffer_with_mask, write_body_inertia_to_buffer_mask, write_body_com_pose_to_buffer_mask.
* Each *_mask method scatters into the cache only where the boolean mask is True, then pushes the cache via the wheel's native binding.write(cache, mask=mask_u8) — no torch.nonzero conversion.
* _resolve_env_mask / _resolve_body_mask now return wp.bool masks (defaulting to pre-allocated _ALL_TRUE_*_MASK), not indices.

Initialization:
* Add USD prim-scan validation to RigidObject._initialize_impl mirroring PhysX. RuntimeError is now raised cleanly when cfg.prim_path resolves to no UsdPhysics.RigidBodyAPI prim, multiple rigid-body prims, or a prim with an enabled UsdPhysics.ArticulationRootAPI, instead of crashing with an obscure TypeError deep in property accessors.
🤖 Isaac Lab Review Bot
Summary
This follow-up review examines the new commits addressing the previous review concerns. The PR continues to implement performance optimizations for asset writes with comprehensive benchmarking infrastructure.
Previous Concerns Status
- int64 handling in `_resolve_joint_ids` / `_resolve_body_ids`: NOT addressed in the diff — the previous concern about missing int64→int32 conversion in joint/body ID resolvers remains unaddressed in the visible changes.
- Cache safety documentation: NOT visible in the truncated diff — cannot confirm whether comments were added.
- Thread safety documentation: NOT visible — cannot confirm if documented.
New Findings
🟡 Warning: test_articulation_iface.py:519-562 — New index resolution tests only cover tensor views, not int64 dtype edge case
The new TestArticulationIndexResolution tests verify that tensor views are handled correctly:

```python
resolved_full = art._resolve_env_ids(env_ids)
resolved_view = art._resolve_env_ids(env_ids[:2])
```

However, these tests don't cover the int64 dtype case that was flagged in the previous review. Consider adding a test like:
```python
def test_resolve_env_ids_handles_int64_dtype(self, backend):
    art, _ = get_articulation(backend, num_instances=4, device="cpu")
    env_ids_i64 = torch.arange(4, dtype=torch.int64, device="cpu")
    resolved = art._resolve_env_ids(env_ids_i64)
    assert resolved.dtype == wp.int32
```

🔵 Improvement: The new disable_shape_checks config addition is clean and well-documented in asset_base_cfg.py:78-91
The tri-state design with clear docstrings explaining None/True/False behavior is exactly what was praised in the previous review.
Implementation Verdict
Minor fixes needed — the int64 handling gap from the previous review should be addressed before merge.
Test Coverage
The new index resolution tests in TestArticulationIndexResolution, TestCollectionIndexResolution, and TestRigidObjectIndexResolution are good additions for catching tensor view shape issues, but don't cover the dtype conversion edge case.
CI Status
The Check for Broken Links failure appears unrelated to this PR's changes (documentation link checker). Pre-commit and wheel build passed.
# Conflicts: # source/isaaclab/docs/CHANGELOG.rst # source/isaaclab_physx/docs/CHANGELOG.rst
🤖 Isaac Lab Review Bot
Follow-up Review
The new commits don't appear to address the previously flagged concerns:
- int64 handling in `_resolve_joint_ids` / `_resolve_body_ids`: Still not addressed — the PhysX backend resolvers for joint_ids and body_ids still lack the int64→int32 conversion that exists in `_resolve_env_ids`.
- Index resolution tests: The new tests verify tensor view shapes work correctly but don't test the int64 dtype edge case.
No new issues introduced by the latest commits. The code changes are consistent with the previous review.
Implementation Verdict
Minor fixes needed — previous int64 handling gap should be addressed before merge.
🤖 Isaac Lab Review Bot
Follow-up Review
The previous concern about int64 handling in _resolve_joint_ids/_resolve_body_ids remains unaddressed. The PhysX backend's _resolve_env_ids has explicit int64→int32 conversion, but the joint_ids and body_ids resolvers still lack this safety check.
Looking at the new commits, no changes were made to the resolver methods in the PhysX articulation implementation. The code changes are primarily:
- Newton benchmark additions (new files)
- Test infrastructure improvements
- No modifications to the flagged resolver methods
Implementation Verdict
Minor fixes needed — the int64 dtype handling gap in joint/body ID resolvers should still be addressed for API consistency.
Description
Improves performance across the board for asset writes.
Type of change
Micro benchmarks
Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the version in the `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there