Skip to content

[OvPhysX] Snapshot env_0 stage pre-clone to speed up OvPhysX warm-up#5679

Open
AntoineRichard wants to merge 9 commits into
isaac-sim:developfrom
AntoineRichard:antoiner/perf/ovphysx_pre_clone_snapshot
Open

[OvPhysX] Snapshot env_0 stage pre-clone to speed up OvPhysX warm-up#5679
AntoineRichard wants to merge 9 commits into
isaac-sim:developfrom
AntoineRichard:antoiner/perf/ovphysx_pre_clone_snapshot

Conversation

@AntoineRichard
Copy link
Copy Markdown
Collaborator

Description

Cuts ~31 seconds off OVPhysX warmup at 4096 envs.

The OvPhysX warmup path in :meth:~isaaclab_ovphysx.physics.OvPhysxManager._warmup_and_load previously called stage.Export(target_file) on the live USD stage and then opened the resulting file as an :class:Sdf.Layer to delete /World/envs/env_<i!=0> from it. With clone_usd=True (introduced by PR #5678), the live stage carries every per-env subtree authored, so the export step had to flatten 4096 envs to disk before the strip could discard most of them. Measured cost on Isaac-Velocity-Rough-Anymal-D-v0 at 4096 envs: ~31 s of pure setup.

:func:~isaaclab_ovphysx.cloner.ovphysx_replicate runs from :meth:~isaaclab.scene.InteractiveScene.clone_environments before :func:cloner.usd_replicate inflates the stage. At that moment the live stage holds only env_0's authored content (plus globals), and stage.Export is cheap. This PR snapshots the stage there and has :meth:_warmup_and_load consume the snapshot directly. The old export-and-strip path is preserved as a fallback for callers that bypass InteractiveScene.clone_environments (single-env tests, etc.).

Depends on PR #5678 (FrameView) for the existing _export_env0_only_stage fallback path.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

N/A — backend init perf, no UI surface.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package (do not edit CHANGELOG.rst or bump extension.toml — CI handles that)
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Add the ovphysx entry to the FrameView factory and teach _get_backend
to recognise OvPhysxManager so calls to FrameView(...) return the
OVPhysX-backed view instead of falling through to FabricFrameView.
Implement a Warp-native batched-prim view that mirrors
NewtonSiteFrameView's site-resolution approach against the OVPhysX
scene data provider's body_q array. Each prim resolves at init to a
(body_idx, site_local) pair via USD ancestor walk; world poses are
computed on GPU as body_q[bid] * site_local. Scales and visibility
delegate to an internal UsdFrameView. The view defers initialization
to PhysicsEvent.PHYSICS_READY when the SDP is not yet built.

Adds contract-suite coverage via the shared frame_view_contract_utils
plus OVPhysX-specific tests for factory dispatch, deferred-init
guarding, and the requires_newton_model error path.
Read body poses directly from an OVPhysX RIGID_BODY_POSE tensor binding
(mirroring the ContactSensor data path) instead of going through the
scene data provider's Newton state. Scenes that don't declare
requires_newton_model=True (e.g. headless training without a Newton-style
visualizer) can now use OvPhysxFrameView -- the previous design raised at
PHYSICS_READY because the SDP exposes body_q only when the Newton model
build is requested.

Site discovery handles both InteractiveScene modes:
  * clone_usd=True: every env has USD prims, one per env matches the
    pattern and resolves directly to its rigid-body ancestor.
  * clone_usd=False: only env_0 has authored USD prims; env_1..N are
    physics-layer clones. The binding row count becomes the authoritative
    site count, and per-env site paths are synthesized from the env_0
    template prim with env_0 replaced by each row's env_id.

The Warp kernels are unchanged. count, prim_paths, and prims honor the
clone_usd=False expansion. Tests updated to drive _pose_buf directly
(detach the binding after one warm-up read) instead of mutating SDP
body_q, and the now-obsolete requires_newton_model error test is removed.
Under OVPhysX's clone_usd=False scenes (the default for InteractiveScene),
USD only holds env_0 -- env_1..N are physics-layer clones via physx.clone().
SensorBase.__init__ derives _num_envs from len(find_matching_prims(...)) so
any FrameView-using sensor (RayCaster, MultiMeshRayCaster, Camera) sees
_num_envs=1 even when the scene has many envs. The sensor's
_reset_mask_torch is then sized 1, and the first reset(env_ids=[0..N-1])
triggers a CUDA assert from out-of-bounds indexing.

Walk the call stack at construction time to capture the sensor that owns
this view, then at the end of _initialize_impl re-allocate its env-sized
buffers (_ALL_ENV_MASK, _reset_mask + torch view, _is_outdated, _timestamp,
_timestamp_last_update) to match the OVPhysX RIGID_BODY_POSE binding's
row count. Duck-typed -- works for any SensorBase subclass without an
isaaclab.sensors import dependency. Mirrors the local fix the OVPhysX
ContactSensor already applies to itself at contact_sensor.py:240-248.

This is a hack confined to the OVPhysX backend. A cleaner long-term fix
would source _num_envs from InteractiveScene.num_envs (or move the
under-cloning behaviour to be opt-in for rendering rather than backend-
keyed), but both are larger changes that touch core IsaacLab.

Verified: Anymal-D rough velocity env with presets=ovphysx + 64 envs
now initializes past env.reset() without the CUDA assert.
InteractiveScene now USD-replicates the per-env asset subtree for every
backend including OVPhysX (clone_usd=True). OVPhysX's add_usd +
physx.clone tolerates the extra USD content at runtime, so the per-env
USD prims layer cleanly on top of the physics-side replicas. Sensors
that discover _num_envs via find_matching_prims now see N prims under
OVPhysX too, matching the behaviour they already had under PhysX and
Newton.

Removes the OvPhysxFrameView workaround that the previous
clone_usd=False behaviour had forced:

* OvPhysxFrameView no longer walks the call stack to patch the caller
  sensor's _num_envs. _num_envs now reaches the binding row count
  naturally.

FrameView parity fixes inspired by a PhysX/Newton review:

* OvPhysxFrameView._initialize_impl rejects prim_paths that resolve to
  a rigid body with a clear ValueError, mirroring the
  NewtonSiteFrameView guard. FrameView is for non-physics sensor frames;
  callers should use RigidObject/Articulation for body-level control.
* OvPhysxFrameView.get_scales / set_scales docstrings now make explicit
  that these read/write the static USD xformOp:scale attribute and do
  not propagate to PhysX collision-shape scale (unlike Newton's
  shape_scale path).
InteractiveScene now USD-replicates every env's asset subtree for all
backends (clone_usd=True from the previous commit), but OvPhysxManager
was still handing the full N-env stage to physx.add_usd. The wheel
loaded all N USD-defined bodies as independent prims, so the subsequent
physx.clone() ran onto already-populated targets and never produced the
clone-lineage that the wheel's create_tensor_binding fast path expects.
At 4096 envs this turned every binding-creation call into a multi-second
USD enumeration -- the hang in articulation init.

Re-shape _warmup_and_load to export an env_0-scoped USD file:

1. Export the full stage to disk (existing flatten-and-write).
2. Re-open the exported file as an Sdf.Layer.
3. Delete every /World/envs/env_<i>  prim spec for i != 0 from the layer.
4. Re-export.

The live USD stage held by SimulationContext is untouched -- sensors
(RayCaster, Camera, ContactSensor discovery) still see N envs and
discover _num_envs = N correctly. Only the file passed to the wheel is
scoped to env_0 + globals. physx.clone() then repopulates env_1..N at
the physics layer with proper lineage, and create_tensor_binding walks a
1-USD-path result that auto-extends across the N clones -- the fast
path that clone_usd=False used to give us implicitly.

Net effect: keeps the previous commit's clone_usd=True flip (sensor
parity across backends) while restoring OVPhysX's per-env scaling. No
test changes required; the FrameView/ContactSensor suite stays at
27/27 pass on CPU.
Adds a Limitations section to _export_env0_only_stage covering the
three assumptions the workaround makes:

* Homogeneous envs -- per-env USD-authored physics overrides (mass,
  friction, collision filters under env_<i!=0>) are dropped from the
  file handed to physx.add_usd. Sensors and visualizers still see
  them in the live stage, so a divergence is possible. Per-env state
  has to be written via the runtime APIs instead.
* Global path convention -- physics-relevant prims must live outside
  /World/envs (or under env_0) to survive the export.
* Static topology -- envs added/removed after warmup require a
  re-warmup with a re-exported stage.
Mirrors the same fix applied to the contact sensor cfg/test; develop's
isaaclab.utils no longer re-exports configclass as a callable, so the
test's @configclass decorator needs the explicit module import.
OvPhysxManager._warmup_and_load previously called sim_stage.Export() on
the live USD stage and then stripped /World/envs/env_<i!=0> from the
exported file. With clone_usd=True, the live stage carries every env's
authored subtree, so the export step flattens 4096 envs to disk -- a
31s cost at scale on Anymal-D Rough even though the eventual wheel
input is env_0-only.

ovphysx_replicate runs from InteractiveScene.clone_environments BEFORE
cloner.usd_replicate inflates the stage. At that point the live stage
holds only env_0's authored content (plus globals), and a
stage.Export of the un-cloned stage is cheap. Register that snapshot
on OvPhysxManager and have _warmup_and_load consume it directly,
falling back to the old export-and-strip path for callers that don't
flow through InteractiveScene.clone_environments (tests, single-env
scenes).
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 18, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — Code Review Summary

Overview

This PR introduces a significant performance optimization (~31s speedup at 4096 envs) by snapshotting the USD stage before clone inflation, along with a new OvPhysxFrameView for GPU-native pose queries.


✅ Strengths

  1. Sound Architecture: The pre-clone snapshot approach correctly identifies that ovphysx_replicate runs before usd_replicate, making it the ideal interception point.

  2. Defensive Fallback: The _export_env0_only_stage fallback preserves compatibility for callers that bypass InteractiveScene.clone_environments.

  3. Well-Documented Limitations: The docstring for _export_env0_only_stage clearly outlines the homogeneous-env assumption, global path convention, and static topology constraints.

  4. Comprehensive OvPhysxFrameView: The new FrameView implementation includes proper handling of both clone_usd=True and clone_usd=False modes with synthetic path expansion.

  5. Solid Test Coverage: Contract-suite integration and specific deferred-init tests are included.


🔴 Critical Issues

None identified — the implementation is sound.


🟡 Important Suggestions

  1. State Reset Completeness (ovphysx_manager.py)

    In initialize(), _pre_clone_stage_path is reset to None, but if a previous run's temp file still exists (e.g., the process didn't clean up), and _tmp_dir is reused, there could be stale file confusion. Consider explicitly cleaning the snapshot file in initialize():

    if cls._pre_clone_stage_path and os.path.exists(cls._pre_clone_stage_path):
        try:
            os.remove(cls._pre_clone_stage_path)
        except OSError:
            pass
    cls._pre_clone_stage_path = None
  2. Idempotency Edge Case (register_pre_clone_stage_snapshot)

    The method is documented as idempotent, but the guard if cls._pre_clone_stage_path is not None: return doesn't verify the file still exists. If the file was deleted between calls, subsequent warmup would fail. Consider:

    if cls._pre_clone_stage_path is not None:
        if os.path.exists(cls._pre_clone_stage_path):
            return
        # File was deleted; allow re-snapshot
        cls._pre_clone_stage_path = None
  3. Layer Caching Clarification (_export_env0_only_stage)

    Using Sdf.Layer.FindOrOpen may return a cached layer from a previous session. Consider adding a comment explaining this is intentional, or use Sdf.Layer.OpenAsAnonymous if cache isolation is preferred.


🟢 Minor Notes

  1. Import Style: The import os inside _ensure_physx_schemas_registered() is redundant since os is already imported at module level. Consider removing the local import.

  2. Logging Consistency: logger.info is used for snapshot creation but logger.debug for the "no /World/envs" case. Consider consistent log levels for similar operational messages.

  3. Magic Constant: WORLD_BODY_INDEX = -1 is well-named but could benefit from a brief docstring explaining it's used in Warp kernels to indicate world-attached sites.

  4. Test Fixture Clarity: The test file comment about detaching the binding for synthetic pose injection is helpful — consider adding a similar note in the OvPhysxFrameView class docstring about the binding lifecycle.


📋 Test Coverage Assessment

Covered:

  • Factory dispatch to OvPhysxFrameView ✓
  • Deferred initialization guard ✓
  • Shared FrameView contract suite ✓

Gaps to Consider (non-blocking):

  • No explicit test for the pre-clone snapshot path vs. fallback path
  • No test for _export_env0_only_stage with edge cases (single env, missing /World/envs)
  • No benchmark test validating the 31s speedup claim

Verdict: Approve

This is a well-designed performance optimization with proper fallback handling and good test coverage. The suggestions above are minor improvements that don't block merging. The ~31s speedup at scale is a meaningful improvement for training workflows.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR cuts ~31 s from OVPhysX warmup at large env counts by snapshotting the USD stage inside ovphysx_replicate — before usd_replicate inflates it to 4096 envs — and having _warmup_and_load consume the small env_0 snapshot directly instead of flattening the full post-clone stage. It also adds OvPhysxFrameView, wires it into the FrameView factory, and changes clone_usd to True unconditionally for all backends.

  • Pre-clone snapshot fast path: register_pre_clone_stage_snapshot writes scene_env0.usda before USD cloning fires; _warmup_and_load uses this path when available and falls back to the slower _export_env0_only_stage strip-and-re-export for callers that bypass InteractiveScene.
  • OvPhysxFrameView: Warp-native, GPU-resident site-pose view using a RIGID_BODY_POSE tensor binding; supports both clone_usd=True and clone_usd=False expansion modes.
  • clone_usd=True for OVPhysX: Previously skipped USD replication for OVPhysX; now runs it for all backends. The PR's own comment flags this as an in-progress assumption that may need reverting.

Confidence Score: 3/5

The warmup snapshot logic is sound for the happy path, but two defects in the fallback and kitless code paths need fixing before merge.

The new _ensure_physx_schemas_registered method is added but never called anywhere in the codebase. Its docstring says kitless runs require it to resolve PhysxContactReportAPI and related schemas — without a call site, those runs silently lose schema resolution. The _export_env0_only_stage fallback uses Sdf.Layer.FindOrOpen without a subsequent Reload(), so a re-warmup in the same process with a reused tmpdir path could feed the wrong prim hierarchy to physx.add_usd. The clone_usd=True change for OVPhysX is acknowledged as exploratory without CI test coverage for the failure path.

ovphysx_manager.py (call site for _ensure_physx_schemas_registered missing; Sdf.Layer.FindOrOpen stale-cache risk in _export_env0_only_stage); interactive_scene.py (unconditional clone_usd=True for OVPhysX needs test coverage)

Important Files Changed

Filename Overview
source/isaaclab_ovphysx/isaaclab_ovphysx/physics/ovphysx_manager.py Core change: adds register_pre_clone_stage_snapshot fast path and _export_env0_only_stage fallback. Issues: _ensure_physx_schemas_registered is added but never called; Sdf.Layer.FindOrOpen in the fallback may return a stale cached layer on re-warmup.
source/isaaclab_ovphysx/isaaclab_ovphysx/cloner/ovphysx_replicate.py Adds a single call to OvPhysxManager.register_pre_clone_stage_snapshot(stage) before the USD-clone loop. Idempotency handled in the callee. Change is minimal and correct.
source/isaaclab/isaaclab/scene/interactive_scene.py Unconditionally sets clone_usd=True for all backends including OVPhysX. The comment acknowledges this is exploratory and may need reverting.
source/isaaclab_ovphysx/isaaclab_ovphysx/sim/views/ovphysx_frame_view.py New 890-line OvPhysxFrameView class. Warp kernels are correct; deferred init pattern is sound; clone_usd=False expansion logic correctly handles env_1..N via env_token substitution.
source/isaaclab/isaaclab/sim/views/frame_view.py Adds "ovphysx": "OvPhysxFrameView" to the factory dispatch map and a _get_backend check. Straightforward and correct.
source/isaaclab_ovphysx/test/sim/test_views_xform_prim_ovphysx.py Tests factory dispatch and deferred-init error path. The fixture detaches the live binding after one read to allow synchronous get/set testing — a reasonable test scaffold.

Sequence Diagram

sequenceDiagram
    participant IS as InteractiveScene
    participant OR as ovphysx_replicate
    participant UR as usd_replicate
    participant OM as OvPhysxManager
    participant Stage as USD Stage

    IS->>OR: "physics_clone_fn() [stage = env_0 only]"
    OR->>OM: register_pre_clone_stage_snapshot(stage)
    OM->>Stage: stage.Export(scene_env0.usda)
    Note over OM: _pre_clone_stage_path set
    OR->>OM: register_clone(source, targets, positions)
    IS->>UR: usd_replicate() [inflates stage to env_0..N]
    Note over Stage: Stage now holds env_0..N USD prims
    IS->>OM: "reset() -> _warmup_and_load()"
    alt fast path
        OM-->>OM: "stage_file = scene_env0.usda"
    else fallback
        OM->>Stage: Export full stage
        OM->>OM: _export_env0_only_stage() strip env_1..N
    end
    OM->>OM: physx.add_usd(stage_file)
    loop pending_clones
        OM->>OM: physx.clone(source, targets)
    end
    OM->>OM: physx.warmup_gpu()
    OM-->>IS: PhysicsEvent.PHYSICS_READY
Loading

Comments Outside Diff (1)

  1. source/isaaclab/isaaclab/scene/interactive_scene.py, line 26-34 (link)

    P2 Uncertainty baked into production via "revert if it breaks" comment

    The comment "Probing whether this assumption holds in practice; revert to not startswith("ovphysx") if physx.clone() errors on already-populated targets" signals that the behavioral change from clone_usd=False to clone_usd=True for the OVPhysX backend has not been validated in all important cases. Embedding an exploratory probe in production code without test coverage for the failure path makes regressions hard to detect. At a minimum this should be gated by a CI test for the OVPhysX + clone_usd=True path, or the comment should be converted to a tracked issue.

Reviews (1): Last reviewed commit: "Snapshot env_0 stage pre-clone to skip f..." | Re-trigger Greptile

Comment on lines +285 to +313
_physx_schemas_registered: ClassVar[bool] = False

@classmethod
def _ensure_physx_schemas_registered(cls) -> None:
"""Register the ``PhysxSchema`` USD plugin shipped with the ovphysx wheel.

In Kit-based runs ``omni.physx`` registers the schema; in kitless
runs it must be registered manually before the wheel can match
``PhysxContactReportAPI`` and friends on the stage. The wheel
bundles the plugin under ``ovphysx/plugins/usd/PhysxSchema``. This
method is idempotent — :meth:`pxr.Plug.Registry.RegisterPlugins`
is a no-op once the plugin is registered.
"""
if cls._physx_schemas_registered:
return
try:
import os # noqa: PLC0415

import ovphysx # noqa: PLC0415

from pxr import Plug # noqa: PLC0415
except Exception:
return
plugin_root = os.path.join(os.path.dirname(ovphysx.__file__), "plugins", "usd")
for sub in ("PhysxSchema/resources", "PhysxSchemaAddition/resources"):
path = os.path.join(plugin_root, sub)
if os.path.isdir(path):
Plug.Registry().RegisterPlugins(path)
cls._physx_schemas_registered = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 _ensure_physx_schemas_registered has no call site

This method was added in this PR but is never called anywhere in the codebase (confirmed via repo-wide search). Its docstring explicitly states that in kitless runs "it must be registered manually before the wheel can match PhysxContactReportAPI and friends on the stage." Without a call in initialize() or _warmup_and_load(), kitless runs would silently skip schema registration and any USD stage with PhysxContactReportAPI, PhysxJointAPI, etc. would not resolve those schemas — causing silent failures when loading physics prims.

Comment on lines +496 to +497
layer = Sdf.Layer.FindOrOpen(target_file)
if layer is None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 After sim_stage.Export(target_file), using Sdf.Layer.FindOrOpen(target_file) looks up the path in the Sdf layer registry first. On a re-warmup in the same process (where _tmp_dir is reused and the path is identical), the registry would return the cached layer with old content rather than the freshly exported file. Sdf.Layer.Reload() must be called on the found layer if it was already open, or the file should be opened anonymously to bypass the registry entirely.

Suggested change
layer = Sdf.Layer.FindOrOpen(target_file)
if layer is None:
# Use FindOrOpen and unconditionally reload so that a re-warmup in the
# same process (reusing the same tmp path) sees the freshly-exported file
# rather than a stale cached layer from the Sdf registry.
layer = Sdf.Layer.FindOrOpen(target_file)
if layer is not None:
layer.Reload()
if layer is None:

Comment on lines +300 to +307
try:
import os # noqa: PLC0415

import ovphysx # noqa: PLC0415

from pxr import Plug # noqa: PLC0415
except Exception:
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 import os is a standard-library module that will never fail to import. Including it inside the broad except Exception: return block means any ImportError from ovphysx or pxr.Plug silently suppresses schema registration, but the root cause (missing wheel or missing pxr) is completely hidden. Move import os outside the try block.

Suggested change
try:
import os # noqa: PLC0415
import ovphysx # noqa: PLC0415
from pxr import Plug # noqa: PLC0415
except Exception:
return
try:
import ovphysx # noqa: PLC0415
from pxr import Plug # noqa: PLC0415
except Exception:
return

@AntoineRichard AntoineRichard changed the title Snapshot env_0 stage pre-clone to skip OVPhysX warmup flatten [OvPhysX] Snapshot env_0 stage pre-clone to speed up OvPhysX warm-up May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant