[CI] Cross-platform — Part 4: Multi-GPU pytest workflow with cuda:1 coverage by hujc7 · Pull Request #5823 · isaac-sim/IsaacLab

hujc7 · 2026-05-28T01:30:37Z

1. Summary

Adds isaaclab.test.utils.test_devices() — resolves a test's parametrized device list as scope ∩ runtime: the call-site mask declares the devices a test is valid on (default "11X"); the ISAACLAB_TEST_DEVICES env var declares the devices a run may use (default "110" ⇒ cpu + cuda:0).
Adds ISAACLAB_SIM_DEVICE, honored by isaaclab.app.AppLauncher as the implicit-default boot device, so the workflow can boot Kit on a non-default GPU without editing every AppLauncher() call site.
Fixes Newton physics initialization on non-default CUDA devices ([Bug Report] Multi-GPU distributed training fails with Newton physics backend #5132): pins torch + Warp to the target device before allocation and graph capture, so mujoco_warp's collision pipeline no longer null-pointers on cuda:N.
New workflow test-multi-gpu-pytest.yaml runs the non-default-GPU subset across all non-default GPUs in parallel on the [self-hosted, …, multi-gpu] pool, one container per GPU, each a disjoint round-robin file slice. Same ECR-pulled isaac-lab image and run-tests actions as the single-GPU matrix.
Migrates 328 device parametrizations onto the helper — 268 argless test_devices() (device-dependent), 57 "110" (torch-only math, which then skip the non-default shards), 3 "00X" (non-default-only regressions) — and adds 23 unit tests for the helper. Single-GPU CI behavior is unchanged.

2. Device selection: `scope ∩ runtime`

A test declares scope; the operator/CI sets the runtime; the executed set is the intersection. The author never names the shard's GPU; the operator never inspects a test's device support.

from isaaclab.test.utils import test_devices

@pytest.mark.parametrize("device", test_devices())        # cpu + cuda:0 + a non-default GPU (the common case)
@pytest.mark.parametrize("device", test_devices("110"))   # cpu + cuda:0 only (torch-only math: cuda:0 == cuda:N)
@pytest.mark.parametrize("device", test_devices("00X"))   # a non-default GPU only (validates cuda:N behavior)
@pytest.mark.parametrize("device", test_devices("100"))   # cpu only

2.1 Mask grammar

Positions left to right: 0 = cpu, 1 = cuda:0 (default GPU), 2 = cuda:1, 3 = cuda:2, … Each is 0/1. A trailing X means "include all the remaining devices". cpu, cuda:0, and a non-default GPU stay distinct by position — passing on cuda:0 says nothing about cuda:1+, which is the point.

Mask	scope
`110`	cpu + cuda:0
`11X`	cpu + cuda:0 + every non-default GPU (the argless default)
`00X`	non-default GPUs only
`100`	cpu only

test_devices() defaults to "11X", so the device-agnostic common case needs no argument. The helper never reads ISAACLAB_SIM_DEVICE (that is AppLauncher's boot device); it assumes the run's ISAACLAB_TEST_DEVICES already matches it, which the workflow guarantees.

2.2 Runtime and the two CI lanes

ISAACLAB_TEST_DEVICES carries the runtime. A scope="11X" (or argless) test resolves to:

single-GPU CI (runtime unset ⇒ "110") ⇒ [cpu, cuda:0].
a multi-GPU shard (runtime "0001", one device) ⇒ [cuda:2].

An empty result means the test is cleanly skipped for that run — e.g. a "00X" test on a single-GPU host, or a "110" math test on a non-default-GPU shard. The helper raises only when the run explicitly set ISAACLAB_TEST_DEVICES to a device the host does not have (a misconfigured shard), so a runner with fewer GPUs than expected fails loudly instead of reporting a vacuous green.

Running each test once on a single non-default GPU is a property of the workflow (a one-device-per-shard runtime mask plus round-robin file assignment), not of the helper — there is no device-picking logic inside test_devices.

3. Workflow shape

config → build (ECR cache + GPU detect, emits the non-default shard list) → test (one docker run per cuda:N in parallel on the multi-GPU runner). Each shard sets a per-shard runtime mask naming exactly its GPU (cuda:2 ⇒ "0001") and ISAACLAB_SIM_DEVICE=cuda:N, so scope ∩ runtime resolves to that one GPU and Kit boots on the same one. Auto-discovery includes any test_*.py with an argless test_devices() or a test_devices("..X") scope. Inside each container, run-tests runs ./isaaclab.sh -p -m pytest via tools/conftest.py's subprocess-per-file pattern.

4. Single-GPU CI invariant

Migrated tests behave identically on the single-GPU pool: with ISAACLAB_TEST_DEVICES unset the runtime is "110", so an argless / "11X" scope resolves to [cpu, cuda:0] — the same set tests carried before. ISAACLAB_SIM_DEVICE unset ⇒ AppLauncher defaults to cuda:0. Both env vars only take effect on the multi-GPU workflow.

5. Known non-default-device gaps

Two Newton parametrizations (test_get_jacobians_link_origin_contract, test_body_q_consistent_after_root_write) expose a genuine non-default-device defect beyond [Bug Report] Multi-GPU distributed training fails with Newton physics backend #5132 (stale body_q / link poses on cuda:N). They stay pinned to cuda:0 for now; broadening is gated on the upstream fix.
The wider device fixture and the remaining literal parametrize("device", [...]) sites (~150) are not on the helper yet, so those tests still run cpu/cuda:0 on shards. Migrating them is a separate pass.

Re-enables the pull_request trigger on test-fabric-multi-gpu.yaml and wires it to run the FabricFrameView contract tests with ISAACLAB_TEST_MULTI_GPU=1, which activates the three cuda:1 -parameterised tests added in isaac-sim#5514. The cuda:1 tests target FabricFrameView's SelectPrims path on non-zero CUDA device indices. They currently hang indefinitely on real multi-GPU hardware (reproduced locally on 3x RTX 6000 Pro Blackwell and on the multi-GPU runner pool); the 60-min workflow timeout will cancel the job and surface the regression in CI for the FabricFrameView maintainers. Install pipeline matches isaac-sim#5738's proven-working layout: - Pin Python 3.12 via SHA-pinned actions/setup-python. - Pre-install cmake via pip to skip install.py's sudo apt-get branch. - ./isaaclab.sh --install none (core only, avoids egl_probe libEGL). - pip install isaacsim[all,extscache]==${vars.ISAACSIM_BASE_VERSION || '6.0.0'} --extra-index-url https://pypi.nvidia.com. - Bypass Kit's interactive EULA via OMNI_KIT_ACCEPT_EULA / ACCEPT_EULA / ISAAC_SIM_HEADLESS. Status: this PR is expected to fail with the 60-min workflow timeout. Land once the underlying hang in fabric_frame_view.py is fixed.

isaaclab-review-bot

Overall

This is a well-designed PR that introduces a clean, env-driven device parametrization helper for multi-GPU testing. The mask grammar is elegant and the documentation in the module docstring is excellent. The workflow setup and test migrations look correct.

However, there's a critical issue that will prevent the tests from running.

Critical: Missing `init.py`

The new isaaclab.testing package is missing its __init__.py file. Without it, Python won't recognize source/isaaclab/isaaclab/testing/ as a package, and all the imports like:

from isaaclab.testing import cuda_test_devices

will fail with ModuleNotFoundError.

Fix: Add source/isaaclab/isaaclab/testing/__init__.py with at minimum:

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""Testing utilities for Isaac Lab."""

from .devices import cuda_test_devices

__all__ = ["cuda_test_devices"]

Minor Suggestions

1. Workflow: Consider using matrix strategy for future extensibility

The pytest target list is explicit (good for control), but as you add more files in P1/P2, consider whether a matrix or glob pattern might be cleaner. Current approach is fine for now.

2. `devices.py` line 114: Consider caching available devices

Calling torch.cuda.device_count() on every test parametrization is cheap but redundant. A module-level cache (computed once at import) would be cleaner:

@functools.lru_cache(maxsize=1)
def _list_available_devices() -> tuple[str, ...]:
    ...

(Return tuple for hashability, or just compute at module level.)

3. Workflow line 57: Pin Isaac Sim version handling

The fallback '6.0.0' is hardcoded. If vars.ISAACSIM_BASE_VERSION is the source of truth, consider failing explicitly when unset rather than silently falling back, to avoid version drift.

4. Consider adding a simple unit test for `cuda_test_devices` itself

A few tests for the helper (mask parsing, strict vs non-strict, edge cases like empty mask, invalid chars) would catch regressions in the helper logic itself.

Summary

Blocking: Add the missing __init__.py for isaaclab.testing
Non-blocking: Minor polish suggestions above

Once the __init__.py is added, this looks good to merge. Nice work on the mask grammar design! 🎯

Adds a single helper, cuda_test_devices(), that converts a 3-position device mask (env-var ISAACLAB_TEST_DEVICES, default '110') into the list of device strings tests parametrize over. Single-GPU CI sees no change (default mask '110' resolves to [cpu, cuda:0], identical to the hardcoded lists tests carry today). The new multi-GPU-pytest workflow sets ISAACLAB_TEST_DEVICES=001 so migrated tests run on cuda:1 only. Mask grammar: each position is 0 or 1, optional trailing X expands to all remaining positions. Position 0 -> cpu; position k>=1 -> cuda:{k-1}. Strict mode raises on missing devices; non-strict returns empty for opt-in tests that should skip on hosts that can't satisfy them. P0 migration (pure-Python utility tests, no Kit): * source/isaaclab/test/utils/test_math.py: 45 parametrize sites + 2 inline for-loops migrated. * source/isaaclab/test/utils/test_wrench_composer.py: 37 sites. * source/isaaclab/test/utils/test_episode_data.py: 5 sites. Each migrated site replaces a hardcoded [cpu, cuda:0] (or the reversed or tuple form) with cuda_test_devices(). Migration is additive - one import line per file plus the inline edits. No test logic changes. Workflow: .github/workflows/test-multi-gpu-pytest.yaml runs on the [self-hosted, ..., multi-gpu] pool with ISAACLAB_TEST_DEVICES=001. Triggered on changes to the helper, the P0 test files, or the workflow itself. Excluded scope (to follow up after CI validates this MVP): * P1 light-Kit tests (test_simulation_context, test_views_xform_prim, test_newton_model_utils, test_views_xform_prim_newton). * P2 asset tests (test_articulation / test_rigid_object on physx and newton backends). * FabricFrameView cuda:1 tests (PR isaac-sim#5514) - separate path, the SelectPrims deadlock there is tracked independently. Reverts the fabric-specific .github/workflows/test-fabric-multi-gpu.yaml edits that were carried on this branch from the earlier PR scope; that demo is independent of this framework work.

Migrate 16 additional test files (P0 extras + P1 + P2 + P3) to call cuda_test_devices() in their device parametrize, covering ~280 sites across articulation/rigid-object/rigid-object-collection/sim/sensors suites for physx, newton, and ovphysx backends. Rewrite the workflow's run step to auto-discover any test file calling cuda_test_devices() via grep, so new opt-ins land without workflow edits. Files are split into a pure-Python pytest session and per-file Kit-bound invocations (Kit is a process-wide singleton). A hardcoded SKIP list parks the known-broken FabricFrameView cuda:1 path. Per-Kit-file timeout 600 bounds any single hang at 10 minutes so the job surfaces all failing files rather than blocking on the first.

isaaclab-review-bot

Overall Assessment

This is a well-designed PR that introduces an elegant, env-driven device parametrization system for multi-GPU testing. The mask grammar is intuitive (e.g., 110 → [cpu, cuda:0], 001 → [cuda:1]), the documentation is excellent, and the backward-compatible default ensures zero impact on existing single-GPU CI runs.

The PR successfully addresses:

✅ New isaaclab.testing module with cuda_test_devices() helper
✅ Auto-discovery workflow that picks up opt-in test files automatically
✅ Migration of 19 test files (~280 parametrize sites)
✅ Clear P1/P2 scope documentation for follow-ups

Findings

1. Missing Unit Tests for the Helper Function (Medium)

Location: source/isaaclab/isaaclab/testing/devices.py

The cuda_test_devices() function has non-trivial mask parsing logic with multiple edge cases (trailing X, strict vs non-strict, empty masks, invalid characters), but there are no dedicated unit tests for the helper itself.

Recommendation: Add a test file like source/isaaclab/test/testing/test_cuda_test_devices.py with cases covering:

Default mask resolution
Explicit mask parsing ("110", "001", "00X", "X")
Strict mode raising on unavailable devices
Non-strict mode returning empty list gracefully
Invalid mask characters raising ValueError

This protects against regressions in the helper logic itself.

2. Workflow SKIP List Pattern Could Be More Robust (Low) ✅ RESOLVED

Resolved in f66b101 — The new include-files approach is cleaner and avoids the heredoc pattern entirely.

3. Consider Caching Device List (Nit)

Location: source/isaaclab/isaaclab/testing/devices.py line 113

_list_available_devices() is called on every cuda_test_devices() invocation. While torch.cuda.device_count() is cheap, caching the result with @functools.lru_cache would be cleaner and more efficient for test collection.

Summary

This is a solid foundation for multi-GPU testing infrastructure. The design choices are well-reasoned and the migration is clean.

Actionable items:

(Recommended) Add unit tests for cuda_test_devices() to catch edge-case regressions
~~(Optional) Minor robustness improvements noted above~~ ✅ Resolved

Nice work on the mask grammar design! 🎯

Update (d6b2934): New commit adds explicit pip install pytest step to the workflow — a good defensive fix ensuring runner independence. No new issues. Previous suggestions remain optional improvements for future iterations.

Update (f1fa016): Good improvements in this commit:

Extended pip install to include pytest-mock and flaky with clear explanation
Added is_ok() helper treating pytest exit code 5 (no tests collected) as success — correct for device-gated tests
Extended SKIP list with 4 newton tests and improved documentation

No new issues. Previous optional suggestions (unit tests for helper, robustness improvements) remain relevant but non-blocking.

Update (042c409): This commit expands the skip list with well-documented categories explaining why each test is skipped (FabricFrameView deadlocks, SIGSEGV failures, cuda:1 hangs, etc.). Good practice to document the reasoning. Also adds test_simulation_context.py and physx rigid-object tests to the skip list. No new issues — this is maintenance housekeeping for multi-GPU stability.

Update (1546520): 🎉 Excellent fix! This commit addresses the root cause of the cuda:1 test failures:

app_launcher.py: Added ISAACLAB_SIM_DEVICE env var support to override the default device when not explicitly passed. Clean implementation with good documentation.
Workflow: Sets ISAACLAB_SIM_DEVICE=cuda:1 to boot Kit with active_gpu=1 from process start. This correctly targets PhysX/Warp to cuda:1 before SimulationApp locks it.
Result: 7 tests removed from the skip list and now run successfully on cuda:1!

This is a proper fix rather than just skipping problematic tests. No new issues introduced.

Update (feb8e21): Skip list maintenance — expanded documentation with categorized skip reasons:

Upstream Kit limits (FabricFrameView)
Pre-existing test/API breakage (newton contact_sensor)
Newton cuda:1 — Warp allocator failure in mujoco_warp collision driver
PhysX cuda:1 — hangs specific to AWS L40 runner (passes on other hardware)

Added 7 files back to skip list (4 Newton, 3 PhysX asset tests) with clear root-cause explanations. Good documentation practice for future debugging. No new issues.

Update (6eeda64): Added changelog entry (jichuanh-multi-gpu-ci.minor.rst) documenting the new cuda_test_devices() helper and ISAACLAB_SIM_DEVICE env var. Good housekeeping. No new issues.

Update (e3d328b): Major infrastructure improvement — switched from bare-runner installation to Docker-based execution:

Added config job that loads isaacsim image config from config.yaml via sparse checkout
Added ecr-build-push-pull action to pull the same per-commit CI image used by regular CI (dep matrices now match exactly)
Tests run inside container with workspace volume-mounted, removing need for pip install steps on runner
Cleaned up SKIP list comments (removed obsolete contact_sensor reference)
Proper symlink setup (rm -f _isaac_sim && ln -s /isaac-sim _isaac_sim) inside container

This is a solid approach — reusing the regular CI's Docker image ensures environment consistency and removes runner-specific pip quirks. Good fallback documentation if ECR cache misses. No new issues.

Update (3c5248d): Good documentation and consistency improvements:

Added clear comments explaining --entrypoint bash (overrides isaac-sim base image's default ENTRYPOINT that would swallow the script)
Added explanation for --ignore=tools/conftest.py (matches regular CI's behavior)
Consolidated ignore paths into ignore_args variable and applied to both pure-python and per-file pytest runs
Cleaner shell structure: --entrypoint bash + -c instead of inline bash -c

No new issues. Ready for merge. 🚀

Update (9ff909d): Fixes local image availability issue — when ecr-build-push-pull takes the deps-cache-hit path, it creates an ECR registry-side alias but leaves no image in local Docker. This commit adds an "Ensure image is local" step that:

Creates isolated DOCKER_CONFIG with credsStore disabled (workaround for broken runner credential helper)
Re-authenticates to ECR via EC2 IAM role
Explicitly pulls and tags the image for subsequent container steps

Good defensive fix with clear comments explaining the edge case. No new issues.

Update (6706690): Good fixes for container permission issues:

Added --user "$(id -u):$(id -g)" to run container as host uid:gid — fixes write permission failures on volume-mounted workspace
Created writable $HOME (/tmp/isaaclab-ci-home) with XDG_CACHE_HOME for Warp/numpy/pip caches
Well-commented: explains the issue (image's default isaaclab uid 1000 ≠ runner's github-runner user)

Proper solution for the permission mismatch. No new issues.

Update (328878a): Skip list expanded with ovphysx tests. Good documentation explaining the issue: module-level pytest.mark.skipif collects 0 items + 1 skip, which isaaclab.sh CLI wrapper interprets as exit 1 (doesn't distinguish from real failures). Added 3 ovphysx asset tests to skip list. Clean maintenance commit. No new issues.

Update (665f0c3): Added temporary run_docker_tests: 'false' override in build.yaml to free GPU runners during PR iteration. The TEMP comment is well-documented ("revert before final review"). ⚠️ Remember to revert this before merge — the comment is a good reminder. No other changes, no new issues.

Update (f66b101): 🎉 Excellent refactor! This commit makes major architectural improvements:

Changes:

New extra-env-vars input added to both run-tests and run-package-tests actions — clean mechanism to forward env vars into the container with proper parsing (handles comments, whitespace, KEY=value format)
Workflow refactored to use run-package-tests action instead of ~150 lines of inline bash. Much cleaner and more maintainable.
Separate build job added to pre-populate the ECR image before test job runs (mirrors the build→test split in build.yaml)
Include-list approach replaces the SKIP list pattern — explicitly listing 9 known-working tests is safer than auto-discovery with exclusions
Path triggers expanded to include action files, so workflow changes properly trigger CI

My previous Finding #2 (SKIP list robustness) is now obsolete — the new include-files approach is cleaner and avoids the heredoc pattern entirely. ✅

No new issues. This is a significant quality improvement that makes the multi-GPU workflow consistent with the rest of the CI infrastructure.

⚠️ Before merge: Still need to revert the run_docker_tests: 'false' in build.yaml (from 665f0c3).

Update (3c985eb): Minor update for FabricFrameView retest iteration:

Changes in build.yaml:

Updated TEMP comment: now focused on "FabricFrameView cuda:1 retest" (previously was "Multi-GPU pytest docker conversion")
Streamlined comment (removed redundant "freeing the single-GPU runner pool" explanation)
run_docker_tests: 'false' remains set — still iterating

Changes in test-multi-gpu-pytest.yaml:

Added test_views_xform_prim_fabric.py to the include-files list — attempting to enable the FabricFrameView test on cuda:1

This is a WIP commit testing whether the FabricFrameView deadlock issue (documented in earlier skip lists) has been resolved. ⚠️ run_docker_tests: 'false' still needs to be reverted before merge.

No new architectural issues. Just experimental test expansion.

Update (`bdcb527`): ✅ Final Cleanup Complete

The latest commit removes the temporary run_docker_tests: 'false' override that was used during iteration, restoring the proper change-detection logic:

-      run_docker_tests: 'false'
+      run_docker_tests: ${{ steps.detect.outputs.run_docker_tests }}

This was the expected final step — the temporary bypass was necessary to test the FabricFrameView cuda:1 retest path, and it's now correctly reverted.

Status: All outstanding items addressed. PR is ready for merge. 🚀

pytest is not pulled in by --install none or by isaacsim[all,extscache]. Runner state was masking this; pin it explicitly.

flaky and pytest-mock are declared in source/isaaclab/setup.py install_requires but pip's resolver was silently skipping them when combined with the pytorch/nvidia extra-index urls in the install step. Pin them explicitly so the multi-GPU runner is runner-state independent. SKIP four newton test files that the cuda:1 cold-runner surfaces as broken (test_contact_sensor hits a pre-existing measure_total kwarg bug; test_articulation segfaults; test_rigid_object_collection and test_views_xform_prim_newton have cuda:1 specific failures). They're still parametrized via cuda_test_devices() so single-GPU CI continues to cover cpu+cuda:0. Accept pytest exit code 5 (no tests collected) so module-level pytestmark skips (e.g. backend-availability gates in ovphysx) and device-only parametrize that resolves to [] on incompatible hosts both count as success.

Four additional test files surfaced cuda:1-specific failures or hangs on the multi-GPU runner: * test_simulation_context: passes test_init[cuda:1] then hangs the next parametrize variant, 10-min per-file timeout fires. * newton/test_rigid_object: 41 cuda:1 failures (out of 54). * physx/test_rigid_object: passes test_initialization[cuda:1-1] then hangs at [cuda:1-2] (env-count 2 on cuda:1). * physx/test_rigid_object_collection: same hang signature. They keep their cuda_test_devices() parametrize so single-GPU CI continues to exercise cpu+cuda:0; only multi-GPU CI skips them pending separate investigation.

When the caller doesn't pass device= explicitly, AppLauncher now falls back to the ISAACLAB_SIM_DEVICE env var (if set) instead of the hardcoded cuda:0 default. Kit's active_gpu / physics_gpu are process-global and locked after SimulationApp init, so per-test parametrize alone cannot retarget GPU selection once the app is up. Boot-time alignment is the only path that works. The multi-GPU pytest workflow now sets ISAACLAB_SIM_DEVICE=cuda:1 alongside ISAACLAB_TEST_DEVICES=001, so PhysX and Warp pin to cuda:1 from process start. Drops 7 entries from the SKIP list (5 cuda:1 hangs around active_gpu/cuda mismatch + 2 newton suites likely sharing the same root cause). Remaining SKIPs: * FabricFrameView (usdrt SelectPrims cuda:0-only, upstream Kit) * newton/contact_sensor (Newton PR isaac-sim#2135 measure_total rename, needs caller update in newton_manager.py — tracked separately).

Round-5 CI confirmed the AppLauncher fix unblocks test_simulation_context (was hanging at second parametrize, now 42 passed in 62 s). Other files surfaced separate, non-AppLauncher root causes that need independent fixes: * Newton suites (4 files): Warp allocator failure inside mujoco_warp.collision_driver on cuda:1. Reproduces locally on a 3-device MIG host; root cause is the Warp/mujoco_warp interaction, not AppLauncher routing. * PhysX suites (3 files): hang at test_initialization[cuda:1-2] only on the AWS multi-GPU runner. Passes in 11 s locally with the same code, so the hang is runner-specific (L40 driver / peer access / PCIe topology), not an IsaacLab bug. test_simulation_context stays in scope (the AppLauncher fix made it pass deterministically). FabricFrameView usdrt and contact_sensor Newton API rename remain in SKIP for their pre-existing root causes.

Replaces the pip-install path with ECR pull of the same isaac-lab CI image used by build.yaml. ECR auths via the runner EC2 instance's IAM role (no nvcr.io credentials required at PR-time), so fork PRs work without exposing NGC_API_KEY. Benefits: * Newton 1.2+ pre-installed in the image, fixing the contact_sensor measure_total kwarg mismatch without a manual pip pin. * Eliminates the 9 min cold pip-install step (image pull from ECR is tens of seconds when cached). * Dep matrix matches single-GPU CI exactly, so both gates surface the same kind of dep skew. If ECR cache misses (e.g. build.yaml hasn't completed first), the action falls back to local build; that path is slow and requires NGC_API_KEY. Validating ECR auth on the multi-gpu runner pool is the primary goal of this commit — drops contact_sensor from SKIP to test that the version skew is resolved.

The previous attempt missed: 1. The isaac-sim base image has an ENTRYPOINT that launches Kit, so bash -c '...' was passed as Kit's argv (Kit booted, my script never ran). Mirror run-tests action: --entrypoint bash + -c '...'. 2. tools/conftest.py's pytest_ignore_collect returns True for every file (subprocess-per-test runner), so pytest collects 0 items and exits. Pass --ignore=tools/conftest.py --ignore=source/isaaclab/test/install_ci, same as run-tests does.

ecr-build-push-pull only pulls locally on the exact-cache-hit path. On deps-cache-hit (registry-side alias) the image isn't local, so docker run fails with 'Unable to find image isaac-lab-ci:... locally' followed by an unauthed Docker Hub pull attempt. Explicit pull via the ECR URL covers all paths uniformly.

The ecr-build-push-pull action cleans up its temp DOCKER_CONFIG after running, so the docker login from inside the action does not persist. Re-authenticate via aws ecr get-login-password (works via the runner EC2's IAM role, no AWS creds in the workflow).

Runner's default ~/.docker/config.json declares a credential helper that fails with 'not implemented'. Mirror the same workaround the ecr-build-push-pull action uses: drop a fresh config with credsStore set to empty string, then docker login + pull work.

Image's default USER is isaaclab (uid 1000), which doesn't own the volume-mounted host workspace, so it can't ln -s _isaac_sim (perm denied) — falling back to PATH python3 which doesn't exist in the image, hence pytest exit 127.

Running container as host uid:gid means the image's default /root home is not writable, so Warp/numpy/pip cache writes hit PermissionError [Errno 13] '/root/.cache'. Mount a fresh tmp dir and point HOME + XDG_CACHE_HOME at it.

The docker image properly installs ovphysx, so module-level pytestmark.skipif now triggers (no backend init at multi-gpu) collecting 0 items / 1 skipped. isaaclab.sh's CLI wrapper translates that exit-5 to exit-1, breaking the workflow's is_ok() check. Skip the 3 ovphysx files here.

Per ~/.claude/skills/pr/ci-iteration-shortcut.md. All gated Docker + Tests jobs (single-GPU build/test matrix) skip via their existing if-gate. Revert before final review. PR 5823 iterates the multi-GPU pytest docker conversion; the heavy single-GPU matrix adds no signal to that work and costs 30+ runner minutes per push.

Replaces the 237-line custom workflow with the ~100-line shape used by single-GPU test jobs: pull image via ecr-build-push-pull, run pytest in container via run-package-tests + run-tests, let tools/conftest.py handle the Kit-singleton subprocess-per-test pattern. The 8 docker-runtime quirks I worked around in the previous attempt (ENTRYPOINT, conftest ignore, deps-cache pull, DOCKER_CONFIG cleanup, credsStore, uid mismatch, HOME, exit-5 propagation) are all already handled inside the run-tests action. No reinvention. Adds one input to run-tests + run-package-tests: extra-env-vars (multiline KEY=value), used here to inject ISAACLAB_TEST_DEVICES and ISAACLAB_SIM_DEVICE so the container's pytest parametrize and Kit boot align on cuda:1. Test scope: 9 opt-in basenames covering ~512 cuda:1 tests, same discovery scope as the previous attempt minus the 11 SKIPped files.

ecr-build-push-pull's deps-cache-hit path only creates a registry-side alias (no local pull). Without a prior build job that establishes the exact-commit tag in ECR, the test job's internal ecr-build-push-pull hits exact-cache-miss + deps-cache-hit and leaves no local image, so docker run fails with 'pull access denied'. Mirrors the build → test split that build.yaml already uses for the single-GPU matrix. Build job pre-populates the exact tag (via buildx imagetools create on deps-cache-hit, or full build on miss); test job's inner ecr-build-push-pull then hits exact-cache-hit and pulls locally via the action's existing 'Pull exact image' step.

This reverts commit 665f0c3.

isaaclab-review-bot

Update (`5d29bb0`): ✅ Ready for Merge

The final commit reverts the temporary `run_docker_tests: 'false' override — the exact action my previous review flagged as blocking. All outstanding items are now addressed:

✅ Missing __init__.py — Added in earlier commit
✅ **run_docker_tests: 'false' reverted** — This commit ✅ **Workflow refactored** to use run-package-tests` action (f66b101)
✅ Include-list approach replaces SKIP heredoc pattern
✅ Changelog fragment added

Summary

This PR is now complete and ready for merge. The multi-GPU testing infrastructure is solid:

Clean cuda_test_devices() helper with elegant mask grammar
ISAACLAB_SIM_DEVICE env var for AppLauncher device override
Proper integration with existing CI via run-package-tests action
9 test files migrated with known-working cuda:1 paths

No new issues introduced. Nice work! 🚀

Update (`d6d69c4`): ⚠️ Iteration Checkpoint — Not Merge-Ready

This commit re-introduces the temporary run_docker_tests: 'false' override in .github/workflows/build.yaml`. The PR author is iterating and has added a note:

"TEMP (revert before final review / before landing): force run_docker_tests=false while iterating PR #5823"

Status: The PR was previously ready for merge, but is now in an iteration state. This temporary flag must be reverted before merging — the author has clearly marked it as such.

📋 Blocking for merge:

Revert `run_docker_tests: 'false' override (same fix as commit 5d29bb0)

No functional regressions otherwise — this is expected iteration behavior.

Update (`ea919b7`): Module Relocation

This commit relocates the testing utilities from isaaclab.testing to isaaclab.test.utils:

Changes:

Renamed source/isaaclab/isaaclab/testing/ → source/isaaclab/isaaclab/test/utils/
Updated imports across 16 test files in isaaclab, isaaclab_newton, isaaclab_ovphysx, and isaaclab_physx packages
Workflow path trigger updated from source/isaaclab/isaaclab/testing/** → source/isaaclab/isaaclab/test/utils/**
Changelog fragment updated to reflect new import path

✅ This is a clean refactor that consolidates test utilities under the isaaclab.test namespace. All import statements are consistently updated.

📋 Blocking for merge:

Revert `run_docker_tests: 'false' override (status unchanged from previous update)

Otherwise, no new issues introduced.

Update (`0118ea7`): 🎯 Nice Refinements

This commit adds several quality-of-life improvements:

Changes:

1. run-tests/action.yml — Smarter env var parsing

Now preserves mid-line # characters (e.g., IMAGE_TAG=v1.0#nightly survives)
Only treats full-line comments (where first non-whitespace is #) as comments
Strips YAML block indent whitespace properly

2. test-multi-gpu-pytest.yaml — Auto-discovery of opt-in tests

Replaces hardcoded include-list with grep -rl 'cuda_test_devices' discovery
Adding a new test to multi-GPU CI now requires no workflow edit — just use the helper
Clean design: opt-in via code, not via workflow maintenance

3. cuda_test_devices() helper — New skip parameter

Added skip: dict[str, str] for per-device skips with reasons
Wraps skipped devices in pytest.param(..., marks=pytest.mark.skip(reason=...))
CI shows SKIPPED with reason instead of silent omission
Changed strict default from True to False (friendlier for CPU-only dev machines)

4. test_views_xform_prim_fabric.py — Migrated to helper

Replaced @pytest.mark.parametrize("device", ["cpu", "cuda:0"]) with cuda_test_devices()
Removed manual ISAACLAB_TEST_MULTI_GPU skipif decorators
Uses mask="001", strict=False for cuda:1-only tests

✅ All changes are clean refinements that improve maintainability and developer experience.

📋 Blocking for merge:

Revert run_docker_tests: 'false' override (status unchanged)

Nice incremental polish! 🔧

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

This commit addresses cuda:1 context issues in Newton manager and fixes device-specific test tolerances:

Changes:

1. newton_manager.py — Device context pinning (issue #5132)
Added explicit torch.cuda.set_device() + wp.set_device() calls in two critical locations:

start_simulation() — Before Warp/Newton allocations to ensure mujoco_warp collision pipeline allocates on the correct CUDA device
initialize_solver() — Mirrors the guard from start_simulation() for solver build and collision-pipeline init

This fixes null pointer returns when mujoco_warp allocates against a primary CUDA context that was never made current. The guard is idempotent if already pinned.

2. wp.ScopedCapture(device=device) — Explicit device for CUDA graph capture
Previously, CUDA graph capture didn't specify a device; now explicitly passes device to avoid capturing on wrong GPU.

3. Test tolerance fixes (test_rigid_object.py in both newton & physx packages)
Changed if device == "cuda:0" to if device.startswith("cuda") for tolerance selection. This ensures cuda:1 (and any other CUDA device) uses the relaxed GPU tolerance (1e-2) instead of the strict CPU tolerance (1e-5).

✅ Solid fixes:

Device-pinning guards reference the related issue (#5132)
Comments explain why the guards are needed (mujoco_warp context behavior)
Test tolerance fix is the correct pattern for any CUDA device
CUDA graph capture now device-aware

📋 Blocking for merge:

Revert run_docker_tests: 'false' override (status unchanged)

Otherwise, these are important correctness fixes for multi-GPU setups. 👍

greptile-apps · 2026-05-28T10:49:15Z

Greptile Summary

This PR adds isaaclab.testing.cuda_test_devices(), an env-driven helper that maps an ISAACLAB_TEST_DEVICES bitmask to a device list for pytest parametrization, and extends AppLauncher to respect ISAACLAB_SIM_DEVICE so multi-GPU CI can boot Kit on cuda:1 without editing every test's call site. A new workflow runs the cuda:1-parameterized subset on a [multi-gpu] runner pool using the same ECR-backed Docker infrastructure as the existing single-GPU matrix.

source/isaaclab/isaaclab/testing/devices.py — new module implementing the mask grammar (0/1/X per position) with strict/non-strict resolution; 19 test files migrated from hardcoded device lists to cuda_test_devices().
source/isaaclab/isaaclab/app/app_launcher.py — honors ISAACLAB_SIM_DEVICE when no explicit device= argument is passed, with the XR cpu override still taking precedence.
.github/actions/run-tests/action.yml — new extra-env-vars input (one assignment per line) forwarded into the Docker container; used by the multi-GPU workflow to inject the device-selection and simulation-device settings.

Confidence Score: 4/5

Safe to merge; the multi-GPU workflow and mask-resolution logic are well-designed and single-GPU CI behaviour is provably unchanged.

The core feature is correct and thoroughly tested. The strict=True default raises at collection time on hosts without cuda:0, changing the failure mode for migrated tests and potentially tripping up contributors on CPU-only machines. The env-var forwarding loop silently strips # from values, which is benign today but could cause data loss for future users of extra-env-vars. Neither issue affects the CI green path described in the PR.

source/isaaclab/isaaclab/testing/devices.py (strict-mode collection failure on CPU-only hosts) and .github/actions/run-tests/action.yml (env-var value stripping).

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/testing/devices.py	New module implementing the mask-grammar device resolver; well-documented and handles edge cases (wildcard X, empty mask, over-length bodies), but the strict=True default raises ValueError at collection time on hosts without cuda:0.
source/isaaclab/isaaclab/app/app_launcher.py	Adds ISAACLAB_SIM_DEVICE env-var override only when device was not explicitly passed; XR mode still correctly overrides to cpu afterwards. Logic is clean and non-breaking.
.github/actions/run-tests/action.yml	Adds extra-env-vars forwarding via a shell loop; comment-stripping (%%#*) and xargs whitespace trimming work for the current values but can silently corrupt future values containing # or single quotes.
.github/workflows/test-multi-gpu-pytest.yaml	New workflow correctly mirrors single-GPU structure; GPU count guard, concurrency cancellation, and opt-in include-files list are all present and correct.
.github/actions/run-package-tests/action.yml	Minimal pass-through of extra-env-vars to the inner run-tests action; change is correct.
source/isaaclab/isaaclab/testing/init.py	New package init; cleanly re-exports cuda_test_devices with all.
source/isaaclab/test/utils/test_episode_data.py	Migrated from hardcoded ["cuda:0", "cpu"] to cuda_test_devices(); change is mechanical and correct.
source/isaaclab/test/utils/test_math.py	Migrated all device parametrize sites to cuda_test_devices(); no logic changes.
source/isaaclab_newton/test/assets/test_articulation.py	Migrated to cuda_test_devices(); file intentionally excluded from multi-GPU workflow due to Newton/cuda:1 allocation issue tracked in #5132.
source/isaaclab_physx/test/assets/test_articulation.py	Migrated to cuda_test_devices(); excluded from multi-GPU workflow due to PhysX hang on AWS runner at cuda:1.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["pytest collection\n(parametrize decorator)"] --> B["cuda_test_devices()"]
    B --> C{"mask= arg\nprovided?"}
    C -- "No" --> D["read ISAACLAB_TEST_DEVICES\nenv var"]
    D --> E{"env var set?"}
    E -- "No" --> F["default mask = '110'"]
    E -- "Yes" --> G["use env var mask"]
    C -- "Yes" --> G
    F & G --> H["_list_available_devices()\n[cpu, cuda:0, cuda:1, ...]"]
    H --> I["_expand_mask(mask, len)"]
    I --> J{"strict=True and\nsurplus '1' bits?"}
    J -- "Yes" --> K["raise ValueError\nat collection time"]
    J -- "No" --> L["return filtered device list"]
    L --> M["pytest parametrizes\nover device list"]

    subgraph AppLauncher["AppLauncher._resolve_device_settings"]
        N{"device_explicit\npassed?"}
        N -- "No" --> O{"ISAACLAB_SIM_DEVICE\nset?"}
        O -- "Yes" --> P["device = env var value"]
        O -- "No" --> Q["keep default device"]
        N -- "Yes" --> Q
        P & Q --> R{"XR mode\nenabled?"}
        R -- "Yes" --> S["override to cpu"]
        R -- "No" --> T["proceed with device"]
    end

_{Reviews (1): Last reviewed commit: "Revert "TEMP: force run_docker_tests=fal..." | Re-trigger Greptile}

greptile-apps · 2026-05-28T10:49:19Z

+
+    Args:
+        mask: Optional explicit mask string following the grammar in the
+            module docstring. When ``None``, the helper reads the mask from
+            the ``ISAACLAB_TEST_DEVICES`` environment variable, defaulting
+            to ``"110"`` if the variable is unset.
+        strict: When ``True`` (the default), raise ``ValueError`` if the
+            mask requests devices the host does not have or if the resolved
+            list would be empty. When ``False``, silently truncate to what
+            the host can satisfy - callers using this in
+            ``pytest.mark.parametrize`` get a clean "no tests collected for


strict=True default raises at collection time on hosts without cuda:0

With the default mask "110" and strict=True, calling cuda_test_devices() on a host that reports only ["cpu"] (no CUDA) immediately raises ValueError during pytest collection — before any test runs. Before this change, the same tests (parametrized with ["cuda:0", "cpu"]) would collect normally; only the cuda:0 variant would fail at runtime. Tests like test_episode_data.py and test_math.py have no AppLauncher dependency, so they are plausible targets for collection on CPU-only dev machines. Passing strict=False to the default call-sites, or documenting the constraint, would preserve the old graceful-degrade behaviour for contributors without full GPU setups.

greptile-apps · 2026-05-28T10:49:19Z

+          # Caller-supplied extra env vars (one KEY=value per line).
+          if [ -n "$extra_env_vars" ]; then
+            while IFS= read -r line; do
+              line="${line%%#*}"
+              line="$(echo "$line" | xargs)"
+              [ -z "$line" ] && continue
+              key="${line%%=*}"


Comment-stripping and xargs trimming silently corrupt values containing # or single quotes

line="${line%%#*}" strips everything from the first # in a line, so a future caller passing a value like IMAGE_TAG=v1.0#nightly would silently become IMAGE_TAG=v1.0. Similarly, $(echo "$line" | xargs) collapses internal whitespace and fails on values containing unbalanced single quotes. For the two values used today (001 and cuda:1) this is harmless, but it is a latent trap for any future extra-env-vars user. Documenting the # restriction in the input description, or switching to a safer stripping approach, would prevent silent data loss.

* Re-applies the run_docker_tests='false' guard in build.yaml's changes job (per pr/ci-iteration-shortcut) so the single-GPU Docker + Tests matrix skips during this iteration. * Adds test_views_xform_prim_fabric.py to the multi-GPU include-files list. Previously SKIPped because pip-install rounds hung on a usdrt SelectPrims cuda:1 deadlock; the docker image carries a newer Kit, so the cuda:1 path is worth re-validating here. Must be reverted before final review.

The docker image's newer Kit resolves the usdrt SelectPrims cuda:1 deadlock that previously kept this file in the SKIP list (pip-install rounds hit it). Run 26587461494 passed: 36 passed, 3 skipped, 2 xfailed for test_views_xform_prim_fabric.py. This also restores build.yaml's changes detection (drops the temp TEMP-iteration skip).

Per ~/.claude/skills/pr/ci-iteration-shortcut.md. Keep the single-GPU Docker + Tests matrix disabled until iteration is over and the PR is ready to land. Revert as the last commit before merge.

Folds the helper into the existing isaaclab.test subpackage shape (sibling of isaaclab.test.benchmark, isaaclab.test.mock_interfaces) under a new isaaclab.test.utils subpackage. Drops the standalone isaaclab.testing folder, which was a new top-level namespace with no precedent. Import path: from isaaclab.test.utils import cuda_test_devices.

Implements Greptile P2.1 + P2.2 and consolidates the device-skip mechanism inside test files so the workflow needs no opt-in or opt-out edits. API: * cuda_test_devices() default strict=False — CPU-only dev hosts now collect the cpu variant cleanly instead of failing at pytest collection (Greptile P2.1). * cuda_test_devices(skip={device: reason}) — wraps unsupported variants in pytest.param(..., marks=pytest.mark.skip(reason=...)) so pytest still collects them and shows SKIPPED with the reason in CI output. Per-call granularity; reason co-located with the test. Workflow: * Auto-discovery via grep for cuda_test_devices callers; no SKIP list in the workflow. Adding/removing a test from multi-GPU scope is a test-file-only edit. run-tests action: * extra-env-vars parser now skips only full-line comments (no mid-line # stripping) and doesn't xargs-collapse whitespace (Greptile P2.2). Test file migrations: * 7 previously-SKIPped files (4 newton + 3 physx) now declare a module-level _CUDA_1_BROKEN dict with a tracking-issue URL and apply cuda_test_devices(skip=_CUDA_1_BROKEN) per parametrize site. * test_views_xform_prim_fabric.py migrated to the helper too (was using the legacy ISAACLAB_TEST_MULTI_GPU env var pattern), so auto-discovery picks it up.

Earlier rounds SKIPped 7 newton+physx files based on failures observed on the pip-install path or pre-docker rounds. The docker image carries newer Kit + Newton + Warp that already resolved 2 other categories (measure_total kwarg, FabricFrameView usdrt deadlock). Re-running the cuda:1 variants of these 7 files to see which actually still fail on the docker path.

Enable-all run (0118ea7) confirmed the docker image resolves the PhysX hangs and FabricFrameView/contact_sensor failures that earlier rounds suspected. Two narrow categories remain: * 4 newton files — Warp/mujoco_warp init-order on cuda:1 (issue isaac-sim#5132). Same root cause across all four; gated via module-level _NEWTON_5132 dict. * 1 PhysX test — test_rigid_body_no_friction[cuda:1-*] precision drift (1.8e-3 vs 1e-5 tolerance); gated via per-test _PHYSX_NO_FRICTION_CUDA1 dict. Everything else previously SKIPped now runs and passes on cuda:1 (test_articulation, test_rigid_object except no_friction, test_rigid_object_collection, test_views_xform_prim_fabric).

Add torch.cuda.set_device(device) + wp.set_device(device) at the top of NewtonManager.start_simulation and initialize_solver so mujoco_warp's collision pipeline allocates against an initialized cuda:N primary context. Also pass device=device to the standard- path wp.ScopedCapture (the relaxed-graph sibling already did this). Local repro confirms: isaaclab_newton/test/assets/test_rigid_object.py on cuda:1 was 41 failed; now 45 passed / 9 skipped / 0 failed. Also fixes the test_rigid_body_no_friction tolerance branch in both isaaclab_physx and isaaclab_newton test_rigid_object.py files. The author already documented GPU non-determinism and set tolerance = 1e-2 for cuda:0; the else branch fell through to the CPU-tight 1e-5 on cuda:1, where PhysX's GPU integrator drift is the same 1.8e-3 envelope. Gate on device.startswith('cuda') so all cuda devices share the same loose tolerance. Drops the temporary _NEWTON_5132 and _PHYSX_NO_FRICTION_CUDA1 skip dicts from the 5 test files now that the underlying bugs are fixed. Tracks: isaac-sim#5132.

isaaclab-review-bot

Update (`52da4a7`): 🔧 Container Isolation & Race Condition Fixes

This commit significantly hardens the parallel container execution by addressing race conditions and improving filesystem isolation.

Key Changes:

1. Pre-created _isaac_sim symlink on host

rm -f _isaac_sim && ln -s /isaac-sim _isaac_sim

Critical fix: Prevents parallel containers from racing on rm/ln operations on the shared workspace mount
The symlink is dangling on the host (no /isaac-sim path) but resolves correctly inside each container

2. Comprehensive per-shard runtime directories
New volume mounts for complete isolation:

$shard_runtime/isaac-sim/kit/cache → /isaac-sim/kit/cache
$shard_runtime/isaac-sim/kit/data → /isaac-sim/kit/data
$shard_runtime/isaac-sim/kit/logs → /isaac-sim/kit/logs
$shard_runtime/isaac-sim/cache → /isaac-sim/.cache
$shard_runtime/isaac-sim/computecache → /isaac-sim/.nv/ComputeCache
$shard_runtime/isaac-sim/config → /isaac-sim/.nvidia-omniverse/config
$shard_runtime/isaac-sim/data → /isaac-sim/.local/share/ov/data
$shard_runtime/isaac-sim/logs → /isaac-sim/.nvidia-omniverse/logs
$shard_runtime/isaac-sim/pkg → /isaac-sim/.local/share/ov/pkg

Each container now has fully isolated writable areas for Kit and Isaac-Sim state
Prevents cross-contamination of caches, logs, and configs between parallel shards

3. Additional environment variables

XDG_DATA_HOME=/tmp/isaaclab-ci-home/.local/share
USER="${host_user}"
LOGNAME="${host_user}"
OMNI_KIT_DISABLE_CUP=1
ISAAC_SIM_LOW_MEMORY=1
PYTHONUNBUFFERED=1
PYTHONIOENCODING=utf-8

OMNI_KIT_DISABLE_CUP=1: Disables Omniverse Connector Upload (reduces network I/O)
ISAAC_SIM_LOW_MEMORY=1: Memory-conscious mode for parallel execution
Python buffering settings ensure clean log capture

4. Minor cleanup

Removed hardcoded "4 GPUs" from comment (now generic)
unset HUB__ARGS__DETECT_ONLY added inside container
Explicit :rw suffixes on volume mounts for clarity

✅ Excellent improvements:

Race condition fixed: The symlink pre-creation is a crucial fix for parallel stability
Full isolation: Each shard is now a proper sandbox with no shared writable state
Memory-conscious: ISAAC_SIM_LOW_MEMORY=1 is smart for 3+ containers on one runner
Clean logging: Python unbuffered mode ensures real-time log capture

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

This commit addresses the practical realities of running parallel GPU containers. The filesystem isolation is now robust enough for production CI. 👍

📜 Previous review updates

Update (`9dc722d`): 🚀 Single-Runner Parallel Shard Architecture

This commit refactored the multi-GPU workflow from a matrix-based strategy (N runners) to parallel Docker containers on a single runner.

Key improvements:

Resource efficiency: 1 runner instance vs 3 — doesn't burn the GPU pool
Shared build cache: All shards share the same pulled image
Clean log grouping: ::group::shard cuda:N per container
Wait-all-then-aggregate pattern for clean failure handling

Update (`9cb9b32`): 🎯 Dynamic GPU Detection & Matrix Sharding

Transformed workflow to dynamic, matrix-based strategy that auto-adapts to runner's GPU count. (Now superseded by single-runner parallel approach.)

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

Device-pinning guards for Newton manager and test tolerance fixes for cuda:N devices.

Update (`5d29bb0`): 🎉 Ready for Merge

Reverted temporary run_docker_tests: 'false' override. (Note: has since been re-added for iteration.)

Update (`3941b42`): ✅ Test Dependency Install Fix

This commit addresses a runtime failure where pytest dependencies were missing inside the parallel shard containers.

Change:

./isaaclab.sh -p -m pip install pytest pytest-mock junitparser flatdict flaky "coverage>=7.6.1"

Why needed:

The inline docker run for parallel shards bypassed the run-tests action
That action normally installs these test deps as a hardcoded step
Without them, tools/conftest.py fails at collection time (imports junitparser)
Now each shard installs deps in parallel (~2s overhead, acceptable)

✅ Good fix: Mirrors the run-tests action's install line. Comment documents the rationale.

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

Previous inline comments status:

🔄 P2: strict=True default in devices.py — not addressed in this commit (documentation/behavior concern)
🔄 P2: Comment-stripping # in extra-env-vars — not addressed in this commit (latent bug concern)

These are P2 (non-blocking) suggestions that can be addressed separately.

Update (`612a019`): 📺 Live Streaming Output & Timeout Adjustment

Changes:

1. Timeout raised 30 → 45 minutes (diagnostic)

timeout-minutes: 45  # was 30

First buffered parallel run hit the 30-min wall without printing anything useful
Raised temporarily to capture real per-shard durations
Comment documents intent to tighten back to 30 once split is balanced

2. Live streaming per-shard output with tagging

(
  docker run ... 2>&1 | tee "${logfiles[$cuda]}" | stdbuf -oL sed "s/^/[cuda:$cuda] /"
  exit "${PIPESTATUS[0]}"
) &

Why this matters:

Previously output was fully buffered to logfiles — silent until job completion
Now: live [cuda:0], [cuda:1], [cuda:2] prefixed lines interleave in the workflow log
tee keeps clean per-shard logfiles for the grouped re-print at job end
stdbuf -oL ensures line-buffered output (no delayed batching)
exit "${PIPESTATUS[0]}" propagates docker's exit code, not sed's — a failing shard isn't masked

✅ Good improvements:

Debugging win: Interleaved tagged output makes parallel runs followable in real-time
Correct exit handling: PIPESTATUS[0] is the right pattern for preserving the producer's exit code through a pipe
Temporary nature documented: The 45-min timeout is explicitly flagged as diagnostic, not permanent

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

P2 inline comments (non-blocking, from earlier review):

strict=True default in devices.py — still open
Comment-stripping # in extra-env-vars — still open

Update (`ae7ddad`): ✅ Round-Robin File Partitioning

Changes:

1. Deterministic file distribution across shards

IFS=',' read -ra ALL_FILES <<< "$INCLUDE_FILES"
for i in "${!ALL_FILES[@]}"; do
  if [ $((i % SHARD_COUNT)) -eq "$idx" ]; then
    shard_files+=("${ALL_FILES[$i]}")
  fi
done
shard_include=$(IFS=,; echo "${shard_files[*]}")

Why this matters:

Before: Every shard received the full TEST_INCLUDE_FILES list and relied on TEST_SHARD_INDEX/TEST_SHARD_COUNT to filter at pytest-collection time
After: Each shard receives only its pre-partitioned slice via TEST_INCLUDE_FILES
Round-robin by position: shard 0 gets files 0,3,6,...; shard 1 gets 1,4,7,...; etc.
Eliminates duplicate test runs across shards

2. Simplified container env vars

Removed TEST_SHARD_INDEX and TEST_SHARD_COUNT — no longer needed
Container only sees TEST_INCLUDE_FILES with its pre-assigned subset

✅ Good fix:

Fixes redundant test execution (each file now runs on exactly one GPU)
Comment documents the intentional trade-off: correct but not duration-balanced
Duration-aware balancing acknowledged as future work

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

P2 inline comments (non-blocking):

strict=True default in devices.py — still open
Comment-stripping # in extra-env-vars — still open

Update (`1547a9c`): 🎯 test_devices() API Redesign — Scope ∩ Runtime Intersection

This commit introduces a significant API redesign for device parametrization, replacing cuda_test_devices() with a cleaner test_devices() function based on scope ∩ runtime_devices intersection.

Core Concept: Two Masks, Two Roles

# New API:
@pytest.mark.parametrize("device", test_devices("11X"))
def test_foo(device): ...

Scope (call-site, author-owned): The devices the test is valid on
Runtime Devices (env var, operator-owned): The devices the run may use

The intersection determines what actually runs:

Single-GPU CI: scope="11X" ∩ runtime="110" → [cpu, cuda:0]
Multi-GPU shard: scope="11X" ∩ runtime="00X" → [cuda:N] (one non-default)

Mask Grammar Changes

Mask	Meaning
`"110"`	cpu + cuda:0 (default scope & runtime)
`"11X"`	cpu + cuda:0 + any one non-default GPU
`"00X"`	A non-default GPU only
`"100"`	cpu only
`"001"`	cuda:1 specifically (rare)

Key innovation: Trailing X means any one non-default GPU (not all). It resolves to ISAACLAB_SIM_DEVICE when set, or the lowest-index available non-default GPU.

Workflow Integration

# Each shard now uses uniform runtime mask
runtime_devices="00X"  # Not per-shard "001", "002", etc.
-e ISAACLAB_TEST_DEVICES="$runtime_devices" \
-e ISAACLAB_SIM_DEVICE="cuda:$cuda" \

The X resolves to whichever GPU the shard was assigned via ISAACLAB_SIM_DEVICE.

API Simplification

Before (removed):

cuda_test_devices(mask=None, strict=False, skip=None)

After:

test_devices(scope="110", require_available=False)

Removed skip= parameter (use pytest.mark.skip instead)
Renamed strict → require_available (clearer intent)
scope replaces mask with clearer semantics

Mass Migration

All test files updated from cuda_test_devices() to test_devices("11X"):

test_newton_model_utils.py
test_simulation_context.py
test_views_xform_prim.py
test_math.py, test_noise.py, test_modifiers.py
test_wrench_composer.py
test_articulation.py (newton, physx, ovphysx)
test_rigid_object.py, test_rigid_object_collection.py
test_contact_sensor.py
test_views_xform_prim_fabric.py
And more...

Test Infrastructure Improvements

tools/conftest.py now includes device information in the per-test timing table:

per_test_time_table = PrettyTable(field_names=["Test", "Device", "Time (s)"])

Extracts device from test params (e.g., [size0-cuda:1]) for visibility.

Changelog

* Added :func:`isaaclab.test.utils.test_devices` to parametrize unit tests over
  a device set resolved as ``scope ∩ runtime_devices``

✅ Excellent design:

Clean separation: Test authors specify validity, operators specify availability
No per-shard customization: Uniform "00X" mask works across any GPU count
Backwards compatible for single-GPU: Default "110" scope matches old behavior
Self-documenting: "11X" clearly reads as "cpu, cuda:0, and one non-default GPU"

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

P2 inline comments (non-blocking, from earlier review):

Comment-stripping # in extra-env-vars — still open

Update (`a1fa279`): 🔄 test_devices() API v2 — All Non-Default GPUs & Argless Default

This commit evolves the test_devices() helper with cleaner semantics and per-shard device pinning.

Key Semantic Change: Trailing `X` Now Means "All Remaining Devices"

Before (v1): X = "any one non-default GPU" (resolved via ISAACLAB_SIM_DEVICE)
After (v2): X = "include all remaining devices"

# v1: scope="11X" → cpu + cuda:0 + any-one-of(cuda:1, cuda:2, ...)
# v2: scope="11X" → cpu + cuda:0 + cuda:1 + cuda:2 + ... (all of them)

The "run-once" property is now achieved through per-shard runtime masks in CI, not via the scope itself.

New Default: Argless `test_devices()`

# New idiomatic usage (default scope="11X"):
@pytest.mark.parametrize("device", test_devices())  # cpu + cuda:0 + all non-default GPUs
def test_foo(device): ...

# Pure math tests that don't need multi-GPU coverage:
@pytest.mark.parametrize("device", test_devices("110"))  # cpu + cuda:0 only
def test_math_helper(device): ...

Workflow: Per-Shard Precise Masks

# Old v1 approach:
runtime_devices="00X"  # uniform across shards, X resolved via ISAACLAB_SIM_DEVICE

# New v2 approach:
runtime_devices=$(python3 -c "print('0' * ($cuda + 1) + '1')")
# cuda:1 → "001", cuda:2 → "0001", etc.

Each shard now receives a position-specific mask that names exactly one device. The scope ∩ runtime intersection then yields that single GPU.

Implementation Simplification

Removed:

_target_nondefault() — no longer resolving "any one"
_select() — complex mask-to-set conversion

Added:

_expand(mask, count) — simple mask expansion to boolean list
skip={device: reason} parameter — gate known-broken device variants with pytest.mark.skip

Error Handling: Skip vs Raise

# Scope that doesn't intersect runtime → silent skip (green)
# e.g., "110" test on a non-default-GPU shard

# Runtime explicitly names devices host doesn't have → raise (red)
# e.g., ISAACLAB_TEST_DEVICES="0001" on a 2-GPU host

This prevents vacuous greens from misconfigured runs while allowing legitimate skips.

Test Migrations

Changed to argless test_devices():

test_newton_model_utils.py
test_simulation_context.py
test_views_xform_prim*.py
test_wrench_composer.py
test_articulation.py, test_rigid_object*.py
test_contact_sensor.py

Changed to explicit test_devices("110"):

test_math.py, test_noise.py, test_modifiers.py (pure-math tests)
test_episode_data.py

New: Comprehensive Unit Tests

source/isaaclab/test/utils/test_device_selection.py — 148 lines of tests covering:

Scope ∩ runtime resolution
Argless equals default scope
Skip vs raise semantics
skip= parameter wrapping
_expand() mask grammar

✅ Excellent evolution:

Cleaner mental model: X = "the rest" is more intuitive than "pick one"
Simpler implementation: No more ISAACLAB_SIM_DEVICE resolution logic in the helper
Per-shard precision: Workflow owns the "which GPU" decision, not the test helper
Comprehensive tests: The new test file is well-structured and covers edge cases

📋 Blocking for merge:

Revert run_docker_tests: 'false' override in build.yaml (still pending)

P2 inline comments (non-blocking):\n- Comment-stripping # in extra-env-vars — still open

EXPERIMENTAL — may be reverted if observed runtime or balance is worse than the serial baseline. Replaces the single-job 39-min serial test with a dynamic matrix of parallel jobs, one per non-default cuda:N device on the runner. Mechanism: * build job runs nvidia-smi on a multi-gpu runner, emits cuda_shards (JSON array, e.g. [1, 2, 3]) and shard_count (int) as outputs. * test job matrix consumes cuda_shards via fromJSON, so the partition follows whatever the pool exposes (4 GPUs today; auto-adapts to 2 or 8 later without code edits). * Each shard pins ISAACLAB_SIM_DEVICE=cuda:N + ISAACLAB_TEST_DEVICES to the matching mask, and requests shard-index=N-1 / shard-count from tools/conftest.py's file-shard logic. Expected wall on 4-GPU pool: ~13 min (max single-shard duration with naive paths[idx::3] split). Was ~39 min serial. Risks for monitoring: * Naive shard split is by sorted file name, not duration. Worst-case shard could pack the two longest files (530 s + 360 s) together. * Per-test parametrize is now ONE cuda:N per file rather than all of cuda:1/2/3 — distributes coverage instead of duplicating. Catches the same non-default-cuda defect class but loses 3x coverage on individual files. Revert path: drop this commit (drops the build outputs, the matrix strategy, and the per-shard env computation). Workflow reverts to a single-job test on cuda:1.

Matrix-of-shards allocated N separate runners from the multi-gpu pool even though each shard only uses 1 GPU and the runner exposes 4. Replaces the matrix with a single job that backgrounds one `docker run` per non-default cuda:N and waits for all. Same shard-index/shard-count semantics into tools/conftest.py, same ISAACLAB_TEST_DEVICES + ISAACLAB_SIM_DEVICE env per shard, just no extra runner. Per-shard logs grouped at the end; exit code aggregates across all shards. Trade: more bash in the workflow (the run-tests action can't be backgrounded as a composite step), but uses 1/N runners and pulls the image once instead of N times.

Prev attempt hit two issues that the run-tests action handles: 1. Symlink race — all 3 shards did rm/ln on the shared workspace's _isaac_sim symlink simultaneously, leaving cuda:1 without the symlink and forcing isaaclab.sh to fall back to python3 (not in path, exit 127). Now pre-create the symlink on the host once, before launching containers. 2. ModuleNotFoundError: junitparser — missing USER, LOGNAME, XDG_DATA_HOME env vars and the per-shard isaac-sim writable overlays that run-tests action sets. Without them the bundled python's import resolution went sideways. Mirror the full run-tests volume + env setup per shard.

The custom inline docker run bypassed run-tests action's hardcoded pip install of pytest pytest-mock junitparser flatdict flaky coverage, so the bundled python in the image (which does not include these) failed to import junitparser at tools/conftest.py load time. Mirror the run-tests action's pip install line inside the shard's in-container script. Each shard pays the install cost once (~2 s) in parallel.

The prior buffered design wrote each shard to a logfile and only cat'd them after wait, so when the job hit the 30-min timeout it had printed nothing but the launch notices — zero diagnostic signal. Each shard now pipes through 'tee clean.log | stdbuf -oL sed "s/^/[cuda:N] /"': the three concurrent streams interleave but every line is tagged with its GPU, so progress is followable live in the workflow log. tee keeps the unprefixed per-shard file for the end-of-job grouped re-print. The subshell exits with PIPESTATUS[0] (docker's code), so a passing sed tail can't mask a failing shard. timeout 30 -> 45 (DIAGNOSTIC): the first parallel run ran ~24.5 min of shard execution without finishing. Raised so a slow-but-completing run yields real per-shard durations to judge contention vs imbalance; tighten back to 30 once the picture is clear.

The shards passed the full include-files list plus TEST_SHARD_INDEX/ COUNT, but tools/conftest.py disables its shard split when include-files is set — so every shard ran the entire migrated suite (3x total work, ~40 min) instead of partitioning. Partition the discovered file list in the workflow instead: shard idx takes basenames at positions idx, idx+N, ... and passes only that slice as TEST_INCLUDE_FILES (dropping the now-ignored shard-index/ count env). Each file runs on exactly one GPU. Verified the split is deterministic with no duplicate/missing files. The round-robin split is correct but not load-balanced; duration-aware / work-stealing balancing is tracked as a follow-up.

pbarejko · 2026-05-29T17:49:19Z

+        Ordered list of device strings as torch would address them.
+    """
+    devices = ["cpu"]
+    if torch.cuda.is_available():


Why are we using torch api here? This smells like tech debt. Shouldn't we switch to use warp? i.e.:

>>> import warp >>> warp.init() Warp 1.12.0 initialized: CUDA Toolkit 12.9, Driver 13.0 Devices: "cpu" : "x86_64" "cuda:0" : "NVIDIA RTX PRO 6000 Blackwell Workstation Edition" (95 GiB, sm_120, mempool enabled) Kernel cache: /home/pbarejko/.cache/warp/1.12.0 >>> warp.get_cuda_devices() ['cuda:0']

Checked. using warp requires initializing runtime which might have side-effect to kit. torch does not have that.

pbarejko · 2026-05-29T17:53:10Z

    ],
 )
-@pytest.mark.parametrize("device", ["cuda:0", "cpu"])
+@pytest.mark.parametrize("device", cuda_test_devices())


Based on function name cuda_test_devices have we started to exclude cpu device tests. If so - why?

renamed to test_devices

pbarejko · 2026-05-29T17:56:54Z

+so adopting this helper has zero impact on single-GPU runs.
+"""
+
+_ENV_VAR = "ISAACLAB_TEST_DEVICES"


Name of this global variable makes no sense, besides it's used in one place only. This unnecessary indirection makes code harder to read.

pbarejko · 2026-05-29T18:01:18Z

-    reason="Multi-GPU tests disabled (set ISAACLAB_TEST_MULTI_GPU=1 to enable)",
-)
-@pytest.mark.parametrize("device", ["cuda:1"])
+@pytest.mark.parametrize("device", cuda_test_devices(mask="001", strict=False))


I don't understand the premise behind strictness in this function. I read it as maybe run on GPU 1, if unavailable then run on what's available. This introduces testing ambiguity, Just by looking at this line I don't know what you are trying to test.

In my opinion we would be better without cuda_test_devices function and keep list explicit, instead of resolved at runtime. In testing you want to keep constraints explicit.

The isaaclab_newton non-default-device fix and the isaaclab_physx / isaaclab_ovphysx test migrations had no changelog fragments, failing the changelog-fragments check. Add a Fixed entry for newton and .skip markers for the test-only packages.

test_devices() (renamed from cuda_test_devices) resolves the parametrized device list as scope intersect budget: the call-site mask declares the devices a test is valid on; the ISAACLAB_TEST_DEVICES env var declares the devices a run may use (default 110 = cpu + cuda:0). A trailing X means 'any one non-default GPU', resolved to ISAACLAB_SIM_DEVICE, so every shard shares one uniform budget instead of a per-shard positional mask. Replaces the strict flag with require_available. Migrate the call sites to scope masks (11X / 00X) and simplify the multi-GPU workflow auto-discovery and per-shard env accordingly.

Address review: the single-use _ENV_VAR constant named the ISAACLAB_TEST_DEVICES env var but its name conveyed nothing. Rename it _RUNTIME_DEVICES_ENV_VAR (and _DEFAULT_BUDGET -> _DEFAULT_RUNTIME_DEVICES), and use 'runtime devices' for the env-controlled device set throughout the docstrings, changelog, and workflow, so the scope (call-site) vs runtime-devices (env) split reads consistently.

The end-of-run summary showed only per-file timing. Add a GPU column to the per-file table (the run's boot device) and a new per-test run-time table (slowest first, with each test's device parsed from its id), and rename the mislabeled 'Per Test Result Summary' to 'Per File Result Summary'.

Rework the device-selection helper from scope/budget masks with an "any one non-default GPU" X wildcard (resolved via ISAACLAB_SIM_DEVICE) to a plain scope ∩ runtime intersection: - scope (call-site mask, default "11X") and runtime (ISAACLAB_TEST_DEVICES, default "110") are each expanded to per-device include flags and ANDed. - A trailing X includes all remaining devices; the helper no longer reads ISAACLAB_SIM_DEVICE. Running once on a non-default GPU is a property of the one-device-per-shard runtime the workflow sets, not of the helper. - Raise only when the runtime names a device the host lacks (a misconfigured run); a scope that simply does not intersect the run's devices skips. Migrate call sites to the convention: argless test_devices() for the device-dependent common case, "110" for torch-only math tests (which then skip the non-default shards), "00X" for non-default-only regressions. The multi-GPU workflow now sets a per-shard one-device runtime mask and its file discovery matches argless calls. Add unit tests covering the intersection, the skip-versus-raise boundary, the skip= gating, and the mask grammar.

github-actions Bot added isaac-lab Related to Isaac Lab team infrastructure labels May 28, 2026

isaaclab-review-bot Bot requested changes May 28, 2026

View reviewed changes

hujc7 force-pushed the jichuanh/multi-gpu-ci branch from c0b18ca to 30667b8 Compare May 28, 2026 01:48

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

hujc7 added 18 commits May 28, 2026 04:25

Install pytest in multi-GPU pytest workflow

d6b2934

pytest is not pulled in by --install none or by isaacsim[all,extscache]. Runner state was masking this; pin it explicitly.

Add changelog fragment for multi-GPU CI helpers

6eeda64

Run docker container as host uid:gid

c45604c

Image's default USER is isaaclab (uid 1000), which doesn't own the volume-mounted host workspace, so it can't ln -s _isaac_sim (perm denied) — falling back to PATH python3 which doesn't exist in the image, hence pytest exit 127.

Point HOME / XDG_CACHE_HOME at a writable tmp dir

6706690

Running container as host uid:gid means the image's default /root home is not writable, so Warp/numpy/pip cache writes hit PermissionError [Errno 13] '/root/.cache'. Mount a fresh tmp dir and point HOME + XDG_CACHE_HOME at it.

Revert "TEMP: force run_docker_tests=false while iterating multi-GPU CI"

5d29bb0

This reverts commit 665f0c3.

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

hujc7 changed the title ~~[CI] Multi-GPU pytest framework + P0 unit-test migration~~ [CI] Gate cuda:1 unit-test regressions via new multi-GPU pytest workflow May 28, 2026

hujc7 marked this pull request as ready for review May 28, 2026 10:43

hujc7 requested a review from kellyguo11 as a code owner May 28, 2026 10:43

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

hujc7 marked this pull request as draft May 28, 2026 16:13

hujc7 added 9 commits May 28, 2026 16:22

TEMP: re-apply run_docker_tests=false until PR 5823 lands

d6d69c4

Per ~/.claude/skills/pr/ci-iteration-shortcut.md. Keep the single-GPU Docker + Tests matrix disabled until iteration is over and the PR is ready to land. Revert as the last commit before merge.

Wrap long line in PhysX no-friction skip reason

c37b0f9

isaaclab-review-bot Bot reviewed May 28, 2026

View reviewed changes

hujc7 changed the title ~~[CI] Gate cuda:1 unit-test regressions via new multi-GPU pytest workflow~~ [CI] Cross-platform — Part 4: Multi-GPU pytest workflow with cuda:1 coverage May 28, 2026

hujc7 added 6 commits May 29, 2026 00:40

pbarejko reviewed May 29, 2026

View reviewed changes

hujc7 added 5 commits May 29, 2026 20:04

hujc7 mentioned this pull request May 30, 2026

[CI] Cross-platform — Part 5: Dynamic work-stealing across multi-GPU shards #5875

Draft

Conversation

hujc7 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1. Summary

2. Device selection: scope ∩ runtime

2.1 Mask grammar

2.2 Runtime and the two CI lanes

3. Workflow shape

4. Single-GPU CI invariant

5. Known non-default-device gaps

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

Overall

Critical: Missing __init__.py

Minor Suggestions

1. Workflow: Consider using matrix strategy for future extensibility

2. devices.py line 114: Consider caching available devices

3. Workflow line 57: Pin Isaac Sim version handling

4. Consider adding a simple unit test for cuda_test_devices itself

Summary

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Overall Assessment

Findings

1. Missing Unit Tests for the Helper Function (Medium)

2. Workflow SKIP List Pattern Could Be More Robust (Low) ✅ RESOLVED

3. Consider Caching Device List (Nit)

Summary

Update (bdcb527): ✅ Final Cleanup Complete

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Update (5d29bb0): ✅ Ready for Merge

Summary

Update (d6d69c4): ⚠️ Iteration Checkpoint — Not Merge-Ready

Update (ea919b7): Module Relocation

Changes:

Update (0118ea7): 🎯 Nice Refinements

Changes:

Update (101a47a): 🔧 Newton Device-Pinning & Test Tolerance Fixes

Changes:

Uh oh!

greptile-apps Bot commented May 28, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Update (52da4a7): 🔧 Container Isolation & Race Condition Fixes

Key Changes:

Update (9dc722d): 🚀 Single-Runner Parallel Shard Architecture

Update (9cb9b32): 🎯 Dynamic GPU Detection & Matrix Sharding

Update (101a47a): 🔧 Newton Device-Pinning & Test Tolerance Fixes

Update (5d29bb0): 🎉 Ready for Merge

Update (3941b42): ✅ Test Dependency Install Fix

Change:

Update (612a019): 📺 Live Streaming Output & Timeout Adjustment

Changes:

Update (ae7ddad): ✅ Round-Robin File Partitioning

Changes:

Update (1547a9c): 🎯 test_devices() API Redesign — Scope ∩ Runtime Intersection

Core Concept: Two Masks, Two Roles

Mask Grammar Changes

Workflow Integration

API Simplification

Mass Migration

Test Infrastructure Improvements

hujc7 commented May 28, 2026 •

edited

Loading

2. Device selection: `scope ∩ runtime`

Critical: Missing `init.py`

2. `devices.py` line 114: Consider caching available devices

4. Consider adding a simple unit test for `cuda_test_devices` itself

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`bdcb527`): ✅ Final Cleanup Complete

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`5d29bb0`): ✅ Ready for Merge

Update (`d6d69c4`): ⚠️ Iteration Checkpoint — Not Merge-Ready

Update (`ea919b7`): Module Relocation

Update (`0118ea7`): 🎯 Nice Refinements

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

isaaclab-review-bot Bot left a comment •

edited

Loading

Update (`52da4a7`): 🔧 Container Isolation & Race Condition Fixes

Update (`9dc722d`): 🚀 Single-Runner Parallel Shard Architecture

Update (`9cb9b32`): 🎯 Dynamic GPU Detection & Matrix Sharding

Update (`101a47a`): 🔧 Newton Device-Pinning & Test Tolerance Fixes

Update (`5d29bb0`): 🎉 Ready for Merge

Update (`3941b42`): ✅ Test Dependency Install Fix

Update (`612a019`): 📺 Live Streaming Output & Timeout Adjustment

Update (`ae7ddad`): ✅ Round-Robin File Partitioning

Update (`1547a9c`): 🎯 test_devices() API Redesign — Scope ∩ Runtime Intersection

Update (`a1fa279`): 🔄 test_devices() API v2 — All Non-Default GPUs & Argless Default

Key Semantic Change: Trailing `X` Now Means "All Remaining Devices"

New Default: Argless `test_devices()`

pbarejko May 29, 2026 •

edited

Loading

pbarejko May 29, 2026 •

edited

Loading

pbarejko May 29, 2026 •

edited

Loading