Prevents early numpy imports to avoid Kit crash#5620
Conversation
Greptile SummaryThis PR contains two debugging-only changes that appear unintentional for a merge to
Confidence Score: 2/5Not safe to merge — both changes are debugging artefacts that actively break the CI pipeline and test runner. The pytest command in
Important Files Changed
|
| build: | ||
| name: Build Base Docker Image | ||
| runs-on: [self-hosted, gpu] | ||
| runs-on: [self-hosted, gpu-stg] |
There was a problem hiding this comment.
The
build job runner label was changed from gpu to gpu-stg, routing all Docker-image builds to the staging GPU runner pool. If this was intentional for debugging purposes only, it should be reverted before merging to develop — merging as-is means production CI will continue to run on staging hardware.
There was a problem hiding this comment.
Review Summary
This PR makes two infrastructure/debugging changes:
- Runner change (
.github/workflows/build.yaml):gpu→gpu-stg - Pytest verbosity (
tools/conftest.py): Added-s -vvflags
⚠️ Concerns
PR Metadata Issues
- Incomplete description: The PR body uses template placeholders without actual content explaining the purpose
- No linked issue: "Fixes #" is empty
- Unchecked checklist: All items remain unchecked
- Vague title: "Debugging" doesn't describe the actual changes
Technical Concerns
Runner Change (gpu → gpu-stg)
- Is
gpu-stga staging/test runner? Please clarify the intent - Will this affect CI reliability if staging runners have different availability or configuration?
- Is this intended to be temporary for debugging, or permanent?
Pytest Flags (-s -vv)
- Adding
-sdisables output capture, which will significantly increase CI log verbosity -vv(very verbose) will further increase output volume- Impact: All CI test runs will produce much larger logs, which may:
- Slow down CI execution
- Make it harder to identify actual failures in verbose output
- Increase storage/bandwidth for CI artifacts
🔍 Questions for Author
- Is this PR intended for debugging purposes only? If so, consider marking it as draft or adding
[WIP]to the title - Should these changes be reverted after debugging? If yes, please note that in the PR description
- What issue or problem are you debugging? This context would help reviewers understand the necessity of these changes
📋 Recommendations
- Update PR description with actual context explaining the debugging purpose
- Consider if verbose pytest output should be permanent or temporary
- Clarify the runner change rationale
- If temporary, consider using a feature flag or environment variable instead of hardcoding debug flags
Waiting for clarification before approving.
Update (8447f64): New commits add PYTHONFAULTHANDLER=1 and expand PYTHONUNBUFFERED=1 across multiple files (action.yml, Dockerfile.installci, utils.py, conftest.py, run_install_ci.py). These are reasonable diagnostic enhancements for better crash tracebacks and real-time output streaming. ✅ No concerns with these additions.
However, original concerns remain unaddressed:
gpu-stgrunner change still present-s -vvverbose pytest flags still present
Still awaiting clarification on whether these debugging changes are temporary or intended for permanent merge.
Update (943d9fa): Runner label changed again: [self-hosted, gpu-stg] → [self-hosted] (removes GPU requirement entirely for build job). [self-hosted, gpu].
Original concerns still apply:
-s -vvverbose pytest flags still present- PR metadata still incomplete
Awaiting clarification on intended runner configuration.
Update (c08d001): Pytest verbosity reduced from -vv to -v. ✅ This partially addresses the verbose output concern — single verbose (-v) is more reasonable than very verbose (-vv).
Remaining concerns:
-sflag still disables output capture (increased log size)- Runner set to
[self-hosted](no GPU label) — clarification needed - PR metadata still incomplete
Still awaiting clarification on runner configuration and whether these changes are temporary.
Update (bc8d650): Added thread count constraints (OPENBLAS_NUM_THREADS=1, OMP_NUM_THREADS=1, MKL_NUM_THREADS=1) across action.yml, utils.py, conftest.py, and run_install_ci.py. ✅ These are reasonable additions to prevent thread oversubscription during parallel test execution — no concerns with these changes.
Remaining concerns unchanged:
-sflag still disables pytest output capture- Runner configuration still needs clarification
- PR metadata incomplete
Update (9b90222): New lazy import implementation for isaaclab.app:
- Replaced
lazy_export()fromisaaclab.utils.modulewith a custom__getattr__using onlyimportlib - Added new test file
test_app_imports.pyverifying no numpy/torch/scipy loaded on import
✅ This is a positive improvement — avoids pulling in heavy numeric dependencies when importing isaaclab.app, which is important for fast startup before Kit initializes.
Remaining concerns unchanged:
-sflag still disables pytest output capture- Runner
[self-hosted](no GPU) — clarification needed - PR metadata still incomplete
Update (f438131):
isaaclab/app/__init__.py: Switched back to usinglazy_export()fromisaaclab.utils.module(which may pull in heavier dependencies)- Deleted
test_app_imports.py— the test that verified no numpy/torch/scipy were loaded on import is now gone
This removes the isolation benefit from the previous commit. If lightweight imports before Kit startup are important, consider keeping the standalone __getattr__ implementation.
Remaining concerns unchanged:
-sflag still disables pytest output capture- Runner
[self-hosted](no GPU) — clarification needed - PR metadata still incomplete
Update (67d10a4): PR scope significantly reduced. Current diff now only contains:
source/isaaclab/isaaclab/utils/__init__.py: Removed explicitconfigclassimport (now lazy-loaded viaattach_stub) — ✅ reasonable change for lazy import consistencytools/conftest.py: Added-s -vpytest flags —⚠️ previous concern still applies (increases log verbosity)
The workflow/runner changes and environment variable additions appear to have been removed from this PR.
Current status: Only the pytest verbosity flags (-s -v) remain as a concern. Is this PR being used for debugging only, or are these flags intended to be permanent?
Update (3b15efd): New commit adds deferred CUDA device setting in app_launcher.py — torch import/set_device() is now deferred until after SimulationApp starts to avoid NumPy/OpenBLAS at-fork handlers crashing Kit's platform-info fork during startup. ✅ This is a well-implemented fix with clear comments explaining the rationale. No concerns with this change.
Current PR scope:
app_launcher.py: Deferred CUDA device setting — ✅ looks goodutils/__init__.py: Lazy configclass import — ✅ looks goodconftest.py:-s -vpytest flags —⚠️ previous concern still applies (increases log verbosity)
Overall this PR is improving. The main technical changes are solid. Only remaining question is whether the pytest verbosity flags are intended to be permanent.
Update (ba57283): ✅ Pytest -v flag removed — only -s remains in conftest.py. This addresses my verbosity concern.
New changes in this commit:
physics_manager_cfg.py,newton_collision_cfg.py,newton_manager_cfg.py: Changed to direct imports (from isaaclab.utils.configclass import configclass) — ✅ cleaner, more explicittest_isaac_rtx_renderer_utils.py: Refactored tests with amock_sim_contextfixture — ✅ cleaner test code, removes duplicated patch blocks
Current PR scope looks clean. No remaining concerns with the technical changes. The -s flag still disables output capture but without -v the impact is reasonable for debugging purposes.---
Update (fa628db): Added changelog entry (pbarejko-debugging.rst) documenting the CUDA device setting fix and .skip files for newton/physx sub-packages. ✅ Documentation-only changes — no new concerns.
Update (c70e9e0): Test infrastructure improvements:
test_noise.py: Removed unnecessaryAppLauncher/simulation_appboilerplate — this test only exercises pure math utilities and doesn't need the Omniverse app. ✅ Good cleanup.test_wrench_composer.py: Addedteardown_module()to closesimulation_appafter tests complete. ✅ Proper resource cleanup.
No new concerns. PR continues to look solid.
Update (f8b2c7a): test_wrench_composer.py now also removes the AppLauncher/simulation_app boilerplate entirely (and the teardown_module added in the previous commit). ✅ This test only exercises pure math/torch utilities and doesn't need the Omniverse app — same cleanup as test_noise.py. Good improvement for faster test execution.
No new concerns. PR looks ready for merge.
Update (ee49ecf): Changelog consolidation — moved fragment files from changelog.d/ into main CHANGELOG.rst files and bumped versions (isaaclab 5.2.0→5.2.1, isaaclab_newton 0.9.0→0.9.1, isaaclab_physx 0.7.0→0.7.1). ✅ Standard release housekeeping. No code changes, no new concerns.
Final status: PR looks good to merge.
f438131 to
67d10a4
Compare
There was a problem hiding this comment.
Consider removing this if we're going to merge this?
|
Closing because of #5633 |
# Description This PR is based on and includes the changes from #5620, then adds one CI fix on top: it unsets `HUB__ARGS__DETECT_ONLY` inside the Docker test container before running Isaac Lab commands. Some base images set this flag, which prevents OmniHub from starting and makes cold Nucleus asset retrieval fall back to slow repeated retries. This was reproduced from the failing Actions job: https://github.com/isaac-sim/IsaacLab/actions/runs/25904143763/job/76158743634 The affected `test_rsl_rl_export_flow.py` Dexsuite Kuka-Allegro export timed out at 600 s with the flag set, then completed in about 73 s with the flag unset after clearing the local KukaAllegro mirror. Fixes # N/A ## Type of change - Bug fix (non-breaking change which fixes an issue) ## Screenshots N/A - CI-only change. ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation (N/A - CI-only change) - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works (validated with the affected Docker export test) - [x] I have added a changelog fragment under `source/<pkg>/changelog.d/` for every touched package (N/A for the CI-only commit; #5620 carries its own changelog fragments) - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there ## Test Plan - `./isaaclab.sh -f` - Docker reproduction with `HUB__ARGS__DETECT_ONLY=true`: `test_export_flow[Isaac-Dexsuite-Kuka-Allegro-Reorient-v0]` timed out after 600 s. - Docker reproduction with `HUB__ARGS__DETECT_ONLY` unset after clearing the KukaAllegro mirror: `test_export_flow[Isaac-Dexsuite-Kuka-Allegro-Reorient-v0]` passed in 72.75 s. --------- Co-authored-by: Piotr Barejko <pbarejko@nvidia.com>
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there