Cleans up failing unit tests #5385
Conversation
🤖 Isaac Lab Review Bot
Summary
This PR applies pragmatic CI stabilization fixes: skipping a long-running training test, adding flaky retries for known-unstable Newton renderer tests, marking the TheiaTiny environment as xfail, and adding crash retries for signal-terminated tests. The changes are straightforward and correctly implemented, though one architectural concern exists around how xfail tasks propagate through the test utilities.
Architecture Impact
- `env_test_utils.py`: The `XFAIL_TASKS` dict and `pytest.param` wrapping affect all downstream tests that consume `setup_environment()`. This includes `test_environments.py`, the `test_environments_*` variants, and any new tests using this utility. The change is backward-compatible since `pytest.param(task_id)` without marks behaves identically to passing `task_id` directly.
- `tools/conftest.py`: The crash retry logic is self-contained within `run_individual_tests()` and doesn't affect the external API.
- `test_rendering_correctness.py`: The flaky marks only affect the parametrized test cases for Newton renderer combinations.
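The backward-compatibility claim can be checked directly. A minimal sketch, assuming pytest is installed; the task ids are illustrative:

```python
import pytest

# pytest.param(task_id) with no marks carries the same value and an empty
# marks tuple, so parametrized tests consume it exactly like a bare string.
plain = pytest.param("Isaac-Cartpole-v0")
print(plain.values, plain.marks)

# Wrapping also lets a known-broken task carry an xfail mark through the
# same parametrize list without changing the consuming test code.
broken = pytest.param(
    "Isaac-Cartpole-RGB-TheiaTiny-v0",
    marks=pytest.mark.xfail(reason="TheiaTiny environment is currently broken"),
)
print(len(broken.marks))
```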
Implementation Verdict
Minor fixes needed: one logic issue in the retry loop termination condition.
Test Coverage
This PR modifies test infrastructure rather than feature code. The changes themselves cannot have unit tests in the traditional sense. The @pytest.mark.skip on training tests means those tests won't run until the auto-cancel pipeline issue is resolved — this is documented in the PR description. No new regression tests are needed for infrastructure changes.
CI Status
No CI checks available yet. The effectiveness of these changes will be validated when CI runs.
Findings
🟡 Warning: tools/conftest.py:369-416 — Retry loop may terminate prematurely due to max_attempts calculation
The loop runs `max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1` times, but the individual retry counters (`startup_hang_attempts`, `crash_attempts`) are tracked separately. Consider this scenario: `STARTUP_HANG_RETRIES=2`, `CRASH_RETRIES=2`, `max_attempts=3`.
- If attempt 0 crashes, `crash_attempts` becomes 1 and the loop continues.
- If attempt 1 crashes, `crash_attempts` becomes 2 and the loop continues.
- If attempt 2 crashes, `crash_attempts` equals `CRASH_RETRIES` (2), so `if crashed and crash_attempts < CRASH_RETRIES` is false and the loop breaks correctly.

However, if attempt 0 is a startup hang and attempts 1-2 are crashes, you get:
- Attempt 0: startup hang → `startup_hang_attempts=1`, continue
- Attempt 1: crash → `crash_attempts=1`, continue
- Attempt 2: crash → `crash_attempts=2`, break (consumed all crash retries)
This seems intentional but the comment "Number of times to retry" implies each failure type gets independent retries. If a test alternates between startup hangs and crashes, the current logic may not fully exhaust retries for each type. This is minor since such alternating failures are rare, but the semantics could be clearer.
🔵 Improvement: source/isaaclab_tasks/test/env_test_utils.py:26-30 — Consider centralizing xfail tasks with rendering correctness
test_rendering_correctness.py:58-62 has its own _OVRTX_DISABLED skip marker for experimental features. If TheiaTiny failures are related to rendering, there may be value in consolidating known-broken environments/renderers in one place. Currently XFAIL_TASKS only affects environments discovered via setup_environment(), not direct fixture-based tests.
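One possible shape for that consolidation, as a hedged sketch (the `wrap_task` helper and the dict layout are hypothetical, not taken from the PR):

```python
import pytest

# Hypothetical single table of known-broken tasks, consultable by both
# setup_environment()-driven tests and fixture-based tests.
XFAIL_TASKS = {
    "Isaac-Cartpole-RGB-TheiaTiny-v0": "TheiaTiny environment is currently broken",
}


def wrap_task(task_id):
    """Attach an xfail mark when the task appears in the known-broken table;
    otherwise return an unmarked param that behaves like the bare task id."""
    reason = XFAIL_TASKS.get(task_id)
    if reason is None:
        return pytest.param(task_id)
    return pytest.param(task_id, marks=pytest.mark.xfail(reason=reason))
```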
🔵 Improvement: source/isaaclab_tasks/test/test_rendering_correctness.py:166-175 — Flaky mark placement is correct but consider adding reason
The _PHYSX_NEWTON_WARP_FLAKY mark correctly uses pytest.mark.flaky(max_runs=3, min_passes=1). For future debugging, consider adding a reason parameter if the flaky plugin supports it, or document which specific intermittent errors are observed.
🔵 Improvement: source/isaaclab_tasks/test/benchmarking/test_environments_training.py:79 — Skip reason could reference issue tracker
The skip reason "Temporarily disabled due to long running time" doesn't mention the auto-cancel pipeline issue from the PR description. A GitHub issue reference (e.g., reason="Temporarily disabled pending #XXXX (auto-cancel pipeline issue)") would help track when this can be re-enabled.
🔵 Improvement: tools/conftest.py:53-60 — CRASH_RETRIES docstring mentions deformables but code is generic
The PR description mentions "Adds retries for crashes (for deformables test)" but the implementation is generic to all tests. This is good design, but the docstring could mention the motivating use case for future context.
Greptile Summary
This PR stabilises flaky and broken CI tests: it skips the long-running benchmarking suite pending an auto-cancel fix, marks the TheiaTiny environment as xfail, adds flaky retries for the unstable Newton renderer tests, and adds crash retries for signal-terminated tests.
Confidence Score: 4/5. Safe to merge after fixing the one P1 logic bug in the new crash-retry loop.
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start: attempt loop\nrange max_attempts] --> B[Run test subprocess]
    B --> C{kill_reason?}
    C -- startup_hang --> D{startup_hang_attempts\n< STARTUP_HANG_RETRIES?}
    D -- yes --> E[startup_hang_attempts++\ncontinue]
    E --> B
    D -- no --> Z[break → report STARTUP_HANG failure]
    C -- none --> F{crashed?\nreturncode < 0\nno report\nno kill}
    F -- yes --> G{crash_attempts\n< CRASH_RETRIES?}
    G -- yes --> H[crash_attempts++\ncontinue]
    H --> B
    G -- no --> Z2[break → report CRASHED failure]
    F -- no --> I[break → parse JUnit report]
    C -- timeout/shutdown_hang --> I
```
The inline comment below is attached to these lines:

```python
max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
for attempt in range(max_attempts):
```
max_attempts can starve crash retries after startup-hang retries
max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 equals 3 when both are 2 — the same total as the old STARTUP_HANG_RETRIES + 1. This means the crash-retry budget is not additive. If a test startup-hangs on attempts 0 and 1 (consuming both startup-hang retries), the third attempt that crashes will hit break immediately because range(3) is exhausted — crash_attempts < CRASH_RETRIES is never evaluated. The formula should be STARTUP_HANG_RETRIES + CRASH_RETRIES + 1 to allow both retry budgets to be spent independently.
Suggested change:

```diff
-max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
+max_attempts = STARTUP_HANG_RETRIES + CRASH_RETRIES + 1
 for attempt in range(max_attempts):
```
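The difference between the two formulas is easy to see numerically; a minimal sketch with the constants both set to 2, as in the review scenario:

```python
STARTUP_HANG_RETRIES = 2
CRASH_RETRIES = 2

# Current formula: the two retry budgets overlap inside one shared attempt
# count, so mixed failure types can starve each other.
current_budget = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1

# Suggested formula: each failure type can spend its full retry budget
# independently before the loop gives up.
suggested_budget = STARTUP_HANG_RETRIES + CRASH_RETRIES + 1

print(current_budget, suggested_budget)
```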
PR isaac-sim#5385 marked Isaac-Cartpole-RGB-TheiaTiny-v0 as xfail with the reason "TheiaTiny environment is currently broken; xfailed pending fix." The underlying breakage is the packaging METADATA corruption addressed in this branch: when importlib.metadata.version("packaging") returns None, transformers' require_version at import time crashes every TheiaTiny code path. With the force-reinstall of packaging in command_install, the metadata is rewritten cleanly and transformers imports succeed, so the xfail mute is no longer needed. Remove it so CI on this PR actually exercises the TheiaTiny task and proves the fix works end-to-end instead of silently passing. Changelog/extension-version bump deferred; will be added in a follow-up commit.
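The failure mode can be reproduced in miniature. A sketch, where `require_min` is a hypothetical stand-in for transformers' `require_version`-style check:

```python
from importlib import metadata

# With intact dist-info METADATA, the installed version is a plain string.
v = metadata.version("packaging")
print(type(v).__name__)


def require_min(pkg: str, minimum: int) -> bool:
    """Simplified version gate: if corrupted metadata yielded None here,
    the .split() call would raise at import time, as described above."""
    got = metadata.version(pkg)
    return int(got.split(".")[0]) >= minimum


print(require_min("packaging", 1))
```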
Description
- Skips `source/isaaclab_tasks/test/benchmarking/test_environments_training.py` until we resolve the auto-cancel issue with the pipeline, where previous jobs get cancelled on newer commits.
- Marks `TheiaTiny` environment tests with xfail.
- Adds crash retries for `test_deformable_object.py` due to a crash.
- Adds flaky retries for the `test_rendering_correctness.py` shadow hand test.

Type of change
Checklist
- [ ] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to `CONTRIBUTORS.md` or my name already exists there