Cleans up failing unit tests #5385
Conversation
🤖 Isaac Lab Review Bot
Summary
This PR applies pragmatic CI stabilization fixes: skipping a long-running training test, adding flaky retries for known-unstable Newton renderer tests, marking the TheiaTiny environment as xfail, and adding crash retries for signal-terminated tests. The changes are straightforward and correctly implemented, though one architectural concern exists around how xfail tasks propagate through the test utilities.
Architecture Impact
- `env_test_utils.py`: The `XFAIL_TASKS` dict and `pytest.param` wrapping affect all downstream tests that consume `setup_environment()`. This includes `test_environments.py`, the `test_environments_*` variants, and any new tests using this utility. The change is backward-compatible since `pytest.param(task_id)` without marks behaves identically to passing `task_id` directly.
- `tools/conftest.py`: The crash retry logic is self-contained within `run_individual_tests()` and doesn't affect the external API.
- `test_rendering_correctness.py`: The flaky marks only affect the parametrized test cases for Newton renderer combinations.
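The backward-compatibility claim can be checked directly. A minimal sketch, assuming pytest is installed; the task ids are illustrative:

```python
import pytest

# pytest.param(task_id) with no marks carries the same value and an empty
# marks tuple, so parametrized tests consume it exactly like a bare string.
plain = pytest.param("Isaac-Cartpole-v0")
print(plain.values, plain.marks)

# Wrapping also lets a known-broken task carry an xfail mark through the
# same parametrize list without changing the consuming test code.
broken = pytest.param(
    "Isaac-Cartpole-RGB-TheiaTiny-v0",
    marks=pytest.mark.xfail(reason="TheiaTiny environment is currently broken"),
)
print(len(broken.marks))
```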
Implementation Verdict
Minor fixes needed: one logic issue in the retry loop termination condition.
Test Coverage
This PR modifies test infrastructure rather than feature code. The changes themselves cannot have unit tests in the traditional sense. The @pytest.mark.skip on training tests means those tests won't run until the auto-cancel pipeline issue is resolved — this is documented in the PR description. No new regression tests are needed for infrastructure changes.
CI Status
No CI checks available yet. The effectiveness of these changes will be validated when CI runs.
Findings
🟡 Warning: tools/conftest.py:369-416 — Retry loop may terminate prematurely due to max_attempts calculation
The loop runs `max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1` times, but the individual retry counters (`startup_hang_attempts`, `crash_attempts`) are tracked separately. Consider this scenario: `STARTUP_HANG_RETRIES=2`, `CRASH_RETRIES=2`, `max_attempts=3`.
- If attempt 0 crashes, `crash_attempts` becomes 1 and the loop continues.
- If attempt 1 crashes, `crash_attempts` becomes 2 and the loop continues.
- If attempt 2 crashes, `crash_attempts` equals `CRASH_RETRIES` (2), so `if crashed and crash_attempts < CRASH_RETRIES` is false and the loop breaks correctly.

However, if attempt 0 is a startup hang and attempts 1-2 are crashes, you get:
- Attempt 0: startup hang → `startup_hang_attempts=1`, continue
- Attempt 1: crash → `crash_attempts=1`, continue
- Attempt 2: crash → `crash_attempts=2`, break (consumed all crash retries)
This seems intentional but the comment "Number of times to retry" implies each failure type gets independent retries. If a test alternates between startup hangs and crashes, the current logic may not fully exhaust retries for each type. This is minor since such alternating failures are rare, but the semantics could be clearer.
🔵 Improvement: source/isaaclab_tasks/test/env_test_utils.py:26-30 — Consider centralizing xfail tasks with rendering correctness
test_rendering_correctness.py:58-62 has its own _OVRTX_DISABLED skip marker for experimental features. If TheiaTiny failures are related to rendering, there may be value in consolidating known-broken environments/renderers in one place. Currently XFAIL_TASKS only affects environments discovered via setup_environment(), not direct fixture-based tests.
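One possible shape for that consolidation, as a hedged sketch (the `wrap_task` helper and the dict layout are hypothetical, not taken from the PR):

```python
import pytest

# Hypothetical single table of known-broken tasks, consultable by both
# setup_environment()-driven tests and fixture-based tests.
XFAIL_TASKS = {
    "Isaac-Cartpole-RGB-TheiaTiny-v0": "TheiaTiny environment is currently broken",
}


def wrap_task(task_id):
    """Attach an xfail mark when the task appears in the known-broken table;
    otherwise return an unmarked param that behaves like the bare task id."""
    reason = XFAIL_TASKS.get(task_id)
    if reason is None:
        return pytest.param(task_id)
    return pytest.param(task_id, marks=pytest.mark.xfail(reason=reason))
```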
🔵 Improvement: source/isaaclab_tasks/test/test_rendering_correctness.py:166-175 — Flaky mark placement is correct but consider adding reason
The _PHYSX_NEWTON_WARP_FLAKY mark correctly uses pytest.mark.flaky(max_runs=3, min_passes=1). For future debugging, consider adding a reason parameter if the flaky plugin supports it, or document which specific intermittent errors are observed.
🔵 Improvement: source/isaaclab_tasks/test/benchmarking/test_environments_training.py:79 — Skip reason could reference issue tracker
The skip reason "Temporarily disabled due to long running time" doesn't mention the auto-cancel pipeline issue from the PR description. A GitHub issue reference (e.g., reason="Temporarily disabled pending #XXXX (auto-cancel pipeline issue)") would help track when this can be re-enabled.
🔵 Improvement: tools/conftest.py:53-60 — CRASH_RETRIES docstring mentions deformables but code is generic
The PR description mentions "Adds retries for crashes (for deformables test)" but the implementation is generic to all tests. This is good design, but the docstring could mention the motivating use case for future context.
Greptile Summary
This PR stabilises flaky and broken CI tests: it skips the long-running benchmarking suite pending an auto-cancel fix, marks the TheiaTiny environment as xfail, adds flaky retries for the unstable Newton renderer tests, and adds crash retries for signal-terminated tests.
Confidence Score: 4/5. Safe to merge after fixing the one P1 logic bug in the new crash-retry loop.
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start: attempt loop\nrange max_attempts] --> B[Run test subprocess]
    B --> C{kill_reason?}
    C -- startup_hang --> D{startup_hang_attempts\n< STARTUP_HANG_RETRIES?}
    D -- yes --> E[startup_hang_attempts++\ncontinue]
    E --> B
    D -- no --> Z[break → report STARTUP_HANG failure]
    C -- none --> F{crashed?\nreturncode < 0\nno report\nno kill}
    F -- yes --> G{crash_attempts\n< CRASH_RETRIES?}
    G -- yes --> H[crash_attempts++\ncontinue]
    H --> B
    G -- no --> Z2[break → report CRASHED failure]
    F -- no --> I[break → parse JUnit report]
    C -- timeout/shutdown_hang --> I
```
The inline comment below is attached to these lines:

```python
max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
for attempt in range(max_attempts):
```
max_attempts can starve crash retries after startup-hang retries
max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 equals 3 when both are 2 — the same total as the old STARTUP_HANG_RETRIES + 1. This means the crash-retry budget is not additive. If a test startup-hangs on attempts 0 and 1 (consuming both startup-hang retries), the third attempt that crashes will hit break immediately because range(3) is exhausted — crash_attempts < CRASH_RETRIES is never evaluated. The formula should be STARTUP_HANG_RETRIES + CRASH_RETRIES + 1 to allow both retry budgets to be spent independently.
Suggested change:

```diff
-max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
+max_attempts = STARTUP_HANG_RETRIES + CRASH_RETRIES + 1
 for attempt in range(max_attempts):
```
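The difference between the two formulas is easy to see numerically; a minimal sketch with the constants both set to 2, as in the review scenario:

```python
STARTUP_HANG_RETRIES = 2
CRASH_RETRIES = 2

# Current formula: the two retry budgets overlap inside one shared attempt
# count, so mixed failure types can starve each other.
current_budget = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1

# Suggested formula: each failure type can spend its full retry budget
# independently before the loop gives up.
suggested_budget = STARTUP_HANG_RETRIES + CRASH_RETRIES + 1

print(current_budget, suggested_budget)
```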
PR isaac-sim#5385 marked Isaac-Cartpole-RGB-TheiaTiny-v0 as xfail with the reason "TheiaTiny environment is currently broken; xfailed pending fix." The underlying breakage is the packaging METADATA corruption addressed in this branch: when importlib.metadata.version("packaging") returns None, transformers' require_version at import time crashes every TheiaTiny code path. With the force-reinstall of packaging in command_install, the metadata is rewritten cleanly and transformers imports succeed, so the xfail mute is no longer needed. Remove it so CI on this PR actually exercises the TheiaTiny task and proves the fix works end-to-end instead of silently passing. Changelog/extension-version bump deferred; will be added in a follow-up commit.
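The failure mode can be reproduced in miniature. A sketch, where `require_min` is a hypothetical stand-in for transformers' `require_version`-style check:

```python
from importlib import metadata

# With intact dist-info METADATA, the installed version is a plain string.
v = metadata.version("packaging")
print(type(v).__name__)


def require_min(pkg: str, minimum: int) -> bool:
    """Simplified version gate: if corrupted metadata yielded None here,
    the .split() call would raise at import time, as described above."""
    got = metadata.version(pkg)
    return int(got.split(".")[0]) >= minimum


print(require_min("packaging", 1))
```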
Description
- Skips `source/isaaclab_tasks/test/benchmarking/test_environments_training.py` until we resolve the auto-cancel issue with the pipeline, where previous jobs get cancelled on newer commits.
- Marks `TheiaTiny` environment tests with xfail.
- Adds crash retries for `test_deformable_object.py` due to a crash.
- Adds flaky retries for the `test_rendering_correctness.py` shadow hand test.

Type of change
Checklist
- [ ] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to `CONTRIBUTORS.md` or my name already exists there