Cleans up failing unit tests #5385

Merged
kellyguo11 merged 5 commits into isaac-sim:develop from kellyguo11:skip-failed-tests
Apr 24, 2026

Conversation

@kellyguo11 (Contributor) commented Apr 24, 2026

Description

  • Skips source/isaaclab_tasks/test/benchmarking/test_environments_training.py until we resolve the pipeline auto-cancel issue, where jobs from previous commits get cancelled when newer commits arrive.
  • Marks TheiaTiny environment tests with xfail
  • Skips test_deformable_object.py due to crash
  • Reverts SDP change that removed USD fallback path for Newton model creation
  • Marks the shadow hand test in test_rendering_correctness.py as flaky

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added the isaac-lab (Related to Isaac Lab team) and infrastructure labels Apr 24, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR applies pragmatic CI stabilization fixes: skipping a long-running training test, adding flaky retries for known-unstable Newton renderer tests, marking the TheiaTiny environment as xfail, and adding crash retries for signal-terminated tests. The changes are straightforward and correctly implemented, though one architectural concern exists around how xfail tasks propagate through the test utilities.

Architecture Impact

  • env_test_utils.py: The XFAIL_TASKS dict and pytest.param wrapping affects all downstream tests that consume setup_environment(). This includes test_environments.py, test_environments_* variants, and any new tests using this utility. The change is backward-compatible since pytest.param(task_id) without marks behaves identically to passing task_id directly.
  • tools/conftest.py: The crash retry logic is self-contained within run_individual_tests() and doesn't affect the external API.
  • test_rendering_correctness.py: The flaky marks only affect the parametrized test cases for Newton renderer combinations.
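
The pytest.param wrapping described for env_test_utils.py can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name wrap_xfail_tasks and the non-xfailed task ID are hypothetical, while the XFAIL_TASKS name, the TheiaTiny task ID, and its xfail reason are taken from this conversation.

```python
import pytest

# Task ID -> xfail reason. The dict name and this entry come from the PR
# discussion; the full contents of the real dict are an assumption.
XFAIL_TASKS = {
    "Isaac-Cartpole-RGB-TheiaTiny-v0": "TheiaTiny environment is currently broken; xfailed pending fix.",
}


def wrap_xfail_tasks(task_ids):
    """Hypothetical helper: wrap known-broken task IDs in pytest.param with an xfail mark."""
    wrapped = []
    for task_id in task_ids:
        reason = XFAIL_TASKS.get(task_id)
        if reason is None:
            # Healthy tasks stay plain strings, so existing callers see no change.
            wrapped.append(task_id)
        else:
            wrapped.append(pytest.param(task_id, marks=pytest.mark.xfail(reason=reason)))
    return wrapped


# "Isaac-Cartpole-v0" is only a placeholder for a task that is not xfailed.
params = wrap_xfail_tasks(["Isaac-Cartpole-v0", "Isaac-Cartpole-RGB-TheiaTiny-v0"])
```

A list built this way can be passed directly to @pytest.mark.parametrize, which accepts a mix of plain values and pytest.param objects.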

Implementation Verdict

Minor fixes needed — One logic issue in the retry loop termination condition.

Test Coverage

This PR modifies test infrastructure rather than feature code. The changes themselves cannot have unit tests in the traditional sense. The @pytest.mark.skip on training tests means those tests won't run until the auto-cancel pipeline issue is resolved — this is documented in the PR description. No new regression tests are needed for infrastructure changes.

CI Status

No CI checks available yet. The effectiveness of these changes will be validated when CI runs.

Findings

🟡 Warning: tools/conftest.py:369-416 — Retry loop may terminate prematurely due to max_attempts calculation

The loop runs max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 times, but the individual retry counters (startup_hang_attempts, crash_attempts) are tracked separately. Consider this scenario: STARTUP_HANG_RETRIES=2, CRASH_RETRIES=2, max_attempts=3.

If attempt 0 crashes, crash_attempts becomes 1, loop continues.
If attempt 1 crashes, crash_attempts becomes 2, loop continues.
If attempt 2 crashes, crash_attempts equals CRASH_RETRIES (2), the if crashed and crash_attempts < CRASH_RETRIES is false, so the loop breaks correctly.

However, if attempt 0 is a startup hang and attempts 1-2 are crashes, you get:

  • Attempt 0: startup hang → startup_hang_attempts=1, continue
  • Attempt 1: crash → crash_attempts=1, continue
  • Attempt 2: crash → crash_attempts=2, break (consumed all crash retries)

This seems intentional but the comment "Number of times to retry" implies each failure type gets independent retries. If a test alternates between startup hangs and crashes, the current logic may not fully exhaust retries for each type. This is minor since such alternating failures are rare, but the semantics could be clearer.
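
The mixed-failure trace above can be replayed with a small model of the loop. This is a sketch reconstructed from the description in this comment, not the code in tools/conftest.py: the function simulate and the scripted outcome list are hypothetical, and only the two constants and the max_attempts formulas come from the discussion.

```python
STARTUP_HANG_RETRIES = 2
CRASH_RETRIES = 2


def simulate(outcomes, max_attempts):
    """Replay scripted attempt outcomes through a model of the retry loop.

    `outcomes` holds "startup_hang", "crash" or "ok" for each attempt.
    Returns (attempts_run, startup_hang_attempts, crash_attempts, last_outcome).
    """
    startup_hang_attempts = 0
    crash_attempts = 0
    attempts_run = 0
    last_outcome = None
    for attempt in range(max_attempts):
        if attempt >= len(outcomes):
            break  # nothing left in the script
        last_outcome = outcomes[attempt]
        attempts_run += 1
        if last_outcome == "startup_hang" and startup_hang_attempts < STARTUP_HANG_RETRIES:
            startup_hang_attempts += 1
            continue
        if last_outcome == "crash" and crash_attempts < CRASH_RETRIES:
            crash_attempts += 1
            continue
        break  # success, or the budget for this failure type is spent
    return attempts_run, startup_hang_attempts, crash_attempts, last_outcome


# One startup hang, two crashes, then a run that would have passed.
script = ["startup_hang", "crash", "crash", "ok"]
with_max = simulate(script, max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1)
with_sum = simulate(script, STARTUP_HANG_RETRIES + CRASH_RETRIES + 1)
```

In this model, the third attempt under the max-based bound still increments crash_attempts, but the loop has no iteration left, so the passing fourth attempt never runs. With the additive bound it does.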

🔵 Improvement: source/isaaclab_tasks/test/env_test_utils.py:26-30 — Consider centralizing xfail tasks with rendering correctness

test_rendering_correctness.py:58-62 has its own _OVRTX_DISABLED skip marker for experimental features. If TheiaTiny failures are related to rendering, there may be value in consolidating known-broken environments/renderers in one place. Currently XFAIL_TASKS only affects environments discovered via setup_environment(), not direct fixture-based tests.

🔵 Improvement: source/isaaclab_tasks/test/test_rendering_correctness.py:166-175 — Flaky mark placement is correct but consider adding reason

The _PHYSX_NEWTON_WARP_FLAKY mark correctly uses pytest.mark.flaky(max_runs=3, min_passes=1). For future debugging, consider adding a reason parameter if the flaky plugin supports it, or document which specific intermittent errors are observed.

🔵 Improvement: source/isaaclab_tasks/test/benchmarking/test_environments_training.py:79 — Skip reason could reference issue tracker

The skip reason "Temporarily disabled due to long running time" doesn't mention the auto-cancel pipeline issue from the PR description. A GitHub issue reference (e.g., reason="Temporarily disabled pending #XXXX (auto-cancel pipeline issue)") would help track when this can be re-enabled.

🔵 Improvement: tools/conftest.py:53-60 — CRASH_RETRIES docstring mentions deformables but code is generic

The PR description mentions "Adds retries for crashes (for deformables test)" but the implementation is generic to all tests. This is good design, but the docstring could mention the motivating use case for future context.

@greptile-apps Bot (Contributor) commented Apr 24, 2026

Greptile Summary

This PR stabilises flaky and broken CI tests: it skips the long-running benchmarking suite pending an auto-cancel fix, marks TheiaTiny environment tests as xfail, adds flaky reruns for physx/newton renderer combinations, and introduces crash-signal retry logic in the harness for deformables tests.

  • P1 – tools/conftest.py line 370: max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 evaluates to 3 when both are 2 — identical to the previous STARTUP_HANG_RETRIES + 1. If two startup hangs exhaust their budget, the next attempt that crashes hits break immediately with no crash retry. The formula should be STARTUP_HANG_RETRIES + CRASH_RETRIES + 1.

Confidence Score: 4/5

Safe to merge after fixing the max_attempts formula; the crash-retry feature will silently underdeliver retries in mixed failure scenarios until corrected.

One P1 logic bug in the new crash-retry loop: max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 does not give independent budgets for both retry types, so crash retries may be silently skipped. All other changes are clean.

tools/conftest.py — the max_attempts calculation on line 370 needs to be STARTUP_HANG_RETRIES + CRASH_RETRIES + 1.

Important Files Changed

Filename | Overview
tools/conftest.py | Adds crash-retry logic, but max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 is insufficient when both retry budgets need to be spent independently; the correct formula is their sum plus 1.
source/isaaclab_tasks/test/env_test_utils.py | Introduces XFAIL_TASKS map and wraps matching task IDs in pytest.param with xfail mark; return type annotation list[str] is now stale and should be `list[str
source/isaaclab_tasks/test/benchmarking/test_environments_training.py | Adds @pytest.mark.skip to test_train_environments to temporarily disable it due to the pipeline auto-cancel issue; straightforward change.
source/isaaclab_tasks/test/test_rendering_correctness.py | Defines _PHYSX_NEWTON_WARP_FLAKY = pytest.mark.flaky(max_runs=3, min_passes=1) and applies it to both physx/newton_renderer RGB and depth parametrized test cases; clean and correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Start: attempt loop\nrange max_attempts] --> B[Run test subprocess]
    B --> C{kill_reason?}
    C -- startup_hang --> D{startup_hang_attempts\n< STARTUP_HANG_RETRIES?}
    D -- yes --> E[startup_hang_attempts++\ncontinue]
    E --> B
    D -- no --> Z[break → report STARTUP_HANG failure]
    C -- none --> F{crashed?\nreturncode < 0\nno report\nno kill}
    F -- yes --> G{crash_attempts\n< CRASH_RETRIES?}
    G -- yes --> H[crash_attempts++\ncontinue]
    H --> B
    G -- no --> Z2[break → report CRASHED failure]
    F -- no --> I[break → parse JUnit report]
    C -- timeout/shutdown_hang --> I
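
The decision branches in the flowchart can be sketched as a small classifier. This is an illustration under stated assumptions, not the harness code: the function name classify_attempt, its parameters, and the returned labels are hypothetical. The crash condition itself (negative return code, no report, no harness-initiated kill) is taken from the flowchart.

```python
import os


def classify_attempt(returncode, report_path, kill_reason):
    """Classify one finished test subprocess, following the flowchart's branches.

    kill_reason is None when the harness did not kill the process itself;
    otherwise it is "startup_hang", "timeout" or "shutdown_hang".
    """
    if kill_reason == "startup_hang":
        return "startup_hang"
    if kill_reason is None:
        # On POSIX, subprocess reports death by signal N as returncode == -N.
        killed_by_signal = returncode is not None and returncode < 0
        if killed_by_signal and not os.path.exists(report_path):
            return "crashed"
    # Timeouts, shutdown hangs, and ordinary exits fall through to report parsing.
    return "parse_report"
```

Requiring both a missing report and a missing kill_reason keeps the harness's own terminations (timeouts and hang kills) from being counted as crashes.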

Comments Outside Diff (1)

  1. source/isaaclab_tasks/test/env_test_utils.py, line 100 (link)

    P2 Return type annotation now stale

    setup_environment is annotated as -> list[str], but after this PR it can return a list[str | pytest.param] — plain task IDs for normal tasks and wrapped pytest.param objects for those in XFAIL_TASKS. Type checkers and callers that assume every element is a str will be wrong.

Reviews (1): Last reviewed commit: "Cleans up failing unit tests" | Re-trigger Greptile

Comment thread tools/conftest.py Outdated
Comment on lines +370 to +371
max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
for attempt in range(max_attempts):

P1 max_attempts can starve crash retries after startup-hang retries

max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1 equals 3 when both are 2 — the same total as the old STARTUP_HANG_RETRIES + 1. This means the crash-retry budget is not additive. If a test startup-hangs on attempts 0 and 1 (consuming both startup-hang retries), the third attempt that crashes will hit break immediately because range(3) is exhausted — crash_attempts < CRASH_RETRIES is never evaluated. The formula should be STARTUP_HANG_RETRIES + CRASH_RETRIES + 1 to allow both retry budgets to be spent independently.

Suggested change
- max_attempts = max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1
- for attempt in range(max_attempts):
+ max_attempts = STARTUP_HANG_RETRIES + CRASH_RETRIES + 1
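
A brute-force check can make the budget argument concrete. The sketch below models the loop as described in this thread (it is not the code from tools/conftest.py, and reaches_final_attempt and starved_sequences are hypothetical names). It enumerates every ordering of failures that stays within both retry budgets and reports the orderings whose follow-up attempt never gets to run.

```python
from itertools import product

STARTUP_HANG_RETRIES = 2
CRASH_RETRIES = 2


def reaches_final_attempt(failures, max_attempts):
    """Return True if the loop gets to run the attempt that follows `failures`."""
    hangs = crashes = 0
    outcomes = list(failures) + ["ok"]
    for attempt in range(max_attempts):
        outcome = outcomes[attempt]
        if outcome == "ok":
            return True
        if outcome == "startup_hang" and hangs < STARTUP_HANG_RETRIES:
            hangs += 1
            continue
        if outcome == "crash" and crashes < CRASH_RETRIES:
            crashes += 1
            continue
        return False  # budget for this failure type is spent
    return False  # ran out of loop iterations


def starved_sequences(max_attempts):
    """List in-budget failure orderings whose final retry never runs."""
    starved = []
    for length in range(STARTUP_HANG_RETRIES + CRASH_RETRIES + 1):
        for failures in product(("startup_hang", "crash"), repeat=length):
            within_budget = (
                failures.count("startup_hang") <= STARTUP_HANG_RETRIES
                and failures.count("crash") <= CRASH_RETRIES
            )
            if within_budget and not reaches_final_attempt(failures, max_attempts):
                starved.append(failures)
    return starved
```

Under this model, starved_sequences(max(STARTUP_HANG_RETRIES, CRASH_RETRIES) + 1) is non-empty and includes two startup hangs followed by a crash, while starved_sequences(STARTUP_HANG_RETRIES + CRASH_RETRIES + 1) returns an empty list.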

@pbarejko pbarejko self-requested a review April 24, 2026 04:34
@kellyguo11 kellyguo11 merged commit ae41e2a into isaac-sim:develop Apr 24, 2026
10 checks passed
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Apr 24, 2026
PR isaac-sim#5385 marked Isaac-Cartpole-RGB-TheiaTiny-v0 as xfail with the
reason "TheiaTiny environment is currently broken; xfailed pending
fix." The underlying breakage is the packaging METADATA corruption
addressed in this branch: when importlib.metadata.version("packaging")
returns None, transformers' require_version at import time crashes
every TheiaTiny code path.

With the force-reinstall of packaging in command_install, the metadata
is rewritten cleanly and transformers imports succeed, so the xfail
mute is no longer needed. Remove it so CI on this PR actually
exercises the TheiaTiny task and proves the fix works end-to-end
instead of silently passing.

Changelog/extension-version bump deferred; will be added in a
follow-up commit.
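
The metadata failure described in this commit message can be probed with a short diagnostic. This is a hedged sketch, not part of the referenced branch: probe_version is a hypothetical helper, and treating KeyError the same as a None result is a defensive assumption rather than something stated in the commit message.

```python
from importlib import metadata


def probe_version(dist_name):
    """Return the installed version string, or None if it cannot be read.

    A missing distribution raises PackageNotFoundError. The commit message
    describes version() returning None for corrupted metadata; KeyError is
    caught as well in case a missing Version field surfaces that way.
    """
    try:
        version = metadata.version(dist_name)
    except (metadata.PackageNotFoundError, KeyError):
        return None
    # Guard against None or an empty string coming back from broken metadata.
    return version if isinstance(version, str) and version else None
```

Calling probe_version("packaging") before importing transformers would show whether the version string is readable in the current environment.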

Labels

infrastructure isaac-lab Related to Isaac Lab team

2 participants