
Reduce rendering test flakiness #5475

Merged
huidongc merged 8 commits into isaac-sim:develop from huidongc:rendering-test-flakiness
May 5, 2026

@huidongc (Collaborator) commented May 2, 2026

Description

  • In the Dexsuite env, the success and failure markers are placed at exactly the same location.
    If both markers are visible, the rendering order determines which one appears in the
    camera output. Hide both markers to avoid this nondeterministic behavior.

  • The ordering of test cases appears to affect camera outputs for reasons not yet understood, so I moved
    the newton_renderer test cases after the isaacsim_rtx_renderer test cases.
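To see why two co-located markers make the output order-dependent, here is a toy Python model: two markers at identical depth, a strict depth test, and every possible draw order. `composite` and `possible_outputs` are hypothetical helpers, and real renderers (rasterized or ray-traced) resolve coincident surfaces in their own ways; the sketch only shows that a tie leaves the winner up to ordering, while hiding both markers removes the tie entirely.

```python
from itertools import permutations

# Toy rasterizer model (an assumption, not how RTX or the Newton renderer work):
# each marker is one fragment covering the same pixel at the same depth.


def composite(fragments, background="background"):
    """Return the color a single pixel ends up with.

    fragments: (name, depth, visible) tuples in draw order. The depth test is
    strict (<), so when two fragments tie on depth the one drawn first wins.
    """
    color, depth = background, float("inf")
    for name, frag_depth, visible in fragments:
        if not visible:
            continue  # hidden markers never reach the framebuffer
        if frag_depth < depth:
            color, depth = name, frag_depth
    return color


def possible_outputs(markers):
    """Collect the pixel color produced by every possible draw order."""
    return {composite(list(order)) for order in permutations(markers)}


both_visible = [("success", 1.0, True), ("failure", 1.0, True)]
both_hidden = [("success", 1.0, False), ("failure", 1.0, False)]

print(sorted(possible_outputs(both_visible)))  # ['failure', 'success']
print(sorted(possible_outputs(both_hidden)))   # ['background']
```

With both markers visible there are two possible results, one per draw order; with both hidden every order yields the background, which is the property the fix relies on.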

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions bot added the isaac-lab (Related to Isaac Lab team) label May 2, 2026

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

This PR addresses rendering test flakiness by updating 18 golden images for the dexsuite_kuka rendering tests and removing the @pytest.mark.flaky(max_runs=3, min_passes=1) decorator from both kit-based and kitless test functions. The approach assumes the new golden images are deterministic and represent the correct baseline.

Architecture Impact

Self-contained. The changes are limited to test infrastructure — golden image baselines and test decorators. No production code is affected. The rendering test utilities (rendering_test_utils.py) and the test execution flow remain unchanged.

Implementation Verdict

Minor fixes needed — The approach is reasonable but lacks documentation explaining what caused the original flakiness and why the new golden images are expected to be stable.

Test Coverage

This PR is the test fix. The tests themselves (test_rendering_dexsuite_kuka and test_rendering_dexsuite_kuka_kitless) remain parameterized across all physics backend, renderer, and AOV combinations. The removal of @pytest.mark.flaky is a strong assertion that the new golden images will pass consistently — this needs CI validation.

CI Status

No CI checks available yet. This is critical for this PR — the entire premise is that tests will now pass deterministically. Cannot verify the fix without CI results.

Findings

🟡 Warning: PR Description — Missing root cause analysis
The PR description is incomplete. For a test flakiness fix, the description should explain:

  1. What was causing the flakiness (non-deterministic rendering, race conditions, warm-up frames, etc.)
  2. Why the new golden images are expected to be stable (were they generated with additional warm-up? Different seeding? A simulator fix?)
  3. Whether this is a workaround or addresses the underlying issue

Without this context, reviewers cannot assess whether removing @pytest.mark.flaky is premature.

🟡 Warning: test_rendering_dexsuite_kuka.py:32 — Flaky decorator removal without proven stability

@pytest.mark.parametrize("physics_backend,renderer,data_type", PHYSICS_RENDERER_AOV_COMBINATIONS)
def test_rendering_dexsuite_kuka(physics_backend, renderer, data_type):

Removing @pytest.mark.flaky(max_runs=3, min_passes=1) assumes deterministic behavior. If the underlying flakiness was due to renderer warm-up, GPU state, or non-deterministic physics, simply updating golden images may shift which runs fail rather than eliminating failures. This needs multiple CI runs to validate.
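For reference, `@pytest.mark.flaky(max_runs=3, min_passes=1)` (presumably supplied by the `flaky` plugin, since `max_runs`/`min_passes` are its parameters) reruns a failing test up to three times and reports it as passing if any single attempt passes. The plain-Python sketch below emulates those semantics to show why the marker can mask an intermittent golden-image mismatch; `run_flaky` and `make_intermittent` are hypothetical helpers, not the plugin's implementation.

```python
from typing import Callable


def run_flaky(test: Callable[[], None], max_runs: int = 3, min_passes: int = 1) -> bool:
    """Rerun `test` until it passes `min_passes` times or `max_runs` attempts are used.

    Any exception counts as a failed attempt. Returns True if the test is
    reported as passing overall, False otherwise.
    """
    passes = 0
    for attempt in range(max_runs):
        try:
            test()
            passes += 1
        except Exception:
            pass  # a failed attempt; keep going if attempts remain
        if passes >= min_passes:
            return True
        remaining = max_runs - attempt - 1
        if passes + remaining < min_passes:
            return False  # min_passes can no longer be reached
    return False


def make_intermittent(fail_times: int) -> Callable[[], None]:
    """Build a fake test that fails on its first `fail_times` calls."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] > fail_times, "simulated golden-image mismatch"

    return test


print(run_flaky(make_intermittent(2)))  # True: the third attempt passes
print(run_flaky(make_intermittent(3)))  # False: all three attempts fail
```

With `min_passes=1`, a test that fails its first two attempts still reports green, which is why repeated CI runs are the only way to confirm the marker is no longer needed.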

🟡 Warning: test_rendering_dexsuite_kuka_kitless.py:28 — Same concern as kit-based test
The kitless variant has identical risk. Both files should be validated with multiple CI runs before merge.

🔵 Improvement: Golden image generation — Consider adding tolerance metadata
The rendering tests compare against golden images, but the test utilities likely use SSIM or MSE thresholds. If flakiness was due to minor pixel variations, an alternative approach would be to document acceptable tolerance ranges rather than chasing pixel-perfect golden images. This could be addressed in follow-up work.
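As a sketch of what such tolerance metadata could look like, assume a per-pixel tolerance plus a cap on the percentage of pixels allowed to exceed it (the scheme the name `MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME` in rendering_test_utils.py suggests). The function names and all numeric values below are illustrative, and the real utility may measure differences differently.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel values


def different_pixels_percentage(actual: Image, golden: Image, pixel_tolerance: float) -> float:
    """Percentage of pixels whose absolute difference exceeds `pixel_tolerance`."""
    if len(actual) != len(golden) or any(len(a) != len(g) for a, g in zip(actual, golden)):
        raise ValueError("actual and golden images have different shapes")
    total = different = 0
    for row_a, row_g in zip(actual, golden):
        for a, g in zip(row_a, row_g):
            total += 1
            if abs(a - g) > pixel_tolerance:
                different += 1
    return 100.0 * different / total if total else 0.0


def matches_golden(actual: Image, golden: Image, pixel_tolerance: float,
                   max_different_percentage: float) -> bool:
    """Pass if at most `max_different_percentage` percent of pixels differ noticeably."""
    return different_pixels_percentage(actual, golden, pixel_tolerance) <= max_different_percentage


golden = [[0, 0], [0, 0]]
noisy = [[3, 0], [0, 2]]     # small variations everywhere
changed = [[0, 0], [0, 50]]  # one pixel (25%) changed a lot

print(matches_golden(noisy, golden, pixel_tolerance=10, max_different_percentage=8.0))    # True
print(matches_golden(changed, golden, pixel_tolerance=10, max_different_percentage=8.0))  # False
```

Splitting the check into a per-pixel tolerance and a percentage cap lets small renderer noise pass while still catching a localized change such as a marker appearing or disappearing.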

🔵 Improvement: Consider adding a determinism stress test
For rendering tests that have historically been flaky, adding a stress test that runs the same configuration N times and asserts identical outputs would provide confidence before removing flaky markers.
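A determinism stress test along those lines might look like the sketch below, assuming a hypothetical `render` callable that builds the environment, steps it, and returns the camera output as bytes. Byte-exact hashing is the strictest possible check; for GPU renderers that legitimately vary by a few pixel values, comparing each run to the first with a tolerance would be a gentler variant.

```python
import hashlib
import itertools
from typing import Callable


def assert_deterministic(render: Callable[[], bytes], runs: int = 5) -> str:
    """Call `render` `runs` times and raise if any output differs from the first.

    Returns the SHA-256 hex digest shared by all runs so it can be logged.
    """
    digests = [hashlib.sha256(render()).hexdigest() for _ in range(runs)]
    mismatches = [i for i, digest in enumerate(digests) if digest != digests[0]]
    if mismatches:
        raise AssertionError(f"runs {mismatches} differ from run 0")
    return digests[0]


# A stable "renderer" passes; one whose output drifts between calls fails.
print(assert_deterministic(lambda: b"camera frame")[:12])

counter = itertools.count()
try:
    assert_deterministic(lambda: str(next(counter)).encode())
except AssertionError as err:
    print(err)  # runs [1, 2, 3, 4] differ from run 0
```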


Recommendation: Wait for CI results showing green on these tests before merging. Ideally, run CI multiple times (3+) to confirm the flakiness is actually resolved. The PR author should also update the description with the root cause explanation.

@greptile-apps (Contributor) bot commented May 2, 2026

Greptile Summary

This PR reduces rendering test flakiness in the DexSuite Kuka environment by hiding both success and failure markers (which are co-located at initialization time before visualize() positions them) and by moving newton_renderer test cases after isaacsim_rtx_renderer test cases to address ordering-sensitive camera output differences. It also tightens per-pixel difference thresholds for shadow_hand (7.0→5.0) and dexsuite_kuka (10.0→8.0), and removes the redundant private/public alias for MAX_DIFFERENT_PIXELS_PERCENTAGE_BY_ENV_NAME.
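One way to express the reordering is to build the parametrization list and stable-sort it so that newton_renderer cases come last; by default (without plugins such as pytest-randomly) pytest runs parametrized cases in the order the list provides. The backend and AOV names and the `build_combinations` helper are hypothetical; only the `PHYSICS_RENDERER_AOV_COMBINATIONS` name and the two renderer names come from this PR, which may simply have reordered a literal list.

```python
from itertools import product

# Hypothetical names for illustration; the real lists live in rendering_test_utils.py.
PHYSICS_BACKENDS = ["physx", "newton"]
RENDERERS = ["newton_renderer", "isaacsim_rtx_renderer"]
AOVS = ["rgb", "depth"]


def build_combinations(backends, renderers, aovs, run_last=("newton_renderer",)):
    """Return (backend, renderer, aov) tuples with `run_last` renderers at the end.

    sorted() is stable, so cases keep their relative order within each group.
    """
    combos = product(backends, renderers, aovs)
    return sorted(combos, key=lambda combo: combo[1] in run_last)  # False sorts before True


PHYSICS_RENDERER_AOV_COMBINATIONS = build_combinations(PHYSICS_BACKENDS, RENDERERS, AOVS)

for combo in PHYSICS_RENDERER_AOV_COMBINATIONS:
    print(combo)
# The list can then feed the existing decorator, e.g.
# @pytest.mark.parametrize("physics_backend,renderer,data_type", PHYSICS_RENDERER_AOV_COMBINATIONS)
```

Sorting on a key rather than hand-editing the list keeps the "newton_renderer last" rule in one place if more backends or AOVs are added later.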

Confidence Score: 5/5

Safe to merge — targeted flakiness fix with no functional behaviour changes and updated golden images.

No P0 or P1 issues found. The marker-hiding fix is correctly scoped (both 'success' and 'failure' prototypes live under success_visualizer_cfg.markers and are hidden before env construction). The set_visibility(True) call in pose_commands.__init__ operates on the instancer parent and does not override the explicitly-set invisible attribute on child prototype prims. Threshold tightening and test-ordering changes are clean. All golden images are updated to match.
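The set_visibility argument rests on USD's inherited-visibility rule: "invisible" prunes a prim and its whole subtree, so making a parent visible cannot force an explicitly hidden child to show. Below is a plain-Python model of that rule, not the pxr API (the real query is `UsdGeomImageable.ComputeVisibility`); it ignores purpose and point-instancer details, and the prim paths are hypothetical.

```python
from typing import Dict, Optional


def compute_visibility(prim: str, authored: Dict[str, str], parent: Dict[str, Optional[str]]) -> str:
    """Simplified USD rule: a prim is invisible if it or any ancestor is "invisible".

    `authored` maps prim paths to their authored token ("inherited" or "invisible");
    prims without an entry default to "inherited". `parent` maps each prim to its parent.
    Returns "invisible" or "inherited" (USD's token for a visible prim).
    """
    node: Optional[str] = prim
    while node is not None:
        if authored.get(node, "inherited") == "invisible":
            return "invisible"
        node = parent.get(node)
    return "inherited"


# Hypothetical prim paths: an instancer with two marker prototypes beneath it.
parent = {
    "/Visuals/goal": None,
    "/Visuals/goal/success": "/Visuals/goal",
    "/Visuals/goal/failure": "/Visuals/goal",
}
authored = {
    "/Visuals/goal/success": "invisible",  # hidden via marker_cfg.visible = False
    "/Visuals/goal/failure": "invisible",
}

# In this model, making the parent visible only authors "inherited" on the parent itself.
authored["/Visuals/goal"] = "inherited"
print(compute_visibility("/Visuals/goal", authored, parent))          # inherited
print(compute_visibility("/Visuals/goal/success", authored, parent))  # invisible
```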

No files require special attention.

Important Files Changed

Filename | Overview
source/isaaclab_tasks/test/rendering_test_utils.py | Core fix: hides co-located success/failure markers before env construction, reorders newton_renderer test params, tightens thresholds, and removes the private-variable alias pattern. All changes are correct and consistent.
source/isaaclab_tasks/changelog.d/rendering-test-flakiness.skip | Empty changelog sentinel file; standard practice for this repo to track the change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[rendering_test_dexsuite_kuka called] --> B[Apply overrides to env_cfg]
    B --> C[Disable point-cloud visualisation markers]
    C --> D["Hide success_visualizer_cfg markers\n(both 'success' and 'failure' keys)\nmarker_cfg.visible = False"]
    D --> E[Construct ManagerBasedRLEnv]
    E --> F["VisualizationMarkers.__init__\nspawns prototype prims with visible=False\nInitialises instancer at pos 0,0,0"]
    F --> G["pose_commands.__init__\nset_visibility(True) on instancer\n— does NOT override child visible=False"]
    G --> H[maybe_save_stage]
    H --> I[validate_camera_outputs]
    I --> J{Compare against golden image}
    J -->|within threshold| K[PASS]
    J -->|exceeds threshold| L[FAIL]

Reviews (3): Last reviewed commit: "remove nvbugs link"

@huidongc marked this pull request as draft May 2, 2026 04:08

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

The new commit addresses the root cause of the rendering test flakiness by hiding the success/failure visualization markers that were causing non-deterministic rendering order, and regenerates all 24 golden images accordingly.

Follow-up Assessment

Previous concern addressed: The root cause is now clear — the success and failure markers were placed at the same location, causing non-deterministic z-fighting/rendering order. The fix at rendering_test_utils.py:763-766 explicitly hides these markers:

for marker_cfg in env_cfg.commands.object_pose.success_visualizer_cfg.markers.values():
    marker_cfg.visible = False

This is a proper fix rather than just updating golden images. The @pytest.mark.flaky removal is now justified.
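Because the loop iterates over every entry in `markers`, it hides the 'failure' prototype as well as the 'success' one, provided both live under `success_visualizer_cfg.markers` (as the Greptile summary states). A self-contained sketch with hypothetical dataclass stand-ins for the real Isaac Lab config classes:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MarkerCfg:  # hypothetical stand-in for a marker spawn config
    visible: bool = True


@dataclass
class VisualizerCfg:  # hypothetical stand-in for success_visualizer_cfg
    markers: Dict[str, MarkerCfg] = field(default_factory=dict)


def hide_all_markers(visualizer_cfg: VisualizerCfg) -> None:
    """Hide every marker prototype, whatever its key ('success', 'failure', ...)."""
    for marker_cfg in visualizer_cfg.markers.values():
        marker_cfg.visible = False


cfg = VisualizerCfg(markers={"success": MarkerCfg(), "failure": MarkerCfg()})
hide_all_markers(cfg)
print({name: m.visible for name, m in cfg.markers.items()})
# {'success': False, 'failure': False}
```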

New Findings

🔵 Improvement: rendering_test_utils.py:763-766 — Consider also hiding failure_visualizer_cfg
The comment mentions "success and failure markers" but only success_visualizer_cfg is explicitly hidden. If there's a corresponding failure_visualizer_cfg, it should also be hidden for completeness. Verify this isn't causing partial flakiness.


Verdict: Ship it once CI confirms the fix works. The approach is sound — removing the source of non-determinism rather than masking it with flaky markers.

@huidongc force-pushed the rendering-test-flakiness branch from 518656a to 5dc2a1d May 4, 2026 10:04

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

The previous concern about failure_visualizer_cfg has been addressed — looking at the full file context, I can confirm that success_visualizer_cfg is the only visualizer used for this command type (it displays both success/failure states via different marker visibility). The golden images have been regenerated with markers hidden, and the @pytest.mark.flaky decorators are properly removed.

Implementation Verdict

Ship it. The fix correctly addresses the root cause of non-deterministic rendering by hiding overlapping markers, and CI pre-commit/changelog checks pass.

@huidongc marked this pull request as ready for review May 4, 2026 12:02
@huidongc changed the title from "rendering test flakiness" to "Reduce rendering test flakiness" May 4, 2026
@huidongc requested a review from pbarejko May 4, 2026 12:21

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

This is a follow-up review. The previous concern about failure_visualizer_cfg was already addressed in my last review — confirmed that success_visualizer_cfg is the only visualizer for this command type. No new code changes have been introduced since the last review (commit 70e65e6 is the same).

Implementation Verdict

Ship it. The fix correctly hides overlapping success/failure markers to eliminate non-deterministic rendering order, golden images are regenerated, thresholds are appropriately tightened, and @pytest.mark.flaky decorators are removed. CI shows pre-commit and changelog checks passing; remaining pending jobs are expected for a rendering test change.

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

This is a follow-up review on commit 32fea69. The changes are identical to the previously reviewed commit — no new code modifications. The PR correctly addresses rendering test flakiness by hiding overlapping success/failure markers and reordering test cases.

Implementation Verdict

Ship it. All previous concerns have been addressed. The fix is correct and well-documented.

CI Status

Most jobs are still pending, but pre-commit, changelog checks, and base Docker builds have passed. This is expected for a rendering test change that requires the full test suite.

@huidongc marked this pull request as draft May 4, 2026 13:24

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

This is a follow-up review on commit b711957. The changes are identical to the previously reviewed commit — the fix correctly hides overlapping success/failure markers in the Dexsuite env and reorders newton_renderer test cases after isaacsim_rtx_renderer cases.

Implementation Verdict

Ship it. The implementation is correct and the golden images have been regenerated to reflect the marker visibility change.

CI Status

Most jobs are still pending. Pre-commit, changelog, and build checks have passed.

@ooctipus (Collaborator) left a comment

LGTM

Comment thread on source/isaaclab_tasks/test/rendering_test_utils.py (outdated)
@huidongc marked this pull request as ready for review May 4, 2026 21:44

@isaaclab-review-bot bot left a comment

🤖 Isaac Lab Review Bot

Summary

This is a follow-up review on commit 5a9c88d. The changes are identical to the previously reviewed commit — the PR correctly hides overlapping success/failure markers in the Dexsuite env to eliminate rendering order non-determinism, and reorders test cases to address execution-order-dependent flakiness.

CI Status

Multiple CI jobs are failing (isaaclab_newton, isaaclab_physx, isaaclab_assets, isaaclab_tasks, isaaclab_rl, environments_training). These failures should be investigated to confirm they are unrelated to this PR's changes — the rendering test flakiness fix itself should not cause widespread failures across unrelated modules.

Implementation Verdict

Ship it (pending CI investigation). The code changes are correct and targeted. The CI failures appear systemic rather than caused by this PR, but should be verified before merge.

@huidongc merged commit 5afbca4 into isaac-sim:develop May 5, 2026
48 of 56 checks passed
@huidongc deleted the rendering-test-flakiness branch May 5, 2026 00:37

Labels

isaac-lab Related to Isaac Lab team


3 participants