Reduce rendering test flakiness #5475
Conversation
🤖 Isaac Lab Review Bot
Summary
This PR addresses rendering test flakiness by updating 18 golden images for the dexsuite_kuka rendering tests and removing the @pytest.mark.flaky(max_runs=3, min_passes=1) decorator from both kit-based and kitless test functions. The approach assumes the new golden images are deterministic and represent the correct baseline.
Architecture Impact
Self-contained. The changes are limited to test infrastructure — golden image baselines and test decorators. No production code is affected. The rendering test utilities (rendering_test_utils.py) and the test execution flow remain unchanged.
Implementation Verdict
Minor fixes needed — The approach is reasonable but lacks documentation explaining what caused the original flakiness and why the new golden images are expected to be stable.
Test Coverage
This PR is the test fix. The tests themselves (test_rendering_dexsuite_kuka and test_rendering_dexsuite_kuka_kitless) remain parameterized across all physics backend, renderer, and AOV combinations. The removal of @pytest.mark.flaky is a strong assertion that the new golden images will pass consistently — this needs CI validation.
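The parameter grid described above could be built roughly as follows. This is an illustrative sketch: the actual backend, renderer, and AOV names behind `PHYSICS_RENDERER_AOV_COMBINATIONS` in Isaac Lab may differ.

```python
# Hypothetical construction of the test parameter grid; the concrete
# backend/renderer/AOV values are assumptions, not Isaac Lab's real ones.
import itertools

PHYSICS_BACKENDS = ["physx", "newton"]  # assumed values
RENDERERS = ["isaacsim_rtx_renderer", "newton_renderer"]  # assumed values
AOV_TYPES = ["rgb", "depth", "normals"]  # assumed values

# Cartesian product, in the shape pytest.mark.parametrize consumes.
PHYSICS_RENDERER_AOV_COMBINATIONS = list(
    itertools.product(PHYSICS_BACKENDS, RENDERERS, AOV_TYPES)
)
```

Every combination then becomes one independent test case, which is why a single nondeterministic render can surface as sporadic failures scattered across the grid.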
CI Status
No CI checks available yet. This is critical for this PR — the entire premise is that tests will now pass deterministically. Cannot verify the fix without CI results.
Findings
🟡 Warning: PR Description — Missing root cause analysis
The PR description is incomplete. For a test flakiness fix, the description should explain:
- What was causing the flakiness (non-deterministic rendering, race conditions, warm-up frames, etc.)
- Why the new golden images are expected to be stable (were they generated with additional warm-up? Different seeding? A simulator fix?)
- Whether this is a workaround or addresses the underlying issue
Without this context, reviewers cannot assess whether removing @pytest.mark.flaky is premature.
🟡 Warning: test_rendering_dexsuite_kuka.py:32 — Flaky decorator removal without proven stability
```python
@pytest.mark.parametrize("physics_backend,renderer,data_type", PHYSICS_RENDERER_AOV_COMBINATIONS)
def test_rendering_dexsuite_kuka(physics_backend, renderer, data_type):
```

Removing `@pytest.mark.flaky(max_runs=3, min_passes=1)` assumes deterministic behavior. If the underlying flakiness was due to renderer warm-up, GPU state, or non-deterministic physics, simply updating golden images may shift which runs fail rather than eliminating failures. This needs multiple CI runs to validate.
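To make concrete what removing the decorator gives up, here is a toy model of the `max_runs`/`min_passes` retry semantics. This is not the real flaky plugin, just a sketch of its contract:

```python
# Toy model of pytest.mark.flaky(max_runs=3, min_passes=1) retry semantics.
# It illustrates how retries can mask intermittent failures; the real
# plugin's implementation differs.
def run_with_retries(test_fn, max_runs=3, min_passes=1):
    passes = 0
    for _ in range(max_runs):
        try:
            test_fn()
            passes += 1
        except AssertionError:
            pass
        if passes >= min_passes:
            return True  # the "flaky" test counts as passing
    return False

# A test that fails only on its first invocation is hidden by the retries:
calls = {"n": 0}

def sometimes_flaky():
    calls["n"] += 1
    assert calls["n"] > 1  # fails on the very first run only

assert run_with_retries(sometimes_flaky)  # retried, so it passes overall
```

With the decorator gone, the first failure in the loop above would fail the test outright, which is exactly why the new golden images must be deterministic.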
🟡 Warning: test_rendering_dexsuite_kuka_kitless.py:28 — Same concern as kit-based test
The kitless variant has identical risk. Both files should be validated with multiple CI runs before merge.
🔵 Improvement: Golden image generation — Consider adding tolerance metadata
The rendering tests compare against golden images, but the test utilities likely use SSIM or MSE thresholds. If flakiness was due to minor pixel variations, an alternative approach would be to document acceptable tolerance ranges rather than chasing pixel-perfect golden images. This could be addressed in follow-up work.
🔵 Improvement: Consider adding a determinism stress test
For rendering tests that have historically been flaky, adding a stress test that runs the same configuration N times and asserts identical outputs would provide confidence before removing flaky markers.
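Such a stress test could be as simple as hashing N renders of the same configuration and asserting a single unique digest. `render_frame` below is a stand-in for the real rendering entry point, which is not part of this PR:

```python
# Hypothetical determinism stress test: render the same configuration N
# times and require bit-identical outputs. `render_frame` is a stand-in
# that returns fake pixel bytes; the real entry point would differ.
import hashlib

def render_frame(seed: int = 0) -> bytes:
    # Deterministic stand-in for a render call.
    return bytes((seed * 31 + i) % 256 for i in range(16))

def assert_deterministic(render_fn, runs: int = 5) -> str:
    digests = {hashlib.sha256(render_fn()).hexdigest() for _ in range(runs)}
    assert len(digests) == 1, f"non-deterministic output: {len(digests)} variants"
    return digests.pop()

digest = assert_deterministic(render_frame)
```

Running this in CI before removing a flaky marker turns "the test passed once" into "the output is stable across repeated runs".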
Recommendation: Wait for CI results showing green on these tests before merging. Ideally, run CI multiple times (3+) to confirm the flakiness is actually resolved. The PR author should also update the description with the root cause explanation.
Greptile Summary
This PR reduces rendering test flakiness in the DexSuite Kuka environment by hiding both success and failure markers (which are co-located) at initialization time, before the environment is constructed.
Confidence Score: 5/5
Safe to merge — targeted flakiness fix with no functional behaviour changes and updated golden images. No P0 or P1 issues found. The marker-hiding fix is correctly scoped (both 'success' and 'failure' prototypes live under `success_visualizer_cfg.markers` and are hidden before env construction). The `set_visibility(True)` call in `pose_commands.__init__` operates on the instancer parent and does not override the explicitly-set invisible attribute on child prototype prims. Threshold tightening and test-ordering changes are clean. All golden images are updated to match. No files require special attention.
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[rendering_test_dexsuite_kuka called] --> B[Apply overrides to env_cfg]
    B --> C[Disable point-cloud visualisation markers]
    C --> D["Hide success_visualizer_cfg markers\n(both 'success' and 'failure' keys)\nmarker_cfg.visible = False"]
    D --> E[Construct ManagerBasedRLEnv]
    E --> F["VisualizationMarkers.__init__\nspawns prototype prims with visible=False\nInitialises instancer at pos 0,0,0"]
    F --> G["pose_commands.__init__\nset_visibility(True) on instancer\n— does NOT override child visible=False"]
    G --> H[maybe_save_stage]
    H --> I[validate_camera_outputs]
    I --> J{Compare against golden image}
    J -->|within threshold| K[PASS]
    J -->|exceeds threshold| L[FAIL]
```
Reviews (3): Last reviewed commit: "remove nvbugs link"
🤖 Isaac Lab Review Bot
Summary
The new commit addresses the root cause of the rendering test flakiness by hiding the success/failure visualization markers that were causing non-deterministic rendering order, and regenerates all 24 golden images accordingly.
Follow-up Assessment
Previous concern addressed: The root cause is now clear — the success and failure markers were placed at the same location, causing non-deterministic z-fighting/rendering order. The fix at rendering_test_utils.py:763-766 explicitly hides these markers:
```python
for marker_cfg in env_cfg.commands.object_pose.success_visualizer_cfg.markers.values():
    marker_cfg.visible = False
```

This is a proper fix rather than just updating golden images. The `@pytest.mark.flaky` removal is now justified.
New Findings
🔵 Improvement: rendering_test_utils.py:763-766 — Consider also hiding failure_visualizer_cfg
The comment mentions "success and failure markers" but only success_visualizer_cfg is explicitly hidden. If there's a corresponding failure_visualizer_cfg, it should also be hidden for completeness. Verify this isn't causing partial flakiness.
Verdict: Ship it once CI confirms the fix works. The approach is sound — removing the source of non-determinism rather than masking it with flaky markers.
Force-pushed from 518656a to 5dc2a1d
🤖 Isaac Lab Review Bot
Summary
The previous concern about failure_visualizer_cfg has been addressed — looking at the full file context, I can confirm that success_visualizer_cfg is the only visualizer used for this command type (it displays both success/failure states via different marker visibility). The golden images have been regenerated with markers hidden, and the @pytest.mark.flaky decorators are properly removed.
Implementation Verdict
Ship it. The fix correctly addresses the root cause of non-deterministic rendering by hiding overlapping markers, and CI pre-commit/changelog checks pass.
🤖 Isaac Lab Review Bot
Summary
This is a follow-up review. The previous concern about failure_visualizer_cfg was already addressed in my last review — confirmed that success_visualizer_cfg is the only visualizer for this command type. No new code changes have been introduced since the last review (commit 70e65e6 is the same).
Implementation Verdict
Ship it. The fix correctly hides overlapping success/failure markers to eliminate non-deterministic rendering order, golden images are regenerated, thresholds are appropriately tightened, and @pytest.mark.flaky decorators are removed. CI shows pre-commit and changelog checks passing; remaining pending jobs are expected for a rendering test change.
🤖 Isaac Lab Review Bot
Summary
This is a follow-up review on commit 32fea69. The changes are identical to the previously reviewed commit — no new code modifications. The PR correctly addresses rendering test flakiness by hiding overlapping success/failure markers and reordering test cases.
Implementation Verdict
Ship it. All previous concerns have been addressed. The fix is correct and well-documented.
CI Status
Most jobs are still pending, but pre-commit, changelog checks, and base Docker builds have passed. This is expected for a rendering test change that requires the full test suite.
🤖 Isaac Lab Review Bot
Summary
This is a follow-up review on commit b711957. The changes are identical to the previously reviewed commit — the fix correctly hides overlapping success/failure markers in the Dexsuite env and reorders newton_renderer test cases after isaacsim_rtx_renderer cases.
Implementation Verdict
Ship it. The implementation is correct and the golden images have been regenerated to reflect the marker visibility change.
CI Status
Most jobs are still pending. Pre-commit, changelog, and build checks have passed.
🤖 Isaac Lab Review Bot
Summary
This is a follow-up review on commit 5a9c88d. The changes are identical to the previously reviewed commit — the PR correctly hides overlapping success/failure markers in the Dexsuite env to eliminate rendering order non-determinism, and reorders test cases to address execution-order-dependent flakiness.
CI Status
Multiple CI jobs are failing (isaaclab_newton, isaaclab_physx, isaaclab_assets, isaaclab_tasks, isaaclab_rl, environments_training). These failures should be investigated to confirm they are unrelated to this PR's changes — the rendering test flakiness fix itself should not cause widespread failures across unrelated modules.
Implementation Verdict
Ship it (pending CI investigation). The code changes are correct and targeted. The CI failures appear systemic rather than caused by this PR, but should be verified before merge.
Description
In the Dexsuite env the success and failure markers are placed at exactly the same location. If both markers are visible, the rendering order determines which one appears in the camera output. Both markers are hidden to avoid this nondeterministic behavior.
Ordering of test cases appears to affect camera outputs for some reason, therefore the `newton_renderer` test cases are moved after the `isaacsim_rtx_renderer` test cases.
Fixes # (issue)
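The reordering described above can be sketched as a stable sort over the parameter combinations. The tuple layout and values below are illustrative assumptions, not the actual Isaac Lab test data:

```python
# Hedged sketch of moving newton_renderer test cases after
# isaacsim_rtx_renderer test cases. The (backend, renderer, aov) tuple
# layout is an assumption for illustration.
combos = [
    ("physx", "newton_renderer", "rgb"),
    ("physx", "isaacsim_rtx_renderer", "rgb"),
    ("newton", "newton_renderer", "depth"),
    ("newton", "isaacsim_rtx_renderer", "depth"),
]

# Python's sort is stable: isaacsim_rtx_renderer cases (key False) come
# first, newton_renderer cases (key True) after, each group keeping its
# original relative order.
ordered = sorted(combos, key=lambda c: c[1] == "newton_renderer")
```

Since the combinations feed `pytest.mark.parametrize` in order, this fixes the execution order without touching the test bodies.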
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the corresponding version in the extension's `config/extension.toml` file
- I have added my name to the `CONTRIBUTORS.md` or my name already exists there