OMPE-94619: Fix non-deterministic test_noise failures.#5732
Conversation
`torch.rand` can return exact 0 per its [0, 1) contract (~1 in 2^25 on CUDA). The scale-branch of test_noise recovers the scale factor and noise contribution by dividing `noisy_data / data`. Since `data` was generated via `torch.rand`, this produces NaN about ~1 in 1000 test runs (0.1% of the time.) This can be deterministically reproduced by feeding it a seed of 1981. This change fixes the test by switching `data` to `torch.ones` for the scale branch in all three noise tests so `noisy_data` is the noise term directly. There is no need to recover the scale factor and noise contribution separately. The internal `rand_like` call inside `noise_cfg.func` is still exercised and measured, which is the goal of the test.
Greptile SummaryThis PR fixes a non-deterministic test failure in
Confidence Score: 5/5Test-only change with no production code modified; the new assertions are mathematically equivalent to the originals. The fix replaces torch.rand with torch.ones for the scale branch in three tests. The noise model implementations confirm that scale computes data * noise, so with data = ones, noisy_data equals the noise distribution directly — the same quantity the old code recovered via division. All three assertions (Gaussian std/mean, uniform min/max, constant exact-match) remain correct and statistically sound with 10,000 samples. No files require special attention. Important Files Changed
Reviews (1): Last reviewed commit: "OMPE-94619: Fix non-deterministic test_n..." | Re-trigger Greptile |
There was a problem hiding this comment.
Code Review Summary
Verdict: ✅ Approve
This is a well-crafted fix for a subtle but real flaky test issue. The analysis and solution are both correct.
What This PR Does Well
-
Root Cause Analysis: Correctly identifies that
torch.randcan return exact 0 per its[0, 1)contract, causingnoisy_data / datato produce NaN during scale operation testing. -
Minimal, Targeted Fix: Rather than adding epsilon guards or try/except blocks, the fix elegantly sidesteps the issue by using
torch.onesfor scale operations. This is mathematically sound becausenoisy_data = data * noise_factorsimplifies to justnoise_factorwhendata=1. -
Preserved Test Coverage: The tests still validate the same noise distribution properties (mean, std, min, max, bias) and the internal
rand_likecall inside the noise functions is still exercised. -
Good Documentation: Comments explain the "why" with ticket reference (OMPE-94619), and the changelog fragment is clear.
Technical Validation
The branch consolidation (elif op == "scale" or op == "abs") is correct because:
- scale with
data=ones:noisy_data = 1 * noise = noise - abs:
noisy_data = noise(replaces data entirely)
Both result in noisy_data being the pure noise term, so the same verification logic applies.
No blocking issues found. LGTM! 🎉
) (#5755) # Description Cherry pick PR #5732: `torch.rand` can return exact 0 per its [0, 1) contract (~1 in 2^25 on CUDA). The scale-branch of test_noise recovers the scale factor and noise contribution by dividing `noisy_data / data`. Since `data` was generated via `torch.rand`, this produces NaN about ~1 in 1000 test runs (0.1% of the time.) This can be deterministically reproduced by feeding it a seed of 1981. This change fixes the test by switching `data` to `torch.ones` for the scale branch in all three noise tests so `noisy_data` is the noise term directly. There is no need to recover the scale factor and noise contribution separately. The internal `rand_like` call inside `noise_cfg.func` is still exercised and measured, which is the goal of the test. ## Type of change - Bug fix (non-breaking change which fixes an issue) ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation *(N/A - test change only)* - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Description
torch.randcan return exact 0 per its [0, 1) contract (~1 in 2^25 on CUDA). The scale-branch of test_noise recovers the scale factor and noise contribution by dividingnoisy_data / data. Sincedatawas generated viatorch.rand, this produces NaN about ~1 in 1000 test runs (0.1% of the time.) This can be deterministically reproduced by feeding it a seed of 1981.This change fixes the test by switching
datatotorch.onesfor the scale branch in all three noise tests sonoisy_datais the noise term directly. There is no need to recover the scale factor and noise contribution separately. The internalrand_likecall insidenoise_cfg.funcis still exercised and measured, which is the goal of the test.Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there