Disable deterministic option in compile tests #7720

Merged
tohtana merged 3 commits into deepspeedai:master from tohtana:tohtana/improve_compile_tests_stability
Dec 9, 2025

@tohtana tohtana commented Dec 9, 2025

Compiler tests (with/without DeepCompile) occasionally fail with mismatching loss values:

FAILED tests/unit/v1/compile/test_compile_zero.py::TestZeRO::test_compile_zero[none-1-dtype0] 
AssertionError: Loss values are not close. Tensors are not close: actual=tensor(-0., device='cuda:1', dtype=torch.bfloat16, grad_fn=<DivBackward1>), expected=tensor(0.0255, device='cuda:1', dtype=torch.bfloat16,
       grad_fn=<CompiledFunctionBackward>) kwargs={'rtol': 0.5, 'atol': 0.01}
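To see why this pair of values trips the assertion, here is a minimal sketch of the tolerance arithmetic. It assumes the test helper applies the same rule as `torch.allclose` and `torch.testing.assert_close`, namely `|actual - expected| <= atol + rtol * |expected|`. The function name `within_tolerance` is hypothetical, and plain Python floats stand in for the bfloat16 tensors.

```python
def within_tolerance(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Closeness rule in the style of torch.allclose (assumed to match the test helper)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Values copied from the failure message above.
actual, expected = 0.0, 0.0255
rtol, atol = 0.5, 0.01

diff = abs(actual - expected)          # absolute gap between the two losses
allowed = atol + rtol * abs(expected)  # largest gap the tolerances permit

print(f"diff={diff:.5f} allowed={allowed:.5f}")
print("close:", within_tolerance(actual, expected, rtol, atol))
```

Under this rule, a loss of exactly zero on one side can only pass if the other side is at most `atol / (1 - rtol)`, which is 0.02 for these settings. The observed 0.0255 is above that, so even the loose `rtol=0.5` does not absorb the mismatch.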

The exact root cause is not yet clear, but we found a similar issue related to the compiler.

This PR disables the deterministic option, which has improved stability. Previously, we hit this error intermittently when running the compiler tests repeatedly. With this change, the tests pass in 100 consecutive runs.
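A repeated-run check like the one described above can be sketched as follows. This is an illustration, not the procedure used for this PR: the helper name `count_failures` is hypothetical, and the actual command, GPU setup, and launcher for the DeepSpeed unit tests may differ.

```python
import subprocess
import sys


def count_failures(cmd: list[str], runs: int) -> int:
    """Run `cmd` `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # capture_output keeps the console quiet; only the exit code matters here.
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures
```

As a usage example, one could pass `[sys.executable, "-m", "pytest", "tests/unit/v1/compile/test_compile_zero.py", "-q"]` with `runs=100` and expect a return value of 0 once the flakiness is gone. The test path is taken from the failure log above; the remaining pytest arguments are assumptions.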

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@tohtana tohtana enabled auto-merge (squash) December 9, 2025 23:29
@tohtana tohtana merged commit 4862115 into deepspeedai:master Dec 9, 2025
11 checks passed