Tests that modify global state cause later tests to fail #110295
Labels
module: ci
Related to continuous integration
module: devx
Related to PyTorch contribution experience (HUD, pytorchbot)
needs design
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Context
I've been working on getting the Dynamo tests to reset the Dynamo state before and after each unittest. Otherwise, the state from one test affects subsequent tests, leading to unexpected behavior.
However, Dynamo is not the only place where we have global state. We've run into situations in the past where changing, e.g., the global default dtype in one test had a downstream effect on other tests.
Can we improve our unittests by isolating them from each other somehow?
Pitch
Some ideas:
cc @ZainRizvi @kit1980 @huydhn @clee2000 @janeyx99, @jbschlosser, @pytorch/pytorch-dev-infra for ideas
This is more of a tracker for tests that modify global state in some way, so that tests running after them change outcome depending on whether the first test was run. I want to keep track of this to see how often it happens and why.
The later tests generally get marked as flaky because they usually succeed on file-level retry, which starts at the failing test and doesn't run the first test.
Examples of this include:
Fixes can include cleaning up after the test function or moving tests into different files (CI then runs them in separate processes).
cc @seemethere @malfet @pytorch/pytorch-dev-infra @ZainRizvi @kit1980 @huydhn