Occasional test failure (timing?) #1153
Comments
I'd like to take this one if it's available. Can you link to 1 or 2 related Action failures so I can take a deeper look? Any suspicion if this only comes into play when running all of the tests? I haven't been able to reproduce locally. I've run all of the tests with |
Thanks @bvand. https://github.com/opentofu/opentofu/actions/runs/7586115064/ is the run that had issues, but I re-ran the failed tests and it appears to have overwritten the previous logs. It seems to be intermittent; from @cube2222: "Oh yeah, this test case is known and liked 😄". I wonder if it has to do with how bogged down the host of the GitHub runner is, as it appears to be timing related. |
@janosdebugs I can access that link but don't see any job failures there (I guess because of the rerun). I do see 40 errors under annotations, but those seem to be unrelated? |
This seems strange to me, but I may just be missing some implementation detail:
Any idea if this is a red herring or an actual issue? |
I reproduced the failure by running the Comparing the test ordering and timing between the failed job and a passed job didn't yield anything interesting as far as I can tell (OpenTofu Timing Bug 1153 - Pass_Fail Job Comparison.xlsx). The maximum time diff is the
Given that these jobs run on separate runners, I wouldn't expect resource constraints to be an issue, but that could be tested by running jobs in a matrix with larger runners a few times and seeing if the test failure stops. That leaves some sort of testing race condition or similar, but I'm not 100% sure how to proceed on debugging further - open to ideas. |
Ok I think I've discovered the cause - I reduced the Some options for fixing:
|
I have a proof of concept change to deterministically wait for each I'm happy to either clean up the PR and move forward, go with increased sleep time (I still saw a failure with 10ms so would look to do 50ms), or implement a suggested alternate approach if there are any @cam72cam @janosdebugs |
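For readers following the thread, here is a minimal Go sketch of the general idea of waiting deterministically instead of sleeping for a fixed interval. It is not the change from the PR: the `recorder` type and the `runHooks` function are hypothetical names invented for illustration, and `sync.WaitGroup` is just one common way to do this.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for a component that collects
// results written by background goroutines.
type recorder struct {
	mu     sync.Mutex
	events []string
}

// add appends an event under the lock so concurrent writers are safe.
func (r *recorder) add(e string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events = append(r.events, e)
}

// snapshot returns a copy of the events recorded so far.
func (r *recorder) snapshot() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.events...)
}

// runHooks (hypothetical) starts one goroutine per name and returns only
// after every goroutine has finished. The WaitGroup replaces a fixed
// time.Sleep, so the caller never depends on how busy the host is.
func runHooks(names []string, r *recorder) {
	var wg sync.WaitGroup
	for _, name := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			r.add(n)
		}(name)
	}
	wg.Wait()
}

func main() {
	r := &recorder{}
	runHooks([]string{"a", "b", "c"}, r)
	// The count is deterministic; the order of events is not.
	fmt.Println(len(r.snapshot()))
}
```

With this shape, a slow runner only makes the test take longer; it does not make the assertion run before the work is done, which is the failure mode a short sleep is exposed to.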
@bvand apologies for the late reply, we have been a bit swamped with the magnitude of community contributions as of late. Your logic seems sound and I think after a quick tidy, that PR should be good to merge. Thanks again for taking this on! |
@cam72cam no problem, will do! |
Once in a while we see this in test reports. My guess is that there is some timing that could be tweaked or a test that could be restructured.