TestParserReadNetworkingConfig could be randomly failing. #9382
Comments
Hey @arizvisa! About this issue, do you still have the context in mind? It looks to me like a race condition; do you have some time to take a look at it? Could this cause some trouble down the line too? Cheers!
I think this must be a major resource constraint issue, because I've run this test over a hundred times, with lots of other stuff going on in my env, to try to catch it, and I haven't managed to.
Oh yeah. It's totally a race condition. Right here at the beginning of the code in
Having tests sure is awesome for catching this. It's crazy to think that by sheer luck this race condition hasn't really manifested itself until this refactor. I'll open up a PR in a sec. I think I can read the version and then hand off the channel to |
Okay. I think PR #9387 should fix this race. I tried to keep things linearly defined so that there's only one consumer of each channel at a given time. I went through the other parsers to see if this pattern affected anything else but couldn't spot anything. The reading of the version isn't in a goro so it should block until it's able to read it. Once the version has been confirmed, then we set the parser off which is free to consume |
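The "one consumer of each channel at a given time" idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from PR #9387: `readVersion`, `parseRest`, `parse`, and the token strings are hypothetical names invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// readVersion is a hypothetical helper. It receives exactly one token in the
// caller's goroutine, so it blocks until that token is available.
func readVersion(tokens <-chan string) (string, error) {
	tok, ok := <-tokens
	if !ok {
		return "", fmt.Errorf("channel closed before a version was read")
	}
	if !strings.HasPrefix(tok, "VERSION=") {
		return "", fmt.Errorf("expected a version token, got %q", tok)
	}
	return strings.TrimPrefix(tok, "VERSION="), nil
}

// parseRest is a hypothetical parser stage. It is only started after the
// version has been read, so from then on it is the sole consumer of tokens.
func parseRest(tokens <-chan string) <-chan []string {
	out := make(chan []string, 1)
	go func() {
		var rest []string
		for tok := range tokens {
			rest = append(rest, tok)
		}
		out <- rest
	}()
	return out
}

// parse wires the two stages together one after the other.
func parse(input []string) (string, []string, error) {
	tokens := make(chan string)
	go func() { // producer: feeds the tokens, then closes the channel
		defer close(tokens)
		for _, tok := range input {
			tokens <- tok
		}
	}()

	version, err := readVersion(tokens) // synchronous read, no goroutine
	if err != nil {
		for range tokens { // drain so the producer goroutine can exit
		}
		return "", nil, err
	}
	rest := <-parseRest(tokens) // hand the channel off once the version is known
	return version, rest, nil
}

func main() {
	version, rest, err := parse([]string{"VERSION=1,0", "answer", "VNET_1_DHCP", "yes"})
	fmt.Println(version, rest, err)
}
```

Because the version is read before the parser goroutine exists, there is never a moment where two receivers compete for the first token.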
In terms of reproducing this, But as a general way to reproduce this and avoid this happening again, the only thing I can think of is to test after each fork/clone of a thread. Maybe sleep or blocking-syscall before reading a channel in the parent, so that a child will get woken up by the OS while the parent is blocking. As an example, if you put a Typically, your OS won't wake up the child thread until your parent thread blocks on some call...so something to maybe think about when dealing with goro-heavy code (which seems to be all of my code, hah), would be to do a test where I sleep in the parent, and sleep in the child to ensure that things work as they're actually intended. |
Anyways, sorry about that. Saw you guys just did your release. :-/
Hey @arizvisa, thanks for fixing this so quickly! I think this case should not happen too often. I'll review the PR carefully now 🙂! The thing with chans is that they are meant to be used across goroutines, so simply using chans will never trigger the race detector. In other words, the race detector will detect unprotected 'write + read' accesses to shared memory, but a chan is a means of synchronisation across goroutines, and sometimes a way to avoid race conditions. Another thing about go testing is that it caches the results of tests, so if a test passes once and nothing it depends on has changed, no matter how often you re-run it the go toolchain will simply say 'success' without running it. This is especially dangerous for randomly failing tests because it sort of irons them out. I think a specific option for it is coming soon but for now running
Fixed a race in the ReadNetworkingConfig implementation from the vmware builders
A circle-ci build on linux failed, then worked after a retry:
It may be worth investigating.
https://circleci.com/gh/hashicorp/packer/58370