TestParserReadNetworkingConfig could be randomly failing. #9382

azr · 2020-06-09T13:30:08Z

A circle-ci build on linux failed, then worked after a retry:

2020/06/09 13:17:12 Invalid command : [VERSION=1,0]
--- FAIL: TestParserReadNetworingConfig (0.00s)
    driver_parser_test.go:1016: error parsing networking-example: Unexpected format for VERSION entry : [answer VNET_1_DHCP yes]

It may be worth investigating.

https://circleci.com/gh/hashicorp/packer/58370

azr · 2020-06-09T15:45:14Z

Hey @arizvisa! About this issue, do you still have the context in mind? It looks to me like a race condition. Do you have some time to take a look at this? Could this cause some trouble down the line too? Cheers!

SwampDragons · 2020-06-09T17:00:20Z

I think this must be a major resource constraint issue because I've run this test over a hundred times, with lots of other stuff going on in my env, to try to catch it and I haven't managed to.

arizvisa · 2020-06-10T00:19:06Z

Oh yeah. It's totally a race condition. Right here at the beginning of the code in ReadNetworkingConfig, there are two consumers for the same channel (rows): parseNetworkingConfig, and the <-rows read that feeds networkingReadVersion. I didn't have Fusion when I wrote this a long time ago, plus I was a newb at golang.

        fromfile := consumeFile(fd)
        tokenized := tokenizeNetworkingConfig(fromfile)
        rows := splitNetworkingConfig(tokenized)
        entries := parseNetworkingConfig(rows) // consumer 1: starts reading from rows

        // parse the version
        parsed_version, err := networkingReadVersion(<-rows) // consumer 2: reads from the same channel
        if err != nil {
                return NetworkingConfig{}, err
        }

Having tests sure is awesome for catching this. It's crazy to think that by sheer luck this race condition hasn't really manifested itself until this refactor.

I'll open up a PR in a sec. I think I can read the version and then hand off the channel to parseNetworkingConfig after confirming it (which appears to be the right thing to do anyways). Although, I'm not quite sure how to reproduce this as @SwampDragons mentioned. So I can only hope my logic is sound... Is there some sane way to repro this to ensure it's squashed?

arizvisa · 2020-06-10T00:49:02Z

Okay. I think PR #9387 should fix this race.

I tried to keep things linearly defined so that there's only one consumer of each channel at a given time. I went through the other parsers to see if this pattern affected anything else but couldn't spot anything.

The reading of the version isn't in a goro so it should block until it's able to read it. Once the version has been confirmed, then we set the parser off which is free to consume rows as much as it needs to.
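
The ordering described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from PR #9387: feed, readVersion, parseRest, and readConfig are hypothetical stand-ins that work on plain strings, and the sample rows are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// feed sends each row on an unbuffered channel, then closes it.
// Hypothetical stand-in for the tokenizer/splitter pipeline.
func feed(input []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, r := range input {
			out <- r
		}
	}()
	return out
}

// readVersion is a hypothetical stand-in for networkingReadVersion.
func readVersion(row string) (string, error) {
	const prefix = "VERSION="
	if !strings.HasPrefix(row, prefix) {
		return "", fmt.Errorf("unexpected format for VERSION entry: %q", row)
	}
	return strings.TrimPrefix(row, prefix), nil
}

// parseRest is a hypothetical stand-in for parseNetworkingConfig: it drains
// the channel in its own goroutine and delivers the collected entries.
func parseRest(rows <-chan string) <-chan []string {
	out := make(chan []string, 1)
	go func() {
		var entries []string
		for r := range rows {
			entries = append(entries, r)
		}
		out <- entries
	}()
	return out
}

// readConfig keeps a single consumer on rows at any given time: the caller
// reads the version first, and only then hands the channel to the parser.
func readConfig(input []string) (string, []string, error) {
	rows := feed(input)

	version, err := readVersion(<-rows) // blocks; nothing else reads rows yet
	if err != nil {
		go func() { // drain so the feeding goroutine can exit
			for range rows {
			}
		}()
		return "", nil, err
	}

	entries := <-parseRest(rows) // the parser is now the only consumer
	return version, entries, nil
}

func main() {
	version, entries, err := readConfig([]string{
		"VERSION=1,0",
		"answer VNET_1_DHCP yes",
		"answer VNET_1_NAT no", // placeholder row
	})
	fmt.Println(version, entries, err)
}
```

Because the version read happens in the caller before any parser goroutine exists, the first row cannot be taken by anyone else, whatever the scheduler does.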

arizvisa · 2020-06-10T01:06:52Z

In terms of reproducing this, go test -race doesn't seem to do anything. The reason is that it's oriented around memory reads/writes, being based on ThreadSanitizer (TSan). So in essence only memory accesses are instrumented, in order to keep track of more than one thread accessing the same piece of memory without synchronization. For some reason this doesn't affect channels, which I thought used a lock and a buffer of some kind, which should be a memory access between two coop'd threads...

But as a general way to reproduce this and avoid this happening again, the only thing I can think of is to test after each fork/clone of a thread. Maybe sleep or blocking-syscall before reading a channel in the parent, so that a child will get woken up by the OS while the parent is blocking. As an example, if you put a time.Sleep(1.0) before the call to networkingReadVersion, you can manifest the error as the children (parseNetworkingConfig) are woken up to consume said channel.

Typically, your OS won't wake up the child thread until your parent thread blocks on some call...so something to maybe think about when dealing with goro-heavy code (which seems to be all of my code, hah), would be to do a test where I sleep in the parent, and sleep in the child to ensure that things work as they're actually intended.

arizvisa · 2020-06-10T01:07:13Z

Anyways, sorry bout that. Saw you guys just did your release. :-/

azr · 2020-06-10T08:44:45Z

Hey @arizvisa, thanks for fixing this so quickly! I think this case should not happen too much. I'll review the PR carefully now 🙂!

The thing with chans is that they are meant to be used across goroutines, so simply using chans will never trigger the race detector. In other words, the race detector will detect unprotected 'write + read' accesses to shared memory, but a chan is itself a means of synchronisation across goroutines, and sometimes a way to avoid race conditions.
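
To make that distinction concrete, here is a minimal sketch with two hypothetical functions. racyCount shares a variable between goroutines with no synchronisation, which is the kind of access the race detector is built to report. channelCount does the same work through a channel, so there is nothing for the detector to flag, even though a channel can still be misused logically (for example by giving it two competing consumers).

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared int from n goroutines with no
// synchronisation. Under `go run -race` or `go test -race` this should be
// reported as a data race, and the result may be less than n.
// It is defined for comparison only and is not called from main.
func racyCount(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronised read-modify-write on shared memory
		}()
	}
	wg.Wait()
	return count
}

// channelCount routes every increment through a channel to a single owner
// goroutine, so no memory is shared without synchronisation.
func channelCount(n int) int {
	incs := make(chan int)
	total := make(chan int)
	go func() {
		sum := 0
		for v := range incs {
			sum += v
		}
		total <- sum
	}()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			incs <- 1
		}()
	}
	wg.Wait()
	close(incs) // all senders are done, so the owner can finish
	return <-total
}

func main() {
	fmt.Println("channel-based count:", channelCount(1000))
}
```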

Another thing about go testing is that it caches the results of tests, so if a test passes once and nothing it depends on has changed, no matter how often you re-run it the go toolchain will simply report the cached success without running it. This is especially dangerous for randomly failing tests because it sort of irons them out. I think a specific option for it is coming soon, but for now running go test -count=1 ... (from golang/go#24573) will bypass the cache.

azr added the bug, need-more-tests, and tech-debt labels Jun 9, 2020

arizvisa mentioned this issue Jun 10, 2020

Fixed a race in the ReadNetworkingConfig implementation from the vmware builders #9387

Merged

SwampDragons added a commit that referenced this issue Jun 10, 2020

Merge pull request #9387 from arizvisa/GH-9382

3a13eaf

Fixed a race in the ReadNetworkingConfig implementation from the vmware builders

nywilken changed the title ~~TestParserReadNetworingConfig could be randomly failing.~~ TestParserReadNetworkingConfig could be randomly failing. Jul 10, 2020
