Add fail safe in pcstore.WatchAndSync to protect missing pod clusters. #755

mpuncel · 2017-02-09T19:28:28Z

To protect against data loss or misfiring Consul queries, this commit
adds a failsafe to the pcstore.WatchAndSync function: it will not call
"SyncCluster" or "DeleteCluster" when the pod cluster watch returns no
data.

This matters because a syncer may take destructive action when a pod
cluster is deleted, and if the entire set of pod clusters appears to
vanish at once, acting on that could be disastrous.
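A minimal sketch of the failsafe idea, written against simplified stand-in types. The names Cluster, ClusterID, syncOnce and recordingSyncer, and the SyncCluster/DeleteCluster signatures shown here, are hypothetical and do not match the real pcstore interfaces; only the rule "an empty watch result triggers neither SyncCluster nor DeleteCluster" comes from this PR.

```go
package main

import (
	"fmt"
	"log"
)

// ClusterID and Cluster are simplified, hypothetical stand-ins for the
// real pod cluster types.
type ClusterID string

type Cluster struct {
	ID ClusterID
}

// Syncer mirrors only the two callbacks named in the PR description.
// These signatures are assumptions made for this sketch.
type Syncer interface {
	SyncCluster(c Cluster) error
	DeleteCluster(id ClusterID) error
}

// syncOnce reconciles one watch result against the clusters seen so far
// and returns the updated set of known clusters.
func syncOnce(syncer Syncer, known map[ClusterID]bool, current []Cluster) map[ClusterID]bool {
	// Failsafe: an empty result is treated as suspect (for example a
	// misfiring Consul query) rather than as "every cluster was deleted".
	if len(current) == 0 {
		log.Println("pod cluster watch returned no clusters; skipping sync")
		return known
	}

	next := make(map[ClusterID]bool, len(current))
	for _, c := range current {
		next[c.ID] = true
		if err := syncer.SyncCluster(c); err != nil {
			log.Printf("sync of %s failed: %v", c.ID, err)
		}
	}
	// Only clusters missing from a non-empty result are deleted.
	for id := range known {
		if !next[id] {
			if err := syncer.DeleteCluster(id); err != nil {
				log.Printf("delete of %s failed: %v", id, err)
			}
		}
	}
	return next
}

// recordingSyncer records every call so the behavior can be inspected.
type recordingSyncer struct {
	synced  []ClusterID
	deleted []ClusterID
}

func (r *recordingSyncer) SyncCluster(c Cluster) error {
	r.synced = append(r.synced, c.ID)
	return nil
}

func (r *recordingSyncer) DeleteCluster(id ClusterID) error {
	r.deleted = append(r.deleted, id)
	return nil
}

func main() {
	s := &recordingSyncer{}
	known := syncOnce(s, map[ClusterID]bool{}, []Cluster{{ID: "a"}, {ID: "b"}})
	known = syncOnce(s, known, nil) // empty watch result: nothing is deleted
	fmt.Println(len(s.deleted), len(known))
}
```

The design choice is deliberately conservative: an empty result leaves the previously known set untouched, so a later non-empty result is still compared against the last trustworthy state.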

mpuncel · 2017-02-09T19:29:38Z

pkg/store/consul/pcstore/consul_store_test.go

+	}()
+
+	select {
+	case <-time.After(1 * time.Second):


Writing a test that must take a second to pass makes me sad, but short of passing an error on a channel from inside the failsafe "if" to the syncing routine, I'm not sure how to avoid it.

Note that I also decided I could do no better in https://github.com/square/p2/pull/750/files#diff-465f690c6c15563e83acbe3a7e9a4e2cR986 or https://github.com/square/p2/pull/751/files#diff-465f690c6c15563e83acbe3a7e9a4e2cR990
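One way to avoid the one-second wait, sketched along the lines suggested above: the loop signals on a channel whenever the failsafe skips an iteration, and the test waits for that signal instead of sleeping. watchLoop, its channel parameters and the failsafe channel are all hypothetical simplifications; the real WatchAndSync loop is structured differently.

```go
package main

import (
	"fmt"
	"time"
)

// watchLoop is a hypothetical, heavily simplified stand-in for the
// WatchAndSync loop. Each value received on results is the full list of
// cluster IDs returned by one watch iteration.
func watchLoop(results <-chan []string, deletes chan<- string, failsafe chan<- struct{}, quit <-chan struct{}) {
	known := map[string]bool{}
	for {
		select {
		case <-quit:
			return
		case ids := <-results:
			if len(ids) == 0 {
				// Report the skipped iteration instead of silently
				// continuing, so a test can wait for this signal.
				if failsafe != nil {
					failsafe <- struct{}{}
				}
				continue
			}
			next := map[string]bool{}
			for _, id := range ids {
				next[id] = true
			}
			// Clusters absent from a non-empty result are reported as deletions.
			for id := range known {
				if !next[id] {
					deletes <- id
				}
			}
			known = next
		}
	}
}

func main() {
	results := make(chan []string)
	deletes := make(chan string, 10) // buffered so the loop never blocks on it
	failsafe := make(chan struct{}, 1)
	quit := make(chan struct{})
	go watchLoop(results, deletes, failsafe, quit)

	results <- []string{"a", "b"}
	results <- nil // simulate a watch that returned no data

	// Wait for the failsafe signal; the timeout only guards against a
	// hung test and is not the expected path.
	select {
	case <-failsafe:
	case <-time.After(5 * time.Second):
		panic("failsafe never fired")
	}
	close(quit)
	fmt.Println("deletions:", len(deletes))
}
```

Because results is unbuffered, each send completes only after the loop has finished the previous iteration, so by the time the failsafe signal arrives, any deletions caused by earlier results are already in the deletes buffer.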

Extracting the contents of the "for { select {" loop in pcstore/consul_store.go into a helper function would allow unit testing its behaviour. The method looks a bit like channel soup to me, though, so I don't know if it's worth the trade-off.

I think we could do better, yeah. The behavior I want to prevent (DeleteCluster() being called) lives in a different function, though, which this one communicates with over a channel, so reorganizing would be fairly invasive. Something we should look at in a later PR, I think.
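The helper-extraction idea from this thread could look roughly like the following: the decision "which clusters should be deleted, and is it safe to delete at all" becomes a pure function that can be tested synchronously, with no goroutines or timeouts. planDeletions and its string IDs are hypothetical; this is a sketch of the refactoring being discussed, not code from pcstore.

```go
package main

import (
	"fmt"
	"sort"
)

// planDeletions is a hypothetical pure helper extracted from the body of
// the for/select loop. It returns the previously known clusters that are
// missing from the latest watch result, and ok=false when the failsafe
// applies (the watch returned no clusters at all).
func planDeletions(known map[string]bool, current []string) (toDelete []string, ok bool) {
	if len(current) == 0 {
		return nil, false
	}
	seen := make(map[string]bool, len(current))
	for _, id := range current {
		seen[id] = true
	}
	for id := range known {
		if !seen[id] {
			toDelete = append(toDelete, id)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(toDelete)
	return toDelete, true
}

func main() {
	known := map[string]bool{"a": true, "b": true, "c": true}
	fmt.Println(planDeletions(known, []string{"a"}))
	fmt.Println(planDeletions(known, nil))
}
```

The channel plumbing would then only forward whatever planDeletions decides, which keeps the "channel soup" out of the logic that actually needs testing.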

petertseng · 2017-02-09T19:39:38Z

pkg/store/consul/pcstore/consul_store_test.go

+	go func() {
+		err := store.WatchAndSync(syncer, quit)
+		if err != nil {
+			t.Fatalf("Couldn't start WatchAndSync(): %s", err)


Funny story: Fatalf calls FailNow, and https://golang.org/pkg/testing/#B.FailNow says "FailNow must be called from the goroutine running the test or benchmark function, not from other goroutines created during the test."

Now, I'm not sure what happens if some other goroutine calls it. Perhaps we could try?

Interesting, I've observed that behavior in the past but never seen an explanation. Yeah, I think it won't work in its current state.

In this case it happens that err can never be non-nil: only GetInitialClusters() returning an error can cause one, and our mock implementation unconditionally returns a nil error.

mpuncel commented Feb 9, 2017

rudle approved these changes Feb 9, 2017

petertseng reviewed Feb 9, 2017

mpuncel merged commit c4d7769 into square:master Feb 9, 2017

mpuncel deleted the mpuncel/dont-sync-pod-clusters-if-no-pod-clusters branch March 8, 2017 13:36