Skip to content

Conversation

cfromknecht
Copy link
Contributor

@cfromknecht cfromknecht commented Nov 3, 2017

This PR is intended to address the following flake observed in the graph_topology_notifications:

    --- FAIL: TestLightningNetworkDaemon/graph_topology_notifications (17.59s)
    	lnd_test.go:66: Failed: (graph topology notifications): exited with error: 
    		*errors.errorString timeout waiting for graph notification 0
    		/home/travis/gopath/src/github.com/lightningnetwork/lnd/lnd_test.go:3344 (0xba7fd3)
    			testGraphTopologyNotifications: t.Fatalf("timeout waiting for graph notification %v", i)
    		/home/travis/gopath/src/github.com/lightningnetwork/lnd/lnd_test.go:88 (0xb9177b)
    			(*harnessTest).RunTestCase: testCase.test(net, h)
    		/home/travis/gopath/src/github.com/lightningnetwork/lnd/lnd_test.go:4110 (0xbc0197)
    			TestLightningNetworkDaemon.func3: ht.RunTestCase(testCase, lndHarness)
    		/home/travis/.gimme/versions/go1.8.5.linux.amd64/src/testing/testing.go:657 (0x4ee356)
    			tRunner: fn(t)
    		/home/travis/.gimme/versions/go1.8.5.linux.amd64/src/runtime/asm_amd64.s:2197 (0x45b711)
    			goexit: BYTE	$0x90	// NOP

After a non-trivial amount of travis flake hunting, the problem seems to have been an artifact of the shutdown procedure of the readHandler and the writeHandler. I have had trouble getting logs that clearly signal this is the issue, but have not encountered the error since making these changes. The bulk of the fix was to defer signalling waitgroups and closing chanMsgStream within the handlers. I also added some small fixes to the server code, specifically:

  • fixing a print statement, because I can't English write
  • use the peer's public SendMessage, which wraps the private method we used before
  • output of Peers() now returns the peers contained in the peersByPub map, instead peersByID

Status: Removed debug prints and error resurfaced 🤦‍♂️

@cfromknecht cfromknecht force-pushed the btcd-chainview-shutdown branch from 5ead2ff to 81b3d50 Compare November 3, 2017 03:44
@cfromknecht cfromknecht changed the title routing/chainview/btcd: reorders shutdown procedure WIP: Flake Huntin' Nov 3, 2017
@cfromknecht cfromknecht force-pushed the btcd-chainview-shutdown branch 6 times, most recently from 48793d7 to 39adbcd Compare November 7, 2017 00:02
@cfromknecht cfromknecht changed the title WIP: Flake Huntin' Revamp Peer Disconnection Nov 7, 2017
 * fixes logging statement in send to peer
 * returns peers from peersByPub instead of peersByID
 * uses peer's public SendMessage method
@cfromknecht cfromknecht force-pushed the btcd-chainview-shutdown branch from 39adbcd to 4655ec3 Compare November 7, 2017 02:17
@cfromknecht cfromknecht changed the title Revamp Peer Disconnection WIP: Revamp Peer Disconnection Nov 7, 2017
@cfromknecht
Copy link
Contributor Author

Closing due to inconclusive solution, believe this will be partially addressed in #432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant