les: implement client connection logic #16899

Merged (10 commits) on Aug 14, 2018

Conversation

zsfelfoldi
Contributor

This PR implements les.freeClientPool. It also adds common/mclock.SimulatedClock, which enables time-sensitive tests to run quickly while still producing accurate results, and the les/prque package, a generalised variant of gopkg.in/karalabe/cookiejar.v2/collections/prque that allows different comparison functions and also enables removing elements other than the top one from the queue.

les.freeClientPool implements a client database that limits the connection time of each client and manages accepting/rejecting incoming connections and even kicking out some connected clients. The pool calculates a recent usage time for each known client (a value that increases linearly while the client is connected and decreases exponentially while it is not). Clients with lower recent usage are preferred; unknown nodes have the highest priority. Already connected nodes receive a small bias in their favor in order to avoid accepting and instantly kicking out clients.

Note: the pool can use any string for client identification. Using signature keys for that purpose would not make sense when being known has a negative value for the client. Currently the LES protocol manager uses IP addresses (without the port) to identify clients.
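
For readers unfamiliar with the scheme, here is a minimal sketch of how such a "recent usage" value can be tracked; the type, field names and decay constant are illustrative assumptions, not the actual les.freeClientPool representation (which stores the value logarithmically, see the logOffset discussion below):

```go
package pool

import (
	"math"
	"time"
)

// decayConstant is an assumed time constant for the exponential decay.
const decayConstant = time.Hour

// clientUsage tracks a usage value that grows linearly while the client is
// connected and decays exponentially while it is disconnected.
type clientUsage struct {
	usage      float64 // accumulated usage in seconds
	connected  bool
	lastUpdate time.Time
}

// update brings the usage value up to date at time now.
func (c *clientUsage) update(now time.Time) {
	dt := now.Sub(c.lastUpdate)
	if c.connected {
		c.usage += dt.Seconds() // linear growth while connected
	} else {
		c.usage *= math.Exp(-float64(dt) / float64(decayConstant)) // exponential decay while disconnected
	}
	c.lastUpdate = now
}

// setConnected records a connect or disconnect event for the client.
func (c *clientUsage) setConnected(now time.Time, connected bool) {
	c.update(now)
	c.connected = connected
}
```

Connection candidates would then be compared by this value, with a small bias added for clients that are already connected.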

@zsfelfoldi zsfelfoldi added this to the 1.8.11 milestone Jun 5, 2018
@zsfelfoldi zsfelfoldi modified the milestones: 1.8.11, 1.8.12 Jun 11, 2018
@fjl fjl self-requested a review June 25, 2018 12:23
lastScheduled := 0
for {
	for i := 0; i < goSchedCount; i++ {
		runtime.Gosched()
Contributor

@fjl fjl Jul 3, 2018

This loop is weird. You should never make assumptions about the scheduler, particularly when it comes to sleeping goroutines. You need very good justification to use runtime.Gosched.

An acceptable use of Gosched would be when your goroutine is running a tight loop that cannot be preempted because it contains no function calls or channel synchronization. In such cases, the call to Gosched can ensure that other goroutines will run.
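
A hypothetical illustration of that acceptable case (not code from this PR): CPU-only work in a loop with no function calls or channel operations, yielding explicitly every so often so other goroutines on the same thread can run.

```go
package spin

import "runtime"

// checksum does CPU-only work in a tight loop. On Go runtimes without
// asynchronous preemption such a loop cannot be interrupted, so it yields
// explicitly every ~1M iterations.
func checksum(data []byte) (sum uint64) {
	for i, b := range data {
		sum = sum*31 + uint64(b)
		if i%(1<<20) == 0 {
			runtime.Gosched() // let other goroutines run during long computations
		}
	}
	return sum
}
```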

Contributor Author

I admit this is a hacky solution, but it is only used for testing and it has proved really useful. It is also used in #16490, which would be practically impossible to test properly without a virtual timescale. This is the most reliable and least ugly solution I could come up with, but I am open to suggestions.

Contributor

I understand that you need a virtual clock, my only issue is the call to Gosched. Does the test work without this call?  

Contributor Author

No. We need something to let goroutines run until they stop waiting on something again. A Sleep works too, but it produces less accurate and less reliable/reproducible results, and I'm not sure it is any less ugly.

Contributor

I've verified this now and the tests do pass without the Gosched call, they're just slower.

@@ -281,6 +286,19 @@ func (pm *ProtocolManager) handle(p *peer) error {
		p.Log().Debug("Light Ethereum handshake failed", "err", err)
		return err
	}

	if !pm.lightSync && !p.Peer.Info().Network.Trusted {
		addr, ok := p.RemoteAddr().(*net.TCPAddr)
Contributor

@fjl fjl Jul 3, 2018

Note: net.Addr supports String (it's one of the methods in the interface), so you don't need to check for TCPAddr here.

Contributor Author

That includes the port too. I just want to use the IP address here because it is very easy for the client to change ports.
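
A small hypothetical helper showing the difference (clientID is not from this PR; only net.Addr.String and *net.TCPAddr are standard library behaviour):

```go
package example

import "net"

// clientID returns the key used to identify a client: the bare IP for TCP
// connections, falling back to the full String() form (which includes the
// port) for other address types.
func clientID(addr net.Addr) string {
	if tcp, ok := addr.(*net.TCPAddr); ok {
		return tcp.IP.String() // IP only; the port is easy for a client to change
	}
	return addr.String()
}
```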

Contributor

@fjl fjl left a comment

I'm still trying to wrap my head around this change, sorry for the long review delay.

An observation: You forked prque for two reasons: to have a custom comparison function, and to be able to remove a specific element that might not be at the top. I agree with the latter new feature, but I don't understand the need for a custom comparison function. Here's why: if I'm not mistaken, the order of items in the queue never changes. Client 'usage' is recalculated only on entry to and exit from the queue (anything else would be slightly dangerous because it might break the priority order), and new clients are added only if they have less usage than any connected client. So the actual numbers don't matter much; all you need is a queue.

Please correct me if I'm wrong. Is there any condition under which a client entry might be inserted in the middle of the queue?

@zsfelfoldi
Contributor Author

@fjl " Is there any condition under which a client entry might be inserted in the middle of the queue?" sure, absolutely. There are two states and accordingly, two queues here, one where the "usage" value is linearly increasing and one where it is exponentially decreasing. Clients may go back and forth any time. A client can go to "connected" state when it is slightly better than the worst currently connected one so it can indeed be inserted in the middle of the queue. It can also disconnect at any time and get back in the disconneced queue in any position.

@fjl
Contributor

fjl commented Jul 4, 2018

Thank you for the more detailed explanation. That still leaves me wondering why we need the comparison function. If we kept the priority parameter for Push, you'd just use linUsage or logUsage as priority. How would that be different?

@zsfelfoldi
Contributor Author

@fjl I removed the comparison function and added a comment about the wrap-around priority comparison. I also made logOffset a bit more readable. I tried removing Gosched from the simulated clock and the tests run 20x slower. Also, it will not work for my other use cases, so I have not removed it just yet. I think I have an idea of how to handle this problem better, but I'll have to think about it a bit. I have to leave now for a few hours; I'll try to make simclock nicer later tonight.
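
For context, a sketch of what a wrap-around-safe priority comparison generally looks like; this is illustrative only, not the exact comment or code added in the commit:

```go
package prque

// before reports whether priority a should be served before priority b.
// If priorities are monotonically increasing int64 counters that may
// eventually overflow, comparing the signed difference still orders two
// values correctly as long as they are less than half the int64 range apart.
func before(a, b int64) bool {
	return a-b < 0
}
```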

@zsfelfoldi
Contributor Author

@fjl I added a commit that removes Gosched and uses a "ping channel" instead. It achieves the same effect, but other goroutines unrelated to the mechanism under test do not interfere with the process, so it should be more deterministic.
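
A hedged sketch of the "ping channel" idea (the names below are assumptions for illustration, not the actual common/mclock API): the goroutine under test signals on a channel whenever it is about to block, and the test only advances virtual time once that signal arrives, so unrelated goroutines cannot influence the result.

```go
package simclock

import "time"

// SimulatedClock keeps a virtual time that only advances when the code under
// test reports that it is idle.
type SimulatedClock struct {
	now  time.Duration
	ping chan struct{}
}

func NewSimulatedClock() *SimulatedClock {
	return &SimulatedClock{ping: make(chan struct{}, 1)}
}

// Idle is called by the tested goroutine right before it blocks on something.
func (c *SimulatedClock) Idle() {
	select {
	case c.ping <- struct{}{}:
	default: // a ping is already pending
	}
}

// Run waits for the tested goroutine to report idle, then advances time by d.
func (c *SimulatedClock) Run(d time.Duration) {
	<-c.ping
	c.now += d
}

// Now returns the current virtual time.
func (c *SimulatedClock) Now() time.Duration { return c.now }
```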

@fjl fjl modified the milestones: 1.8.12, 1.8.13 Jul 5, 2018
@zsfelfoldi
Contributor Author

@fjl I am writing a much simpler unit test for the flow control PR and your simclock is completely sufficient for it. It actually only needs Run and Now (like this PR). I am planning another unit test for the new request serving mechanism, which needs Sleep/After too, and I think I can do that as well with your WaitForTimers solution.

@fjl fjl merged commit b2ddb1f into ethereum:master Aug 14, 2018
firmianavan pushed a commit to firmianavan/go-ethereum that referenced this pull request Aug 28, 2018
This PR implements les.freeClientPool. It also adds a simulated clock
in common/mclock, which enables time-sensitive tests to run quickly
and still produce accurate results, and package common/prque which is
a generalised variant of prque that enables removing elements other
than the top one from the queue.

les.freeClientPool implements a client database that limits the
connection time of each client and manages accepting/rejecting
incoming connections and even kicking out some connected clients. The
pool calculates recent usage time for each known client (a value that
increases linearly when the client is connected and decreases
exponentially when not connected). Clients with lower recent usage are
preferred, unknown nodes have the highest priority. Already connected
nodes receive a small bias in their favor in order to avoid accepting
and instantly kicking out clients.

Note: the pool can use any string for client identification. Using
signature keys for that purpose would not make sense when being known
has a negative value for the client. Currently the LES protocol
manager uses IP addresses (without port address) to identify clients.
@nambrot

nambrot commented Sep 4, 2018

@zsfelfoldi I apologize if a merged PR is not the right avenue for this question, but I wasn't sure where else to ask:

I understand the rationale for wanting to use the IP address of the connection to identify clients and reward ones that use fewer resources. However, an advertised use case that we are relying on is the use of the light protocol on devices like smartphones. This change effectively blocks two users on the same network (and thus sharing the same IP address) from connecting to the same full node. I understand that they could always go to a different node, but effectively the number of clients in a network is bounded by the number of full nodes. I was wondering whether this is an explicit trade-off that has been decided on, or whether we can expect a future change here that would spare us from running custom nodes with forked behavior.

nambrot added a commit to celo-org/celo-blockchain that referenced this pull request Sep 11, 2018
Upstream geth uses a client pool to determine which peers to accept
and stay connected to and which to reject. It is supposed to reward
good peers that only stay connected as long as necessary and punish
peers that hog the connection. However, since there is nothing that
can truly uniquely identify clients (enode addresses are trivially
changeable), upstream geth uses IP addresses as a proxy. However, this
breaks light clients with the same IP address, as well as setups like
Kubernetes that appear to geth under the same IP address due to their
networking.

Thus this change is a short-term hack to stabilize connections for us
by doing the lookup by the enode ID. Longer term we need to figure out
a solution to this. Check out celo-org/celo-monorepo#522
and ethereum/go-ethereum#16899 (comment)
nambrot added a commit to celo-org/celo-blockchain that referenced this pull request Sep 11, 2018
* Lookup peers in the client pool by ID instead of by IP address

Upstream geth uses a client pool to determine which peers to accept
and stay connected to and which to reject. It is supposed to reward
good peers that only stay connected as long as necessary and punish
peers that hog the connection. However, since there is nothing that
can truly uniquely identify clients (enode addresses are trivially
changeable), upstream geth uses IP addresses as a proxy. However, this
breaks light clients with the same IP address, as well as setups like
Kubernetes that appear to geth under the same IP address due to their
networking.

Thus this change is a short-term hack to stabilize connections for us
by doing the lookup by the enode ID. Longer term we need to figure out
a solution to this. Check out celo-org/celo-monorepo#522
and ethereum/go-ethereum#16899 (comment)

* Remove old code and use p.id directly