les: implement client connection logic #16899
Conversation
force-pushed from 6fe46f0 to 4e4798f
common/mclock/simclock.go (outdated)
lastScheduled := 0
for {
	for i := 0; i < goSchedCount; i++ {
		runtime.Gosched()
This loop is weird. You should never make assumptions about the scheduler, particularly when it comes to sleeping goroutines. You need very good justification to use runtime.Gosched.
An acceptable use of Gosched would be when your goroutine is running a tight loop that cannot be preempted because it contains no function calls or channel synchronization. In such cases, the call to Gosched can ensure that other goroutines will run.
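For illustration only (a hypothetical sketch, not code from this PR), the kind of loop described above might look like this: with a single processor and no other preemption points, the explicit yield is what lets the other goroutine run.

package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

func main() {
	runtime.GOMAXPROCS(1)
	var done int32
	go func() { atomic.StoreInt32(&done, 1) }()

	// Busy-wait with no channel operations: the explicit Gosched yields the
	// single processor so the goroutine above can actually set the flag.
	for atomic.LoadInt32(&done) == 0 {
		runtime.Gosched()
	}
	fmt.Println("flag observed")
}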
I admit this is a hacky solution but it is only used for testing and it proved to be really useful. It is also used in #16490 which would be practically impossible to properly test without using a virtual timescale. This is the most reliable and least ugly solution I could come up with but I am open to suggestions.
I understand that you need a virtual clock; my only issue is the call to Gosched. Does the test work without this call?
No. We need something to let goroutines run until they stop waiting on something again. A Sleep works too but it produces less accurate and less reliable/reproducible results, and I'm not sure it is any less ugly.
I've verified this now and the tests do pass without the Gosched call; they're just slower.
@@ -281,6 +286,19 @@ func (pm *ProtocolManager) handle(p *peer) error {
	p.Log().Debug("Light Ethereum handshake failed", "err", err)
	return err
}

if !pm.lightSync && !p.Peer.Info().Network.Trusted {
	addr, ok := p.RemoteAddr().(*net.TCPAddr)
Note: net.Addr supports String, it's one of the methods in the interface, so you don't need to check for TCPAddr here.
That includes port too. I just want to use the IP address here because it is very easy for the client to change port.
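A minimal sketch of the idea being discussed (the helper name clientID is illustrative, not the PR's actual code): keep only the IP part of a TCP remote address, and fall back to the full String() form that every net.Addr provides.

package main

import (
	"fmt"
	"net"
)

// clientID returns the identifier used for the client pool: just the IP for
// TCP connections (ports are trivial for a client to change), otherwise the
// address's full string form.
func clientID(addr net.Addr) string {
	if tcp, ok := addr.(*net.TCPAddr); ok {
		return tcp.IP.String()
	}
	return addr.String()
}

func main() {
	addr := &net.TCPAddr{IP: net.ParseIP("203.0.113.7"), Port: 30303}
	fmt.Println(clientID(addr)) // prints 203.0.113.7, without the port
}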
I'm still trying to wrap my head around this change, sorry for the long review delay.
An observation: you forked prque for two reasons: to have a custom comparison function, and to be able to remove a specific element that might not be at the top. I agree with the latter new feature, but I don't understand the need for a custom comparison function. Here's why: if I'm not mistaken, the order of items in the queue never changes. Client 'usage' is recalculated only on entry to and exit from the queue (anything else would be slightly dangerous because it might break priority order), and new clients are added only if they have less usage than any connected client. So the actual numbers don't matter much; all you need is a queue.
Please correct me if I'm wrong. Is there any condition under which a client entry might be inserted in the middle of the queue?
@fjl " Is there any condition under which a client entry might be inserted in the middle of the queue?" sure, absolutely. There are two states and accordingly, two queues here, one where the "usage" value is linearly increasing and one where it is exponentially decreasing. Clients may go back and forth any time. A client can go to "connected" state when it is slightly better than the worst currently connected one so it can indeed be inserted in the middle of the queue. It can also disconnect at any time and get back in the disconneced queue in any position. |
Thank you for the more detailed explanation. That still leaves me wondering why we need the comparison function. If we kept the priority parameter for …
force-pushed from 4e4798f to 6fd3f6e
@fjl I removed the comparison function and added a comment about the wrap-around priority comparison. Also made logOffset a bit more readable. I tried removing Gosched from simclock and the tests are running 20x slower. Also, it will not work for my other use cases so I did not remove it just yet. I think I have an idea how to handle this problem better but I'll have to think about it a bit. I have to leave now for a few hours, I'll try to make simclock nicer later tonight.
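One common way to implement the wrap-around comparison mentioned above, as a sketch under the assumption that priorities are monotonically increasing int64 values that may eventually overflow (the function name is illustrative):

// priorityBefore reports whether priority a is "older" (smaller) than b in a
// wrap-around-safe way: instead of comparing a < b directly, it checks the
// sign of the difference, which stays correct across int64 overflow as long
// as the two values are within half the int64 range of each other.
func priorityBefore(a, b int64) bool {
	return a-b < 0
}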
@fjl I added a commit that removes Gosched and uses a "ping channel" instead. It achieves the same effect, but other goroutines unrelated to the actual tested mechanism do not interfere with the process, so it should be more deterministic.
@fjl I am writing a much simpler unit test for the flow control PR and your simclock is totally enough to do it. Actually it only needs Run and Now (like this PR). I am planning another unit test for the new request serving mechanism which needs Sleep/After too, and I think I can do that too with your WaitForTimers solution.
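Roughly how such a test might look, assuming a simulated clock API along the lines of mclock.Simulated with Run and Now (a sketch, not the PR's actual test code): the test advances virtual time instead of sleeping for real, so a "10 second" scenario completes almost instantly.

package mclock_test

import (
	"testing"
	"time"

	"github.com/ethereum/go-ethereum/common/mclock"
)

func TestVirtualTime(t *testing.T) {
	var clock mclock.Simulated
	start := clock.Now()

	// Advance virtual time; any timers waiting on the simulated clock fire here.
	clock.Run(10 * time.Second)

	if elapsed := time.Duration(clock.Now() - start); elapsed != 10*time.Second {
		t.Fatalf("virtual clock advanced by %v, want 10s", elapsed)
	}
}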
force-pushed from d0cbe79 to 0740091
This PR implements les.freeClientPool. It also adds a simulated clock in common/mclock, which enables time-sensitive tests to run quickly and still produce accurate results, and the common/prque package, a generalised variant of prque that enables removing elements other than the top one from the queue.
les.freeClientPool implements a client database that limits the connection time of each client and manages accepting/rejecting incoming connections and even kicking out some connected clients. The pool calculates recent usage time for each known client (a value that increases linearly when the client is connected and decreases exponentially when not connected). Clients with lower recent usage are preferred, unknown nodes have the highest priority. Already connected nodes receive a small bias in their favor in order to avoid accepting and instantly kicking out clients.
Note: the pool can use any string for client identification. Using signature keys for that purpose would not make sense when being known has a negative value for the client. Currently the LES protocol manager uses IP addresses (without the port number) to identify clients.
@zsfelfoldi I apologize if a merged PR is not the right avenue for this question, but I wasn't sure where else to ask: I understand the rationale for wanting to use the IP address of the connection to identify clients and reward ones that utilize less resources. However, an advertised use case that we are relying on is the use of the light protocol on devices like smartphones. This change effectively blocks two users on the same network (and thus sharing the same IP address) from connecting to the same full node. I understand that they could always go to a different node, but effectively the number of clients in a network is bounded by the number of full nodes. I was wondering if that is an explicit trade-off that has been decided on, or whether we can expect a future change on this that would avoid us having to run custom nodes with forked behavior.
Upstream geth uses a client pool to determine which peers to allow and stay connected to and which not. It is supposed to reward good peers who only stay connected as little as necessary and punish peers who hog the connection. However, since there is nothing that can truly uniquely identify clients (enode addresses are trivially changeable), upstream geth uses IP addresses as a proxy. However, this breaks light clients with the same IP address, as well as setups like Kubernetes that appear to geth under the same IP address due to their networking. Thus this change is a short term hack to stabilize connections for us by doing the lookup by the enode ID. Longer-term we need to figure out a solution to this. Check out celo-org/celo-monorepo#522 and ethereum/go-ethereum#16899 (comment)
* Lookup peers in the client pool by ID instead of by IP address. Upstream geth uses a client pool to determine which peers to allow and stay connected to and which not. It is supposed to reward good peers who only stay connected as little as necessary and punish peers who hog the connection. However, since there is nothing that can truly uniquely identify clients (enode addresses are trivially changeable), upstream geth uses IP addresses as a proxy. However, this breaks light clients with the same IP address, as well as setups like Kubernetes that appear to geth under the same IP address due to their networking. Thus this change is a short term hack to stabilize connections for us by doing the lookup by the enode ID. Longer-term we need to figure out a solution to this. Check out celo-org/celo-monorepo#522 and ethereum/go-ethereum#16899 (comment)
* Remove old code and use p.id directly
This PR implements les.freeClientPool. It also adds common/mclock.SimulatedClock, which enables time-sensitive tests to run quickly and still produce accurate results, and the les/prque package, a generalised variant of gopkg.in/karalabe/cookiejar.v2/collections/prque that allows different comparison functions and also enables removing elements other than the top one from the queue.
les.freeClientPool implements a client database that limits the connection time of each client and manages accepting/rejecting incoming connections and even kicking out some connected clients. The pool calculates recent usage time for each known client (a value that increases linearly when the client is connected and decreases exponentially when not connected). Clients with lower recent usage are preferred, unknown nodes have the highest priority. Already connected nodes receive a small bias in their favor in order to avoid accepting and instantly kicking out clients.
Note: the pool can use any string for client identification. Using signature keys for that purpose would not make sense when being known has a negative value for the client. Currently the LES protocol manager uses IP addresses (without the port number) to identify clients.
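A hedged sketch of the "recent usage" bookkeeping described above (the constant and function names are illustrative, not the PR's actual fields): while connected, usage grows linearly; while disconnected, it decays exponentially. Rather than periodically rewriting every idle entry, the decay can be expressed through a single, globally growing logarithmic offset, so a stored log-value stays fixed while its effective linear value shrinks over time.

package main

import (
	"fmt"
	"math"
	"time"
)

// Assumed decay constant for illustration: usage halves every hour offline.
const decayHalfLife = time.Hour

// logOffset grows linearly with time, by 1 per half-life.
func logOffset(now time.Duration) float64 {
	return float64(now) / float64(decayHalfLife)
}

// storeUsage converts a linear usage value into the log domain at time now.
func storeUsage(usage float64, now time.Duration) float64 {
	return math.Log2(usage) + logOffset(now)
}

// effectiveUsage converts a stored log-value back to linear usage at time now;
// it halves for every half-life that has passed since it was stored.
func effectiveUsage(stored float64, now time.Duration) float64 {
	return math.Exp2(stored - logOffset(now))
}

func main() {
	stored := storeUsage(240, 0) // client disconnects with 240s of recent usage
	fmt.Println(effectiveUsage(stored, time.Hour))   // ~120 after one half-life
	fmt.Println(effectiveUsage(stored, 2*time.Hour)) // ~60 after two half-lives
}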