IPC/Websockets requests continue executing after connection closes #20583

Open

AusIV opened this issue Jan 20, 2020 · 0 comments

AusIV commented Jan 20, 2020

System information

Geth
Version: 1.9.9-stable
Architecture: amd64
Protocol Versions: [64 63]
Go Version: go1.13.3
Operating System: linux
GOPATH=/home/aroberts/Projects/gopath
GOROOT=/usr/lib/go-1.13

Expected behaviour

A long-running RPC request made via IPC, such as {"jsonrpc":"2.0","id":547951,"method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"latest"}]}, should stop executing if the IPC connection is closed. This is the behaviour for HTTP.
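For comparison, HTTP gets this cancellation essentially for free from Go's standard library: net/http closes the request's context when the client disconnects, so any handler that threads that context through its work stops early. A minimal sketch (the handler and address are illustrative, not geth code):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(10 * time.Second): // stand-in for a slow eth_getLogs
			fmt.Fprintln(w, "done")
		case <-r.Context().Done(): // closed by net/http when the client hangs up
			log.Println("client disconnected:", r.Context().Err())
		}
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
```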

Actual behaviour

The RPC request continues executing until the response is fully computed.

Steps to reproduce the behaviour

Run:

echo '{"jsonrpc":"2.0","id":547951,"method":"eth_getLogs","params":[{"fromBlock":"0x0","toBlock":"latest"}]}' | nc -U ~/.ethereum/geth.ipc

Then press Ctrl+C to kill the request. CPU usage remains high until geth finishes processing the request anyway.

Analysis

I've dug pretty deep into the RPC module to understand why this happens: why requests are cancelled properly over HTTP but not over IPC. Having looked at both the code and the Git history, I think several things in the RPC module aren't working as intended.

Starting with https://github.com/ethereum/go-ethereum/blob/master/rpc/client.go#L554 - here the assumption seems to be that as soon as a read error is received, the connection is closed and things should be cleaned up. That doesn't necessarily hold for IPC. An EOF can be received that only indicates the last request has been sent; the other end of the IPC connection may still be waiting for a response, and the connection is not actually closed. But when a read error occurs, it takes us to https://github.com/ethereum/go-ethereum/blob/master/rpc/client.go#L118, which in turn takes us to https://github.com/ethereum/go-ethereum/blob/master/rpc/handler.go#L152.
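To make the shape of that code path concrete, here is a deliberately stripped-down sketch of the pattern (hypothetical names, not the actual client.go code): the first read error out of the read loop, io.EOF included, tears the handler down.

```go
package main

import (
	"fmt"
	"io"
)

type handler struct{}

// close stands in for handler.close in rpc/handler.go: it is supposed to
// cancel outstanding requests and shut the connection down.
func (h *handler) close(err error) {
	fmt.Println("handler.close called with:", err)
}

// dispatch mimics the loop in rpc/client.go: the first read error ends the
// connection, even though on IPC an io.EOF may only mean the peer has
// finished writing and is still waiting for a response.
func dispatch(readErrs <-chan error, h *handler) {
	for err := range readErrs {
		h.close(err)
		return
	}
}

func main() {
	errs := make(chan error, 1)
	errs <- io.EOF // e.g. the client shut down its write side after one request
	close(errs)
	dispatch(errs, &handler{})
}
```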

The handler.close() function attempts to cancel outstanding requests, waits for them to finish executing, cancels the root context for the connection, then closes any subscriptions. The next problem is that handler.cancelAllRequests() doesn't actually stop the execution of outstanding requests, so they run to completion anyway. Cancelling the root context at that point would make the processes stemming from this connection terminate pretty quickly, but we don't necessarily want to do that, because in both the HTTP and IPC implementations (and possibly others) the client dispatcher (linked above) can receive an EOF error while the connection is still capable of completing the request and sending a response.
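The "run to completion anyway" part is ordinary Go context behaviour: cancelling a context only stops work that actually checks it. A hypothetical long-running handler (not geth code) shows the difference:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// longRunningHandler stands in for something like eth_getLogs over the whole
// chain. Delete the select block and cancellation is silently ignored.
func longRunningHandler(ctx context.Context) {
	for i := 0; i < 5; i++ {
		time.Sleep(100 * time.Millisecond) // stand-in for real work
		select {
		case <-ctx.Done():
			fmt.Println("cancelled at step", i)
			return
		default: // context still live, keep going
		}
	}
	fmt.Println("ran to completion")
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go longRunningHandler(ctx)
	time.Sleep(250 * time.Millisecond)
	cancel() // what cancelling the root context is supposed to achieve
	time.Sleep(500 * time.Millisecond)
}
```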

At one point in the past, handler.close() cancelled the root context before waiting for requests to finish, but this was changed in this commit with relatively little explanation. From what I can gather, handler.close() was being called erroneously because of EOF errors that don't really mean the connection has closed. The fact that handler.cancelAllRequests() doesn't actually stop requests from executing, combined with moving the context cancellation after the waitgroup wait, made things work, but in a way that seems rather fragile.

It seems to me that properly cancelling requests when the connection that made them disconnects needs to involve directly monitoring the connection and cancelling the context when the connection closes. Unfortunately, given the limitations of Go's net.Conn interface, the only reliable way to tell that a connection is "closed" is to attempt to write to it and see whether you get an error; I haven't found another clean approach that works across the board (though you can send an empty message and check for an error).
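A hedged sketch of that probe-by-writing idea, using net.Pipe to simulate the two ends of the connection (the probe payload and interval are arbitrary; a real implementation would need a probe the wire protocol tolerates, and IPC/WebSocket specifics would differ):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// watchConn probes the connection with a periodic write and cancels all
// work derived from ctx as soon as a write fails.
func watchConn(conn net.Conn, cancel context.CancelFunc) {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		if _, err := conn.Write([]byte("\n")); err != nil {
			cancel() // peer is gone; stop in-flight request handling
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	server, client := net.Pipe()
	go func() { // client side: consume probes until it disconnects
		buf := make([]byte, 1)
		for {
			if _, err := client.Read(buf); err != nil {
				return
			}
		}
	}()
	go watchConn(server, cancel)
	time.Sleep(120 * time.Millisecond)
	client.Close() // simulate the client disconnecting mid-request
	<-ctx.Done()
	fmt.Println("disconnect detected, context cancelled")
}
```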

I have a viable workaround for my own purposes, but I wanted to bring this up, as the current implementation has several pieces that don't work the way I'd expect, and fixing some components to behave as intended would break others.
