enhancement: kill misbehaving IPC connections instead of deadlocking #2999
Currently, i3 uses blocking writes to send replies and events (the latter only after a client has subscribed, of course) to IPC clients. If an IPC client stops reading from the connection for whatever reason, i3 blocks indefinitely, which looks like a complete freeze as far as users are concerned. (Side note: the same is true for reading, i.e. once i3 starts reading a message, it blocks until the entire message has arrived. This turns out not to be a problem in practice.)
My current favorite mitigation technique is to check whether the IPC connection’s file descriptor is writeable before sending a message (using
Here are a few more thoughts about this idea:
Disclaimer: I have not actually implemented anything yet, so I’m not sure if this works out in practice.
An alternative is to implement a per-connection message queue whose contents are drained into the IPC connection, as many bytes at a time as the socket accepts, by a libev watcher that fires when the socket is writeable.
That’s more portable, but raises the question of queue sizing. I would suggest applying the following rules of thumb when sizing the queue:
Also, the libev watcher priority should be set such that immediately after an i3 event loop iteration, the queued messages are written to clients, so that the buffers are empty in the common case.
referenced this issue
Apr 21, 2018
But a 0 timeout is just one extreme of the spectrum you described. Theoretically, with a 0 timeout we will kill more correctly functioning IPC clients than with a 50ms timeout.
I have a simple implementation here: https://github.com/orestisf1993/i3/tree/misbehaving-ipc-2999.
Since there doesn't seem to be a definitive way to tell when an IPC client is misbehaving, the most robust (but also most complicated) heuristic seems to be the queue solution, which could combine a timeout with the message count: a client that has missed multiple messages over a period of X seconds is probably misbehaving.
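As a rough illustration of that combined heuristic, the following sketch tracks how many messages are pending for a client and how long the oldest one has been waiting. All names and both thresholds are hypothetical. This is not what the linked branch does, just one way the rule could be expressed.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-client bookkeeping for the combined heuristic. */
typedef struct {
    unsigned pending;      /* messages queued but not yet written out */
    time_t oldest_pending; /* when the oldest unwritten message was queued */
} ipc_client_state;

/* Call whenever a message is queued for the client. */
void ipc_client_on_queued(ipc_client_state *c, time_t now) {
    if (c->pending == 0)
        c->oldest_pending = now;
    c->pending++;
}

/* Call once the client's queue has been written out completely. */
void ipc_client_on_drained(ipc_client_state *c) {
    c->pending = 0;
}

/* A client counts as misbehaving only if BOTH conditions hold: it has at
 * least max_pending unwritten messages AND the oldest of them has been
 * waiting for at least max_age_seconds. */
bool ipc_client_misbehaving(const ipc_client_state *c, time_t now,
                            unsigned max_pending, double max_age_seconds) {
    return c->pending >= max_pending &&
           difftime(now, c->oldest_pending) >= max_age_seconds;
}
```

Requiring both conditions means that a short burst of events does not kill a client that is merely slow for a moment, and a single stale message does not kill a client that is otherwise idle.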