enhancement: kill misbehaving IPC connections instead of deadlocking #2999

Closed
stapelberg opened this Issue Sep 29, 2017 · 3 comments

stapelberg commented Sep 29, 2017

Currently, i3 blockingly writes replies and events (only after subscribing, of course) to IPC clients. If an IPC client does not read from the connection for whichever reason, this will result in i3 blocking indefinitely, effectively freezing as far as users are concerned. (Side note: the same is true for reading, i.e. once i3 starts reading anything, it blockingly waits for the entire message to arrive. This turns out to not be a problem in practice.)

This is a recurring problem (see #2280 and #2539 for examples), so I’m filing this issue to brainstorm about what we could do about it, if anything.

My current favorite mitigation technique is to check whether the IPC connection’s file descriptor is writeable before sending a message (using select(2) with a 0 timeout). If not, we kill the connection, preventing a deadlock.

Here are a few more thoughts about this idea:

  • We should spawn an i3-nagbar instance to inform users that they just ran into a third-party bug. Otherwise, software might start to rely on this behavior (e.g. it might not clean up IPC connections which are no longer used).
    • We should refer users to a help document on https://i3wm.org/docs where we explain how to further diagnose the issue.
  • I considered using a timeout instead, but timeouts are very brittle: they make assumptions about how quickly processes run at every point in time on vastly different computers. E.g., if we decided to use a 10 second timeout, and another program starts swapping to hard disk (hence starving other processes for a while), i3 would incorrectly kill correctly functioning IPC clients. If we opt for a timeout that is too high, the mitigation becomes ineffective and users would still think that i3 hangs.
    • I think that the check-writeability approach will gracefully degrade on systems which are under high load: if the IPC client is slow, i3 will likely be slowed down to a somewhat similar extent.
  • Linux-specific: as per http://elixir.free-electrons.com/linux/v4.13.4/source/include/net/sock.h#L2139 and https://stackoverflow.com/a/22007520/712014, UNIX sockets are reported as writeable as long as at least half of their send buffer is available. Linux’s default value for the default and maximum send buffer size is 212992 bytes, i.e. individual i3 messages must not exceed 106496 bytes for this mitigation technique to be effective. The largest messages we send are TREE replies, which can easily reach 60 KB in size (observed on my somewhat big session right now). Luckily, the most common cause of deadlocks is events, which are significantly smaller in size, so we should be good.
  • Some platforms have significantly lower send buffer size limits (IIRC, BSDs default to 32k), so i3 should try to increase the limit to the maximum value or 256k, whichever is lower.
  • If the platform does not correctly report writeability, lacks the SIOCOUTQ (or similar) ioctl, or has a too strict limit on the maximum send buffer size, we’d fall back to the current behavior and deadlock.

Disclaimer: I have not actually implemented anything yet, so I’m not sure if this works out in practice.

cc @Airblader
cc @acrisci
cc @Merovius (we briefly spoke about this a few weeks ago)
cc @hannes to keep me honest about linux networking


Member

Airblader commented Sep 29, 2017

SGTM, but I know next to nothing about these things, so I cannot really weigh in here.


Member

stapelberg commented Sep 29, 2017

Another alternative is to implement a per-connection message queue which can be consumed byte-wise into the IPC connection using a libev watcher which is called when the socket is writeable.

That’s more portable, but raises the question of queue sizing. I would suggest applying the following rules of thumb when sizing the queue:

  • The queue must hold at least the number of messages which a single i3 event loop iteration can generate, otherwise we’d kill clients prematurely.
  • The queue must not be sized so large that misbehaving clients are turned into what looks like a memory leak, as the connection between a misbehaving client and large i3 memory usage is not obvious for users.
  • A reasonable queue size might be, say, 5 * minimum queue size.

Also, the libev watcher priority should be set such that immediately after an i3 event loop iteration, the queued messages are written to clients, so that the buffers are empty in the common case.

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Apr 23, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

Fixes i3#2999.

Member

orestisf1993 commented Apr 23, 2018

My current favorite mitigation technique is to check whether the IPC connection’s file descriptor is writeable before sending a message (using select(2) with a 0 timeout). If not, we kill the connection, preventing a deadlock.
...
I considered using a timeout instead, but timeouts are very brittle: they make assumptions about how quickly processes run at every point in time on vastly different computers. E.g., if we decided to use a 10 second timeout, and another program starts swapping to hard disk (hence starving other processes for a while), i3 would incorrectly kill correctly functioning IPC clients.

But a 0 timeout is just one extreme of the spectrum you described. Theoretically, with a 0 timeout we will kill more correctly functioning IPC clients than with a 50ms timeout.

I have a simple implementation here: https://github.com/orestisf1993/i3/tree/misbehaving-ipc-2999.

Since there doesn't seem to be a definite way to tell when an IPC client is misbehaving, the most robust (but complicated) heuristic seems to be the queue solution that could combine a timeout and the message count: a client that has missed multiple messages over a period of X seconds is probably misbehaving.

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Apr 23, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (socket is immediately writeable) the behaviour
doesn't change at all: the message is sent directly, no ev_io /
ev_timeout callback is enabled.

We can still theoretically hang if the client stops reading while
ipc_send_message() is writing the message. This is not a huge problem in
practice but we could avoid it by using a simple non-blocking write()
loop instead of writeall(). On EAGAIN we can return the amount of bytes
written and save it in the pending_message_t struct. Then, when the
socket is writeable again, we can continue writing from where we
stopped.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

We can also limit the amount of messages stored and increase the timeout
(or use multiple timeouts): eg it's ok if a client is not reading for 10
seconds and we are only holding 5KB of messages for them but it is not
ok if they are inactive for 5 seconds and we have 30MB of messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Apr 23, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (socket is immediately writeable) the behaviour
doesn't change at all: the message is sent directly, no ev_io /
ev_timeout callback is enabled.
When a socket is found to be not writeable, instead of calling the
blocking ipc_send_message we put the message in the tail of a queue and
init a ev_io callback and a corresponding timer. If timer is triggered
first, the socket is closed and the client connection is removed. If the
socket becomes writeable before the timeout we either reset the timer if
we couldn't push all stored messages or completely remove it if
everything was pushed.

A simpler alternative is to immediately remove clients with
non-writeable sockets (using the same socket_writeable() function):
https://github.com/orestisf1993/i3/tree/misbehaving-ipc-2999
The benefit of the queue + libev method is that we can allow timeouts
that are orders of magnitude larger without blocking i3 temporarily.

We can still theoretically hang if the client stops reading while
ipc_send_message() is writing the message. This is not a huge problem in
practice but we could avoid it by using a simple non-blocking write()
loop instead of writeall(). On EAGAIN we can return the amount of bytes
written and save it in the pending_message_t struct. Then, when the
socket is writeable again, we can continue writing from where we
stopped.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

We can also limit the amount of messages stored and increase the timeout
(or use multiple timeouts): eg it's ok if a client is not reading for 10
seconds and we are only holding 5KB of messages for them but it is not
ok if they are inactive for 5 seconds and we have 30MB of messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 16, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 17, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 30, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 30, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 30, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Jul 31, 2018

Kill misbehaving subscribed clients instead of hanging
This change only affects clients that are subscribed to events, which
should be the main cause of our problems.

In the common case (no buffered data) the behaviour doesn't change at
all: the message is sent directly, no ev_io / ev_timeout callback is
enabled. Once a write to a client's socket is not completed fully
(returns with EAGAIN error), we put the message in the tail of a queue
and init an ev_io callback and a corresponding timer. If the timer is
triggered first, the socket is closed and the client connection is
removed. If the socket becomes writeable before the timeout we either
reset the timer if we couldn't push all the buffered data or completely
remove it if everything was pushed.

We could also replace ipc_send_message() for all client connections in
i3, not just those subscribed to events.

Furthermore, we could limit the amount of messages stored and increase
the timeout (or use multiple timeouts): eg it's ok if a client is not
reading for 10 seconds and we are only holding 5KB of messages for them
but it is not ok if they are inactive for 5 seconds and we have 30MB of
messages held.

Closes i3#2999
Closes i3#2539

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Aug 1, 2018

Kill misbehaving subscribed clients instead of hanging

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Aug 1, 2018

Kill misbehaving subscribed clients instead of hanging

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Aug 8, 2018

Kill misbehaving subscribed clients instead of hanging

orestisf1993 added a commit to orestisf1993/i3 that referenced this issue Aug 8, 2018

Kill misbehaving subscribed clients instead of hanging