docs: inactive group members rfc
spaced4ndy committed Nov 21, 2023
1 parent a8576c2 commit 53c0a37
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions docs/rfcs/2023-11-21-inactive-group-members.md
@@ -0,0 +1,59 @@
# Inactive group members

## Problem

Group traffic is higher than necessary because inactive group members are not detected. By inactive we mean group members who went offline for an indefinitely long time, uninstalled the application without leaving the group, failed to send an x.grp.leave message before deleting the connection, or in any other way failed to explicitly communicate that they will no longer be active.

Currently, other group members continue to treat such members as active and keep sending messages to their connections until the quotas of their receiving SMP queues are exceeded. Even after that, pending messages are slowly retried.

## Solution

Identify inactive members and don't send messages to their connections. Members who are silent but periodically come online should continue to receive messages, so the decision to mark a member as inactive should be made conservatively.

Agent:
- on SMP.QUOTA error notify client with ERR CONN QUOTA (new ConnectionErrorType QUOTA).
- on receiving QCONT notify client (new event).

Chat, on sending side, per member:
- unanswered_snd_msg_count - number of messages that were sent consecutively without receiving a message from member.
- last_rcv_ts - timestamp of last received message.
- inactive flag.
- set inactive if:
  - agent reports QUOTA error.
  - on sending message: (unanswered_snd_msg_count > K) && (last_rcv_ts earlier than Ddiff days ago), Ddiff = 1/2/3 days?
- reset inactive:
  - on receiving QCONT.
  - on receiving message or receipt. Also reset unanswered_snd_msg_count and update last_rcv_ts.
- don't send to member if inactive.
  - or withhold only content messages (x.msg.new, etc.) and always send messages altering group state?
- unanswered_snd_msg_count, last_rcv_ts to be tracked, checked, reset only for members with compatible version.
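
The sending-side rules above can be sketched as a small per-member state machine. This is only an illustration in Python, not the chat implementation: the class and method names are hypothetical, the values of `K` and `D_DIFF` are placeholders for the thresholds the RFC leaves open, and the sketch assumes that `last_rcv_ts` is initialised when the member is added and that the condition is checked before the current message is counted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Placeholder thresholds; the RFC leaves K and Ddiff undecided.
K = 20
D_DIFF = timedelta(days=2)


@dataclass
class SndMemberState:
    """Hypothetical per-member tracking on the sending side."""
    last_rcv_ts: datetime  # assumed to start at the time the member was added
    unanswered_snd_msg_count: int = 0
    inactive: bool = False

    def on_quota_error(self) -> None:
        # Agent reported ERR CONN QUOTA for this member's connection.
        self.inactive = True

    def on_qcont(self) -> None:
        # Agent reported QCONT: the member's queue accepts messages again.
        self.inactive = False

    def on_received(self, now: datetime) -> None:
        # Any message or receipt from the member proves it is alive.
        self.inactive = False
        self.unanswered_snd_msg_count = 0
        self.last_rcv_ts = now

    def on_send(self, now: datetime) -> bool:
        """Return True if the message should be sent to this member."""
        if self.inactive:
            return False
        stale = now - self.last_rcv_ts > D_DIFF
        if self.unanswered_snd_msg_count > K and stale:
            self.inactive = True
            return False
        self.unanswered_snd_msg_count += 1
        return True
```

With these assumptions, a member that has not replied for longer than `D_DIFF` still receives the first `K + 1` messages and is marked inactive on the next send attempt.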

Chat, on receiving side, per member:
- unanswered_rcv_msg_count - number of messages that were received consecutively without sending a message to member.
- send non-optional receipt / another (new) protocol message if:
  - on receiving message: unanswered_rcv_msg_count > M, M < K.
- on sending a message or receipt to member reset unanswered_rcv_msg_count.
- unanswered_rcv_msg_count to be tracked, checked, reset only for members with compatible version.
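
A minimal sketch of the receiving-side counter, again in Python and with hypothetical names. `M` is a placeholder value; the only constraint taken from the list above is `M < K`. The sketch assumes the caller sends the forced receipt itself and then calls `on_send`.

```python
from dataclasses import dataclass

# Placeholder threshold; must be smaller than the sending side's K.
M = 5


@dataclass
class RcvMemberState:
    """Hypothetical per-member tracking on the receiving side."""
    unanswered_rcv_msg_count: int = 0

    def on_receive(self) -> bool:
        """Count a received message; return True if a forced receipt is due."""
        self.unanswered_rcv_msg_count += 1
        return self.unanswered_rcv_msg_count > M

    def on_send(self) -> None:
        # Any message or receipt sent to the member resets the counter.
        self.unanswered_rcv_msg_count = 0
```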

\***

Consider the condition above:

> (unanswered_snd_msg_count > K) && (last_rcv_ts earlier than Ddiff days ago)

It still doesn't account for the following situation:

1. Sending member sends a few (N1, N1 < M) messages to silent member on day D1.
2. Sending member doesn't send messages for several days.
3. Sending member sends more messages (N2, N1 + N2 > K) to silent member on day DI (DI - D1 > Ddiff from the condition above), while silent member is offline.
   - Sending member checks the condition above, evaluates it to true, and marks silent member as inactive.
   - Simply remembering last_snd_ts on the sending side and adding a check that it is not from several days ago is not enough: it would be overwritten by the current day's sends, so the check would only evaluate to false for the first send. What could work is remembering prev_session_last_snd_ts or prev_day_last_snd_ts, but that further complicates the logic and probably still wouldn't account for some time zone differences.
4. Sending member sends yet more messages, which will not be queued for silent member marked inactive.
5. Silent member comes online, sends a receipt upon receiving the message that fulfills the condition `unanswered_rcv_msg_count > M`, and will lose the following messages.
   - If sending member created the messages from step 4 as pending and sent them upon receiving the receipt from silent member, silent member would only receive them after sending member comes online. If they are in different time zones, this may happen on the next day.

The same situation can occur even without step 1, simply by sending many messages while the other member is offline.
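
The scenario can be reproduced with a small simulation of the sending side. This is a hedged sketch in Python: `simulate`, its parameters, and the values of `K` and `D_DIFF` are all illustrative assumptions, and the silent member is assumed to have last been heard from on day D1 and to stay offline for the whole run.

```python
from datetime import datetime, timedelta

# Placeholder thresholds for illustration only.
K = 10
D_DIFF = timedelta(days=2)


def simulate(n1: int, n2: int, gap_days: int):
    """Send n1 messages on day D1 and n2 messages gap_days later.

    Returns (inactive, delivered, dropped) as seen by the sending member.
    """
    d1 = datetime(2023, 11, 21)
    last_rcv_ts = d1  # assumption: member was last heard from on day D1
    unanswered = 0
    inactive = False
    delivered = dropped = 0
    sends = [d1] * n1 + [d1 + timedelta(days=gap_days)] * n2
    for now in sends:
        # The condition from the RFC, checked before each send.
        if not inactive and unanswered > K and now - last_rcv_ts > D_DIFF:
            inactive = True
        if inactive:
            dropped += 1  # message is not queued for the member
        else:
            unanswered += 1
            delivered += 1
    return inactive, delivered, dropped
```

Under these assumptions, `simulate(3, 12, 3)` returns `(True, 11, 4)`: the member is marked inactive in the middle of a single burst and the last four messages are never queued. `simulate(3, 12, 1)` returns `(False, 15, 0)`, because the time threshold is not yet exceeded.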

The problem is less acute the greater the difference between K and M, but making K >> M renders this whole mechanism redundant, as we could then simply rely on QUOTA errors to mark group members inactive (and not do slow retries in the agent?).

Perhaps an acceptable way to solve this problem is to add a task to the cleanup manager that would send receipts to all members on condition: (unanswered_rcv_msg_count > 0) && (last_reply_ts earlier than 1 day ago). (This adds last_reply_ts to the tracking on the receiving side.)
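
The selection step of such a cleanup task could look roughly like the following Python sketch. The record type, the function name, and the `REPLY_INTERVAL` constant are hypothetical; only the condition itself comes from the paragraph above. Sending the receipts and updating `last_reply_ts` afterwards are left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# "1 day ago" from the proposed condition.
REPLY_INTERVAL = timedelta(days=1)


@dataclass
class RcvMemberState:
    """Hypothetical receiving-side record for one group member."""
    member_id: str
    unanswered_rcv_msg_count: int
    last_reply_ts: datetime


def members_needing_receipt(
    members: Iterable[RcvMemberState], now: datetime
) -> List[str]:
    """Return ids of members that should be sent a receipt by the cleanup task."""
    return [
        m.member_id
        for m in members
        if m.unanswered_rcv_msg_count > 0
        and now - m.last_reply_ts > REPLY_INTERVAL
    ]
```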
