Presence fix for idle mode #15989
Changes from all commits: d6a6dff, 9f2fa60, b4e3987, f350e01, dc8cbdd, bf71e78, 6b7e273, 183e5f5
```diff
@@ -0,0 +1 @@
+Allow presence to achieve idle state.
```
```diff
@@ -19,6 +19,7 @@
 The methods that define policy are:
 - PresenceHandler._update_states
 - PresenceHandler._handle_timeouts
+- PresenceHandler.set_state
 - should_notify
 """
 import abc
```
```diff
@@ -444,20 +445,18 @@ def send_user_sync(self, user_id: str, is_syncing: bool, last_sync_ms: int) -> N
         )

     def mark_as_coming_online(self, user_id: str) -> None:
-        """A user has started syncing. Send a UserSync to the presence writer,
-        unless they had recently stopped syncing.
-        """
-        going_offline = self.users_going_offline.pop(user_id, None)
-        if not going_offline:
-            # Safe to skip because we haven't yet told the presence writer they
-            # were offline
-            self.send_user_sync(user_id, True, self.clock.time_msec())
+        """A user has started syncing. Send a UserSync to the presence writer."""
+        # Need to remove the user_id from users_going_offline or the looping call will
+        # tell the presence writer that the user went offline.
+        self.users_going_offline.pop(user_id, None)
+        self.send_user_sync(user_id, True, self.clock.time_msec())

     def mark_as_going_offline(self, user_id: str) -> None:
-        """A user has stopped syncing. We wait before notifying the presence
-        writer as its likely they'll come back soon. This allows us to avoid
-        sending a stopped syncing immediately followed by a started syncing
-        notification to the presence writer
+        """A user has stopped syncing. We allow the timeout (UPDATE_SYNCING_USERS_MS)
+        to expire and the looping_call to handle notifying the presence writer as it's
+        likely they'll come back soon. This allows us to avoid sending a stopped
+        syncing immediately followed by a started syncing notification to the
+        presence writer
         """
         self.users_going_offline[user_id] = self.clock.time_msec()
```
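The going-offline debounce that `mark_as_coming_online` / `mark_as_going_offline` implement can be sketched in isolation. The following is a minimal, hypothetical model of the pattern (the `SyncTracker` class and the constant's value are assumptions for illustration, not Synapse's actual code):

```python
# Mirrors Synapse's UPDATE_SYNCING_USERS_MS (10s); value assumed for illustration.
UPDATE_SYNCING_USERS_MS = 10_000


class SyncTracker:
    """Hypothetical model of the going-offline debounce described above."""

    def __init__(self) -> None:
        self.users_going_offline: dict = {}
        self.sent: list = []  # (user_id, is_syncing) pairs sent to the presence writer

    def send_user_sync(self, user_id: str, is_syncing: bool) -> None:
        self.sent.append((user_id, is_syncing))

    def mark_as_coming_online(self, user_id: str) -> None:
        # Remove any pending "going offline" entry so the periodic task
        # won't tell the presence writer the user went offline.
        self.users_going_offline.pop(user_id, None)
        self.send_user_sync(user_id, True)

    def mark_as_going_offline(self, user_id: str, now_ms: int) -> None:
        # Don't notify immediately; the user will likely sync again soon.
        self.users_going_offline[user_id] = now_ms

    def looping_call(self, now_ms: int) -> None:
        # Periodic task: anyone quiet for UPDATE_SYNCING_USERS_MS goes offline.
        for user_id, went_ms in list(self.users_going_offline.items()):
            if now_ms - went_ms > UPDATE_SYNCING_USERS_MS:
                del self.users_going_offline[user_id]
                self.send_user_sync(user_id, False)


tracker = SyncTracker()
tracker.mark_as_coming_online("@foo:bar")
tracker.mark_as_going_offline("@foo:bar", now_ms=0)
tracker.mark_as_coming_online("@foo:bar")  # re-synced within the window
tracker.looping_call(now_ms=20_000)
# No (user, False) entry: the offline notification was suppressed.
print(tracker.sent)  # [('@foo:bar', True), ('@foo:bar', True)]
```

Because the user re-synced before the looping call fired, no "stopped syncing" notification is ever sent, which is the behaviour both the old and new versions of the code aim for.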
```diff
@@ -486,11 +485,12 @@ async def user_syncing(
             if prev_state.state != PresenceState.BUSY:
                 # We set state here but pass ignore_status_msg = True as we don't want to
                 # cause the status message to be cleared.
-                # Note that this causes last_active_ts to be incremented which is not
-                # what the spec wants: see comment in the BasePresenceHandler version
-                # of this function.
+                # Note that this causes last_active_ts to be incremented only if changing
+                # state to 'online' from 'offline'.
                 await self.set_state(
-                    UserID.from_string(user_id), {"presence": presence_state}, True
+                    UserID.from_string(user_id),
+                    {"presence": presence_state},
+                    ignore_status_msg=True,
                 )

             curr_sync = self._user_to_num_current_syncs.get(user_id, 0)
```
```diff
@@ -1004,13 +1004,12 @@ async def user_syncing(
         # If they're busy then they don't stop being busy just by syncing,
         # so just update the last sync time.
         if prev_state.state != PresenceState.BUSY:
             # XXX: We set_state separately here and just update the last_active_ts above
             # This keeps the logic as similar as possible between the worker and single
-            # process modes. Using set_state will actually cause last_active_ts to be
-            # updated always, which is not what the spec calls for, but synapse has done
-            # this for... forever, I think.
+            # process modes.
             await self.set_state(
-                UserID.from_string(user_id), {"presence": presence_state}, True
+                UserID.from_string(user_id),
+                {"presence": presence_state},
+                ignore_status_msg=True,
             )
         # Retrieve the new state for the logic below. This should come from the
         # in-memory cache.
```
```diff
@@ -1254,9 +1253,12 @@ async def set_state(
         if not ignore_status_msg:
             new_fields["status_msg"] = status_msg

-        if presence == PresenceState.ONLINE or (
-            presence == PresenceState.BUSY and self._busy_presence_enabled
-        ):
+        if (
+            prev_state.state == PresenceState.OFFLINE
+            and presence == PresenceState.ONLINE
+        ) or (presence == PresenceState.BUSY and self._busy_presence_enabled):
             # By updating last_active_ts in this way, currently_active will be triggered
             # in handle_update()
             new_fields["last_active_ts"] = self.clock.time_msec()

         await self._update_states(
```

**Reviewer:** I'm failing to follow why this is specifically for offline -> online; naively I would have expected it to apply when going from !online -> online. (The busy presence bit seems like it might have the same bug?)

**Author:** I had a reason, which is entirely escaping me right now (and appears to be missing from my notes). Let me test it out, because if this holds up we can remove the 'hacky' bit in

**Reviewer:** OK! The various possible state transitions are rather fuzzy in my head, which is why I'm asking. 😄

**Author:** I went ahead and made those changes. No deal (at least as-is). It will need something else to handle the fact that some clients still set presence via the presence endpoint and not through
https://github.com/matrix-org/synapse/actions/runs/5745219286
I left my desktop client open (so it should have been syncing in the background, at least until it went to sleep). But then I checked in real quick with my Android mobile; a few minutes later this nonsense started spamming federation.
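The effect of the new guard in `set_state` can be expressed as a pure function. This is a sketch under stated assumptions: the helper name `should_bump_last_active` is made up, and plain strings stand in for Synapse's `PresenceState` constants:

```python
def should_bump_last_active(
    prev_state: str,
    presence: str,
    busy_presence_enabled: bool,
) -> bool:
    """Sketch of the new condition in set_state(): only a genuine
    offline -> online transition (or an enabled busy state) is treated
    as activity worth bumping last_active_ts for."""
    return (prev_state == "offline" and presence == "online") or (
        presence == "busy" and busy_presence_enabled
    )


# Before this PR, any set_state to online bumped last_active_ts; now a
# sync that merely repeats "online" leaves it alone.
print(should_bump_last_active("offline", "online", False))  # True
print(should_bump_last_active("online", "online", False))   # False
```

Note that, per the review thread above, transitions like unavailable -> online are deliberately excluded here, which is exactly the point the reviewer questioned.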
```diff
@@ -1954,6 +1956,9 @@ def handle_update(
     - persist_and_notify: whether to persist and notify people
     - federation_ping: whether we should send a ping over federation
     """
+    # Note: if the previous state was OFFLINE and we are now ONLINE, this is most likely
+    # caused by a call to the /sync endpoint. If that is the case, last_active_ts would
+    # have been set in PresenceHandler.set_state()
     user_id = new_state.user_id

     persist_and_notify = False
```

```diff
@@ -1977,6 +1982,14 @@ def handle_update(
                 obj=user_id,
                 then=new_state.last_active_ts + LAST_ACTIVE_GRANULARITY,
             )
+            if prev_state.state == PresenceState.UNAVAILABLE:
+                # If we had idled out, but are still syncing, then new state will look
+                # like it's online. Verify with last_active_ts, as that is a sign of a
+                # pro-active event. Override that we are online when we probably aren't.
+                if now - new_state.last_active_ts > IDLE_TIMER:
+                    new_state = new_state.copy_and_replace(
+                        state=PresenceState.UNAVAILABLE
+                    )

     if new_state.state != PresenceState.OFFLINE:
         # User has stopped syncing
```

**Reviewer** (on lines +1985 to +1992): This feels a bit hacky -- can we avoid trying to update in this case at all, instead of backtracking from saying we're online to unavailable?
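The idle-override branch added to `handle_update` can be modelled as a small pure function. This is a sketch only: the `IdleState` dataclass, the `override_idle` helper, and the constant's value are assumptions standing in for Synapse's real `UserPresenceState` and `IDLE_TIMER`:

```python
from dataclasses import dataclass, replace

IDLE_TIMER = 5 * 60 * 1000  # stand-in for Synapse's idle timer (value assumed)


@dataclass(frozen=True)
class IdleState:
    """Minimal stand-in for UserPresenceState."""
    state: str
    last_active_ts: int


def override_idle(prev: IdleState, new: IdleState, now: int) -> IdleState:
    """Sketch of the new handle_update() branch: if the user had idled out
    and nothing pro-active has moved last_active_ts, keep them unavailable
    even though syncing reports them as online."""
    if (
        prev.state == "unavailable"
        and new.state == "online"
        and now - new.last_active_ts > IDLE_TIMER
    ):
        return replace(new, state="unavailable")
    return new


now = 5_000_000
idle_since = now - IDLE_TIMER - 1
prev = IdleState("unavailable", idle_since)
stale = IdleState("online", idle_since)  # sync says online, but no activity
print(override_idle(prev, stale, now).state)  # unavailable

fresh = IdleState("online", now)  # a genuine pro-active event just happened
print(override_idle(prev, fresh, now).state)  # online
```

The key input is `last_active_ts`: a sync alone no longer moves it, so a stale timestamp is the signal that the apparent "online" is really continued idleness.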
```diff
@@ -306,6 +306,37 @@ def test_online_to_idle(self) -> None:
             any_order=True,
         )

+    def test_idle_while_syncing_stays_idle(self) -> None:
+        """Test that syncing alone doesn't affect unavailable state"""
+        wheel_timer = Mock()
+        user_id = "@foo:bar"
+        now = 5000000
+        time_went_idle = now - IDLE_TIMER - 1
+
+        prev_state = UserPresenceState.default(user_id)
+        prev_state = prev_state.copy_and_replace(
+            state=PresenceState.UNAVAILABLE,
+            last_active_ts=time_went_idle,
+            currently_active=False,
+        )
+
+        # The key to the test here is that while syncing, we are being told we are
+        # 'online'. But if the last_active_ts hasn't moved, then we are actually still
+        # idle.
+        new_state = prev_state.copy_and_replace(state=PresenceState.ONLINE)
+
+        state, persist_and_notify, federation_ping = handle_update(
+            prev_state, new_state, is_mine=True, wheel_timer=wheel_timer, now=now
+        )
+
+        self.assertFalse(persist_and_notify)
+        self.assertEqual(prev_state.state, state.state)
+        self.assertEqual(state.last_federation_update_ts, now)
+        self.assertEqual(prev_state.state, state.state)
+        self.assertEqual(prev_state.status_msg, state.status_msg)
+
+        self.assertEqual(wheel_timer.insert.call_count, 2)
+
     def test_persisting_presence_updates(self) -> None:
         """Tests that the latest presence state for each user is persisted correctly"""
         # Create some test users and presence states for them
```
```diff
@@ -773,6 +804,32 @@ def test_set_presence_from_syncing_keeps_busy(
         # we should still be busy
         self.assertEqual(state.state, PresenceState.BUSY)

+    def test_syncing_appropriately_handles_last_active_ts(self) -> None:
+        """Test that syncing does not bump last_active_ts when already online"""
+
+        user_id = "@test:server"
+
+        # Start from a time that is not zero
+        self.reactor.advance(2)
+
+        self._set_presencestate_with_status_msg(user_id, PresenceState.ONLINE, None)
+
+        prev_state = self.get_success(
+            self.presence_handler.get_state(UserID.from_string(user_id))
+        )
+
+        # Poke the reactor so some time passes
+        self.reactor.advance(2)
+        self.get_success(
+            self.presence_handler.user_syncing(user_id, True, PresenceState.ONLINE)
+        )
+
+        state = self.get_success(
+            self.presence_handler.get_state(UserID.from_string(user_id))
+        )
+        # last_active_ts should not have changed as no pro-active event has occurred
+        self.assertEqual(prev_state.last_active_ts, state.last_active_ts)
+
     def _set_presencestate_with_status_msg(
         self, user_id: str, state: str, status_msg: Optional[str]
     ) -> None:
```

**Reviewer** (on lines +830 to +831): @erikjohnston and I went through this and we think there's a misunderstanding here --

**Author:** Reading just that summary, that sounds like a recipe for disaster. start:

**Reviewer:** My understanding is that updating the

**Author:** Ah yes, https://spec.matrix.org/v1.7/client-server-api/#last-active-ago

**Author:** To be clear, I wasn't done in this area. This doesn't completely fix that. To fix
State changes are separate from this, and this PR is only supposed to be about making the state change work. I'm sorry this is making problems. I can close this and separate it all out if you would rather. (But I'm not updating the pictures; those were a pain to do the first time.)
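The spec section linked above defines `last_active_ago` as the time since the user's last pro-active event, derived on the fly from the stored timestamp. A tiny sketch of that derivation (the helper name is made up) shows why bumping `last_active_ts` on every sync was wrong:

```python
def last_active_ago(now_ms: int, last_active_ts: int) -> int:
    """Per the Matrix spec, last_active_ago is the duration since the
    user's last pro-active event, derived from the stored last_active_ts."""
    return now_ms - last_active_ts


now = 5_000_000
proactive_ts = 4_700_000  # last real user activity, 5 minutes earlier

# Correct: reported inactivity keeps growing while the client only syncs.
print(last_active_ago(now, proactive_ts))  # 300000

# The pre-PR behaviour of bumping last_active_ts on every sync would keep
# resetting this toward 0, making an idle user look permanently active.
```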
**Reviewer:** I'm not quite following why we remove the if-statement here. Can you explain a bit more what this is doing?

**Author:** I mentioned that above in Number 2 under "Three-fold fix". Some context to keep in mind before I begin the long version:

- `users_going_offline` and at least 10 seconds has passed. These timers are not synchronized in any way and a theoretical 19 seconds could occur. The looping call uses the companion to `send_user_sync()` called `send_stop_syncing()`, which makes an explicit call to the presence writer to mark them 'offline'.
- Now, in `_end()` of the sync context manager, it used to add `user_id` as a key to `users_going_offline` with a value of the current time in milliseconds. `user_id` is `pop()`ped off of `users_going_offline` and checked for `None` (or more specifically, it would check that there was `not` a value), and it would only then `send_user_sync()`. This only occurs when it is the very first sync of a recent-ish session; otherwise it will not be `None` and `send_user_sync()` would not be called.
- `send_stop_syncing()` and mark them offline). `USER_SYNC` has come over TCP replication, and remember that `set_state()` no longer bumps `last_active_ts` on a regular basis anymore. The machinery kicks into gear to mark the user as 'offline', which then gets sent over replication back to the sync handler side. The sync handler (`WorkerPresenceHandler`, actually) thinks the user is 'online'. The call from the presence writer will overwrite what the context manager thinks at this point. Still no update to `last_active_ts`, by the way.

Now, you are probably thinking, "Well, let's just remove the `not` in the conditional to check for an actual time". I considered that, but then the presence writer doesn't get its first notification that the user is syncing, and that felt like undefined behavior. Honestly, it might work out (and it's a place I'm planning on revisiting in the future as a bit of a refactor; there are more things to touch here). I will test this if you wish; my instinct says as long as the second sync of a session is within 10 seconds then everything will be fine.

And, to be fair, fully fixing everything possible wasn't the scope of this PR. I wanted 'unavailable' to work as advertised. But this was affecting the behavior in a negative way and wasn't going to be allowed to get away, a traditional 'bug' that was fully exposed by `last_active_ts` not being updated on every single sync any longer.
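The "theoretical 19 seconds" mentioned above falls out of two unsynchronized 10-second timers: the age threshold on the `users_going_offline` entry and the period of the looping call. A quick worked calculation, with both values assumed to be 10 seconds as the discussion suggests:

```python
UPDATE_SYNCING_USERS_MS = 10_000  # age threshold before a user is marked offline (assumed)
LOOP_PERIOD_MS = 10_000           # looping-call period (assumed)


def offline_notification_delay(mark_ms: int) -> int:
    """How long after being marked as going offline at mark_ms is the user
    actually flagged, given ticks at multiples of LOOP_PERIOD_MS and a
    strict age check against UPDATE_SYNCING_USERS_MS?"""
    tick = 0
    while True:
        tick += LOOP_PERIOD_MS
        if tick - mark_ms > UPDATE_SYNCING_USERS_MS:
            return tick - mark_ms


# Marked just after a tick: not old enough at the next tick, so the user
# survives until the tick after that -- almost two full periods.
print(offline_notification_delay(1))      # 19999

# Marked just before a tick: flagged at the following tick instead.
print(offline_notification_delay(9_999))  # 10001
```

So the real-world delay floats anywhere between roughly 10 and just under 20 seconds depending on where the sync ends relative to the looping call's schedule, which is the lack of synchronization the comment is pointing at.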