Fix broken notification incremental sync #2701

texuf · 2022-09-08T01:09:12Z

I was not seeing unread notifications in sync, even if they were written to the db

Notifications are in their own stream, but the code was trying to tack them onto the join room stream. If the offsets “happened” to line up, you might get a count here or there, but they would be totally wrong (jump from 1 to 0 to 2, etc)

To fix, put them in their own top level object, handle them on the client.

Signed-off-by: Austin Ellis <austin@hntlabs.com>

Pull Request Checklist

I have added added tests for PR or I have justified why this PR doesn't need tests.
Pull request includes a sign off

texuf · 2022-09-08T02:59:49Z

Tests are broken, but I see the same errors on other PRs

S7evinK

The upgrade tests are currently broken, but Complement seems to be unhappy with the change.
The problem is that the SyncAPI sends the event to the UserAPI, which in turn sends the unread count. This can race and we receive the unread count way after we left the room. With this change we would add the room to the response, even if we might already have left the room.

#2688 is using a different approach, by listening on a different stream and producing the unread counts earlier. Unfortunately, this can still race, but is more reliable (in my tests).

texuf · 2022-09-10T17:43:26Z

Thank you for the quick response. If you don't mind, can you answer two questions for me?

Where does UnreadNotificationCount get set? I see s.db.UpsertRoomUnreadNotificationCounts(ctx, userID, data.RoomID, data.UnreadNotificationCount, data.UnreadHighlightCount) in userapi.go, but I don't understand how that data is created. Searching matrix org for UnreadNotificationCount or unread_notification_count isn't bringing up any relevant results for me. Is this something built into jetstream? Can you point me to the right place?
Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

Thank you for your time!
Austin

0x1a8510f2 · 2022-09-10T20:51:34Z

I don't know the answer to the first but I'll try to answer the second question:

Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

The client may or may not ignore the notification, but it's not a good idea to assume this about a client and rely on that behavior because there are no guarantees and it could confuse users. In addition, it causes unnecessary network traffic which can mean additional mobile data usage, more battery usage and slower notifications. It's not the end of the world but it's not a good design.

texuf · 2022-09-11T01:03:46Z

I don't know the answer to the first but I'll answer the second question:

Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

The client may or may not ignore the notification, but it's not a good idea to assume this about a client and rely on that behavior because there are no guarantees and it could confuse users. In addition, it causes unnecessary network traffic which can mean additional mobile data usage, more battery usage and slower notifications. It's not the end of the world but it's not a good design.

Usually you want to deliver notifications either at least once or exactly once. As a user I'm pretty used to seeing a text message get delivered again if I'm in a spotty network (though I haven't seen it in a while). Currently dendrite is delivering notifications sometimes. Just something to think about.

0x1a8510f2 · 2022-09-11T01:33:43Z

Usually you want to deliver notifications either at least once or exactly once. As a user I'm pretty used to seeing a text message get delivered again if I'm in a spotty network (though I haven't seen it in a while).

I believe the issue with this change is not that the notification gets re-delivered but rather that it gets delivered potentially after a room has been left by the user, right? In which case delivering the notification is bad for the aforementioned reasons and because the user is unable to view the message anyway (as they have left the room) so it is confusing.

Currently dendrite is delivering notifications sometimes. Just something to think about.

I agree that this should be fixed but I think the point being made is that this isn't a suitable solution.

texuf · 2022-09-12T16:06:24Z

You are right, that was a bad example, with this change the notifications won't get redelivered. Let me rephrase.

I'm a user. I just left a space (which I hardly ever do) and I receive a notification saying I have a new message in that space. Seeing that I just left the space, I'm probably a little confused, but I remember that matrix is a federated distributed system run by a non profit and it's not perfect, and I go about my day.

OR, I'm a user, and I never get unread notifications. I miss new messages in conversations that I'm trying to participate in all the time. I'm probably going to stop using this product.

I'm sorry if I'm coming across as harsh, I'm not meaning to be. I'm saying this with smiles that you would be able to see if we were in person. I just want to be as direct as possible because it feels like I'm missing something.

texuf · 2022-09-13T18:58:07Z

Okay! I went through some more tests on my end and you were right, this wasn't the right fix, the join message is the wrong place for this data. I updated my code. It leaves everything else in place and adds a new object to the response that matches the sync pattern you have going on here. It requires client changes but they're pretty simple in theory (in practice I added my own memory store that allows me to get access to the raw sync data and pulled the numbers from there.)

I'd love feedback, but also fine if you just close this request.

I was not seeing unread notifications in sync, even if they were written to the db Notifications are in their own stream, but the code was trying to tack them onto the join room stream. If the offsets “happened” to line up, you might get a count here or there, but they would be totally wrong (jump from 1 to 0 to 2, etc) To fix, put them in their own top level object, handle them on the client. Signed-off-by: Austin Ellis <austin@hntlabs.com>

texuf · 2022-09-15T05:36:11Z

In case anyone is still listening to this thread, if I'm reading the code correctly, it looks like we cleanup the notification table after a day


func (s *notificationsStatements) Clean(ctx context.Context, txn *sql.Tx) error {
	_, err := sqlutil.TxStmt(txn, s.cleanNotificationsStmt).ExecContext(
		ctx,
		time.Now().AddDate(0, 0, -1).UnixNano()/int64(time.Millisecond), // keep non-highlights for a day
		time.Now().AddDate(0, -1, 0).UnixNano()/int64(time.Millisecond), // keep highlights for a month
	)
	return err
}

But we don't clean up the notifications in the syncAPI table.

When we send a read receipt we first do a updated _, err := s.db.SetNotificationsRead(ctx, localpart, roomID, int64(read.Read), true) and only forward the message on if the table was updated. If a user waits more than a day to send a read receipt, they can't clear their notifications.

I couldn’t clear notifications for messages greater than 1 day old because they no longer existed :)

texuf · 2022-09-16T18:08:21Z

userapi/consumers/syncapi_readupdate.go

 		}
+		//}


this bug probably happens with fully read too, i haven't tested that yet.

S7evinK · 2022-09-27T13:17:16Z

Closing this, since #2688 is now merged and solves the same problem.

#2743) …ce {}, a slice of interface` in new notifications select The sqlite3 version was just not working, original pr here: #2688 signed off by: austin ellis <austin@hntlabs.com> This doesn't fix the notification counts, they still only work about 1 out of every 5 times in my tests. I will stick with my other fix locally for reliable notification delivery: #2701

texuf requested a review from a team as a code owner September 8, 2022 01:09

S7evinK suggested changes Sep 8, 2022

View reviewed changes

neilalexander force-pushed the main branch from c648e2b to 7595fbf Compare September 12, 2022 12:24

texuf force-pushed the austin.ellis/broken-notification-count branch from d26af75 to 451b09c Compare September 13, 2022 18:55

texuf force-pushed the austin.ellis/broken-notification-count branch from 451b09c to 4f3f982 Compare September 13, 2022 18:59

texuf force-pushed the austin.ellis/broken-notification-count branch from 4f3f982 to 4e30e03 Compare September 13, 2022 19:17

Disable the updated check to enable marking messages > 1 day old as read

af369c3

I couldn’t clear notifications for messages greater than 1 day old because they no longer existed :)

texuf commented Sep 16, 2022

View reviewed changes

userapi/consumers/syncapi_readupdate.go

}

//}

Copy link

Contributor Author

texuf Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this bug probably happens with fully read too, i haven't tested that yet.

S7evinK closed this Sep 27, 2022

texuf mentioned this pull request Sep 28, 2022

Fix for `sql: converting argument $1 type: unsupported type []interfa… #2743

Merged

matrixbot mentioned this pull request Nov 1, 2024

Fix broken notification incremental sync element-hq/dendrite#2701

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix broken notification incremental sync #2701

Fix broken notification incremental sync #2701

texuf commented Sep 8, 2022 •

edited

Loading

texuf commented Sep 8, 2022

S7evinK left a comment

texuf commented Sep 10, 2022

0x1a8510f2 commented Sep 10, 2022 •

edited

Loading

texuf commented Sep 11, 2022

0x1a8510f2 commented Sep 11, 2022

texuf commented Sep 12, 2022

texuf commented Sep 13, 2022

texuf commented Sep 15, 2022

texuf Sep 16, 2022

S7evinK commented Sep 27, 2022

Fix broken notification incremental sync #2701

Fix broken notification incremental sync #2701

Conversation

texuf commented Sep 8, 2022 • edited Loading

Pull Request Checklist

texuf commented Sep 8, 2022

S7evinK left a comment

Choose a reason for hiding this comment

texuf commented Sep 10, 2022

0x1a8510f2 commented Sep 10, 2022 • edited Loading

texuf commented Sep 11, 2022

0x1a8510f2 commented Sep 11, 2022

texuf commented Sep 12, 2022

texuf commented Sep 13, 2022

texuf commented Sep 15, 2022

texuf Sep 16, 2022

Choose a reason for hiding this comment

S7evinK commented Sep 27, 2022

texuf commented Sep 8, 2022 •

edited

Loading

0x1a8510f2 commented Sep 10, 2022 •

edited

Loading