Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Fix broken notification incremental sync #2701

Closed

Conversation

texuf
Copy link
Contributor

@texuf texuf commented Sep 8, 2022

I was not seeing unread notifications in sync, even if they were written to the db

Notifications are in their own stream, but the code was trying to tack them onto the join room stream. If the offsets “happened” to line up, you might get a count here or there, but they would be totally wrong (jump from 1 to 0 to 2, etc)

To fix, put them in their own top level object, handle them on the client.

Signed-off-by: Austin Ellis <austin@hntlabs.com>

Pull Request Checklist

  • I have added added tests for PR or I have justified why this PR doesn't need tests.
  • Pull request includes a sign off

@texuf texuf requested a review from a team as a code owner September 8, 2022 01:09
@texuf
Copy link
Contributor Author

texuf commented Sep 8, 2022

Tests are broken, but I see the same errors on other PRs

Copy link
Contributor

@S7evinK S7evinK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upgrade tests are currently broken, but Complement seems to be unhappy with the change.
The problem is that the SyncAPI sends the event to the UserAPI, which in turn sends the unread count. This can race and we receive the unread count way after we left the room. With this change we would add the room to the response, even if we might already have left the room.

#2688 is using a different approach, by listening on a different stream and producing the unread counts earlier. Unfortunately, this can still race, but is more reliable (in my tests).

@texuf
Copy link
Contributor Author

texuf commented Sep 10, 2022

Thank you for the quick response. If you don't mind, can you answer two questions for me?

  1. Where does UnreadNotificationCount get set? I see s.db.UpsertRoomUnreadNotificationCounts(ctx, userID, data.RoomID, data.UnreadNotificationCount, data.UnreadHighlightCount) in userapi.go, but I don't understand how that data is created. Searching matrix org for UnreadNotificationCount or unread_notification_count isn't bringing up any relevant results for me. Is this something built into jetstream? Can you point me to the right place?

  2. Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

Thank you for your time!
Austin

@0x1a8510f2
Copy link
Contributor

0x1a8510f2 commented Sep 10, 2022

I don't know the answer to the first but I'll try to answer the second question:

Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

The client may or may not ignore the notification, but it's not a good idea to assume this about a client and rely on that behavior because there are no guarantees and it could confuse users. In addition, it causes unnecessary network traffic which can mean additional mobile data usage, more battery usage and slower notifications. It's not the end of the world but it's not a good design.

@texuf
Copy link
Contributor Author

texuf commented Sep 11, 2022

I don't know the answer to the first but I'll answer the second question:

Why would it matter if a notification was sent for a room you're no longer in? Wouldn't the client know you're not in that room and just ignore the notification? Is there something else that goes wrong?

The client may or may not ignore the notification, but it's not a good idea to assume this about a client and rely on that behavior because there are no guarantees and it could confuse users. In addition, it causes unnecessary network traffic which can mean additional mobile data usage, more battery usage and slower notifications. It's not the end of the world but it's not a good design.

Usually you want to deliver notifications either at least once or exactly once. As a user I'm pretty used to seeing a text message get delivered again if I'm in a spotty network (though I haven't seen it in a while). Currently dendrite is delivering notifications sometimes. Just something to think about.

@0x1a8510f2
Copy link
Contributor

Usually you want to deliver notifications either at least once or exactly once. As a user I'm pretty used to seeing a text message get delivered again if I'm in a spotty network (though I haven't seen it in a while).

I believe the issue with this change is not that the notification gets re-delivered but rather that it gets delivered potentially after a room has been left by the user, right? In which case delivering the notification is bad for the aforementioned reasons and because the user is unable to view the message anyway (as they have left the room) so it is confusing.

Currently dendrite is delivering notifications sometimes. Just something to think about.

I agree that this should be fixed but I think the point being made is that this isn't a suitable solution.

@texuf
Copy link
Contributor Author

texuf commented Sep 12, 2022

You are right, that was a bad example, with this change the notifications won't get redelivered. Let me rephrase.

I'm a user. I just left a space (which I hardly ever do) and I receive a notification saying I have a new message in that space. Seeing that I just left the space, I'm probably a little confused, but I remember that matrix is a federated distributed system run by a non profit and it's not perfect, and I go about my day.

OR, I'm a user, and I never get unread notifications. I miss new messages in conversations that I'm trying to participate in all the time. I'm probably going to stop using this product.

I'm sorry if I'm coming across as harsh, I'm not meaning to be. I'm saying this with smiles that you would be able to see if we were in person. I just want to be as direct as possible because it feels like I'm missing something.

@texuf texuf force-pushed the austin.ellis/broken-notification-count branch from d26af75 to 451b09c Compare September 13, 2022 18:55
@texuf
Copy link
Contributor Author

texuf commented Sep 13, 2022

Okay! I went through some more tests on my end and you were right, this wasn't the right fix, the join message is the wrong place for this data. I updated my code. It leaves everything else in place and adds a new object to the response that matches the sync pattern you have going on here. It requires client changes but they're pretty simple in theory (in practice I added my own memory store that allows me to get access to the raw sync data and pulled the numbers from there.)

I'd love feedback, but also fine if you just close this request.

@texuf texuf force-pushed the austin.ellis/broken-notification-count branch from 451b09c to 4f3f982 Compare September 13, 2022 18:59
I was not seeing unread notifications in sync, even if they were written to the db

Notifications are in their own stream, but the code was trying to tack them onto the join room stream. If the offsets “happened” to line up, you might get a count here or there, but they would be totally wrong (jump from 1 to 0 to 2, etc)

To fix, put them in their own top level object, handle them on the client.

Signed-off-by: Austin Ellis <austin@hntlabs.com>
@texuf texuf force-pushed the austin.ellis/broken-notification-count branch from 4f3f982 to 4e30e03 Compare September 13, 2022 19:17
@texuf
Copy link
Contributor Author

texuf commented Sep 15, 2022

In case anyone is still listening to this thread, if I'm reading the code correctly, it looks like we cleanup the notification table after a day


func (s *notificationsStatements) Clean(ctx context.Context, txn *sql.Tx) error {
	_, err := sqlutil.TxStmt(txn, s.cleanNotificationsStmt).ExecContext(
		ctx,
		time.Now().AddDate(0, 0, -1).UnixNano()/int64(time.Millisecond), // keep non-highlights for a day
		time.Now().AddDate(0, -1, 0).UnixNano()/int64(time.Millisecond), // keep highlights for a month
	)
	return err
}

But we don't clean up the notifications in the syncAPI table.

When we send a read receipt we first do a updated _, err := s.db.SetNotificationsRead(ctx, localpart, roomID, int64(read.Read), true) and only forward the message on if the table was updated. If a user waits more than a day to send a read receipt, they can't clear their notifications.

I couldn’t clear notifications for messages greater than 1 day old because they no longer existed :)
}
//}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this bug probably happens with fully read too, i haven't tested that yet.

@S7evinK
Copy link
Contributor

S7evinK commented Sep 27, 2022

Closing this, since #2688 is now merged and solves the same problem.

@S7evinK S7evinK closed this Sep 27, 2022
S7evinK pushed a commit that referenced this pull request Sep 28, 2022
#2743)

…ce {}, a slice of interface` in new notifications select

The sqlite3 version was just not working, original pr here:
#2688

signed off by: austin ellis <austin@hntlabs.com>

This doesn't fix the notification counts, they still only work about 1
out of every 5 times in my tests. I will stick with my other fix locally
for reliable notification delivery:
#2701
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants