Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stale device lists #2305

Closed
richvdh opened this issue Sep 20, 2016 · 11 comments · Fixed by matrix-org/matrix-android-sdk#132
Closed

Stale device lists #2305

richvdh opened this issue Sep 20, 2016 · 11 comments · Fixed by matrix-org/matrix-android-sdk#132
Labels
A-E2EE P1 S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect

Comments

@richvdh
Copy link
Member

richvdh commented Sep 20, 2016

STR:

  1. Alice and Bob share an e2e room
  2. Bob leaves all of the e2e rooms he shares with Alice
  3. Bob logs in on a new device
  4. Bob joins (the same, or a different) e2e room with Alice
  5. Alice sends a message
  6. Bob sees "Unknown inbound session ID"

The problem here is that Alice's client is using a stale device list for Bob.

It's a bit of an edge case, since it requires doing a complicated dance of leaving and joining rooms and logging in on new devices - but it's an edge case that people tend to hit during testing.

@richvdh
Copy link
Member Author

richvdh commented Sep 20, 2016

Some possible strategies for fixing:

  1. Force a device list refresh whenever someone joins an e2e room
  2. The above, but only if we don't already share an e2e room with them
  3. Do the above lazily somehow: store a flag which means we think the device list might be stale

@richvdh
Copy link
Member Author

richvdh commented Nov 23, 2016

This also happens even if Bob and Alice have never shared a room; Alice's client may still have cached an old list if she opened Bob's memberinfo.

@richvdh richvdh added S-Minor Impairs non-critical functionality or suitable workarounds exist P1 and removed P3 labels Nov 23, 2016
@richvdh richvdh changed the title Undecipherable messages after leaving all e2e rooms and rejoining Sometimes we end up with stale device lists Nov 24, 2016
@giordy
Copy link

giordy commented Nov 29, 2016

I confirm that this happens pretty frequently in my experience.
We are using Riot in 2 people from the web and from the app (both iOS and Android).

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2016

@giordy: this bug is something of an edge-case; it is unlikely you are seeing exactly this.

There are several potential causes of "Unknown inbound session ID" (many of them currently related to work still to be done on the mobile clients).

@giordy
Copy link

giordy commented Nov 29, 2016

@richvdh ok thanks for the clarification!

@richvdh richvdh changed the title Sometimes we end up with stale device lists Need to update device list when a user joins a room Dec 23, 2016
@richvdh
Copy link
Member Author

richvdh commented Dec 23, 2016

This is slightly tricky to fix from Alice's end. She can listen for new members while she is online, and do a device list download when she sees one, but that doesn't help when she is offline. During an intialsync, she can't tell the users who joined since she was last online from those who were there before (and she should thus expect to have an up-to-date device list for, or new_device events for).

Another idea would be to send a new_device event to all existing members when you join a room, thus prompting them to download your device list when they come online. However, this relies on the join events being well-ordered, which they are not (Alice and Bob could join at the same time on different servers; each of them see the room without the other when they first join; neither sends the other new_device events, expecting the other to sort it out).

We could just refresh the devicelist for all users in all encrypted rooms at the end of intialsync. This might be a bit high-traffic.

We could be a bit more intelligent in our caching, and store the timestamp of the devicelist response; then refresh the devicelist if we see join events (or new_device events) after that time. Probably requires server support to give us a timestamp, though.

@richvdh
Copy link
Member Author

richvdh commented Dec 24, 2016

I've been thinking about this a bit more. I've realised the current method of sending m.new_device events to everyone you think is in the room when you add a new device is inherently racy. Consider:

  • Bob is an existing member of the room
  • Alice joins the room, and downloads Bob's device list (or fails to download it, but doesn't mark the cache as invalid).
  • Alice's join event takes a while to reach Bob. Meanwhile, he adds a new device. He doesn't send an m.new_device to Alice, because he doesn't know she is a member.
  • Bob finally sees Alice's join event.

Alternatively:

  • Alice and Bob join a room at a similar time. Alice sees Bob's join event, but Bob does not (yet) see Alice's.
  • Currently this doesn't work, but let's assume Alice realises that Bob is a new member, and refreshes his device list.
  • Bob adds a new device, but because he doesn't realise that Alice is a member, doesn't send her a new_device.
  • Bob finally sees Alice's join event.

In either of these situations, Alice has a stale copy of Bob's device list. Worse, this situation persists until Alice has cause to refresh the device list.

The fundamental problem here is that the m.new_device mechanism relies on there being a well-ordering of join events. Even treating join events as cache invalidation indicators (which is tricky to do efficiently on initialsync, as above) doesn't fully resolve this problem.

One solution would be to record, in the room state, a sequence number (could be a timestamp, any monotonically increasing number will do) which increases each time you update the device list. Similarly, the device list results would contain a corresponding sequence number, which you record alongside your cache. When you see that a member's device sequence state is higher than your cached version, you know the cache is out of date.

@richvdh
Copy link
Member Author

richvdh commented Jan 11, 2017

I discussed my proposed solution with Erik. He suggested that:

  • using room state in this way is not ideal, due to the duplication between rooms, and because rollback in room state (due to auth failures etc) could make it unreliable. We could instead use a similar mechanism to presence updates (which he insists already has hooks to send updates when we see a new server join a room).
  • the use of a sequence number implies a lock, which it would be preferable to avoid. We could instead send the latest device list over federation. I remain slightly unclear on how client-side caches could be maintained in a way that doesn't end up just having locks elsewhere, but will take Erik's word on this for now.

@richvdh
Copy link
Member Author

richvdh commented Feb 3, 2017

Modulo #3126, this should be fixed as of riot v0.9.7 and synapse v0.19.0.

@richvdh richvdh closed this as completed Feb 3, 2017
@richvdh
Copy link
Member Author

richvdh commented Mar 1, 2017

Still broken. We don't update the device list of users when they join an e2e room. I think we need to update the device list when we first see the user join an e2e room.

To do this right, we probably need to maintain a flag which tracks whether we are keeping the device list for a given user up-to-date.

@richvdh richvdh reopened this Mar 1, 2017
@lampholder lampholder reopened this Apr 3, 2017
richvdh added a commit to matrix-org/matrix-js-sdk that referenced this issue Apr 25, 2017
Yet another attempt at fixing
element-hq/element-web#2305.

This now implements the algorithm described at
http://matrix.org/speculator/spec/HEAD/client_server/unstable.html#tracking-the-device-list-for-a-user:

* We now keep a flag to tell us which users' device lists we are tracking. That
  makes it much easier to figure out whether we should care about device-update
  notifications from /sync (thereby fixing
  element-hq/element-web#3588).

* We use the same flag to indicate whether the device list for a particular
  user is out of date. Previously we did this implicitly by only updating the
  stored sync token when the list had been updated, but that was somewhat
  complicated, and in any case didn't help in cases where we initiated the key
  download due to a user joining an encrypted room.

Also fixes element-hq/element-web#3310.
richvdh added a commit to matrix-org/matrix-js-sdk that referenced this issue Apr 25, 2017
Yet another attempt at fixing
element-hq/element-web#2305.

This now implements the algorithm described at
http://matrix.org/speculator/spec/HEAD/client_server/unstable.html#tracking-the-device-list-for-a-user:

* We now keep a flag to tell us which users' device lists we are tracking. That
  makes it much easier to figure out whether we should care about device-update
  notifications from /sync (thereby fixing
  element-hq/element-web#3588).

* We use the same flag to indicate whether the device list for a particular
  user is out of date. Previously we did this implicitly by only updating the
  stored sync token when the list had been updated, but that was somewhat
  complicated, and in any case didn't help in cases where we initiated the key
  download due to a user joining an encrypted room.

Also fixes element-hq/element-web#3310.
richvdh added a commit to matrix-org/matrix-js-sdk that referenced this issue Apr 25, 2017
Yet another attempt at fixing
element-hq/element-web#2305.

This now implements the algorithm described at
http://matrix.org/speculator/spec/HEAD/client_server/unstable.html#tracking-the-device-list-for-a-user:

* We now keep a flag to tell us which users' device lists we are tracking. That
  makes it much easier to figure out whether we should care about device-update
  notifications from /sync (thereby fixing
  element-hq/element-web#3588).

* We use the same flag to indicate whether the device list for a particular
  user is out of date. Previously we did this implicitly by only updating the
  stored sync token when the list had been updated, but that was somewhat
  complicated, and in any case didn't help in cases where we initiated the key
  download due to a user joining an encrypted room.

Also fixes element-hq/element-web#3310.
@richvdh
Copy link
Member Author

richvdh commented Apr 25, 2017

Hopefully finally fixed by matrix-org/matrix-js-sdk#425

@richvdh richvdh closed this as completed Apr 25, 2017
richvdh added a commit to matrix-org/matrix-js-sdk that referenced this issue Jul 6, 2017
We now rely on the server to track new devices, and tell us about them when
users add them, rather than forcing devices to announce themselves (see
element-hq/element-web#2305 for the whole backstory
there).

The necessary support for that has now been in all the clients and the server
for several months (since March or so). I now want to get rid of the
localstorage store, which this code is relying on, so now seems like a good
time to get rid of it. Yay.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-E2EE P1 S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants