[E2E] track one time keys count in sync response #237

Zil0 · 2018-06-24T02:29:43Z

This feature is currently undocumented, but quite self-explanatory.
Here is my try at the missing documentation, not yet reviewed: matrix-org/matrix-spec-proposals@fce04f0

I wonder if this crosses the red-line "never initiate a new request during /sync/ processing".

Signed-off-by: Valentin Deniaud <valentin.deniaud@inpt.fr>

non-Jedi

I feel not great about blocking the _sync call to do other IO, but I think it would be a better situation if it was formulated as a listener where other such things would be expected to live? Still need to think about this a little more. Maybe we need to make that configurable with instantiation of MatrixClient object?

non-Jedi · 2018-07-02T02:07:16Z

matrix_client/crypto/olm_device.py

+                 user_id,
+                 device_id,
+                 signed_otk_proportion=1,
+                 otk_threshold=0.1):


Is 0.1 the value used by Riot?

I hadn't checked while writing this. Looking at it now it appears Riot uploads keys as soon as it can, and considers that generating a lot of one-time keys at time is slow. I tend to think that it is a JS thing, as on my machine generating the max number of one-time keys is a lot faster than any network operation. @richvdh may have some insight?

According to what @richvdh told me, this should be fine.

non-Jedi · 2018-07-02T02:07:47Z

matrix_client/crypto/olm_device.py

            int(round(signed_otk_proportion * target_keys_number))
        self.otk_target_counts['curve25519'] = \
            int(round((1 - signed_otk_proportion) * target_keys_number))
+        if not 0 <= otk_threshold <= 1:


Move this validity check to top of method as well please.

non-Jedi · 2018-07-02T02:15:51Z

matrix_client/client.py

            if room_id in self.rooms:
                del self.rooms[room_id]

+        if self.encryption and 'device_one_time_keys_count' in response:


Any chance we could formulate this as a listener that gets automatically added if encryption=True so we can keep all _sync actions other than simply updating local state in a single place?

Mmmh, I'm not seeing how without adding a new listener loop, which seems pretty overkill for doing exactly the same thing differently.
However, this is only the first sync extension of 3 (device_list and to_device are added later, and are non-blocking), so adding a loop might be worthwhile, although I'm not sure how clear it would look. Since we should get that right now, you may want to have a look at how _sync ends up looking at the end: https://github.com/matrix-org/matrix-python-sdk/pull/241/files#diff-6770c150117812114aea5a8aab1b7da9L554

Ya. I see that listeners only work for events, and this is a top-level field in the response. Hm. After mulling it over, I think we should keep the listener system for user code.

I think this is probably fine for now, but could you add a # TODO comment above _sync mentioning this behavior so that when we get around to turning _sync into a proper public method with a docstring, we remember to document this?

Zil0 · 2018-07-02T14:25:45Z

We should also consider that it is very likely that under reasonable load the request will be triggered maybe once a day, or even once a week: someone claims one of our one-time key only the first time they speak to us in an encrypted room; the Olm session derived from this key is kept forever.
Anyway, another solution is to spawn a thread and add basic locking. This is about 5 loc, but seems overkill too.

uhoreg · 2018-07-11T04:17:15Z

Comparing with what the JS SDK does,

the JS SDK only generates and uploads 5 one-time keys at a time (https://github.com/matrix-org/matrix-js-sdk/blob/master/src/crypto/index.js#L284..L288) because generating the OTKs can take a long time. It should be faster in the Python SDK because he olm library is running native code, but it might be worth doing some timings to see if it should also limit the number of keys that it generates at a time
the JS SDK doesn't update the one-time-keys until after it has caught up on toDevice messages (https://github.com/matrix-org/matrix-js-sdk/blob/master/src/crypto/index.js#L943..L946 and Upload one-time keys on /sync rather than a timer matrix-js-sdk#376) to avoid deleting one-time keys that are about to be used. It doesn't look like you're handling toDevice messages yet (I guess that comes in a later PR), so this may or may not be a concern
it looks like it tries to upload as many one-time keys as possible, which I guess is the same as setting the keys_threshold parameter to 1, so maybe that should be the default value at https://github.com/matrix-org/matrix-python-sdk/pull/237/files#diff-b0fcf5d6f2d0d6eda3db1946fac9eb4aR42

Zil0 · 2018-07-12T10:48:16Z

Thanks for those details @uhoreg.

Here are the timings:

In [5]: %timeit a.generate_one_time_keys(100)
21.7 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit a.generate_one_time_keys(5)
1.08 ms ± 5.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [14]: %timeit requests.get('http://matrix.org/_matrix/client/versions')
101 ms ± 25.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I was too lazy to properly test the /keys/upload request but you get the idea. Also, my computer is pretty fast, but my network is too.
The conclusion I draw from this is that we should make as few requests as possible, by generating as much one-time keys as possible.

This reasoning also answers your last point. We don't want to set keys_threshold to 1 because then we would do a lot of requests with very few one-time keys. Also, this code is not asynchronous like JS, and we want to block as rarely as possible.
Let me know if you can think of a real-world scenario where having a low number of one-time keys on the HS is bad. As I see it, the worst case where we're just above the threshold, hence not triggering a new key generation, and more device than the HS has keys left want to talk to us for the first time at the exact same moment, thus not letting us replenish in time, will never happen under a normal load. Applications expecting huge traffic can set the threshold to something higher.

Your second point is a very good catch! toDevice messages support is added in another PR, but I missed that and they are effectively handled after the key update. Will fix :)

uhoreg · 2018-07-12T14:59:39Z

@Zil0 Thanks for the timings. That seems reasonable, then, at least for now. Perhaps in the future, we might want to make it configurable, but it should be fine for now. So this one seems good to go, aside from @non-Jedi's question of how to handle uploading the keys in the middle of the sync, which I don't really have an opinion on.

non-Jedi reviewed Jul 2, 2018

View reviewed changes

Zil0 added 2 commits July 13, 2018 17:34

track one time keys count in sync response

ef127cc

add one-time key counts tracking tests

f685755

non-Jedi merged commit f685755 into matrix-org:master Jul 13, 2018

[E2E] track one time keys count in sync response #237

[E2E] track one time keys count in sync response #237

Uh oh!

Conversation

Zil0 commented Jun 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

non-Jedi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Zil0 commented Jul 2, 2018

Uh oh!

uhoreg commented Jul 11, 2018

Uh oh!

Zil0 commented Jul 12, 2018

Uh oh!

uhoreg commented Jul 12, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Zil0 commented Jun 24, 2018 •

edited

Loading