New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up lazy loading #3827

Merged
merged 23 commits into from Sep 11, 2018

Conversation

Projects
None yet
2 participants
@ara4n
Member

ara4n commented Sep 7, 2018

Rather than pulling room member events out of state storage to calculate room summaries, instead generate it from the room_memberships table.

ara4n added some commits Sep 4, 2018

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 7, 2018

(this speeds up LL by about 2-3x)

@ara4n ara4n requested a review from erikjohnston Sep 7, 2018

ara4n added some commits Sep 7, 2018

@erikjohnston

I've only had a chance to look at the lazy loading bits of this. Can you please split out the fixes to get_filtered_current_state_ids to a separate PR please?

Things look broadly fine, other than its quite confusing trying to keep track of what r is any given situation. Hopefully the suggested changes will help clear things up a bit.

The only major thing is whether we are currently pulling the members out of the database. It feels odd to pull out 6 arbitrary users regardless of membership, while it seems the sync code assumes that e.g. if there are joined users in the room MemberSummary.members will be nonempty

" AND m.user_id = c.state_key"
" WHERE c.type = 'm.room.member' AND c.room_id = ?"
" GROUP BY m.membership"
)

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

FYI you can write:

sql = """
    SELECT count(*), m.membership FROM room_memberships as m 
    INNER JOIN current_state_events as c
    ON m.event_id = c.event_id
    AND m.room_id = c.room_id
    AND m.user_id = c.state_key
    WHERE c.type = 'm.room.member' AND c.room_id = ?
    GROUP BY m.membership
"""

rather than faffing around with quotes. It has the added benefit that it makes it much easier to c+p :)

This comment has been minimized.

@ara4n

ara4n Sep 10, 2018

Member

agreed; this was just trying to be consistent with what was already there.

txn.execute(sql, (room_id,))
res = {}
for r in txn:
summary = res.setdefault(to_ascii(r[1]), MemberSummary([], r[0]))

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

It'd be nicer to write

for count, membership in txn:
    ...

so that its easier to verify that things match the intent

dict of membership states, pointing to a MemberSummary named tuple.
"""
def f(txn):

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

Please name these something like _get_room_summary_txn or whatever, it makes stack traces a lot easier to understand

This comment has been minimized.

@ara4n

ara4n Sep 10, 2018

Member

(this was following the pattern already present, fwiw)

txn.execute(sql, (room_id, 6))
for r in txn:
summary = res.get(to_ascii(r[1]))
members = summary.members

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

Won't this explode if summary is None?

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

I suppose it'll definitely not be None due the shape of the queries, but then we should probably not use get to make the intent a bit clearer.

Show resolved Hide resolved synapse/storage/events.py
Membership.BAN
):
for r in details.get(m, ([], 0))[0]:
member_ids[r[0]] = r[1]

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

I'm really not following what this is doing. Can we choose variable names and/or unpack tuples to help clarity please?

Unpacking this a bit seems to give:

for user_id, event_id in details.get(membership, empty_ms).members:
     member_ids[user_id] = event_id

I think?

@@ -51,6 +51,12 @@
"ProfileInfo", ("avatar_url", "display_name")
)
# "members" points to a truncated list of (user_id, event_id) tuples for users of
# a given membership type, suitable for use in calculating heroes for a room.
# "count" points to the total numberr of users of a given membership type.

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

Is there a reason that members is a list of tuples rather than a dict?

This comment has been minimized.

@ara4n

ara4n Sep 10, 2018

Member

yes; i was worried about creating a new large cache in addition to the one we already maintain of room members, and so wanted it to be small (i'm assuming that tuples take up less room as they don't include key strings).

This comment has been minimized.

@erikjohnston

erikjohnston Sep 11, 2018

Member

(i'm assuming that tuples take up less room as they don't include key strings)

Well this tuple is including the same number of strings, right? It might save a bit by not including the hash map I guess

This comment has been minimized.

@erikjohnston

erikjohnston Sep 11, 2018

Member

If we're worried about the size of the cache then what we can do is have get_room_summary return an object with a custom __len__ implementation that returns the sum of the sizes of MemberSummary.members. This would ensure that the total number of tuples stored by the cache is less than the given cache size (as opposed to the number of room summaries being less than the given cache size)

Show resolved Hide resolved synapse/handlers/sync.py
# XXX: this may include false positives in the form of LL
# members which have snuck into state
batch.limited and
any(t == EventTypes.Member for (t, k) in state.keys())

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

Please don't use .keys() or .items() etc, they are expensive in py2 due to them copying the data into a list. Either use six.iteritems etc to get iterators, or in this case simply drop .keys() as iterating over a dict iterates over the keys anyway.

This comment has been minimized.

@ara4n

ara4n Sep 10, 2018

Member

TIL :)

# members which have snuck into state
batch.limited and
any(t == EventTypes.Member for (t, k) in state.keys())
) or
since_token is None
)

This comment has been minimized.

@erikjohnston

erikjohnston Sep 10, 2018

Member

Also, can this have a comment to explain why we're not always including the summary and when we do include a summary? The logic is non-trivial and its hard to know whether its right or not without a guiding comment.

ara4n added some commits Sep 10, 2018

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 10, 2018

Can you please split out the fixes to get_filtered_current_state_ids to a separate PR please?

done, it's back in #3792

The only major thing is whether we are currently pulling the members out of the database. It feels odd to pull out 6 arbitrary users regardless of membership, while it seems the sync code assumes that e.g. if there are joined users in the room MemberSummary.members will be nonempty

Right. i guess it needs an order by membership in there to make sure we prefer joined users over invited ones and only use left users in a disaster.

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 10, 2018

@ara4n ara4n assigned erikjohnston and unassigned ara4n Sep 10, 2018

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 11, 2018

i got an IRL lgtm from @erikjohnston on this. i think.

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 11, 2018

n.b. this is now twinned with matrix-org/sytest#489

@erikjohnston

Modulo perhaps wanting to further pin down the ordering of heroes

@ara4n

This comment has been minimized.

Member

ara4n commented Sep 11, 2018

the failed test here seems to be unrelated; merging anyway.

@ara4n ara4n merged commit b041115 into develop Sep 11, 2018

7 of 8 checks passed

ci/circleci: sytestpy2postgres Your tests failed on CircleCI
Details
Synapse Sytest Postgres (Commit) Build #6954 origin/matthew/speed-up-ll succeeded in 6 min 28 sec
Details
Synapse Sytest Postgres (Merged PR) Build finished.
Details
Synapse Sytest SQLite (Commit) Build #7157 origin/matthew/speed-up-ll succeeded in 3 min 29 sec
Details
Synapse Sytest SQLite (Merged PR) Build finished.
Details
ci/circleci: sytestpy2 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@hawkowl hawkowl deleted the matthew/speed-up-ll branch Sep 20, 2018

hawkowl added a commit that referenced this pull request Sep 24, 2018

Merge tag 'v0.33.5'
Features
--------

- Python 3.5 and 3.6 support is now in beta.
([\#3576](#3576))
- Implement `event_format` filter param in `/sync`
([\#3790](#3790))
- Add synapse_admin_mau:registered_reserved_users metric to expose
number of real reaserved users
([\#3846](#3846))

Bugfixes
--------

- Remove connection ID for replication prometheus metrics, as it creates
a large number of new series.
([\#3788](#3788))
- guest users should not be part of mau total
([\#3800](#3800))
- Bump dependency on pyopenssl 16.x, to avoid incompatibility with
recent Twisted.
([\#3804](#3804))
- Fix existing room tags not coming down sync when joining a room
([\#3810](#3810))
- Fix jwt import check
([\#3824](#3824))
- fix VOIP crashes under Python 3 (#3821)
([\#3835](#3835))
- Fix manhole so that it works with latest openssh clients
([\#3841](#3841))
- Fix outbound requests occasionally wedging, which can result in
federation breaking between servers.
([\#3845](#3845))
- Show heroes if room name/canonical alias has been deleted
([\#3851](#3851))
- Fix handling of redacted events from federation
([\#3859](#3859))
-  ([\#3874](#3874))
- Mitigate outbound federation randomly becoming wedged
([\#3875](#3875))

Internal Changes
----------------

- CircleCI tests now run on the potential merge of a PR.
([\#3704](#3704))
- http/ is now ported to Python 3.
([\#3771](#3771))
- Improve human readable error messages for threepid
registration/account update
([\#3789](#3789))
- Make /sync slightly faster by avoiding needless copies
([\#3795](#3795))
- handlers/ is now ported to Python 3.
([\#3803](#3803))
- Limit the number of PDUs/EDUs per federation transaction
([\#3805](#3805))
- Only start postgres instance for postgres tests on Travis CI
([\#3806](#3806))
- tests/ is now ported to Python 3.
([\#3808](#3808))
- crypto/ is now ported to Python 3.
([\#3822](#3822))
- rest/ is now ported to Python 3.
([\#3823](#3823))
- add some logging for the keyring queue
([\#3826](#3826))
- speed up lazy loading by 2-3x
([\#3827](#3827))
- Improved Dockerfile to remove build requirements after building
reducing the image size.
([\#3834](#3834))
- Disable lazy loading for incremental syncs for now
([\#3840](#3840))
- federation/ is now ported to Python 3.
([\#3847](#3847))
- Log when we retry outbound requests
([\#3853](#3853))
- Removed some excess logging messages.
([\#3855](#3855))
- Speed up purge history for rooms that have been previously purged
([\#3856](#3856))
- Refactor some HTTP timeout code.
([\#3857](#3857))
- Fix running merged builds on CircleCI
([\#3858](#3858))
- Fix typo in replication stream exception.
([\#3860](#3860))
- Add in flight real time metrics for Measure blocks
([\#3871](#3871))
- Disable buffering and automatic retrying in treq requests to prevent
timeouts. ([\#3872](#3872))
- mention jemalloc in the README
([\#3877](#3877))
- Remove unmaintained "nuke-room-from-db.sh" script
([\#3888](#3888))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment