
Faster partial join to room with complex auth graph #7

Merged
erikjohnston merged 5 commits into develop from erikj/fix_join Jan 10, 2024

erikjohnston (Member)

Instead of persisting outliers in a bunch of batches, let's just do them all at once.

This is fine because all `_auth_and_persist_outliers_inner` does is check the auth rules for each event, which only requires the events to be topologically sorted by the auth graph.
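A minimal sketch of the idea, assuming a simplified representation where each outlier is just an event ID mapped to its auth event IDs (a hypothetical shape, not Synapse's event classes). It uses the standard-library `graphlib.TopologicalSorter` to order the whole set once and then walks it in a single pass. The function names are illustrative, and the "check" only verifies that each event's in-batch auth events were handled first, standing in for the real auth rules.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def sort_by_auth_graph(auth_graph: Dict[str, List[str]]) -> List[str]:
    """Order event IDs so that each event comes after its auth events.

    `auth_graph` maps event_id -> auth_event_ids (hypothetical shape).
    Auth events outside the graph (e.g. already persisted) are ignored.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for event_id, auth_ids in auth_graph.items():
        sorter.add(event_id, *(a for a in auth_ids if a in auth_graph))
    # Raises graphlib.CycleError if the auth graph contains a cycle.
    return list(sorter.static_order())


def check_and_persist_all(auth_graph: Dict[str, List[str]]) -> List[str]:
    """Hypothetical single pass over every outlier, in topological order."""
    persisted: Set[str] = set()
    order = sort_by_auth_graph(auth_graph)
    for event_id in order:
        # Stand-in for the auth check: every auth event that is part of this
        # batch must already have been handled earlier in the same pass.
        pending = [
            a for a in auth_graph[event_id]
            if a in auth_graph and a not in persisted
        ]
        if pending:
            raise RuntimeError(f"{event_id} checked before {pending}")
        persisted.add(event_id)
    return order
```

Auth event IDs that are not part of the batch (for example, events that were already persisted) are skipped when building the graph, so they do not hold up the ordering.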

erikjohnston requested a review from a team as a code owner December 15, 2023 12:47
erikjohnston marked this pull request as draft December 15, 2023 12:47
erikjohnston removed the request for review from a team December 15, 2023 12:47
erikjohnston marked this pull request as ready for review January 4, 2024 11:04
erikjohnston requested a review from a team January 8, 2024 20:44
erikjohnston merged commit c3f2f0f into develop Jan 10, 2024
39 checks passed
erikjohnston deleted the erikj/fix_join branch January 10, 2024 12:29
yingziwu added a commit to yingziwu/synapse that referenced this pull request Feb 1, 2024
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Mar 3, 2024
!!! THIS CHANGES THE LICENSE TO AGPLv3 !!!


# Synapse 1.101.0 (2024-02-13)

### Bugfixes

- Fix performance regression when fetching auth chains from the DB. Introduced in v1.100.0. ([\#16893](element-hq/synapse#16893))




# Synapse 1.101.0rc1 (2024-02-06)

### Improved Documentation

- Fix broken links in the documentation. ([\#16853](element-hq/synapse#16853))
- Update macOS installation instructions to mention that libicu is optional. ([\#16854](element-hq/synapse#16854))
- The version picker now correctly lists versions after `v1.98.0`. ([\#16880](element-hq/synapse#16880))

### Internal Changes

- Add support for stabilised [MSC3981](matrix-org/matrix-spec-proposals#3981) that adds a `recurse` parameter on the `/relations` API. ([\#16842](element-hq/synapse#16842))



### Updates to locked dependencies

* Bump dorny/paths-filter from 2 to 3. ([\#16869](element-hq/synapse#16869))
* Bump gitpython from 3.1.40 to 3.1.41. ([\#16850](element-hq/synapse#16850))
* Bump hiredis from 2.2.3 to 2.3.2. ([\#16862](element-hq/synapse#16862))
* Bump jsonschema from 4.20.0 to 4.21.1. ([\#16887](element-hq/synapse#16887))
* Bump lxml-stubs from 0.4.0 to 0.5.1. ([\#16885](element-hq/synapse#16885))
* Bump mypy-zope from 1.0.1 to 1.0.3. ([\#16865](element-hq/synapse#16865))
* Bump phonenumbers from 8.13.26 to 8.13.29. ([\#16868](element-hq/synapse#16868))
* Bump pydantic from 2.5.3 to 2.6.0. ([\#16888](element-hq/synapse#16888))
* Bump sentry-sdk from 1.39.1 to 1.40.0. ([\#16889](element-hq/synapse#16889))
* Bump serde from 1.0.195 to 1.0.196. ([\#16867](element-hq/synapse#16867))
* Bump serde_json from 1.0.111 to 1.0.113. ([\#16866](element-hq/synapse#16866))
* Bump sigstore/cosign-installer from 3.3.0 to 3.4.0. ([\#16890](element-hq/synapse#16890))
* Bump types-pillow from 10.1.0.2 to 10.2.0.20240125. ([\#16864](element-hq/synapse#16864))
* Bump types-requests from 2.31.0.10 to 2.31.0.20240125. ([\#16886](element-hq/synapse#16886))
* Bump types-setuptools from 69.0.0.0 to 69.0.0.20240125. ([\#16863](element-hq/synapse#16863))

# Synapse 1.100.0 (2024-01-30)

No significant changes since 1.100.0rc3.




# Synapse 1.100.0rc3 (2024-01-24)

### Bugfixes

- Fix database performance regression due to changing Postgres table statistics. Introduced in v1.100.0rc1. ([\#16849](element-hq/synapse#16849))




# Synapse 1.100.0rc2 (2024-01-24)

This version is the same as 1.100.0rc1 but with fixes to the release process.

### Internal Changes

- Downgrade the `download-artifact` and `upload-artifact` actions to v3 due to breaking changes. ([\#16847](element-hq/synapse#16847))


# Synapse 1.100.0rc1 (2024-01-23)

*This version was never released to PyPI or the Debian repository due to failures in the automatic part of the release process.*

### Features

- Advertise experimental support for [MSC4028](matrix-org/matrix-spec-proposals#4028) through `/_matrix/client/versions` if enabled. Contributed by @hanadi92. ([\#16787](element-hq/synapse#16787))

### Bugfixes

- Handle wildcard type filters properly for room messages endpoint. Contributed by Mo Balaa. ([\#14984](element-hq/synapse#14984))

### Improved Documentation

- Add a link to the "Request log format" explainer on the "Logging sample config" documentation page. ([\#16778](element-hq/synapse#16778))
- Fix broken links in issue templates and documentation. ([\#16810](element-hq/synapse#16810))
- Update the reverse proxy documentation template for the deprecation of the `http2` parameter of NGINX's `listen` directive. ([\#16831](element-hq/synapse#16831))

### Internal Changes

- Faster partial join to room with complex auth graph. ([\#7](element-hq/synapse#7))
- Improve DB performance of calculating badge counts for push. ([\#16756](element-hq/synapse#16756))
- Split up deleting devices into batches. ([\#16766](element-hq/synapse#16766))
- Remove CI check for sign-off as we require a CLA signature instead. ([\#16776](element-hq/synapse#16776))
- Ensure CI fails when linting fails to make sure auto-merge does the correct thing. ([\#16781](element-hq/synapse#16781))
- Speed up loading recent events for sync by reducing the amount of state pulled out. ([\#16783](element-hq/synapse#16783))
- Reduce the amount of state pulled out when querying the federation hierarchy. ([\#16785](element-hq/synapse#16785))
- Pull less state out of the DB when we retry fetching old events during backfill. ([\#16788](element-hq/synapse#16788))
- Optimize query for fetching to-device messages in `/sync`. ([\#16805](element-hq/synapse#16805))
- Reject OIDC config when `client_secret` isn't specified, but the auth method requires one. ([\#16806](element-hq/synapse#16806))
- Allow room creation, but not publishing, to continue if room publication rules are violated when creating a new room. ([\#16811](element-hq/synapse#16811))
- Bump minimum supported Rust version to 1.65.0. ([\#16818](element-hq/synapse#16818))
- Fix up copyright lines in file headers after the licensing change. ([\#16820](element-hq/synapse#16820))
- Add a `--generate-only` option to the internal configuration/launch script for Complement. ([\#16828](element-hq/synapse#16828))
- Preparatory work for tweaking performance of auth chain lookups. ([\#16833](element-hq/synapse#16833))
- Speed up e2e device keys queries for bot accounts. ([\#16841](element-hq/synapse#16841))

### Updates to locked dependencies

* Bump actions/cache from 3 to 4. ([\#16832](element-hq/synapse#16832))
* Bump actions/download-artifact from 3 to 4. ([\#16795](element-hq/synapse#16795))
* Bump actions/upload-artifact from 3 to 4. ([\#16796](element-hq/synapse#16796))
* Bump anyhow from 1.0.75 to 1.0.79. ([\#16789](element-hq/synapse#16789))
* Bump authlib from 1.2.1 to 1.3.0. ([\#16801](element-hq/synapse#16801))
* Bump dawidd6/action-download-artifact from 2.28.0 to 3.0.0. ([\#16794](element-hq/synapse#16794))
* Bump immutabledict from 4.0.0 to 4.1.0. ([\#16812](element-hq/synapse#16812))
* Bump isort from 5.13.1 to 5.13.2. ([\#16835](element-hq/synapse#16835))
* Bump lxml from 4.9.3 to 5.1.0. ([\#16813](element-hq/synapse#16813))
* Bump pillow from 10.1.0 to 10.2.0. ([\#16802](element-hq/synapse#16802))
* Bump pydantic from 2.5.2 to 2.5.3. ([\#16836](element-hq/synapse#16836))
* Bump pyo3 from 0.20.0 to 0.20.2. ([\#16791](element-hq/synapse#16791))
* Bump regex from 1.9.6 to 1.10.3. ([\#16837](element-hq/synapse#16837))
* Bump ruff from 0.1.13 to 0.1.14. ([\#16838](element-hq/synapse#16838))
* Bump ruff from 0.1.7 to 0.1.13. ([\#16814](element-hq/synapse#16814))
* Bump sentry-sdk from 1.35.0 to 1.39.1. ([\#16799](element-hq/synapse#16799))
* Bump serde_json from 1.0.108 to 1.0.111. ([\#16792](element-hq/synapse#16792))
* Bump service-identity from 23.1.0 to 24.1.0. ([\#16816](element-hq/synapse#16816))
* Bump types-commonmark from 0.9.2.4 to 0.9.2.20240106. ([\#16797](element-hq/synapse#16797))
* Bump types-jsonschema from 4.20.0.0 to 4.20.0.20240105. ([\#16800](element-hq/synapse#16800))
* Bump types-jsonschema from 4.20.0.20240105 to 4.21.0.20240118. ([\#16834](element-hq/synapse#16834))
* Bump types-netaddr from 0.9.0.1 to 0.10.0.20240106. ([\#16839](element-hq/synapse#16839))
* Bump typing-extensions from 4.8.0 to 4.9.0. ([\#16815](element-hq/synapse#16815))


# Synapse 1.99.0 (2024-01-16)

Synapse 1.99.0 is the first Synapse release under an AGPLv3.0 licence (with a CLA to enable Element to sell AGPL
exceptions). You can read more about this here:

 - https://matrix.org/blog/2023/11/06/future-of-synapse-dendrite/
 - https://element.io/blog/element-to-adopt-agplv3/
 - https://element.io/blog/synapse-now-lives-at-github-com-element-hq-synapse/

No significant changes since 1.99.0rc1.


# Synapse 1.99.0rc1 (2024-01-09)

### Features

- Add [config options](https://element-hq.github.io/synapse/v1.99/usage/configuration/config_documentation.html#server_notices) to set the avatar and the topic of the server notices room, as well as the avatar of the server notices user. ([\#16679](matrix-org/synapse#16679))
- Add config option [`email.notif_delay_before_mail`](https://element-hq.github.io/synapse/v1.99/usage/configuration/config_documentation.html#email) to tweak the delay before an email is sent following a notification. ([\#16696](matrix-org/synapse#16696))
- Add new configuration option [`sentry.environment`](https://element-hq.github.io/synapse/v1.99/usage/configuration/config_documentation.html#sentry) for improved system monitoring. Contributed by @zeeshanrafiqrana. ([\#16738](matrix-org/synapse#16738))
- Filter out rooms from the room directory being served to other homeservers when those rooms block that homeserver by their Access Control Lists. ([\#16759](element-hq/synapse#16759))

### Bugfixes

- Fix a long-standing bug where the signing keys generated by Synapse were world-readable. Contributed by Fabian Klemp. ([\#16740](matrix-org/synapse#16740))
- Fix email verification redirection. Contributed by Fadhlan Ridhwanallah. ([\#16761](element-hq/synapse#16761))
- Fix a bug that prevented users from being queried by display name if it contains non-ASCII characters. ([\#16767](element-hq/synapse#16767))
- Allow reactivating a user without a password via the Admin API in some edge cases. ([\#16770](element-hq/synapse#16770))
- Add the `recursion_depth` parameter to the response of the `/relations` endpoint if MSC3981 recursion is being performed. ([\#16775](element-hq/synapse#16775))

### Improved Documentation

- Add a version picker for Synapse documentation. Contributed by @Dmytro27Ind. ([\#16533](matrix-org/synapse#16533))
- Clarify that `password_config.enabled: "only_for_reauth"` does not allow new logins to be created using password auth. ([\#16737](matrix-org/synapse#16737))
- Remove value from header in configuration documentation for `refresh_token_lifetime`. ([\#16763](element-hq/synapse#16763))
- Add another custom statistics collection server to the documentation. Contributed by @loelkes. ([\#16769](element-hq/synapse#16769))

### Internal Changes

- Remove run-once workflow after adding the version picker to the documentation. ([\#9453](element-hq/synapse#9453))
- Update the implementation of [MSC2965](matrix-org/matrix-spec-proposals#2965) (OIDC Provider discovery). ([\#16726](matrix-org/synapse#16726))
- Move the rust stubs inline for better IDE integration. ([\#16757](element-hq/synapse#16757))
- Fix sample config doc CI. ([\#16758](element-hq/synapse#16758))
- Simplify event internal metadata class. ([\#16762](element-hq/synapse#16762), [\#16780](element-hq/synapse#16780))
- Sign the published docker image using [cosign](https://docs.sigstore.dev/). ([\#16774](element-hq/synapse#16774))
- Port `EventInternalMetadata` class to Rust. ([\#16782](element-hq/synapse#16782))



### Updates to locked dependencies

* Bump actions/setup-go from 4 to 5. ([\#16749](matrix-org/synapse#16749))
* Bump actions/setup-python from 4 to 5. ([\#16748](matrix-org/synapse#16748))
* Bump immutabledict from 3.0.0 to 4.0.0. ([\#16743](matrix-org/synapse#16743))
* Bump isort from 5.12.0 to 5.13.0. ([\#16745](matrix-org/synapse#16745))
* Bump isort from 5.13.0 to 5.13.1. ([\#16752](matrix-org/synapse#16752))
* Bump pydantic from 2.5.1 to 2.5.2. ([\#16747](matrix-org/synapse#16747))
* Bump ruff from 0.1.6 to 0.1.7. ([\#16746](matrix-org/synapse#16746))
* Bump types-setuptools from 68.2.0.2 to 69.0.0.0. ([\#16744](matrix-org/synapse#16744))
erikjohnston pushed a commit that referenced this pull request Mar 12, 2024
This PR aims to fix #16895, caused by a regression in #7 and not fixed
by #16903. PR #16903 only fixes a starvation issue, where the CPU isn't
released. There is a second issue, where execution is blocked. This
theory is supported by the flame graphs provided in #16895 and by the
fact that I see CPU usage drop far below the limit.

Since the changes in #7, the method `check_state_independent_auth_rules`
is called with the additional parameter `batched_auth_events`:


https://github.com/element-hq/synapse/blob/6fa13b4f927c10b5f4e9495be746ec28849f5cb6/synapse/handlers/federation_event.py#L1741-L1743


This makes execution enter the following `if` clause, introduced in #15195:


https://github.com/element-hq/synapse/blob/6fa13b4f927c10b5f4e9495be746ec28849f5cb6/synapse/event_auth.py#L178-L189

There are two issues in the above code snippet.

First, there is the blocking issue. I'm not entirely sure whether this is
a deadlock, starvation, or something else. At first I thought the copy
operation was responsible. It wasn't. Then I investigated the nested
`store.get_events` call inside the `update` call. This was also not the
cause of the blocking. The blocking was resolved only when I replaced
the set difference operation (`-`) with a list comprehension. Creating
and comparing sets with a very large number of events seems to be
problematic.
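The two ways of computing which auth events still need fetching can be compared in isolation. In this hypothetical sketch, `batched_auth_events` is a plain dict keyed by event ID and `auth_event_ids` is the list for a single event; the function names are illustrative, not Synapse's. Both return the same IDs, but the comprehension performs one dict lookup per auth event ID of the current event, so its cost does not grow with the size of the batch. A plausible explanation for the slowdown (an assumption, not verified here) is that CPython evaluates `set - dict.keys()` by iterating over the whole keys view, so every event pays a cost proportional to the batch size.

```python
from typing import Dict, List, Set


def missing_via_set_difference(
    auth_event_ids: List[str], batched_auth_events: Dict[str, object]
) -> Set[str]:
    # Set difference against the dict's keys view: the operation that was replaced.
    return set(auth_event_ids) - batched_auth_events.keys()


def missing_via_comprehension(
    auth_event_ids: List[str], batched_auth_events: Dict[str, object]
) -> List[str]:
    # One membership test per auth event ID of this event only.
    return [e for e in auth_event_ids if e not in batched_auth_events]
```

The comprehension preserves the order (and any duplicates) of `auth_event_ids`; wrap it in `set(...)` if a set is needed downstream.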

This is how the flame graph now looks while persisting outliers. As you
can see, execution no longer locks up in the function above.

![output_2024-02-28_13-59-40](https://github.com/element-hq/synapse/assets/13143850/6db9c9ac-484f-47d0-bdde-70abfbd773ec)

Second, the copying here doesn't serve any purpose, because only a
shallow copy is created. The copy still references the same objects as
the original dict, which defeats the intention of protecting these
objects from mutation. The review of the original PR
matrix-org/synapse#15195 had an extensive discussion about this matter.
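The shallow-copy point is easy to demonstrate in plain Python. `FakeEvent` below is a hypothetical stand-in for an event object with mutable state; `dict(...)` and `copy.copy` both create a new dict whose values are the very same objects, so mutating an event through the copy is visible through the original.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an event object with mutable state."""
    event_id: str
    metadata: Dict[str, bool] = field(default_factory=dict)


original = {"$a": FakeEvent("$a")}
shallow = dict(original)            # new dict, same FakeEvent objects
also_shallow = copy.copy(original)  # same behaviour as dict(...)

shallow["$a"].metadata["outlier"] = True  # mutate through the "copy"

# The mutation is visible through every mapping, because the values are shared.
print(original["$a"].metadata)               # {'outlier': True}
print(shallow["$a"] is original["$a"])       # True
print(also_shallow["$a"] is original["$a"])  # True
```

Only the container is new: adding or removing keys in `shallow` leaves `original` untouched, but the event objects themselves are not protected.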

Various approaches to copying the `auth_events` were attempted:
1) Implementing a `deepcopy` caused issues due to `builtins.EventInternalMetadata` not being pickleable.
2) Creating a dict with new objects, akin to a deepcopy.
3) Creating a dict with new objects containing only the necessary attributes.

In conclusion, there is no easy way to create an actual copy of the
objects. Opting for a deepcopy can significantly strain memory and CPU
resources, making it an inefficient choice. I don't see why the copy is
necessary in the first place, so I'm proposing to remove it altogether.
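One way to avoid copying the whole batch is sketched below. This is a hypothetical illustration, not the code that was merged: `resolve_auth_events` is an invented name, and `fetch_events` is a synchronous stand-in for the nested `store.get_events` call (Synapse's store is async). It builds a small per-event dict from only this event's auth event IDs and fetches the rest, so both the work done and the memory used scale with the event's own auth events rather than with the batch, and the shared batch dict is never mutated.

```python
from typing import Callable, Dict, Iterable, List


def resolve_auth_events(
    auth_event_ids: List[str],
    batched_auth_events: Dict[str, object],
    fetch_events: Callable[[Iterable[str]], Dict[str, object]],
) -> Dict[str, object]:
    """Return one event's auth events without copying the whole batch.

    Hypothetical sketch: `fetch_events` stands in for `store.get_events`.
    """
    # Pick out only this event's auth events that are already in the batch.
    auth_events = {
        e: batched_auth_events[e]
        for e in auth_event_ids
        if e in batched_auth_events
    }
    # Fetch the remainder; the batch dict itself is left untouched.
    missing = [e for e in auth_event_ids if e not in batched_auth_events]
    if missing:
        auth_events.update(fetch_events(missing))
    return auth_events
```

The trade-off versus updating the batch in place is that events fetched for one event are not reused by later events in the same batch; which is preferable depends on how often auth events are missing from the batch.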

After these changes, I was able to successfully join these rooms,
without the main worker locking up:
- #synapse:matrix.org
- #element-android:matrix.org
- #element-web:matrix.org
- #ecips:matrix.org
- #ipfs-chatter:ipfs.io
- #python:matrix.org
- #matrix:matrix.org