Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that pending to-device events are sent over federation at startup #16925

Merged
merged 8 commits into from Mar 22, 2024

Conversation

richvdh
Copy link
Member

@richvdh richvdh commented Feb 15, 2024

Fixes #16680, as well as a related bug, where servers which we had never successfully sent an event to would not be retried.

In order to fix the case of pending to-device messages, we hook into the existing wake_destinations_needing_catchup process, by extending it to look for destinations that have pending to-device messages. The federation transmission loop then attempts to send the pending to-device messages as normal.

Suggest review commit-by-commit.

When picking destination servers that we need to wake up for a retry, we need
to be mindful of destinations that we have *never* successfully sent to. This
can manifest either as a null `last_successful_stream_ordering`, or even no row
in `destinations` at all.

Hence, we need to left-join on `destinations` rather than inner-joining, and we
need to treat a null `last_successful_stream_ordering` the same as 0.
When considering which destinations need waking up for a retry, also look for
those that have outstanding to-device messages.
We don't really want people to have to wait 60 seconds for their to-device
messages.
@richvdh richvdh requested a review from a team as a code owner February 15, 2024 12:43
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm

run_as_background_process(
"wake_destinations_needing_catchup",
self._wake_destinations_needing_catchup,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a now param to looping_call (as the underlying twisted thing supports it), that way if the initial run takes a long time a second run won't get started by the looping call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum. Turns out this screws up the type checker.

In order to add the new parameter without breaking existing code, it needs to be a named parameter (because excess positional parameters are passed to the wrapped function). However, it turns out that ParamSpec is incompatible with additional keyword args (https://peps.python.org/pep-0612/#id2 actually mentions this: "Note that this also why we have to reject signatures of the form (*args: P.args, s: str, **kwargs: P.kwargs).)

So, three options:

  • Add a positional parameter to looping_call, and update all existing calls to looping_call (55 of them, according to my IDE).
  • Add a new method looping_call2 (or something) which takes a positional now parameter. Use it here, and deprecate looping_call.
  • Add a new method looping_call_now, which is exactly the same as looping_call except for the obvious. (The implementation of this will probably involve a private equivalent to looping_call2, shared between looping_call_now and looping_call).

My instinct is number 3, but happy with whatever you think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I took an executive decision.

@richvdh richvdh enabled auto-merge (squash) March 14, 2024 18:40
@richvdh richvdh disabled auto-merge March 14, 2024 19:05
@richvdh richvdh enabled auto-merge (squash) March 20, 2024 10:25
@richvdh
Copy link
Member Author

richvdh commented Mar 21, 2024

The tests are now failing due to db connection pool shenanigans. #17017 should fix it.

@richvdh richvdh disabled auto-merge March 21, 2024 13:13
@richvdh richvdh merged commit b5322b4 into develop Mar 22, 2024
38 checks passed
@richvdh richvdh deleted the rav/catch_up_to_device_messages branch March 22, 2024 13:24
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Apr 2, 2024
# Synapse 1.104.0 (2024-04-02)

### Bugfixes

- Fix regression when using OIDC provider. Introduced in v1.104.0rc1. ([\#17031](element-hq/synapse#17031))


# Synapse 1.104.0rc1 (2024-03-26)

### Features

- Add an OIDC config to specify extra parameters for the authorization grant URL. IT can be useful to pass an ACR value for example. ([\#16971](element-hq/synapse#16971))
- Add support for OIDC provider returning JWT. ([\#16972](element-hq/synapse#16972), [\#17031](element-hq/synapse#17031))

### Bugfixes

- Fix a bug which meant that, under certain circumstances, we might never retry sending events or to-device messages over federation after a failure. ([\#16925](element-hq/synapse#16925))
- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16949](element-hq/synapse#16949))
- Fix case in which `m.fully_read` marker would not get updated. Contributed by @SpiritCroc. ([\#16990](element-hq/synapse#16990))
- Fix bug which did not retract a user's pending knocks at rooms when their account was deactivated. Contributed by @hanadi92. ([\#17010](element-hq/synapse#17010))

### Updates to the Docker image

- Updated `start.py` to generate config using the correct user ID when running as root (fixes [\#16824](element-hq/synapse#16824), [\#15202](element-hq/synapse#15202)). ([\#16978](element-hq/synapse#16978))

### Improved Documentation

- Add a query to force a refresh of a remote user's device list to the "Useful SQL for Admins" documentation page. ([\#16892](element-hq/synapse#16892))
- Minor grammatical corrections to the upgrade documentation. ([\#16965](element-hq/synapse#16965))
- Fix the sort order for the documentation version picker, so that newer releases appear above older ones. ([\#16966](element-hq/synapse#16966))
- Remove recommendation for a specific poetry version from contributing guide. ([\#17002](element-hq/synapse#17002))

### Internal Changes

- Improve lock performance when a lot of locks are all waiting for a single lock to be released. ([\#16840](element-hq/synapse#16840))
- Update power level default for public rooms. ([\#16907](element-hq/synapse#16907))
- Improve event validation. ([\#16908](element-hq/synapse#16908))
- Multi-worker-docker-container: disable log buffering. ([\#16919](element-hq/synapse#16919))
- Refactor state delta calculation in `/sync` handler. ([\#16929](element-hq/synapse#16929))
- Clarify docs for some room state functions. ([\#16950](element-hq/synapse#16950))
- Specify IP subnets in canonical form. ([\#16953](element-hq/synapse#16953))
- As done for SAML mapping provider, let's pass the module API to the OIDC one so the mapper can do more logic in its code. ([\#16974](element-hq/synapse#16974))
- Allow containers building on top of Synapse's Complement container is use the included PostgreSQL cluster. ([\#16985](element-hq/synapse#16985))
- Raise poetry-core version cap to 1.9.0. ([\#16986](element-hq/synapse#16986))
- Patch the db conn pool sooner in tests. ([\#17017](element-hq/synapse#17017))



### Updates to locked dependencies

* Bump anyhow from 1.0.80 to 1.0.81. ([\#17009](element-hq/synapse#17009))
* Bump black from 23.10.1 to 24.2.0. ([\#16936](element-hq/synapse#16936))
* Bump cryptography from 41.0.7 to 42.0.5. ([\#16958](element-hq/synapse#16958))
* Bump dawidd6/action-download-artifact from 3.1.1 to 3.1.2. ([\#16960](element-hq/synapse#16960))
* Bump dawidd6/action-download-artifact from 3.1.2 to 3.1.4. ([\#17008](element-hq/synapse#17008))
* Bump jinja2 from 3.1.2 to 3.1.3. ([\#17005](element-hq/synapse#17005))
* Bump log from 0.4.20 to 0.4.21. ([\#16977](element-hq/synapse#16977))
* Bump mypy from 1.5.1 to 1.8.0. ([\#16901](element-hq/synapse#16901))
* Bump netaddr from 0.9.0 to 1.2.1. ([\#17006](element-hq/synapse#17006))
* Bump pydantic from 2.6.0 to 2.6.4. ([\#17004](element-hq/synapse#17004))
* Bump pyo3 from 0.20.2 to 0.20.3. ([\#16962](element-hq/synapse#16962))
* Bump ruff from 0.1.14 to 0.3.2. ([\#16994](element-hq/synapse#16994))
* Bump serde from 1.0.196 to 1.0.197. ([\#16963](element-hq/synapse#16963))
* Bump serde_json from 1.0.113 to 1.0.114. ([\#16961](element-hq/synapse#16961))
* Bump types-jsonschema from 4.21.0.20240118 to 4.21.0.20240311. ([\#17007](element-hq/synapse#17007))
* Bump types-psycopg2 from 2.9.21.16 to 2.9.21.20240311. ([\#16995](element-hq/synapse#16995))
* Bump types-pyopenssl from 23.3.0.0 to 24.0.0.20240311. ([\#17003](element-hq/synapse#17003))

# Synapse 1.103.0 (2024-03-19)

No significant changes since 1.103.0rc1.




# Synapse 1.103.0rc1 (2024-03-12)

### Features

- Add a new [List Accounts v3](https://element-hq.github.io/synapse/v1.103/admin_api/user_admin_api.html#list-accounts-v3) Admin API with improved deactivated user filtering capabilities. ([\#16874](element-hq/synapse#16874))
- Include `Retry-After` header by default per [MSC4041](matrix-org/matrix-spec-proposals#4041). Contributed by @clokep. ([\#16947](element-hq/synapse#16947))

### Bugfixes

- Fix joining remote rooms when a module uses the `on_new_event` callback. This callback may now pass partial state events instead of the full state for remote rooms. Introduced in v1.76.0. ([\#16973](element-hq/synapse#16973))
- Fix performance issue when joining very large rooms that can cause the server to lock up. Introduced in v1.100.0. Contributed by @ggogel. ([\#16968](element-hq/synapse#16968))

### Improved Documentation

- Add HAProxy example for single port operation to reverse proxy documentation. Contributed by Georg Pfuetzenreuter (@tacerus). ([\#16768](element-hq/synapse#16768))
- Improve the documentation around running Complement tests with new configuration parameters. ([\#16946](element-hq/synapse#16946))
- Add docs on upgrading from a very old version. ([\#16951](element-hq/synapse#16951))


### Updates to locked dependencies

* Bump JasonEtco/create-an-issue from 2.9.1 to 2.9.2. ([\#16934](element-hq/synapse#16934))
* Bump anyhow from 1.0.79 to 1.0.80. ([\#16935](element-hq/synapse#16935))
* Bump dawidd6/action-download-artifact from 3.0.0 to 3.1.1. ([\#16933](element-hq/synapse#16933))
* Bump furo from 2023.9.10 to 2024.1.29. ([\#16939](element-hq/synapse#16939))
* Bump pyopenssl from 23.3.0 to 24.0.0. ([\#16937](element-hq/synapse#16937))
* Bump types-netaddr from 0.10.0.20240106 to 1.2.0.20240219. ([\#16938](element-hq/synapse#16938))
yingziwu added a commit to yingziwu/synapse that referenced this pull request Apr 3, 2024
- Fix regression when using OIDC provider. Introduced in v1.104.0rc1. ([\#17031](element-hq/synapse#17031))

- Add an OIDC config to specify extra parameters for the authorization grant URL. IT can be useful to pass an ACR value for example. ([\#16971](element-hq/synapse#16971))
- Add support for OIDC provider returning JWT. ([\#16972](element-hq/synapse#16972), [\#17031](element-hq/synapse#17031))

- Fix a bug which meant that, under certain circumstances, we might never retry sending events or to-device messages over federation after a failure. ([\#16925](element-hq/synapse#16925))
- Fix various long-standing bugs which could cause incorrect state to be returned from `/sync` in certain situations. ([\#16949](element-hq/synapse#16949))
- Fix case in which `m.fully_read` marker would not get updated. Contributed by @SpiritCroc. ([\#16990](element-hq/synapse#16990))
- Fix bug which did not retract a user's pending knocks at rooms when their account was deactivated. Contributed by @hanadi92. ([\#17010](element-hq/synapse#17010))

- Updated `start.py` to generate config using the correct user ID when running as root (fixes [\#16824](element-hq/synapse#16824), [\matrix-org#15202](element-hq/synapse#15202)). ([\#16978](element-hq/synapse#16978))

- Add a query to force a refresh of a remote user's device list to the "Useful SQL for Admins" documentation page. ([\#16892](element-hq/synapse#16892))
- Minor grammatical corrections to the upgrade documentation. ([\#16965](element-hq/synapse#16965))
- Fix the sort order for the documentation version picker, so that newer releases appear above older ones. ([\#16966](element-hq/synapse#16966))
- Remove recommendation for a specific poetry version from contributing guide. ([\#17002](element-hq/synapse#17002))

- Improve lock performance when a lot of locks are all waiting for a single lock to be released. ([\#16840](element-hq/synapse#16840))
- Update power level default for public rooms. ([\#16907](element-hq/synapse#16907))
- Improve event validation. ([\#16908](element-hq/synapse#16908))
- Multi-worker-docker-container: disable log buffering. ([\#16919](element-hq/synapse#16919))
- Refactor state delta calculation in `/sync` handler. ([\#16929](element-hq/synapse#16929))
- Clarify docs for some room state functions. ([\#16950](element-hq/synapse#16950))
- Specify IP subnets in canonical form. ([\#16953](element-hq/synapse#16953))
- As done for SAML mapping provider, let's pass the module API to the OIDC one so the mapper can do more logic in its code. ([\#16974](element-hq/synapse#16974))
- Allow containers building on top of Synapse's Complement container is use the included PostgreSQL cluster. ([\#16985](element-hq/synapse#16985))
- Raise poetry-core version cap to 1.9.0. ([\#16986](element-hq/synapse#16986))
- Patch the db conn pool sooner in tests. ([\#17017](element-hq/synapse#17017))

* Bump anyhow from 1.0.80 to 1.0.81. ([\#17009](element-hq/synapse#17009))
* Bump black from 23.10.1 to 24.2.0. ([\#16936](element-hq/synapse#16936))
* Bump cryptography from 41.0.7 to 42.0.5. ([\#16958](element-hq/synapse#16958))
* Bump dawidd6/action-download-artifact from 3.1.1 to 3.1.2. ([\#16960](element-hq/synapse#16960))
* Bump dawidd6/action-download-artifact from 3.1.2 to 3.1.4. ([\#17008](element-hq/synapse#17008))
* Bump jinja2 from 3.1.2 to 3.1.3. ([\#17005](element-hq/synapse#17005))
* Bump log from 0.4.20 to 0.4.21. ([\#16977](element-hq/synapse#16977))
* Bump mypy from 1.5.1 to 1.8.0. ([\#16901](element-hq/synapse#16901))
* Bump netaddr from 0.9.0 to 1.2.1. ([\#17006](element-hq/synapse#17006))
* Bump pydantic from 2.6.0 to 2.6.4. ([\#17004](element-hq/synapse#17004))
* Bump pyo3 from 0.20.2 to 0.20.3. ([\#16962](element-hq/synapse#16962))
* Bump ruff from 0.1.14 to 0.3.2. ([\#16994](element-hq/synapse#16994))
* Bump serde from 1.0.196 to 1.0.197. ([\#16963](element-hq/synapse#16963))
* Bump serde_json from 1.0.113 to 1.0.114. ([\#16961](element-hq/synapse#16961))
* Bump types-jsonschema from 4.21.0.20240118 to 4.21.0.20240311. ([\#17007](element-hq/synapse#17007))
* Bump types-psycopg2 from 2.9.21.16 to 2.9.21.20240311. ([\#16995](element-hq/synapse#16995))
* Bump types-pyopenssl from 23.3.0.0 to 24.0.0.20240311. ([\#17003](element-hq/synapse#17003))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synapse does not attempt to send events in the device outbox at startup
2 participants