Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging to tasks managed by the task scheduler, showing CPU and database usage. #17219

Merged
merged 3 commits into from
May 22, 2024

Conversation

reivilibre
Copy link
Contributor

The log format is the same as the request log format, except:

  • fields that are specific to HTTP requests have been removed
  • the task's params are included at the end of the log line.

These log lines are emitted:

  • when the task function finishes — both completion and failure (and I suppose it is possible for a task to become schedulable again?)
  • every 5 minutes whilst it is running

Closes #17217.

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
@reivilibre reivilibre force-pushed the rei/task_scheduler_better_logging branch from d9003ae to 3958fda Compare May 18, 2024 19:50
@reivilibre
Copy link
Contributor Author

I don't understand but I got a weird set of timings like this: 0.003sec (0.040sec, 0.006sec) (0.004sec/2.848sec/8) [0 dbevts]

0.003sec should be the wall-clock time, but then 2.848sec parallel database time across 8 transactions doesn't make sense.

Review, either from me with fresh eyes or someone else, is probably useful

@reivilibre
Copy link
Contributor Author

ah hah, I bet Clock.time returns seconds even though I've been conditioned to think all Synapse times are in milliseconds.

@reivilibre reivilibre force-pushed the rei/task_scheduler_better_logging branch from d76ffd5 to effebb3 Compare May 18, 2024 20:16
@reivilibre reivilibre marked this pull request as ready for review May 18, 2024 20:40
@reivilibre reivilibre requested a review from a team as a code owner May 18, 2024 20:40
@erikjohnston
Copy link
Member

ah hah, I bet Clock.time returns seconds even though I've been conditioned to think all Synapse times are in milliseconds.

Yeah, there's time_msec for that

"""

logger.info(
"Task %s: %.3fsec (%.3fsec, %.3fsec) (%.3fsec/%.3fsec/%d)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to report whether its ongoing or finished? Are these total values or deltas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Task %s will say 'contuining', 'completed' or 'failed' depending on which one it is. These are totals, but only for this process lifetime of Synapse. (if you restart Synapse, it might start some of these tasks again where they might essentially resume. Also I think it's possible for a task to finish in such a way that it reschedules itself later, though we don't appear to do that right now)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right yes, sorry for some reason I thought it would be Task <name> 🤦

@reivilibre reivilibre merged commit 7ef00b7 into develop May 22, 2024
38 checks passed
@reivilibre reivilibre deleted the rei/task_scheduler_better_logging branch May 22, 2024 13:12
H-Shay pushed a commit to H-Shay/hq_synapse that referenced this pull request May 31, 2024
…atabase usage. (element-hq#17219)

The log format is the same as the request log format, except:

- fields that are specific to HTTP requests have been removed
- the task's params are included at the end of the log line.

These log lines are emitted:
- when the task function finishes — both completion and failure (and I
suppose it is possible for a task to become schedulable again?)
- every 5 minutes whilst it is running

Closes element-hq#17217.

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
Mic92 pushed a commit to Mic92/synapse that referenced this pull request Jun 14, 2024
…atabase usage. (element-hq#17219)

The log format is the same as the request log format, except:

- fields that are specific to HTTP requests have been removed
- the task's params are included at the end of the log line.

These log lines are emitted:
- when the task function finishes — both completion and failure (and I
suppose it is possible for a task to become schedulable again?)
- every 5 minutes whilst it is running

Closes element-hq#17217.

---------

Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 18, 2024
# Synapse 1.109.0 (2024-06-18)

- Add the ability to auto-accept invites on the behalf of users. See
  the
  [`auto_accept_invites`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#auto-accept-invites)
  config option for
  details. ([\#17147](element-hq/synapse#17147))

- Add experimental
  [MSC3575](matrix-org/matrix-spec-proposals#3575)
  Sliding Sync `/sync/e2ee` endpoint for to-device messages and device
  encryption
  info. ([\#17167](element-hq/synapse#17167))

- Support
  [MSC3916](matrix-org/matrix-spec-proposals#3916)
  by adding unstable media endpoints to
  `/_matrix/client`. ([\#17213](element-hq/synapse#17213))

- Add logging to tasks managed by the task scheduler, showing CPU and
  database
  usage. ([\#17219](element-hq/synapse#17219))


# Synapse 1.108.0 (2024-05-28)

- Add a feature that allows clients to query the configured federation
  whitelist. Disabled by
  default. ([\#16848](element-hq/synapse#16848),
  [\#17199](element-hq/synapse#17199))

- Add the ability to allow numeric user IDs with a specific prefix
  when in the CAS flow. Contributed by Aurélien
  Grimpard. ([\#17098](element-hq/synapse#17098))


Synapse 1.107.0 (2024-05-14)

- Add preliminary support for [MSC3823: Account
  Suspension](matrix-org/matrix-spec-proposals#3823).
  ([\#17051](element-hq/synapse#17051))

- Declare support for [Matrix
  v1.10](https://matrix.org/blog/2024/03/22/matrix-v1.10-release/). Contributed
  by
  @clokep. ([\#17082](element-hq/synapse#17082))

- Add support for [MSC4115: membership metadata on
  events](matrix-org/matrix-spec-proposals#4115).
  ([\#17104](element-hq/synapse#17104),
  [\#17137](element-hq/synapse#17137))


# Synapse 1.106.0 (2024-04-30)

- Send an email if the address is already bound to an user
  account. ([\#16819](element-hq/synapse#16819))

- Implement the rendezvous mechanism described by
  [MSC4108](matrix-org/matrix-spec-proposals#4108).
  ([\#17056](element-hq/synapse#17056))

- Support delegating the rendezvous mechanism described
  [MSC4108](matrix-org/matrix-spec-proposals#4108)
  to an external
  implementation. ([\#17086](element-hq/synapse#17086))
yingziwu added a commit to yingziwu/synapse that referenced this pull request Jun 26, 2024
- Fix the building of binary wheels for macOS by switching to macOS 12 CI runners. ([\#17319](element-hq/synapse#17319))

- When rolling back to a previous Synapse version and then forwards again to this release, don't require server operators to manually run SQL. ([\#17305](element-hq/synapse#17305), [\#17309](element-hq/synapse#17309))

- Use the release branch for sytest in release-branch PRs. ([\#17306](element-hq/synapse#17306))

- Fix bug where one-time-keys were not always included in `/sync` response when using workers. Introduced in v1.109.0rc1. ([\#17275](element-hq/synapse#17275))
- Fix bug where `/sync` could get stuck due to edge case in device lists handling. Introduced in v1.109.0rc1. ([\#17292](element-hq/synapse#17292))

- Add the ability to auto-accept invites on the behalf of users. See the [`auto_accept_invites`](https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#auto-accept-invites) config option for details. ([\#17147](element-hq/synapse#17147))
- Add experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync/e2ee` endpoint for to-device messages and device encryption info. ([\#17167](element-hq/synapse#17167))
- Support [MSC3916](matrix-org/matrix-spec-proposals#3916) by adding unstable media endpoints to `/_matrix/client`. ([\#17213](element-hq/synapse#17213))
- Add logging to tasks managed by the task scheduler, showing CPU and database usage. ([\#17219](element-hq/synapse#17219))

- Fix deduplicating of membership events to not create unused state groups. ([\#17164](element-hq/synapse#17164))
- Fix bug where duplicate events could be sent down sync when using workers that are overloaded. ([\#17215](element-hq/synapse#17215))
- Ignore attempts to send to-device messages to bad users, to avoid log spam when we try to connect to the bad server. ([\#17240](element-hq/synapse#17240))
- Fix handling of duplicate concurrent uploading of device one-time-keys. ([\#17241](element-hq/synapse#17241))
- Fix reporting of default tags to Sentry, such as worker name. Broke in v1.108.0. ([\#17251](element-hq/synapse#17251))
- Fix bug where typing updates would not be sent when using workers after a restart. ([\#17252](element-hq/synapse#17252))

- Update the LemonLDAP documentation to say that claims should be explicitly included in the returned `id_token`, as Synapse won't request them. ([\#17204](element-hq/synapse#17204))

- Improve DB usage when fetching related events. ([\#17083](element-hq/synapse#17083))
- Log exceptions when failing to auto-join new user according to the `auto_join_rooms` option. ([\#17176](element-hq/synapse#17176))
- Reduce work of calculating outbound device lists updates. ([\#17211](element-hq/synapse#17211))
- Improve performance of calculating device lists changes in `/sync`. ([\#17216](element-hq/synapse#17216))
- Move towards using `MultiWriterIdGenerator` everywhere. ([\#17226](element-hq/synapse#17226))
- Replaces all usages of `StreamIdGenerator` with `MultiWriterIdGenerator`. ([\#17229](element-hq/synapse#17229))
- Change the `allow_unsafe_locale` config option to also apply when setting up new databases. ([\#17238](element-hq/synapse#17238))
- Fix errors in logs about closing incorrect logging contexts when media gets rejected by a module. ([\#17239](element-hq/synapse#17239), [\#17246](element-hq/synapse#17246))
- Clean out invalid destinations from `device_federation_outbox` table. ([\#17242](element-hq/synapse#17242))
- Stop logging errors when receiving invalid User IDs in key querys requests. ([\#17250](element-hq/synapse#17250))

* Bump anyhow from 1.0.83 to 1.0.86. ([\#17220](element-hq/synapse#17220))
* Bump bcrypt from 4.1.2 to 4.1.3. ([\#17224](element-hq/synapse#17224))
* Bump lxml from 5.2.1 to 5.2.2. ([\#17261](element-hq/synapse#17261))
* Bump mypy-zope from 1.0.3 to 1.0.4. ([\#17262](element-hq/synapse#17262))
* Bump phonenumbers from 8.13.35 to 8.13.37. ([\#17235](element-hq/synapse#17235))
* Bump prometheus-client from 0.19.0 to 0.20.0. ([\#17233](element-hq/synapse#17233))
* Bump pyasn1 from 0.5.1 to 0.6.0. ([\#17223](element-hq/synapse#17223))
* Bump pyicu from 2.13 to 2.13.1. ([\#17236](element-hq/synapse#17236))
* Bump pyopenssl from 24.0.0 to 24.1.0. ([\#17234](element-hq/synapse#17234))
* Bump serde from 1.0.201 to 1.0.202. ([\#17221](element-hq/synapse#17221))
* Bump serde from 1.0.202 to 1.0.203. ([\#17232](element-hq/synapse#17232))
* Bump twine from 5.0.0 to 5.1.0. ([\#17225](element-hq/synapse#17225))
* Bump types-psycopg2 from 2.9.21.20240311 to 2.9.21.20240417. ([\#17222](element-hq/synapse#17222))
* Bump types-pyopenssl from 24.0.0.20240311 to 24.1.0.20240425. ([\#17260](element-hq/synapse#17260))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scheduled tasks should emit log lines with timings like HTTP requests do
2 participants