This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add experimental configuration option to allow disabling legacy Prometheus metric names. #13540

Merged
reivilibre merged 16 commits into develop from rei/experimental_disable_legacy_metric_names on Aug 24, 2022

Conversation

reivilibre
Contributor

@reivilibre reivilibre commented Aug 16, 2022

This is a WIP and might be the first step of #11106.

1. Use prometheus-client's exporter, not our own vendored one

Simply switching to prometheus-client's exporter, rather than our vendored one, changes the following metrics:

Details
--- fm_leg_pc.txt	2022-08-16 16:54:53.307316377 +0100
+++ fm_noleg_nocre.txt	2022-08-16 16:54:57.379336740 +0100
-process_cpu_seconds
-python_gc_collections
-python_gc_objects_collected
-python_gc_objects_uncollectable
-synapse_admin_mau_current
-synapse_admin_mau_max
-synapse_admin_mau_registered_reserved_users
-synapse_background_process_db_sched_duration_seconds
-synapse_background_process_db_txn_count
-synapse_background_process_db_txn_duration_seconds
-synapse_background_process_ru_stime_seconds
-synapse_background_process_ru_utime_seconds
-synapse_background_process_start_count
-synapse_federation_client_events_processed
-synapse_federation_client_sent_edus
-synapse_federation_client_sent_pdu_destinations:count
-synapse_federation_client_sent_pdu_destinations_count_total
-synapse_federation_client_sent_pdu_destinations:total
-synapse_federation_client_sent_pdu_destinations_total
+synapse_federation_client_sent_pdu_destinations:count_total
-synapse_federation_client_sent_transactions
+synapse_federation_client_sent_pdu_destinations:total_total
-synapse_federation_server_number_inbound_pdu_pruned
-synapse_federation_server_received_edus
-synapse_federation_server_received_pdus
-synapse_federation_soft_failed_events
-synapse_handler_presence_bump_active_time
-synapse_handler_presence_federation_presence
-synapse_handler_presence_federation_presence_out
-synapse_handler_presence_notified_presence
-synapse_handler_presence_presence_updates
-synapse_handler_presence_timers_fired
-synapse_handlers_appservice_events_processed
-synapse_http_httppusher_badge_updates_failed
-synapse_http_httppusher_badge_updates_processed
-synapse_http_httppusher_http_pushes_failed
-synapse_http_httppusher_http_pushes_processed
-synapse_notifier_notified_events
-synapse_push_bulk_push_rule_evaluator_push_rules_invalidation_counter
-synapse_push_bulk_push_rule_evaluator_push_rules_state_size_counter
-synapse_replication_tcp_resource_federation_ack
-synapse_replication_tcp_resource_remove_pusher
-synapse_replication_tcp_resource_user_ip_cache
-synapse_replication_tcp_resource_user_sync
-synapse_state_res_cpu_for_biggest_room_seconds
-synapse_state_res_db_for_biggest_room_seconds
-synapse_storage_events_persisted_events
-synapse_storage_events_potential_times_prune_extremities
-synapse_storage_events_state_delta
-synapse_storage_events_state_delta_reuse_delta
-synapse_storage_events_state_delta_single_event
-synapse_storage_events_state_resolutions_during_persistence
-synapse_storage_events_times_pruned_extremities
-synapse_storage_transaction_time_count
-synapse_storage_transaction_time_sum
-synapse_user_logins
-synapse_user_registrations
-synapse_util_caches_cache
-synapse_util_caches_cache:evicted_size
-synapse_util_caches_cache_evicted_size
-synapse_util_caches_cache:hits
-synapse_util_caches_cache_hits
-synapse_util_caches_cache_max_size
-synapse_util_caches_cache_pending
-synapse_util_caches_cache:size
-synapse_util_caches_cache_size
-synapse_util_caches_cache:total
-synapse_util_caches_response_cache
-synapse_util_caches_response_cache:evicted_size
-synapse_util_caches_response_cache_evicted_size
-synapse_util_caches_response_cache:hits
-synapse_util_caches_response_cache_hits
-synapse_util_caches_response_cache:size
-synapse_util_caches_response_cache_size
-synapse_util_caches_response_cache:total
-synapse_util_metrics_block_count
-synapse_util_metrics_block_db_sched_duration_seconds
-synapse_util_metrics_block_db_txn_count
-synapse_util_metrics_block_db_txn_duration_seconds
-synapse_util_metrics_block_ru_stime_seconds
-synapse_util_metrics_block_ru_utime_seconds
-synapse_util_metrics_block_time_seconds

(N.B. I don't think removing both :hits and _hits is what we want. I suspect they are being cut out because we need to patch up the metrics ourselves so that their names do not contain colons.)

(N.B. +synapse_federation_client_sent_pdu_destinations:count_total and +synapse_federation_client_sent_pdu_destinations:total_total seem 'undesirable' to say the least :-)).
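
The doubled suffixes above are consistent with how prometheus-client appears to name counters: it strips one trailing `_total` from the name a counter is declared with, then appends `_total` to the exposed sample. A name ending in `:total` has no `_total` suffix to strip, so it gains a second one. A minimal sketch of that rule in plain Python follows; the helper `exposed_counter_name` is hypothetical and only mimics the behaviour, it is not prometheus-client's code.

```python
def exposed_counter_name(declared_name: str) -> str:
    """Mimic the counter naming rule: strip one trailing `_total`, then append it."""
    suffix = "_total"
    if declared_name.endswith(suffix):
        base = declared_name[: -len(suffix)]
    else:
        base = declared_name
    return base + suffix


if __name__ == "__main__":
    for name in (
        "synapse_federation_client_sent_pdu_destinations:count",
        "synapse_federation_client_sent_pdu_destinations:total",
        "synapse_federation_client_sent_pdu_destinations_total",
    ):
        print(name, "->", exposed_counter_name(name))
```

Under this rule only names that already end in `_total` survive unchanged, which is why the colon-containing names need separate handling.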

2. Massage metrics with colons in their names so that we return the underscored variant in the modern exporter and both in the legacy exporter.

We do this by renaming all colon-containing metrics to use underscores instead, and then patching them up in the legacy exporter so that they still appear under their original names as well.

I have verified that this means that, in the legacy configuration, we have the same metrics as before this change.
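
The renaming in step 2 can be sketched roughly as follows. This is an illustration only, not the code in this PR: `to_modern_name` and `names_for_exporter` are hypothetical helpers. It assumes the modern exporter serves only the underscored name, while the legacy exporter serves both spellings for any metric that originally contained a colon.

```python
from typing import List


def to_modern_name(name: str) -> str:
    """Replace colons with underscores, e.g. `cache:hits` -> `cache_hits`."""
    return name.replace(":", "_")


def names_for_exporter(original_name: str, legacy: bool) -> List[str]:
    """Return the names a metric would be exposed under.

    `original_name` is the name the metric had before the rename
    (a hypothetical input for this sketch).
    """
    modern = to_modern_name(original_name)
    if legacy and modern != original_name:
        # Legacy exporter: keep the old colon spelling alongside the new one.
        return [original_name, modern]
    return [modern]


if __name__ == "__main__":
    print(names_for_exporter("synapse_util_caches_cache:hits", legacy=True))
    print(names_for_exporter("synapse_util_caches_cache:hits", legacy=False))
```

This matches the diff above, where the legacy output lists both `synapse_util_caches_cache:hits` and `synapse_util_caches_cache_hits`.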

3. Remove _created metrics from the prometheus-client exporter.

Normally, doing this requires prometheus-client ≥ 0.14.0 and setting an environment variable, but I don't want to require users to set one themselves.

Instead, I poke the private flag in prometheus-client if it exists. I realise this is brittle so I ensure that the world won't end if it's not there, plus I add a test so that we'll notice.
I'm not super keen but I don't have any grand ideas either.
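
A defensive version of the flag-poking described above might look like the sketch below. The description does not name the flag, so `_use_created` and the helper `set_use_created` are assumptions here, and a `types.SimpleNamespace` stands in for the library's metrics module so the example runs without prometheus-client installed.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def set_use_created(metrics_module: object, new_value: bool) -> bool:
    """Set the private flag if it exists; log and carry on if it does not.

    Returns True when the flag was found and updated.
    """
    if hasattr(metrics_module, "_use_created"):  # assumed flag name
        setattr(metrics_module, "_use_created", new_value)
        return True
    logger.error("Cannot disable `_created` metrics: private flag not found")
    return False


if __name__ == "__main__":
    # Stand-ins for a library version with and without the flag.
    with_flag = SimpleNamespace(_use_created=True)
    without_flag = SimpleNamespace()
    print(set_use_created(with_flag, False), with_flag._use_created)
    print(set_use_created(without_flag, False))
```

The missing-flag branch only logs, so startup is not affected if a future library release drops the attribute. A separate test asserting that the flag still exists is what would make such a breakage visible.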

...

This PR should be commit-by-commit reviewable.

@reivilibre reivilibre force-pushed the rei/experimental_disable_legacy_metric_names branch from ad7abc4 to 04d7065 Compare August 16, 2022 16:39
@reivilibre reivilibre force-pushed the rei/experimental_disable_legacy_metric_names branch from 04d7065 to 3abf3b6 Compare August 16, 2022 16:45
@reivilibre reivilibre marked this pull request as ready for review August 17, 2022 15:15
@reivilibre reivilibre requested a review from a team as a code owner August 17, 2022 15:15
@reivilibre reivilibre force-pushed the rei/experimental_disable_legacy_metric_names branch from b9bd286 to 07ceff7 Compare August 17, 2022 15:19
@clokep clokep requested a review from a team August 18, 2022 11:39
@clokep
Contributor

clokep commented Aug 18, 2022

(I didn't review this, just commented on the phrasing of the PR.)

@reivilibre reivilibre force-pushed the rei/experimental_disable_legacy_metric_names branch from 07ceff7 to c8434ce Compare August 18, 2022 11:47
Contributor

@DMRobertson DMRobertson left a comment

A few thoughts. No major objections though. WDYT?

docs/usage/configuration/config_documentation.md (outdated, resolved)
docs/usage/configuration/config_documentation.md (outdated, resolved)
synapse/metrics/_exposition.py (resolved)
tests/test_metrics.py (outdated, resolved)
tests/test_metrics.py (resolved)
docs/usage/configuration/config_documentation.md (outdated, resolved)
Contributor

@DMRobertson DMRobertson left a comment

SGTM!

@reivilibre reivilibre enabled auto-merge (squash) August 24, 2022 11:06
@reivilibre reivilibre merged commit be4250c into develop Aug 24, 2022
@reivilibre reivilibre deleted the rei/experimental_disable_legacy_metric_names branch August 24, 2022 11:35
@reivilibre reivilibre self-assigned this Sep 1, 2022
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Sep 15, 2022
Synapse 1.67.0 (2022-09-13)
===========================

This release removes using the deprecated direct TCP replication configuration
for workers. Server admins should use Redis instead. See the [upgrade
notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670).

The minimum version of `poetry` supported for managing source checkouts is now
1.2.0.

**Notice:** from the next major release (1.68.0) installing Synapse from a source
checkout will require a recent Rust compiler. Those using packages or
`pip install matrix-synapse` will not be affected. See the [upgrade
notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670).

**Notice:** from the next major release (1.68.0), running Synapse with a SQLite
database will require SQLite version 3.27.0 or higher. (The [current minimum
 version is SQLite 3.22.0](https://github.com/matrix-org/synapse/blob/release-v1.67/synapse/storage/engines/sqlite.py#L69-L78).)
See [matrix-org#12983](matrix-org#12983) and the [upgrade notes](https://matrix-org.github.io/synapse/v1.67/upgrade.html#upgrading-to-v1670) for more details.

No significant changes since 1.67.0rc1.

Synapse 1.67.0rc1 (2022-09-06)
==============================

Features
--------

- Support setting the registration shared secret in a file, via a new `registration_shared_secret_path` configuration option. ([matrix-org#13614](matrix-org#13614))
- Change the default startup behaviour so that any missing "additional" configuration files (signing key, etc) are generated automatically. ([matrix-org#13615](matrix-org#13615))
- Improve performance of sending messages in rooms with thousands of local users. ([matrix-org#13634](matrix-org#13634))

Bugfixes
--------

- Fix a bug introduced in Synapse 1.13 where the [List Rooms admin API](https://matrix-org.github.io/synapse/develop/admin_api/rooms.html#list-room-api) would return integers instead of booleans for the `federatable` and `public` fields when using a Sqlite database. ([matrix-org#13509](matrix-org#13509))
- Fix bug that user cannot `/forget` rooms after the last member has left the room. ([matrix-org#13546](matrix-org#13546))
- Faster Room Joins: fix `/make_knock` blocking indefinitely when the room in question is a partial-stated room. ([matrix-org#13583](matrix-org#13583))
- Fix loading the current stream position behind the actual position. ([matrix-org#13585](matrix-org#13585))
- Fix a longstanding bug in `register_new_matrix_user` which meant it was always necessary to explicitly give a server URL. ([matrix-org#13616](matrix-org#13616))
- Fix the running of [MSC1763](matrix-org/matrix-spec-proposals#1763) retention purge_jobs in deployments with background jobs running on a worker by forcing them back onto the main worker. Contributed by Brad @ Beeper. ([matrix-org#13632](matrix-org#13632))
- Fix a long-standing bug that downloaded media for URL previews was not deleted while database background updates were running. ([matrix-org#13657](matrix-org#13657))
- Fix [MSC3030](matrix-org/matrix-spec-proposals#3030) `/timestamp_to_event` endpoint to return the correct next event when the events have the same timestamp. ([matrix-org#13658](matrix-org#13658))
- Fix bug where we wedge media plugins if clients disconnect early. Introduced in v1.22.0. ([matrix-org#13660](matrix-org#13660))
- Fix a long-standing bug which meant that keys for unwhitelisted servers were not returned by `/_matrix/key/v2/query`. ([matrix-org#13683](matrix-org#13683))
- Fix a bug introduced in Synapse v1.20.0 that would cause the unstable unread counts from [MSC2654](matrix-org/matrix-spec-proposals#2654) to be calculated even if the feature is disabled. ([matrix-org#13694](matrix-org#13694))

Updates to the Docker image
---------------------------

- Update docker image to use a stable version of poetry. ([matrix-org#13688](matrix-org#13688))

Improved Documentation
----------------------

- Improve the description of the ["chain cover index"](https://matrix-org.github.io/synapse/latest/auth_chain_difference_algorithm.html) used internally by Synapse. ([matrix-org#13602](matrix-org#13602))
- Document how ["monthly active users"](https://matrix-org.github.io/synapse/latest/usage/administration/monthly_active_users.html) is calculated and used. ([matrix-org#13617](matrix-org#13617))
- Improve documentation around user registration. ([matrix-org#13640](matrix-org#13640))
- Remove documentation of legacy `frontend_proxy` worker app. ([matrix-org#13645](matrix-org#13645))
- Clarify documentation that HTTP replication traffic can be protected with a shared secret. ([matrix-org#13656](matrix-org#13656))
- Remove unintentional colons from [config manual](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html) headers. ([matrix-org#13665](matrix-org#13665))
- Update docs to make enabling metrics more clear. ([matrix-org#13678](matrix-org#13678))
- Clarify `(room_id, event_id)` global uniqueness and how we should scope our database schemas. ([matrix-org#13701](matrix-org#13701))

Deprecations and Removals
-------------------------

- Drop support for calling `/_matrix/client/v3/rooms/{roomId}/invite` without an `id_access_token`, which was not permitted by the spec. Contributed by @Vetchu. ([matrix-org#13241](matrix-org#13241))
- Remove redundant `_get_joined_users_from_context` cache. Contributed by Nick @ Beeper (@Fizzadar). ([matrix-org#13569](matrix-org#13569))
- Remove the ability to use direct TCP replication with workers. Direct TCP replication was deprecated in Synapse v1.18.0. Workers now require using Redis. ([matrix-org#13647](matrix-org#13647))
- Remove support for unstable [private read receipts](matrix-org/matrix-spec-proposals#2285). ([matrix-org#13653](matrix-org#13653), [matrix-org#13692](matrix-org#13692))

Internal Changes
----------------

- Extend the release script to wait for GitHub Actions to finish and to be usable as a guide for the whole process. ([matrix-org#13483](matrix-org#13483))
- Add experimental configuration option to allow disabling legacy Prometheus metric names. ([matrix-org#13540](matrix-org#13540))
- Cache user IDs instead of profiles to reduce cache memory usage. Contributed by Nick @ Beeper (@Fizzadar). ([matrix-org#13573](matrix-org#13573), [matrix-org#13600](matrix-org#13600))
- Optimize how Synapse calculates domains to fetch from during backfill. ([matrix-org#13575](matrix-org#13575))
- Comment about a better future where we can get the state diff between two events. ([matrix-org#13586](matrix-org#13586))
- Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child concurrent calls for understandable traces in Jaeger. ([matrix-org#13588](matrix-org#13588))
- Improve performance of `@cachedList`. ([matrix-org#13591](matrix-org#13591))
- Minor speed up of fetching large numbers of push rules. ([matrix-org#13592](matrix-org#13592))
- Optimise push action fetching queries. Contributed by Nick @ Beeper (@Fizzadar). ([matrix-org#13597](matrix-org#13597))
- Rename `event_map` to `unpersisted_events` when computing the auth differences. ([matrix-org#13603](matrix-org#13603))
- Refactor `get_users_in_room(room_id)` mis-use with dedicated `get_current_hosts_in_room(room_id)` function. ([matrix-org#13605](matrix-org#13605))
- Use dedicated `get_local_users_in_room(room_id)` function to find local users when calculating `join_authorised_via_users_server` of a `/make_join` request. ([matrix-org#13606](matrix-org#13606))
- Refactor `get_users_in_room(room_id)` mis-use to lookup single local user with dedicated `check_local_user_in_room(...)` function. ([matrix-org#13608](matrix-org#13608))
- Drop unused column `application_services_state.last_txn`. ([matrix-org#13627](matrix-org#13627))
- Improve readability of Complement CI logs by printing failure results last. ([matrix-org#13639](matrix-org#13639))
- Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. ([matrix-org#13662](matrix-org#13662))
- Introduce a `CommonUsageMetrics` class to share some usage metrics between the Prometheus exporter and the phone home stats. ([matrix-org#13671](matrix-org#13671))
- Add some logging to help track down matrix-org#13444. ([matrix-org#13679](matrix-org#13679))
- Update poetry lock file for v1.2.0. ([matrix-org#13689](matrix-org#13689))
- Add cache to `is_partial_state_room`. ([matrix-org#13693](matrix-org#13693))
- Update the Grafana dashboard that is included with Synapse in the `contrib` directory. ([matrix-org#13697](matrix-org#13697))
- Only run trial CI on all python versions on non-PRs. ([matrix-org#13698](matrix-org#13698))
- Fix typechecking with latest types-jsonschema. ([matrix-org#13712](matrix-org#13712))
- Reduce number of CI checks we run for PRs. ([matrix-org#13713](matrix-org#13713))

# Conflicts:
#	synapse/config/experimental.py
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/storage/databases/main/event_push_actions.py
#	synapse/util/caches/descriptors.py

4 participants