Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Aug 26, 2022

Fix rate limit gauge metrics registering twice and misreporting by registering once and adding a label to differentiate the instances (although we only track the federation servlets at the moment). The gauge metrics were originally added in #13541

FederationRateLimiter is being instantiated twice in Synapse:

  1. UsernameAvailabilityRestServlet which uses it to rate limit people by IP
  2. For the Federation servlets which FederationRateLimiter was made for

Fix #13641

Dev notes

https://github.com/matrix-org/synapse/blob/c4e29b6908ac8ae57b5e9a3e7662ad638b61e94a/docs/metrics-howto.md

docker run \
    -p 9090:9090 \
    -v /Users/eric/Documents/github/element/.configs/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

prometheus.yml

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

scrape_configs:
  - job_name: "synapse"
    scrape_interval: 15s
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["host.docker.internal:8008"]

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

LaterGauge(
"synapse_rate_limit_sleep_affected_hosts",
"Number of hosts that had requests put to sleep",
["rate_limiter_name"],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the label rate_limiter_name instead of name because name seemed pretty ambiguous in the context of synapse_rate_limit_sleep_affected_hosts. Trying to clarify the question "name of what?"

@MadLittleMods MadLittleMods added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Metrics metrics, measures, stuff we put in Prometheus labels Aug 26, 2022
Comment on lines +89 to +114
_rate_limiter_instances: Set["FederationRateLimiter"] = set()
# Protects the _rate_limiter_instances set from concurrent access
_rate_limiter_instances_lock = threading.Lock()


def _get_counts_from_rate_limiter_instance(
count_func: Callable[["FederationRateLimiter"], int]
) -> Mapping[Tuple[str, ...], int]:
"""Returns a count of something (slept/rejected hosts) by (metrics_name)"""
# Cast to a list to prevent it changing while the Prometheus
# thread is collecting metrics
with _rate_limiter_instances_lock:
rate_limiter_instances = list(_rate_limiter_instances)

# Map from (metrics_name,) -> int, the number of something like slept hosts
# or rejected hosts. The key type is Tuple[str], but we leave the length
# unspecified for compatability with LaterGauge's annotations.
counts: Dict[Tuple[str, ...], int] = {}
for rate_limiter_instance in rate_limiter_instances:
# Only track metrics if they provided a `metrics_name` to
# differentiate this instance of the rate limiter.
if rate_limiter_instance.metrics_name:
key = (rate_limiter_instance.metrics_name,)
counts[key] = count_func(rate_limiter_instance)

return counts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same pattern as

_in_flight_requests: Set["RequestMetrics"] = set()
# Protects the _in_flight_requests set from concurrent access
_in_flight_requests_lock = threading.Lock()
def _get_in_flight_counts() -> Mapping[Tuple[str, ...], int]:
"""Returns a count of all in flight requests by (method, server_name)"""
# Cast to a list to prevent it changing while the Prometheus
# thread is collecting metrics
with _in_flight_requests_lock:
reqs = list(_in_flight_requests)
for rm in reqs:
rm.update_metrics()
# Map from (method, name) -> int, the number of in flight requests of that
# type. The key type is Tuple[str, str], but we leave the length unspecified
# for compatability with LaterGauge's annotations.
counts: Dict[Tuple[str, ...], int] = {}
for rm in reqs:
key = (rm.method, rm.name)
counts[key] = counts.get(key, 0) + 1
return counts
LaterGauge(
"synapse_http_server_in_flight_requests_count",
"",
["method", "servlet"],
_get_in_flight_counts,
)

Comment on lines +266 to +269
maybe_metrics_cm: ContextManager = contextlib.nullcontext()
if self.metrics_name:
maybe_metrics_cm = queue_wait_timer.labels(self.metrics_name).time()
with start_active_span("ratelimit wait"), maybe_metrics_cm:
Copy link
Contributor Author

@MadLittleMods MadLittleMods Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly worried about context issues here again just because it's going a bit off the beaten path (please double-check) 😰

I've made metrics_name optional because I'm not concerned about tracking the UsernameAvailabilityRestServlet which is used to rate limit people by IP. I only care about tracking the federation servlets which FederationRateLimiter was made for.

We could make metrics_name required and record metrics for the username availability servlet to get rid of this complexity though.

@MadLittleMods MadLittleMods marked this pull request as ready for review August 26, 2022 20:25
@MadLittleMods MadLittleMods requested a review from a team as a code owner August 26, 2022 20:25
@DMRobertson DMRobertson changed the base branch from develop to release-v1.66 August 30, 2022 09:18
@DMRobertson DMRobertson force-pushed the madlittlemods/13641-fix-rate-limit-metrics-registering-twice-and-misreporting branch from d8ea855 to 7921b1c Compare August 30, 2022 09:24
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm otherwise

synapse/util/ratelimitutils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this @richvdh @MadLittleMods?

synapse/util/ratelimitutils.py Outdated Show resolved Hide resolved
synapse/util/ratelimitutils.py Show resolved Hide resolved
@richvdh
Copy link
Member

richvdh commented Aug 30, 2022

@DMRobertson: sgtm. Shall we just merge that and get an RC out?

@DMRobertson
Copy link
Contributor

Let's do it.

@DMRobertson: sgtm. Shall we just merge that and get an RC out?

@DMRobertson
Copy link
Contributor

Given that [b077d81](https://github.com/matrix-org/synapse/pull/13649/commits/b077d815e95da208ed981338793ded9d21c2e644) only changes docstrings and that its parent passed CI, I am going to merge without CI.

@DMRobertson DMRobertson merged commit 1eea73b into release-v1.66 Aug 30, 2022
@DMRobertson DMRobertson deleted the madlittlemods/13641-fix-rate-limit-metrics-registering-twice-and-misreporting branch August 30, 2022 11:08
@DMRobertson
Copy link
Contributor

I sanity checked this by grepping for "already registered" in logs. No matches from today.

@MadLittleMods
Copy link
Contributor Author

MadLittleMods commented Aug 30, 2022

Thanks for the review @richvdh and @DMRobertson for making the changes to make this mergeable and confirming the duplicate registering message wasn't in the logs 🐸

Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Sep 1, 2022
Synapse 1.66.0 (2022-08-31)
===========================

No significant changes since 1.66.0rc2.

This release removes the ability for homeservers to delegate email ownership
verification and password reset confirmation to identity servers. This removal
was originally planned for Synapse 1.64, but was later deferred until now. See
the [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Deployments with multiple workers should note that the direct TCP replication
configuration was deprecated in Synapse v1.18.0 and will be removed in Synapse
v1.67.0. In particular, the TCP `replication` [listener](https://matrix-org.github.io/synapse/v1.66/usage/configuration/config_documentation.html#listeners)
type (not to be confused with the `replication` resource on the `http` listener
type) and the `worker_replication_port` config option will be removed .

To migrate to Redis, add the [`redis` config](https://matrix-org.github.io/synapse/v1.66/workers.html#shared-configuration),
then remove the TCP `replication` listener from config of the master and
`worker_replication_port` from worker config. Note that a HTTP listener with a
`replication` resource is still required. See the
[worker documentation](https://matrix-org.github.io/synapse/v1.66/workers.html)
for more details.

Synapse 1.66.0rc2 (2022-08-30)
==============================

Bugfixes
--------

- Fix a bug introduced in Synapse 1.66.0rc1 where the new rate limit metrics were misreported (`synapse_rate_limit_sleep_affected_hosts`, `synapse_rate_limit_reject_affected_hosts`). ([\matrix-org#13649](matrix-org#13649))

Synapse 1.66.0rc1 (2022-08-23)
==============================

Features
--------

- Improve validation of request bodies for the following client-server API endpoints: [`/account/password`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpassword), [`/account/password/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountpasswordemailrequesttoken), [`/account/deactivate`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3accountdeactivate) and [`/account/3pid/email/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidemailrequesttoken). ([\matrix-org#13188](matrix-org#13188), [\matrix-org#13563](matrix-org#13563))
- Add forgotten status to [Room Details Admin API](https://matrix-org.github.io/synapse/latest/admin_api/rooms.html#room-details-api). ([\matrix-org#13503](matrix-org#13503))
- Add an experimental implementation for [MSC3852 (Expose user agents on `Device`)](matrix-org/matrix-spec-proposals#3852). ([\matrix-org#13549](matrix-org#13549))
- Add `org.matrix.msc2716v4` experimental room version with updated content fields. Part of [MSC2716 (Importing history)](matrix-org/matrix-spec-proposals#2716).  ([\matrix-org#13551](matrix-org#13551))
- Add support for compression to federation responses. ([\matrix-org#13537](matrix-org#13537))
- Improve performance of sending messages in rooms with thousands of local users. ([\matrix-org#13522](matrix-org#13522), [\matrix-org#13547](matrix-org#13547))

Bugfixes
--------

- Faster room joins: make `/joined_members` block whilst the room is partial stated. ([\matrix-org#13514](matrix-org#13514))
- Fix a bug introduced in Synapse 1.21.0 where the [`/event_reports` Admin API](https://matrix-org.github.io/synapse/develop/admin_api/event_reports.html) could return a total count which was larger than the number of results you can actually query for. ([\matrix-org#13525](matrix-org#13525))
- Fix a bug introduced in Synapse 1.52.0 where sending server notices fails if `max_avatar_size` or `allowed_avatar_mimetypes` is set and not `system_mxid_avatar_url`. ([\matrix-org#13566](matrix-org#13566))
- Fix a bug where the `opentracing.force_tracing_for_users` config option would not apply to [`/sendToDevice`](https://spec.matrix.org/v1.3/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) and [`/keys/upload`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3keysupload) requests. ([\matrix-org#13574](matrix-org#13574))

Improved Documentation
----------------------

- Add `openssl` example for generating registration HMAC digest. ([\matrix-org#13472](matrix-org#13472))
- Tidy up Synapse's README. ([\matrix-org#13491](matrix-org#13491))
- Document that event purging related to the `redaction_retention_period` config option is executed only every 5 minutes. ([\matrix-org#13492](matrix-org#13492))
- Add a warning to retention documentation regarding the possibility of database corruption. ([\matrix-org#13497](matrix-org#13497))
- Document that the `DOCKER_BUILDKIT=1` flag is needed to build the docker image. ([\matrix-org#13515](matrix-org#13515))
- Add missing links in `user_consent` section of configuration manual. ([\matrix-org#13536](matrix-org#13536))
- Fix the doc and some warnings that were referring to the nonexistent `custom_templates_directory` setting (instead of `custom_template_directory`). ([\matrix-org#13538](matrix-org#13538))

Deprecations and Removals
-------------------------

- Remove the ability for homeservers to delegate email ownership verification
  and password reset confirmation to identity servers. See [upgrade notes](https://matrix-org.github.io/synapse/v1.66/upgrade.html#upgrading-to-v1660) for more details.

Internal Changes
----------------

- Update the rejected state of events during de-partial-stating. ([\matrix-org#13459](matrix-org#13459))
- Avoid blocking lazy-loading `/sync`s during partial joins due to remote memberships. Pull remote memberships from auth events instead of the room state. ([\matrix-org#13477](matrix-org#13477))
- Refuse to start when faster joins is enabled on a deployment with workers, since worker configurations are not currently supported. ([\matrix-org#13531](matrix-org#13531))

- Allow use of both `@trace` and `@tag_args` stacked on the same function. ([\matrix-org#13453](matrix-org#13453))
- Instrument the federation/backfill part of `/messages` for understandable traces in Jaeger. ([\matrix-org#13489](matrix-org#13489))
- Instrument `FederationStateIdsServlet` (`/state_ids`) for understandable traces in Jaeger. ([\matrix-org#13499](matrix-org#13499), [\matrix-org#13554](matrix-org#13554))
- Track HTTP response times over 10 seconds from `/messages` (`synapse_room_message_list_rest_servlet_response_time_seconds`). ([\matrix-org#13533](matrix-org#13533))
- Add metrics to track how the rate limiter is affecting requests (sleep/reject). ([\matrix-org#13534](matrix-org#13534), [\matrix-org#13541](matrix-org#13541))
- Add metrics to time how long it takes us to do backfill processing (`synapse_federation_backfill_processing_before_time_seconds`, `synapse_federation_backfill_processing_after_time_seconds`). ([\matrix-org#13535](matrix-org#13535), [\matrix-org#13584](matrix-org#13584))
- Add metrics to track rate limiter queue timing (`synapse_rate_limit_queue_wait_time_seconds`). ([\matrix-org#13544](matrix-org#13544))
- Update metrics to track `/messages` response time by room size. ([\matrix-org#13545](matrix-org#13545))

- Refactor methods in `synapse.api.auth.Auth` to use `Requester` objects everywhere instead of user IDs. ([\matrix-org#13024](matrix-org#13024))
- Clean-up tests for notifications. ([\matrix-org#13471](matrix-org#13471))
- Add some miscellaneous comments to document sync, especially around `compute_state_delta`. ([\matrix-org#13474](matrix-org#13474))
- Use literals in place of `HTTPStatus` constants in tests. ([\matrix-org#13479](matrix-org#13479), [\matrix-org#13488](matrix-org#13488))
- Add comments about how event push actions are rotated. ([\matrix-org#13485](matrix-org#13485))
- Modify HTML template content to better support mobile devices' screen sizes. ([\matrix-org#13493](matrix-org#13493))
- Add a linter script which will reject non-strict types in Pydantic models. ([\matrix-org#13502](matrix-org#13502))
- Reduce the number of tests using legacy TCP replication. ([\matrix-org#13543](matrix-org#13543))
- Allow specifying additional request fields when using the `HomeServerTestCase.login` helper method. ([\matrix-org#13549](matrix-org#13549))
- Make `HomeServerTestCase` load any configured homeserver modules automatically. ([\matrix-org#13558](matrix-org#13558))

# -----BEGIN PGP SIGNATURE-----
#
# iQGzBAABCgAdFiEEWMTnW8Z8khaaf90R+84KzgcyGG8FAmMPT8QACgkQ+84Kzgcy
# GG9CUAv+Pv/iDpE2jKlV7zQ/cagaKCGsFK5jy0+K9Wr215nP89tuhU37bJXsgvVu
# GP3A8k1c/ENPhXwYHLCnnxV3jick1FuVE0W6h0j2PMYeIGNCQhDswytnsQO4JExg
# fGLL4ygCzpe8bFX9+mhIM4z8xkZjZX3lIa8CN2LtRLIo0m7qoT1ZWqdt7kAjj5yL
# XMk+3Y1yq/Y4SHHqgKurBNdwNcwnv7ynchWxTYa12WVTINt26dLV0Syk3p8u2SLl
# 5YNzcDs2TAM7+VxAu7E0AQl426+Ufi122Oj1ZBUG2FxTPLH8Xr18cN2M/at6WxoX
# 8pOkGiuahKKvahw1iCoHAGIC66gFIPxBE9xW4R2SKrQtG4sDuKJI0kvunRV8+cy5
# TuJ9cmdDmJR2vj3P3OULqLXGkWsGNJqfZZF8OWkHEI8LUIXZLrAZocFtlonkr9rV
# Y8r8LxL8Id1rbHAnCXcJnYdaJ6ol0RIObDFpitY/D8BDUONVw/byeOyAEkq/XPrZ
# Ke/9K8sy
# =eg1L
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed Aug 31 13:10:44 2022 BST
# gpg:                using RSA key 58C4E75BC67C92169A7FDD11FBCE0ACE0732186F
# gpg: Can't check signature: No public key

# Conflicts:
#	synapse/api/auth.py
#	synapse/push/baserules.py
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/push/push_rule_evaluator.py
#	synapse/storage/databases/main/event_push_actions.py
#	tests/server_notices/test_resource_limits_server_notices.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Metrics metrics, measures, stuff we put in Prometheus T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

synapse_rate_limit_sleep_affected_hosts registered twice
3 participants