federation: Server without IPv6 support retries failed IPv6 connections without back off, spamming .well-known lookups many times a second #3291

Open
Ablu opened this issue Dec 19, 2023 · 1 comment

Ablu commented Dec 19, 2023

Background information

  • Dendrite version or git SHA: 0.13.5+b7054f4
  • SQLite3 or Postgres?: unknown
  • Running in Docker?: yes
  • go version: unknown
  • Client used (if applicable): na

Description

  • What is the problem:

If a server does not support IPv6 but another server is reachable only via IPv6, connections are attempted in a tight loop without any back-off. This results in many .well-known lookups per second on the target server.

  • Who is affected: The IPv6-only server is hammered non-stop by .well-known requests. The IPv4-only server gets a huge number of log errors.
  • How is this bug manifesting: Spam in the logs, rate limiting kicking in, and effectively DoSing the target server.
  • When did this first appear: Unknown. I noticed it when I got rate-limited by the web hosting provider that hosts the .well-known file.

Steps to reproduce

  • Run Dendrite on an IPv4-only host
  • Attempt federation with an IPv6-only host (for example, message @ablu:ablu.org)
  • Observe non-stop log errors:
time="2023-12-18T15:14:28.942253091Z" level=debug msg="Error sending request to https://matrix.ablu.org:443/_matrix/key/v2/server: dial tcp [2001:9e8:d58b:8100:51be:e7e4:9b58:7507]:443: connect: network is unreachable" out.req.ID=lpqzvlnCpkM4 out.req.method=GET out.req.uri="matrix://ablu.org/_matrix/key/v2/server" req.id=1tjUd9EkbTnE req.method=PUT req.path=/_matrix/federation/v1/send/1699241410034
time="2023-12-18T15:14:28.949413244Z" level=debug msg="Error sending request to https://matrix.ablu.org:443/_matrix/key/v2/server: dial tcp [2001:9e8:d58b:8100:51be:e7e4:9b58:7507]:443: connect: network is unreachable" out.req.ID=lpqzvlnCpkM4 out.req.method=GET out.req.uri="matrix://ablu.org/_matrix/key/v2/server" req.id=1tjUd9EkbTnE req.method=PUT req.path=/_matrix/federation/v1/send/1699241410034
time="2023-12-18T15:14:28.949483724Z" level=debug msg="Outgoing request failed" error="Get \"matrix://ablu.org/_matrix/key/v2/server\": dial tcp [2001:9e8:d58b:8100:51be:e7e4:9b58:7507]:443: connect: network is unreachable" out.req.ID=lpqzvlnCpkM4 out.req.method=GET out.req.uri="matrix://ablu.org/_matrix/key/v2/server" req.id=1tjUd9EkbTnE req.method=PUT req.path=/_matrix/federation/v1/send/1699241410034
time="2023-12-18T15:14:28.979877904Z" level=debug msg="Error sending request to https://matrix.ablu.org:443/_matrix/key/v2/query: dial tcp [2001:9e8:d58b:8100:51be:e7e4:9b58:7507]:443: connect: network is unreachable" out.req.ID=av7ZmKa6ZhFR out.req.method=POST out.req.uri="matrix://ablu.org/_matrix/key/v2/query" req.id=1tjUd9EkbTnE req.method=PUT req.path=/_matrix/federation/v1/send/1699241410034
time="2023-12-18T15:14:28.981335511Z" level=debug msg="Error sending request to https://matrix.ablu.org:443/_matrix/key/v2/query: dial tcp [2001:9e8:d58b:8100:51be:e7e4:9b58:7507]:443: connect: network is unreachable" out.req.ID=av7ZmKa6ZhFR out.req.method=POST out.req.uri="matrix://ablu.org/_matrix/key/v2/query" req.id=1tjUd9EkbTnE req.method=PUT req.path=/_matrix/federation/v1/send/1699241410034
  • Observe matching .well-known lookups on the target server

Expected behaviour:

The connection attempt should fail, and retries should back off exponentially. The .well-known response should probably be cached as well.

/cc @davralin

@davralin

The Dendrite server in question is mine, and the server it targeted is @Ablu's.

To fill in the gaps:

  • PostgreSQL database.
  • Same Go version as in ghcr.io/matrix-org/dendrite-monolith:v0.13.5.
  • The container runs on a Talos node in Oracle Cloud (free tier), which was overwritten from a "normal" Linux install.
    I never bothered with IPv6 connectivity there, so nothing IPv6-related is configured on the host.
