Contributor

@tcpipuk tcpipuk commented Sep 4, 2024

Rendered

Signed-off-by: Tom Foster <tom@tcpip.uk>

@tcpipuk tcpipuk changed the title from "Recognising HTTP 410 Gone status during discovery" to "MSC4188: Recognising HTTP 410 Gone status during discovery" Sep 4, 2024
@turt2live turt2live added the proposal, client-server, kind:feature, and needs-implementation labels Sep 4, 2024
@tcpipuk tcpipuk closed this Sep 4, 2024
@tcpipuk tcpipuk reopened this Sep 4, 2024
Member

@turt2live turt2live Sep 4, 2024

Implementation requirements:

  • Server advertising 410 Gone
  • Server respecting 410 Gone

Member

Additionally, I think this requires a quick survey of existing servers to see whether it would produce any false positives. I'd be curious whether any deployed servers have, e.g., DNS SRV delegation properly set up but (for some reason) return HTTP 410 for .well-known.

Contributor Author

Initially I've pulled the member list of Matrix HQ, created a unique list of 9550 servers, pulled the .well-known for each one, and recorded the HTTP response status.

There were 55 that returned 410 - I've only checked a handful so far and found them all to fail https://federationtester.matrix.org (no SRV, and either timeout or reject port 8448) but I haven't checked them all yet.

I'll find time soon to check them all and provide a conclusive answer, but my initial testing suggests ex-admins are actually already using this code to signal their server is offline, so this feature may offer an immediate benefit to live servers!
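
For anyone wanting to repeat a survey like the one described above, here is a minimal sketch in Python. It is not the script used for the numbers in this comment: the function names are hypothetical, and it assumes each server name can be fetched directly over HTTPS at the standard `/.well-known/matrix/server` path. The fetcher is injectable so the tallying logic can be exercised without network access.

```python
from collections import Counter
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request

WELL_KNOWN_PATH = "/.well-known/matrix/server"


def fetch_status(server: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of a server's .well-known, or None on network failure."""
    url = f"https://{server}{WELL_KNOWN_PATH}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS error, timeout, and similar.
        return None


def tally_statuses(
    servers: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Counter:
    """Count response statuses across a de-duplicated list of server names."""
    return Counter(fetch(server) for server in set(servers))
```

Calling `tally_statuses(server_names)[410]` would then give the number of servers answering 410; a `None` key collects servers that could not be reached at all.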

Contributor Author

Sorry about the delay!

Yeah, I haven't found a single server in HQ that returns 410 and appears to be a live Matrix server. There's a wide variety of return codes being used by servers in that room, but it would seem servers currently advertising 410 are trying to signal that they are no longer Matrix servers - there are none I could find that accept port 8448 or have working SRV delegation.

I'm not sure if that'd technically count as an existing implementation of advertising?
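
The false-positive check described above could be expressed roughly as follows. This is only a sketch: the `Probe` record and its field names are hypothetical, and it assumes the SRV lookup and the port 8448 connection test have already been done by some other tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Probe:
    """Observations about one server name (all fields are illustrative)."""
    server: str
    well_known_status: Optional[int]  # None if .well-known was unreachable
    has_srv_record: bool              # working SRV delegation was found
    port_8448_open: bool              # the default federation port accepted a connection


def looks_like_false_positive(probe: Probe) -> bool:
    """A 410 is suspicious if the server still appears reachable for federation."""
    still_reachable = probe.has_srv_record or probe.port_8448_open
    return probe.well_known_status == 410 and still_reachable


def false_positives(probes: List[Probe]) -> List[str]:
    """Names of servers that return 410 yet still look like live Matrix servers."""
    return [p.server for p in probes if looks_like_false_positive(p)]
```

An empty result from `false_positives` corresponds to the finding reported here: no server returning 410 also accepted port 8448 or had working SRV delegation.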

> Initially I've pulled the member list of Matrix HQ, created a unique list of 9550 servers, pulled the .well-known for each one, and recorded the HTTP response status.
>
> There were 55 that returned 410 - I've only checked a handful so far and found them all to fail https://federationtester.matrix.org (no SRV, and either timeout or reject port 8448) but I haven't checked them all yet.
>
> I'll find time soon to check them all and provide a conclusive answer, but my initial testing suggests ex-admins are actually already using this code to signal their server is offline, so this feature may offer an immediate benefit to live servers!

I can confirm I'm one of those. I set up a 410 on my well-known in the hope that it would get noticed as down. I'm still getting 1,800 queries per day on that page alone.

Contributor

> Server advertising 410 Gone

Does this need to be a server implementation, or can any old server be serving it? I have a few domains now returning 410 Gone that I've simply thrown into a section in my reverse proxy config. I'd assume it'd be a bit silly to require an implementation to return this code given the whole point is that the implementation won't be there anymore/much longer, but the wording isn't clear.

Contributor Author

My interpretation was that we simply need a server that respects the status code, and at least one server that advertises it - it wouldn't need to be a real Matrix server, just a server that correctly advertises that status code, as the objective is to advertise that there isn't a Matrix server here 🙂
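
To illustrate how little is needed on the advertising side, here is a minimal sketch of a stand-in server using only the Python standard library. It is not a Matrix implementation and not anyone's actual deployment: the handler name and response text are made up, and it assumes plain HTTP behind something else that terminates TLS for the domain.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

WELL_KNOWN_PATH = "/.well-known/matrix/server"


class GoneHandler(BaseHTTPRequestHandler):
    """Answers 410 Gone for the Matrix .well-known path and 404 for everything else."""

    def do_GET(self) -> None:
        if self.path == WELL_KNOWN_PATH:
            status = 410
            body = b"This Matrix homeserver has been retired.\n"
        else:
            status = 404
            body = b"Not found.\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the stand-in server until interrupted (host and port are illustrative)."""
    HTTPServer((host, port), GoneHandler).serve_forever()
```

Running `serve()` and requesting `/.well-known/matrix/server` should yield a 410 response; the same effect is usually achieved with a one-line rule in a reverse proxy.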

Contributor

Ah well, in that case, both dendrite.nexy7574.co.uk and construct.nexy7574.co.uk are returning HTTP 410 (granted, with some spice in the response body to try to further ward off servers that don't respect 410 yet), and I plan to decommission further servers into this same response block for the foreseeable future, if you'd like to count that as a server responding.

Member

Looks like this just needs a PR for a server honoring the 410 and we can call FCP on this.
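
As a rough picture of what "honoring the 410" might look like on the requesting side, here is a sketch of the branching only. The enum and function names are hypothetical and do not come from any existing homeserver; the fallback branch stands for the existing behaviour of checking SRV records and/or trying the default port.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    USE_DELEGATION = "use the host named in the .well-known response"
    STOP_SERVER_GONE = "stop discovery and suspend further attempts"
    FALL_BACK = "check SRV records and/or try the default port"


def next_discovery_step(status: Optional[int], body_is_valid: bool = False) -> NextStep:
    """Decide what to do after requesting .well-known (status is None on network failure)."""
    if status == 410:
        # The proposed change: treat 410 Gone as a deliberate signal, not a generic failure.
        return NextStep.STOP_SERVER_GONE
    if status == 200 and body_is_valid:
        return NextStep.USE_DELEGATION
    # Any other outcome keeps today's fallback behaviour.
    return NextStep.FALL_BACK
```

The point of the sketch is that 410 is the only status that short-circuits the fallback; every other error code is handled exactly as before.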

@tcpipuk tcpipuk changed the title from "MSC4188: Recognising HTTP 410 Gone status during discovery" to "MSC4188: Handling HTTP 410 Gone Status in Matrix Server Discovery" Sep 4, 2024
If the suspension period is too long, it may delay the detection of a server that has come back
online. Conversely, too short and it may lead to unnecessary requests.

## Alternatives
Contributor

I host my .well-known files statically via GitHub Pages [1], so it would be nice if I could specify something in the JSON to indicate that the homeserver is gone.

For example, maybe:

{
  "m.gone": true
}

The issue with this is that it's not really backwards compatible I think.

Footnotes

  1. I know this isn't recommended, since it doesn't set the Content-Type header correctly, but servers are required to be resilient to this.
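
A sketch of how a requesting server might check for the flag suggested above. Note that `m.gone` is only the key floated in this comment, not part of the Matrix specification, and the function name is hypothetical. The body is parsed without looking at the Content-Type header, in line with the footnote.

```python
import json


def is_marked_gone(body: str) -> bool:
    """Return True only if the .well-known body is a JSON object with "m.gone": true."""
    try:
        document = json.loads(body)
    except json.JSONDecodeError:
        # A static host may serve anything; an unparseable body is not a "gone" signal.
        return False
    # Require the literal JSON value true, so strings or numbers are not misread.
    return isinstance(document, dict) and document.get("m.gone") is True
```

For example, `is_marked_gone('{"m.gone": true}')` is true, while a normal delegation document such as `{"m.server": "matrix.example.org:443"}` is not treated as gone.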

Contributor Author

I like this as a possible solution, though as you say, it's technically a second separate one.

Perhaps for sanity you could propose additions/changes to the PR through a review to cover this extra type of "gone" reporting?

Then, not only can I make sure it properly caters to your use case, it'd cause you to be listed as a co-contributor on the PR 🙂

checking for SRV records and/or trying the default port.

This proposal suggests that homeservers should interpret a `410 Gone` status code as an indication
that the server is permanently offline and should suspend further discovery attempts for a period
Contributor

I'm confused - here, it is stated that servers should treat other servers as permanently offline if they receive a 410 code, but in a later review you said that it was intended for people who want to semi-permanently retire, and then further down in the proposal it is recommended that implementations should only honour the 410 status code temporarily (for 1-30 days).

Is this supposed to be used to permanently mark a server as "I'm never coming back", or "I'm not here anymore but I might be back later"?

Contributor Author

A bit of both!

The intention is to say "I'm permanently gone" but no one can actually own a DNS domain permanently, so we should never store that a domain is permanently un-federatable.

The intent is to offer a mechanism for someone to block federation but also potentially change their mind months/years in the future.
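
One way to picture "gone, but never stored as permanent" is a cache whose entries always expire. The sketch below is an assumption-laden illustration, not wording from the MSC: the class name is hypothetical and the 7-day default is simply a value inside the 1-30 day range discussed in this thread.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


class GoneCache:
    """Remembers servers that answered 410 Gone, but only for a bounded period."""

    def __init__(self, suspension: timedelta = timedelta(days=7)) -> None:
        self._suspension = suspension
        self._retry_after: Dict[str, datetime] = {}

    def mark_gone(self, server: str, now: datetime) -> None:
        """Record a 410 response; discovery is suspended until now + suspension."""
        self._retry_after[server] = now + self._suspension

    def is_suspended(self, server: str, now: datetime) -> bool:
        """True while the suspension is active; expired entries are dropped."""
        retry_at = self._retry_after.get(server)
        if retry_at is None:
            return False
        if now >= retry_at:
            # The domain may have changed hands or come back: forget and re-check.
            del self._retry_after[server]
            return False
        return True


if __name__ == "__main__":
    cache = GoneCache()
    start = datetime(2024, 9, 4, tzinfo=timezone.utc)
    cache.mark_gone("gone.example", start)
    print(cache.is_suspended("gone.example", start + timedelta(days=1)))
    print(cache.is_suspended("gone.example", start + timedelta(days=8)))
```

Because every entry carries an expiry, a domain that is later re-registered or brought back is rediscovered without any manual intervention.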

Contributor

@nexy7574 nexy7574 Jun 12, 2025

Hm, in that case, could the "maximum" cache time be upped to something like 90 days? At 30 days, if you're federating with 25k servers (which isn't uncommon - my tiny synapse in 50-odd rooms already has 10k destinations), that's still potentially 300,000 requests a year best-case, but it'd be a third of that if servers only asked every quarter.

Also, correct me if I'm wrong, but if servers received a new event from a server they previously marked offline, they'll discard that "offline" status and resume standard federation operations with them, right? In that case, I don't think an extended cache lifetime will do much harm if a server decides to come back prematurely.

Of course, if this is better suited for a followup MSC, let me know. Just trying to see if I can find a happy balance between being able to mark my servers as never coming back (which they are), giving servers plenty of notice before they start getting TLS cert errors (since I won't be renewing them anymore), and also still having the opportunity to spin up a new server under the same server name later on down the line.
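
The request estimate above can be reproduced with a short calculation. This sketch assumes every remote server re-checks exactly once per cache period and treats a year as 365 days, so the 30-day figure comes out slightly above the 300,000 quoted (which corresponds to 12 checks per server per year). The function name is illustrative.

```python
def rechecks_per_year(servers: int, cache_days: int) -> int:
    """Best-case yearly .well-known requests if each server re-checks once per cache period."""
    return round(servers * 365 / cache_days)


if __name__ == "__main__":
    monthly = rechecks_per_year(25_000, 30)    # 304167
    quarterly = rechecks_per_year(25_000, 90)  # 101389
    print(monthly, quarterly, round(quarterly / monthly, 2))
```

Tripling the cache period cuts the best-case request volume to a third, which is the saving being argued for here.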

Contributor Author

The quoted numbers are mainly just a recommendation rather than a requirement - it's up to a server to decide how often (if ever) it wants to communicate with another, but I probably wouldn't recommend longer than 30 days in the MSC as that's already quite a long time.

The main thing I was worried about was that some implementations might not reset the timer, but you make a good point that shouldn't really be a concern. I'll revisit the wording when I'm logged in tomorrow morning!
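
The timer reset being discussed might look something like the sketch below. It is hypothetical throughout: the class and method names are invented, timestamps are plain Unix seconds, and it assumes that "hearing from" a server means an inbound request the implementation has already accepted as genuine.

```python
from typing import Dict


class SuspensionTable:
    """Tracks until when outbound discovery is suspended for each server."""

    def __init__(self) -> None:
        self._suspended_until: Dict[str, float] = {}

    def suspend(self, server: str, until_ts: float) -> None:
        """Suspend discovery for a server after it answered 410 Gone."""
        self._suspended_until[server] = until_ts

    def on_inbound_event(self, origin: str) -> None:
        """The server is evidently back: clear its suspension immediately."""
        self._suspended_until.pop(origin, None)

    def may_contact(self, server: str, now_ts: float) -> bool:
        """True if there is no active suspension for this server."""
        return now_ts >= self._suspended_until.get(server, 0.0)
```

With a reset like this in place, a long suspension costs little if a server returns early, since its first inbound event lifts the suspension.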

Contributor Author

@nexy7574 how does bc23843 look?

Contributor

Looks perfect, thanks for the clarifications @tcpipuk 😄
