MSC4188: Handling HTTP 410 Gone Status in Matrix Server Discovery #4188
Implementation requirements:
- Server advertising `410 Gone`
- Server respecting `410 Gone`
Additionally, I think this requires a quick survey of existing servers to check for false positives. I'd be curious whether any deployed servers properly have DNS SRV delegation set up but (for some reason) return HTTP 410 for .well-known.
Initially, I pulled the member list of Matrix HQ, built a de-duplicated list of 9550 servers, fetched the .well-known for each one, and recorded the HTTP response status.
There were 55 that returned 410. I've only checked a handful so far, and all of them fail https://federationtester.matrix.org (no SRV, and either a timeout or a rejected connection on port 8448), but I haven't checked them all yet.
I'll find time soon to check the rest and provide a conclusive answer, but my initial testing suggests ex-admins are already using this code to signal that their server is offline, so this feature may offer an immediate benefit to live servers!
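A minimal sketch of the kind of survey described above, assuming a plain HTTPS GET of `/.well-known/matrix/server` per server name. This is not the script that was actually used; `fetch_status`, `tally_statuses` and `gone_servers` are hypothetical names. The fetcher is injectable so the tallying logic can be exercised offline, and the follow-up check for SRV records and port 8448 is not included.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable, Iterable, List, Optional


def fetch_status(server_name: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of https://<server_name>/.well-known/matrix/server.

    Returns None when no HTTP response was received at all (DNS failure,
    refused connection, timeout, TLS error, ...).
    """
    url = f"https://{server_name}/.well-known/matrix/server"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 410 Gone) are raised as HTTPError,
        # which must be caught before its parent class URLError.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def tally_statuses(
    server_names: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Counter:
    """De-duplicate the server names, then count the status each one returns."""
    return Counter(fetch(name) for name in sorted(set(server_names)))


def gone_servers(
    server_names: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> List[str]:
    """List the servers whose .well-known currently answers 410 Gone."""
    return [name for name in sorted(set(server_names)) if fetch(name) == 410]
```

The 410 results from `gone_servers` would then still need the manual cross-check described in the thread, since a 410 on `.well-known` alone does not prove there is no SRV delegation.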
Sorry about the delay!
Yeah, I haven't found a single server in HQ that returns 410 and appears to be a live Matrix server. There's a wide variety of return codes being used by servers in that room, but it would seem servers currently advertising 410 are trying to signal that they are no longer Matrix servers - there are none I could find that accept port 8448 or have working SRV delegation.
I'm not sure if that'd technically count as an existing implementation of advertising?
> my initial testing suggests ex-admins are actually already using this code to signal their server is offline

I can confirm I'm one of those. I set up a 410 on my .well-known in the hope that it would get noticed as down, but I'm still getting 1800 queries per day on that page alone.
> Server advertising `410 Gone`

Does this need to be a server implementation, or can any old server serve it? I have a few domains now returning 410 Gone that I've simply thrown into a section in my reverse proxy config. It would seem a bit silly to require an implementation to return this code, given the whole point is that the implementation won't be there anymore (or won't be for much longer), but the wording isn't clear.
My interpretation was that we simply need a server that respects the status code, and at least one server that advertises it - it wouldn't need to be a real Matrix server, just a server that correctly advertises that status code, as the objective is to advertise that there isn't a Matrix server here 🙂
Ah, well in that case, both dendrite.nexy7574.co.uk and construct.nexy7574.co.uk are returning HTTP 410 (granted, with some spice in the response body to further ward off servers that don't respect 410 yet), and I plan to decommission more servers into this same response block for the foreseeable future, if you'd like to count that as a server advertising it.
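For the "any old server" case discussed above, a minimal sketch of a handler that advertises `410 Gone` with no Matrix server behind it, using only Python's standard `http.server`. In practice this job is usually done by a reverse proxy that also terminates TLS; the path prefixes, the plain-text body and the `serve` helper are illustrative assumptions, not anything the MSC prescribes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative choice: retire the Matrix discovery and federation paths only.
GONE_PREFIXES = ("/.well-known/matrix/", "/_matrix/")
GONE_BODY = b"This Matrix server has been permanently retired.\n"


def status_for_path(path: str) -> int:
    """410 Gone for retired Matrix paths, 404 Not Found for anything else."""
    return 410 if path.startswith(GONE_PREFIXES) else 404


class GoneHandler(BaseHTTPRequestHandler):
    """Answers every GET with 410 or 404; nothing Matrix-related runs here."""

    def do_GET(self) -> None:
        status = status_for_path(self.path)
        body = GONE_BODY if status == 410 else b""
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run until interrupted; host and port are arbitrary example values."""
    with HTTPServer((host, port), GoneHandler) as httpd:
        httpd.serve_forever()
```

After calling `serve()`, a GET of `http://127.0.0.1:8080/.well-known/matrix/server` should come back as `410`, while unrelated paths get `404`.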
Looks like this just needs a PR for a server honoring the 410 and we can call FCP on this.
> If the suspension period is too long, it may delay the detection of a server that has come back online. Conversely, too short and it may lead to unnecessary requests.
>
> ## Alternatives
I host my .well-known files statically via GitHub Pages[^1], so it would be nice if I could specify something in the JSON to indicate that the homeserver is gone.
For example, maybe:
```json
{
  "m.gone": true
}
```
The issue with this is that it's not really backwards compatible, I think.

[^1]: I know this isn't recommended, since it doesn't set the `Content-Type` header correctly, but servers are required to be resilient to this.
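A sketch of how a discovering server might interpret both signals, assuming the `m.gone` key suggested above (a hypothetical key from this comment, not part of the Matrix spec) alongside the existing `m.server` key. The body is parsed as JSON regardless of `Content-Type`, as the footnote requires. It also hints at the backwards-compatibility concern: a server that ignores `m.gone` just sees a response without a usable `m.server` and would carry on with the remaining discovery steps.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class WellKnownResult(Enum):
    DELEGATED = "delegated"  # a usable m.server value was found
    GONE = "gone"            # the server signalled that it is retired
    FALLBACK = "fallback"    # continue with SRV / default-port discovery


def interpret_well_known(status: int, body: bytes) -> Tuple[WellKnownResult, Optional[str]]:
    """Classify a /.well-known/matrix/server response.

    `m.gone` is the hypothetical key proposed in review, not a spec'd field.
    """
    if status == 410:
        return WellKnownResult.GONE, None
    if status != 200:
        return WellKnownResult.FALLBACK, None
    try:
        # Parse regardless of Content-Type (e.g. static hosting mislabels it).
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return WellKnownResult.FALLBACK, None
    if not isinstance(data, dict):
        return WellKnownResult.FALLBACK, None
    if data.get("m.gone") is True:
        return WellKnownResult.GONE, None
    server = data.get("m.server")
    if isinstance(server, str) and server:
        return WellKnownResult.DELEGATED, server
    return WellKnownResult.FALLBACK, None
```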
I like this as a possible solution, though as you say, it's technically a second, separate one.
Perhaps, for sanity, you could propose additions/changes to the PR through a review to cover this extra type of "gone" reporting?
That way, not only can I make sure it properly caters to your use case, but you'd also be listed as a co-contributor on the PR 🙂
On `proposals/4188-http-gone-status.md` (outdated):
> checking for SRV records and/or trying the default port.
>
> This proposal suggests that homeservers should interpret a `410 Gone` status code as an indication that the server is permanently offline and should suspend further discovery attempts for a period
I'm confused. Here, it is stated that servers should treat other servers as permanently offline if they receive a 410 code. However, in a later review you said it was intended for people who want to semi-permanently retire, and further down, the proposal recommends that implementations only honour the 410 status code temporarily (for 1-30 days).
Is this supposed to permanently mark a server as "I'm never coming back", or as "I'm not here anymore, but I might be back later"?
A bit of both!
The intention is to say "I'm permanently gone", but no one can actually own a DNS domain permanently, so we should never store that a domain is permanently un-federatable.
The intent is to offer a mechanism for someone to block federation but also potentially change their mind months/years in the future.
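To make "suspend rather than store permanently" concrete, a simplified sketch of the discovery order quoted from the proposal, in which a `410` on `.well-known` short-circuits the usual fallbacks (checking SRV records and/or trying the default port) and yields a bounded suspension instead of a permanent record. The lookups are injected as plain functions; `resolve_server`, `Resolution` and the 7-day default are assumptions for illustration, and the real spec's steps (IP literals, explicit ports, delegated-name resolution) are deliberately omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional, Tuple


@dataclass
class Resolution:
    """Outcome of a (simplified) discovery attempt."""
    target: Optional[str]                       # "host:port" to federate with
    suspended_until: Optional[datetime] = None  # set only after a 410 Gone


def resolve_server(
    server_name: str,
    fetch_well_known: Callable[[str], Tuple[int, Optional[str]]],
    lookup_srv: Callable[[str], Optional[str]],
    now: datetime,
    suspension: timedelta = timedelta(days=7),
) -> Resolution:
    status, delegated = fetch_well_known(server_name)
    if status == 410:
        # Gone: skip the SRV and default-port fallbacks entirely, and
        # remember this only until now + suspension, never permanently.
        return Resolution(target=None, suspended_until=now + suspension)
    if status == 200 and delegated:
        return Resolution(target=delegated)
    # Any other response: continue with the usual fallbacks.
    srv_target = lookup_srv(server_name)
    if srv_target:
        return Resolution(target=srv_target)
    return Resolution(target=f"{server_name}:8448")
```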
Hm, in that case, could the "maximum" cache time be raised to something like 90 days? At 30 days, if you're federating with 25k servers (which isn't uncommon: my tiny Synapse in 50-odd rooms already has 10k destinations), that's still potentially around 300,000 requests a year in the best case, but it would be a third of that if servers only asked every quarter.
Also, correct me if I'm wrong, but if a server receives a new event from a server it previously marked offline, it will discard that "offline" status and resume standard federation operations with it, right? In that case, I don't think an extended cache lifetime will do much harm if a server decides to come back prematurely.
Of course, if this is better suited to a follow-up MSC, let me know. I'm just trying to find a happy balance between being able to mark my servers as never coming back (which they are), giving other servers plenty of notice before they start getting TLS certificate errors (since I won't be renewing the certificates anymore), and still having the opportunity to spin up a new server under the same server name later on down the line.
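The arithmetic behind those figures, as a small sketch. It assumes the best case described above: every federating server re-checks the retired server exactly once per interval, over a 365-day year. `yearly_requests` is a hypothetical helper name.

```python
def yearly_requests(federating_servers: int, recheck_interval_days: float) -> float:
    """Requests a retired server receives per year if each federating
    server re-checks it exactly once per interval (best case)."""
    return federating_servers * 365 / recheck_interval_days


thirty_day = yearly_requests(25_000, 30)  # about 304,167 requests per year
ninety_day = yearly_requests(25_000, 90)  # about 101,389 requests per year
```

Because the request count scales with 1 / interval, tripling the interval from 30 to 90 days cuts the load to exactly a third.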
The quoted numbers are mainly just a recommendation rather than a requirement - it's up to a server to decide how often (if ever) it wants to communicate with another, but I probably wouldn't recommend longer than 30 days in the MSC as that's already quite a long time.
The main thing I was worried about was that some implementations might not reset the timer, but you make a good point that shouldn't really be a concern. I'll revisit the wording when I'm logged in tomorrow morning!
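A sketch of the bookkeeping discussed in this exchange: a 410 suspends discovery for a bounded period, clamped here to the 1-30 day range mentioned in the thread, and any new inbound event from the server clears the suspension immediately rather than waiting for it to expire. `GoneCache`, its method names and the 7-day default are hypothetical, not taken from any existing homeserver.

```python
from datetime import datetime, timedelta
from typing import Dict

# Bounds taken from the 1-30 day recommendation discussed in this thread.
MIN_SUSPENSION = timedelta(days=1)
MAX_SUSPENSION = timedelta(days=30)


class GoneCache:
    """Remembers servers that answered 410 Gone, but only for a bounded time."""

    def __init__(self, suspension: timedelta = timedelta(days=7)) -> None:
        # Clamp the configured period into the recommended range.
        self.suspension = max(MIN_SUSPENSION, min(suspension, MAX_SUSPENSION))
        self._until: Dict[str, datetime] = {}

    def mark_gone(self, server_name: str, now: datetime) -> None:
        self._until[server_name] = now + self.suspension

    def is_suspended(self, server_name: str, now: datetime) -> bool:
        until = self._until.get(server_name)
        if until is None:
            return False
        if now >= until:
            # Suspension expired: forget it so discovery is retried.
            del self._until[server_name]
            return False
        return True

    def on_inbound_event(self, server_name: str) -> None:
        # A new event from the server shows it is back, so reset the
        # timer immediately instead of waiting for the suspension to lapse.
        self._until.pop(server_name, None)
```

Resetting on inbound events is what makes a longer suspension relatively low-risk, which is the point raised above: a server that comes back early announces itself by federating.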
Looks perfect, thanks for the clarifications @tcpipuk 😄
Signed-off-by: Tom Foster <tom@tcpip.uk>