MSC1708: .well-known support for server name resolution #1708

Merged
merged 12 commits into from Jan 14, 2019

@richvdh
Member

richvdh commented Nov 5, 2018

Rendered

@richvdh

Member

richvdh commented Nov 6, 2018

Thanks for the quick feedback @turt2live !

@richvdh richvdh removed the proposal-wip label Nov 7, 2018

@richvdh richvdh added this to To Do in Backend Core Team via automation Nov 8, 2018

@richvdh richvdh moved this from To Do to In Progress: Planned Project Work in Backend Core Team Nov 8, 2018

@richvdh richvdh self-assigned this Nov 8, 2018

@erikjohnston

This comment was marked as outdated.

Member

erikjohnston commented Nov 12, 2018

@jevolk

This comment was marked as outdated.

Contributor

jevolk commented Nov 13, 2018

There are certain fundamental problems with this proposal. Foremost, .well-known is not a replacement for the Domain Name System.

Latency

First consider the total amount of latency being paid by this indirection. To use .well-known for DNS the following round-trips have to occur to speak matrix:

    1. UDP DNS A question to find the .well-known server.
    2. UDP DNS answer with an A record.
    3. TCP SYN to the .well-known server.
    4. TCP SYN-ACK from the .well-known server.
    5. TLS hello to the .well-known server.
    6. TLS certificate from the .well-known server.
    7. TLS client key exchange to the .well-known server.
    8. TLS finished message from the .well-known server.
    9. HTTPS GET request to the .well-known server.
    10. HTTPS response with content from the .well-known server.
    11. UDP DNS SRV question to find the matrix server.
    12. UDP DNS answer with an SRV record.
    13. UDP DNS A question to find the matrix server.
    14. UDP DNS answer with an A record.
    15. TCP SYN to the matrix server.
    16. TCP SYN-ACK from the matrix server.
    17. TLS hello to the matrix server.
    18. TLS certificate from the matrix server.
    19. TLS client key exchange to the matrix server.
    20. TLS finished message from the matrix server.
    21. HTTPS GET request to the matrix server.
    22. HTTPS response with content from the matrix server.

Now reconsider that under the matrix broadcast model in a large room: this whole sequence has to occur with every server.
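
For a rough feel of the cost described above, here is a back-of-the-envelope sketch. It assumes each request/response pair in the list costs one full, uncached round trip and that remote servers are contacted one after another. The RTT figures and the server count are hypothetical inputs, not measurements from this thread.

```python
# Rough latency estimate for the request/response pairs listed above.
# Assumption: every pair costs one uncached round trip to the named peer.
ROUND_TRIPS = [
    ("DNS A lookup for the .well-known host", "dns"),
    ("TCP handshake", "well_known"),
    ("TLS hello / certificate", "well_known"),
    ("TLS key exchange", "well_known"),
    ("HTTPS GET of .well-known", "well_known"),
    ("DNS SRV lookup for the matrix server", "dns"),
    ("DNS A lookup for the matrix server", "dns"),
    ("TCP handshake", "matrix"),
    ("TLS hello / certificate", "matrix"),
    ("TLS key exchange", "matrix"),
    ("HTTPS request", "matrix"),
]


def first_contact_latency_ms(rtt_ms):
    """Sum one RTT per round trip; rtt_ms maps peer name -> RTT in ms."""
    return sum(rtt_ms[peer] for _, peer in ROUND_TRIPS)


def room_latency_ms(rtt_ms, servers):
    """Worst case: every server in the room is contacted sequentially."""
    return servers * first_contact_latency_ms(rtt_ms)


# Hypothetical RTTs: 15 ms to a resolver, 100 ms to each remote host.
example_rtt = {"dns": 15, "well_known": 100, "matrix": 100}
print(first_contact_latency_ms(example_rtt))  # 845
```

Real implementations cache DNS answers and contact servers concurrently, so this is an upper bound on the first contact rather than a prediction.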

You Won't Reinvent The DNS Wheel

  • DNS was formally spec'ed by RFC1035 in November, 1987 and remains mostly the same to this day. It is the largest distributed system ever created.

  • It is lightweight, using a binary protocol over single UDP packets to minimize latency.

  • DNS servers are positioned throughout the world close to their users. Every ISP, corporate network, Small-Or-Home-Office router, even individual machines, already handle DNS appropriately for the users behind them. Even these public DNS servers (the first being an anycast setup specifically for distributing DNS load) are positioned for reasonable latency:

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_req=1 ttl=122 time=14.6 ms

PING 4.2.2.3 (4.2.2.3) 56(84) bytes of data.
64 bytes from 4.2.2.3: icmp_req=1 ttl=58 time=13.9 ms

  • DNS has a fully mature cache system. All of the established entities listed above participate in the comprehensive caching of DNS data.

  • DNS already has an established authority delegation system. The sub-domain hierarchy can be partitioned out to other nameservers within some organization.

While I've had some reservations about using SRV records due to the extra RTT as well as the marginal support I've encountered on *nix platforms, I've accepted this price given the specific nature of the matrix application and the value SRV brings to it. Using a webpage, on the other hand, in lieu of DNS is just shameful. The only thing truly worse is using both. Matrix has already gone the SRV route and it should stick with it.

Doesn't Add More Value Than It Costs

Finally consider the cost/benefit of further complicating implementations based on real data from the matrix federation. Here's a sample of federation servers gathered from a few large rooms. The first is just a count of the DNS A queries made by my server. Consider this just a control. (note: not all of the resolved records are still active matrix servers; errors here are counting cached NXDOMAIN responses)

> net host cache A count
resolved:  627
error:     85

Here are the contents of the SRV cache:

> net host cache SRV count
resolved:  289
error:     370

SRV is certainly used, though less than half of the queries find a record. Nevertheless, its usage is still not niche. Let's dig into the cache data itself a bit which I've dumped to a file called foo and then to this pastebin:

The number of non-records and records respectively:

jason@hud:~/charybdis$ grep " :0" foo | wc -l
370
jason@hud:~/charybdis$ grep -v " :0" foo | wc -l
289

The number of SRV records which still point to 8448:

jason@hud:~/charybdis$ grep ":8448" foo | wc -l
235
jason@hud:~/charybdis$ expr 289 - 235
54

There is already a small fraction of users (54 out of 659) who benefit from SRV's capability to change their port away from 8448. The subset of that fraction requiring a different port who also have trouble setting up an SRV record can still use an alternate port number directly in their mxid. That makes this proposal entirely not worth its cost.
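
The proportions behind those counts can be checked with a few lines of arithmetic. This is only a sketch over the figures quoted above; the variable names are illustrative.

```python
# Figures quoted from the SRV cache dump above.
srv_resolved = 289  # SRV queries that found a record
srv_missing = 370   # SRV queries that found none
srv_on_8448 = 235   # SRV records still pointing at port 8448

total = srv_resolved + srv_missing        # all SRV cache entries
off_default = srv_resolved - srv_on_8448  # records using another port


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(srv_resolved, total))        # 43.9: entries with any SRV record
print(pct(off_default, total))         # 8.2: entries on a non-8448 port
print(pct(off_default, srv_resolved))  # 18.7: SRV users on a non-8448 port
```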

@jcgruenhage

Member

jcgruenhage commented Nov 29, 2018

A note on all the "let's look at what the rest of the world is doing" arguments: apparently the IETF has decided that DNSSEC+DANE is not enough for SMTP, so they created MTA-STS, which is basically the same as this proposal, getting trust from .well-known.

Source: https://datatracker.ietf.org/doc/rfc8461/

@jevolk

This comment was marked as off-topic.

Contributor

jevolk commented Nov 29, 2018

> A note on all the "let's look at what the rest of the world is doing" arguments: apparently the IETF has decided that DNSSEC+DANE is not enough for SMTP, so they created MTA-STS, which is basically the same as this proposal, getting trust from .well-known.
>
> Source: https://datatracker.ietf.org/doc/rfc8461/

@jcgruenhage Does that RFC cut down on the latency problems with this proposal by using a TXT record?

  1. Policy Discovery
    MTA-STS policies are distributed via HTTPS from a "well-known"
    [RFC5785] path served within the Policy Domain, and their presence
    and current version are indicated by a TXT record at the Policy
    Domain. These TXT records additionally contain a policy "id" field,
    allowing Sending MTAs to check that a cached policy is still current
    without performing an HTTPS request ... To discover if a recipient domain
    implements MTA-STS, a sender need only resolve a single TXT record.

Can this proposal do that too? It seems appropriate considering the minority use-case here. This way we don't have to make the HTTPS query to find out there is no .well-known in the majority case.
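
The TXT-record check quoted from RFC 8461 could be sketched roughly as below. The record format (`v=STSv1; id=...`) is the one that RFC defines for MTA-STS. Applying the same pattern to matrix is hypothetical, and `needs_refetch` and its arguments are illustrative names that do not come from any proposal.

```python
def parse_sts_txt(record):
    """Parse an MTA-STS style TXT record such as 'v=STSv1; id=20190108;'."""
    fields = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        fields[key.strip()] = value.strip()
    if fields.get("v") != "STSv1" or "id" not in fields:
        raise ValueError("not a valid STSv1 record")
    return fields


def needs_refetch(txt_record, cached_policy_id):
    """Decide whether the HTTPS policy fetch is needed.

    txt_record is the TXT string, or None when the domain publishes none.
    cached_policy_id is the id of a previously fetched policy, or None.
    """
    if txt_record is None:
        return False  # no record: nothing to fetch over HTTPS
    return parse_sts_txt(txt_record)["id"] != cached_policy_id
```

With this shape, the common case (no record, or an unchanged `id`) costs one DNS query and no HTTPS request.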

@jcgruenhage

This comment was marked as resolved.

Member

jcgruenhage commented Nov 29, 2018

It's worth noting that the absence of such a record could be spoofed by DNS cache poisoning, so I'm not sure that is desirable.

@jcgruenhage
Member

jcgruenhage left a comment

Since #1711 is very important and this spec change is considered a pre-requisite for it, I'll try to move forward here: I'd be in favour of matrix-sts (matrix-strict transport security), instead of this PR, based on RFC8461. The goal would not be to move the delegation out of DNS to .well-known, but to confirm it via .well-known. Does that sound like an okay option?

@richvdh

Member

richvdh commented Dec 11, 2018

@jcgruenhage it's certainly something worth considering. I'd need to be convinced it actually represents a material improvement over what's suggested here. I haven't been yet, and my questions would depend on how exactly you envision it working...

@richvdh richvdh added the r0 P1 label Jan 7, 2019

Do a SRV lookup before .well-known lookup
also other clarifications and corrections.
@richvdh

Member

richvdh commented Jan 8, 2019

Ok, so the situation is that we have considered this long and hard, and decided to go with it, despite the lack of elegance.

We've tweaked it slightly to avoid the .well-known lookup if an SRV record is found, which we hope will reduce the need to do .well-knowns in the common case.
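
A minimal sketch of the lookup order described in this comment follows, assuming the two lookups are supplied as callables. `lookup_srv` and `fetch_well_known` are hypothetical helpers, not functions from synapse or any other implementation. What a real server does with the delegated name (further lookups, IPv6 literals, certificate checks) is deliberately left out.

```python
DEFAULT_PORT = 8448  # default federation port mentioned earlier in the thread


def resolve_server(server_name, lookup_srv, fetch_well_known):
    """Return (host, port) to connect to for server_name.

    lookup_srv(name) returns (host, port), or None if no SRV record exists.
    fetch_well_known(name) returns 'host' or 'host:port', or None.
    Both callables are hypothetical stand-ins for real network lookups.
    """
    srv = lookup_srv(server_name)
    if srv is not None:
        return srv  # SRV record found: the .well-known request is skipped

    delegated = fetch_well_known(server_name)
    if delegated is not None:
        host, sep, port = delegated.partition(":")
        return (host, int(port) if sep else DEFAULT_PORT)

    # Neither mechanism is configured: fall back to the name itself.
    return (server_name, DEFAULT_PORT)
```

Passing the lookups in as arguments keeps the ordering logic testable without touching the network.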

I'll spend some time tomorrow writing up why we have chosen to dismiss the other options here, but for now, here is an updated version of this proposal.

@richvdh richvdh force-pushed the rav/proposal/well-known-for-federation branch from 25310fb to 5812450 Jan 8, 2019

@richvdh

Member

richvdh commented Jan 8, 2019

There we go. We'd like to get this finalised ASAP so that we can get people started on the process of sorting out their certificates (which is obviously required for S2S r0), and I think that most people who are likely to care have had plenty of time to look at this. Accordingly, I'm going to propose a FCP on this.

@mscbot fcp merge

@mscbot

Collaborator

mscbot commented Jan 8, 2019

Team member @richvdh has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@richvdh richvdh moved this from In Progress: Planned Project Work to Review in Backend Core Team Jan 8, 2019

@uhoreg

uhoreg approved these changes Jan 8, 2019

Member

uhoreg left a comment

Looks good overall. Just a few suggestions.

@mscbot

Collaborator

mscbot commented Jan 9, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@erikjohnston

Member

erikjohnston commented Jan 10, 2019

For completeness, it might be worth noting the upgrade path for servers already delegating. AIUI, currently the servers would use SRV records pointing to the delegated server, so the upgrade path would be:

  1. .well-known support added to servers (e.g. synapse) and released.
  2. Wait for a while so that ecosystem has time to upgrade.
  3. Servers that delegate add .well-known and remove their SRV records. (This step will cause older servers to not be able to route to the server anymore)
  4. (At a later time MSC1711 would start being enforced).

(If we did .well-known first, then step 3 wouldn't cause old servers to fail to route, and so would not require waiting in step 2 before adding the .well-known, but as noted in the MSC, other changes mean that it's likely the ecosystem will need to upgrade anyway.)

@mscbot

Collaborator

mscbot commented Jan 14, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

@turt2live turt2live merged commit fe4928c into master Jan 14, 2019

7 checks passed

ci/circleci: build-dev-scripts (tests passed on CircleCI)
ci/circleci: build-docs (tests passed on CircleCI)
ci/circleci: build-swagger (tests passed on CircleCI)
ci/circleci: check-docs (tests passed on CircleCI)
ci/circleci: validate-docs (tests passed on CircleCI)
docs (preview of the HTML documentation)
swagger (preview of the swagger build)

Backend Core Team automation moved this from Review to Done - Operations Jan 14, 2019

@neilisfragile neilisfragile moved this from Done - Operations to Done - Planned Project in Backend Core Team Jan 16, 2019

@richvdh richvdh removed their assignment Jan 17, 2019
