MSC3255: Use SRV record for homeservers discovery by clients #3255

Closed
wants to merge 1 commit into from

@Berbe Berbe commented Jun 21, 2021

Use SRV record for homeservers discovery by clients

@Berbe Berbe changed the title Add: Use SRV record for homeservers discovery by clients MSC3255: Use SRV record for homeservers discovery by clients Jun 21, 2021
@Berbe Berbe marked this pull request as ready for review June 21, 2021 23:19
@@ -0,0 +1,29 @@
# Proposal to leverage SRV records to discover homeservers from clients
Contributor

From what I know, this MSC is impossible. The difference exists because browsers are unable to access any DNS-related data, so unfortunately they cannot get the SRV record. Apart from that, from what I have heard the SRV record is more or less deprecated and well-known should be preferred. On the deprecation, though, the spec core team should correct me if I am wrong.

Author

Impossible for browser-based, aka Web clients, which the reference implementation Element belongs to. That way, I admit my additional note might hurt my proposal a tad.

However, that merely questions the soundness of using a browser framework as a foundation for clients. The specifications being broader than any choice made during implementation, I see nothing in this MSC being impossible nor invalid.

Member

I think we've been there before. I suggest looking at MSC1708 (it's for federation but the rationale is equally applicable to client-server interaction). Also, the original MSC for .well-known lookup in clients says this in the rationale (aside from the point mentioned above about browser-based clients being unable to use SRV records):

  • Well-known is designed to provide a namespace for application where arbitrary data can be served. It makes the process of delivering Home and Identity server URLs trivial while DNS SRV record only provides a way to return a single hostname and port per record, without a URL scheme or path.

Regarding the soundness of using a browser framework - I personally can relate to that but nobody's going to rewrite Element Web outside of the web, and it's the most used Matrix client for now and in the foreseeable future. Not supporting web implementations is a deal-breaker, I'm afraid.
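
To make the contrast in the quoted rationale concrete, here is a minimal Python sketch. It assumes a hypothetical `.well-known/matrix/client` body using the `m.homeserver` and `m.identity_server` keys with example hostnames; the `SrvAnswer` type and the `base_url` helper are illustrative names, not part of any Matrix library.

```python
import json
from typing import NamedTuple, Optional

# Hypothetical payload, modelled on the `m.homeserver` and
# `m.identity_server` keys of `/.well-known/matrix/client`.
WELL_KNOWN_BODY = """
{
  "m.homeserver": {"base_url": "https://matrix.example.com"},
  "m.identity_server": {"base_url": "https://identity.example.com"}
}
"""


class SrvAnswer(NamedTuple):
    """All that one SRV record can tell a client: a host and a port."""
    target: str
    port: int


def base_url(document: dict, key: str) -> Optional[str]:
    """Return the `base_url` stored under `key`, or None if it is absent."""
    entry = document.get(key)
    if isinstance(entry, dict):
        value = entry.get("base_url")
        if isinstance(value, str):
            return value
    return None


doc = json.loads(WELL_KNOWN_BODY)
homeserver = base_url(doc, "m.homeserver")
identity_server = base_url(doc, "m.identity_server")

# The SRV answer carries no scheme, no path and no second service.
srv = SrvAnswer(target="example.com", port=8448)
```

One JSON document yields full URLs for two services, whereas the SRV answer would still leave the client to guess the scheme and path.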

Member

However, that merely questions the soundness of using a browser framework as a foundation for clients. The specifications being broader than any choice made during implementation, I see nothing in this MSC being impossible nor invalid.

I think you have that backwards. We are not saying to use .well-known because we are only thinking about browser-based clients. We are saying that browser-based clients are something that we want people to be able to create, and so things that we add to the spec should take browser-based clients into consideration unless there's a really good reason to omit support for browser-based clients.

As a more concrete example, there is an MSC for a low-bandwidth CS API, which is something that browser-based clients cannot use, but is being pursued because it gives an actual practical benefit. On the other hand, browser-based clients aren't really disadvantaged by it because they can still use the normal HTTP-based API. A big benefit plus no disadvantage to browser-based clients means that nobody is going to raise the "it doesn't support browser-based clients" flag on that MSC.

However, this proposal has little benefit, and prevents use by browser-based clients, hence the "it doesn't support browser-based clients" flag gets raised here.

Author

@Berbe Berbe Jun 22, 2021

I guess my looking glass was putting more concern on the users and the operators of instances than on the clients' developers.

Having a service you can delegate to any host, any port, is flexible, and opens possibilities. The SRV record is there to make sure resolution for a specific service encompasses all the required information to connect to said service, hence address and port. It is an elegant, long-lasting feature of the DNS standard, serving this exact purpose.
The way I see it, the .well-known feature is trying to reimplement/reinvent that name resolution ability on the HTTP layer.

As you pointed out in your own comment, if one wants to support browser-based clients today, configuration of a well-known path seems somewhat mandatory. This proposal doesn't remove anything from that, hence such clients can continue being served, provided instances' management takes special care of them.

The benefit here is to manage service resolution at the proper location, in the name resolution protocol, allowing strong decoupling between actual application layers and the piping leading to them.
This leads to simpler names being used in clients (helps users and adoption), and name resolution can potentially be managed exclusively in DNS if operators so wish (helps operators).

Contributor

@MTRNord MTRNord Jun 22, 2021

The issue with this is mainly that if we add SRV, some people will add only that and then complain on the element-web repo or other web-based client repos that it doesn't work, simply because this makes "normal admins" think that it solves discovery for all clients when it clearly cannot, whereas well-known achieves a uniform experience for all clients and therefore users.

allowing strong decoupling between actual application layers and the piping leading to them.

I don't think this makes too much of a difference for this protocol, simply because it is an HTTP-based protocol. If it were mainly a TCP- or UDP-based protocol, DNS would certainly make sense, but for Matrix HTTP is the common ground that is already a hard requirement anyway, so it can be expected to work with any client and server setup, while DNS will not always work due to browser limitations.

This leads to simpler names being used in clients (helps users and adoption)

well-known already solved this.

and name resolution can potentially exclusively be managed in DNS if operators wish so (helps operators).

I don't think this is actually true because, as said above (and also by others), this would exclude proper usage from any web or Electron client, which is quite a substantial share of clients at this time.

This proposal doesn't remove anything to that, hence such client can continue being served, provided instances' management take special care of them.

While it doesn't remove them, I think this will lead to even more confusion about this process than already exists.

This proposal doesn't remove anything to that, hence such client can continue being served, provided instances' management take special care of them. It is an elegant, long-lasting feature of the DNS standard, serving this exact purpose.

well-known does the same thing according to RFC8615 (well-known RFC).

I guess my looking glass was putting more concern on the users and the operators of instances than on the clients' developers.

I honestly don't see the benefit for operators or users here, as it just adds confusion and more possible configuration mistakes because people aren't aware that browser applications don't have access to DNS requests.

@richvdh
Member

richvdh commented Jun 22, 2021

related: element-hq/element-web#2682

# Proposal to leverage SRV records to discover homeservers from clients

Currently, the [specifications on server discovery by client](https://spec.matrix.org/unstable/client-server-api/#server-discovery) merely mentions the use of the `/.well-known/matrix/` HTTP path.
This comes in contradiction with the [specifications on server discovery by servers](https://spec.matrix.org/unstable/server-server-api/#server-discovery) which also leverage the existence of a SRV record.
Member

My experience of the use of SRV in server-server discovery is that the number of times it is useful there is tiny, and its existence repeatedly confuses people into thinking it is something they want or need.

Irrespective of the practicalities of supporting this in web-based clients, I would be very much opposed to adding this complication to the C-S API, particularly given there doesn't seem to be a particularly compelling reason for it.

Member

@KitsuneRal KitsuneRal left a comment

Generally, I suggest that the MSC author search for ".well-known" and/or "SRV" within this repo. There's quite a bit of prior art on the subject. Given that this effectively proposes a reversal of the change introduced a few years ago, there should be some really strong arguments overriding the reasons why that original change took place.

# Proposal to leverage SRV records to discover homeservers from clients

Currently, the [specifications on server discovery by client](https://spec.matrix.org/unstable/client-server-api/#server-discovery) merely mentions the use of the `/.well-known/matrix/` HTTP path.
This comes in contradiction with the [specifications on server discovery by servers](https://spec.matrix.org/unstable/server-server-api/#server-discovery) which also leverage the existence of a SRV record.
Member

There's no contradiction here. Server-Server API is exposed at an unrelated set of endpoints and can even be served from a different host than Client-Server API.

adoption, the most complicated when a HTTP path is not used by the Matrix instance operator.
For instance, in such a case, when an instance operator uses a `_matrix._tcp.example.org` SRV record
pointing to an `example.com` instance on port `8448`, the `example.org` hostname shall be used
in conjunction with the instance's (`example.com`) port for a client to find the homeserver.
Member

That's pretty fragile and not very flexible. If I want to expose my Client-Server API at matrix.example.com (with Server-Server API still running at example.com:8448), what do I do?

Author

@Berbe Berbe Jun 22, 2021

This part was merely describing what address is to be used today when you do not leverage the .well-known ability, making the whole process ugly and confusing.
In your example, I reckon the use of a _matrix._tcp.matrix.example.com. SRV record pointing to (example.com, 8448) by a client would be satisfactory? You would only have to input matrix.example.com in said client, while today you have to use example.com:8448.
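
For readers unfamiliar with the record being discussed, the following Python sketch parses one SRV record written in zone-file presentation format (owner, TTL, class, type, priority, weight, port, target). The TTL, priority and weight values are made up for illustration, and `parse_srv_line` is a hypothetical helper, not an API of any DNS library.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    owner: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one SRV record given in zone-file presentation format."""
    # Unpacking raises ValueError if the line does not have eight fields.
    owner, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_class != "IN" or rr_type != "SRV":
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(owner, int(ttl), int(priority), int(weight), int(port), target)


# Hypothetical record matching the example above; TTL, priority and
# weight are assumptions.
record = parse_srv_line(
    "_matrix._tcp.matrix.example.com. 3600 IN SRV 10 0 8448 example.com."
)

# What a client would learn: a host name and a port, nothing more.
connect_to = (record.target.rstrip("."), record.port)
```

The result is the pair `(example.com, 8448)` mentioned in the comment; the record has no field for a URL scheme or path.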

Member

I'm afraid we're not on the same page. If my MXID is @me:example.com, and the Client-Server API is served at matrix.example.com but .well-known record is not there, I pass https://matrix.example.com to my client. In the meantime, other servers federating with example.com can find that they should connect to example.com:8448 in the SRV record and/or (since MSC1708) at .well-known/matrix/server; but clients do not have anything to do with this.
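
The client side of that description can be sketched in Python as follows: the server name is taken from the MXID by splitting at the first colon, and the well-known document is looked up under that name. Both function names are hypothetical, and the sketch only builds the URL; it performs no network request.

```python
def server_name_from_mxid(mxid: str) -> str:
    """Return the server name part of a Matrix user ID such as @me:example.com."""
    if not mxid.startswith("@") or ":" not in mxid:
        raise ValueError(f"not a Matrix user ID: {mxid!r}")
    # Split at the first colon only, so a port in the server name survives.
    _localpart, server_name = mxid.split(":", 1)
    return server_name


def well_known_client_url(server_name: str) -> str:
    """Build the URL where a client looks for homeserver discovery data."""
    return f"https://{server_name}/.well-known/matrix/client"


server_name = server_name_from_mxid("@me:example.com")
discovery_url = well_known_client_url(server_name)
```

If nothing is served at that URL, the user supplies the base URL (here `https://matrix.example.com`) by hand, as the comment describes.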

@turt2live turt2live added kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal proposal-in-review client-server Client-Server API labels Jun 22, 2021
distributed design of DNS and the usual cache done by recursive (non-authoritative) resolvers,
distributed amongst different operators, makes it so extra DNS requests usually have small to no impact
on a target instance. On the other hand, HTTP endpoints scaling remaining in the hands of the end-of-line
operators, extra HTTP request _do_ have an immediate impact on them.
Member

If you are unable to handle scaling for what is essentially a static file, I'd question your ability to handle scaling for your Matrix server.

Author

This comment misses the point.
The question at hand is not how one individual handles the scale but how and where the load is handled. By design those 2 protocols work totally differently. The key here is DNS is way more robust, being distributed, than HTTP could ever be.

Member

The point is that we don't look at things in isolation, but we look at things in the context of the larger system. Yes, DNS scales better than .well-known. But given that clients only check the .well-known at login/registration time, and .well-known is essentially a static file (and is a static file in many setups), scaling the actual Matrix server is a much bigger issue than scaling the .well-known server, and an admin who is able to scale the Matrix server should also be able to scale the .well-known server. So even though DNS does scale better than .well-known, and there is no argument about that, it just isn't a compelling argument for switching to DNS in this situation, especially in balance with other arguments for sticking with .well-known.


## Proposal

The `SRV` record shall be used as specified in the server -> server API for client -> server discovery.
Member

This isn't really enough detail for a proposal. Are you proposing to remove the .well-known method of discovery, or are you proposing SRV as an alternative? And if it's an alternative, then which one is preferred?

Author

The specs I quote include both: I never remotely implied .well-known shall be removed, only that SRV shall be used.
I already weighed the benefits and the costs of doing one before the other, but that could even be considered nitpicking. The important part is that SRV is to be tried, be it first or after .well-known fails.

Member

The thing is, an MSC has to provide enough detail that a spec editor knows exactly how the spec should be changed, because the next step after an MSC is accepted is actually changing the spec to reflect what the MSC says. So big questions such as which one should be tried first can't be left open. (Unless you intend for it to be up to the client developer which one should be tried first, in which case you should explicitly say that. But that's probably a bad idea.)

DNS being designed to scale.

## Additional information
This proposal would help proving clients, like Element, cf. [vector-im/element-web#15054](https://github.com/vector-im/element-web/issues/15054#issuecomment-681969376)
Member

Given that the linked issue relates to Element Web, and we've already established that Element Web cannot make use of SRV records, that doesn't seem relevant here. (Also, the linked issue turned out to be a misconfiguration rather than an actual problem.)

Author

As already pointed out, I am afraid this Additional information section actually hurts the proposal, because the issue belongs to one implementation which made the implementation choice of a browser framework which kills the SRV ability.
If you feel the same, I could drop that section altogether.

Member

Yes, I think the linked issue can be dropped from this proposal.

@@ -0,0 +1,29 @@
# Proposal to leverage SRV records to discover homeservers from clients

Currently, the [specifications on server discovery by client](https://spec.matrix.org/unstable/client-server-api/#server-discovery) merely mentions the use of the `/.well-known/matrix/` HTTP path.
Member

It should also be noted that .well-known does more than just discovery of the Matrix server location. It can also provide discovery of the preferred identity server and integration server, and other information as well. It can be extended as needed, and clients can even make up their own extensions if they want to.

@turt2live
Member

This MSC doesn't appear to have been particularly well received by members of the SCT or by the community. There may be a practical use case within the MSC's problem statement, however the approach seems too far off from how it would be solved for this to stay open.

As such,

@mscbot fcp close

@mscbot
Collaborator

mscbot commented Jul 19, 2022

Team member @turt2live has proposed to close this. The next step is review by the rest of the tagged people:

Once at least 75% of reviewers approve (and there are no outstanding concerns), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for information about what commands tagged team members can give me.

@mscbot mscbot added disposition-close proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. labels Jul 19, 2022
@turt2live turt2live added this to Needs idea feedback / initial review in Spec Core Team Backlog via automation Jul 19, 2022
@turt2live turt2live moved this from Needs idea feedback / initial review to Ready for FCP ticks in Spec Core Team Backlog Jul 19, 2022
@mscbot
Collaborator

mscbot commented Jul 26, 2022

🔔 This is now entering its final comment period, as per the review above. 🔔

@mscbot mscbot added final-comment-period This MSC has entered a final comment period in interest to approval, postpone, or delete in 5 days. and removed proposed-final-comment-period Currently awaiting signoff of a majority of team members in order to enter the final comment period. labels Jul 26, 2022
@richvdh richvdh moved this from Ready for FCP ticks to In FCP in Spec Core Team Backlog Jul 26, 2022
@mscbot
Collaborator

mscbot commented Jul 31, 2022

The final comment period, with a disposition to close, as per the review above, is now complete.

@mscbot mscbot closed this Jul 31, 2022
@mscbot mscbot added finished-final-comment-period and removed disposition-close final-comment-period This MSC has entered a final comment period in interest to approval, postpone, or delete in 5 days. labels Jul 31, 2022
@turt2live turt2live added rejected A proposal which has been rejected for inclusion in the spec and removed finished-final-comment-period labels Jul 31, 2022
@turt2live turt2live moved this from In FCP to Done to some definition in Spec Core Team Backlog Jul 31, 2022