Send SNI indication to support vhosts over federation (SYN-620) #1491

Closed
matrixbot opened this Issue Feb 6, 2016 · 17 comments

8 participants
@matrixbot
Member

matrixbot commented Feb 6, 2016

One problem (originally two; the second point below is now considered correct behaviour):

  • Federation doesn't send an SNI indication (because twisted), so for vhosted servers we tend to end up on the default.
  • We send the server_name in the Host header, rather than what the SRV tells us to use. (We think the current behaviour is correct, as per #2525).

(Imported from https://matrix.org/jira/browse/SYN-620)
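To make the first point concrete, here is a minimal sketch using Python's standard `ssl` module (not Twisted or Synapse code). It builds a TLS ClientHello in memory, with and without a server name, and checks whether the name appears in the bytes a client would send. The hostname `matrix.example.com` and the helper name `client_hello` are illustrative assumptions.

```python
import ssl


def client_hello(server_hostname=None):
    """Return the first TLS flight (the ClientHello) a client would send.

    Nothing touches the network: the handshake is driven against in-memory
    buffers, so this only shows what would go on the wire.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Hostname checking is disabled only so that a ClientHello can be built
    # without a server name; never do this on a real connection.
    ctx.check_hostname = False
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a reply that never comes.
        pass
    return outgoing.read()


with_sni = client_hello("matrix.example.com")
without_sni = client_hello()
print("name present with server_hostname:", b"matrix.example.com" in with_sni)
print("name present without it:", b"matrix.example.com" in without_sni)
```

A vhosting front end can only route on the name if it is in the ClientHello; without it, the connection lands on whatever the default is.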

@matrixbot

Member

matrixbot commented Feb 6, 2016

Jira watchers: @ara4n @richvdh

@matrixbot

Member

matrixbot commented Feb 6, 2016

Links exported from Jira:

is duplicated by SYN-233

@matrixbot

Member

matrixbot commented Jul 13, 2016

doublemalt (re)submitted a PR to try to get twisted to implement SNI in the http client: twisted/twisted#281.

-- @richvdh

@matrixbot matrixbot changed the title Support vhosts over federation (SYN-620) Support vhosts over federation (https://github.com/matrix-org/synapse/issues/1491) Nov 7, 2016

@matrixbot matrixbot changed the title Support vhosts over federation (https://github.com/matrix-org/synapse/issues/1491) Support vhosts over federation (SYN-620) Nov 7, 2016

@richvdh

Member

richvdh commented Nov 15, 2016

Looks like this is worth another look; the twisted PR has been closed with a link to an API

@nja0087

nja0087 commented Aug 3, 2017

The install docs should probably be made clearer to account for this issue. The setup they describe:

> For example, you might want to run your server at synapse.example.com, but have your Matrix user-ids look like @user:example.com

is not possible with reverse-proxying. The readme states:

> Synapse does not currently support SNI on the federation protocol (bug #1491), which means that using name-based virtual hosting is unreliable.

But in actuality it's not unreliable, it's impossible.

@richvdh

Member

richvdh commented Sep 6, 2017

> But in actuality it's not unreliable, it's impossible.

It's possible if the reverse-proxy is configured to forward to synapse by 'default', when there is no SNI header.
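As a rough illustration of that 'default' rule, here is a sketch of the routing decision a reverse proxy would make. It is not any particular proxy's configuration; `pick_backend`, the vhost table and the backend addresses are all hypothetical names made up for the example.

```python
def pick_backend(sni_name, vhosts, default_backend):
    """Choose a backend for an incoming TLS connection.

    sni_name is the server name from the ClientHello, or None when the
    client sent no SNI at all (the situation described in this issue).
    """
    if sni_name is None:
        # No name to route on: everything falls through to the default.
        return default_backend
    return vhosts.get(sni_name.lower(), default_backend)


# Hypothetical layout: one ordinary web vhost, with Synapse as the catch-all.
vhosts = {"www.example.com": "web:8080"}
print(pick_backend("www.example.com", vhosts, "synapse:8448"))
print(pick_backend(None, vhosts, "synapse:8448"))
```

The catch is that only one service per IP address and port can be the default, so this works for a single Synapse but not for several vhosted homeservers.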

@richvdh richvdh changed the title Support vhosts over federation (SYN-620) Send SNI indication to support vhosts over federation (SYN-620) Oct 16, 2017

@simonszu

simonszu commented Mar 29, 2018

Is this SNI bug still open after two years?

@neilisfragile

Contributor

neilisfragile commented Mar 29, 2018

For work conducted by the core team it comes down to a question of priority - right now that means dealing with the massive growth on matrix.org, hence the bias towards performance in the common case.

With that in mind, community contributions much appreciated :)

@krombel

Contributor

krombel commented Mar 29, 2018

AFAICT synapse now at least sends the SNI headers.
At least my reverse proxy shows my mxdomain for the federation requests that are coming in.

@richvdh

Member

richvdh commented Apr 25, 2018

@krombel: I don't think so. I can't see any SNI headers on SSL traffic arriving on my server.

@terribleplan

terribleplan commented May 25, 2018

re:

> We think the current behaviour is correct, as per #2525

I'm aware this conversation may be long past, but I just wanted to throw my 2¢ out there:

Because SRV never got mainstream traction I would expect SRV resolution to take place in user-space and then any client to make a standard request to https://SRV_HOST:SRV_PORT and validate that they have an SSL certificate for SRV_HOST. Doing it as is currently implemented has some interesting implications:

  1. Compromise of matrix.example.com results in a valid SSL cert for example.com
  2. example.com is not a name that the server is intended to be reachable at through normal means (it's unexpected/surprising to have the vhost listen on that, avoiding surprises is good)
  3. It is hard to host a matrix.example.com on the same machine as example.com due to having to configure path or other odd routing methods since that server is expected to respond to matrix requests on example.com and serve normal web requests for example.com. (This could be valid when using SRV records for some sort of HA/failover)

The one positive I see with the current implementation is that it allows for SRV records to point to IP addresses, and works around any issues about what certs to validate there.
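The difference between the two approaches can be sketched as follows. This is only a model of the behaviour as this comment describes it, not Synapse's code; `ConnectionPlan`, `plan` and the example SRV target and port are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionPlan:
    connect_host: str   # where the TCP connection goes
    connect_port: int
    expected_name: str  # name used for the Host header and cert validation


def plan(server_name, srv_host, srv_port, validate_srv_target):
    """Model the two options discussed above.

    validate_srv_target=False: connect to the SRV target but keep using
    the original server_name (the behaviour described as current).
    validate_srv_target=True: treat the SRV target as the name to validate
    (the alternative suggested in this comment).
    """
    expected = srv_host if validate_srv_target else server_name
    return ConnectionPlan(srv_host, srv_port, expected)


# Assumed SRV record: example.com -> matrix.example.com:8448
print(plan("example.com", "matrix.example.com", 8448, False))
print(plan("example.com", "matrix.example.com", 8448, True))
```

In the first plan the machine at matrix.example.com must hold a certificate for example.com, which is the source of points 1 to 3 above.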

@richvdh

Member

richvdh commented Aug 22, 2018

Hopefully this is finally fixed as of 0.33.3, thanks to @vojeroen.

@richvdh richvdh closed this Aug 22, 2018

@euank

Contributor

euank commented Aug 22, 2018

Unfortunately, until the whole ecosystem of federated servers has upgraded, SNI still can't be relied on since older servers won't send it.

Putting a homeserver behind SNI right now will mean you can only federate with a subset of up to date servers.

Unfortunately, there's also not a good summary of the "versions" present on the broader matrix network, so it's difficult for server operators to know when they can rely on SNI.

This relates to issue matrix-org/matrix.org#67 to some degree.

As a start, a server operator should check /_matrix/federation/v1/version for all the servers they already federate with to make sure flipping on an SNI load balancer or the like won't break existing rooms / chats.

However, it still breaks communication with other servers that aren't new enough, and there's really not a great way for a server operator to make an informed decision about when there are few enough active servers in the broader matrix fediverse of older versions that they're okay breaking them.

As far as I know, good tooling for doing the above isn't available, so I'll be writing some once-off postgres queries and scripts to do it for my server, but it's not really reasonable to expect every server operator to do that before using an LB which requires SNI.
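A minimal sketch of the per-server check might look like this. It classifies the body of a `/_matrix/federation/v1/version` response rather than fetching it, and assumes the response has the shape `{"server": {"name": ..., "version": ...}}`, that the remote is Synapse, and that 0.33.3 (the release named earlier in this thread) is the first version to send SNI. The function names are made up for the example.

```python
import json
import re

# Assumption taken from this thread: SNI is sent from Synapse 0.33.3 onwards.
SNI_MIN_VERSION = (0, 33, 3)


def parse_version(version):
    """Turn '0.33.7rc2' or '0.33.2.1' into a tuple of its leading integers."""
    if not isinstance(version, str):
        return None
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        return None
    return tuple(int(part) for part in match.group().split("."))


def sends_sni(version_response_body):
    """Classify one version response body.

    Returns True or False, or None when the version cannot be determined.
    """
    try:
        version = json.loads(version_response_body)["server"]["version"]
    except (ValueError, KeyError, TypeError):
        return None
    parsed = parse_version(version)
    if parsed is None:
        return None
    return parsed >= SNI_MIN_VERSION


print(sends_sni('{"server": {"name": "Synapse", "version": "0.33.6"}}'))
print(sends_sni('{"server": {"name": "Synapse", "version": "0.31.2"}}'))
print(sends_sni("not json"))
```

Servers that return no usable version are reported as `None` so they can be counted separately rather than guessed at.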

@krombel

Contributor

krombel commented Aug 22, 2018

@euank That is the case with every new feature. This issue just mentioned sending SNI support - not requiring it on the receiving side. That will stay the case for some time.
And just to note: It is possible to use the user-agent header to identify which server versions are connecting to your server. As you are interested in the incoming connections, that might be a better place to check whether the federating servers would be able to connect once you start requiring SNI support.

@euank

Contributor

euank commented Aug 22, 2018

> That is the case with every new feature

@krombel this is different than other features. Most of the time, if my server has a new feature X, old servers will simply not use it, but I can still receive messages from users on those servers.

SNI is special in that if I require SNI, there is no degradation; all old servers will simply get a certificate error and I won't be able to communicate with them at all. To my knowledge, there hasn't been any other such feature that had such a negative impact to use.

> This issue just mentioned sending SNI support - not requiring it on the receiving side

They're two sides of the same coin. People want it to be sent so they can host synapse as they do other http endpoints: behind a service proxy, load balancer, ingress controller, whatever. The issue title even mentions "support vhosts" which is another way to say "require SNI on the receiving side".

Perhaps we should create a new issue for the receiving side, which would basically be figuring out when to update the readme (here), that is to say figuring out what criterion we need to meet in the broader matrix network before we can recommend running synapse behind some SNI-aware LB.

> And just to note: It is possible to use the user-agent header to identify which server versions are connecting your server. As you are interested in the incoming connections that might be a better place to check if the federating servers would be able to connect when you start requiring SNI support

I think it's worse. User-agents can be re-written (e.g. if it goes through certain proxies or other things) and are a less specified behaviour in server-server communication. The server-server api at least documents the api I referenced.

@euank

Contributor

euank commented Oct 19, 2018

I decided to check the adoption of this feature from my view of the network to determine if I could finally put synapse behind a regular load balancer like all the other http services I run. The tl;dr is I don't feel I can rely on this issue being fixed for a large enough percentage of federation clients yet, so I can't.

I'm sharing the information I collected to decide this below in case anyone else following this issue for a similar reason finds it useful.

The hello-matrix site conveniently offers a list of servers and their version, so I went ahead and checked what that data shows:

$ curl -s "https://www.hello-matrix.net/public_servers.php?format=json" | jq '.[] | select(.last_response == 200).server_version' -r | sort | uniq -c
      6 null
      1 0.26.0
      2 0.29.0
      2 0.30.0
      1 0.31.2
      2 0.32.2
      1 0.33.0
      4 0.33.2.1
      1 0.33.3
      4 0.33.3.1
      8 0.33.4
     10 0.33.5.1
     32 0.33.6
      1 0.33.7rc2

At the time of writing, of the 75 active servers tracked by hello-matrix, 19 of them (~25%) are on versions too old to send SNI headers. The lower bound is 17% if the 'null' versioned servers all support it, but it seems more likely those versions are very old really.
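The two percentages can be reproduced from the table above with a short tally. This is just the arithmetic spelled out, treating 0.33.3 as the first version that sends SNI (as stated earlier in the thread); the helper name is made up.

```python
import re

# Version counts copied from the hello-matrix output above (None = "null").
counts = {
    None: 6, "0.26.0": 1, "0.29.0": 2, "0.30.0": 2, "0.31.2": 1,
    "0.32.2": 2, "0.33.0": 1, "0.33.2.1": 4, "0.33.3": 1, "0.33.3.1": 4,
    "0.33.4": 8, "0.33.5.1": 10, "0.33.6": 32, "0.33.7rc2": 1,
}
SNI_MIN_VERSION = (0, 33, 3)


def version_tuple(version):
    """'0.33.7rc2' -> (0, 33, 7): keep only the leading dotted integers."""
    return tuple(int(p) for p in re.match(r"\d+(?:\.\d+)*", version).group().split("."))


total = sum(counts.values())
unknown = counts[None]
known_old = sum(
    n for v, n in counts.items()
    if v is not None and version_tuple(v) < SNI_MIN_VERSION
)

# Upper bound: every 'null' server is too old. Lower bound: none of them are.
print(f"total servers: {total}")
print(f"too old (counting null): {known_old + unknown} = {100 * (known_old + unknown) / total:.1f}%")
print(f"too old (known only):    {known_old} = {100 * known_old / total:.1f}%")
```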

Now, this isn't really representative because most servers don't add themselves to the list, and those that do are probably more closely involved in the matrix ecosystem and upgrading.
I also decided to check the list of servers I federate with to see what impact it might have on my server.

$ psql ..... 
database=# COPY (SELECT server_name FROM server_keys_json) TO '/tmp/servers-from-keys.csv' WITH CSV DELIMITER ',';

$ wc -l /tmp/servers-from-keys.csv 
3146 /tmp/servers-from-keys.csv

$ ./fetch-version-stats.sh < /tmp/servers-from-keys.csv > server_stats.csv

(script here if anyone wants it 🤷‍♂️)

Looking at the information from servers I have ever federated with, I get 1919 servers that no longer respond (typically because the operator is no longer running a synapse server for whatever reason), 446 that are < version 0.33.3, and 781 >= 0.33.3.

That means that putting synapse's federation endpoint behind a load balancer that requires SNI would partition me from about 36% of the servers mine has interacted with (that are still active).
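For reference, the share of still-active servers on older versions follows directly from the three counts given above; the variable names are just for illustration.

```python
# Counts quoted above, out of 3146 servers in server_keys_json.
no_response = 1919   # servers that no longer respond
older = 446          # version < 0.33.3
newer = 781          # version >= 0.33.3

total = no_response + older + newer
active = older + newer
share_older = older / active

print(f"total: {total}, still active: {active}")
print(f"active servers on versions too old to send SNI: {100 * share_older:.1f}%")
```

The result can be compared with the percentage quoted above.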

Of course, there's still one more statistic which is more useful: what about servers I've recently interacted with?
In reality, most of those servers in the 3k my server has seen aren't that active anymore, or at least I don't communicate with them anymore so it doesn't really matter that much if I partition myself from them, right?

Let's also look at the servers I've specifically talked to in the last 1 month period:

SELECT split_part(sender, ':', 2) as server FROM events WHERE received_ts > (extract(epoch from TIMESTAMP 'now'::timestamp - '1 month'::interval) * 1000) GROUP BY server;

Throwing the output of that query through my stats process tells me that 17% of the servers I've specifically received events from over the last 1 month period are on synapse versions too old to send SNI headers.
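For anyone who would rather do the same filtering outside the database, here is a rough Python equivalent of the query. It assumes events are available as `(sender, received_ts)` pairs with `received_ts` in milliseconds since the epoch, and approximates '1 month' as 30 days; `recent_servers` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_servers(events, now, window=timedelta(days=30)):
    """Return the distinct servers that sent events within the window.

    events: iterable of (sender, received_ts_ms) pairs.
    """
    cutoff_ms = (now - window).timestamp() * 1000
    servers = set()
    for sender, received_ts in events:
        if received_ts > cutoff_ms:
            # Mirrors split_part(sender, ':', 2): the text between the first
            # and second colon, so any ':port' suffix is dropped.
            parts = sender.split(":")
            if len(parts) > 1:
                servers.add(parts[1])
    return servers


now = datetime(2018, 10, 19, tzinfo=timezone.utc)
day_ms = 24 * 60 * 60 * 1000
now_ms = int(now.timestamp() * 1000)
events = [
    ("@alice:example.com", now_ms - 2 * day_ms),
    ("@bob:example.org:8448", now_ms - 5 * day_ms),
    ("@carol:old.example.net", now_ms - 90 * day_ms),
]
print(sorted(recent_servers(events, now)))
```

Each resulting server name can then be checked against its version endpoint.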

In summary, I don't think we can yet rely on SNI headers for our synapse setups unless we're okay with partitioning ourselves off from a subset of the fediverse.

It does seem like the majority of servers are up to date, but there's still enough lagging behind that I'm personally going to continue to dedicate a wonky special ingress setup just to matrix.

@neilisfragile

Contributor

neilisfragile commented Oct 19, 2018

Hi @euank, fwiw those stats, in terms of percentages, are in line with our view from matrix.org, with 0.31.2 being strangely popular in 4th place.

richvdh added a commit that referenced this issue Nov 15, 2018

richvdh added a commit that referenced this issue Nov 15, 2018

richvdh added a commit that referenced this issue Nov 15, 2018

neilisfragile added a commit that referenced this issue Nov 15, 2018
