Matrix-Synapse Client API and Federation 502 Errors with External Nginx Server #790

Closed
thw26 opened this issue Jan 13, 2021 · 10 comments

@thw26

thw26 commented Jan 13, 2021

The problem

After installing the playbook and attempting to run Matrix, I am bumping into a 502 error that I'm having trouble pinning down and correcting. This error prevents logging in to the server.

The Server, Configs, and Ansible

The Matrix install was performed on an Ubuntu 20.04 server that had previously used the Matrix Synapse repo and matrix-synapse-py3 package. Matrix was properly functioning in that installation. This package has been removed from the server. (I turned to this Ansible installation after failing to get Coturn properly functioning for Audio and Video in Element.)

The current installation through Ansible has imported the database and media store from the previous install.

# vars.yml Customization
matrix_postgres_db_name: synapse
matrix_nginx_proxy_enabled: false
matrix_ssl_retrieval_method: none
matrix_synapse_allow_public_rooms_over_federation: true
matrix_mailer_sender_address: "someone@domain.tld"
matrix_mailer_relay_use: true
matrix_mailer_relay_host_name: "mail.domain.tld"
matrix_mailer_relay_host_port: 587
matrix_mailer_relay_auth: true
matrix_mailer_relay_auth_username: "someone@domain.tld"
matrix_mailer_relay_auth_password: "password"

I am using the generated Nginx configs with an external Nginx server, as the server hosts other websites and a Mailcow-dockerized installation. I am not seeing any conflicting ports between the Matrix and Mailcow-dockerized installations.

I have added the following to the base domain's Nginx server config.

        location /.well-known/matrix {
                proxy_pass https://matrix.domain.tld/.well-known/matrix;
                proxy_set_header X-Forwarded-For $remote_addr;
        }

The Ansible installation is running.

docker ps
IMAGE                           COMMAND                  CREATED          STATUS                    PORTS                                                                                                                                                                                                                               NAMES
matrixdotorg/synapse:v1.24.0    "python -m synapse.a…"   39 minutes ago   Up 39 minutes (healthy)   127.0.0.1:8008->8008/tcp, 8009/tcp, 127.0.0.1:8048->8048/tcp, 8448/tcp                                                                                                                                                              matrix-synapse
ma1uta/ma1sd:2.4.0-amd64        "/start.sh"              53 minutes ago   Up 53 minutes             127.0.0.1:8090->8090/tcp                                                                                                                                                                                                            matrix-ma1sd
vectorim/element-web:v1.7.16    "/docker-entrypoint.…"   53 minutes ago   Up 53 minutes             80/tcp, 127.0.0.1:8765->8080/tcp                                                                                                                                                                                                    matrix-client-element
instrumentisto/coturn:4.5.1.3   "turnserver -c /turn…"   53 minutes ago   Up 53 minutes             0.0.0.0:3478->3478/tcp, 0.0.0.0:3478->3478/udp, 0.0.0.0:5349->5349/udp, 0.0.0.0:5349->5349/tcp, 0.0.0.0:49152-49172->49152-49172/udp                                                                                                matrix-coturn
postgres:13.1-alpine            "docker-entrypoint.s…"   53 minutes ago   Up 53 minutes             5432/tcp                                                                                                                                                                                                                            matrix-postgres
devture/exim-relay:4.93.1-r0    "exim -bdf -q15m"        53 minutes ago   Up 53 minutes             8025/tcp                                                                                                                                                                                                                            matrix-mailer
…

Nginx is running, as is Synapse; port 443 is most definitely open.

● matrix-synapse.service - Synapse server
     Loaded: loaded (/etc/systemd/system/matrix-synapse.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-01-12 23:37:04 EST; 24min ago
    Process: 327517 ExecStartPre=/usr/bin/env docker kill matrix-synapse (code=exited, status=1/FAILURE)
    Process: 327538 ExecStartPre=/usr/bin/env docker rm matrix-synapse (code=exited, status=1/FAILURE)
   Main PID: 327545 (docker)
      Tasks: 10 (limit: 9285)
     Memory: 26.2M
     CGroup: /system.slice/matrix-synapse.service
             └─327545 docker run --rm --name matrix-synapse --log-driver=none --user=998:1004 --cap-drop=ALL --entrypoint=python --read-only --tmpfs=/tmp:rw,noexec,nosuid,size=2500m --network=matrix -p 127.0.0.1:8008:8008 -p 127.0.0.1:8048:8048 --mount type=bind,src=/matrix/synapse/config,dst=/data,ro --mount type=bind,src=/matrix/synapse/storage,dst=/matrix-media-store-parent,bind-propagation=slave docker.io/matrixdotorg/synapse:v1.24.0 -m synapse.app.homeserver -c /data/homeserver.yaml>

Jan 12 23:37:04 hostname matrix-synapse[327545]: WARNING: Error loading config file: .dockercfg: $HOME is not defined
Jan 12 23:37:06 hostname matrix-synapse[327545]: This server is configured to use 'matrix.org' as its trusted key server via the
Jan 12 23:37:06 hostname matrix-synapse[327545]: 'trusted_key_servers' config option. 'matrix.org' is a good choice for a key
Jan 12 23:37:06 hostname matrix-synapse[327545]: server since it is long-lived, stable and trusted. However, some admins may
Jan 12 23:37:06 hostname matrix-synapse[327545]: wish to use another server for this purpose.
Jan 12 23:37:06 hostname matrix-synapse[327545]: To suppress this warning and continue using 'matrix.org', admins should set
Jan 12 23:37:06 hostname matrix-synapse[327545]: 'suppress_key_server_warning' to 'true' in homeserver.yaml.
Jan 12 23:37:06 hostname matrix-synapse[327545]: --------------------------------------------------------------------------------
Jan 12 23:37:06 hostname matrix-synapse[327545]: 2021-01-13 04:37:06,933 - root - 319 - WARNING - None - ***** STARTING SERVER *****
Jan 12 23:37:06 hostname matrix-synapse[327545]: 2021-01-13 04:37:06,933 - root - 320 - WARNING - None - Server /usr/local/lib/python3.8/site-packages/synapse/app/homeserver.py version 1.24.0

Errors, Logs, Federation Testing

I am seeing the following errors on a self-check.

TASK [matrix-synapse : Check Matrix Client API] ********************************************************
fatal: [matrix.domain.tld]: FAILED! => {"changed": false, "connection": "close", "content_length": "150", "content_type": "text/html", "date": "Wed, 13 Jan 2021 04:37:54 GMT", "elapsed": 1, "msg": "Status code was 502 and not [200]: HTTP Error 502: Bad Gateway", "redirected": false, "server": "nginx", "status": 502, "url": "https://matrix.domain.tld/_matrix/client/versions"}
...ignoring

TASK [matrix-synapse : Fail if Matrix Client API not working] ******************************************
fatal: [matrix.domain.tld]: FAILED! => {"changed": false, "msg": "Failed checking Matrix Client API is up at `matrix.domain.tld` (checked endpoint: `https://matrix.domain.tld/_matrix/client/versions`). Is Synapse running? Is port 443 open in your firewall? Full error: {'redirected': False, 'url': 'https://matrix.domain.tld/_matrix/client/versions', 'status': 502, 'server': 'nginx', 'date': 'Wed, 13 Jan 2021 04:37:54 GMT', 'content_type': 'text/html', 'content_length': '150', 'connection': 'close', 'elapsed': 1, 'changed': False, 'failed': True, 'msg': 'Status code was 502 and not [200]: HTTP Error 502: Bad Gateway'}"}

Nginx reports the following on https://matrix.domain.tld:8448.

2021/01/13 00:09:00 [error] 338700#338700: *90 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET / HTTP/2.0", upstream: "http://127.0.0.1:8048/", host: "matrix.domain.tld:8448"

Nginx reports the following on https://element.domain.tld

2021/01/13 00:12:39 [error] 338700#338700: *155 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET /_matrix/client/versions HTTP/2.0", upstream: "http://127.0.0.1:8008/_matrix/client/versions", host: "matrix.domain.tld"
2021/01/13 00:12:40 [error] 338700#338700: *155 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET /_matrix/client/versions HTTP/2.0", upstream: "http://127.0.0.1:8008/_matrix/client/versions", host: "matrix.domain.tld"
2021/01/13 00:12:42 [error] 338700#338700: *155 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET /_matrix/client/r0/login HTTP/2.0", upstream: "http://127.0.0.1:8008/_matrix/client/r0/login", host: "matrix.domain.tld"

Running the federation tester produces the following Nginx errors.

2021/01/13 00:09:47 [error] 338700#338700: *135 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET /_matrix/federation/v1/version HTTP/1.1", upstream: "http://127.0.0.1:8048/_matrix/federation/v1/version", host: "matrix.domain.tld:8448"
2021/01/13 00:09:48 [error] 338700#338700: *137 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: ip.ad.dr.ess, server: matrix.domain.tld, request: "GET /_matrix/key/v2/server HTTP/1.1", upstream: "http://127.0.0.1:8048/_matrix/key/v2/server", host: "matrix.domain.tld:8448"

The federation tester returns the following JSON report.

{
  "WellKnownResult": {
    "m.server": "matrix.domain.tld:8448"
  },
  "DNSResult": {
    "SRVCName": "",
    "SRVRecords": null,
    "SRVError": null,
    "Hosts": {
      "matrix.domain.tld": {
        "CName": "matrix.domain.tld.",
        "Addrs": [
          "ip.ad.dr.ess"
        ],
        "Error": null
      }
    },
    "Addrs": [
      "ip.ad.dr.ess:8448"
    ]
  },
  "ConnectionReports": {},
  "ConnectionErrors": {
    "ip.ad.dr.ess:8448": {
      "Message": "Non-200 response 502 from remote server"
    }
  },
  "Version": {
    "error": "msg=Failed to GET JSON to : \u003chtml\u003e\r\n\u003chead\u003e\u003ctitle\u003e502 Bad Gateway\u003c/title\u003e\u003c/head\u003e\r\n\u003cbody\u003e\r\n\u003ccenter\u003e\u003ch1\u003e502 Bad Gateway\u003c/h1\u003e\u003c/center\u003e\r\n\u003chr\u003e\u003ccenter\u003enginx\u003c/center\u003e\r\n\u003c/body\u003e\r\n\u003c/html\u003e\r\n code=502 wrapped="
  },
  "FederationOK": false
}

Element reports the following.

There was a problem communicating with the homeserver, please try again later.
Cannot reach homeserver
Ensure you have a stable internet connection, or get in touch with the server admin

My hypothesis is something is improperly configured in the Nginx server, but I am not sure what after looking over the matrix-synapse.conf file.

Please let me know if I can provide further information and what information is desired.

@spantaleev
Owner

Hi!

Is nginx running on the host itself or is it in some container?

The logs for https://element.DOMAIN look somewhat weird - forwarding to the wrong place.

Those for https://matrix.DOMAIN:8448 look okay, but I'm curious why http://127.0.0.1:8048 is not available to it. You can try doing curl http://127.0.0.1:8048 on the host itself.


If you're running your external nginx server in some container, you'd need other modifications.

@thw26
Author

thw26 commented Jan 13, 2021

Hello! Thanks for taking a look.

Nginx is not running in a container but is installed through Ubuntu's repos.

● nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-01-12 23:55:44 EST; 7h ago
       Docs: man:nginx(8)
    Process: 336693 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 336710 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 357178 ExecReload=/usr/sbin/nginx -g daemon on; master_process on; -s reload (code=exited, status=0/SUCCESS)
   Main PID: 336711 (nginx)
      Tasks: 3 (limit: 9285)
     Memory: 11.9M
     CGroup: /system.slice/nginx.service
             ├─336711 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
             ├─357183 nginx: worker process
             └─357184 nginx: worker process

Jan 12 23:55:44 hostname systemd[1]: Starting A high performance web server and a reverse proxy server...
Jan 12 23:55:44 hostname systemd[1]: Started A high performance web server and a reverse proxy server.

Here is curl http://127.0.0.1:8048 output.

curl: (56) Recv failure: Connection reset by peer

Here is telnet 127.0.0.1 8048 output.

Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.

/etc/hosts contains:

…
127.0.0.1 localhost
…

Here is partial netstat -plnt output, cleaned up to organize ports. I've included port 8080 and 8443, which Mailcow is utilizing; Element was bumped to 8765 by Ansible automatically.

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      336711/nginx: maste 
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      336711/nginx: maste 
tcp        0      0 0.0.0.0:3478            0.0.0.0:*               LISTEN      319257/docker-proxy 
tcp        0      0 0.0.0.0:4190            0.0.0.0:*               LISTEN      280957/docker-proxy 
tcp        0      0 0.0.0.0:5349            0.0.0.0:*               LISTEN      319232/docker-proxy 
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN      84392/postgres      
tcp        0      0 127.0.0.1:8008          0.0.0.0:*               LISTEN      327584/docker-proxy 
tcp        0      0 127.0.0.1:8048          0.0.0.0:*               LISTEN      327571/docker-proxy 
tcp        0      0 127.0.0.1:8080          0.0.0.0:*               LISTEN      280513/docker-proxy 
tcp        0      0 127.0.0.1:8090          0.0.0.0:*               LISTEN      319588/docker-proxy 
tcp        0      0 127.0.0.1:8443          0.0.0.0:*               LISTEN      280489/docker-proxy 
tcp        0      0 0.0.0.0:8448            0.0.0.0:*               LISTEN      336711/nginx: maste 
tcp        0      0 127.0.0.1:8765          0.0.0.0:*               LISTEN      319369/docker-proxy 
tcp6       0      0 :::80                   :::*                    LISTEN      336711/nginx: maste 
tcp6       0      0 :::443                  :::*                    LISTEN      336711/nginx: maste 
tcp6       0      0 :::4190                 :::*                    LISTEN      280963/docker-proxy 
tcp6       0      0 :::8448                 :::*                    LISTEN      336711/nginx: maste

Anything else networking related you want to know?


Here are the contents of the nginx element.conf file. (I've made no manual modifications besides copying it to my nginx directory, cleaning up the whitespace, and modifying the SSL path. The Mailcow installation took over ports 8080 and 8443, and Ansible changed Element's port to 8765.) The Element webpage loads but reports that the homeserver cannot be reached, as mentioned.

server {
        listen 80;

        server_name element.domain.tld;

        server_tokens off;
        root /dev/null;

        location / {
                return 301 https://$http_host$request_uri;
        }
}

server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        server_name element.domain.tld;

        server_tokens off;
        root /dev/null;

        ssl_certificate /etc/letsencrypt/live/element.domain.tld/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/element.domain.tld/privkey.pem;

        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;

        gzip on;
        gzip_types text/plain application/json application/javascript text/css image/x-icon font/ttf image/gif;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Content-Type-Options nosniff;
        add_header X-Frame-Options SAMEORIGIN;

        location / {
                proxy_pass http://127.0.0.1:8765;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $remote_addr;
        }

}

Here are the contents of the matrix-synapse.conf file. (As before, I copied it to the nginx directory, manually reduced the whitespace, and modified the SSL path.)

server {
	listen 80;
	server_name matrix.domain.tld;

	server_tokens off;
	root /dev/null;

	location / {
		return 301 https://$http_host$request_uri;
	}
}

server {
	listen 443 ssl http2;
	listen [::]:443 ssl http2;
	server_name matrix.domain.tld;

	server_tokens off;
	root /dev/null;

	ssl_certificate /etc/letsencrypt/live/domain.tld/fullchain.pem;
	ssl_certificate_key /etc/letsencrypt/live/domain.tld/privkey.pem;

	ssl_protocols TLSv1.2 TLSv1.3;
	ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
	ssl_prefer_server_ciphers off;

	gzip on;
	gzip_types text/plain application/json;

	location /.well-known/matrix {
		root /matrix/static-files;
		expires 4h;
		default_type application/json;
		add_header Access-Control-Allow-Origin *;
	}

	location ^~ /_matrix/identity {
		proxy_pass http://127.0.0.1:8090;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $remote_addr;
	}

	location ^~ /_matrix/client/r0/user_directory/search {
		proxy_pass http://127.0.0.1:8090;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $remote_addr;
	}

	location ~* ^(/_matrix|/_synapse/client) {
		proxy_pass http://127.0.0.1:8008;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $remote_addr;
		client_body_buffer_size 25M;
		client_max_body_size 50M;
		proxy_max_temp_file_size 0;
	}

	location / {
		return 302 $scheme://element.domain.tld$request_uri;
	}

}

server {
	listen 8448 ssl http2;
	listen [::]:8448 ssl http2;
	server_name matrix.domain.tld;
	server_tokens off;

	root /dev/null;

	gzip on;
	gzip_types text/plain application/json;

	ssl_certificate /etc/letsencrypt/live/domain.tld/fullchain.pem;
	ssl_certificate_key /etc/letsencrypt/live/domain.tld/privkey.pem;

	ssl_protocols TLSv1.2 TLSv1.3;
	ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
	ssl_prefer_server_ciphers off;

	location / {
		proxy_pass http://127.0.0.1:8048;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $remote_addr;
		client_body_buffer_size 25M;
		client_max_body_size 150M;
		proxy_max_temp_file_size 0;
	}

}

@spantaleev
Owner

Hmm.. strange.. I can't really reproduce this in my testing.

Do you also get a connection error ("Connection reset by peer") when doing curl http://127.0.0.1:8765?

@thw26
Author

thw26 commented Jan 13, 2021

No, that's coming through fine.

curl http://127.0.0.1:8765

<!doctype html>
<html lang="en" style="height: 100%;">
  <head>
    <meta charset="utf-8">
    <title>Element</title>
…

Could something have gone wrong with the installation (and in my attempts to set up the server again after the initial install) due to the preexisting matrix-synapse-py3 server?

@spantaleev
Owner

Sounds like Synapse may be failing to start then.

Can you try journalctl -fu matrix-synapse?

But.. it was showing up in docker ps, so.. it's strange.

Both curl http://127.0.0.1:8008 and curl http://127.0.0.1:8048 should be returning a response.


I don't see how matrix-synapse-py3 would be getting in the way.

@thw26
Author

thw26 commented Jan 15, 2021

I performed Method 2 (fronting the integrated nginx reverse proxy with another reverse proxy) and everything is working. I'm not sure what made the difference; perhaps something with Mailcow-dockerized.

@spantaleev
Owner

Good to hear you found some way! 👍

@dotprofile

I am having a similar issue. I have two VMs, one running a reverse proxy and the other running this project. All my DNS records are fine, but when I curl the services I get connection refused. According to netstat, the docker-proxies are binding to localhost. Shouldn't they be binding to 0.0.0.0 in order for my external nginx proxy to work properly?

@spantaleev
Owner

spantaleev commented Jan 20, 2021

If you disable matrix-nginx-proxy (matrix_nginx_proxy_enabled: false), by default we assume you'll run another server on that same machine, so we bind to loopback ports.

All of these port bindings can be changed though. If you search for matrix_nginx_proxy_enabled in group_vars/matrix_servers, you'll see a lot of instances where we do that.


Instead of exposing each service's ports one by one, you can do something else -- keep using matrix-nginx-proxy, but in a more limited capacity. And then front it with another webserver.

It's described here: https://github.com/spantaleev/matrix-docker-ansible-deploy/blob/master/docs/configuring-playbook-own-webserver.md#method-2-fronting-the-integrated-nginx-reverse-proxy-webserver-with-another-reverse-proxy

You'd probably wish to change matrix_nginx_proxy_container_http_host_bind_port and matrix_nginx_proxy_container_federation_host_bind_port to use a local IP address or 0.0.0.0. You'd then point your other webserver (the one on the network) to this.
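
As a rough sketch of that suggestion, the two variables named above might be set in vars.yml along these lines. The 0.0.0.0 bind address and the ports 81 and 8449 are illustrative assumptions, not values taken from this thread; pick whatever address and free ports suit your network.

```yaml
# vars.yml (sketch; the addresses and ports below are assumptions for illustration)

# Keep the integrated matrix-nginx-proxy running so another webserver can front it.
matrix_nginx_proxy_enabled: true

# Host address and port on which matrix-nginx-proxy's HTTP port is published.
# Hypothetical value: all interfaces, port 81. Replace 0.0.0.0 with a specific
# local IP address if only one network should be able to reach it.
matrix_nginx_proxy_container_http_host_bind_port: '0.0.0.0:81'

# Host address and port on which matrix-nginx-proxy's federation port is published.
# Hypothetical value: all interfaces, port 8449.
matrix_nginx_proxy_container_federation_host_bind_port: '0.0.0.0:8449'
```

The other webserver would then proxy to these address and port pairs. The Method 2 documentation linked above describes the remaining settings that approach needs. Binding to 0.0.0.0 publishes the ports on every interface, so it is worth considering a firewall rule that limits access to the reverse proxy.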

@dotprofile

Perfect, I don't know why I overlooked that. Everything is working excellently.
