Working Matrix install broken after trying to update! Help!! #3146

Closed
drelephant opened this issue Feb 2, 2024 · 13 comments

@drelephant
Contributor

drelephant commented Feb 2, 2024

Playbook Configuration:

My vars.yml file looks like this:

---
# The bare domain name which represents your Matrix identity.
# Matrix user ids for your server will be of the form (`@user:<matrix-domain>`).
#
# Note: this playbook does not touch the server referenced here.
# Installation happens on another server ("matrix.<matrix-domain>").
#
# If you've deployed using the wrong domain, you'll have to run the Uninstalling step,
# because you can't change the Domain after deployment.
#
# Example value: example.com
matrix_domain: sitename.org

# The Matrix homeserver software to install.
# See:
#  - `roles/matrix-base/defaults/main.yml` for valid options
# - the `docs/configuring-playbook-IMPLEMENTATION_NAME.md` documentation page, if one is available for your implementation choice
matrix_homeserver_implementation: synapse

# A secret used as a base, for generating various other secrets.
# You can put any string here, but generating a strong one is preferred (e.g. `pwgen -s 64 1`).
matrix_homeserver_generic_secret_key: 'secret'

# This is something which is provided to Let's Encrypt when retrieving SSL certificates for domains.
#
# In case SSL renewal fails at some point, you'll also get an email notification there.
#
# If you decide to use another method for managing SSL certificates (different than the default Let's Encrypt),
# you won't be required to define this variable (see `docs/configuring-playbook-ssl-certificates.md`).
#
# Example value: someone@example.com
#matrix_ssl_lets_encrypt_support_email: 'revoked@gmail.com'
devture_traefik_config_certificatesResolvers_acme_email: 'redacted@gmail.com'

# A Postgres password to use for the superuser Postgres user (called `matrix` by default).
#
# The playbook creates additional Postgres users and databases (one for each enabled service)
# using this superuser account.
#matrix_postgres_connection_password: 'secret'
devture_postgres_connection_password: 'secret'

#matrix_nginx_proxy_base_domain_serving_enabled: true
matrix_static_files_container_labels_base_domain_enabled: true


matrix_playbook_reverse_proxy_type: playbook-managed-traefik
devture_traefik_config_certificatesResolvers_acme_email: 'redacted@gmail.com'

Matrix Server:

  • OS: Ubuntu 20.04.6 LTS
  • Architecture amd64

Ansible:

I think Ansible is working.

ansible --version

[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12. Current version: 3.7.17 (default, Jun  6 2023, 20:10:10) [GCC 9.4.0]. This
feature will be removed from ansible-core in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ansible [core 2.11.12]
  config file = /home/redacted/matrix-docker-ansible-deploy/ansible.cfg
  configured module search path = ['/home/redacted/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/redacted/.local/lib/python3.7/site-packages/ansible
  ansible collection location = /home/redacted/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/redacted/.local/bin/ansible
  python version = 3.7.17 (default, Jun  6 2023, 20:10:10) [GCC 9.4.0]
  jinja version = 3.1.2
  libyaml = False

Problem description:

I just tried to update my working (but probably very out-of-date) Matrix install, and now it's broken.

Here's what I did:

  • ran git pull
  • installed Prebuilt-MPR
  • installed just (sudo apt install just)
  • ran just roles
  • ran just setup-all, fixing variables in my inventory/host_vars/sitename/vars.yml as it kept failing with errors:

changed matrix_ssl_lets_encrypt_support_email to devture_traefik_config_certificatesResolvers_acme_email

changed matrix_postgres_connection_password to devture_postgres_connection_password

changed matrix_nginx_proxy_base_domain_serving_enabled to matrix_static_files_container_labels_base_domain_enabled
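
The renames above can be checked mechanically instead of waiting for each playbook failure. This is a minimal sketch that assumes only the three old names mentioned in this report (the playbook may have renamed others); `find_old_vars` is a hypothetical helper name, not something the playbook ships.

```shell
# Print every old variable name that is still set on an uncommented line.
# The three names are the ones renamed in this report; the list is not exhaustive.
find_old_vars() {
  # $1: path to vars.yml
  for old in \
    matrix_ssl_lets_encrypt_support_email \
    matrix_postgres_connection_password \
    matrix_nginx_proxy_base_domain_serving_enabled
  do
    # Commented-out lines start with '#', so they do not match this pattern.
    if grep -Eq "^[[:space:]]*${old}[[:space:]]*:" "$1"; then
      printf '%s\n' "$old"
    fi
  done
}

# Usage (path as used in this report):
#   find_old_vars inventory/host_vars/sitename/vars.yml
```

An empty result only means none of these three names is still active; it says nothing about other variables.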

After reading this about changing to Traefik, I added the line matrix_playbook_reverse_proxy_type: playbook-managed-traefik and set my email as devture_traefik_config_certificatesResolvers_acme_email.

It looks like it was mostly able to update, but just setup-all, just stop-all and just start-all now all fail at the end with this error:

TASK [galaxy/systemd_service_manager : Ensure systemd is reloaded] **********************************************
s_push: parser stack overflow
fatal: [sitename.org]: FAILED! =>
  msg: |-
    The conditional check 'devture_systemd_service_manager_services_list_to_work_with | length > 0' failed. The error was: An unhandled exception occurred while templating '{{ devture_systemd_service_manager_services_list }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating '{{ devture_systemd_service_manager_services_list_auto + devture_systemd_service_manager_services_list_additional }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unexpected templating type error occurred on ({{ devture_systemd_service_manager_services_list_auto + devture_systemd_service_manager_services_list_additional }}): can only concatenate str (not "list") to str

    The error appears to be in '/home/redacted/matrix-docker-ansible-deploy/roles/galaxy/systemd_service_manager/tasks/restart_specified.yml': line 3, column 3, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:


    - when: devture_systemd_service_manager_services_list_to_work_with | length > 0
      ^ here

PLAY RECAP ******************************************************************************************************
sitename.org : ok=336  changed=4    unreachable=0    failed=1    skipped=521  rescued=0    ignored=0

I'm not sure how to proceed, can anyone help me?

@spantaleev
Owner

Have you tried with a newer version of Ansible / Jinja? If you can't get a newer version installed, you may also run Ansible in a container, as described in docs/ansible.md
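
A quick way to see whether the controller's Ansible is older than some target is to parse the `ansible --version` output shown in the report. The sketch below is illustrative only: the helper names are hypothetical, the minimum version in the usage comment is a placeholder rather than a documented requirement of the playbook (docs/ansible.md is the place to look for that), and `version_ge` relies on GNU `sort -V`.

```shell
# Extract the ansible-core version from `ansible --version` output on stdin.
ansible_core_version() {
  sed -n 's/^[[:space:]]*ansible \[core \([0-9][0-9.]*\)].*/\1/p' | head -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (MIN_CORE is a placeholder, not the playbook's stated minimum):
#   MIN_CORE=2.13.0
#   current=$(ansible --version 2>/dev/null | ansible_core_version)
#   version_ge "$current" "$MIN_CORE" || echo "ansible-core $current is older than $MIN_CORE"
```

Passing such a check does not guarantee the playbook will run; a later comment in this thread reports the same error on ansible-core 2.13.13.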

@drelephant
Contributor Author

drelephant commented Feb 2, 2024

I'm trying to work out how to update it now.

I just restarted in the hope that would do something, and noticed that my /var/log/syslog has lots of these messages:

matrix-traefik-certs-dumper[4947]: /in/acme.json is missing.. Waiting (297/inf.)...

Is that something I need to fix?

@spantaleev
Owner

matrix-traefik-certs-dumper is a component which looks for new SSL certificates (obtained by Traefik and stored into the /matrix/traefik/ssl/acme.json file). If it discovers new certificates, it dumps them into another directory as standalone files, so that other components (like the Coturn TURN server - installed by default; or Postmoogle email bridge) can use the certificates.

Since it's reporting acme.json as missing, it seems like Traefik cannot obtain any SSL certificates at all.

The Traefik log in systemd-journald would contain more information. See:

  • journalctl -fu matrix-traefik and see if the latest entries indicate any problem
  • journalctl -u matrix-traefik | less and paginate through everything
  • systemctl status matrix-traefik and see if Traefik is running or dead

Generally, the problem is that DNS records are not configured correctly or port 80 in your firewall is not open. Both of these problems cause Let's Encrypt to fail validating your ownership of the domain, so it doesn't issue a certificate for you.
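
Before reading through logs, it can help to confirm what the certs dumper is actually complaining about. The sketch below only classifies the acme.json file mentioned above; `acme_state` is a hypothetical helper name, and "present" means nothing more than "the file exists and is non-empty", not that it holds a valid certificate.

```shell
# Classify the state of Traefik's certificate store file.
acme_state() {
  # $1: path to acme.json
  if [ ! -e "$1" ]; then
    echo missing      # Traefik has never written the file
  elif [ ! -s "$1" ]; then
    echo empty        # the file exists but holds no data yet
  else
    echo present      # non-empty; contents are not validated here
  fi
}

# Usage on the server (path as given in the comment above):
#   acme_state /matrix/traefik/ssl/acme.json
#   systemctl is-active matrix-traefik
```

A "missing" result together with an inactive matrix-traefik service points at Traefik not running at all, rather than at a failed Let's Encrypt validation.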

@drelephant
Contributor Author

journalctl -u matrix-traefik has no entries.

-- Logs begin at Wed 2024-01-10 07:45:41 AEDT, end at Fri 2024-02-02 18:29:36 AEDT. --
-- No entries --

systemctl status matrix-traefik

● matrix-traefik.service - Traefik (matrix-traefik)
     Loaded: loaded (/etc/systemd/system/matrix-traefik.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

@drelephant
Contributor Author

Everything was working today until I tried to update, wouldn't that mean that the DNS records must be ok?

@spantaleev
Owner

matrix-traefik is stopped and likely never even started, so it seems the playbook somehow managed to start some services (like matrix-traefik-certs-dumper), but not Traefik.

In any case, I'd first investigate why the playbook cannot run until completion before trying to chase other problems.
Upgrade your Ansible/Jinja or try running Ansible in a container, as described in docs/ansible.md.

Once the playbook runs until completion, you can investigate what's going on.

@drelephant
Contributor Author

drelephant commented Feb 2, 2024

Thanks for your help and advice.

I just tried the info from docs/ansible.md - apt-get remove ansible then pip install ansible, but it ended up with the same version.

[DEPRECATION WARNING]: Ansible will require Python 3.8 or newer on the controller starting with Ansible 2.12. Current
 version: 3.7.17 (default, Jun  6 2023, 20:10:10) [GCC 9.4.0]. This feature will be removed from ansible-core in
version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
ansible [core 2.11.12]
  config file = /home/redacted/matrix-docker-ansible-deploy/ansible.cfg
  configured module search path = ['/home/redacted/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/redacted/.local/lib/python3.7/site-packages/ansible
  ansible collection location = /home/redacted/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/redacted/.local/bin/ansible
  python version = 3.7.17 (default, Jun  6 2023, 20:10:10) [GCC 9.4.0]
  jinja version = 3.1.2
  libyaml = False

I came across this and mucked around until I could eventually run pipx upgrade ansible:

ansible is already at latest version 6.7.0 (location: /home/redacted/.local/pipx/venvs/ansible)

But I can only seem to run that one using /home/redacted/.local/pipx/venvs/ansible/bin/ansible --version

ansible [core 2.13.13]
  config file = /home/redacted/matrix-docker-ansible-deploy/ansible.cfg
  configured module search path = ['/home/redacted/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/redacted/.local/pipx/venvs/ansible/lib/python3.8/site-packages/ansible
  ansible collection location = /home/redacted/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/redacted/.local/pipx/venvs/ansible/bin/ansible
  python version = 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0]
  jinja version = 3.1.3
  libyaml = True

Otherwise it runs the older version.

The instructions for running ansible in a container seem so complicated I'm scared to attempt it!

Do you happen to know how I can get just setup-all to use the newer version?

@spantaleev
Owner

One way is to activate the new Python environment. Going by the venv location pipx reported above, that would be: source /home/redacted/.local/pipx/venvs/ansible/bin/activate

Then you can call ansible-playbook and it may work. I'm not sure.

Alternatively, you can directly use /home/redacted/.local/pipx/venvs/ansible/bin/ansible-playbook in your commands. Also consider uninstalling your regular Ansible to avoid calling it accidentally.
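
Another option, sketched here under assumptions, is to put the pipx venv's bin directory first on PATH for the current shell, so that anything invoking plain `ansible-playbook` finds the newer one. `prefer_bin_dir` is a hypothetical helper name, and the directory in the usage comment is inferred from the `ansible --version` output earlier in this thread.

```shell
# Prepend a directory to PATH unless it is already the first entry.
prefer_bin_dir() {
  case "$PATH" in
    "$1":*) ;;                            # already first; nothing to do
    *) PATH="$1:$PATH"; export PATH ;;
  esac
}

# Usage (directory taken from the pipx output in this thread):
#   prefer_bin_dir "$HOME/.local/pipx/venvs/ansible/bin"
#   command -v ansible-playbook    # shows which binary would now be run
```

The change lasts only for the current shell session; making it permanent would mean adding the call to a shell startup file.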

@drelephant
Contributor Author

Wooo! It worked with:
/home/redacted/.local/pipx/venvs/ansible/bin/ansible-playbook -i inventory/hosts setup.yml --tags=setup-all,ensure-matrix-users-created,start

I did get one error, but I'll try stop-all, start-all once I work out how to get the arguments from the justfile:

failed: [matrix.redacted.org] (item={'name': 'matrix-coturn.service', 'priority': 900, 'groups': ['matrix', 'coturn']}) => changed=false
  ansible_loop_var: item
  item:
    groups:
    - matrix
    - coturn
    name: matrix-coturn.service
    priority: 900
  msg: |-
    Unable to start service matrix-coturn.service: A dependency job for matrix-coturn.service failed. See 'journalctl -xe' for details.

Thanks!

@spantaleev
Owner

To add to the above: just simply calls ansible-playbook, so your PATH environment variable decides which one is found first.


Your matrix-coturn.service error is most likely related to Coturn failing to start because the certs dumper cannot get SSL certificates. Check the Traefik status and logs as mentioned in my previous comment.

@drelephant
Contributor Author

drelephant commented Feb 2, 2024

It's all working now after

/home/redacted/.local/pipx/venvs/ansible/bin/ansible-playbook -i inventory/hosts setup.yml --tags=stop-all
/home/redacted/.local/pipx/venvs/ansible/bin/ansible-playbook -i inventory/hosts setup.yml --tags=start-all

When I've recovered from all that, I'll attempt to fix the problem of the wrong Ansible version being run by default...

Thanks so much for your help!!
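
Until the default Ansible is sorted out, the long commands above could be wrapped in a small shell function. This is only a sketch: `mx_tags` and the `ANSIBLE_PLAYBOOK` variable are hypothetical names, the default path is the one used in this thread with `$HOME` substituted, and the function must be run from the playbook directory because it uses the same relative `inventory/hosts` and `setup.yml` paths.

```shell
# Path to the ansible-playbook binary to use; override by setting ANSIBLE_PLAYBOOK.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-$HOME/.local/pipx/venvs/ansible/bin/ansible-playbook}"

# Run the playbook with the given tag list, e.g. stop-all or start-all.
mx_tags() {
  "$ANSIBLE_PLAYBOOK" -i inventory/hosts setup.yml --tags="$1"
}

# Usage, from the matrix-docker-ansible-deploy directory:
#   mx_tags stop-all
#   mx_tags start-all
```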

@drelephant
Contributor Author

Forgot to mention: I also changed my external IP in inventory/hosts, because I'm on a dynamic IP and it had changed since the initial install years ago. I'm not sure whether that had any effect.

Just in case someone else comes across this issue.

Thanks again.

@FizzyTea

FizzyTea commented Jun 28, 2024

I am having the same issue but have not been able to overcome it by updating ansible to the version shown below.

  ansible [core 2.13.13]
  config file = /home/joe/Apps/matrix-docker-ansible-deploy/ansible.cfg
  configured module search path = ['/home/joe/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/joe/.local/lib/python3.8/site-packages/ansible
  ansible collection location = /home/joe/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/joe/.local/bin/ansible
  python version = 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0]
  jinja version = 3.1.4
  libyaml = True

Any advice would be appreciated as my server is now down and I am at a loss. Thanks.
