Synapse workers leave "not-found failed" units after removal #1461

Closed
rakshazi opened this issue Dec 14, 2021 · 4 comments

@rakshazi
Contributor

Hello,

When you enable Synapse workers and then disable them, the worker units' info still exists in systemd even after the playbook has removed them, and you can see the not-found failed state if you list services (just run systemctl without params to get the full list).

That's not a problem in itself (if you google for such behavior you'll find answers like "that's ok"), but when you run the playbook again, it fails with the following errors (keep in mind that the units were already removed and those services are just "ghosts" without any actual service):

failed: [your.host] (item={'key': 'matrix-synapse-worker-federation_sender-0.service', 'value': {'name': 'matrix-synapse-worker-federation_sender-0.service', 'state': 'stopped', 'status': 'failed', 'source': 'systemd'}}) => {"ansible_loop_var": "item", "changed": false, "item": {"key": "matrix-synapse-worker-federation_sender-0.service", "value": {"name": "matrix-synapse-worker-federation_sender-0.service", "source": "systemd", "state": "stopped", "status": "failed"}}, "msg": "Could not find the requested service matrix-synapse-worker-federation_sender-0.service: host"}
failed: [your.host] (item={'key': 'matrix-synapse-worker-frontend_proxy-18771.service', 'value': {'name': 'matrix-synapse-worker-frontend_proxy-18771.service', 'state': 'stopped', 'status': 'failed', 'source': 'systemd'}}) => {"ansible_loop_var": "item", "changed": false, "item": {"key": "matrix-synapse-worker-frontend_proxy-18771.service", "value": {"name": "matrix-synapse-worker-frontend_proxy-18771.service", "source": "systemd", "state": "stopped", "status": "failed"}}, "msg": "Could not find the requested service matrix-synapse-worker-frontend_proxy-18771.service: host"}

To fix the issue manually, you can run systemctl reset-failed, but I've been thinking about how it could be automated.

My first idea was to add the following task right under the "Ensure any worker services are stopped" task in roles/matrix-synapse/tasks/synapse/workers/setup_uninstall.yml:

- name: Ensure any worker services are properly removed
  # reset-failed is available in neither ansible.builtin.service nor ansible.builtin.systemd, hence the raw command
  command: "systemctl reset-failed {{ item.key }}"
  # this is a special hack required only for systemd
  when: ansible_service_mgr == "systemd"
  with_dict: "{{ ansible_facts.services|default({})|dict2items|selectattr('key', 'match', 'matrix-synapse-worker-.+\\.service')|list|items2dict }}"

But it will not work on the first run (because the units will not be marked as not-found failed at that moment), so to fix the issue it should actually go before the "Ensure any worker services are stopped" task, though that will look weird.

Sorry, I don't have a better idea of how to implement it, so here is the solution (the code above). I hope you will find the correct place to add it.
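
For readers who want to see the ordering described above spelled out, here is a rough sketch: clear the failed state first, refresh the service facts, then stop whatever is left. It is not the playbook's actual code. The task names, the extra service_facts refresh, and the filter on status 'failed' (taken from the error output above) are all assumptions. Whether a cleared unit really disappears from the facts may also depend on the distribution and systemd version.

```yaml
# Sketch only: task names and ordering are assumptions, not the playbook's real tasks.
- name: Clear failed state of leftover worker units
  ansible.builtin.command:
    cmd: "systemctl reset-failed {{ item.key }}"
  # reset-failed is systemd-specific
  when: ansible_service_mgr == "systemd"
  # A raw command cannot report idempotency on its own, so mark it as changed
  changed_when: true
  # Only worker units that the service facts report with status "failed"
  with_dict: "{{ ansible_facts.services | default({}) | dict2items | selectattr('key', 'match', 'matrix-synapse-worker-.+\\.service') | selectattr('value.status', 'equalto', 'failed') | list | items2dict }}"

# Re-collect facts so the next task loops over the current list of units
- name: Refresh service facts
  ansible.builtin.service_facts:

- name: Ensure any worker services are stopped
  ansible.builtin.service:
    name: "{{ item.key }}"
    state: stopped
  with_dict: "{{ ansible_facts.services | default({}) | dict2items | selectattr('key', 'match', 'matrix-synapse-worker-.+\\.service') | list | items2dict }}"
```

On the first run after disabling workers, the units are presumably not yet in the failed state, so the first task would be skipped and the stop task would behave as before.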

@spantaleev
Owner

Do you get these errors when you do --tags=start?

From what I remember, we are dynamically populating the list of services that need to be started as the playbook executes. If workers are disabled, there should never be a matrix-synapse-worker-* systemd service in the "services that should be started" list, regardless of whether such a systemd .service exists on the host or not.

Or is this some error during worker cleanup, not during --tags=start?

@rakshazi
Contributor Author

The error is part of the worker cleanup process (task "Ensure any worker services are stopped"), so it happens with --tags=setup-all, not during start.

@spantaleev
Owner

Thanks for reporting this! While working on the Dendrite support branch (#818), I've encountered this same problem (matrix_synapse_enabled: false and it tries to uninstall Synapse along with all old workers, etc.)


Seems like running a bare systemctl doesn't output these failed units for me on CentOS 7.9.

The Ansible service_facts built-in module which collects the unit files actually performs systemctl list-units --no-pager --type service --all: https://github.com/ansible/ansible/blob/bc753c0518fd87c38fd3304f860fe55e00276303/lib/ansible/modules/service_facts.py#L247

I see a bunch of (not-found, inactive, dead) services when I do systemctl list-units --no-pager --type service --all | grep synapse.

Interestingly, neither systemctl reset-failed (to reset all) nor systemctl reset-failed SERVICE_NAME changes anything with regard to what I see for systemctl list-units --no-pager --type service --all | grep synapse.


Thankfully, ansible_facts.services contains a list of key/value things like this:

      matrix-synapse-worker-appservice-0.service:
        name: matrix-synapse-worker-appservice-0.service
        source: systemd
        state: stopped
        status: not-found

By excluding services whose status is 'not-found' (that is, keeping only those with status != 'not-found') we can work around it, which is what I've done in 4625b34.
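
As an illustration of that kind of filter (a sketch only, not the contents of 4625b34), the loop over ansible_facts.services could drop not-found entries with Jinja2's rejectattr before stopping anything. The task name is hypothetical, and the sketch assumes the service facts have already been gathered earlier in the play.

```yaml
# Sketch only: an assumed shape for the workaround, not the actual commit.
- name: Ensure any worker services are stopped (skipping not-found ghosts)
  ansible.builtin.service:
    name: "{{ item.key }}"
    state: stopped
  # Select worker units by name, then reject entries whose status is "not-found"
  with_dict: "{{ ansible_facts.services | default({}) | dict2items | selectattr('key', 'match', 'matrix-synapse-worker-.+\\.service') | rejectattr('value.status', 'equalto', 'not-found') | list | items2dict }}"
```

Note that a filter like this would still pass through an entry whose status is something other than not-found, such as the 'failed' status shown in the original error output.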

Let's see how it goes with this fix. If anyone has a better idea, we can revisit this.

HarHarLinks pushed a commit to HarHarLinks/matrix-docker-ansible-deploy that referenced this issue Feb 16, 2022
@ofalvai
Contributor

ofalvai commented Mar 30, 2022

I just want to mention that I ran into the same problem, even with the fix applied a few months ago. What helped me was running systemctl reset-failed SERVICE_NAME, which completely removed the service entry (I'm running Debian, not CentOS).

Maybe other people discovering this thread will find it useful.
