Authentik worker becomes "unhealthy" and never recovers after restarting the Redis Docker container #6221

freender · 2023-07-11T18:36:24Z

Describe the bug
The authentik worker becomes "unhealthy" and never recovers after the Redis Docker container is restarted.

To Reproduce
Steps to reproduce the behavior:

1. Check that the authentik worker is up and running:

   docker inspect auth-worker | grep Status

   Actual result = expected result:

   "Status": "running",
   "Status": "healthy",

2. Restart the Redis Docker container.
3. Run the worker healthcheck:

   docker exec auth-worker /lifecycle/ak healthcheck

Actual result:
The worker has lost Redis connectivity; the only way to fix it is to restart the authentik worker.

root@NAS:~# docker exec auth-worker /lifecycle/ak healthcheck
{"event":"checking health","level":"debug","mode":"worker","timestamp":"2023-07-11T14:26:48-04:00"}
{"delta":104.817957282,"event":"Worker hasn't updated heartbeat in threshold","level":"warning","threshold":30,"timestamp":"2023-07-11T14:26:48-04:00"}
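The warning above reports a heartbeat age (`delta`, about 104.8 seconds) that exceeds the allowed `threshold` (30 seconds). As a rough illustration of that comparison, the following sketch parses a JSON log line of this shape and reports whether it describes a stale heartbeat. The function name `heartbeat_is_stale` is hypothetical, and this is not authentik's actual healthcheck code, only a reading of the log fields shown above.

```python
import json


def heartbeat_is_stale(log_line: str) -> bool:
    """Return True if a JSON log line reports a heartbeat delta above its threshold."""
    record = json.loads(log_line)
    # Lines without both fields (such as the "checking health" debug line)
    # do not describe a stale heartbeat.
    if "delta" not in record or "threshold" not in record:
        return False
    return record["delta"] > record["threshold"]


# Sample record built from the fields in the warning line above.
sample = json.dumps(
    {
        "delta": 104.817957282,
        "event": "Worker hasn't updated heartbeat in threshold",
        "level": "warning",
        "threshold": 30,
    }
)
print(heartbeat_is_stale(sample))
```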

Expected behavior
Please investigate whether such cases can be detected and recovered from automatically. For now, the only option is to restart the authentik worker.

Version and Deployment (please complete the following information):

authentik version: 2023.6.1
Deployment: docker 20.10.24, unraid

a-gerhard · 2023-08-07T13:39:23Z

I can confirm this. It seems that once the worker has connected to Redis and that connection is then lost, the worker does not handle the resulting exception (it should instead try to re-establish the connection).

This is an issue for us because we use Watchtower to keep our containers up to date, so the Redis container is recreated regularly in our setup.

Solution: Either have the worker exit (and therefore restart) when the Redis connection becomes unavailable, or find a way to reconnect to Redis when a connection loss is detected.
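The two options proposed above can be combined: retry with a reconnect for a bounded number of attempts, and if that fails, let the error propagate so the process exits and the container's restart policy takes over. The sketch below is a generic illustration under stated assumptions, not authentik's or Celery's code. `call_with_reconnect`, `operation`, and `reconnect` are hypothetical names, and the built-in `ConnectionError` is only a stand-in; a real Redis client raises its own exception types, which would be passed via `retry_on`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    operation: Callable[[], T],
    reconnect: Callable[[], None],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation(); on a connection failure, back off, reconnect, and retry.

    Re-raises the last error once max_attempts is exhausted, so the caller
    can exit and leave recovery to the container's restart policy.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
            reconnect()
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the function easy to test without real delays. Bounding the attempts matters here: an unbounded retry loop would leave the worker running but unhealthy, which is the situation this issue describes.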

mgrimace · 2023-08-28T17:42:49Z

I'm experiencing the authentik-worker becoming unhealthy using image ghcr.io/goauthentik/server:2023.6.1. There were no specific actions or changes; I just noticed it in Portainer.

a-gerhard · 2023-09-08T15:35:20Z

As a workaround until this is fixed, I have set up autoheal for the workers.

Add this to docker-compose.yml:

  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

and then add the autoheal label to your worker service:

    labels:
      autoheal: "true"
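With this configuration, the autoheal container restarts containers that both carry the `autoheal` label set to `true` and report an unhealthy status. The selection rule can be sketched as follows. This is a simplified illustration, not autoheal's actual implementation; the function name `containers_to_restart` and the dictionary shape (`name`, `labels`, `health`) are assumptions made for the example.

```python
from typing import Dict, List


def containers_to_restart(containers: List[Dict], label: str = "autoheal") -> List[str]:
    """Return names of containers that are opted in via the label and unhealthy."""
    selected = []
    for container in containers:
        labels = container.get("labels", {})
        # Opt-in: the label must be present and set to "true".
        opted_in = str(labels.get(label, "")).lower() == "true"
        unhealthy = container.get("health") == "unhealthy"
        if opted_in and unhealthy:
            selected.append(container["name"])
    return selected


# Hypothetical snapshot of a stack after Redis was restarted.
snapshot = [
    {"name": "auth-worker", "labels": {"autoheal": "true"}, "health": "unhealthy"},
    {"name": "auth-server", "labels": {"autoheal": "true"}, "health": "healthy"},
    {"name": "redis", "labels": {}, "health": "unhealthy"},
]
print(containers_to_restart(snapshot))
```

Note that this approach only works for containers that define a healthcheck, since the unhealthy status is what triggers the restart.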

mgrimace · 2023-09-08T20:13:27Z

  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Thank you for this. I noticed that others in this thread were also using Watchtower, so I tested adding the label:

    labels:
      com.centurylinklabs.watchtower.enable: "false"

to each service in the authentik stack.

This also appears to have solved the issue for me (at least in the short term). Perhaps it has something to do with Watchtower attempting to update or restart(?) Redis, which is not pinned to a fixed version, while the worker remains on a fixed version. My knowledge is limited in this area, but hopefully this is another piece of the puzzle.

authentik-automation · 2023-11-08T01:47:04Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

mooglestiltzkin · 2023-12-03T05:04:10Z

I noticed this issue as well. Can we expect a permanent fix?

Yes, I also use Watchtower. I noticed a correlation: right after I received a Watchtower email notification about restarting authentik (probably because it was updating), the worker became unhealthy.

keliansb · 2024-02-17T15:54:42Z

I'm also facing this issue, and I use Watchtower to update the database and Redis.

freender added the bug label (Something isn't working) on Jul 11, 2023

authentik-automation bot added the wontfix label on Nov 8, 2023

authentik-automation bot closed this as not planned on Nov 15, 2023

channel-42 mentioned this issue Mar 31, 2024

feat(charts/authentik): add probes to worker deployment goauthentik/helm#255

Merged
