Healthcheck for Portainer Agents Running on Docker Swarm #433

joe-eklund · 2023-02-28T23:21:17Z

Hello-

I am trying to get a healthcheck working on my Portainer agents, which run on Docker Swarm nodes. I have successfully deployed an agent with a healthcheck on a single-node Docker Swarm cluster using the following stack file:

version: '3.7'

services:
  agent:
    image: portainer/agent:alpine
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    environment:
      AGENT_SECRET: <removed>
      AGENT_CLUSTER_ADDR: localhost
      LOG_LEVEL: DEBUG
    ports:
      - target: 9001
        published: 9001
        protocol: tcp
        mode: host
    networks:
      - portainer_agent
    deploy:
      mode: global
      placement:
        constraints: [node.platform.os == linux]
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider --no-check-certificate https://localhost:9001/ping"]
      interval: 30s
      retries: 3
      start_period: 20s
      timeout: 10s
networks:
  portainer_agent:
    driver: overlay
    attachable: true

The key to getting the healthcheck above to work was setting AGENT_CLUSTER_ADDR to localhost; otherwise the agent fails with the following error:

2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:83 > agent running on Docker platform |
2023/02/28 11:09PM DBG github.com/portainer/agent/cmd/agent/main.go:93 > member_tags="&{AgentPort:9001 EdgeKeySet:false NodeName:<node1> DockerConfiguration:{EngineStatus:2 Leader:true NodeRole:1} KubernetesConfiguration:{}}"
2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:98 > agent running on a Swarm cluster node. Running in cluster mode |
2023/02/28 11:09PM DBG github.com/portainer/agent/docker/docker.go:104 > retrieving IP address from container network | ip_address=<ip> network_name=portainer_agent_portainer_agent
2023/02/28 11:09PM FTL github.com/portainer/agent/cmd/agent/main.go:148 > unable to retrieve a list of IP associated to the host | error="lookup tasks.portainer_agent_agent on 127.0.0.11:53: no such host" host=tasks.portainer_agent_agent

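However, this workaround does not work once the Swarm cluster has more than one node, presumably because with localhost as the cluster address each agent only ever finds itself and never discovers the agents on the other nodes.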
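One way to see the difference between the two settings is to resolve both names from a container on the same overlay network. The minimal Python sketch below is an illustration, not Portainer code: `resolve_ipv4` is a hypothetical helper, and it assumes the agent resolves its cluster address with an ordinary DNS lookup, as the `lookup tasks.portainer_agent_agent on 127.0.0.11:53` line in the log suggests. With `localhost` the only answer is a loopback address, so an agent can only ever find itself; with `tasks.portainer_agent_agent` the lookup returns nothing until Swarm has registered at least one task.

```python
import socket


def resolve_ipv4(host: str, port: int = 9001) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    Hypothetical diagnostic helper. An empty list means the name did not
    resolve, which is what the agent hits for tasks.<service> while no task
    is registered in Swarm's DNS.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Equivalent of "no such host" in the agent's log.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("localhost", "tasks.portainer_agent_agent"):
        print(name, "->", resolve_ipv4(name) or "no such host")
```

Python is not assumed to be available inside the agent image, so one option is a throwaway container attached to the overlay network (it is declared `attachable: true`), for example `docker run --rm -it --network portainer_agent_portainer_agent python:3-alpine`, using the network name from the log above.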

The root cause is that Docker Swarm does not add a task to its service's DNS records (for example, tasks.portainer_agent_agent) until the task's healthcheck, if one is defined, has passed. So if a service needs to talk to other services or tasks outside its own container before it can run and be considered healthy (for example, Portainer agents discovering each other), you end up with a chicken-and-egg problem. You can read more about this long-standing issue at moby/moby#35451.
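The chicken-and-egg loop can be made concrete with a toy model. The Python below is a simplification under stated assumptions, not Docker's or Portainer's code: it assumes (1) Swarm lists a task under tasks.<service> only after its healthcheck passes, or as soon as it is running when no healthcheck is defined, and (2) an agent whose lookup of tasks.<service> comes back empty exits fatally (as in the FTL log line above) before it ever answers /ping, and Swarm then restarts it. `simulate` is a hypothetical name.

```python
def simulate(num_tasks: int, healthcheck: bool, rounds: int = 5) -> set[int]:
    """Toy model of Swarm service DNS vs. agent startup (assumptions only).

    Returns the set of task ids listed under tasks.<service> after `rounds`
    restart cycles.
    """
    dns: set[int] = set()  # task ids currently returned for tasks.<service>
    for _ in range(rounds):
        for task in range(num_tasks):
            if not healthcheck:
                # Assumption: with no healthcheck, Swarm lists a task once it is running.
                dns.add(task)
            if not dns:
                # Lookup fails -> agent logs FTL and exits before serving /ping,
                # so its healthcheck can never pass; Swarm restarts it next round.
                continue
            # Lookup succeeded -> agent serves /ping, healthcheck passes,
            # and Swarm adds the task to DNS.
            dns.add(task)
    return dns
```

With a healthcheck the DNS set never leaves the empty state, no matter how many restarts occur; without one, tasks register as soon as they start and the lookups succeed.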

My question: does anyone have a solution for deploying a healthcheck for Portainer Agent across multiple nodes in Docker Swarm?

joe-eklund · 2023-03-01T22:53:51Z

Closing in favor of portainer/portainer#8578.

joe-eklund closed this as completed Mar 1, 2023