Healthcheck for Portainer Agents Running on Docker Swarm #433

Closed
joe-eklund opened this issue Feb 28, 2023 · 1 comment

@joe-eklund

Hello-

I am trying to get a healthcheck running on my Portainer agents, which run on Docker Swarm nodes. I have successfully deployed an agent with a healthcheck on a single-node Docker Swarm cluster using the following stack file:

version: '3.7'

services:
  agent:
    image: portainer/agent:alpine
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    environment:
      AGENT_SECRET: <removed>
      AGENT_CLUSTER_ADDR: localhost
      LOG_LEVEL: DEBUG
    ports:
      - target: 9001
        published: 9001
        protocol: tcp
        mode: host
    networks:
      - portainer_agent
    deploy:
      mode: global
      placement:
        constraints: [node.platform.os == linux]
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider --no-check-certificate https://localhost:9001/ping"]
      interval: 30s
      retries: 3
      start_period: 20s
      timeout: 10s
networks:
  portainer_agent:
    driver: overlay
    attachable: true

The critical step in getting the above healthcheck to work was setting AGENT_CLUSTER_ADDR to localhost. Without it, the agent exits with the following error:

2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:83 > agent running on Docker platform |
2023/02/28 11:09PM DBG github.com/portainer/agent/cmd/agent/main.go:93 > member_tags="&{AgentPort:9001 EdgeKeySet:false NodeName:<node1> DockerConfiguration:{EngineStatus:2 Leader:true NodeRole:1} KubernetesConfiguration:{}}"
2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:98 > agent running on a Swarm cluster node. Running in cluster mode |
2023/02/28 11:09PM DBG github.com/portainer/agent/docker/docker.go:104 > retrieving IP address from container network | ip_address=<ip> network_name=portainer_agent_portainer_agent
2023/02/28 11:09PM FTL github.com/portainer/agent/cmd/agent/main.go:148 > unable to retrieve a list of IP associated to the host | error="lookup tasks.portainer_agent_agent on 127.0.0.11:53: no such host" host=tasks.portainer_agent_agent

But this solution does not work once you have a Swarm cluster larger than one node.

This all comes down to Docker not resolving DNS for a container until its healthcheck (if one is defined) passes. If your service needs to talk to other services outside the container itself before it can run and be considered healthy (here, Portainer Agents talking to each other), you have a chicken-and-egg problem: the agent cannot look up tasks.portainer_agent_agent until it is healthy, and it cannot become healthy until that lookup succeeds. You can read more about this long-standing issue at moby/moby#35451.
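For anyone who wants to observe this from inside a container attached to the overlay network, here is a minimal Python sketch that attempts the same kind of lookup the agent log shows failing. It is not part of the agent or of my setup: the function name `resolve_task_ips` is hypothetical, and the hostname is simply the one from the log above. It only reports what the resolver returns and assumes Python is available in the container you run it from.

```python
import socket
from typing import List


def resolve_task_ips(host: str, port: int = 9001) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `host` resolves to.

    Returns an empty list when the lookup fails, which corresponds to the
    "no such host" error in the agent log.
    """
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # The name does not resolve (for example, no task is registered yet).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostname taken from the log output; adjust to your own stack and service name.
    host = "tasks.portainer_agent_agent"
    ips = resolve_task_ips(host)
    if ips:
        print(f"{host} resolves to: {', '.join(ips)}")
    else:
        print(f"{host} does not resolve")
```

Running it once while the healthcheck is still pending and again after it passes should show whether the task addresses only appear once the container is reported healthy.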

My question: does anyone have a solution for deploying a healthcheck for the Portainer Agent across multiple nodes in Docker Swarm?

@joe-eklund
Author

Closing in favor of portainer/portainer#8578.
