Healthcheck for Portainer Agents Running on Docker Swarm #8578

Closed
joe-eklund opened this issue Mar 1, 2023 · 5 comments

Comments

@joe-eklund

Hello-

I am trying to get a healthcheck running on my Portainer agents, which run on Swarm nodes. I have successfully deployed an agent with a healthcheck on a single-node Docker Swarm cluster using the following stack file:

version: '3.7'

services:
  agent:
    image: portainer/agent:alpine
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    environment:
      AGENT_SECRET: <removed>
      AGENT_CLUSTER_ADDR: localhost
      LOG_LEVEL: DEBUG
    ports:
      - target: 9001
        published: 9001
        protocol: tcp
        mode: host
    networks:
      - portainer_agent
    deploy:
      mode: global
      placement:
        constraints: [node.platform.os == linux]
    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider --no-check-certificate https://localhost:9001/ping"]
      interval: 30s
      retries: 3
      start_period: 20s
      timeout: 10s
networks:
  portainer_agent:
    driver: overlay
    attachable: true

The critical step in getting the above healthcheck to work was setting AGENT_CLUSTER_ADDR to localhost; otherwise I get the following error:

2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:83 > agent running on Docker platform |
2023/02/28 11:09PM DBG github.com/portainer/agent/cmd/agent/main.go:93 > member_tags="&{AgentPort:9001 EdgeKeySet:false NodeName:<node1> DockerConfiguration:{EngineStatus:2 Leader:true NodeRole:1} KubernetesConfiguration:{}}"
2023/02/28 11:09PM INF github.com/portainer/agent/cmd/agent/main.go:98 > agent running on a Swarm cluster node. Running in cluster mode |
2023/02/28 11:09PM DBG github.com/portainer/agent/docker/docker.go:104 > retrieving IP address from container network | ip_address=<ip> network_name=portainer_agent_portainer_agent
2023/02/28 11:09PM FTL github.com/portainer/agent/cmd/agent/main.go:148 > unable to retrieve a list of IP associated to the host | error="lookup tasks.portainer_agent_agent on 127.0.0.11:53: no such host" host=tasks.portainer_agent_agent

But this solution does not work once you have a Swarm cluster larger than one node.

This all comes down to Docker not resolving DNS for a container until its healthcheck (if defined) passes. If your service needs to talk to other services outside its own container (e.g. Portainer Agents talking to each other) before it can run and be considered healthy, you have a chicken-and-egg problem. You can read more about this long-standing issue at moby/moby#35451.
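For anyone who wants to observe the failing step outside the agent, here is a minimal Python sketch of the same kind of lookup that the FTL log line reports (a DNS query for tasks.portainer_agent_agent). It is not the agent's code, which is written in Go; the function names, the retry count, and the delay are hypothetical and only illustrate the check. It assumes it is run from a container attached to the same overlay network.

```python
import socket
import time


def resolve_task_ips(host, resolver=socket.gethostbyname_ex):
    """Return the sorted IPs that `host` resolves to, or [] if the lookup fails."""
    try:
        _name, _aliases, ips = resolver(host)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return []
    return sorted(ips)


def wait_for_task_ips(host, attempts=5, delay=2.0, resolver=socket.gethostbyname_ex):
    """Retry the lookup; return the first non-empty result, else [] (hypothetical helper)."""
    for attempt in range(attempts):
        ips = resolve_task_ips(host, resolver)
        if ips:
            return ips
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, but not after the last one
    return []
```

Calling wait_for_task_ips("tasks.portainer_agent_agent") with the service name from the log should return an empty list for as long as the name does not resolve, which matches the "no such host" error above.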

My question: does anyone have a solution for deploying a healthcheck for the Portainer Agent across multiple nodes in Docker Swarm?

@tamarahenson

@joe-eklund

Thank you for the information. I am going to further investigate. I will update you as I learn more.

Thanks!

@tamarahenson

@joe-eklund

I want to follow up on this. Portainer Agent requires Docker DNS to be working. Docker DNS will not resolve until the healthcheck is successful. I am going to further review with Product. I will update you as I learn more.

Thanks!

@siddjellali

same issue here, thanks :)

@github-actions

This issue has been marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 7 days. If you believe that it has been incorrectly labelled as stale, leave a comment and the label will be removed.

@github-actions

github-actions bot commented Aug 4, 2023

Since no further activity has appeared on this issue it will be closed. If you believe that it has been incorrectly closed, leave a comment mentioning portainer/support and one of our staff will then review the issue. Note: if it is an old bug report, make sure that it is reproducible in the latest version of Portainer, as it may have already been fixed.

github-actions bot closed this as not planned on Aug 4, 2023