
Harden the agent to be reliable against node reboot, crash, shutdown etc. #114

Open
ghost opened this issue Apr 20, 2020 · 6 comments
Labels
enhancement New feature or request

Comments


ghost commented Apr 20, 2020

Currently the agent is not reliable in certain situations and sometimes needs to be force-updated, or removed and re-deployed, when a node is rebooted, crashes, is drained for maintenance, or is under heavy load. The same can occur when the Docker daemon is restarted.

When this happens, the endpoint can show as down, or you may see errors when browsing different views in Portainer, such as "Failure: could not retrieve images".

The agent should be hardened to handle these situations reliably.

Additional info:
The symptoms are discussed at length on the Portainer repo; I have moved the discussion here as a feature request.
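A partial mitigation while the agent is hardened, as a sketch only: standard compose v3 deploy syntax lets Swarm restart agent tasks that exit. This helps with crashes and daemon restarts, though it does not by itself fix stale state after a drain or reboot. The delay/window values below are illustrative assumptions, not recommended settings.

```yaml
    # Hypothetical hardening of the agent service definition (compose v3).
    # restart_policy tells Swarm to reschedule a task whenever it exits.
    deploy:
      mode: global
      restart_policy:
        condition: any   # restart on any exit, not just failure
        delay: 5s        # wait between restart attempts (assumed value)
        window: 30s      # how long to wait before deciding a restart succeeded
```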

@ghost ghost added the enhancement New feature or request label Apr 20, 2020
@ghost ghost changed the title Harden the agent to be resillient against node reboot, crash, shutdown etc. Harden the agent to be reliable against node reboot, crash, shutdown etc. Apr 24, 2020

nivekuil commented Dec 4, 2020

I was able to break the Portainer agent by quickly draining 2 of 3 nodes in a 3-node, 3-manager swarm (one of the drained nodes was the leader): the UI showed the swarm in a "down" state, failed to load, and errored on every page. The problem went away after restarting the agent on the remaining node.


keywinf commented Oct 26, 2021

Hi there, it seems to me that demoted managers are still treated as managers, and vice versa. The UI shows errors such as "cannot retrieve tasks, services, etc." in a config like 1 manager + 1 worker. I suspect the agents behind the scenes are not in the appropriate state.

@huib-portainer

Are you running the Agent globally?

    deploy:
      mode: global
      placement:
        constraints: [node.platform.os == linux]

Unlike Portainer itself, which runs as a single replica on a manager node:

    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.role == manager]

Unless there's something going wrong with your overlay network...

@keywinf

keywinf commented Oct 26, 2021

Yeah

@keywinf

keywinf commented Oct 26, 2021

And I do not encounter this problem with two manager nodes; it only happens with 1 manager and 1 worker. If I leave auto-refresh on in the UI, it alternates: a list, then a red error flash, then a list, then a red error flash, and so on.

@yorickdowne

yorickdowne commented Jun 20, 2022

This continues to be an issue for us with Docker Swarm mode. The agent keeps the old IP when a node, whether worker or manager, is rebooted. We run 3 managers and 3 workers. docker service update --force portainer_agent fixes it, as long as all nodes stay up and do not reboot.

The issue is exacerbated by running in a cloud environment (AWS) with ephemeral private IPs. It would likely never surface if the nodes had statically assigned IPs.
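That force-update can be wrapped so a cron job on a manager node can run it after reboots. A minimal sketch, assuming the stack deploys the service as portainer_agent; the DRY_RUN flag is an illustrative escape hatch for testing outside a swarm, not an agent feature.

```shell
# refresh_agent: force Swarm to redeploy the agent's tasks so they pick up
# the nodes' current IPs. This is a workaround, not a fix.
refresh_agent() {
  service="${1:-portainer_agent}"   # assumed service name; adjust to your stack
  cmd="docker service update --force --detach $service"
  if [ -n "$DRY_RUN" ]; then
    # Print the command instead of executing it (e.g. when no swarm is available).
    echo "$cmd"
  else
    $cmd
  fi
}
```

Run on a timer or from a reboot hook on a manager node; --detach returns immediately rather than waiting for the rolling update to converge.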
