Replies: 11 comments
-
Here's another screenshot, after scaling the above Only when closely looking for a missing value in the |
-
And another one that would make me stop and think: why does this service still have pending tasks?
-
And why did the previous task fail for this service (even though it actually completed successfully): |
-
We have the same problems with our removed nodes. |
-
@sorenhansendk can you add your experience to the issue covering the root cause in Docker swarm mode? moby/moby#34122 |
-
We have the same issue, with ghost tasks on nodes that have been removed. The tasks are still "running" but the nodes don't exist anymore, yet Portainer still shows the tasks on the overview.
-
I have the same issue.
-
Counters will be fixed with PR #2127; the task list can be fixed in the same "hack" way (e.g. hiding tasks that have |
-
FYI, I've discovered a way to remove the ghost tasks. Scale a service down to 0 then click the checkbox and then click |
-
After removing one node from a swarm a bit too hastily, I can vouch that hiding stuck/ghost tasks may be ill-advised, especially if you rely on service names for routing connections (for example, HTTP from an nginx service with a published port to a container belonging to a service with an unpublished port, both on the same network). Any "ghosted" tasks are still treated as valid targets by the swarm's internal load balancing, regardless of their actual state (this took a while to figure out). The workaround of scaling to 0, force updating and then scaling back up does work, but especially persistent ghost tasks may need the cycle repeated.
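For anyone who would rather script that cycle than click through it, here is a minimal sketch in Python that shells out to the docker CLI. The function names, the service name "web" and the replica count are made up for illustration. It defaults to a dry run that only prints the commands. Note that scaling to 0 takes the service down until it is scaled back up.

```python
import subprocess
from typing import List


def workaround_commands(service: str, replicas: int) -> List[List[str]]:
    """Build the docker CLI calls for one scale-down / force-update / scale-up cycle."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [
        ["docker", "service", "scale", f"{service}=0"],
        ["docker", "service", "update", "--force", service],
        ["docker", "service", "scale", f"{service}={replicas}"],
    ]


def run_workaround(service: str, replicas: int, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it on a manager node."""
    for cmd in workaround_commands(service, replicas):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the cycle as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "web" and 3 are placeholder values; dry run only prints the commands.
    run_workaround("web", 3)
```

If the ghost tasks survive one pass, the same call can simply be run again, which matches the "toggling back and forth" described above.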
-
It just took me about half an hour to go through a list of 174 services, looking for any where replicas > scale AND no node was defined for some of the replicas, then scale each one down to zero (page reload every time), click update (page reload every time) and scale back up (another page reload every time), with lots of scrolling through a long list between every click/reload. If Portainer didn't reload after every scale/update, this process would be WAY quicker and easier. Even better would be a button in Portainer that did all this for me. Or perhaps this could be done automatically by a Docker image deployed to a manager node with the Docker socket bind mounted. I doubt Docker will ever fix this bug in swarm...
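The review step could probably be automated along these lines. This is only a sketch: it assumes you already have the task and node lists as parsed JSON in the shape the Docker Engine API returns them (tasks with ID, ServiceID, NodeID and Status.State; nodes identified by ID), and it assumes a ghost task is one that reports "running" while its NodeID is empty or no longer matches any node in the swarm. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

Task = Dict[str, Any]


def find_ghost_tasks(tasks: Iterable[Task], node_ids: Set[str]) -> List[Task]:
    """Return tasks reported as running on a node the swarm no longer knows about."""
    return [
        t for t in tasks
        if t.get("Status", {}).get("State") == "running"
        # A missing or empty NodeID is treated the same as an unknown node.
        and t.get("NodeID", "") not in node_ids
    ]


def affected_services(tasks: Iterable[Task], node_ids: Set[str]) -> Dict[str, List[str]]:
    """Map each affected service ID to the IDs of its ghost tasks."""
    by_service: Dict[str, List[str]] = defaultdict(list)
    for t in find_ghost_tasks(tasks, node_ids):
        by_service[t["ServiceID"]].append(t["ID"])
    return dict(by_service)


if __name__ == "__main__":
    # Made-up sample: node "n1" is still in the swarm, node "gone" was removed.
    sample_tasks = [
        {"ID": "t1", "ServiceID": "svc-a", "NodeID": "n1", "Status": {"State": "running"}},
        {"ID": "t2", "ServiceID": "svc-a", "NodeID": "gone", "Status": {"State": "running"}},
        {"ID": "t3", "ServiceID": "svc-b", "NodeID": "n1", "Status": {"State": "shutdown"}},
    ]
    print(affected_services(sample_tasks, {"n1"}))  # prints {'svc-a': ['t2']}
```

The services it returns are the ones that would need the scale to 0, update, scale back up treatment, so nobody has to scroll through the whole list by hand.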
-
Is your feature request related to a problem? Please describe.
Docker swarm mode has a bug where vestigial "ghost" tasks on nodes that have been removed from the swarm are never removed from the task history.
docker swarm update --task-history-limit 0
(or 1 or 5) does not fix the problem.
This is causing us a serious problem in ascertaining the running status of all our stacks, as we need to carefully examine the details of every task to find out whether it is actually running or whether only ghost tasks are still being reported as running.
Describe the solution you'd like
scale
link.
Describe alternatives you've considered
I think the only way to remove these ghost tasks is to delete and recreate from scratch all stacks that contain affected services. For us this is around 20 production stacks that will experience downtime as a result.
Without a "recreate stack" button, I will need to manually copy and paste the stack name, stack file, and each environment variable name and value individually back into the form (while the site is down).
Some of our stacks have 15 environment variables. So that's over 30 copy and paste operations to recreate the stack and clear out these vestigial ghost tasks, while the site is down.
Additional context
I realise that the root problem is a bug in Docker swarm mode, not Portainer, but the impact of this bug seriously limits the usefulness of Portainer in being able to confidently (or at all) ascertain the running state of our stacks.
This issue has already been open with Docker for some time without much traction. And even if it were fixed tomorrow, I fear it would be at least 3 months before the fix lands in Docker for AWS. Or more likely 6 months or even never, given that Docker for AWS is already a full stable release behind and Docker has switched to a twice-a-year release cadence.
Here's the open issue: moby/moby#34122
Here's a screenshot from Portainer:
Even if solutions 1 and 2 from above are rejected as workarounds for someone else's bug, 3 should at least allow us to get back to a normal state with as little downtime as possible and might also be useful in other scenarios.