Nomad stops the CSI plugin job without waiting for jobs that use its volumes to shut down #22192

Closed
pavel-z1 opened this issue May 22, 2024 · 2 comments
Assignees
Labels
theme/docs Documentation issues and enhancements theme/drain theme/storage

Comments

@pavel-z1

Hi,

We use the Ceph CSI plugin on our Nomad clients.
This plugin provides volumes for several Nomad jobs.

The problem is that Nomad stops the CSI plugin job before the jobs that depend on the Ceph volume have stopped.
As a result, the CSI plugin job is stopped while all the jobs that used the Ceph volume hang, and the Docker service can't stop their containers. The only way out of this situation is to reboot the Nomad client node.

In the Nomad client logs we see that the client stops jobs in the correct order: it tries to stop the service-type jobs first, then the CSI plugin job. But Nomad doesn't wait for the service jobs to finish stopping before sending the interrupt to the CSI plugin job.

Usually Nomad sends the interrupt to all jobs within about one second, so the volume goes away before the dependent Docker containers have finished their shutdown operations.

Is there a solution for this problem?

Nomad version

Nomad v1.6.3
BuildDate 2023-10-30T12:58:10Z
Revision e0497bf

Operating system and Environment details

Rocky Linux release 8.8 (Green Obsidian)

Reproduction steps

  1. Deploy the Ceph CSI plugin
  2. Deploy a Ceph volume
  3. Deploy a job that uses the Ceph volume
  4. Drain the client nodes several times with the options: Force Drain, Drain System Jobs

Expected Result

The CSI plugin job is stopped only after all jobs that depend on the plugin's volumes have stopped.

Nomad Client logs (if appropriate)

May 22 12:18:38 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:38.723+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:38 prd-nomad-client-01 nomad[1392]: client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:38 prd-nomad-client-01 consul[1387]: 2024-05-22T12:18:38.752+0200 [INFO]  agent: Deregistered service: service=_nomad-task-b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288-jenkins-jenkins-http
May 22 12:18:39 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:39.707+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:39 prd-nomad-client-01 nomad[1392]: client.alloc_runner.task_runner: Task event: alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:39 prd-nomad-client-01 consul[1387]: 2024-05-22T12:18:39.728+0200 [INFO]  agent: Deregistered service: service=_nomad-task-7ad25a0f-d4d0-e6c5-f470-526f1b9938aa-ceph-node-ceph-csi-nodes-metrics
May 22 12:18:39 prd-nomad-client-01 dockerd[1390]: time="2024-05-22T12:18:39.815242473+02:00" level=info msg="ignoring event" container=d3bdb24d18311464f1922c0e5d8e89929b8cfa5492ed6f4c641c9c4f97cc7244 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Here we see that Nomad sent the interrupt signal to the Jenkins task at 12:18:38 and then sent the interrupt to the Ceph CSI plugin task at 12:18:39, without waiting for Jenkins to stop.
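
To make the timing explicit, here is a minimal sketch that parses the two `Killing` events from the log excerpt above and computes the gap between them. The regular expression and the helper name `killing_events` are illustrative assumptions, not part of Nomad; only the log lines themselves come from this report (with the journald prefix trimmed).

```python
import re
from datetime import datetime

# The two "Killing" task events from the client log above (journald prefix removed).
LOG_LINES = [
    '2024-05-22T12:18:38.723+0200 [INFO]  client.alloc_runner.task_runner: Task event: '
    'alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing '
    'msg="Sent interrupt. Waiting 5s before force killing" failed=false',
    '2024-05-22T12:18:39.707+0200 [INFO]  client.alloc_runner.task_runner: Task event: '
    'alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing '
    'msg="Sent interrupt. Waiting 5s before force killing" failed=false',
]

# Hypothetical pattern: a timestamp at the start, then the task name of a Killing event.
KILLING = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4}) .*? '
    r'task=(?P<task>\S+) type=Killing'
)


def killing_events(lines):
    """Map each task name to the time its first Killing event was logged."""
    events = {}
    for line in lines:
        match = KILLING.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.setdefault(match.group("task"), ts)
    return events


events = killing_events(LOG_LINES)
gap = (events["ceph-node"] - events["jenkins"]).total_seconds()
print(f"ceph-node was interrupted {gap:.3f}s after jenkins")
```

For these two lines the gap is just under one second, well inside the 5s window the jenkins task was given before being force killed.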

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 22, 2024
@tgross
Member

tgross commented Jun 21, 2024

Hi @pavel-z1! We generally recommend that you pass the -ignore-system flag when draining a node with CSI volumes on it for this reason. But I see we're missing that from our Recommendations for Deploying CSI Plugins, so I'll try to get that added to the docs.

@tgross tgross added theme/storage theme/docs Documentation issues and enhancements and removed type/bug labels Jun 21, 2024
@tgross tgross self-assigned this Jun 21, 2024
@tgross tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Jun 21, 2024
@tgross
Member

tgross commented Jun 24, 2024

I took a second look at the code that governs this and remembered that I spent a bunch of time making drain safe to use without extra flags. But I actually missed before that you were using the -force flag on the drain. That's already specifically called out in the docs as not being safe for CSI plugins without -ignore-system:

-force: Remove allocations off the node immediately, regardless of the allocation's migrate block. This will include system jobs and CSI plugins if -ignore-system is not also set, and is not safe for use with CSI node plugins if the volumes are not being detached externally (for example, a cloud VM is being terminated).
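
As an illustration of that guidance, the following sketch builds a `nomad node drain` command line and refuses the combination the docs warn about. The helper `build_drain_command` and its guard are hypothetical, an assumption made for this example rather than anything Nomad provides; the flags `-enable`, `-ignore-system`, `-force`, and `-yes` are existing `nomad node drain` options. The guard is deliberately stricter than the docs, which allow `-force` alone when the volumes are detached externally.

```python
from typing import List


def build_drain_command(node_id: str, ignore_system: bool = True,
                        force: bool = False) -> List[str]:
    """Build an argument list for `nomad node drain` (hypothetical helper).

    Rejects -force without -ignore-system, since that combination also
    stops CSI node plugins while their volumes may still be in use.
    """
    if force and not ignore_system:
        raise ValueError(
            "-force without -ignore-system is not safe with CSI node plugins"
        )
    cmd = ["nomad", "node", "drain", "-enable"]
    if ignore_system:
        cmd.append("-ignore-system")
    if force:
        cmd.append("-force")
    # -yes skips the interactive confirmation prompt.
    cmd.extend(["-yes", node_id])
    return cmd


# "<node-id>" is a placeholder; substitute the ID of the node being drained.
print(" ".join(build_drain_command("<node-id>")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `nomad` CLI is configured.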

@tgross tgross closed this as not planned Jun 24, 2024