Nomad stop csi plugin job without waiting of shutdown jobs that used volume provided by csi plugin #22192

pavel-z1 · 2024-05-22T11:24:27Z

Hi,

We run the Ceph CSI plugin on our Nomad clients.
The plugin provides volumes for several Nomad jobs.

The problem is that Nomad stops the CSI plugin job before the jobs that depend on the Ceph volume have stopped.
As a result, the CSI plugin job is stopped while all the jobs that used the Ceph volume hang, and the Docker service can't stop their containers. The only way out of this situation is to reboot the Nomad client node.

In the Nomad client logs we see that the client stops jobs in the correct order: it first tries to stop the jobs of type service, then the CSI plugin job. But Nomad doesn't wait for the service jobs to finish stopping before sending the interrupt signal to the CSI plugin job.

Nomad usually sends the interrupt to all jobs within about one second, so the volume goes away before the dependent Docker containers have finished shutting down.

Is there a solution for this problem?

Nomad version

Nomad v1.6.3
BuildDate 2023-10-30T12:58:10Z
Revision e0497bf

Operating system and Environment details

Rocky Linux release 8.8 (Green Obsidian)

Reproduction steps

Deploy the Ceph CSI plugin
Deploy a Ceph volume
Deploy a job that uses the Ceph volume
Drain the client node several times with the options Force Drain and Drain System Jobs

Expected Result

The CSI plugin job should be stopped only after all jobs that depend on the plugin's volume have stopped.

Nomad Client logs (if appropriate)

May 22 12:18:38 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:38.723+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:38 prd-nomad-client-01 nomad[1392]: client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:38 prd-nomad-client-01 consul[1387]: 2024-05-22T12:18:38.752+0200 [INFO]  agent: Deregistered service: service=_nomad-task-b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288-jenkins-jenkins-http
May 22 12:18:39 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:39.707+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:39 prd-nomad-client-01 nomad[1392]: client.alloc_runner.task_runner: Task event: alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
May 22 12:18:39 prd-nomad-client-01 consul[1387]: 2024-05-22T12:18:39.728+0200 [INFO]  agent: Deregistered service: service=_nomad-task-7ad25a0f-d4d0-e6c5-f470-526f1b9938aa-ceph-node-ceph-csi-nodes-metrics
May 22 12:18:39 prd-nomad-client-01 dockerd[1390]: time="2024-05-22T12:18:39.815242473+02:00" level=info msg="ignoring event" container=d3bdb24d18311464f1922c0e5d8e89929b8cfa5492ed6f4c641c9c4f97cc7244 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"

Here we see that Nomad sent the interrupt signal to the Jenkins task at 12:18:38 and then sent the interrupt to the Ceph CSI plugin task at 12:18:39, without waiting for Jenkins to stop.
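To make the timing explicit, here is a minimal Python sketch that extracts the "Killing" task events from the log lines quoted above and measures the gap between them. The regular expression and the helper name `killing_events` are illustrative assumptions, not part of Nomad; the log lines are copied from the excerpt.

```python
import re
from datetime import datetime

# Lines copied from the client log excerpt above (journald prefix included).
LOG_LINES = [
    'May 22 12:18:38 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:38.723+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false',
    'May 22 12:18:38 prd-nomad-client-01 nomad[1392]: client.alloc_runner.task_runner: Task event: alloc_id=b67f2ccc-a37d-cda2-c5b5-2ab0ce7ef288 task=jenkins type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false',
    'May 22 12:18:39 prd-nomad-client-01 nomad[1392]:    2024-05-22T12:18:39.707+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=7ad25a0f-d4d0-e6c5-f470-526f1b9938aa task=ceph-node type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false',
]

# Hypothetical pattern: match only lines that carry Nomad's own timestamp,
# which also skips the duplicated journald lines that lack one.
EVENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) \[INFO\]\s+"
    r"client\.alloc_runner\.task_runner: Task event: .*?\btask=(?P<task>\S+) type=Killing"
)


def killing_events(lines):
    """Return (task, timestamp) pairs for each timestamped Killing event."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((match.group("task"), ts))
    return events


events = killing_events(LOG_LINES)
(first_task, first_ts), (second_task, second_ts) = events
gap = (second_ts - first_ts).total_seconds()
print(f"{second_task} interrupted {gap:.3f}s after {first_task}")
# prints: ceph-node interrupted 0.984s after jenkins
```

The gap is under one second, well inside the 5s grace period reported in the Jenkins "Killing" message, so the plugin task is signaled while Jenkins may still be shutting down.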


tgross · 2024-06-21T20:49:37Z

Hi @pavel-z1! We generally recommend that you pass the -ignore-system flag when draining a node with CSI volumes on it for this reason. But I see we're missing that from our Recommendations for Deploying CSI Plugins, so I'll try to get that added to the docs.

tgross · 2024-06-24T19:44:53Z

I took a second look at the code that governs this and remembered that I spent a bunch of time making drain safe to use without extra flags. But I missed earlier that you were using the -force flag on the drain. That's already specifically called out in the docs as not being safe for CSI plugins without -ignore-system:

-force: Remove allocations off the node immediately, regardless of the allocation's migrate block. This will include system jobs and CSI plugins if -ignore-system is not also set, and is not safe for use with CSI node plugins if the volumes are not being detached externally (for example, a cloud VM is being terminated).
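As an illustration of that rule, the following Python sketch builds a `nomad node drain` command line and refuses the combination the docs warn about. The helper `build_drain_command` and the placeholder node ID are hypothetical; only the `-enable`, `-force`, and `-ignore-system` flags come from the Nomad CLI. It assumes the volumes are not detached externally, so it is stricter than the docs for the case where they are.

```python
def build_drain_command(node_id, force=False, ignore_system=True):
    """Build an argv list for `nomad node drain` (hypothetical helper).

    Assumption: CSI volumes on the node are NOT being detached externally,
    so -force without -ignore-system is treated as unsafe and rejected.
    """
    if force and not ignore_system:
        raise ValueError(
            "-force without -ignore-system also removes CSI node plugins "
            "immediately; add -ignore-system or detach volumes externally"
        )
    cmd = ["nomad", "node", "drain", "-enable"]
    if force:
        cmd.append("-force")
    if ignore_system:
        cmd.append("-ignore-system")
    cmd.append(node_id)
    return cmd


# "<node-id>" is a placeholder, not a real node ID.
print(" ".join(build_drain_command("<node-id>")))
print(" ".join(build_drain_command("<node-id>", force=True)))
```

With `-ignore-system` set, system jobs such as the CSI node plugin stay on the node during the drain, so the service allocations can finish unmounting their volumes first.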

pavel-z1 added the type/bug label May 22, 2024

tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 22, 2024

tgross added the theme/storage and theme/docs labels and removed the type/bug label Jun 21, 2024

tgross self-assigned this Jun 21, 2024

tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Jun 21, 2024

tgross added the theme/drain label Jun 21, 2024

tgross closed this as not planned Jun 24, 2024
