CSI volumes staging mount path collision between namespaces with CSI plugins that support staging #18741

Closed
ygg-drop opened this issue Oct 12, 2023 · 10 comments · Fixed by #20532
Labels: hcc/cst (Admin - internal), stage/accepted (Confirmed, and intend to work on; no timeline commitment though), theme/storage, type/bug

@ygg-drop

Nomad version

Output from nomad version

# nomad version
Nomad v1.6.2+ent
BuildDate 2023-09-13T17:11:57Z
Revision a93af012175baf56ed529b9f97e24afbf3415738

Operating system and Environment details

Ubuntu 23.04

Issue

CSI volumes are namespace-scoped in Nomad, yet Nomad does not include the namespace name in the staging mount path for CSI plugins that support staging (such as ceph-csi). If two namespaces contain CSI volumes with the same ID, then when jobs from those namespaces are scheduled on the same node, the staging mount paths for those volumes collide.

Reproduction steps

  1. Deploy a CSI plugin that supports staging (for example, ceph-csi)
  2. Create namespaces A and B
  3. Create a CSI volume with ID testvol in both namespaces
  4. Run a job in namespace A that mounts testvol and is constrained to a specific node
  5. Run a job in namespace B that mounts testvol and is constrained to the same node as the job in namespace A

Expected Result

Both jobs should run successfully and each have access to the testvol CSI volume from their respective namespace.

Actual Result

I only tested with multi-node-multi-writer (a CephFS volume), and the result was that testvol from namespace A was bind-mounted into the per-alloc directory of an allocation belonging to a job in namespace B. This is a potential security issue.

The staging path looks like $NOMAD_DATA_DIR/client/csi/node/$CSI_PLUGIN_ID/staging/testvol/rw-file-system-multi-node-multi-writer. The namespace is not included in the path.


tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Oct 12, 2023
@the-nando
Contributor

the-nando commented Oct 12, 2023

I've tested with csi-efs, which doesn't use NodeStageVolume, and it works as expected in a setup similar to the OP's.
According to the CSI spec for NodeStageVolume:

// The path to which the volume MAY be staged. It MUST be an
// absolute path in the root filesystem of the process serving this
// request, and MUST be a directory. The CO SHALL ensure that there
// is only one `staging_target_path` per volume.

So the issue seems to originate from https://github.com/hashicorp/nomad/blob/v1.6.2/client/pluginmanager/csimanager/volume.go#L170, which doesn't include the volume's namespace:

pluginStagingPath := v.stagingDirForVolume(v.containerMountPoint, vol.ID, usage)
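
For illustration, here is a minimal, standalone sketch of the kind of path helper involved and how including the namespace as a path component avoids the collision (which is the direction the eventual fix takes, per the commit messages below). The type and function names are simplified stand-ins, not the exact Nomad code:

package main

import (
	"fmt"
	"path/filepath"
)

// UsageOptions stands in for Nomad's usage options; ToFS renders a string
// like "rw-file-system-multi-node-multi-writer". Names here are illustrative.
type UsageOptions struct{ mode string }

func (u UsageOptions) ToFS() string { return u.mode }

// stagingDirForVolume (pre-fix shape): keyed only by volume ID, so "testvol"
// in namespace A and "testvol" in namespace B resolve to the same path.
func stagingDirForVolume(root, volID string, usage UsageOptions) string {
	return filepath.Join(root, "staging", volID, usage.ToFS())
}

// stagingDirForVolumeNS (post-fix shape): including the volume's namespace as
// a path component keeps the two staging mounts separate.
func stagingDirForVolumeNS(root, namespace, volID string, usage UsageOptions) string {
	return filepath.Join(root, "staging", namespace, volID, usage.ToFS())
}

func main() {
	u := UsageOptions{mode: "rw-file-system-multi-node-multi-writer"}
	root := "/opt/nomad/data/client/csi/node/ceph-csi"

	// Identical output regardless of namespace: this is the collision.
	fmt.Println(stagingDirForVolume(root, "testvol", u))

	// With the namespace included, the paths differ.
	fmt.Println(stagingDirForVolumeNS(root, "A", "testvol", u))
	fmt.Println(stagingDirForVolumeNS(root, "B", "testvol", u))
}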

@jrasell
Member

jrasell commented Oct 16, 2023

Hi @the-nando and @ygg-drop, thanks for raising this issue; I'll add it to our roadmapping list.

jrasell added the theme/storage and stage/accepted labels Oct 16, 2023
jrasell moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage Oct 16, 2023
@ron-savoia
Contributor

I've tested this in my lab and see the same results with the staging directory. I also noticed another oddity when creating volumes with the same name in separate namespaces: when two volumes in separate Nomad namespaces share the same name, only one volume is created on the storage side, and it is accessible by jobs in both namespaces. Output from both scenarios is below.

Same ID in Volume file

Volume 1

# vol1.hcl
id = "testvol"
namespace = "A"
name = "vol-ceph-1g-A"
type = "csi"
plugin_id = "ceph-csi"
capacity_max = "1G"
capacity_min = "1G"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

secrets {
  userID  = "nomad"
  userKey = "AQA6vi5lcqAmExAADSkeWFyTfrldI9rPhfview=="
}

parameters {
  clusterID = "71bd4b18-6cf7-11ee-99e0-776f051e8016"
  pool = "nomad"
  imageFeatures = "layering"
}

Volume 2

# vol2.hcl
id = "testvol"
namespace = "B"
name = "vol-ceph-1g-B"
type = "csi"
plugin_id = "ceph-csi"
capacity_max = "1G"
capacity_min = "1G"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

secrets {
  userID  = "nomad"
  userKey = "AQA6vi5lcqAmExAADSkeWFyTfrldI9rPhfview=="
}

parameters {
  clusterID = "71bd4b18-6cf7-11ee-99e0-776f051e8016"
  pool = "nomad"
  imageFeatures = "layering"
}

VOLUME1 - NS A

root@nomad-server-1:/home/vagrant# nomad volume create vol1.hcl
Created external volume 0001-0024-71bd4b18-6cf7-11ee-99e0-776f051e8016-000000000000000f-2926129f-6e90-11ee-b829-0242ac110002 with ID testvol

root@nomad-server-1:/home/vagrant# nomad volume status -namespace="A"
Container Storage Interface
ID       Name           Namespace  Plugin ID  Schedulable  Access Mode
testvol  vol-ceph-1g-A  A          ceph-csi   true         <none>

root@ceph1:/home/ron# rbd ls nomad
csi-vol-2926129f-6e90-11ee-b829-0242ac110002

VOLUME2 - NS B

root@nomad-server-1:/home/vagrant# nomad volume create vol2.hcl
Created external volume 0001-0024-71bd4b18-6cf7-11ee-99e0-776f051e8016-000000000000000f-2fbb5f27-6e90-11ee-b829-0242ac110002 with ID testvol

root@nomad-server-1:/home/vagrant# nomad volume status -namespace="B"
Container Storage Interface
ID       Name           Namespace  Plugin ID  Schedulable  Access Mode
testvol  vol-ceph-1g-B  B          ceph-csi   true         <none>

root@ceph1:/home/ron# rbd ls nomad
csi-vol-2926129f-6e90-11ee-b829-0242ac110002
csi-vol-2fbb5f27-6e90-11ee-b829-0242ac110002

Job 1

root@nomad-server-1:/home/vagrant# nomad run job1.nomad
==> 2023-10-20T14:55:37Z: Monitoring evaluation "ad6b5ccd"
    2023-10-20T14:55:37Z: Evaluation triggered by job "busybox1"
    2023-10-20T14:55:37Z: Allocation "e07db70a" created: node "d73d5317", group "bbox1"
    2023-10-20T14:55:38Z: Evaluation within deployment: "5325ebf3"
    2023-10-20T14:55:38Z: Evaluation status changed: "pending" -> "complete"
==> 2023-10-20T14:55:38Z: Evaluation "ad6b5ccd" finished with status "complete"
==> 2023-10-20T14:55:38Z: Monitoring deployment "5325ebf3"
  ✓ Deployment "5325ebf3" successful

    2023-10-20T14:55:55Z
    ID          = 5325ebf3
    Job ID      = busybox1
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    bbox1       1        1       1        0          2023-10-20T15:05:54Z

$ nomad alloc exec -i -t -task busybox1 e07db70a sh
/ # 
/ # ls
alloc    dev      home     lib64    proc     secrets  tmp      var
bin      etc      lib      local    root     sys      usr
/ # cd alloc/
/alloc # ls
data  logs  test  tmp
/alloc # cd test/
/alloc/test # ls
lost+found
/alloc/test # touch job1
/alloc/test # 

root@nomad-client-1:/opt/nomad/data/client/csi# tree
.
├── controller
│   └── ceph-csi
├── node
│   └── ceph-csi
│       ├── per-alloc
│       │   └── e07db70a-c6ab-c65b-ae67-141fc0296938
│       │       └── testvol
│       │           └── rw-file-system-single-node-writer
│       │               ├── job1
│       │               └── lost+found
│       └── staging
│           └── testvol
│               └── rw-file-system-single-node-writer
│                   ├── 0001-0024-71bd4b18-6cf7-11ee-99e0-776f051e8016-000000000000000f-2926129f-6e90-11ee-b829-0242ac110002
│                   │   ├── job1
│                   │   └── lost+found
│                   └── image-meta.json
└── plugins
    ├── 660c73a3-ad59-f405-eaee-7ef1326eb593
    │   └── csi.sock
    └── 8a4b4786-52ba-6234-249c-f0019137291a
        └── csi.sock

17 directories, 5 files

JOB2

root@nomad-server-1:/home/vagrant# nomad run job2.nomad
==> 2023-10-20T14:57:54Z: Monitoring evaluation "d3298440"
    2023-10-20T14:57:54Z: Evaluation triggered by job "busybox2"
    2023-10-20T14:57:54Z: Allocation "4e504fd3" created: node "d73d5317", group "bbox2"
    2023-10-20T14:57:55Z: Evaluation within deployment: "a0f456e3"
    2023-10-20T14:57:55Z: Evaluation status changed: "pending" -> "complete"
==> 2023-10-20T14:57:55Z: Evaluation "d3298440" finished with status "complete"
==> 2023-10-20T14:57:55Z: Monitoring deployment "a0f456e3"
  ✓ Deployment "a0f456e3" successful

    2023-10-20T14:58:10Z
    ID          = a0f456e3
    Job ID      = busybox2
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    bbox2       1        1       1        0          2023-10-20T15:08:08Z


$ nomad alloc exec -i -t -task busybox2 4e504fd3 sh
/ # 
/ # cd /alloc/test/
/alloc/test # ls
lost+found
/alloc/test # touch job2
/alloc/test # 

root@nomad-client-1:/opt/nomad/data/client/csi# tree
.
├── controller
│   └── ceph-csi
├── node
│   └── ceph-csi
│       ├── per-alloc
│       │   ├── 4e504fd3-7205-8f54-71cd-243d2a506a7f
│       │   │   └── testvol
│       │   │       └── rw-file-system-single-node-writer
│       │   │           ├── job2
│       │   │           └── lost+found
│       │   └── e07db70a-c6ab-c65b-ae67-141fc0296938
│       │       └── testvol
│       │           └── rw-file-system-single-node-writer
│       │               ├── job1
│       │               └── lost+found
│       └── staging
│           └── testvol
│               └── rw-file-system-single-node-writer
│                   ├── 0001-0024-71bd4b18-6cf7-11ee-99e0-776f051e8016-000000000000000f-2926129f-6e90-11ee-b829-0242ac110002
│                   │   ├── job1
│                   │   └── lost+found
│                   ├── 0001-0024-71bd4b18-6cf7-11ee-99e0-776f051e8016-000000000000000f-2fbb5f27-6e90-11ee-b829-0242ac110002
│                   │   ├── job2
│                   │   └── lost+found
│                   └── image-meta.json
└── plugins
    ├── 660c73a3-ad59-f405-eaee-7ef1326eb593
    │   └── csi.sock
    └── 8a4b4786-52ba-6234-249c-f0019137291a  
        └── csi.sock

23 directories, 7 files

Same Name in Volume file

Volume 1

# vol1.hcl
id = "vol-ceph-1g-A"
name = "vol-ceph-1g"
type = "csi"
plugin_id = "ceph-csi"
capacity_max = "1G"
capacity_min = "1G"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

secrets {
  userID  = "dan"
  userKey = "AQCIayllDOWPGBAAVbfb38ng6jksv9+7DTQ6wA=="
}

parameters {
  clusterID = "2919d323-dd87-426b-b3d9-a7b2b84bd156"
  pool = "nomad"
  imageFeatures = "layering"
}

Volume 2

# vol2.hcl
id = "vol-ceph-1g-B"
name = "vol-ceph-1g"
type = "csi"
plugin_id = "ceph-csi"
capacity_max = "1G"
capacity_min = "1G"
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

secrets {
  userID  = "dan"
  userKey = "AQCIayllDOWPGBAAVbfb38ng6jksv9+7DTQ6wA=="
}

parameters {
  clusterID = "2919d323-dd87-426b-b3d9-a7b2b84bd156"
  pool = "nomad"
  imageFeatures = "layering"
}

VOLUME1 - NS A

root@nomad-client-1:/home/vagrant# ls
vol1.hcl  vol2.hcl  vol_reg.hcl
root@nomad-client-1:/home/vagrant# date; nomad volume create -namespace=A vol1.hcl
Fri 13 Oct 2023 07:40:51 PM UTC
Created external volume 0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a with ID vol-ceph-1g-A
root@nomad-client-1:/home/vagrant# date; nomad volume status -namespace=A vol-ceph-1g-A
Fri 13 Oct 2023 07:40:57 PM UTC
ID                   = vol-ceph-1g-A
Name                 = vol-ceph-1g
Namespace            = A
External ID          = 0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a
Plugin ID            = ceph-csi
Provider             = rbd.csi.ceph.com
Version              = canary
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = fs_type: ext4 flags: [REDACTED]
Namespace            = A

Allocations
No allocations placed

root@nomad-client-2:/home/vagrant# rbd ls nomad
csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a

VOLUME2 - NS B

root@nomad-client-1:/home/vagrant# date; nomad volume create -namespace=B vol2.hcl
Fri 13 Oct 2023 07:41:34 PM UTC
Created external volume 0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a with ID vol-ceph-1g-B
root@nomad-client-1:/home/vagrant# date; nomad volume status -namespace=B vol-ceph-1g-B
Fri 13 Oct 2023 07:41:42 PM UTC
ID                   = vol-ceph-1g-B
Name                 = vol-ceph-1g
Namespace            = B
External ID          = 0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a
Plugin ID            = ceph-csi
Provider             = rbd.csi.ceph.com
Version              = canary
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = fs_type: ext4 flags: [REDACTED]
Namespace            = B

Allocations
No allocations placed

root@nomad-client-2:/home/vagrant# rbd ls nomad
csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a

Controller log during volumes create

I1013 19:40:50.613909       1 utils.go:195] ID: 1813 GRPC call: /csi.v1.Identity/Probe
I1013 19:40:50.613958       1 utils.go:206] ID: 1813 GRPC request: {}
I1013 19:40:50.614018       1 utils.go:212] ID: 1813 GRPC response: {}
I1013 19:40:51.041177       1 utils.go:195] ID: 1814 Req-ID: vol-ceph-1g GRPC call: /csi.v1.Controller/CreateVolume
I1013 19:40:51.041311       1 utils.go:206] ID: 1814 Req-ID: vol-ceph-1g GRPC request: {"accessibility_requirements":{},"capacity_range":{"limit_bytes":1000000000,"required_bytes":1000000000},"name":"vol-ceph-1g","parameters":{"clusterID":"2919d323-dd87-426b-b3d9-a7b2b84bd156","imageFeatures":"layering","pool":"nomad"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["noatime"]}},"access_mode":{"mode":1}}]}
I1013 19:40:51.041402       1 rbd_util.go:1309] ID: 1814 Req-ID: vol-ceph-1g setting disableInUseChecks: false image features: [layering] mounter: rbd
I1013 19:40:51.043848       1 omap.go:88] ID: 1814 Req-ID: vol-ceph-1g got omap values: (pool="nomad", namespace="", name="csi.volumes.nomad-client-1-controller"): map[]
I1013 19:40:51.051210       1 omap.go:158] ID: 1814 Req-ID: vol-ceph-1g set omap keys (pool="nomad", namespace="", name="csi.volumes.nomad-client-1-controller"): map[csi.volume.vol-ceph-1g:c2b6f110-5924-459f-8695-4fec5d008f4a])
I1013 19:40:51.051825       1 omap.go:158] ID: 1814 Req-ID: vol-ceph-1g set omap keys (pool="nomad", namespace="", name="csi.volume.c2b6f110-5924-459f-8695-4fec5d008f4a"): map[csi.imagename:csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a csi.volname:vol-ceph-1g])
I1013 19:40:51.051839       1 rbd_journal.go:490] ID: 1814 Req-ID: vol-ceph-1g generated Volume ID (0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a) and image name (csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a) for request name (vol-ceph-1g)
I1013 19:40:51.051879       1 rbd_util.go:423] ID: 1814 Req-ID: vol-ceph-1g rbd: create nomad/csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a size 954M (features: [layering]) using mon 172.16.45.132
I1013 19:40:51.051890       1 rbd_util.go:1557] ID: 1814 Req-ID: vol-ceph-1g setting image options on nomad/csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a
I1013 19:40:51.064933       1 controllerserver.go:743] ID: 1814 Req-ID: vol-ceph-1g created image nomad/csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a backed for request name vol-ceph-1g
I1013 19:40:51.074451       1 omap.go:158] ID: 1814 Req-ID: vol-ceph-1g set omap keys (pool="nomad", namespace="", name="csi.volume.c2b6f110-5924-459f-8695-4fec5d008f4a"): map[csi.imageid:105826efcd7a])
I1013 19:40:51.074571       1 utils.go:212] ID: 1814 Req-ID: vol-ceph-1g GRPC response: {"volume":{"capacity_bytes":1000341504,"volume_context":{"clusterID":"2919d323-dd87-426b-b3d9-a7b2b84bd156","imageFeatures":"layering","imageName":"csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a","journalPool":"nomad","pool":"nomad"},"volume_id":"0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a"}}
I1013 19:40:52.500763       1 utils.go:195] ID: 1815 GRPC call: /csi.v1.Identity/Probe
I1013 19:40:52.500808       1 utils.go:206] ID: 1815 GRPC request: {}
I1013 19:40:52.500830       1 utils.go:212] ID: 1815 GRPC response: {}
I1013 19:40:52.502183       1 utils.go:195] ID: 1816 GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1013 19:40:52.502212       1 utils.go:206] ID: 1816 GRPC request: {}
I1013 19:40:52.502221       1 controllerserver-default.go:72] ID: 1816 Using default ControllerGetCapabilities
I1013 19:40:52.504936       1 utils.go:212] ID: 1816 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}}]}
I1013 19:41:20.616355       1 utils.go:195] ID: 1817 GRPC call: /csi.v1.Identity/Probe
I1013 19:41:20.616421       1 utils.go:206] ID: 1817 GRPC request: {}
I1013 19:41:20.616446       1 utils.go:212] ID: 1817 GRPC response: {}
I1013 19:41:22.509589       1 utils.go:195] ID: 1818 GRPC call: /csi.v1.Identity/Probe
I1013 19:41:22.509619       1 utils.go:206] ID: 1818 GRPC request: {}
I1013 19:41:22.509634       1 utils.go:212] ID: 1818 GRPC response: {}
I1013 19:41:22.509988       1 utils.go:195] ID: 1819 GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1013 19:41:22.510023       1 utils.go:206] ID: 1819 GRPC request: {}
I1013 19:41:22.510030       1 controllerserver-default.go:72] ID: 1819 Using default ControllerGetCapabilities
I1013 19:41:22.510102       1 utils.go:212] ID: 1819 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}}]}
I1013 19:41:34.883635       1 utils.go:195] ID: 1820 Req-ID: vol-ceph-1g GRPC call: /csi.v1.Controller/CreateVolume
I1013 19:41:34.883798       1 utils.go:206] ID: 1820 Req-ID: vol-ceph-1g GRPC request: {"accessibility_requirements":{},"capacity_range":{"limit_bytes":1000000000,"required_bytes":1000000000},"name":"vol-ceph-1g","parameters":{"clusterID":"2919d323-dd87-426b-b3d9-a7b2b84bd156","imageFeatures":"layering","pool":"nomad"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["noatime"]}},"access_mode":{"mode":1}}]}
I1013 19:41:34.883932       1 rbd_util.go:1309] ID: 1820 Req-ID: vol-ceph-1g setting disableInUseChecks: false image features: [layering] mounter: rbd
I1013 19:41:34.885933       1 omap.go:88] ID: 1820 Req-ID: vol-ceph-1g got omap values: (pool="nomad", namespace="", name="csi.volumes.nomad-client-1-controller"): map[csi.volume.vol-ceph-1g:c2b6f110-5924-459f-8695-4fec5d008f4a]
I1013 19:41:34.887524       1 omap.go:88] ID: 1820 Req-ID: vol-ceph-1g got omap values: (pool="nomad", namespace="", name="csi.volume.c2b6f110-5924-459f-8695-4fec5d008f4a"): map[csi.imageid:105826efcd7a csi.imagename:csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a csi.volname:vol-ceph-1g]
I1013 19:41:34.899661       1 rbd_journal.go:345] ID: 1820 Req-ID: vol-ceph-1g found existing volume (0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a) with image name (csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a) for request (vol-ceph-1g)
I1013 19:41:34.899820       1 utils.go:212] ID: 1820 Req-ID: vol-ceph-1g GRPC response: {"volume":{"capacity_bytes":1000341504,"volume_context":{"clusterID":"2919d323-dd87-426b-b3d9-a7b2b84bd156","imageFeatures":"layering","imageName":"csi-vol-c2b6f110-5924-459f-8695-4fec5d008f4a","journalPool":"nomad","pool":"nomad"},"volume_id":"0001-0024-2919d323-dd87-426b-b3d9-a7b2b84bd156-0000000000000002-c2b6f110-5924-459f-8695-4fec5d008f4a"}}
I1013 19:41:50.620247       1 utils.go:195] ID: 1821 GRPC call: /csi.v1.Identity/Probe
I1013 19:41:50.620376       1 utils.go:206] ID: 1821 GRPC request: {}
I1013 19:41:50.620430       1 utils.go:212] ID: 1821 GRPC response: {}
I1013 19:41:52.511307       1 utils.go:195] ID: 1822 GRPC call: /csi.v1.Identity/Probe
I1013 19:41:52.511364       1 utils.go:206] ID: 1822 GRPC request: {}
I1013 19:41:52.511387       1 utils.go:212] ID: 1822 GRPC response: {}
I1013 19:41:52.512331       1 utils.go:195] ID: 1823 GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1013 19:41:52.512352       1 utils.go:206] ID: 1823 GRPC request: {}
I1013 19:41:52.512361       1 controllerserver-default.go:72] ID: 1823 Using default ControllerGetCapabilities
I1013 19:41:52.512451       1 utils.go:212] ID: 1823 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}}]}
I1013 19:42:20.624092       1 utils.go:195] ID: 1824 GRPC call: /csi.v1.Identity/Probe
I1013 19:42:20.624139       1 utils.go:206] ID: 1824 GRPC request: {}
I1013 19:42:20.624167       1 utils.go:212] ID: 1824 GRPC response: {}
I1013 19:42:22.513491       1 utils.go:195] ID: 1825 GRPC call: /csi.v1.Identity/Probe
I1013 19:42:22.513516       1 utils.go:206] ID: 1825 GRPC request: {}
I1013 19:42:22.513529       1 utils.go:212] ID: 1825 GRPC response: {}
I1013 19:42:22.514307       1 utils.go:195] ID: 1826 GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I1013 19:42:22.514319       1 utils.go:206] ID: 1826 GRPC request: {}
I1013 19:42:22.514325       1 controllerserver-default.go:72] ID: 1826 Using default ControllerGetCapabilities
I1013 19:42:22.514382       1 utils.go:212] ID: 1826 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}}]}

Job 1 run and file created in volume

root@nomad-server-1:/home/vagrant# date; nomad run job1.nomad
Fri 13 Oct 2023 07:43:32 PM UTC
==> 2023-10-13T19:43:32Z: Monitoring evaluation "27c0ac72"
    2023-10-13T19:43:32Z: Evaluation triggered by job "mysql-busybox1"
    2023-10-13T19:43:33Z: Evaluation within deployment: "879eb23e"
    2023-10-13T19:43:33Z: Allocation "cfa1adfd" created: node "ad7dc74c", group "mysql1"
    2023-10-13T19:43:33Z: Evaluation status changed: "pending" -> "complete"
==> 2023-10-13T19:43:33Z: Evaluation "27c0ac72" finished with status "complete"
==> 2023-10-13T19:43:33Z: Monitoring deployment "879eb23e"
  ✓ Deployment "879eb23e" successful

    2023-10-13T19:43:45Z
    ID          = 879eb23e
    Job ID      = mysql-busybox1
    Job Version = 2
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    mysql1      1        1       1        0          2023-10-13T19:53:44Z

$ nomad alloc exec -i -t -task busybox1 cfa1adfd sh
/ # 
/ # ls
alloc    dev      home     lib64    proc     secrets  tmp      var
bin      etc      lib      local    root     sys      usr
/ # cd alloc/
/alloc # ls
data  logs  test  tmp
/alloc # cd test/
/alloc/test # ls
lost+found
/alloc/test # touch job1
/alloc/test # 

Job 2 run and ls of its volume

root@nomad-server-1:/home/vagrant# date; nomad run job2.nomad
Fri 13 Oct 2023 07:44:41 PM UTC
==> 2023-10-13T19:44:41Z: Monitoring evaluation "4c2d2cec"
    2023-10-13T19:44:41Z: Evaluation triggered by job "mysql-busybox2"
    2023-10-13T19:44:41Z: Allocation "82c8ac43" created: node "ad7dc74c", group "mysql2"
    2023-10-13T19:44:42Z: Evaluation within deployment: "3183da2c"
    2023-10-13T19:44:42Z: Evaluation status changed: "pending" -> "complete"
==> 2023-10-13T19:44:42Z: Evaluation "4c2d2cec" finished with status "complete"
==> 2023-10-13T19:44:42Z: Monitoring deployment "3183da2c"
  ✓ Deployment "3183da2c" successful

    2023-10-13T19:44:53Z
    ID          = 3183da2c
    Job ID      = mysql-busybox2
    Job Version = 2
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    mysql2      1        1       1        0          2023-10-13T19:54:52Z

$ nomad alloc exec -i -t -task busybox2 82c8ac43 sh
/ # 
/ # ls
alloc    dev      home     lib64    proc     secrets  tmp      var
bin      etc      lib      local    root     sys      usr
/ # cd alloc/
/alloc # ls
data  logs  test  tmp
/alloc # cd test/
/alloc/test # ls
job1        lost+found
/alloc/test # 

ceph-node_vol-same-name.log.zip

@tgross
Member

tgross commented May 3, 2024

Hi folks, just a heads up that I'm picking this up (as well as #20424). @ron-savoia, I've split out #20530 for the name field collision.

tgross added a commit that referenced this issue May 3, 2024
CSI volumes are namespaced. But the client does not include the namespace in
the staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host.

Fixes: #18741
tgross added a commit that referenced this issue May 3, 2024
CSI volumes are namespaced. But the client does not include the namespace in
the staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host.

Fixes: #18741
@tgross
Member

tgross commented May 3, 2024

Initial draft PR is up here #20532. I think the upgrade path ends up being ok, but I need to do some end-to-end testing to verify that before marking this ready for review. Will do that testing early next week.

tgross moved this from Needs Roadmapping to In Progress in Nomad - Community Issues Triage May 3, 2024
@tgross
Member

tgross commented May 7, 2024

Upgrade testing didn't go so well, and I've at least broken unstaging when clients are upgraded before servers (which isn't the recommended upgrade path, but we want to handle it gracefully). Going to do some test code rework that'll help debug this plus #20424 in finer detail.
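
For context on the mixed-version concern: once staging paths become namespaced, a client also has to be able to unstage volumes that were staged under the old, non-namespaced layout. Below is a rough sketch of one way such a fallback could work; the helper names are hypothetical and this is not necessarily how #20532 implements it:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// stagingHostPath is the new, namespaced layout; legacyStagingHostPath is the
// pre-upgrade layout without the namespace component. Both names are
// hypothetical, for illustration only.
func stagingHostPath(root, namespace, volID, usage string) string {
	return filepath.Join(root, "staging", namespace, volID, usage)
}

func legacyStagingHostPath(root, volID, usage string) string {
	return filepath.Join(root, "staging", volID, usage)
}

// pickUnstagePath prefers the namespaced path but falls back to the legacy
// path if only that one exists on disk, so volumes staged before the upgrade
// can still be unstaged afterwards.
func pickUnstagePath(root, namespace, volID, usage string) (string, error) {
	nsPath := stagingHostPath(root, namespace, volID, usage)
	if _, err := os.Stat(nsPath); err == nil {
		return nsPath, nil
	}
	legacyPath := legacyStagingHostPath(root, volID, usage)
	if _, err := os.Stat(legacyPath); err == nil {
		return legacyPath, nil
	}
	return "", fmt.Errorf("no staging path found for volume %q", volID)
}

func main() {
	root := "/var/nomad/data/client/csi/node/org.democratic-csi.nfs"
	path, err := pickUnstagePath(root, "prod", "csi-volume-nfs", "rw-file-system-single-node-writer")
	if err != nil {
		fmt.Println("nothing to unstage:", err)
		return
	}
	fmt.Println("would unstage:", path)
}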

tgross added a commit that referenced this issue May 8, 2024
CSI volumes are namespaced. But the client does not include the namespace in
the staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host.

Fixes: #18741
tgross added a commit that referenced this issue May 8, 2024
CSI volumes are namespaced. But the client does not include the namespace in
the staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host.

Fixes: #18741
@tgross
Member

tgross commented May 8, 2024

After a bit of rework I've tested the upgrade path from 1.8.0-beta.1 to the patch I've got in #20532. Looks like this should work now and I'll mark it ready for review. Test details below.


Existing behavior

First, I started with 1.8.0-beta.1 and a running allocation that consumes a CSI volume. I see the following filesystem and mounts.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 55113309-8135-fd80-112b-d9f0f2c4cc6f # consuming alloc
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer # per-alloc mount point
│       │               └── test.txt
│       └── staging
│           └── csi-volume-nfs                        # staging is not namespaced
│               └── rw-file-system-single-node-writer # staging mount point
│                   └── test.txt
└── plugins
    ├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
    │   └── csi.sock
    └── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
        └── csi.sock

14 directories, 4 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/55113309-8135-fd80-112b-d9f0f2c4cc6f/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

I then stop the job, so that we have a baseline behavior.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 55113309-8135-fd80-112b-d9f0f2c4cc6f
│       │       └── csi-volume-nfs # mount is gone
│       └── staging
│           └── csi-volume-nfs # mount is gone
└── plugins
    ├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
    │   └── csi.sock
    └── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
        └── csi.sock

12 directories, 2 files

$ mount | grep csi-volume-nfs

Note that this leaves behind the parent directories of the mount points. That's mostly harmless but not very tidy, so I've opened #20544 to follow up on fixing that.
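
For illustration only, cleanup along those lines could walk up from the unmounted mount point and remove directories that are now empty; os.Remove refuses to delete non-empty directories, which naturally stops the walk. This is just a sketch, not necessarily what #20544 ends up doing:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeEmptyParents walks up from a volume's (already unmounted) mount point
// toward stopAt, deleting directories as it goes. os.Remove only deletes empty
// directories, so the walk stops as soon as it reaches a directory still in
// use by another volume or allocation. Illustrative sketch only.
func removeEmptyParents(mountPoint, stopAt string) {
	for dir := mountPoint; dir != stopAt && dir != string(filepath.Separator); dir = filepath.Dir(dir) {
		if err := os.Remove(dir); err != nil && !os.IsNotExist(err) {
			return // non-empty or otherwise busy: leave it alone
		}
	}
}

func main() {
	root := "/var/nomad/data/client/csi/node/org.democratic-csi.nfs"
	removeEmptyParents(filepath.Join(root, "staging", "csi-volume-nfs", "rw-file-system-single-node-writer"), root)
	fmt.Println("cleanup attempted")
}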

Client Upgrade

The first test upgrades the client before the server (although this isn't our recommended approach).

First I run the job, and I see the following filesystem and mounts.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 55113309-8135-fd80-112b-d9f0f2c4cc6f # old job version
│       │   │   └── csi-volume-nfs
│       │   └── 65a7fc03-2205-7f0d-1e38-a56e39e68ac0 # new alloc
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer # mount point
│       │               └── test.txt
│       └── staging
│           └── csi-volume-nfs                        # staging is not namespaced
│               └── rw-file-system-single-node-writer # mount point
│                   └── test.txt
└── plugins
    ├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
    │   └── csi.sock
    └── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
        └── csi.sock

16 directories, 4 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/65a7fc03-2205-7f0d-1e38-a56e39e68ac0/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Then I upgrade the client from 1.8.0-beta.1 to the patch in #20532 and restart. I checked that the filesystem and mounts are unchanged after restoring the allocation (as expected).

Then I stopped the job, and see that the old mounts and paths are cleaned up just as before (the claim is also released):

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 55113309-8135-fd80-112b-d9f0f2c4cc6f
│       │   │   └── csi-volume-nfs
│       │   └── 65a7fc03-2205-7f0d-1e38-a56e39e68ac0
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
└── plugins
    ├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
    │   └── csi.sock
    └── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
        └── csi.sock

14 directories, 2 files

$ mount | grep csi-volume-nfs

Then I started the job again, and see that staging is properly namespaced.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 55113309-8135-fd80-112b-d9f0f2c4cc6f # from previous alloc
│       │   │   └── csi-volume-nfs
│       │   ├── 65a7fc03-2205-7f0d-1e38-a56e39e68ac0 # from previous alloc
│       │   │   └── csi-volume-nfs
│       │   └── a86e82ee-0d91-f48f-36ea-70c19d93fce8
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer # mount point
│       │               └── test.txt
│       └── staging
│           ├── csi-volume-nfs # from previous alloc
│           └── prod           # staging is now namespaced
│               └── csi-volume-nfs
│                   └── rw-file-system-single-node-writer # mount point
│                       └── test.txt
└── plugins
    ├── 13bf0a2a-7866-7ded-8436-2c53f1268a41
    │   └── csi.sock
    └── 60e62cfc-0bbd-19ff-8f4d-a97b8e17d5cd
        └── csi.sock

20 directories, 4 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/a86e82ee-0d91-f48f-36ea-70c19d93fce8/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Server Upgrade

Next I wiped the client and server and started over from a clean datadir and 1.8.0-beta.1. After deploying the job that consumes the volume, I have the following filesystem and mounts:

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── d6a782ee-0623-4d7e-7633-51aaf04c8286
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer
│       └── staging
│           └── csi-volume-nfs
│               └── rw-file-system-single-node-writer
└── plugins
    ├── 08ad88e1-e347-c0ed-c51b-4c4d5de9a0c5
    │   └── csi.sock
    └── 7929b8b4-9057-dc0f-bfec-f5d902501d0e
        └── csi.sock

14 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/d6a782ee-0623-4d7e-7633-51aaf04c8286/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Then I upgrade the server to the patched version and restart it. Then, just to verify that restore works, I restarted the client without upgrading. As expected, that's all good.

Next I stopped the job, and everything unmounted as expected. The volume claim was also freed.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── d6a782ee-0623-4d7e-7633-51aaf04c8286
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
└── plugins
    ├── 08ad88e1-e347-c0ed-c51b-4c4d5de9a0c5
    │   └── csi.sock
    └── 7929b8b4-9057-dc0f-bfec-f5d902501d0e
        └── csi.sock

12 directories, 2 files

$ mount | grep csi-volume-nfs

Then I re-ran the job (now with upgraded server but non-upgraded client), and see the old client behavior is still safely in place.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 531da649-bd64-b692-dbed-ea93840808b7
│       │   │   └── csi-volume-nfs
│       │   │       └── rw-file-system-single-node-writer
│       │   └── d6a782ee-0623-4d7e-7633-51aaf04c8286
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
│               └── rw-file-system-single-node-writer
└── plugins
    ├── 08ad88e1-e347-c0ed-c51b-4c4d5de9a0c5
    │   └── csi.sock
    └── 7929b8b4-9057-dc0f-bfec-f5d902501d0e
        └── csi.sock

16 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/531da649-bd64-b692-dbed-ea93840808b7/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Lastly, I upgraded the client as well. I stopped the job, and restarted the job.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 531da649-bd64-b692-dbed-ea93840808b7 # old alloc
│       │   │   └── csi-volume-nfs
│       │   ├── 92ffa2bd-9399-29ad-99b0-748e3d7a0c4d # current alloc
│       │   │   └── csi-volume-nfs
│       │   │       └── rw-file-system-single-node-writer # mount point
│       │   └── d6a782ee-0623-4d7e-7633-51aaf04c8286 # old alloc
│       │       └── csi-volume-nfs
│       └── staging
│           ├── csi-volume-nfs # old staging
│           └── prod           # staging is now namespaced
│               └── csi-volume-nfs
│                   └── rw-file-system-single-node-writer # mount point
└── plugins
    ├── 08ad88e1-e347-c0ed-c51b-4c4d5de9a0c5
    │   └── csi.sock
    └── 7929b8b4-9057-dc0f-bfec-f5d902501d0e
        └── csi.sock

20 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/92ffa2bd-9399-29ad-99b0-748e3d7a0c4d/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

@tgross
Member

tgross commented May 8, 2024

Bah, I missed that the unstage code path for the new code isn't quite working as expected. Need to fix that. The plugin returns no error in that case but it's not unstaging because the unstage path for some reason doesn't include the namespace. I've had a pass through the code but it's not yet obvious why this is missing. Will pick that up tomorrow morning.

2024-05-08T16:41:47.043-0400 [TRACE] client.alloc_runner: running post-run hook: alloc_id=92ffa2bd-9399-29ad-99b0-748e3d7a0c4d name=csi_hook start="2024-05-08 16:41:47.043329149 -0400 EDT m=+460.582850122"
2024-05-08T16:41:47.043-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs.volume_manager: unpublishing volume: alloc_id=92ffa2bd-9399-29ad-99b0-748e3d7a0c4d volume_id=csi-volume-nfs plugin_target_path=/local/csi/per-alloc/92ffa2bd-9399-29ad-99b0-748e3d7a0c4d/csi-volume-nfs/rw-file-system-single-node-writer
2024-05-08T16:41:47.053-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs: finished client unary call: grpc.code=OK duration=10.523676ms grpc.service=csi.v1.Node grpc.method=NodeUnpublishVolume
2024-05-08T16:41:47.053-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs.volume_manager: unstaging volume: alloc_id=92ffa2bd-9399-29ad-99b0-748e3d7a0c4d volume_id=csi-volume-nfs staging_path=/local/csi/staging/csi-volume-nfs/rw-file-system-single-node-writer
2024-05-08T16:41:47.058-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs: finished client unary call: grpc.code=OK duration=4.768144ms grpc.service=csi.v1.Node grpc.method=NodeUnstageVolume
2024-05-08T16:41:47.064-0400 [TRACE] client.alloc_runner: finished post-run hooks: alloc_id=92ffa2bd-9399-29ad-99b0-748e3d7a0c4d name=csi_hook end="2024-05-08 16:41:47.064857476 -0400 EDT m=+460.604378469" duration=21.528347ms
2024-05-08T16:41:47.064-0400 [TRACE] client.alloc_runner: finished post-run hooks: alloc_id=92ffa2bd-9399-29ad-99b0-748e3d7a0c4d end="2024-05-08 16:41:47.064885581 -0400 EDT m=+460.604406553" duration=173.813468ms

@tgross
Member

tgross commented May 9, 2024

Ok, the problem was that I was checking the existence of the staging path using the path inside the plugin container, which of course will never exist from the perspective of the CSI hook running on the host. After adjusting that check to use the host path, everything appears to be working as we want.
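
To make that distinction concrete, here is a small sketch (with made-up helper names) of the two views of the same staging mount: the host path under Nomad's data_dir, which is what the client should stat, versus the path inside the plugin container, which is only meaningful in the RPCs sent to the plugin:

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hostStagingPath is the staging mount as seen by the Nomad client on the
// host; pluginStagingPath is the same mount as seen inside the plugin's
// container (mounted at something like /local/csi). Helper names are made up
// for illustration.
func hostStagingPath(hostRoot, namespace, volID, usage string) string {
	return filepath.Join(hostRoot, "staging", namespace, volID, usage)
}

func pluginStagingPath(containerRoot, namespace, volID, usage string) string {
	return filepath.Join(containerRoot, "staging", namespace, volID, usage)
}

func main() {
	ns, vol, usage := "prod", "csi-volume-nfs", "rw-file-system-single-node-writer"
	host := hostStagingPath("/var/nomad/data/client/csi/node/org.democratic-csi.nfs", ns, vol, usage)
	inPlugin := pluginStagingPath("/local/csi", ns, vol, usage)

	// Existence checks must use the host path; stat-ing the container path
	// from the host always fails, which is the mistake described above.
	if _, err := os.Stat(host); err == nil {
		fmt.Println("staged; pass this path in NodeUnstageVolume:", inPlugin)
	} else {
		fmt.Println("not staged on this host:", host)
	}
}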

Mount / unmount with patched version

After running the job, our filesystem and mounts are as expected.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 51a520b1-dae0-3403-e20e-d4565cd7f68e
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer
│       └── staging
│           └── prod
│               └── csi-volume-nfs
│                   └── rw-file-system-single-node-writer
└── plugins
    ├── b97f67e5-3969-a876-75c6-0a97b6c1c150
    │   └── csi.sock
    └── e018a1d8-26b1-229b-79ed-ed0ef77f5e1f
        └── csi.sock

15 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/51a520b1-dae0-3403-e20e-d4565cd7f68e/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

I stop the job and see the mounts are cleaned up.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 51a520b1-dae0-3403-e20e-d4565cd7f68e
│       │       └── csi-volume-nfs
│       └── staging
│           └── prod
│               └── csi-volume-nfs
└── plugins
    ├── b97f67e5-3969-a876-75c6-0a97b6c1c150
    │   └── csi.sock
    └── e018a1d8-26b1-229b-79ed-ed0ef77f5e1f
        └── csi.sock

13 directories, 2 files

$ mount | grep csi-volume-nfs

Just to double-check everything is good, I grabbed the trace logs for the allocation and those look ok.

2024-05-09T09:46:34.910-0400 [TRACE] client.alloc_runner: running post-run hook: alloc_id=51a520b1-dae0-3403-e20e-d4565cd7f68e name=csi_hook start="2024-05-09 09:46:34.910896442 -0400 EDT m=+146.819772211"
2024-05-09T09:46:34.910-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs.volume_manager: unmounting volume: alloc_id=51a520b1-dae0-3403-e20e-d4565cd7f68e ns=prod volume_id=csi-volume-nfs
2024-05-09T09:46:34.910-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs.volume_manager: unpublishing volume: alloc_id=51a520b1-dae0-3403-e20e-d4565cd7f68e ns=prod volume_id=csi-volume-nfs plugin_target_path=/local/csi/per-alloc/51a520b1-dae0-3403-e20e-d4565cd7f68e/csi-volume-nfs/rw-file-system-single-node-writer
2024-05-09T09:46:34.921-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs: finished client unary call: grpc.code=OK duration=10.416262ms grpc.service=csi.v1.Node grpc.method=NodeUnpublishVolume
2024-05-09T09:46:34.921-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs.volume_manager: unstaging volume: alloc_id=51a520b1-dae0-3403-e20e-d4565cd7f68e ns=prod volume_id=csi-volume-nfs staging_path=/local/csi/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer
2024-05-09T09:46:34.949-0400 [TRACE] client.csi_manager.org.democratic-csi.nfs: finished client unary call: grpc.code=OK duration=27.758998ms grpc.service=csi.v1.Node grpc.method=NodeUnstageVolume
2024-05-09T09:46:34.956-0400 [TRACE] client.alloc_runner: finished post-run hooks: alloc_id=51a520b1-dae0-3403-e20e-d4565cd7f68e name=csi_hook end="2024-05-09 09:46:34.956975477 -0400 EDT m=+146.865851275" duration=46.079064ms

Client upgrade

Reset both hosts to 1.8.0-beta.1, start from a fresh datadir, and deploy the job.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── a8b8c6cb-3d62-de5b-930e-2ac53864ef21
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer
│       └── staging
│           └── csi-volume-nfs
│               └── rw-file-system-single-node-writer
└── plugins
    ├── 3d454d90-bb7a-2c8e-5e64-8c15cf87cef8
    │   └── csi.sock
    └── 95c8fc46-9caa-29fe-9978-267be888881e
        └── csi.sock

14 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/a8b8c6cb-3d62-de5b-930e-2ac53864ef21/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Upgrade the client to the patched version and restart it. Restoring the alloc looks good in the trace logs.

2024-05-09T09:58:04.797-0400 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=a8b8c6cb-3d62-de5b-930e-2ac53864ef21 name=csi_hook start="2024-05-09 09:58:04.797707724 -0400 EDT m=+0.208162277"
2024-05-09T09:58:04.998-0400 [DEBUG] client.alloc_runner.runner_hook.csi_hook: found CSI plugin: alloc_id=a8b8c6cb-3d62-de5b-930e-2ac53864ef21 type=csi-node name=org.democratic-csi.nfs
2024-05-09T09:58:05.015-0400 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=a8b8c6cb-3d62-de5b-930e-2ac53864ef21 name=csi_hook end="2024-05-09 09:58:05.015799926 -0400 EDT m=+0.426254597" duration=218.09232ms

Stop the job and see everything is unmounted as we'd hope.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── a8b8c6cb-3d62-de5b-930e-2ac53864ef21
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
└── plugins
    ├── 3d454d90-bb7a-2c8e-5e64-8c15cf87cef8
    │   └── csi.sock
    └── 95c8fc46-9caa-29fe-9978-267be888881e
        └── csi.sock

12 directories, 2 files

$ mount | grep csi-volume-nfs

Start the job again (still using the old server, but with the new client), and see the namespaced staging dir now.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 479e6afd-4dde-15a4-b062-de01e5dc8195
│       │   │   └── csi-volume-nfs
│       │   │       └── rw-file-system-single-node-writer
│       │   └── a8b8c6cb-3d62-de5b-930e-2ac53864ef21
│       │       └── csi-volume-nfs
│       └── staging
│           ├── csi-volume-nfs
│           └── prod
│               └── csi-volume-nfs
│                   └── rw-file-system-single-node-writer
└── plugins
    ├── 3d454d90-bb7a-2c8e-5e64-8c15cf87cef8
    │   └── csi.sock
    └── 95c8fc46-9caa-29fe-9978-267be888881e
        └── csi.sock

18 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/479e6afd-4dde-15a4-b062-de01e5dc8195/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Stop the job again, just to verify that the old server can't interfere with cleanup on a new client.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 479e6afd-4dde-15a4-b062-de01e5dc8195
│       │   │   └── csi-volume-nfs
│       │   └── a8b8c6cb-3d62-de5b-930e-2ac53864ef21
│       │       └── csi-volume-nfs
│       └── staging
│           ├── csi-volume-nfs
│           └── prod
│               └── csi-volume-nfs
└── plugins
    ├── 3d454d90-bb7a-2c8e-5e64-8c15cf87cef8
    │   └── csi.sock
    └── 95c8fc46-9caa-29fe-9978-267be888881e
        └── csi.sock

16 directories, 2 files

$ mount | grep csi-volume-nfs

Server upgrade

Reset both client and server to 1.8.0-beta.1, start from a fresh data dir, and run the job.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 7929db60-1e23-2cb4-6e06-682949d86bd9
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer
│       └── staging
│           └── csi-volume-nfs
│               └── rw-file-system-single-node-writer
└── plugins
    ├── 49da002f-29d6-1f1c-7ee1-de096822ccaa
    │   └── csi.sock
    └── b9482d40-ffda-af79-cb57-c44f8c7e6b54
        └── csi.sock

14 directories, 2 files

$  mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/7929db60-1e23-2cb4-6e06-682949d86bd9/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Upgrade the server and restart it. Then stop the job. Everything is unmounted as expected.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   └── 7929db60-1e23-2cb4-6e06-682949d86bd9
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
└── plugins
    ├── 49da002f-29d6-1f1c-7ee1-de096822ccaa
    │   └── csi.sock
    └── b9482d40-ffda-af79-cb57-c44f8c7e6b54
        └── csi.sock

12 directories, 2 files

$ mount | grep csi-volume-nfs

Run the job again (without upgrading client), just to make sure a new server can't force incorrect behavior on an old client. This looks as expected -- the bug is still in place on the client but the mount works.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 7929db60-1e23-2cb4-6e06-682949d86bd9
│       │   │   └── csi-volume-nfs
│       │   └── fd381fa6-1022-c8d7-4f43-65ba8e23c6dd
│       │       └── csi-volume-nfs
│       │           └── rw-file-system-single-node-writer
│       └── staging
│           └── csi-volume-nfs
│               └── rw-file-system-single-node-writer
└── plugins
    ├── 49da002f-29d6-1f1c-7ee1-de096822ccaa
    │   └── csi.sock
    └── b9482d40-ffda-af79-cb57-c44f8c7e6b54
        └── csi.sock

16 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/fd381fa6-1022-c8d7-4f43-65ba8e23c6dd/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

Now update the client and restart it, and stop the job. Everything is unmounted as expected.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 7929db60-1e23-2cb4-6e06-682949d86bd9
│       │   │   └── csi-volume-nfs
│       │   └── fd381fa6-1022-c8d7-4f43-65ba8e23c6dd
│       │       └── csi-volume-nfs
│       └── staging
│           └── csi-volume-nfs
└── plugins
    ├── 49da002f-29d6-1f1c-7ee1-de096822ccaa
    │   └── csi.sock
    └── b9482d40-ffda-af79-cb57-c44f8c7e6b54
        └── csi.sock

14 directories, 2 files

$ mount | grep csi-volume-nfs

Start the job again, showing that new server + new client results in namespaced staging as we'd expect.

$ sudo tree /var/nomad/data/client/csi
/var/nomad/data/client/csi
├── controller
│   └── org.democratic-csi.nfs
├── node
│   └── org.democratic-csi.nfs
│       ├── per-alloc
│       │   ├── 4ebf15be-bcf0-3c00-9b99-9ce72895f0c1
│       │   │   └── csi-volume-nfs
│       │   │       └── rw-file-system-single-node-writer
│       │   ├── 7929db60-1e23-2cb4-6e06-682949d86bd9
│       │   │   └── csi-volume-nfs
│       │   └── fd381fa6-1022-c8d7-4f43-65ba8e23c6dd
│       │       └── csi-volume-nfs
│       └── staging
│           ├── csi-volume-nfs
│           └── prod
│               └── csi-volume-nfs
│                   └── rw-file-system-single-node-writer
└── plugins
    ├── 49da002f-29d6-1f1c-7ee1-de096822ccaa
    │   └── csi.sock
    └── b9482d40-ffda-af79-cb57-c44f8c7e6b54
        └── csi.sock

20 directories, 2 files

$ mount | grep csi-volume-nfs
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/staging/prod/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)
192.168.1.170:/srv/nfs_data/v/csi-volume-nfs on /var/nomad/data/client/csi/node/org.democratic-csi.nfs/per-alloc/4ebf15be-bcf0-3c00-9b99-9ce72895f0c1/csi-volume-nfs/rw-file-system-single-node-writer type nfs4 (rw,noatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.37.105.17,local_lock=none,addr=192.168.1.170)

tgross added a commit that referenced this issue May 9, 2024
CSI volumes are namespaced. But the client does not include the namespace in
the staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host.

Fixes: #18741
tgross added this to the 1.8.0 milestone May 9, 2024
tgross added a commit that referenced this issue May 13, 2024
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.

Fixes: #18741
Nomad - Community Issues Triage automation moved this from In Progress to Done May 13, 2024
tgross added a commit that referenced this issue May 13, 2024
CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.

Fixes: #18741
tgross added a commit that referenced this issue May 13, 2024
…e/1.6.x (#20572)

CSI volumes are namespaced. But the client does not include the namespace in the
staging mount path. This causes CSI volumes with the same volume ID but
different namespace to collide if they happen to be placed on the same host. The
per-allocation paths don't need to be namespaced, because an allocation can only
mount volumes from its job's own namespace.

Rework the CSI hook tests to have more fine-grained control over the mock
on-disk state. Add tests covering upgrades from staging paths missing
namespaces.

Fixes: #18741

Co-authored-by: Tim Gross <tgross@hashicorp.com>
@tgross
Member

tgross commented May 16, 2024

#20532 has been merged and will ship in Nomad 1.8.0 (with backports to supported versions)
