mounts fail for containerd-driver #13229

Closed
alemonmk opened this issue Jun 6, 2022 · 19 comments

Comments

alemonmk commented Jun 6, 2022

Nomad version

Nomad v1.3.1 (2b054e3)

Operating system and Environment details

Vanilla Vultr Debian 11 amd64 image
nomad-driver-containerd 0.9.3
containerd 1.4.13~ds1-1~deb11u1
vultr-csi v0.7.0

Issue

I'm trying to run a PostgreSQL container with Nomad and use Vultr Block Storage for persistence.

After running the CSI driver and getting a healthy plugin, I added vultr_block_storage and nomad_volume resources via Terraform, and the volume showed up in the UI. Then I ran a PostgreSQL job with volume and volume_mount stanzas for the block storage volume, but the containerd-driver always fails with mounting "<Path to CSI mounted volume>" to rootfs at "<mount destination>" caused: stat <Path to CSI mounted volume>: no such file or directory.

The mounting and unmounting happen within 5 seconds, so I'm unable to confirm whether everything is set up correctly. I ran the example busybox job from the vultr-csi repository and it fails too. Checking the logs from the CSI node containers, the volume should've been mounted correctly, and the <Path to CSI mounted volume> provided by Nomad to containerd seems to be correct. I have a few templates that will be mounted into the volume, but I don't think they would cause any problem.

I don't know where to file this issue, so I guess I can start here?

Reproduction steps

  1. Create a Vultr Block Storage volume
  2. Run a job with the volume mounted

Expected Result

Task starts.

Actual Result

PS B:\iac\vps\3-nomad> nomad alloc status 57c10b92
ID                  = 57c10b92-3e7a-bee9-8926-6e00d6800957
Eval ID             = f5ef2cdd
Name                = postgres.app[0]
Node ID             = e465bda6
Node Name           = <Redacted>
Job ID              = postgres
Job Version         = 0
Client Status       = failed
Client Description  = Failed tasks
Desired Status      = run
Desired Description = <none>
Created             = 19m58s ago
Modified            = 19m50s ago
Deployment ID       = 11648405
Deployment Health   = unhealthy

Allocation Addresses
Label      Dynamic  Address
*postgres  yes      <Redacted>:5432

Task "server" is "dead"
Task Resources
CPU       Memory    Disk     Addresses
1996 MHz  1000 MiB  300 MiB

CSI Volumes:
ID    Read Only
data  false

Task Events:
Started At     = N/A
Finished At    = 2022-06-06T10:16:30Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type             Description
2022-06-06T18:16:32+08:00  Killing          Sent interrupt. Waiting 5s before force killing
2022-06-06T18:16:30+08:00  Alloc Unhealthy  Unhealthy because of failed task
2022-06-06T18:16:30+08:00  Not Restarting   Error was unrecoverable
2022-06-06T18:16:30+08:00  Driver Failure   rpc error: code = Unknown desc = Error in creating task: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:76: mounting "/opt/nomad/data/client/csi/node/csi-vultr-jp/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer" to rootfs at "/bitnami/postgresql" caused: stat /opt/nomad/data/client/csi/node/csi-vultr-jp/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer: no such file or directory: unknown
2022-06-06T18:16:28+08:00  Task Setup       Building Task Directory
2022-06-06T18:16:24+08:00  Received         Task received by client

Log from Vultr CSI driver:

time="2022-06-06T10:16:28Z" level=info msg="Node Stage Volume: called" capacity="mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > " host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT target=/local/csi/staging/pg_store/rw-file-system-single-node-writer version=0.7.0 volume=3e698801-6841-4bde-b346-3e8f2d5b5ce4
time="2022-06-06T10:16:28Z" level=info msg="isFormatted called" format-args="[/dev/disk/by-id/virtio-nrt-3e69880168414b]" format-command=blkid host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="isFormatted end" format-output="[47 100 101 118 47 100 105 115 107 47 98 121 45 105 100 47 118 105 114 116 105 111 45 110 114 116 45 51 101 54 57 56 56 48 49 54 56 52 49 52 98 58 32 85 85 73 68 61 34 49 49 50 52 100 52 55 53 45 101 48 48 98 45 52 57 101 98 45 56 97 51 99 45 102 99 54 100 54 51 101 101 56 102 51 99 34 32 84 89 80 69 61 34 101 120 116 52 34 10]" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="Mount Called" filesystem=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c methods=mount options="[]" region=NRT source=/dev/disk/by-id/virtio-nrt-3e69880168414b target=/local/csi/staging/pg_store/rw-file-system-single-node-writer version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="mount command and arguments" host_id=f593da5b-08b2-47a6-a802-28128d16130c mount arguments="[-t ext4 /dev/disk/by-id/virtio-nrt-3e69880168414b /local/csi/staging/pg_store/rw-file-system-single-node-writer]" mount command=mount region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="mount command - removal of lost+found error" error="stat /local/csi/staging/pg_store/rw-file-system-single-node-writer/lost+found: no such file or directory" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="Node Stage Volume: volume staged" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodeStageVolume GRPC.request="volume_id:\"3e698801-6841-4bde-b346-3e8f2d5b5ce4\" publish_context:<key:\"\" value:\"nrt-3e69880168414b\" > staging_target_path:\"/local/csi/staging/pg_store/rw-file-system-single-node-writer\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > "
time="2022-06-06T10:16:28Z" level=info msg="Node Publish Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT staging_target_path=/local/csi/staging/pg_store/rw-file-system-single-node-writer target_path=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0 volume_id=3e698801-6841-4bde-b346-3e8f2d5b5ce4
time="2022-06-06T10:16:28Z" level=info msg="Mount Called" filesystem=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c methods=mount options="[bind]" region=NRT source=/local/csi/staging/pg_store/rw-file-system-single-node-writer target=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="mount command and arguments" host_id=f593da5b-08b2-47a6-a802-28128d16130c mount arguments="[-t ext4 -o bind /local/csi/staging/pg_store/rw-file-system-single-node-writer /local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer]" mount command=mount region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="mount command - removal of lost+found error" error="stat /local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer/lost+found: no such file or directory" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="Node Publish Volume: published" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:28Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodePublishVolume GRPC.request="volume_id:\"3e698801-6841-4bde-b346-3e8f2d5b5ce4\" publish_context:<key:\"\" value:\"nrt-3e69880168414b\" > staging_target_path:\"/local/csi/staging/pg_store/rw-file-system-single-node-writer\" target_path:\"/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > "
time="2022-06-06T10:16:32Z" level=info msg="Node Unpublish Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT target-path=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0 volume-id=3e698801-6841-4bde-b346-3e8f2d5b5ce4
time="2022-06-06T10:16:32Z" level=info msg="executing umount command" args="[/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer]" cmd=umount host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:32Z" level=info msg="Node Publish Volume: unpublished" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:32Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodeUnpublishVolume GRPC.request="volume_id:\"3e698801-6841-4bde-b346-3e8f2d5b5ce4\" target_path:\"/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer\" "
time="2022-06-06T10:16:32Z" level=info msg="Node Unstage Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT staging-target-path=/local/csi/staging/pg_store/rw-file-system-single-node-writer version=0.7.0 volume-id=3e698801-6841-4bde-b346-3e8f2d5b5ce4
time="2022-06-06T10:16:32Z" level=info msg="executing umount command" args="[/local/csi/staging/pg_store/rw-file-system-single-node-writer]" cmd=umount host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-06T10:16:32Z" level=info msg="Node Unstage Volume: volume unstaged" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0

Also, after this failed task, the access mode and attachment mode are cleared from the volume for some reason:

PS B:\iac\vps\3-nomad> nomad volume status pg_store
ID                   = pg_store
Name                 = pg_store
External ID          = 3e698801-6841-4bde-b346-3e8f2d5b5ce4
Plugin ID            = csi-vultr-jp
Provider             = block.csi.vultr.com
Version              = 0.7.0
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 4
Nodes Expected       = 4
Access Mode          = <none>
Attachment Mode      = <none>
Mount Options        = fs_type: ext4
Namespace            = default

Allocations
No allocations placed

Job file (if appropriate)

job "postgres" {
  region = "jp"

  datacenters = ["cloud-prod-2"]

  type = "service"

  group "app" {
    network {
      port "postgres" {
        static       = 5432
        host_network = "private"
      }
    }

    volume "data" {
      type            = "csi"
      source          = "${volume_id}"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    service {
      name = "postgres"
      port = "postgres"
    }

    task "server" {
      driver = "containerd-driver"

      config {
        image        = "bitnami/postgresql:13.7.0-debian-10-r21"
        host_network = true
        mounts = [
          {
            type    = "bind"
            source  = "local/conf"
            target  = "/bitnami/postgresql/conf"
            options = ["rbind", "ro"]
          }
        ]
      }

      volume_mount {
        volume      = "data"
        destination = "/bitnami/postgresql"
      }

      template {
        data        = <<-EOT
        POSTGRESQL_PASSWORD={{ key "config/postgres/passwords/su" }}
        POSTGRESQL_INITDB_ARGS="--data-checksums"
        EOT
        destination = "secrets/env"
        env         = true
      }

      template {
        data        = <<-EOT
        local all postgres peer
        host all postgres 0.0.0.0/0 scram-sha-256
        host sameuser all 0.0.0.0/0 scram-sha-256
        EOT
        destination = "local/conf/pg_hba.conf"
      }

      template {
        data        = <<-EOT
        listen_address = '{{ env "NOMAD_IP_postgres" }}'
        port = {{ env "NOMAD_PORT_postgres" }}
        password_encryption = scram-sha-256
        ssl = off
        EOT
        destination = "local/conf/conf.d/extra.conf"
      }

      resources {
        cores      = 1
        memory     = 1000
        memory_max = 1500
      }

      logs {
        max_files     = 1
        max_file_size = 5
      }
    }
  }
}

Nomad Server/Client logs (if appropriate)

Nothing fruitful.

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Jun 6, 2022

tgross commented Jun 6, 2022

Hi @alemonmk!

Thanks for providing all the CSI plugin's allocation logs, as those will help narrow down the problem. When Nomad asks the CSI plugin for a volume, it does a "Node Stage Volume" RPC (which succeeds in your logs), followed by a "Node Publish Volume" RPC. Most plugins literally just call mount(8) and it looks like this one does too:

time="2022-06-06T10:16:28Z" level=info msg="Node Publish Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT staging_target_path=/local/csi/staging/pg_store/rw-file-system-single-node-writer target_path=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0 volume_id=3e698801-6841-4bde-b346-3e8f2d5b5ce4

time="2022-06-06T10:16:28Z" level=info msg="Mount Called" filesystem=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c methods=mount options="[bind]" region=NRT source=/local/csi/staging/pg_store/rw-file-system-single-node-writer target=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0

time="2022-06-06T10:16:28Z" level=info msg="mount command and arguments" host_id=f593da5b-08b2-47a6-a802-28128d16130c mount arguments="[-t ext4 -o bind /local/csi/staging/pg_store/rw-file-system-single-node-writer /local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer]" mount command=mount region=NRT version=0.7.0

time="2022-06-06T10:16:28Z" level=info msg="mount command - removal of lost+found error" error="stat /local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer/lost+found: no such file or directory" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0

I don't understand what it's trying to do with this "lost+found" entity. That's coming from the plugin, not Nomad. It looks like it was added in vultr/vultr-csi@774479d to driver/mounter.go#L152-L158.

However, that's just a spurious warning and the plugin goes on to tell us it's published the volume successfully:

time="2022-06-06T10:16:28Z" level=info msg="Node Publish Volume: published" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0

time="2022-06-06T10:16:28Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodePublishVolume GRPC.request="volume_id:"3e698801-6841-4bde-b346-3e8f2d5b5ce4" publish_context:<key:"" value:"nrt-3e69880168414b" > staging_target_path:"/local/csi/staging/pg_store/rw-file-system-single-node-writer" target_path:"/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > "

But then 4 seconds later we see an "Unpublish Volume" RPC come in, which means Nomad said "nope, this isn't going to work" and unpublished it.

time="2022-06-06T10:16:32Z" level=info msg="Node Unpublish Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT target-path=/local/csi/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer version=0.7.0 volume-id=3e698801-6841-4bde-b346-3e8f2d5b5ce4

Once we've hit this point, we set some state for the allocation (ref csi_hook.go#L111-L113) and then pass that state to the task driver in the volume_hook. The failure is bubbling up from containerd itself:

2022-06-06T18:16:30+08:00 Driver Failure rpc error: code = Unknown desc = Error in creating task: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:76: mounting "/opt/nomad/data/client/csi/node/csi-vultr-jp/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer" to rootfs at "/bitnami/postgresql" caused: stat /opt/nomad/data/client/csi/node/csi-vultr-jp/per-alloc/57c10b92-3e7a-bee9-8926-6e00d6800957/pg_store/rw-file-system-single-node-writer: no such file or directory: unknown

Note that the Nomad datadir path we see there is mapped to /local/csi inside the plugin.

So somewhere between the volume being mounted and us exposing it to the task, we're dropping the state. It could also be a problem with the third-party containerd task driver. In any case, the next thing to do here is to look at the debug-level logs on the client. Can you share those from the time the allocation is placed to when it fails?
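
For reference, a minimal agent-config sketch for capturing those debug-level client logs, assuming you can restart the agent (data_dir here just matches the path seen in this thread):

# Raise the agent's log level so client-side allocation and CSI hook messages show up.
log_level = "DEBUG"

data_dir = "/opt/nomad/data"

client {
  enabled = true
}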


Aside:

Also after this failed task, Access mode and attachment mode are cleared from the volume for some reason:

Yes, that's the correct behavior! The access mode and attachment mode are set on the volume only once it's actually attached. We're not attached because the allocation failed.

@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Jun 6, 2022
@tgross tgross self-assigned this Jun 6, 2022

alemonmk commented Jun 7, 2022

So somewhere between the volume being mounted and us exposing it to the task, we're dropping the state. It could also be a problem with the third-party containerd task driver. In any case, the next thing to do here is to look at the debug-level logs on the client. Can you share those from the time the allocation is placed to when it fails?

I hope you are actually talking about server logs, because there's literally nothing about allocations in the client log, even at trace level.

It probably does not matter much anymore, because nothing can be mounted unless it's one of the default mounts or a host volume:
Job:

job "dcim" {
  region = "jp"

  datacenters = ["cloud-prod-2"]

  type = "service"

  constraint {
    attribute = "${node.unique.name}"
    value     = "node1"
  }

  group "app" {
    network {
      port "http" {
        static       = 3010
        host_network = "private"
      }
    }

    service {
      name = "netbox-http"
      port = "http"

      check {
        type     = "http"
        path     = "/login/"
        interval = "30s"
        timeout  = "3s"
        header {
          Host = ["REDACTED"]
        }

        check_restart {
          limit = 1
          grace = "60s"
        }
      }
    }

    task "netbox" {
      driver = "containerd-driver"

      env {
        SKIP_SUPERUSER       = "true"
        SKIP_STARTUP_SCRIPTS = "true"
      }

      config {
        image        = "netboxcommunity/netbox:v3.2.3"
        cap_drop     = ["all"]
        host_network = true
        mounts = [
          {
            type    = "bind"
            source  = "secrets/extra.py"
            target  = "/opt/netbox/netbox/config/extra.py"
            options = ["rbind", "ro"]
          },
          {
            type   = "tmpfs"
            target = "/opt/unit"
          }
        ]
      }

      template {
        data        = <<-EOT
<< snip >>
        EOT
        destination = "secrets/extra.py"
      }

      resources {
        cpu    = 500
        memory = 512
      }

      logs {
        max_files     = 1
        max_file_size = 5
      }
    }
  }
}

Server log:

2022-06-07T19:23:24.059+0900 [INFO]  agent: (runner) creating new runner (dry: false, once: false)
2022-06-07T19:23:24.059+0900 [DEBUG] agent: (runner) final config: { REDACTED }
2022-06-07T19:23:24.059+0900 [INFO]  agent: (runner) creating watcher
2022-06-07T19:23:24.060+0900 [INFO]  agent: (runner) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) running initial templates
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) initiating run
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) checking template 7be357fb473e0b4b3fc43b7ed5a4529f
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing data for 5 dependencies
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing dependency: kv.block(config/netbox/passwords/postgres)
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing dependency: kv.block(config/netbox/passwords/redis)
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing dependency: kv.block(config/netbox/passwords/random)
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing dependency: kv.block(config/netbox/passwords/azuread/key)
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) missing dependency: kv.block(config/netbox/passwords/azuread/secret)
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) add used dependency kv.block(config/netbox/passwords/postgres) to missing since isLeader but do not have a watcher
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) add used dependency kv.block(config/netbox/passwords/redis) to missing since isLeader but do not have a watcher
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) add used dependency kv.block(config/netbox/passwords/random) to missing since isLeader but do not have a watcher
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) add used dependency kv.block(config/netbox/passwords/azuread/key) to missing since isLeader but do not have a watcher
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) add used dependency kv.block(config/netbox/passwords/azuread/secret) to missing since isLeader but do not have a watcher
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) was not watching 5 dependencies
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (watcher) adding kv.block(config/netbox/passwords/postgres)
2022-06-07T19:23:24.060+0900 [TRACE] agent: (watcher) kv.block(config/netbox/passwords/postgres) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (watcher) adding kv.block(config/netbox/passwords/redis)
2022-06-07T19:23:24.060+0900 [TRACE] agent: (watcher) kv.block(config/netbox/passwords/redis) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (watcher) adding kv.block(config/netbox/passwords/random)
2022-06-07T19:23:24.060+0900 [TRACE] agent: (watcher) kv.block(config/netbox/passwords/random) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (watcher) adding kv.block(config/netbox/passwords/azuread/key)
2022-06-07T19:23:24.060+0900 [TRACE] agent: (watcher) kv.block(config/netbox/passwords/azuread/key) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (watcher) adding kv.block(config/netbox/passwords/azuread/secret)
2022-06-07T19:23:24.060+0900 [TRACE] agent: (watcher) kv.block(config/netbox/passwords/azuread/secret) starting
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) diffing and updating dependencies
2022-06-07T19:23:24.060+0900 [DEBUG] agent: (runner) watching 5 dependencies
2022-06-07T19:23:24.060+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/random) starting fetch
2022-06-07T19:23:24.060+0900 [TRACE] agent: kv.block(config/netbox/passwords/random): GET /v1/kv/config/netbox/passwords/random?stale=true&wait=1m0s
2022-06-07T19:23:24.060+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/secret) starting fetch
2022-06-07T19:23:24.060+0900 [TRACE] agent: kv.block(config/netbox/passwords/azuread/secret): GET /v1/kv/config/netbox/passwords/azuread/secret?stale=true&wait=1m0s
2022-06-07T19:23:24.060+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/redis) starting fetch
2022-06-07T19:23:24.060+0900 [TRACE] agent: kv.block(config/netbox/passwords/redis): GET /v1/kv/config/netbox/passwords/redis?stale=true&wait=1m0s
2022-06-07T19:23:24.060+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/key) starting fetch
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/postgres) starting fetch
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/postgres): GET /v1/kv/config/netbox/passwords/postgres?stale=true&wait=1m0s
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/redis): returned "REDACTED"
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/redis) marking successful data response
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/redis) successful contact, resetting retries
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/azuread/secret): returned "REDACTED"
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/secret) marking successful data response
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/secret) successful contact, resetting retries
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/random): returned "REDACTED"
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/random) marking successful data response
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/random) successful contact, resetting retries
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/azuread/key): GET /v1/kv/config/netbox/passwords/azuread/key?stale=true&wait=1m0s
2022-06-07T19:23:24.061+0900 [TRACE] agent: kv.block(config/netbox/passwords/postgres): returned "REDACTED"
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/postgres) marking successful data response
2022-06-07T19:23:24.061+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/postgres) successful contact, resetting retries
2022-06-07T19:23:24.062+0900 [TRACE] agent: kv.block(config/netbox/passwords/azuread/key): returned "REDACTED"
2022-06-07T19:23:24.062+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/key) marking successful data response
2022-06-07T19:23:24.062+0900 [TRACE] agent: (view) kv.block(config/netbox/passwords/azuread/key) successful contact, resetting retries
2022-06-07T19:23:24.181+0900 [DEBUG] agent: (runner) initiating run
2022-06-07T19:23:24.181+0900 [DEBUG] agent: (runner) checking template 7be357fb473e0b4b3fc43b7ed5a4529f
2022-06-07T19:23:24.181+0900 [DEBUG] agent: (runner) rendering "(dynamic)" => "/opt/nomad/data/alloc/8929fc2c-2523-4560-4678-4f278cc65ff5/netbox/secrets/extra.py"
2022-06-07T19:23:24.182+0900 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/8929fc2c-2523-4560-4678-4f278cc65ff5/netbox/secrets/extra.py"
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) diffing and updating dependencies
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) kv.block(config/netbox/passwords/azuread/secret) is still needed
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) kv.block(config/netbox/passwords/postgres) is still needed
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) kv.block(config/netbox/passwords/redis) is still needed
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) kv.block(config/netbox/passwords/random) is still needed
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) kv.block(config/netbox/passwords/azuread/key) is still needed
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) watching 5 dependencies
2022-06-07T19:23:24.182+0900 [DEBUG] agent: (runner) all templates rendered
2022-06-07T19:23:24.184+0900 [TRACE] client.alloc_runner.task_runner: waiting for cgroup to exist for: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox allocID=8929fc2c-2523-4560-4678-4f278cc65ff5 task="&{netbox containerd-driver  map[cap_drop:[all] host_network:true image:netboxcommunity/netbox:v3.2.3 mounts:[map[options:[rbind ro] source:secrets/extra.py target:/opt/netbox/netbox/config/extra.py type:bind] map[target:/opt/unit type:tmpfs]]] map[SKIP_STARTUP_SCRIPTS:true SKIP_SUPERUSER:true] [] <nil> [0xc001810b40] [] [] 0xc0000200c0 0xc001e2fd40 <nil> <nil> map[] 5s 0xc002d787f0 [] false 0s [] []   <nil>}"
2022-06-07T19:23:24.185+0900 [INFO]  client.driver_mgr.containerd-driver: starting task: driver=containerd-driver @module=containerd-driver driver_cfg="{Image:netboxcommunity/netbox:v3.2.3 Command: Args:[] CapAdd:[] CapDrop:[all] Cwd: Devices:[] Seccomp:false SeccompProfile: ShmSize: Sysctl:map[] Privileged:false PidsLimit:0 PidMode: Hostname: HostDNS:true ImagePullTimeout:5m ExtraHosts:[] Entrypoint:[] ReadOnlyRootfs:false HostNetwork:true Auth:{Username: Password:} Mounts:[{Type:bind Target:/opt/netbox/netbox/config/extra.py Source:secrets/extra.py Options:[rbind ro]} {Type:tmpfs Target:/opt/unit Source: Options:[]}]}" timestamp="2022-06-07T19:23:24.185+0900"
2022-06-07T19:23:24.194+0900 [TRACE] client: next heartbeat: period=14.864207112s
2022-06-07T19:23:24.194+0900 [DEBUG] client: updated allocations: index=2672 total=14 pulled=10 filtered=4
2022-06-07T19:23:24.194+0900 [DEBUG] client: allocation updates: added=0 removed=0 updated=10 ignored=4
2022-06-07T19:23:24.196+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=e9f3e280-402c-464b-065f-da0640f9bca4 modify_index=2548
2022-06-07T19:23:24.196+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=bf940d18-edea-d982-4feb-ee2b7d062983 modify_index=2626
2022-06-07T19:23:24.197+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=730096ea-34f0-3d7d-b8ad-74ccf486d7d0 modify_index=2670
2022-06-07T19:23:24.198+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=f408b799-93c4-1901-82f6-f8b5f10f76fe modify_index=2600
2022-06-07T19:23:24.198+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=c08df3b8-6bc1-2442-9e6b-3419efa4c2bf modify_index=2637
2022-06-07T19:23:24.199+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=e0211280-1cc7-0c11-c509-8c823777e2ea modify_index=2615
2022-06-07T19:23:24.199+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=34f5f8c0-40e5-e960-0fdd-8965a0696831 modify_index=2523
2022-06-07T19:23:24.200+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=943ab85f-0921-e8d0-cdc4-e2944ff044cb modify_index=2536
2022-06-07T19:23:24.201+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=7911106b-f333-1be8-07b5-faa498ab706d modify_index=2563
2022-06-07T19:23:24.201+0900 [TRACE] client.alloc_runner: AllocRunner has terminated, skipping alloc update: alloc_id=849709e0-92b1-0d17-bd10-6e2e62bb9312 modify_index=2651
2022-06-07T19:23:24.201+0900 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=10 ignored=4 errors=0
2022-06-07T19:23:24.254+0900 [DEBUG] nomad: memberlist: Stream connection from=10.85.183.1:38700
2022-06-07T19:23:24.763+0900 [DEBUG] http: request complete: method=GET path=/v1/evaluation/7bf0d478-e783-e693-00f6-d05d0e4910bd duration="510.224µs"
2022-06-07T19:23:24.817+0900 [DEBUG] http: request complete: method=GET path=/v1/evaluation/7bf0d478-e783-e693-00f6-d05d0e4910bd/allocations duration="467.375µs"
2022-06-07T19:23:24.873+0900 [DEBUG] http: request complete: method=GET path="/v1/deployment/08f9eca3-08c2-088b-1754-da2852127ff5?stale=&wait=2000ms" duration="331.14µs"
2022-06-07T19:23:26.002+0900 [INFO]  client.driver_mgr.containerd-driver: Successfully pulled docker.io/netboxcommunity/netbox:v3.2.3 image
: driver=containerd-driver @module=containerd-driver timestamp="2022-06-07T19:23:26.001+0900"
2022-06-07T19:23:26.071+0900 [INFO]  client.driver_mgr.containerd-driver: Successfully created container with name: netbox-8929fc2c-2523-4560-4678-4f278cc65ff5
: driver=containerd-driver @module=containerd-driver timestamp="2022-06-07T19:23:26.071+0900"
2022-06-07T19:23:26.174+0900 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox error="rpc error: code = Unknown desc = Error in creating task: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:76: mounting \"/run/containerd/io.containerd.runtime.v2.task/nomad/netbox-8929fc2c-2523-4560-4678-4f278cc65ff5/secrets/extra.py\" to rootfs at \"/opt/netbox/netbox/config/extra.py\" caused: stat /run/containerd/io.containerd.runtime.v2.task/nomad/netbox-8929fc2c-2523-4560-4678-4f278cc65ff5/secrets/extra.py: no such file or directory: unknown"
2022-06-07T19:23:26.174+0900 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox reason="Error was unrecoverable"
2022-06-07T19:23:26.174+0900 [TRACE] client.alloc_runner.task_runner: setting task state: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox state=dead event="Not Restarting"
2022-06-07T19:23:26.175+0900 [TRACE] client.alloc_runner: handling task state update: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 done=false
2022-06-07T19:23:26.175+0900 [TRACE] client.alloc_runner.task_runner: setting task state: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=redis state=running event=Started
2022-06-07T19:23:26.178+0900 [DEBUG] client.driver_mgr.containerd-driver: Overriding client stats interval: 1s with driver stats interval: 5s
: driver=containerd-driver @module=containerd-driver timestamp="2022-06-07T19:23:26.178+0900"
2022-06-07T19:23:26.178+0900 [DEBUG] client.alloc_runner: task failure, destroying all tasks: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 failed_task=netbox
2022-06-07T19:23:26.179+0900 [TRACE] consul.sync: commit sync operations: ops="<0, 0, 1, 1>"
2022-06-07T19:23:26.179+0900 [TRACE] client.alloc_runner.task_runner: Kill requested: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox event_type=Killing event_reason=""
2022-06-07T19:23:26.180+0900 [TRACE] consul.sync: execute sync: reason=operations

Wait, what's with that mount source path? At the very least it should be in netbox-.../rootfs/secrets/extra.py ...
If I use /secrets/extra.py or ${NOMAD_SECRETS_DIR}/extra.py it probably resolves to the host's root, which also fails.

As far as I can read the code, paths go into the syscall pretty much as-is, so I don't even know who did anything wrong here.

tgross commented Jun 7, 2022

I hope you are actually talking about server logs because there's literally nothing about allocations in client log, even on trace level.

Nope! Nomad servers are the control plane running raft. Nomad clients are the nodes where workloads are run. This is admittedly a little confusing with the Nomad CLI also being a "client" of sorts. But this is probably made more confusing because you're running the server and the client in the same agent, which we strongly recommend against doing for production.

Fortunately, the client logs are included in what you've provided. 😀

Here's the relevant log line:

2022-06-07T19:23:26.174+0900 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=8929fc2c-2523-4560-4678-4f278cc65ff5 task=netbox error="rpc error: code = Unknown desc = Error in creating task: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:76: mounting "/run/containerd/io.containerd.runtime.v2.task/nomad/netbox-8929fc2c-2523-4560-4678-4f278cc65ff5/secrets/extra.py" to rootfs at "/opt/netbox/netbox/config/extra.py" caused: stat /run/containerd/io.containerd.runtime.v2.task/nomad/netbox-8929fc2c-2523-4560-4678-4f278cc65ff5/secrets/extra.py: no such file or directory: unknown"

As you've noted, this looks like a general problem with mounts and not specifically a CSI issue. I've retitled the issue for clarity.

Wait, what's with that mount source path? At least it has to be in netbox-.../rootfs/secrets/extra.py ...
If I use /secrets/extra.py or ${NOMAD_SECRETS_DIR}\extra.py it probably goes to host's root, which also fails.

Yes, that looks very strange to me. I would expect the path shown to be the absolute path from the Nomad data directory for the task, but this path looks almost as if you were running Nomad as a containerd container. If that's the case, you're "off the edge of the map" and although we'd love to read a write-up of your adventure to get that working once you've done so... it's an entirely unsupported installation.

If that's not the case, it looks like there might be an issue with your containerd driver configuration. If you can provide your containerd driver configuration, that would be very helpful. The driver is third-party, so I'm also going to tag our good friend @shishir-a412ed over at Roblox, who is the primary author of the driver. He might have some insights here as well.

@tgross tgross changed the title Running job with containerd-driver and Vultr CSI volume always fails mounts fail for containerd-driver Jun 7, 2022

alemonmk commented Jun 7, 2022

Nope! Nomad servers are the control plane running raft. Nomad clients are the nodes where workloads are run. This is admittedly a little confusing with the Nomad CLI also being a "client" of sorts. But this is probably made more confusing because you're running the server and the client in the same agent, which we strongly recommend against doing for production.

This log comes from Cluster -> Server -> <node> -> Monitor in the UI. I'm aware it's recommended against, though with a limited budget and non-critical workloads it's probably fine.

running Nomad as a containerd container

That's not the case; the bootstrapping effort would be off the charts, and admittedly I'm no expert in container tech.
The containerd driver config is simply:

plugin "containerd-driver" {
    config {
        enabled = true
        containerd_runtime = "io.containerd.runc.v2"
        stats_interval = "5s"
        allow_privileged = true
    }
}

alemonmk commented Jun 8, 2022

For the record, I changed the netbox job above to use the docker driver and the mount works.

tgross commented Jun 8, 2022

Thanks @alemonmk. We might need to open an issue with the driver authors here, but let me see if I can make a minimal reproduction first.

tgross commented Jun 8, 2022

I was able to reproduce this issue in my local Vagrant environment, and it looks like the problem is that the containerd driver expects the mount source to be a path on the host, not a path inside the task directory.

As it turns out, this behavior is exactly what's documented for the containerd driver (ref source):

source - (Optional) Source path on the host.

This is different from other task drivers, which want to isolate the host's filesystem and so end up needing a volumes { enabled = true } flag to allow host access. So it looks like everything is working just as intended, and the task driver just doesn't do what you'd expect.
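
For contrast, a sketch of that gate on the docker driver side (client plugin config; not needed for containerd-driver, which takes host paths by default):

plugin "docker" {
  config {
    # without this, host bind mounts via the jobspec volumes option are refused
    volumes {
      enabled = true
    }
  }
}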

To reproduce, first I reduced the jobspec you provided down to a minimal spec that demonstrates the problem:

jobspec
job "example" {
  datacenters = ["dc1"]

  group "app" {

    task "netbox" {
      driver = "containerd-driver"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-v", "-f", "-p", "8001", "-h", "/local"]

        mounts = [
          {
            type    = "bind"
            source  = "${NOMAD_SECRETS_DIR}/extra.py"
            target  = "/opt/netbox/netbox/config/extra.py"
            options = ["rbind", "ro"]
          }
        ]
      }

      template {
        data        = "xyzzy"
        destination = "${NOMAD_SECRETS_DIR}/extra.py"
      }

      resources {
        cpu    = 256
        memory = 256
      }
    }
  }
}

I run that job and get the same error you're seeing:

2022-06-08T19:09:43.139Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=f7faf875-3c41-642f-8eb9-d78a43ab2763 task=netbox error="rpc error: code = Unknown desc = Error in creating task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:75: mounting \"/run/containerd/io.containerd.runtime.v2.task/nomad/netbox-f7faf875-3c41-642f-8eb9-d78a43ab2763/secrets/extra.py\" to rootfs at \"/opt/netbox/netbox/config/extra.py\" caused: stat /run/containerd/io.containerd.runtime.v2.task/nomad/netbox-f7faf875-3c41-642f-8eb9-d78a43ab2763/secrets/extra.py: no such file or directory: unknown"

And if I remove the bind-mount and/or replace it with the tmpfs mount, everything works as expected.
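
For completeness, the tmpfs variant referred to there is the one from the netbox jobspec above; it works because no host-side source path is involved:

mounts = [
  {
    type   = "tmpfs"
    target = "/opt/unit"
  }
]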

Next I dropped a file into a directory on the host and switched the mounts to the following:

mounts = [
  {
    type    = "bind"
    source  = "/srv/data/example.txt"
    target  = "/opt/netbox/netbox/config/extra.py"
    options = ["rbind", "ro"]
  }
]

Now I can run the job and it works as expected:

$ cat /srv/data/example.txt
hello, world

$ nomad job run ./jobs/containerd.nomad
==> 2022-06-08T15:19:30-04:00: Monitoring evaluation "e215ced1"
    2022-06-08T15:19:30-04:00: Evaluation triggered by job "example"
==> 2022-06-08T15:19:31-04:00: Monitoring evaluation "e215ced1"
    2022-06-08T15:19:31-04:00: Evaluation within deployment: "ef7e24ed"
    2022-06-08T15:19:31-04:00: Allocation "c0e8e1d2" created: node "74c22702", group "app"
    2022-06-08T15:19:31-04:00: Evaluation status changed: "pending" -> "complete"
==> 2022-06-08T15:19:31-04:00: Evaluation "e215ced1" finished with status "complete"
==> 2022-06-08T15:19:31-04:00: Monitoring deployment "ef7e24ed"
  ✓ Deployment "ef7e24ed" successful

    2022-06-08T15:19:42-04:00
    ID          = ef7e24ed
    Job ID      = example
    Job Version = 0
    Status      = successful
    Description = Deployment completed successfully

    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    app         1        1       1        0          2022-06-08T19:29:40Z

$ nomad alloc exec c0e8e1d2 cat /opt/netbox/netbox/config/extra.py
hello, world

This still leaves you with a problem with the template rendering though. Template rendering happens before the container is created, so you can't render into the task's pivot root. There are two ways to deal with this:

  • You can configure your application to look for its config file in the template directory (see the sketch after this list).
  • You can create a symlink in your container image that points to the location that'll be rendered into /secrets. The symlink will be dangling until the template gets rendered.
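
Here's a minimal sketch of the first option, reusing the busybox jobspec above: render the template into the task directory, which containerd-driver exposes at /local inside the container, and point the application there directly so no bind mount is needed at all.

task "netbox" {
  driver = "containerd-driver"

  config {
    image   = "busybox:1"
    command = "httpd"
    # serve /local directly; the template below renders into it
    args    = ["-v", "-f", "-p", "8001", "-h", "/local"]
  }

  template {
    data        = "xyzzy"
    destination = "local/extra.py"
  }
}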

tgross commented Jun 8, 2022

Oh right, but the original problem was that you couldn't get CSI to work. 😊 I just double-checked your original jobspec and it looks like you're doing this same bind-mount trick from the CSI volume mount to the location where you want the config file. So the same two solutions apply there as well.

I was able to confirm that CSI should work generally with the containerd plugin. This jobspec uses the hostpath plugin demo from the demos directory:

jobspec
job "example" {
  datacenters = ["dc1"]

  group "app" {

    volume "volume0" {
      type            = "csi"
      source          = "test-volume"
      attachment_mode = "file-system"
      access_mode     = "single-node-reader-only"
      read_only       = true
    }

    task "netbox" {
      driver = "containerd-driver"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-v", "-f", "-p", "8001", "-h", "/local"]
      }

      volume_mount {
        volume      = "volume0"
        destination = "${NOMAD_ALLOC_DIR}/volume0"
      }

      resources {
        cpu    = 256
        memory = 256
      }
    }
  }
}

alemonmk commented Jun 9, 2022

Thank you for the detailed analysis. So it's all my misunderstanding...

This is different than other task drivers

Podman (which I originally wanted to use) also asks for a path on the host.

you're doing this same bind-mount trick from the CSI volume mount to the location you want the config file

I believe the config file mount is irrelevant; I tried this and it's still failing.

  • You can configure your application to look for its config file on the template directory.
  • You can create a symlink in your container image that points to the location that'll be rendered into /secrets. The symlink will be dangling until the template gets rendered.

Thanks for the suggestions, though I made things work with source = "/opt/nomad/data/alloc/${NOMAD_ALLOC_ID}/${NOMAD_TASK_NAME}/{local,secrets}". The CSI volume staging path could be constructed like this as well, but I doubt it would work, since it would be exactly the same path as in the OP and that's already proven not to work.
If this works, then I highly doubt the usefulness of NOMAD_{ALLOC,TASK,SECRETS}_DIR, since their contents are not configurable and can easily be hardcoded with fewer keystrokes. Without this translation it's hacky (or not very clean) to use a lower-level driver. Maybe we can't live without docker after all. 😆
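
For reference, this is roughly what the working mount looks like for the original postgres job's config mount (data_dir is /opt/nomad/data here):

mounts = [
  {
    type    = "bind"
    # host-side path to this task's local/ directory, built by hand
    source  = "/opt/nomad/data/alloc/${NOMAD_ALLOC_ID}/${NOMAD_TASK_NAME}/local/conf"
    target  = "/bitnami/postgresql/conf"
    options = ["rbind", "ro"]
  }
]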

tgross commented Jun 9, 2022

I believe the config file mount is irrelevant, I tried this and it's failing.

In the same way with the same error as before? As I noted, the hostpath driver seemed to work fine with containerd for me. So it'd be interesting to see if the Vultr driver is doing something unusual here. Can you provide the client debug logs and plugin logs for this exact scenario without the complication of the template paths?

If this works, then I highly doubt the usefulness of NOMAD_{ALLOC,TASK,SECRETS}_DIR since their contents is not configurable and can be easily hardcorded with less key strokes.

Those environment variables are for the paths inside the task's view of the filesystem. For task drivers that have filesystem isolation, like docker or containerd, you're right that they're not super useful. But for drivers that don't have filesystem isolation, like raw_exec, they're essential.
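
A minimal sketch of that raw_exec case (the task name and file contents are just illustrative): with no filesystem isolation, the interpolated directory resolves to a real host path under the Nomad data dir, which is exactly when these variables earn their keep.

task "inspect" {
  driver = "raw_exec"

  config {
    command = "/bin/cat"
    # expands to something like /opt/nomad/data/alloc/<alloc-id>/inspect/secrets/extra.py on the host
    args = ["${NOMAD_SECRETS_DIR}/extra.py"]
  }

  template {
    data        = "xyzzy"
    destination = "secrets/extra.py"
  }
}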

alemonmk commented Jun 9, 2022

OK, I used the same busybox sleep container as above and looked further into it by exec'ing into the CSI node container, with no volume_mount stanza to mount the volume. A very interesting finding:

From the client log (subtract 9 hours from these times to match the times shown below):

2022-06-09T22:29:07.605+0900 [DEBUG] http: request complete: method=GET path=/v1/client/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5/stats duration=15.410514ms
2022-06-09T22:29:08.831+0900 [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="369.089µs"
2022-06-09T22:29:09.429+0900 [DEBUG] http: request complete: method=POST path=/v1/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5/stop duration=30.909755ms
2022-06-09T22:29:09.437+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5?index=8093 duration=2.083425134s
2022-06-09T22:29:09.439+0900 [DEBUG] http: request complete: method=GET path=/v1/node/e465bda6-7589-5daa-9d36-d7a290e2245a/allocations?index=8093 duration=13.088001515s
2022-06-09T22:29:09.443+0900 [DEBUG] http: request complete: method=GET path=/v1/node/e465bda6-7589-5daa-9d36-d7a290e2245a/allocations?index=8093 duration=25.761223702s
2022-06-09T22:29:09.446+0900 [DEBUG] http: request complete: method=GET path=/v1/node/e465bda6-7589-5daa-9d36-d7a290e2245a/allocations?index=8093 duration=16.011109752s
2022-06-09T22:29:09.836+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5?index=8095 duration=323.689792ms
2022-06-09T22:29:09.904+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/0ec55f9a-8320-f1c6-d1cf-d32142d2ac4e duration=1.867516ms
2022-06-09T22:29:11.516+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5?index=8096 duration=2.348508ms
2022-06-09T22:29:11.787+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/0ec55f9a-8320-f1c6-d1cf-d32142d2ac4e duration=2.248318ms
2022-06-09T22:29:11.795+0900 [DEBUG] http: request complete: method=GET path=/v1/job/csi-test duration=1.807503ms
2022-06-09T22:29:11.796+0900 [DEBUG] http: request complete: method=GET path=/v1/namespaces duration=1.392388ms
2022-06-09T22:29:15.068+0900 [ERROR] nomad.fsm: CSIVolumeClaim failed: error="volume max claims reached"
2022-06-09T22:29:15.160+0900 [DEBUG] http: request complete: method=GET path=/v1/allocation/25b7acaa-10a7-9951-3f8f-586d297d99c5?index=8098 duration=1.644702814s
2022-06-09T22:29:17.103+0900 [ERROR] nomad.fsm: CSIVolumeClaim failed: error="volume max claims reached"
2022-06-09T22:29:17.382+0900 [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=server duration=1.611932ms
2022-06-09T22:29:17.409+0900 [DEBUG] nomad: memberlist: Stream connection from=xx.xx.xx.xx:57818
2022-06-09T22:29:18.832+0900 [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration="230.928µs"
2022-06-09T22:29:19.039+0900 [WARN]  client.csi_client: finished client unary call: plugin.name=csi-vultr-jp plugin.type=controller grpc.code=Internal duration=2.024307619s grpc.service=csi.v1.Controller grpc.method=ControllerUnpublishVolume
2022-06-09T22:29:22.893+0900 [DEBUG] http: request complete: method=GET path=/v1/nodes?index=8092 duration=37.5939375s
2022-06-09T22:29:22.893+0900 [DEBUG] http: request complete: method=GET path=/v1/nodes?index=8092 duration=29.459806056s
2022-06-09T22:29:24.717+0900 [DEBUG] nomad: memberlist: Initiating push/pull sync with: xxxx.yy xx.xx.xx.xx:4648
2022-06-09T22:29:27.388+0900 [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=server duration=2.976346ms
2022-06-09T22:29:27.410+0900 [DEBUG] nomad: memberlist: Stream connection from=xx.xx.xx.xx:57860

I have to do an alloc stop to get a new log, but the content is reproducible.


From the CSI node plugin log:

time="2022-06-09T13:02:43Z" level=info msg="GRPC response: capabilities:<rpc:<type:STAGE_UNSTAGE_VOLUME > > " GRPC.call=/csi.v1.Node/NodeGetCapabilities GRPC.request=
time="2022-06-09T13:03:08Z" level=info msg="Node Stage Volume: called" capacity="mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > " host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT target=/local/csi/staging/grav_store/rw-file-system-single-node-writer version=0.7.0 volume=14479912-4e97-42ea-9128-8b37d51ad619
time="2022-06-09T13:03:08Z" level=info msg="isFormatted called" format-args="[/dev/disk/by-id/virtio-nrt-144799124e9742]" format-command=blkid host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="Format called" format-args="[-F /dev/disk/by-id/virtio-nrt-144799124e9742]" format-cmd=mkfs.ext4 fs-type=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT source=/dev/disk/by-id/virtio-nrt-144799124e9742 version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="Mount Called" filesystem=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c methods=mount options="[]" region=NRT source=/dev/disk/by-id/virtio-nrt-144799124e9742 target=/local/csi/staging/grav_store/rw-file-system-single-node-writer version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="mount command and arguments" host_id=f593da5b-08b2-47a6-a802-28128d16130c mount arguments="[-t ext4 /dev/disk/by-id/virtio-nrt-144799124e9742 /local/csi/staging/grav_store/rw-file-system-single-node-writer]" mount command=mount region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="Node Stage Volume: volume staged" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodeStageVolume GRPC.request="volume_id:\"14479912-4e97-42ea-9128-8b37d51ad619\" publish_context:<key:\"\" value:\"nrt-144799124e9742\" > staging_target_path:\"/local/csi/staging/grav_store/rw-file-system-single-node-writer\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > "
time="2022-06-09T13:03:08Z" level=info msg="Node Publish Volume: called" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT staging_target_path=/local/csi/staging/grav_store/rw-file-system-single-node-writer target_path=/local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer version=0.7.0 volume_id=14479912-4e97-42ea-9128-8b37d51ad619
time="2022-06-09T13:03:08Z" level=info msg="Mount Called" filesystem=ext4 host_id=f593da5b-08b2-47a6-a802-28128d16130c methods=mount options="[bind]" region=NRT source=/local/csi/staging/grav_store/rw-file-system-single-node-writer target=/local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="mount command and arguments" host_id=f593da5b-08b2-47a6-a802-28128d16130c mount arguments="[-t ext4 -o bind /local/csi/staging/grav_store/rw-file-system-single-node-writer /local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer]" mount command=mount region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="mount command - removal of lost+found error" error="stat /local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer/lost+found: no such file or directory" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="Node Publish Volume: published" host_id=f593da5b-08b2-47a6-a802-28128d16130c region=NRT version=0.7.0
time="2022-06-09T13:03:08Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodePublishVolume GRPC.request="volume_id:\"14479912-4e97-42ea-9128-8b37d51ad619\" publish_context:<key:\"\" value:\"nrt-144799124e9742\" > staging_target_path:\"/local/csi/staging/grav_store/rw-file-system-single-node-writer\" target_path:\"/local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > "

Same good old successful volume publishing here.


From the CSI node container:

$ nomad alloc exec -i -t -task plugin 99df0b74 sh
/ # ls /local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer/ -al
total 8
drwxr-xr-x    2 root     root          4096 Jun  9 13:03 .
drwxr-x---    3 root     root          4096 Jun  9 13:03 ..
/ # ls /local/csi/staging/grav_store/rw-file-system-single-node-writer/ -al
total 8
drwxr-xr-x    2 root     root          4096 Jun  9 13:03 .
drwxr-x---    3 root     root          4096 Jun  9 13:03 ..
/ # mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/137/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/136/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/173/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/173/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run type tmpfs (rw,nosuid,size=65536k,mode=755)
/dev/vda1 on /etc/hosts type ext4 (ro,relatime,errors=remount-ro)
tmpfs on /etc/resolv.conf type tmpfs (ro,relatime,size=402564k,mode=755)
/dev/vda1 on /csi type ext4 (rw,relatime,errors=remount-ro)
/dev/vda1 on /local/csi type ext4 (rw,relatime,errors=remount-ro)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1996132k,nr_inodes=499033,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
tmpfs on /etc/resolv.conf type tmpfs (ro,relatime,size=402564k,mode=755)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
/dev/vda1 on /local type ext4 (rw,relatime,errors=remount-ro)
/dev/vda1 on /alloc type ext4 (rw,relatime,errors=remount-ro)
/dev/disk/by-id/virtio-nrt-144799124e9742 on /local/csi/staging/grav_store/rw-file-system-single-node-writer type ext4 (rw,relatime)
/dev/disk/by-id/virtio-nrt-144799124e9742 on /local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer type ext4 (rw,relatime)

From the host (again, subtract 9 hours from the file times to compare):

root@sajuna:~# ls /opt/nomad/data/client/csi/node/csi-vultr-jp/staging/grav_store/ -al
total 12
drwx------ 3 root root 4096 Jun  9 22:03 .
drwx------ 4 root root 4096 Jun  9 22:03 ..
drwx------ 2 root root 4096 Jun  9 22:03 rw-file-system-single-node-writer
root@sajuna:~# ls /opt/nomad/data/client/csi/node/csi-vultr-jp/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/ -al
total 8
drwx------ 2 root root 4096 Jun  9 22:03 .
drwx------ 3 root root 4096 Jun  9 22:03 ..
root@sajuna:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1996132k,nr_inodes=499033,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=402564k,mode=755)
/dev/vda1 on / type ext4 (rw,relatime,errors=remount-ro)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=9164)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
tmpfs on /opt/nomad/data/alloc/99df0b74-4e4b-8fa3-4c73-72d4851ede6e/plugin/secrets type tmpfs (rw,noexec,relatime,size=1024k)
overlay on /run/containerd/io.containerd.runtime.v2.task/nomad/plugin-99df0b74-4e4b-8fa3-4c73-72d4851ede6e/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/137/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/136/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/173/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/173/work)
tmpfs on /opt/nomad/data/alloc/ece3722a-bf90-ac5d-0bdb-fd5c348e5fa3/caddy/secrets type tmpfs (rw,noexec,relatime,size=1024k)
overlay on /run/containerd/io.containerd.runtime.v2.task/nomad/caddy-ece3722a-bf90-ac5d-0bdb-fd5c348e5fa3/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/4/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/3/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/2/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/1/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/192/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/192/work)
tmpfs on /opt/nomad/data/alloc/c43c9a57-c9c3-56c7-2573-551e95a6d4fd/coredns/secrets type tmpfs (rw,noexec,relatime,size=1024k)
overlay on /run/containerd/io.containerd.runtime.v2.task/nomad/coredns-c43c9a57-c9c3-56c7-2573-551e95a6d4fd/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/11/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/10/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/226/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/226/work)
tmpfs on /opt/nomad/data/alloc/73380f31-cebf-e22c-6136-2c702800c2eb/vaultwarden/secrets type tmpfs (rw,noexec,relatime,size=1024k)
overlay on /run/containerd/io.containerd.runtime.v2.task/nomad/vaultwarden-73380f31-cebf-e22c-6136-2c702800c2eb/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/187/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/186/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/185/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/184/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/183/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/182/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/227/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/227/work)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=402560k,nr_inodes=100640,mode=700,uid=1000,gid=1000)
tmpfs on /opt/nomad/data/alloc/557f2c4e-00fe-0d45-42e8-56182a130942/pause/secrets type tmpfs (rw,noexec,relatime,size=1024k)
overlay on /run/containerd/io.containerd.runtime.v2.task/nomad/pause-557f2c4e-00fe-0d45-42e8-56182a130942/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/228/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/229/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/229/work)

No wonder all the "no such file or directory" errors! The path actually does not exist! The mount somehow did not propagate to the host, but I have no idea what I should have set up on my system and haven't...

@tgross
Member

tgross commented Jun 9, 2022

OK, I used the same busybox sleep container above and looked further into it by execing into the CSI node container while having no volume_mount stanza to mount the volume.

Can you provide the exact jobspec you ran please so that there's no ambiguity for me to try to reproduce?

From the client log (subtract 9 hours from the timestamps to be consistent with the times shown below):
...
2022-06-09T22:29:15.068+0900 [ERROR] nomad.fsm: CSIVolumeClaim failed: error="volume max claims reached"
...
2022-06-09T22:29:19.039+0900 [WARN] client.csi_client: finished client unary call: plugin.name=csi-vultr-jp plugin.type=controller grpc.code=Internal duration=2.024307619s grpc.service=csi.v1.Controller grpc.method=ControllerUnpublishVolume

Ok, that set of logs is mostly server logs, but the first error here indicates that when the client tried to claim the volume, the volume did not have free claims available. We also don't see a "could not claim volume" error in the client logs, which suggests the claim was retried successfully. But then 4 seconds later we're unpublishing the volume, which suggests the task failed to start.
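
As an aside, the volume's current claims can be checked from the CLI; assuming the volume ID used in the jobspec, something like:

nomad volume status <volume-id>

should list the allocations holding claims, which makes a "volume max claims reached" error easier to pin down.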

The CSI node plugin log is showing successful responses:

From CSI node plugin log:
...
time="2022-06-09T13:03:08Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodeStageVolume GRPC.request="volume_id:"14479912-4e97-42ea-9128-8b37d51ad619" publish_context:<key:"" value:"nrt-144799124e9742" > staging_target_path:"/local/csi/staging/grav_store/rw-file-system-single-node-writer" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > "
...
time="2022-06-09T13:03:08Z" level=info msg="GRPC response: " GRPC.call=/csi.v1.Node/NodePublishVolume GRPC.request="volume_id:"14479912-4e97-42ea-9128-8b37d51ad619" publish_context:<key:"" value:"nrt-144799124e9742" > staging_target_path:"/local/csi/staging/grav_store/rw-file-system-single-node-writer" target_path:"/local/csi/per-alloc/557f2c4e-00fe-0d45-42e8-56182a130942/grav_store/rw-file-system-single-node-writer" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > "

The per-alloc path is what's missing from the host. It probably wouldn't hurt if you could share the plugin jobspec as well.

Also, I don't think you have debug logging enabled on the client, otherwise I'd expect to see a lot more logs that would really help me figure out what's going on here. As a result, I also can't see the final error message you're seeing that's getting sent up to the task events. Can you provide those please?
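
If it helps, a minimal sketch of turning that on in the client's agent configuration (the exact file location depends on your setup):

# agent configuration file, e.g. under /etc/nomad.d/ (path is an assumption)
log_level = "DEBUG"

The same can be done with the -log-level=DEBUG flag when starting the agent.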

@alemonmk
Author

alemonmk commented Jun 9, 2022

Disregard the client log from my last post; I think it was the log from when it was restarting the task.

Job specs: https://gist.github.com/alemonmk/7365e17ca6b57bbc4ef78311bf9332f9
Server and client logs: https://gist.github.com/alemonmk/4b48ce1710a71a18c670013ef4b4ce28
Nodes and controller plugin logs would be about the same as posted before.

I get logs from here, and I'm not sure there's a config entry to turn on more verbose logging.
Screenshot_1

The task never fails since I don't have the volume mounted.
Screenshot_2

Mounts in the CSI plugin are working.
Screenshot_3

@shishir-a412ed
Contributor

Hi @tgross Thanks for the detailed analysis and reproduction!

IIUC, the bind mount is failing because source in containerd-driver is treated as a host path, not as a path relative to the Nomad task allocation directory. The templates, however, are rendered relative to the Nomad task allocation directory.

      template {
        data        = <<-EOT
        local all postgres peer
        host all postgres 0.0.0.0/0 scram-sha-256
        host sameuser all 0.0.0.0/0 scram-sha-256
        EOT
        destination = "local/conf/pg_hba.conf"
      }

      template {
        data        = <<-EOT
        listen_address = '{{ env "NOMAD_IP_postgres" }}'
        port = {{ env "NOMAD_PORT_postgres" }}
        password_encryption = scram-sha-256
        ssl = off
        EOT
        destination = "local/conf/conf.d/extra.conf"
      }

However, when the job tries to bind mount using containerd-driver, containerd doesn't understand the relative path (local/conf) and looks for the file on the host, relative to containerd, rather than relative to Nomad's task allocation directory (this is expected behaviour for the containerd-driver):

        mounts = [
          {
            type    = "bind"
            source  = "local/conf"
            target  = "/bitnami/postgresql/conf"
            options = ["rbind", "ro"]
          }
        ]
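
As an aside, one possible workaround, untested here and assuming Nomad's runtime variable interpolation (e.g. ${NOMAD_TASK_DIR}) applies inside the driver's mounts config, would be to hand containerd an absolute host path instead:

        mounts = [
          {
            type    = "bind"
            # assumption: ${NOMAD_TASK_DIR} interpolates to the host path of this task's local/ directory
            source  = "${NOMAD_TASK_DIR}/conf"
            target  = "/bitnami/postgresql/conf"
            options = ["rbind", "ro"]
          }
        ]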

I believe this works in docker because docker looks for paths relative to the task allocation directory.
Can you point me to the code in the docker driver where this happens? Let me look into containerd-driver and see if we can make this behaviour the same as the Nomad docker driver.

@alemonmk Is there an easy way I can reproduce this in a vagrant VM?
I see you have posted some job specs
Can you tell me which jobs to use to:

a) Install Vultr CSI plugin. Make it up and running and register it with Nomad.
b) Provision the Vultr block storage volume.
c) Deploy the PostgresSQL container with this volume mounted into the container using containerd-driver.

I know the job spec for (c); however, I'm looking for some instructions around (a) and (b) as I am not super familiar with the Vultr CSI plugin. If there is an easier way to reproduce this, please let me know.

@shishir-a412ed
Contributor

I was going through the containerd-driver code, as I remembered fixing this in the past. But it looks like I fixed it for a specific case (relative paths starting with local) and not the generic case.

Snippet to code

Some more details in this issue comment

So, it will work for relative paths like local/conf but fail on secrets/extra.py.
@tgross Do you think I should change the logic to cover relative paths for all three (local, secrets, and alloc)? That would address all future use cases like this.
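
Something along these lines, as an illustrative sketch only (not the actual driver code; the package, helper name, and how alloc/ should resolve are assumptions):

package containerd

import (
    "path/filepath"
    "strings"
)

// expandTaskRelativeSource turns mount sources written relative to the task
// directory (local/, secrets/, alloc/) into absolute host paths, and leaves
// absolute paths and other values untouched.
func expandTaskRelativeSource(taskDir, src string) string {
    if filepath.IsAbs(src) {
        return src
    }
    for _, prefix := range []string{"local/", "secrets/", "alloc/"} {
        if strings.HasPrefix(src, prefix) {
            // note: whether alloc/ should be joined onto the task dir or the
            // shared allocation dir is a detail to confirm against Nomad's layout.
            return filepath.Join(taskDir, src)
        }
    }
    return src
}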

@alemonmk
Author

alemonmk commented Jun 10, 2022

I believe this works in docker because docker looks for paths relative to the task allocation directory.
Can you point me to the code in the docker driver where this happens? Let me look into containerd-driver and see if we can make this behaviour the same as the Nomad docker driver.

I believe this is it:

src = expandPath(task.TaskDir().Dir, src)

hm.Source = expandPath(task.TaskDir().Dir, hm.Source)

a) Install Vultr CSI plugin. Make it up and running and register it with Nomad.

Please edit and then nomad job run csi-vultr-controller.hcl and csi-vultr-nodes.hcl.

b) Provision the Vultr block storage volume.

I use Terraform to do this. I don't know anything about Vagrant though, apologies.

Alternatively, vultr-csi does implement CreateVolume(), so nomad volume create with the following spec may work:

name = "test-store"
type = "csi"
plugin_id = "csi-vultr-test"
capacity_min = "10G"
capacity_max = "10G"

capability {
  access_mode = "single-node-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ext4"
  mount_flags = ["noatime"]
}

parameters {
  block_type = "high_perf"
}
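
If that spec is saved as, say, volume.hcl (the filename is arbitrary), the volume would be created with:

nomad volume create volume.hcl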

@alemonmk
Author

alemonmk commented Jun 10, 2022

I believe I have found the problem here despite my limited code reading skills.

tl;dr: containerd-driver does not set the bind mount propagation mode, so whatever is mounted inside the CSI node container never appears on the host.

First, the CSI mount point /local/csi is set up as a bidirectional bind mount according to:

volumeStagingMounts := &drivers.MountConfig{
    // TODO(tgross): add this TaskPath to the CSIPluginConfig as well
    TaskPath:        "/local/csi",
    HostPath:        h.mountPoint,
    Readonly:        false,
    PropagationMode: "bidirectional",
}

The defined mounts go through this function, where we can see m.PropagationMode is never used:
https://github.com/Roblox/nomad-driver-containerd/blob/6fb3f5a2f243b77a0b01ce6ee1b063c381562848/containerd/driver.go#L283

BTW, this is how the docker driver handles it:

nomad/drivers/docker/driver.go

Lines 1046 to 1063 in 2b054e3

for _, m := range task.Mounts {
    hm := docker.HostMount{
        Type:     "bind",
        Target:   m.TaskPath,
        Source:   m.HostPath,
        ReadOnly: m.Readonly,
    }
    // MountPropagation is only supported by Docker on Linux:
    // https://docs.docker.com/storage/bind-mounts/#configure-bind-propagation
    if runtime.GOOS == "linux" {
        hm.BindOptions = &docker.BindOptions{
            Propagation: userMountToUnixMount[m.PropagationMode],
        }
    }
    hostConfig.Mounts = append(hostConfig.Mounts, hm)
}

Resulting in containerd mounting /local/csi with propagation mode = rprivate:
https://github.com/containerd/containerd/blob/30436a061bea78a7a54f7d1b162e21fa947e4bae/pkg/cri/opts/spec_linux.go#L198-L200
Screenshot_1
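
For what it's worth, a rough sketch of the mapping containerd-driver would need when building the OCI mount options; this is not actual driver code, and the mode names private, host-to-task, and bidirectional are assumed to match what the docker driver's userMountToUnixMount table uses:

// propagationOption maps a Nomad mount propagation mode to the
// corresponding OCI/runc mount option; rprivate stays the safe default.
func propagationOption(mode string) string {
    switch mode {
    case "bidirectional":
        return "rshared"
    case "host-to-task":
        return "rslave"
    default:
        return "rprivate"
    }
}

The returned option would then be appended to the mount's options alongside "rbind" when the driver builds the containerd mount.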

I'm not sure where hp.getVolumePath() points to, but my best guess is that either it does not use the container itself as a staging ground, or @tgross did not run the hostpath CSI driver with containerd-driver.

I validated this by running csi-vultr-nodes with the docker driver.
Screenshot_2

@alemonmk
Author

I have opened issues against containerd-driver. Thank you @tgross and @shishir-a412ed for helping me get to the bottom of this with your detailed analysis and thought process.

@alemonmk closed this as not planned on Jun 11, 2022
Nomad - Community Issues Triage automation moved this from In Progress to Done Jun 11, 2022
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions locked this as resolved and limited conversation to collaborators on Oct 10, 2022