Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nomad CSI plugin seems healthy but controller is now working properly #10030

Closed
carlosrbcunha opened this issue Feb 15, 2021 · 11 comments
Closed

Comments

@carlosrbcunha
Copy link

carlosrbcunha commented Feb 15, 2021

We have setup CSI on our nomad cluster and choose to start node and controller by hand (nomad cli) and not use terraform to do this. We also believed that controller workload should start first than node (also there is no clear guidance on this topic as we suspect that order has an impact in the health and ability to CSI to perform without problems).

In our recent tests we noticed that there might be a correlation regarding the errors that we got randomly and the start order of the nodes and controllers.

Nomad version

Nomad v1.0.3 (08741d9)

Operating system and Environment details

Ubuntu 20.04.2 LTS running in a Azure VM

Issue

Nomad CSI plugin seems healthy but controller is now working properly

Reproduction steps

Start nomad node and controller via terraform

Job file (if appropriate)

Node job

job "plugin-azure-disk-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "node" {
      driver = "docker"

      template {
        change_mode = "noop"
        destination = "local/azure.json"
        data = <<EOH
{
"cloud":"AzurePublicCloud",
"tenantId": "${tenant_id}",
"subscriptionId": "${subscription_id}",
"aadClientId": "${client_id}",
"aadClientSecret": "${client_secret}",
"resourceGroup": "${resource_group_name}",
"location": "${location}"
}
EOH
      }

      env {
        AZURE_CREDENTIAL_FILE = "/etc/kubernetes/azure.json"
      }

      config {
        image = "mcr.microsoft.com/k8s/csi/azuredisk-csi:${csi_version}"

        volumes = [
          "local/azure.json:/etc/kubernetes/azure.json"
        ]

        args = [
          "--nodeid=${prefix}-$${attr.unique.hostname}-vm",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]

        # node plugins must run as privileged jobs because they
        # mount disks to the host
        privileged = true
      }

      csi_plugin {
        id        = "${csi_plugin_id}"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        memory = 256
      }

      # ensuring the plugin has time to shut down gracefully
      kill_timeout = "2m"
    }
  }
}

Controller job

job "plugin-azure-disk-controller" {
  datacenters = ["dc1"]
  type = "service"

  group "controller" {
    count = 2

    constraint { # Don't deploy more than one instance on the same host
      distinct_hosts = true
    }

    # disable deployments
    update {
      max_parallel = 0
    }
    task "controller" {
      driver = "docker"

      template {
        change_mode = "noop"
        destination = "local/azure.json"
        data = <<EOH
{
"cloud":"AzurePublicCloud",
"tenantId": "${tenant_id}",
"subscriptionId": "${subscription_id}",
"aadClientId": "${client_id}",
"aadClientSecret": "${client_secret}",
"resourceGroup": "${resource_group_name}",
"location": "${location}"
}
EOH
      }

      env {
        AZURE_CREDENTIAL_FILE = "/etc/kubernetes/azure.json"
      }

      config {
        image = "mcr.microsoft.com/k8s/csi/azuredisk-csi:${csi_version}"

        volumes = [
          "local/azure.json:/etc/kubernetes/azure.json"
        ]

        args = [
          "--nodeid=${prefix}-$${attr.unique.hostname}-vm",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
      }

      csi_plugin {
        id        = "${csi_plugin_id}"
        type      = "controller"
        mount_dir = "/csi"
      }

      resources {
        memory = 256
      }

      # ensuring the plugin has time to shut down gracefully
      kill_timeout = "2m"
    }
  }
}

Bad behaviour
If I start the node and the controller without any time gap between the two I get this state
Screenshot 2021-02-15 at 12 38 19
Everything seems ok but if we drill down to the status it shows ...
Screenshot 2021-02-15 at 15 18 13
Only node info in available, no controller !! . As the result of this, the jobs requiring CSI fail with this message.
Screenshot 2021-02-15 at 15 19 39

Expected behaviour
If I start the node and then add a time gap of, for instance, 10 seconds. The node and controllers start but in the end I see different result.
Screenshot 2021-02-15 at 12 38 19
Screenshot 2021-02-15 at 12 38 53
This time with controller and node data stating 100% across the board.
In this case, all workloads start and work without any issue.

Analysing the data that we have, we identified a different setting when querying the API in this 2 states.

In the 'bad behaviour' setting when we query via ( curl http://<nomad_ip>:4646/v1/plugin/csi/az-disk0 ) , the json response had ControllerRequired": false whereas the expected behaviour had ControllerRequired": true

  • Is there any guidance on the order of workloads to start (node and controller) ? We only located a terraform registry KristophUK/terraform-nomad-aws-ebs-csi-plugin as an example with node first but without timer between workloads.

  • Is there some way we can track the status of the plugin and make the controller and node be more resilient ? Sometimes out of the blue things start failing with errors and all seams OK in the controller and node health status.

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Feb 16, 2021
@tgross tgross self-assigned this Feb 16, 2021
@tgross tgross moved this from Needs Triage to Triaging in Nomad - Community Issues Triage Feb 16, 2021
@tgross
Copy link
Member

tgross commented Feb 16, 2021

Hi @carlosrbcunha! So for starters, the order of running the controller vs node plugin definitely should not matter at all. I think that's just surfacing some other issue here.

I can't find where it's documented in the azuredisk-csi-driver repo, but there should be a flag that tells a given plugin process is running as a Node or a Controller. That's not part of the CSI specification so Nomad can't do that on it's own. Usually it's a command line flag for the plugin.

If you take a magnifying glass 😁 and compare their k8s manifests for the Controller vs the Node I think the controller shouldn't have the --node-id flag.

So if that's the case, then when the controller is getting placed before the node, it's registering itself as a node plugin but Nomad is expecting it to be a controller. So Nomad is mounting its control socket in a location that assumes it's a controller, and then the node plugin on that client can't take the same control socket location. All the behaviors you're describing could be the result of that.

Also, does the output of nomad plugin status :plugin_id match what you're seeing in the UI? It should, but I know there were some bugs in the plugin counts there fixed recently so I want to make sure we didn't miss anything.

@carlosrbcunha
Copy link
Author

Hi @tgross , thanks for answering.

I agree with you, the order should not matter but in this case its impacting, in the incorrect status only node seems to be working and not the controller.

You that see that the nomad job has a csi stanza that states the type of plugin. I did not encounter a argument for the command that had this distinction.

The --node-id in fact didn't belong there you have to use it if your VM's hostname differ from your azure vm object name so that the controller plugin knows which machine to mount the disk.

My code only works with node first , wait some seconds so that all nodes come up and then start the controllers. If I invert the order or remove the sleep timer, it start to show both but in the end it will only show the nodes.

I will try to talk with the developer in charge of the driver and see if I have more guidance. Can you try and help me from your end ? With some examples maybe ?

@tgross
Copy link
Member

tgross commented Feb 16, 2021

You that see that the nomad job has a csi stanza that states the type of plugin.

That tells Nomad the type of plugin. It doesn't tell the plugin whether it should be acting as a controller or node. Try deploying with that --node-id flag removed for the controller. If that still doesn't work, we can do some more detailed debugging.

@carlosrbcunha
Copy link
Author

I tested that and the result is still the same, not stable.

@tgross
Copy link
Member

tgross commented Feb 16, 2021

Ok, can you provide the results of nomad plugin status :plugin_id and the allocation status for some of those allocations? It would also help to take a look at the allocation logs for the plugins so we can see what specific CSI RPCs Nomad is sending them and how they're responding.

@carlosrbcunha
Copy link
Author

Hi,
Here are the logs you requested. I read in here that the node must run on all clients and there should be an instance of the controller. Even with this setup I cannot get this to be stable. I also need some guidance on this matter, have one controller , two or a node in every client.

nomad status output

nomad plugin status -verbose az-disk0
ID                   = az-disk0
Provider             = disk.csi.azure.com
Version              = v1.1.0
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 4
Nodes Expected       = 4

Allocations
ID                                    Eval ID                               Node ID                               Node Name       Task Group             Version  Desired  Status   Created               Modified
7339181b-9050-e1f7-ec61-93b0893a3835  e76aeb25-6876-3754-cb4c-f78f412d4ad2  84b4dfcc-cbb2-2e9e-cf65-b7e538c62f1a  nomad-server-0  csi_plugin_controller  0        run      running  2021-02-16T22:06:24Z  2021-02-16T22:06:36Z
47ca5064-4023-a068-6347-9cd98684196e  db6d5974-b355-5b5d-9e45-38c985c52092  bdaaad8f-521a-617e-19e3-198d5ad23325  nomad-client-1  csi_plugin_node        0        run      running  2021-02-16T22:06:40Z  2021-02-16T22:06:41Z
dbb16617-9f88-8713-b8b1-1436368466f5  db6d5974-b355-5b5d-9e45-38c985c52092  b6ab02c4-e877-ebd0-c943-fdf7e744225f  nomad-client-2  csi_plugin_node        0        run      running  2021-02-16T22:06:40Z  2021-02-16T22:06:41Z
d9fe8cc2-58ad-1c45-0a76-829626959110  db6d5974-b355-5b5d-9e45-38c985c52092  84b4dfcc-cbb2-2e9e-cf65-b7e538c62f1a  nomad-server-0  csi_plugin_node        0        run      running  2021-02-16T22:06:40Z  2021-02-16T22:06:41Z
6675e4ef-5067-32a2-0e3e-9eaf23ea26d7  db6d5974-b355-5b5d-9e45-38c985c52092  daa19c2d-64b5-200c-3dd3-5e762f993846  nomad-client-0  csi_plugin_node        0        run      running  2021-02-16T22:06:40Z  2021-02-16T22:06:41Z

Node logs

I0216 22:06:41.602917       1 main.go:86] set up prometheus server on [::]:29604
I0216 22:06:41.604533       1 azuredisk.go:120] 
DRIVER INFORMATION:
-------------------
Build Date: "2021-02-10T12:51:38Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.1.0
Git Commit: 8bec51f963901f67684828886673ee48b5f333ad
Go Version: go1.15.4
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
W0216 22:06:41.604605       1 azure.go:48] get kubeconfig() failed with error: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
I0216 22:06:41.604618       1 azure.go:62] could not read cloud config from secret
I0216 22:06:41.604625       1 azure.go:65] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/azure.json
I0216 22:06:41.604652       1 azure.go:83] read cloud config from file: /etc/kubernetes/azure.json successfully
I0216 22:06:41.605529       1 azure_auth.go:232] Using AzurePublicCloud environment
I0216 22:06:41.605591       1 azure_auth.go:117] azure: using client_id+client_secret to retrieve access token
I0216 22:06:41.605649       1 azure_interfaceclient.go:68] Azure InterfacesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605663       1 azure_interfaceclient.go:71] Azure InterfacesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605688       1 azure_vmsizeclient.go:67] Azure VirtualMachineSizesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605698       1 azure_vmsizeclient.go:70] Azure VirtualMachineSizesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605717       1 azure_snapshotclient.go:68] Azure SnapshotClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605724       1 azure_snapshotclient.go:71] Azure SnapshotClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605739       1 azure_storageaccountclient.go:68] Azure StorageAccountClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605747       1 azure_storageaccountclient.go:71] Azure StorageAccountClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605760       1 azure_diskclient.go:68] Azure DisksClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605767       1 azure_diskclient.go:71] Azure DisksClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605779       1 azure_vmclient.go:68] Azure VirtualMachine client (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605785       1 azure_vmclient.go:71] Azure VirtualMachine client (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605798       1 azure_vmssclient.go:68] Azure VirtualMachineScaleSetClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605804       1 azure_vmssclient.go:71] Azure VirtualMachineScaleSetClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605816       1 azure_vmssvmclient.go:69] Azure vmssVM client (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605823       1 azure_vmssvmclient.go:72] Azure vmssVM client (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605842       1 azure_routeclient.go:66] Azure RoutesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605849       1 azure_routeclient.go:69] Azure RoutesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605862       1 azure_subnetclient.go:67] Azure SubnetsClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605868       1 azure_subnetclient.go:70] Azure SubnetsClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605881       1 azure_routetableclient.go:66] Azure RouteTablesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605889       1 azure_routetableclient.go:69] Azure RouteTablesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605900       1 azure_loadbalancerclient.go:68] Azure LoadBalancersClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605906       1 azure_loadbalancerclient.go:71] Azure LoadBalancersClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605922       1 azure_securitygroupclient.go:68] Azure SecurityGroupsClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.605929       1 azure_securitygroupclient.go:71] Azure SecurityGroupsClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.606041       1 azure_publicipclient.go:68] Azure PublicIPAddressesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.606049       1 azure_publicipclient.go:71] Azure PublicIPAddressesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:41.606244       1 mount_linux.go:188] Detected OS without systemd
I0216 22:06:41.606257       1 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0216 22:06:41.606273       1 driver.go:80] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0216 22:06:41.606279       1 driver.go:80] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0216 22:06:41.606286       1 driver.go:80] Enabling controller service capability: LIST_SNAPSHOTS
I0216 22:06:41.606291       1 driver.go:80] Enabling controller service capability: CLONE_VOLUME
I0216 22:06:41.606297       1 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0216 22:06:41.606303       1 driver.go:80] Enabling controller service capability: LIST_VOLUMES
I0216 22:06:41.606309       1 driver.go:80] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0216 22:06:41.606316       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0216 22:06:41.606323       1 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0216 22:06:41.606379       1 driver.go:90] Enabling node service capability: EXPAND_VOLUME
I0216 22:06:41.606385       1 driver.go:90] Enabling node service capability: GET_VOLUME_STATS
I0216 22:06:41.606613       1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0216 22:06:41.644010       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0216 22:06:41.644036       1 utils.go:96] GRPC request: {}
I0216 22:06:41.647751       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.1.0"}
I0216 22:06:41.655037       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0216 22:06:41.655074       1 utils.go:96] GRPC request: {}
I0216 22:06:41.655115       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0216 22:06:41.657563       1 utils.go:95] GRPC call: /csi.v1.Node/NodeGetInfo
I0216 22:06:41.657580       1 utils.go:96] GRPC request: {}
I0216 22:06:42.080549       1 azure_vmclient.go:130] Received error in vm.get.request: resourceID: /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/hashitalks-demo/providers/Microsoft.Compute/virtualMachines/35fef3f81654, error: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/virtualMachines/35fef3f81654' under resource group 'hashitalks-demo' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"}}
I0216 22:06:42.080639       1 azure_wrap.go:193] Virtual machine "35fef3f81654" not found
W0216 22:06:42.080656       1 nodeserver.go:310] Failed to get zone from Azure cloud provider, nodeName: HashiTalks-Demo-nomad-client-2-vm, error: instance not found
I0216 22:06:42.080666       1 nodeserver.go:329] got a matching size in getMaxDataDiskCount, VM Size: STANDARD_D2S_V3, MaxDataDiskCount: 4
I0216 22:06:42.080678       1 utils.go:102] GRPC response: {"accessible_topology":{"segments":{"topology.disk.csi.azure.com/zone":""}},"max_volumes_per_node":4,"node_id":"HashiTalks-Demo-nomad-client-2-vm"}
I0216 22:09:03.169199       1 utils.go:95] GRPC call: /csi.v1.Node/NodeStageVolume
I0216 22:09:03.169231       1 utils.go:96] GRPC request: {"staging_target_path":"/csi/staging/postgres-sonar/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
E0216 22:09:03.169495       1 utils.go:100] GRPC error: rpc error: code = InvalidArgument desc = lun not provided
I0216 22:09:03.547485       1 utils.go:95] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0216 22:09:03.547517       1 utils.go:96] GRPC request: {"target_path":"/csi/per-alloc/ca3a328d-9bf6-75c3-8371-414d9b1e2d40/postgres-sonar/rw-file-system-single-node-writer","volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
I0216 22:09:03.547618       1 nodeserver.go:269] NodeUnpublishVolume: unmounting volume /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk on /csi/per-alloc/ca3a328d-9bf6-75c3-8371-414d9b1e2d40/postgres-sonar/rw-file-system-single-node-writer
W0216 22:09:03.547659       1 mount_helper_common.go:33] Warning: Unmount skipped because path does not exist: /csi/per-alloc/ca3a328d-9bf6-75c3-8371-414d9b1e2d40/postgres-sonar/rw-file-system-single-node-writer
I0216 22:09:03.547671       1 nodeserver.go:280] NodeUnpublishVolume: unmount volume /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk on /csi/per-alloc/ca3a328d-9bf6-75c3-8371-414d9b1e2d40/postgres-sonar/rw-file-system-single-node-writer successfully
I0216 22:09:03.547682       1 utils.go:102] GRPC response: {}
I0216 22:09:03.548197       1 utils.go:95] GRPC call: /csi.v1.Node/NodeUnstageVolume
I0216 22:09:03.548220       1 utils.go:96] GRPC request: {"staging_target_path":"/csi/staging/postgres-sonar/rw-file-system-single-node-writer","volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
I0216 22:09:03.548283       1 nodeserver.go:175] NodeUnstageVolume: unmounting /csi/staging/postgres-sonar/rw-file-system-single-node-writer
W0216 22:09:03.548315       1 mount_helper_common.go:65] Warning: "/csi/staging/postgres-sonar/rw-file-system-single-node-writer" is not a mountpoint, deleting
I0216 22:09:03.548376       1 nodeserver.go:180] NodeUnstageVolume: unmount /csi/staging/postgres-sonar/rw-file-system-single-node-writer successfully
I0216 22:09:03.548388       1 utils.go:102] GRPC response: {}
I0216 22:10:34.191741       1 utils.go:95] GRPC call: /csi.v1.Node/NodeStageVolume
I0216 22:10:34.191776       1 utils.go:96] GRPC request: {"staging_target_path":"/csi/staging/postgres-pgadmin/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk"}
E0216 22:10:34.191959       1 utils.go:100] GRPC error: rpc error: code = InvalidArgument desc = lun not provided
I0216 22:10:34.562347       1 utils.go:95] GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0216 22:10:34.562375       1 utils.go:96] GRPC request: {"target_path":"/csi/per-alloc/c7afd7c9-2299-b8f6-84b5-032f7eda4cd0/postgres-pgadmin/rw-file-system-single-node-writer","volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk"}
I0216 22:10:34.562463       1 nodeserver.go:269] NodeUnpublishVolume: unmounting volume /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk on /csi/per-alloc/c7afd7c9-2299-b8f6-84b5-032f7eda4cd0/postgres-pgadmin/rw-file-system-single-node-writer
W0216 22:10:34.562493       1 mount_helper_common.go:33] Warning: Unmount skipped because path does not exist: /csi/per-alloc/c7afd7c9-2299-b8f6-84b5-032f7eda4cd0/postgres-pgadmin/rw-file-system-single-node-writer
I0216 22:10:34.562506       1 nodeserver.go:280] NodeUnpublishVolume: unmount volume /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk on /csi/per-alloc/c7afd7c9-2299-b8f6-84b5-032f7eda4cd0/postgres-pgadmin/rw-file-system-single-node-writer successfully
I0216 22:10:34.562517       1 utils.go:102] GRPC response: {}
I0216 22:10:34.562905       1 utils.go:95] GRPC call: /csi.v1.Node/NodeUnstageVolume
I0216 22:10:34.562922       1 utils.go:96] GRPC request: {"staging_target_path":"/csi/staging/postgres-pgadmin/rw-file-system-single-node-writer","volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk"}
I0216 22:10:34.562980       1 nodeserver.go:175] NodeUnstageVolume: unmounting /csi/staging/postgres-pgadmin/rw-file-system-single-node-writer
W0216 22:10:34.563015       1 mount_helper_common.go:65] Warning: "/csi/staging/postgres-pgadmin/rw-file-system-single-node-writer" is not a mountpoint, deleting
I0216 22:10:34.563090       1 nodeserver.go:180] NodeUnstageVolume: unmount /csi/staging/postgres-pgadmin/rw-file-system-single-node-writer successfully
I0216 22:10:34.563111       1 utils.go:102] GRPC response: {}

controller logs

W0216 22:06:25.862314       1 main.go:59] nodeid is empty
I0216 22:06:25.863496       1 main.go:86] set up prometheus server on [::]:29604
I0216 22:06:25.863987       1 azuredisk.go:120] 
DRIVER INFORMATION:
-------------------
Build Date: "2021-02-10T12:51:38Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.1.0
Git Commit: 8bec51f963901f67684828886673ee48b5f333ad
Go Version: go1.15.4
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
W0216 22:06:25.864037       1 azure.go:48] get kubeconfig() failed with error: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
I0216 22:06:25.864049       1 azure.go:62] could not read cloud config from secret
I0216 22:06:25.864061       1 azure.go:65] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/azure.json
I0216 22:06:25.864086       1 azure.go:83] read cloud config from file: /etc/kubernetes/azure.json successfully
I0216 22:06:25.864548       1 azure_auth.go:232] Using AzurePublicCloud environment
I0216 22:06:25.864594       1 azure_auth.go:117] azure: using client_id+client_secret to retrieve access token
I0216 22:06:25.864642       1 azure_interfaceclient.go:68] Azure InterfacesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864655       1 azure_interfaceclient.go:71] Azure InterfacesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864686       1 azure_vmsizeclient.go:67] Azure VirtualMachineSizesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864695       1 azure_vmsizeclient.go:70] Azure VirtualMachineSizesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864713       1 azure_snapshotclient.go:68] Azure SnapshotClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864720       1 azure_snapshotclient.go:71] Azure SnapshotClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864735       1 azure_storageaccountclient.go:68] Azure StorageAccountClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864742       1 azure_storageaccountclient.go:71] Azure StorageAccountClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864753       1 azure_diskclient.go:68] Azure DisksClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864759       1 azure_diskclient.go:71] Azure DisksClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864769       1 azure_vmclient.go:68] Azure VirtualMachine client (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864775       1 azure_vmclient.go:71] Azure VirtualMachine client (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864785       1 azure_vmssclient.go:68] Azure VirtualMachineScaleSetClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864792       1 azure_vmssclient.go:71] Azure VirtualMachineScaleSetClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864802       1 azure_vmssvmclient.go:69] Azure vmssVM client (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864807       1 azure_vmssvmclient.go:72] Azure vmssVM client (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864823       1 azure_routeclient.go:66] Azure RoutesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864829       1 azure_routeclient.go:69] Azure RoutesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864839       1 azure_subnetclient.go:67] Azure SubnetsClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864845       1 azure_subnetclient.go:70] Azure SubnetsClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864855       1 azure_routetableclient.go:66] Azure RouteTablesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864882       1 azure_routetableclient.go:69] Azure RouteTablesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864894       1 azure_loadbalancerclient.go:68] Azure LoadBalancersClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864899       1 azure_loadbalancerclient.go:71] Azure LoadBalancersClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864923       1 azure_securitygroupclient.go:68] Azure SecurityGroupsClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864930       1 azure_securitygroupclient.go:71] Azure SecurityGroupsClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864940       1 azure_publicipclient.go:68] Azure PublicIPAddressesClient (read ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864946       1 azure_publicipclient.go:71] Azure PublicIPAddressesClient (write ops) using rate limit config: QPS=1, bucket=5
I0216 22:06:25.864991       1 azuredisk.go:130] disable UseInstanceMetadata for controller
I0216 22:06:25.865111       1 mount_linux.go:188] Detected OS without systemd
I0216 22:06:25.865120       1 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0216 22:06:25.865127       1 driver.go:80] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0216 22:06:25.865132       1 driver.go:80] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0216 22:06:25.865137       1 driver.go:80] Enabling controller service capability: LIST_SNAPSHOTS
I0216 22:06:25.865142       1 driver.go:80] Enabling controller service capability: CLONE_VOLUME
I0216 22:06:25.865148       1 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0216 22:06:25.865153       1 driver.go:80] Enabling controller service capability: LIST_VOLUMES
I0216 22:06:25.865163       1 driver.go:80] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I0216 22:06:25.865169       1 driver.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0216 22:06:25.865175       1 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0216 22:06:25.865181       1 driver.go:90] Enabling node service capability: EXPAND_VOLUME
I0216 22:06:25.865186       1 driver.go:90] Enabling node service capability: GET_VOLUME_STATS
I0216 22:06:25.865461       1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I0216 22:06:25.900813       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginInfo
I0216 22:06:25.900836       1 utils.go:96] GRPC request: {}
I0216 22:06:25.903134       1 utils.go:102] GRPC response: {"name":"disk.csi.azure.com","vendor_version":"v1.1.0"}
I0216 22:06:25.906002       1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0216 22:06:25.906020       1 utils.go:96] GRPC request: {}
I0216 22:06:25.906056       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
I0216 22:06:25.906985       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:06:25.907000       1 utils.go:96] GRPC request: {}
I0216 22:06:25.907032       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:06:41.479079       1 utils.go:95] GRPC call: /csi.v1.Controller/ValidateVolumeCapabilities
I0216 22:06:41.479106       1 utils.go:96] GRPC request: {"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":1}}],"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk"}
I0216 22:06:41.481821       1 utils.go:95] GRPC call: /csi.v1.Controller/ValidateVolumeCapabilities
I0216 22:06:41.481837       1 utils.go:96] GRPC request: {"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":1}}],"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
I0216 22:06:41.845760       1 utils.go:102] GRPC response: {}
I0216 22:06:41.847875       1 utils.go:102] GRPC response: {}
I0216 22:06:55.908317       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:06:55.908341       1 utils.go:96] GRPC request: {}
I0216 22:06:55.908404       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:07:25.916034       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:07:25.916062       1 utils.go:96] GRPC request: {}
I0216 22:07:25.916141       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:07:55.917517       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:07:55.917541       1 utils.go:96] GRPC request: {}
I0216 22:07:55.917602       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:08:25.918718       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:08:25.918743       1 utils.go:96] GRPC request: {}
I0216 22:08:25.918800       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:08:55.921791       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:08:55.921817       1 utils.go:96] GRPC request: {}
I0216 22:08:55.921870       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:09:25.924029       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:09:25.924056       1 utils.go:96] GRPC request: {}
I0216 22:09:25.924116       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:09:55.925371       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:09:55.925398       1 utils.go:96] GRPC request: {}
I0216 22:09:55.925461       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:10:25.929932       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:10:25.929961       1 utils.go:96] GRPC request: {}
I0216 22:10:25.930038       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:10:55.934224       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:10:55.934250       1 utils.go:96] GRPC request: {}
I0216 22:10:55.934305       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:11:25.936018       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:11:25.936045       1 utils.go:96] GRPC request: {}
I0216 22:11:25.936125       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:11:55.937830       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:11:55.937860       1 utils.go:96] GRPC request: {}
I0216 22:11:55.937936       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:12:25.939204       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:12:25.939228       1 utils.go:96] GRPC request: {}
I0216 22:12:25.939274       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:12:55.945315       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:12:55.945353       1 utils.go:96] GRPC request: {}
I0216 22:12:55.945426       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:13:25.947034       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:13:25.947057       1 utils.go:96] GRPC request: {}
I0216 22:13:25.947116       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:13:55.948563       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:13:55.948605       1 utils.go:96] GRPC request: {}
I0216 22:13:55.948684       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:14:25.950261       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:14:25.950340       1 utils.go:96] GRPC request: {}
I0216 22:14:25.950404       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:14:55.954980       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:14:55.955021       1 utils.go:96] GRPC request: {}
I0216 22:14:55.955078       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:15:25.956305       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:15:25.956336       1 utils.go:96] GRPC request: {}
I0216 22:15:25.956411       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:15:55.958881       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:15:55.958905       1 utils.go:96] GRPC request: {}
I0216 22:15:55.958963       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:16:25.960530       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:16:25.960588       1 utils.go:96] GRPC request: {}
I0216 22:16:25.960642       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:16:55.964671       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:16:55.964700       1 utils.go:96] GRPC request: {}
I0216 22:16:55.964776       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:17:25.966314       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:17:25.966351       1 utils.go:96] GRPC request: {}
I0216 22:17:25.966418       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:17:55.967540       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:17:55.967571       1 utils.go:96] GRPC request: {}
I0216 22:17:55.967640       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:18:25.969424       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:18:25.969477       1 utils.go:96] GRPC request: {}
I0216 22:18:25.969533       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:18:55.974176       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:18:55.974204       1 utils.go:96] GRPC request: {}
I0216 22:18:55.974265       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:19:25.975339       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:19:25.975364       1 utils.go:96] GRPC request: {}
I0216 22:19:25.975446       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:19:55.977504       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:19:55.977549       1 utils.go:96] GRPC request: {}
I0216 22:19:55.977650       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:20:25.979105       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:20:25.979127       1 utils.go:96] GRPC request: {}
I0216 22:20:25.979182       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:20:55.983490       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:20:55.983516       1 utils.go:96] GRPC request: {}
I0216 22:20:55.983569       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:21:25.984884       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:21:25.984919       1 utils.go:96] GRPC request: {}
I0216 22:21:25.984980       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:21:55.986380       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:21:55.986441       1 utils.go:96] GRPC request: {}
I0216 22:21:55.986541       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:22:25.987817       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:22:25.987846       1 utils.go:96] GRPC request: {}
I0216 22:22:25.987898       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:22:55.992533       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:22:55.992571       1 utils.go:96] GRPC request: {}
I0216 22:22:55.992654       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:23:25.994074       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:23:25.994096       1 utils.go:96] GRPC request: {}
I0216 22:23:25.994150       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:23:55.995655       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:23:55.995684       1 utils.go:96] GRPC request: {}
I0216 22:23:55.995740       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:24:25.997291       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:24:25.997331       1 utils.go:96] GRPC request: {}
I0216 22:24:25.997408       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:24:56.001293       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:24:56.001320       1 utils.go:96] GRPC request: {}
I0216 22:24:56.001375       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:25:26.002800       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:25:26.002825       1 utils.go:96] GRPC request: {}
I0216 22:25:26.002901       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:25:56.004522       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:25:56.004583       1 utils.go:96] GRPC request: {}
I0216 22:25:56.004636       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:26:26.006064       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:26:26.006105       1 utils.go:96] GRPC request: {}
I0216 22:26:26.006192       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:26:56.010787       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:26:56.010821       1 utils.go:96] GRPC request: {}
I0216 22:26:56.010877       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:27:26.012198       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:27:26.012232       1 utils.go:96] GRPC request: {}
I0216 22:27:26.012316       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:27:56.013738       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:27:56.013760       1 utils.go:96] GRPC request: {}
I0216 22:27:56.013835       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:28:26.016630       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:28:26.016692       1 utils.go:96] GRPC request: {}
I0216 22:28:26.016759       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:28:56.020407       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:28:56.020441       1 utils.go:96] GRPC request: {}
I0216 22:28:56.020495       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:29:26.022953       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:29:26.022979       1 utils.go:96] GRPC request: {}
I0216 22:29:26.023025       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:29:56.024025       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:29:56.024054       1 utils.go:96] GRPC request: {}
I0216 22:29:56.024154       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:30:26.025756       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:30:26.025779       1 utils.go:96] GRPC request: {}
I0216 22:30:26.025819       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:30:56.030539       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:30:56.030565       1 utils.go:96] GRPC request: {}
I0216 22:30:56.030615       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:31:26.031747       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:31:26.031778       1 utils.go:96] GRPC request: {}
I0216 22:31:26.031839       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}
I0216 22:31:56.032949       1 utils.go:95] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0216 22:31:56.032974       1 utils.go:96] GRPC request: {}
I0216 22:31:56.033030       1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":6}}},{"Type":{"Rpc":{"type":7}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":10}}}]}

Plugin status page
Screenshot 2021-02-15 at 15 18 13

@tgross
Copy link
Member

tgross commented Feb 17, 2021

I also need some guidance on this matter, have one controller , two or a node in every client.

You should run one node plugin (as a system job) on every client where you want to mount volumes. You should run at least one controller plugin. It's definitely possible the Azure plugin doesn't properly handle the HA case where there's more than one controller. That won't impact plugin counts for Nomad but it'll could cause confusion for the controller.

Here's a relevant bit of the node log. Note that the node plugin is running the controller services, which you don't want.

I0216 22:06:41.606257 1 driver.go:80] Enabling controller service capability: CREATE_DELETE_VOLUME
I0216 22:06:41.606273 1 driver.go:80] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I0216 22:06:41.606279 1 driver.go:80] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0216 22:06:41.606286 1 driver.go:80] Enabling controller service capability: LIST_SNAPSHOTS
I0216 22:06:41.606291 1 driver.go:80] Enabling controller service capability: CLONE_VOLUME
I0216 22:06:41.606297 1 driver.go:80] Enabling controller service capability: EXPAND_VOLUME
I0216 22:06:41.606303 1 driver.go:80] Enabling controller service capability: LIST_VOLUMES
I0216 22:06:41.606309 1 driver.go:80] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES

If we look at the GetPluginCapabilitiesResponse, we see that it's claiming to support the controller capabilities (the "type": 1 here):

I0216 22:06:41.655037 1 utils.go:95] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0216 22:06:41.655074 1 utils.go:96] GRPC request: {}
I0216 22:06:41.655115 1 utils.go:102] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}

Now, Nomad is only hitting it with Node RPCs, but that doesn't mean it's not confusing the plugin counts. It's going to be very challenging and not very useful to try to debug why we're getting the wrong counts for plugins if the plugins aren't reporting correctly to Nomad.

Then we continue a bit to where you've tried to claim a volume, and that's throwing errors. I'm not super familiar with Azure but it looks like you're trying to claim something for a VM that either doesn't exist or that the client doesn't have authorization for (depending on how the Azure API surfaces that):

I0216 22:06:42.080549 1 azure_vmclient.go:130] Received error in vm.get.request: resourceID: /subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/hashitalks-demo/providers/Microsoft.Compute/virtualMachines/35fef3f81654, error: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 404, RawError: {"error":{"code":"ResourceNotFound","message":"The Resource 'Microsoft.Compute/virtualMachines/35fef3f81654' under resource group 'hashitalks-demo' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"}}
I0216 22:06:42.080639 1 azure_wrap.go:193] Virtual machine "35fef3f81654" not found
W0216 22:06:42.080656 1 nodeserver.go:310] Failed to get zone from Azure cloud provider, nodeName: HashiTalks-Demo-nomad-client-2-vm, error: instance not found
I0216 22:06:42.080666 1 nodeserver.go:329] got a matching size in getMaxDataDiskCount, VM Size: STANDARD_D2S_V3, MaxDataDiskCount: 4
I0216 22:06:42.080678 1 utils.go:102] GRPC response: {"accessible_topology":{"segments":{"topology.disk.csi.azure.com/zone":""}},"max_volumes_per_node":4,"node_id":"HashiTalks-Demo-nomad-client-2-vm"}
I0216 22:09:03.169199 1 utils.go:95] GRPC call: /csi.v1.Node/NodeStageVolume
I0216 22:09:03.169231 1 utils.go:96] GRPC request: {"staging_target_path":"/csi/staging/postgres-sonar/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
E0216 22:09:03.169495 1 utils.go:100] GRPC error: rpc error: code = InvalidArgument desc = lun not provided

The controller plugin logs show that it's running the node service, so that's also a problem with the plugin config. But we don't see any Node RPCs hitting it so I think we're ok in terms of whether Nomad thinks it's a controller.

I0216 22:06:25.865175 1 driver.go:90] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I0216 22:06:25.865181 1 driver.go:90] Enabling node service capability: EXPAND_VOLUME
I0216 22:06:25.865186 1 driver.go:90] Enabling node service capability: GET_VOLUME_STATS

But I only ever see the validate volume capabilities RPCs getting hit on the controller. This implies that the plugin doesn't implement much of the controller RPC API; this sort of makes sense with what we're seeing in the logs: a lot of plugins with controllers do attachment in the controller RPC but it's actually a better workflow in my opinion to do that in the node RPC. So this part looks healthy.

I0216 22:06:41.479079 1 utils.go:95] GRPC call: /csi.v1.Controller/ValidateVolumeCapabilities
I0216 22:06:41.479106 1 utils.go:96] GRPC request: {"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":1}}],"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-pgadmin-disk"}
I0216 22:06:41.481821 1 utils.go:95] GRPC call: /csi.v1.Controller/ValidateVolumeCapabilities
I0216 22:06:41.481837 1 utils.go:96] GRPC request: {"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":1}}],"volume_id":"/subscriptions/02c7095c-b699-45c4-a1bf-c1bcb998e707/resourceGroups/HashiTalks-Demo/providers/Microsoft.Compute/disks/HashiTalks-Demo-postgres-sonar-disk"}
I0216 22:06:41.845760 1 utils.go:102] GRPC response: {}
I0216 22:06:41.847875 1 utils.go:102] GRPC response: {}

@carlosrbcunha
Copy link
Author

Then we continue a bit to where you've tried to claim a volume, and that's throwing errors. I'm not super familiar with Azure but it looks like you're trying to claim something for a VM that either doesn't exist or that the client doesn't have authorization for (depending on how the Azure API surfaces that):

This is because the node Plugin at first uses the name of the container and then it uses the node name to try match with the corresponding VM. I use a prefix and a suffix for my VM objects that is not present in the hostname of the client, that's why my parameter --nodeid its "--nodeid=${prefix}-$${attr.unique.hostname}-vm". If you see after the errors, it identified the correct VM as you can see here

I0216 22:06:42.080639 1 azure_wrap.go:193] Virtual machine "35fef3f81654" not found
W0216 22:06:42.080656 1 nodeserver.go:310] Failed to get zone from Azure cloud provider, nodeName: HashiTalks-Demo-nomad-client-2-vm, error: instance not found
I0216 22:06:42.080666 1 nodeserver.go:329] got a matching size in getMaxDataDiskCount, VM Size: STANDARD_D2S_V3, MaxDataDiskCount: 4

Regarding the node and controller starting advertising both capabilities, I will try to sort that out in the driver's GitHub page.
On the examples there, there is no specific flag I can use.

I tested this setup (1 controller + x nodes) and it won't start flawlessly every time I runned it. But if I restarted the controller node at the end, it will became healthy. Also with the sleep timer in the terraform , making sure that the nodes started, it became much more stable and worked 10/10 times I tested.

PS. I finished co-hosting a Hashitalks2021 presentation regarding this topic , hopefully we got people excited and will have more people using this. Thanks for the help !

@tgross
Copy link
Member

tgross commented Feb 17, 2021

It occurs to me that because the azuredisk plugin doesn't implement the controller attach RPC, it might be possible to run it in monolith mode as a system job. That way you wouldn't need the separate controller job and you wouldn't have to worry about collisions at all. But I'd verify that with the upstream developers for sure.

@tgross tgross moved this from In Progress to Needs Roadmapping in Nomad - Community Issues Triage Feb 26, 2021
@tgross tgross removed their assignment Feb 26, 2021
@tgross tgross moved this from Needs Roadmapping to In Progress in Nomad - Community Issues Triage Mar 4, 2021
@tgross tgross self-assigned this Mar 22, 2021
@tgross tgross moved this from In Progress to Needs Roadmapping in Nomad - Community Issues Triage May 17, 2021
@tgross tgross removed their assignment May 17, 2021
@tgross
Copy link
Member

tgross commented Jul 7, 2021

Haven't heard back on this one in a while, so I'm going to close it out. If you have new information on this specific issue, feel free to reopen. Otherwise go ahead and open a new issue. Thanks!

@tgross tgross closed this as completed Jul 7, 2021
Nomad - Community Issues Triage automation moved this from Needs Roadmapping to Done Jul 7, 2021
@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants