This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

provision volume failed with kubernetes v1.9.0 #502

Closed
svasseur opened this issue Dec 22, 2017 · 17 comments

@svasseur

When I add a volume with vsphere-volume, I always get the error "AttachVolume.Attach failed for volume "xxxx" : 404 Not Found".
It works like a charm in v1.6.5.

I tried to create a VMDK volume:
vmkfstools -c 10G /vmfs/volumes/Datastore-12/volumes/kubeVolume.vmdk
and then deployed the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd3
  namespace: default
  labels:
    app: httpd3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd3
  template:
    metadata:
      labels:
        app: httpd3
    spec:
      containers:
        - name: httpd3
          image: httpd:alpine
          ports:
            - name: http
              containerPort: 80
          volumeMounts:
            - name: httpd3-persistent-storage
              mountPath: /usr/local/apache2/htdocs/
      volumes:
      - name: httpd3-persistent-storage
        vsphereVolume:
          volumePath: "[Datastore-12] kubeVolume"
          fsType: ext4

kubectl describe pod:

Normal   Scheduled              2m               default-scheduler          Successfully assigned httpd3-74754b7fdc-p8j26 to kubernetes-node4
  Normal   SuccessfulMountVolume  2m               kubelet, kubernetes-node4  MountVolume.SetUp succeeded for volume "default-token-sfr9w"
  Warning  FailedMount            1m (x8 over 2m)  attachdetach-controller    AttachVolume.Attach failed for volume "httpd3-persistent-storage" : 404 Not Found

I also tried with a StatefulSet using dynamic provisioning:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: thin-disk
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: httpd1 # has to match .spec.template.metadata.labels
  serviceName: "httpd1"
  replicas: 1 # by default is 1
  template:
    metadata:
      labels:
        app: httpd1 # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: httpd1
        image: httpd:alpine
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs/
  volumeClaimTemplates:
  - metadata:
      name: www
      annotations:
        volume.beta.kubernetes.io/storage-class: thin-disk
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

kubectl describe pvc

Events:
  Type     Reason              Age                From                         Message
  ----     ------              ----               ----                         -------
  Warning  ProvisioningFailed  1m (x321 over 1h)  persistentvolume-controller  Failed to provision volume with StorageClass "thin-disk": 404 Not Found
@divyenpatel
Contributor

@svasseur
Can you share your vSphere conf file? Is the vCenter server configured on the default port 443?
Are you able to ping the vCenter server from the Kubernetes master node?

@svasseur
Author

svasseur commented Dec 26, 2017

Yes, I can ping my vSphere server from the master node, but my vCenter port is 9443.
Here is my config:

# Phase 1: Cluster Resource Provisioning
#
.phase1.num_nodes=4
.phase1.cluster_name="kubernetes"
.phase1.ssh_user=""
.phase1.cloud_provider="vsphere"

#
# vSphere configuration
#
.phase1.vSphere.url="172.20.19.12"
.phase1.vSphere.port=9443
.phase1.vSphere.username="xxxxxxx"
.phase1.vSphere.password="xxxxxxx"
.phase1.vSphere.insecure=y
.phase1.vSphere.datastore="Datastore-12"
.phase1.vSphere.placement="cluster"
.phase1.vSphere.useresourcepool="no"
.phase1.vSphere.vmfolderpath="kubernetes"
.phase1.vSphere.vcpu=4
.phase1.vSphere.memory=8096
.phase1.vSphere.network="Prod"
.phase1.vSphere.template="Templates/KubernetesAnywhereTemplatePhotonOS"
.phase1.vSphere.flannel_net="172.1.0.0/16"

#
# Phase 2: Node Bootstrapping
#
.phase2.kubernetes_version="v1.9.0"
.phase2.provider="ignition"
.phase2.installer_container="docker.io/cnastorage/k8s-ignition:v1.8-dev-release"
.phase2.docker_registry="gcr.io/google-containers"

#
# Phase 3: Deploying Addons
#
.phase3.run_addons=y
.phase3.kube_proxy=y
.phase3.dashboard=y
.phase3.heapster=y
.phase3.kube_dns=y
# .phase3.weave_net is not set

@divyenpatel
Contributor

@svasseur I see a problem in the Deployment YAML with the statically created volume (the one created using vmkfstools).

Can you use the following volume path?

volumePath: "[Datastore-12] volumes/kubeVolume.vmdk"

The 404 is a different issue that we still need to debug.

In the above config, .phase1.vSphere.cluster is missing. I guess you have specified some valid cluster there.

Also check the /etc/kubernetes/vsphere.conf file and make sure all vCenter-specific parameters are set correctly.

@svasseur
Author

Sorry for not including all the parameters; that's because the same setup works in v1.6.5.
Here is the complete config:

.phase1.vSphere.url="172.20.19.12"
.phase1.vSphere.port=9443
.phase1.vSphere.username="xxxxxx"
.phase1.vSphere.password="xxxxx"
.phase1.vSphere.insecure=y
.phase1.vSphere.datacenter="Norsys-DC"
.phase1.vSphere.datastore="Datastore-12"
.phase1.vSphere.placement="cluster"
.phase1.vSphere.cluster="Norsys-CL"
.phase1.vSphere.useresourcepool="no"
.phase1.vSphere.vmfolderpath="kubernetes"
.phase1.vSphere.vcpu=4
.phase1.vSphere.memory=8096
.phase1.vSphere.network="Prod"
.phase1.vSphere.template="Templates/KubernetesAnywhereTemplatePhotonOS"
.phase1.vSphere.flannel_net="172.1.0.0/16"

And I do see the volume created with vmkfstools (that was a wrong copy/paste).

/etc/kubernetes/vsphere.conf

[Global]
        user = "xxxxx"
        password = "xxxxxx"
        server = "172.20.19.12"
        port = "9443"
        insecure-flag = "true"
        datacenter = "Norsys-DC"
        datastore = "Datastore-12"
        working-dir = "kubernetes"
[Disk]
	scsicontrollertype = pvscsi

And the problem also happens with the StatefulSet and dynamic provisioning (the same YAML works in 1.6.5).

@ghost

ghost commented Jan 17, 2018

Hi everybody,
I need your help (@divyenpatel, you seem to be the expert for vSphere).
I have a similar issue with v1.9.0, whereas it works perfectly well with v1.6.5.
I am creating a storage class with the following YAML file:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: medhub-sc-default
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: zeroedthick
    fstype:     ext3
    datastore:  Shared Storages/pcc-006537

Then I try to create a PVC using this storage class:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvcsc001
  annotations:
    volume.beta.kubernetes.io/storage-class: medhub-sc-default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi

At the start the error was the following:
Failed to provision volume with StorageClass "medhub-sc-default": NotAuthenticated

I decided to restart all the nodes and then the error changed to the following:
Failed to provision volume with StorageClass "medhub-sc-default": The specified datastore Shared Storages/pcc-006537 is not a shared datastore across node VMs

Datastores are of course shared, so I do not understand this error.

Thanks for the help!

@divyenpatel
Contributor

Failed to provision volume with StorageClass "medhub-sc-default": The specified datastore Shared Storages/pcc-006537 is not a shared datastore across node VMs
Datastores are of course shared, so I do not understand this error.

@maximematheron we are working on the fix. You can track this issue: vmware-archive/kubernetes-archived#436

In the meantime, if you want to unblock yourself while trying out 1.9.0, you can move the datastore pcc-006537 to the root storage folder (/).
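
A sketch of the storage class parameters that would go with this workaround, assuming the datastore keeps the name pcc-006537 after the move:

parameters:
    diskformat: zeroedthick
    fstype:     ext3
    # after the move to the root folder, reference the datastore by name only
    datastore:  pcc-006537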

At the start the error was the following:
Failed to provision volume with StorageClass "medhub-sc-default": NotAuthenticated

Regarding this issue, we have the fix merged into the 1.9 branch: kubernetes/kubernetes#58124
The 1.9.2 release with this fix should be out very soon.

@ghost

ghost commented Jan 17, 2018

Thanks for the response, @divyenpatel!

I just specified the datastore without the folder name and it worked. Basically, my SC YAML is the following:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: medhub-sc-default
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: zeroedthick
    fstype:     ext3
    datastore:  pcc-006537

It is really weird that with v1.6.5 I had to specify the folder.

What is also really weird is that, for both versions, I need to restart all the nodes at least once before PVC creation succeeds. Do you know why this is happening?

Maxime

@divyenpatel
Contributor

I just specified the Datastore without the folder name and it worked. Basically, my SC yaml is the following:

@maximematheron thank you for sharing the workaround. I was thinking of another approach, but your suggestion to change the YAML looks good. I have also tried this out end to end (provisioning a pod using the PVC) for a shared datastore located in a datastore cluster, with just the storage class fix. Everything worked fine.

It is really weird that with the v1.6.5 I had to specify the folder.

This is happening because in 1.9 we check the datastore name the user has specified in the storage class against the shared datastores we have queried from vCenter, comparing just the names. See https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/cloudprovider/providers/vsphere/vsphere.go#L1064
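
In other words, with 1.9 the datastore parameter has to be the bare datastore name; a folder-qualified value will not match. For example:

parameters:
    datastore: pcc-006537
    # a folder-qualified value such as "Shared Storages/pcc-006537" does not match,
    # because only the bare name is compared against the shared datastore list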

I need to restart all the nodes at least one time to make it successful (to create PVC). Do you know why it is happening?

With this fix (kubernetes/kubernetes#58124) we are making sure vSphere connections are renewed when they have timed out, so from 1.9.2 you will not need to restart the nodes.

@ghost

ghost commented Jan 18, 2018

Thanks @divyenpatel !

So do you think I should use the v1.9.2-beta.0 version with your Docker image docker.io/divyen/etcd3-ignition:latest?

I have built kubectl from your commit (my cluster is still built with the v1.9.0 release) and the error is still the same: I have to restart my cluster all the time. Actually, the same error exists with v1.6.5. I can create PVCs, but then Pods cannot see them, and I have to restart the cluster again and again.

Should I create the cluster using the kubectl built from your commit?

Maxime

@divyenpatel
Contributor

@maximematheron
I would suggest trying a hyperkube image built from the release-1.9 branch.
If the issue persists, you should create a new issue at https://github.com/vmware/kubernetes/issues

Here is the official community support (Slack and email alias): https://vmware.github.io/hatchway/#support

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 18, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 18, 2018
@calston

calston commented May 30, 2018

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 30, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 28, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 27, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
