AWS EBS Volume Only Mounts Once #22433

Closed
stevesloka opened this Issue Mar 3, 2016 · 9 comments

@stevesloka
Contributor

stevesloka commented Mar 3, 2016

I have an EBS volume created. When I mount the volume to a pod it mounts successfully. If I then kill the pod and re-create it, the volume will not get attached. If I restart the kubelet on that node, the volume then mounts correctly again.

Interestingly enough, it did work as expected once. So maybe there's a timing issue somewhere. I am running this pod on the same node over and over.

Kube Version: 1.1.2 (GitVersion: 3085895)
OS: CoreOS alpha (899.1.0)
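
For reference, the pod in question uses an awsElasticBlockStore volume; a manifest roughly like the following reproduces it (the volume ID is a placeholder, the pod name matches the test-ebs pod in the logs below):

apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - name: test
    image: nginx
    volumeMounts:
    - name: ebs-volume
      mountPath: /data
  volumes:
  - name: ebs-volume
    awsElasticBlockStore:
      volumeID: vol-xxxxxxxx   # placeholder, substitute the real EBS volume ID
      fsType: ext4

Create the pod, delete it, then create it again on the same node (kubectl delete pod test-ebs, then kubectl create -f test-ebs.yaml); the second attach is the one that times out.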

Here are some logs:

aws.go:1022] Timeout waiting for volume state: actual=detached, desired=attached
aws.go:909] releaseMountDevice on non-allocated device
kubelet.go:1360] Unable to mount volumes for pod "test-ebs_default": Timeout waiting for volume state; skipping pod
pod_workers.go:112] Error syncing pod e27c9a3f-e146-11e5-ba43-12b3a7c90661, skipping: Timeout waiting for volume state
aws.go:876] Got assignment call for already-assigned volume: g@vol-d1ee9178

// cc @justinsb

@Morriz

Morriz commented Mar 19, 2016

Same here still...1.2.0 didn't solve it :(

@justinsb

Member

justinsb commented Mar 20, 2016

That is really surprising - 1.2.0 should really have solved it. Can you possibly post or send me the kubelet log from the node? (Volume mounts are done directly on the node, hence it is the kubelet log we need here.) Not sure about CoreOS; on other systemd systems it is in journalctl -u kubelet.
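
CoreOS also uses systemd/journald, so assuming the kubelet runs as a unit named kubelet.service (the unit name may differ depending on how it was installed), something like this should capture it:

journalctl -u kubelet.service --no-pager -b > kubelet.log   # dump the kubelet log since the last boot
journalctl -u kubelet.service -f                            # or follow it live while re-creating the pod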

@Morriz

Morriz commented Mar 20, 2016

I will when I get the chance (Sunday is a rest day). Restarting the kubelet is still the current workaround...

@Morriz

Morriz commented Mar 21, 2016

I got this:

Mar 21 18:30:36 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: W0321 18:30:36.960255     669 aws.go:876] Got assignment call for already-assigned volume: g@vol-fcdd3d24
Mar 21 18:30:55 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: W0321 18:30:55.542273     669 aws.go:1022] Timeout waiting for volume state: actual=detached, desired=attached
Mar 21 18:30:55 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:30:55.542301     669 aws.go:909] releaseMountDevice on non-allocated device
Mar 21 18:30:55 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:30:55.542324     669 kubelet.go:1360] Unable to mount volumes for pod "kube-registry-v2-0q0d4_kube-system": Timeout waiting for volume state; skipping pod
Mar 21 18:30:55 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:30:55.556273     669 pod_workers.go:112] Error syncing pod 9c36b757-ef92-11e5-9edd-02c4a6f332c1, skipping: Timeout waiting for volume state
Mar 21 18:31:39 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: W0321 18:31:39.717333     669 aws.go:1022] Timeout waiting for volume state: actual=detached, desired=attached
Mar 21 18:31:39 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:31:39.717366     669 aws.go:909] releaseMountDevice on non-allocated device
Mar 21 18:31:39 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:31:39.717391     669 kubelet.go:1360] Unable to mount volumes for pod "kube-registry-v2-fhomk_kube-system": Timeout waiting for volume state; skipping pod
Mar 21 18:31:39 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: E0321 18:31:39.738189     669 pod_workers.go:112] Error syncing pod 0182195a-ef93-11e5-9edd-02c4a6f332c1, skipping: Timeout waiting for volume state
Mar 21 18:31:39 ip-172-20-0-60.eu-central-1.compute.internal kubelet[669]: W0321 18:31:39.741915     669 aws.go:876] Got assignment call for already-assigned volume: g@vol-fcdd3d24
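
To see what the node itself thinks when this happens, a rough check on the node (device names are an assumption; the g@vol-... in the log suggests the EBS volume is assigned to something like /dev/xvdg):

lsblk               # list the block devices currently attached to the node
mount | grep xvd    # check whether the EBS device is still mounted anywhere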
@stevesloka

Contributor

stevesloka commented Mar 22, 2016

I just got my cluster upgraded to 1.2 and the EBS mounts seemed to work fine. I'm running on CoreOS:

Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"5cb86ee022267586db386f62781338b0483733b3", GitTreeState:"clean"}
@Morriz

Morriz commented Mar 22, 2016

I am targeting a 200 GB gp2 volume and managed to mount one. Another container is trying to mount a similar disk, but that fails with the same output as before 1.2.0; I have not seen any improvement since 1.2.0. Here's the describe pod output:

Name:       kube-registry-v2-sa98m
Namespace:  kube-system
Node:       ip-172-20-0-60.eu-central-1.compute.internal/172.20.0.60
Start Time: Tue, 22 Mar 2016 17:26:23 +0100
Labels:     k8s-app=kube-registry-v2,kubernetes.io/cluster-service=true,version=v2
Status:     Pending
IP:     
Controllers:    ReplicationController/kube-registry-v2
Containers:
  registry:
    Container ID:   
    Image:      registry:2.3.1
    Image ID:       
    Port:       5000/TCP
    QoS Tier:
      cpu:      BestEffort
      memory:       BestEffort
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Environment Variables:
      REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY:    /var/lib/registry
Conditions:
  Type      Status
  Ready     False 
Volumes:
  image-store:
    Type:   AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   vol-6751b3bf
    FSType: ext4
    Partition:  0
    ReadOnly:   false
  default-token-ftozw:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-ftozw
Events:
  FirstSeen LastSeen    Count   From                            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----                            -------------   --------    ------      -------
  1m        1m      1   {default-scheduler }                            Normal      Scheduled   Successfully assigned kube-registry-v2-sa98m to ip-172-20-0-60.eu-central-1.compute.internal
  35s       35s     1   {kubelet ip-172-20-0-60.eu-central-1.compute.internal}                  FailedMount Unable to mount volumes for pod "kube-registry-v2-sa98m_kube-system": Timeout waiting for volume state
  35s       35s     1   {kubelet ip-172-20-0-60.eu-central-1.compute.internal}                  FailedSync  Error syncing pod, skipping: Timeout waiting for volume state
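
To rule out the AWS side, the attachment state of the stuck volume can be checked directly; assuming the AWS CLI is configured, something like:

# vol-6751b3bf is the VolumeID from the describe output above
aws ec2 describe-volumes --volume-ids vol-6751b3bf --query 'Volumes[0].{State:State,Attachments:Attachments}' --output table

If the volume still shows as attached (or stuck in detaching) to the old instance, the problem is on the attach/detach side rather than in the mount itself.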
@Morriz

Morriz commented Mar 22, 2016

I found out that:

  • when k8s won't mount the volume on redeploy of a container, it won't free the disk, so
  • I have to force-unmount it,
  • which corrupts the disk (not nice at all!),
  • and I have to recreate the partition, which then works the first time (a recovery sketch follows below)
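
Before recreating the partition it may be worth running a filesystem check on the volume; a rough sketch, assuming the volume shows up on the node as /dev/xvdg (device name is an assumption) and is not mounted anywhere:

sudo e2fsck -f /dev/xvdg      # check and repair the ext4 filesystem (run only while unmounted)
sudo mount /dev/xvdg /mnt     # verify it mounts cleanly afterwards
sudo umount /mnt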
@stevesloka

Contributor

stevesloka commented Apr 10, 2016

I'm going to close this issue since things are working for me. If you still have trouble, @Morriz, it might be good to open your own issue for better tracking.

@stevesloka stevesloka closed this Apr 10, 2016

@Morriz

Morriz commented Apr 11, 2016

I can confirm everything works OK with our setup now too. The only noise left is caused by corrupt files, which are only a problem when re-mounting too soon. It's definitely worth noting in the docs though, as it's a common scenario...

@pmorie pmorie added the sig/storage label Jun 3, 2016
