AWS EBS Attaches but Pod Remains at Pending #11011

Closed
delianides opened this issue Jul 9, 2015 · 27 comments
Labels
area/os/coreos priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@delianides

Running the default cluster setup on AWS with CoreOS. Creating a mongo pod with no volume succeeds, but creating a pod with an attached EBS volume does not. The volume successfully attaches to the correct node, yet the pod remains in the Pending state.

Name:                           mongo
Namespace:                      default
Image(s):                       mongo:latest
Node:                           ip-172-20-0-194.ec2.internal/172.20.0.194
Labels:                         name=mongo,role=mongo
Status:                         Pending
Reason:
Message:
IP:
Replication Controllers:        <none>
Containers:
  mongo:
    Image:      mongo:latest
    Limits:
      cpu:              100m
    State:              Waiting
      Reason:           Image: mongo:latest is not ready on the node
    Ready:              False
    Restart Count:      0
Conditions:
  Type          Status
  Ready         False
Events:
  FirstSeen                             LastSeen                        Count   From                                    SubobjectPath   Reason          Message
  Thu, 09 Jul 2015 15:05:08 -0400       Thu, 09 Jul 2015 15:05:08 -0400 1       {scheduler }                                            scheduled       Successfully assigned mongo to ip-172-20-0-194.ec2.internal
  Thu, 09 Jul 2015 15:05:14 -0400       Thu, 09 Jul 2015 15:13:28 -0400 51      {kubelet ip-172-20-0-194.ec2.internal}                  failedMount     Unable to mount volumes for pod "mongo_default": fork/exec /usr/share/google/safe_format_and_mount: no such file or directory
  Thu, 09 Jul 2015 15:05:14 -0400       Thu, 09 Jul 2015 15:13:28 -0400 51      {kubelet ip-172-20-0-194.ec2.internal}                  failedSync      Error syncing pod, skipping: fork/exec /usr/share/google/safe_format_and_mount: no such file or directory

The template file:

kind: Pod
apiVersion: v1
metadata:
  name: mongo
  labels:
    name: mongo
    role: mongo
spec:
  containers:
  - name: mongo
    image: mongo:latest
    ports:
    - name: mongo
      containerPort: 27017
      protocol: TCP
    resources: {}
    volumeMounts:
    - mountPath: /data/db
      name: mongo-disk
  volumes:
  - name: mongo-disk
    awsElasticBlockStore:
      volumeID: vol-xxxxxxx # PR#10181
      fsType: ext4

I followed the new volumeID field name per #10181. There are no helpful error messages from the kubelet or controller-manager. I also moved up to larger EC2 instances rather than the default t2.micro, which made no difference.

@delianides delianides changed the title AWS EBS Attaches but Pod Remains at Waiting AWS EBS Attaches but Pod Remains at Pending Jul 9, 2015
@justinsb
Member

justinsb commented Jul 9, 2015

Thanks for posting the kubectl describe output - that is very helpful, particularly:

  Thu, 09 Jul 2015 15:05:14 -0400       Thu, 09 Jul 2015 15:13:28 -0400 51      {kubelet ip-172-20-0-194.ec2.internal}                  failedMount     Unable to mount volumes for pod "mongo_default": fork/exec /usr/share/google/safe_format_and_mount: no such file or directory
  Thu, 09 Jul 2015 15:05:14 -0400       Thu, 09 Jul 2015 15:13:28 -0400 51      {kubelet ip-172-20-0-194.ec2.internal}                  failedSync      Error syncing pod, skipping: fork/exec /usr/share/google/safe_format_and_mount: no such file or directory

CoreOS setup needs to add /usr/share/google/safe_format_and_mount. @bakins?

@bakins

bakins commented Jul 9, 2015

Well, /usr is read-only on CoreOS. Related: #7042 and #10017.

@vmarmol vmarmol added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. team/cluster labels Jul 10, 2015
@ghost

ghost commented Jul 10, 2015

#8530 will solve this, but post-1.0.

@delianides
Author

@swagiaal That's fine. But shouldn't it be a concern that the built binaries require a script that is only available if you run kube-up?

@justinsb
Member

@delianides It's actually Salt that installs it (along with all the binaries). In theory you could trigger that without using kube-up, though I don't know why you would want to! That said, I saw a suggestion in #8530 to move the functionality of the script into k8s itself; I think that would be an improvement here!

@galacto

galacto commented Jul 28, 2015

I am trying to set up a mongo pod and attach an AWS EBS volume to it. It succeeds when I create an individual pod, but fails when I use a ReplicationController.

I read this in the documentation: "Using a PD on a pod controlled by a ReplicationController will fail unless the PD is read-only or the replica count is 0 or 1."

I tried setting the replica count to 1 but had no luck. Any suggestions?
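For reference, a minimal sketch of the controller I'm describing, with replicas set to 1 per the docs (the volume ID and image are placeholders, not my real values), fed to kubectl create:

cat <<'EOF' | kubectl create -f -
kind: ReplicationController
apiVersion: v1
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    role: mongo
  template:
    metadata:
      labels:
        role: mongo
    spec:
      containers:
      - name: mongo
        image: mongo:latest
        ports:
        - containerPort: 27017
        volumeMounts:
        - mountPath: /data/db
          name: mongo-disk
      volumes:
      - name: mongo-disk
        awsElasticBlockStore:
          volumeID: vol-xxxxxxx   # placeholder: an existing EBS volume in the node's availability zone
          fsType: ext4
EOF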

@ghost

ghost commented Jul 31, 2015

@galacto could you open a new issue for this and cc me?

@galacto

galacto commented Jul 31, 2015

@swagiaal here it is #12100

@bliss

bliss commented Aug 26, 2015

Good day!
After some experimenting with EBS persistent storage, I have found that pod creation always gets stuck after attaching an EBS volume, with the message 'Image: nginx is ready, container is creating'. This is the config being fed to Kubernetes:

{
    "kind": "ReplicationController",
    "spec": {
        "selector": {
            "name": "testreplica"
        },
        "template": {
            "spec": {
                "containers": [{
                    "restartCount": 0,
                    "terminationMessagePath": null,
                    "name": "nginx1051066187104",
                    "limits": {
                        "cp": "0.01 Cores",
                        "memory": "64 MB"
                    },
                    "volumeMounts": [{
                        "readOnly": false,
                        "mountPath": "/usr/share/nginx/html",
                        "name": "usr-share-nginx-html51047551774"
                    }],
                    "image": "nginx",
                    "workingDir": "",
                    "imageID": "nginx",
                    "state": "stopped",
                    "command": [
                        "nginx", "-g",
                        "daemon off;"
                    ],
                    "resources": {
                        "requires": {
                            "cp": "0.01",
                            "memory": 67108864
                        },
                        "limits": {
                            "cp": "0.01",
                            "memory": 67108864
                        }
                    },
                    "env": [{
                        "name": "NGINX_VERSION",
                        "value": "1.9.4-1~jessie"
                    }],
                    "ready": false,
                    "startedAt": null,
                    "ports": [{
                        "isPublic": false,
                        "protocol": "TCP",
                        "containerPort": 80
                    },
                    {
                        "isPublic": false,
                        "protocol": "TCP",
                        "containerPort": 443
                    }],
                        "lastState": {},
                        "containerID": "nginx1051066187104"
                    }],
                    "volumes": [{
                        "awsElasticBlockStore": {
                            "fsType": "ext4",
                            "volumeID": "aws://us-west-2b/vol-51ad5fa5"
                        },
                        "name": "usr-share-nginx-html51047551774"
                    }],
                    "nodeSelector": {
                        "kube-type": "type_0"
                    }
                }, "metadata": {
                "labels": {
                    "name": "testreplica"
                }
            }
        },
        "replicas": 1
    },
    "apiVersion": "v1",
    "metadata": {
        "labels": {
            "name": "testreplica"
        },
        "namespace": "bliss-testreplica-893b2623bf0a61267a02ee2b7bec796b",
        "name": "testreplicapzon78a1m6d0u5ikcftb",
        "uid": "b592aee7-17e3-48df-b965-68a08e272da3"
    }
}

Such a pod without the EBS volume section (using emptyDir instead) successfully becomes Running. Is this a Kubernetes issue or a config issue? Thank you.
Kubernetes version:
kubelet --version: Kubernetes 1.0.3-0.1.git61c6ac5
kube-apiserver --version: Kubernetes 1.0.3-0.1.git61c6ac5

@justinsb
Member

@bliss are you running on CoreOS or Ubuntu?

If you kubectl get pods and run kubectl describe pod <podid> for the non-starting pod, you will see the events for that pod, which typically (hopefully) include an error message describing any launch problems. (I don't know if this also works on the RC, i.e. kubectl describe rc testreplica.)

If that doesn't include any useful information, the ultimate way of diagnosing is to:

  • kubectl get pods <podid> -ojson
  • see which hostIP it ended up on
  • do kubectl get nodes -ojson to correlate that to a public IP
  • SSH into that node (ssh -i ~/.ssh/kube_aws_rsa ubuntu@<ip>)
  • then look in /var/log/kubelet.log or journalctl -u kubelet to get the logs from the kubelet.

That is pretty painful though, which is why we try to surface relevant errors through events which show up in kubectl describe :-)
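Strung together, the flow looks roughly like this (pod name, node IP, and key path are placeholders):

kubectl describe pod <podid>                   # events usually show the mount/launch error
kubectl get pod <podid> -o json | grep hostIP  # which node did it land on?
kubectl get nodes -o json | grep -i address    # correlate the hostIP to a public IP
ssh -i ~/.ssh/kube_aws_rsa ubuntu@<public-ip>
journalctl -u kubelet                          # or: less /var/log/kubelet.log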

@bliss

bliss commented Aug 27, 2015

Thank you for your reply!
Here is the fragment from kubectl get pods -o json ... related to the pod in question, to confirm this is an EBS volume on an AWS EC2 instance setup:

    "metadata": {
        "name": "hello-podv2xyc817ldutfgwkb5rq-fw48a",
        "generateName": "hello-podv2xyc817ldutfgwkb5rq-",
        "namespace": "bliss-hello-pod-f1ea6bb7d84417e69b250dad4bb14ffd",
        "labels": {
            "name": "hello-pod"
        },
    },
    "spec": {
        "volumes": [
            {
                "name": "usr-share-nginx-html55991096452",
                "awsElasticBlockStore": {
                    "volumeID": "aws://us-west-2b/vol-ddbe4129",
                    "fsType": "ext4"
                }
            }
        ],

This is the last log message from kubectl describe pods for the pod in question:

  Thu, 27 Aug 2015 09:41:48 +0000       Thu, 27 Aug 2015 09:42:47 +0000 8       {kubelet ip-10-0-0-203.us-west-2.compute.internal}                   failedMount      Unable to mount volumes for pod "hello-podv2xyc817ldutfgwkb5rq-fw48a_bliss-hello-pod-f1ea6bb7d84417e69b250dad4bb14ffd": fork/exec /usr/share/google/safe_format_and_mount: no such file or directory

It seems pretty strange to look for the Google safe_format_and_mount script on an EC2 instance, doesn't it?
Moreover, there's no mention of Google utilities in the docs (https://github.com/kubernetes/kubernetes/blob/release-1.0/docs/user-guide/volumes.md#awselasticblockstore).

@bliss

bliss commented Aug 27, 2015

I found the safe_format_and_mount script and placed it in the required location. Thank goodness, with an EBS volume that had no filesystem yet, the pod started! Thank you!
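For anyone else hitting this, what I did boils down to copying the script into the path the kubelet error message expects (this assumes you already have a copy of safe_format_and_mount, e.g. from the Kubernetes source tree, and that /usr is writable, as it is on CentOS):

sudo mkdir -p /usr/share/google
sudo cp safe_format_and_mount /usr/share/google/safe_format_and_mount
sudo chmod +x /usr/share/google/safe_format_and_mount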

@justinsb
Member

Ah, well glad this is solved. That script should be installed for you if you're using kube-up. How did you install?

This script actually presents problems on CoreOS as well, so we're thinking of just integrating its functionality into kubelet itself (so there would be no external dependency, even if that dependency should be satisfied for you by the installation routine...)

@justinsb
Member

In fact, it looks like #8530, which integrates the functionality, has now been merged! Hooray!

@bliss

bliss commented Aug 27, 2015

We install Kubernetes from RPM packages and have our own deploy script for that, because we use CentOS while kube-up primarily targets Ubuntu-based setups.

@gambol99
Contributor

CoreOS doesn't have the file command, so file -L --special-files /dev/ won't work ... I've implemented a temporary hack though: an Alpine container with nothing more than apk-install file.

core@ip-10-50-0-119 /opt/bin $ ls -l file
-rwxr-xr-x 1 root root 88 Aug 29 14:55 file
core@ip-10-50-0-119 /opt/bin $ cat file 
#!/usr/bin/bash
docker run -v /dev:/dev --privileged=true --rm gambol99/filecmd file $@
core@ip-10-50-0-119 /opt/bin $ file -Ls /dev/xvdf
/dev/xvdf: Linux rev 1.0 ext4 filesystem data, UUID=9b6d546b-4273-44bb-8434-d46421da5723 (needs journal recovery) (extents) (large files) (huge files)
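The image itself is just Alpine with the file package installed; building it looks roughly like this (gambol99/filecmd is the tag from the hack above; apk add is the standard Alpine install command, used here instead of the older apk-install helper):

cat <<'EOF' > Dockerfile
FROM alpine
RUN apk add --update file
EOF
docker build -t gambol99/filecmd .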

@bliss

bliss commented Sep 5, 2015

Good day! This is still about AWS persistent drives. I found out that, e.g. for mariadb containers, the persistent drive mountpoints have an improper SELinux context. It prevents mariadb from writing to the drive and makes it crash. As for the safe_format_and_mount script, I made a workaround that solves the issue:

MOUNT_OPTIONS="discard,defaults,context=system_u:object_r:cgroup_t:s0"
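For reference, those options amount to mounting the EBS device with an explicit SELinux context, roughly like this (device and mountpoint are placeholders):

mount -o discard,defaults,context=system_u:object_r:cgroup_t:s0 /dev/xvdf /mnt/data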

But what happens to this issue once the script's functionality is built in?

@bcwaldon

I just ran into this on CoreOS with AWS myself. Today, CoreOS ships the kubelet directly in the root filesystem, but not safe_format_and_mount. Does it make sense for CoreOS to ship this script as well, or are there plans for this script to be implemented natively in Go? Alternatively, would it be better to move away from vendoring the kubelet in the image and run the kubelet using the hyperkube Docker image?
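For the last option, the rough shape would be something like the following (image tag, API server address, and mounted paths are illustrative, not a tested invocation):

docker run -d \
  --net=host --pid=host --privileged \
  -v /:/rootfs:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker:/var/lib/docker:rw \
  -v /var/lib/kubelet:/var/lib/kubelet:rw \
  -v /var/run:/var/run:rw \
  gcr.io/google_containers/hyperkube:v1.1.2 \
  /hyperkube kubelet --containerized --api-servers=http://<master-ip>:8080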

@philk

philk commented Sep 25, 2015

I'm pretty sure #8530 is the fix for it (so yes, implemented in Go directly in the kubelet).

@liquid-sky

Hi @bcwaldon, just wondering when you are planning to include the updated kubelet in the Alpha channel? It looks like v1.0.7 was released recently.

@bcwaldon

bcwaldon commented Nov 3, 2015

@liquid-sky v1.0.7 was marked as a "Pre-release" at https://github.com/kubernetes/kubernetes/releases until just recently, but now that it's marked as "Latest release", we can go ahead and bump it in the CoreOS image. I'll make sure it gets done ASAP.

@bcwaldon

bcwaldon commented Nov 3, 2015

@liquid-sky and just to clarify, this bug does not appear to be fixed in v1.0.7; the fix is in v1.1.0.

@liquid-sky

@bcwaldon, thanks a lot! I was really hoping that it was fixed in v1.0.7... Does it still try to call safe_format_and_mount?

@bcwaldon

bcwaldon commented Nov 3, 2015

@liquid-sky as far as I know, v1.0.7 still depends on the safe_format_and_mount script

@mgoodness

CoreOS 899.3.0 in the beta channel includes Kubelet v1.1.2 and resolved this issue for me.

@thockin
Member

thockin commented Apr 25, 2016

Is this still an open issue? @rootfs

@justinsb
Member

justinsb commented Jun 5, 2016

I think/assume this is now fixed so I'm going to close; please reopen if not

@justinsb justinsb closed this as completed Jun 5, 2016