Pet Set stuck in ContainerCreating #28709
Hmm, that event fired 9100 times; you're probably hitting one of the races we fixed in the last phases of 1.3. Can you confirm through the GCE UI that the PD kubernetes-dynamic-pvc-a5f45f53-38f0-11e6-b7d0-42010a800002 is attached to the node kubernetes-minion-group-0jg9?
It is attached according to the GCE UI: kubernetes-dynamic-pvc-a5f45f53-38f0-11e6-b7d0-42010a800002 (SSD persistent disk, 380 GB, us-central1-c, attached to kubernetes-minion-group-0jg9).
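The same attachment check can be scripted instead of done through the UI. A minimal sketch, assuming you have the disk resource as JSON (for example from `gcloud compute disks describe DISK_NAME --zone ZONE --format=json`): Compute Engine lists the instances a disk is attached to in the disk's `users` field as instance URLs. The function name and the `my-project` project in the sample URL are hypothetical.

```python
import json


def disk_attached_to(disk: dict, instance_name: str) -> bool:
    """Return True if the GCE disk resource lists instance_name among its users.

    `disk` is the parsed JSON of a Compute Engine disk resource. Its optional
    "users" field holds URLs of attached instances, ending in ".../instances/<name>".
    """
    for user_url in disk.get("users", []):
        # The instance name is the last path segment of the URL.
        if user_url.rstrip("/").rsplit("/", 1)[-1] == instance_name:
            return True
    return False


if __name__ == "__main__":
    # Illustrative input shaped like `gcloud compute disks describe ... --format=json`.
    sample = json.loads("""{
      "name": "kubernetes-dynamic-pvc-a5f45f53-38f0-11e6-b7d0-42010a800002",
      "users": ["https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-c/instances/kubernetes-minion-group-0jg9"]
    }""")
    print(disk_attached_to(sample, "kubernetes-minion-group-0jg9"))  # True
```

Matching on the last URL segment (rather than a substring search) avoids false positives between nodes whose names share a prefix.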
Nudge: how can we debug this?
@bprashanth ping... I want to kill this cluster. Do we need to debug this at all?
I see the same behavior when a ConfigMap is mounted as a volume to read some config. K8s creates a service-account-token volume, which appears in the list of unattached/unmounted volumes after the pods are restarted.
I think these are two different bugs. The former is a bug in the PV provisioner that we fixed in later stages of 1.3 (I believe Chris was running with a beta build). @iazarny, can you give me a PetSet YAML to repro?
Hi, I ran into a similar problem after I deleted a pod and restarted it.
See debug information here.
How can it be solved? Thanks.
@ShengjieLuo You are likely hitting #28750. PR #28939 should help (the patch will be in k8s v1.3.4).
I am experiencing this problem too...
The weekend has passed and I tried deleting the pod today, but it is still the same...
@bprashanth how do we debug this further? @Hokutosei what version are you using, and what cloud are you on?
@chrislovecnm thanks for the reply. I am using GKE (Google Container Engine), and both master and nodes are using
I'm seeing the same error (or at least the same error message) in GKE after upgrade to 1.3.5. Created the PetSet in 1.3.4 using https://github.com/Yolean/kubernetes-mysql-cluster/blob/master/50mariadb.yml. Pods restarted fine in 1.3.4. Is there any way to get more details on the error
What happens if the PetSet resource is replaced, for example to change env or arguments? How does K8s match the volumes? Edit: Works now, after petset+pod deletion and new
Also got this issue with both v1.3.4 and v1.3.5; I originally reported it in https://googlecloudplatform.uservoice.com/forums/302595-compute-engine/suggestions/15838738-timeout-expired-waiting-for-volumes-to-attach-moun . Happens using both Pod has been stuck in this error state for 4 days, with 2631 repeated errors in the process. I will be forced to recreate the pod. :(
(unfortunately UserVoice didn't preserve text indentation. I don't know who at Google suggested UserVoice over StackOverflow to report issues, but I can tell it's a terrible idea) I have:
... problem persists (tell me what I haven't done?). Here's the current state:
(Please ignore Nobody is using the "disk" (I'd argue it's a drive since it's SSD):
Tried recreating
I have a gut feeling though this has something to do with missing
@chrislovecnm @Hokutosei I think the issue title needs to be renamed to mention that it's originally Pet Set, but potentially not limited to Pet Set, but also Unfortunately it's not because of missing
Latest
@ceefour Thanks for providing logs. @Hokutosei @solsson Check your
I'm running into the same issue on AWS (1.3.4 CoreOS using kube-aws). Events report...
Worker journal...
Petset volumeClaimTemplates...
AWS shows the volume properly mounted on the instance. Any ideas/hints on what might be going wrong? :)
Has anyone recreated this? Probably going to close.
I managed to bump into the same issue while going through this tutorial: https://cloud.google.com/endpoints/docs/quickstart-container-engine, but I don't see any logs at /var/logs/kubelet.log on the cluster VM. {kubelet gke-rec-service-cluster-default-pool-28938bf0-c9vz} Warning FailedMount Unable to mount volumes for pod "esp-product-recommendations-3552363475-216am_default(bba10941-9c65-11e6-becb-42010af00224)": timeout expired waiting for volumes to attach/mount for pod "esp-product-recommendations-3552363475-216am"/"default". list of unattached/unmounted volumes=[nginx-ssl]
OK, please ignore my last post. I finally understood that I was using the wrong config and didn't know that you need to create those secrets beforehand.
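When this timeout comes from a Secret that was never created, as in the tutorial case above, the kubelet event already names the stuck volumes. A minimal sketch that pulls those names out of the FailedMount message and cross-checks them against Secrets you know exist; the function names and the volume-to-Secret mapping are hypothetical inputs you would collect yourself from the pod spec and from `kubectl get secrets`.

```python
import re
from typing import Dict, List, Set

# Matches the tail of a kubelet FailedMount message such as:
#   ... list of unattached/unmounted volumes=[nginx-ssl]
VOLUMES_RE = re.compile(r"list of unattached/unmounted volumes=\[([^\]]*)\]")


def unmounted_volumes(message: str) -> List[str]:
    """Return the volume names listed in a FailedMount event message."""
    match = VOLUMES_RE.search(message)
    if match is None:
        return []
    # Assumption: several names are space-separated inside the brackets,
    # which is how Go's %v formatting prints a string slice.
    return match.group(1).split()


def missing_secrets(message: str,
                    volume_to_secret: Dict[str, str],
                    existing_secrets: Set[str]) -> List[str]:
    """Return Secrets referenced by unmounted volumes that do not exist yet.

    volume_to_secret maps a pod volume name to the Secret it references;
    existing_secrets holds the Secret names present in the namespace.
    """
    missing = []
    for volume in unmounted_volumes(message):
        secret = volume_to_secret.get(volume)
        if secret is not None and secret not in existing_secrets:
            missing.append(secret)
    return missing


if __name__ == "__main__":
    msg = ('timeout expired waiting for volumes to attach/mount for pod '
           '"esp-product-recommendations-3552363475-216am"/"default". '
           'list of unattached/unmounted volumes=[nginx-ssl]')
    print(unmounted_volumes(msg))  # ['nginx-ssl']
    # Hypothetical mapping: volume "nginx-ssl" backed by a Secret of the same name.
    print(missing_secrets(msg, {"nginx-ssl": "nginx-ssl"}, set()))  # ['nginx-ssl']
```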
Closing this |
This 'stuck attaching' problem is a common issue when using k8s with AWS and GCE (at least). On AWS it appears to be an AWS problem, triggered by the fast unmounting and remounting of EBS volumes; you see an error in the AWS console for the problem volume. One suggestion (based on the AWS FAQ) is that it is caused by mounting a volume using the same device name as one you detached only a very short time ago.
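If that explanation holds, one mitigation is to avoid handing out a device name that was released very recently. A minimal sketch of such a cooldown-based allocator, assuming a fixed pool of candidate names; the class, the device names, and the 60-second window are hypothetical and are not part of Kubernetes or the AWS API.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set


class DeviceNameAllocator:
    """Hands out device names, refusing to reuse a name released less than
    `cooldown` seconds ago. Hypothetical helper for illustration only."""

    def __init__(self, names: Iterable[str], cooldown: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._names = list(names)
        self._cooldown = cooldown
        self._clock = clock
        self._in_use: Set[str] = set()
        self._released_at: Dict[str, float] = {}

    def allocate(self) -> Optional[str]:
        now = self._clock()
        for name in self._names:
            if name in self._in_use:
                continue
            released = self._released_at.get(name)
            if released is not None and now - released < self._cooldown:
                continue  # detached too recently; skip it for now
            self._in_use.add(name)
            return name
        return None  # every name is in use or still cooling down

    def release(self, name: str) -> None:
        """Mark a name as detached and start its cooldown."""
        self._in_use.discard(name)
        self._released_at[name] = self._clock()


if __name__ == "__main__":
    alloc = DeviceNameAllocator(["/dev/xvdba", "/dev/xvdbb"], cooldown=60)
    first = alloc.allocate()   # "/dev/xvdba"
    alloc.release(first)
    print(alloc.allocate())    # "/dev/xvdbb": xvdba is still cooling down
```

The injectable `clock` makes the cooldown testable without sleeping; the trade-off is that a small pool can be exhausted (returning `None`) while names cool down.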
A previously healthy pod was running. It got unhealthy for some reason, and I did a silly thing. Deleted the pod. Now it is stuck:
Can we document the process of getting this unstuck? I am guessing this is an error, because the PV has data in it. Here is the PV.
FYI I am on 1.3 beta without the last minute volume fix stuff. I can try to recreate with the new volume code, but I don't think it will make a difference.