Pods fail to start after server reboot #19478

@ustm

Description

I run Origin 3.7 with the master and the node on the same server.
After a server reboot, all pods fail to start, with the following errors in the origin-node logs.
The only workaround I have found is to reset and recreate the Docker storage. Is there a way to avoid or fix this bug?

Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.140764    2616 cni.go:304] Error deleting network when building cni runtime conf: could not retrieve port mappings: checkpoint is not found.
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141255    2616 remote_runtime.go:114] StopPodSandbox "d39009bb25733767895364d47c4c7b156df4c58b6e3f512a3894a08890987f11" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "hawkular-cassandra-1-p6lzt_openshift-infra" network: could not retrieve port mappings: checkpoint is not found.
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141340    2616 kuberuntime_manager.go:775] Failed to stop sandbox {"docker" "d39009bb25733767895364d47c4c7b156df4c58b6e3f512a3894a08890987f11"}
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141402    2616 remote_runtime.go:114] StopPodSandbox "75112e3ea11bdd0d2714fcd5afb1dd1f8ab69561ca2a980f4d6053fa073d27f7" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "controller-manager-fs9br_kube-service-catalog" network: could not retrieve port mappings: checkpoint is not found.
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141440    2616 kuberuntime_manager.go:775] Failed to stop sandbox {"docker" "75112e3ea11bdd0d2714fcd5afb1dd1f8ab69561ca2a980f4d6053fa073d27f7"}
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141496    2616 kuberuntime_manager.go:570] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7ddf9c90-472a-11e8-a67e-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"controller-manager-fs9br_kube-service-catalog\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141539    2616 pod_workers.go:186] Error syncing pod 7ddf9c90-472a-11e8-a67e-005056827a01 ("controller-manager-fs9br_kube-service-catalog(7ddf9c90-472a-11e8-a67e-005056827a01)"), skipping: failed to "KillPodSandbox" for "7ddf9c90-472a-11e8-a67e-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"controller-manager-fs9br_kube-service-catalog\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141491    2616 kuberuntime_manager.go:570] killPodWithSyncResult failed: failed to "KillPodSandbox" for "db907be0-46f8-11e8-b5d9-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"hawkular-cassandra-1-p6lzt_openshift-infra\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141675    2616 pod_workers.go:186] Error syncing pod db907be0-46f8-11e8-b5d9-005056827a01 ("hawkular-cassandra-1-p6lzt_openshift-infra(db907be0-46f8-11e8-b5d9-005056827a01)"), skipping: failed to "KillPodSandbox" for "db907be0-46f8-11e8-b5d9-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"hawkular-cassandra-1-p6lzt_openshift-infra\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141749    2616 remote_runtime.go:114] StopPodSandbox "e8573132f565875633fc034ca16296030f989350290800271b210400f1b8212b" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "nodejs-mongodb-example-8-5n2jz_test1" network: could not retrieve port mappings: checkpoint is not found.
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141845    2616 kuberuntime_manager.go:775] Failed to stop sandbox {"docker" "e8573132f565875633fc034ca16296030f989350290800271b210400f1b8212b"}
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141896    2616 kuberuntime_manager.go:570] killPodWithSyncResult failed: failed to "KillPodSandbox" for "22fe7e4e-46fb-11e8-b5d9-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"nodejs-mongodb-example-8-5n2jz_test1\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:20 os-test3 origin-node[2542]: E0423 22:24:20.141941    2616 pod_workers.go:186] Error syncing pod 22fe7e4e-46fb-11e8-b5d9-005056827a01 ("nodejs-mongodb-example-8-5n2jz_test1(22fe7e4e-46fb-11e8-b5d9-005056827a01)"), skipping: failed to "KillPodSandbox" for "22fe7e4e-46fb-11e8-b5d9-005056827a01" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"nodejs-mongodb-example-8-5n2jz_test1\" network: could not retrieve port mappings: checkpoint is not found."
Apr 23 22:24:21 os-test3 origin-node[2542]: I0423 22:24:21.152335    2616 kuberuntime_manager.go:389] No ready sandbox for pod "hawkular-metrics-qlwst_openshift-infra(f1ab2b33-46f8-11e8-b5d9-005056827a01)" can be found. Need to start a new one

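
To see which pods are affected without reading every entry, the teardown failures in a log like the one above can be summarized with a short script. This is only a sketch: the function name `affected_pods` is hypothetical, the pattern is derived solely from the messages quoted in this report, and it assumes each log entry sits on a single line (not hard-wrapped by a terminal) and that pods appear as `<name>_<namespace>`.

```python
import re

# Pattern taken from the error text quoted in this issue. The pod name appears
# either as "name_namespace" or, inside a quoted error string, as \"name_namespace\".
TEARDOWN_RE = re.compile(
    r'NetworkPlugin cni failed to teardown pod \\?"(?P<pod>[^"\\]+)\\?" network: '
    r'could not retrieve port mappings: checkpoint is not found'
)


def affected_pods(log_text):
    """Return sorted, de-duplicated (namespace, pod_name) pairs whose network
    teardown failed because the checkpoint was not found."""
    pairs = set()
    for match in TEARDOWN_RE.finditer(log_text):
        # Assumption: the logged identifier has the form <pod_name>_<namespace>.
        name, _, namespace = match.group("pod").rpartition("_")
        pairs.add((namespace, name))
    return sorted(pairs)
```

One way to use it is to save the node log to a file, read the file's contents into a string, and print each pair returned by `affected_pods`.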
Version

openshift v3.7.1+282e43f-42
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

Steps To Reproduce
  1. Reboot the server.
Current Result
Expected Result
Additional Information

Labels

kind/question, lifecycle/rotten, sig/pod