Ambassador failed to start up #811

Closed
jiaanguo opened this issue May 15, 2018 · 8 comments

jiaanguo commented May 15, 2018

I have built a local cluster with Kubeflow v0.1.2 running on Kubernetes 1.10.2.
Everything works fine except that the Ambassador container within the Ambassador pod never manages to start up.

kubernetes@local-cluster-0:~/my-kubeflow$ kubectl get pods -n kubeflow
NAME                                READY     STATUS             RESTARTS   AGE
ambassador-849fb9c8c5-2d5wv         1/2       CrashLoopBackOff   31         2h
ambassador-849fb9c8c5-mjmgx         1/2       CrashLoopBackOff   31         2h
ambassador-849fb9c8c5-zm8rw         1/2       CrashLoopBackOff   31         2h
tf-hub-0                            1/1       Running            0          2h
tf-job-dashboard-7b57c549c8-tmngv   1/1       Running            0          2h
tf-job-operator-594d8c7ddd-tw28k    1/1       Running            0          2h
kubernetes@local-cluster-0:~/my-kubeflow$ kubectl describe pod ambassador-849fb9c8c5-2d5wv -n kubeflow
ambassador:
    Container ID:   docker://f202713d1d9f466c942d79222411af5a7202ae03f83778caab2f4dc9f3183568
    Image:          quay.io/datawire/ambassador:0.30.1
    Image ID:       docker-pullable://quay.io/datawire/ambassador@sha256:652d3f1de1055624f035eba33299babc7c3f30a0a78c1f206730aa31bed96bf6
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 15 May 2018 13:07:19 +1000
      Finished:     Tue, 15 May 2018 13:09:19 +1000
    Ready:          False
    Restart Count:  31
    Limits:
      cpu:     1
      memory:  400Mi
    Requests:
      cpu:      200m
      memory:   100Mi
    Liveness:   http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=30s #success=1 #failure=3
    Readiness:  http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=30s #success=1 #failure=3
    Environment:
      AMBASSADOR_NAMESPACE:         kubeflow (v1:metadata.namespace)
      AMBASSADOR_SINGLE_NAMESPACE:  true
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from ambassador-token-bwzbp (ro)

Any suggestions to fix the issue? Thanks in advance!

pdmack commented May 15, 2018

Probably need to see the events:
kubectl get events -n kubeflow | grep ambassador

Can you try just running it directly and see what it logs?
docker run quay.io/datawire/ambassador:0.30.1

Exit code 137 means SIGKILL (128 + 9), I believe; on Kubernetes that usually means the kubelet killed the container after failed liveness probes, or the OOM killer stepped in.
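
A quick way to confirm what killed the container is to read its last terminated state straight from the pod status (a sketch, using one of the pod names from the listing above):

kubectl get pod ambassador-849fb9c8c5-2d5wv -n kubeflow \
  -o jsonpath='{.status.containerStatuses[?(@.name=="ambassador")].lastState.terminated}'
# Prints the exit code, reason, and start/finish times; exit code 137 with
# reason "OOMKilled" vs. "Error" distinguishes an OOM kill from a probe kill.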

xieydd commented May 16, 2018

@pdmack I have the same issue; these are my logs:

57s         57s          1         ambassador-6bd9cc864-6475r.152efbed7b84d677          Pod                                               Normal    Scheduled               default-scheduler            Successfully assigned ambassador-6bd9cc864-6475r to 00-e0-81-ee-82-5b
57s         57s          1         ambassador-6bd9cc864-6475r.152efbed924e980e          Pod                                               Normal    SuccessfulMountVolume   kubelet, 00-e0-81-ee-82-5b   MountVolume.SetUp succeeded for volume "ambassador-token-58kc9" 
13s         56s          4         ambassador-6bd9cc864-6475r.152efbedd930c3c5          Pod           spec.containers{ambassador}         Normal    Pulled                  kubelet, 00-e0-81-ee-82-5b   Container image "quay.io/datawire/ambassador:0.30.1" already present on machine
13s         56s          4         ambassador-6bd9cc864-6475r.152efbede634313e          Pod           spec.containers{ambassador}         Normal    Created                 kubelet, 00-e0-81-ee-82-5b   Created container
13s         55s          4         ambassador-6bd9cc864-6475r.152efbedeebf1966          Pod           spec.containers{ambassador}         Normal    Started                 kubelet, 00-e0-81-ee-82-5b   Started container
55s         55s          1         ambassador-6bd9cc864-6475r.152efbedeedb6df0          Pod           spec.containers{statsd}             Normal    Pulled                  kubelet, 00-e0-81-ee-82-5b   Container image "quay.io/datawire/statsd:0.30.1" already present on machine
55s         55s          1         ambassador-6bd9cc864-6475r.152efbedfa25b78c          Pod           spec.containers{statsd}             Normal    Created                 kubelet, 00-e0-81-ee-82-5b   Created container
55s         55s          1         ambassador-6bd9cc864-6475r.152efbee02b0ea1c          Pod           spec.containers{statsd}             Normal    Started                 kubelet, 00-e0-81-ee-82-5b   Started container
11s         51s          5         ambassador-6bd9cc864-6475r.152efbeede195c33          Pod           spec.containers{ambassador}         Warning   BackOff                 kubelet, 00-e0-81-ee-82-5b   Back-off restarting failed container
57s         57s          1         ambassador-6bd9cc864-jrnrz.152efbed80c016a1          Pod                                               Normal    Scheduled               default-scheduler            Successfully assigned ambassador-6bd9cc864-jrnrz to 0c-c4-7a-15-e1-9c
57s         57s          1         ambassador-6bd9cc864-jrnrz.152efbed978fdeb1          Pod                                               Normal    SuccessfulMountVolume   kubelet, 0c-c4-7a-15-e1-9c   MountVolume.SetUp succeeded for volume "ambassador-token-58kc9" 
10s         55s          4         ambassador-6bd9cc864-jrnrz.152efbedee310b66          Pod           spec.containers{ambassador}         Normal    Pulled                  kubelet, 0c-c4-7a-15-e1-9c   Container image "quay.io/datawire/ambassador:0.30.1" already present on machine
10s         55s          4         ambassador-6bd9cc864-jrnrz.152efbedfcb3e2a1          Pod           spec.containers{ambassador}         Normal    Created                 kubelet, 0c-c4-7a-15-e1-9c   Created container
10s         55s          4         ambassador-6bd9cc864-jrnrz.152efbee07b8f5db          Pod           spec.containers{ambassador}         Normal    Started                 kubelet, 0c-c4-7a-15-e1-9c   Started container
55s         55s          1         ambassador-6bd9cc864-jrnrz.152efbee07d93bf3          Pod           spec.containers{statsd}             Normal    Pulled                  kubelet, 0c-c4-7a-15-e1-9c   Container image "quay.io/datawire/statsd:0.30.1" already present on machine
55s         55s          1         ambassador-6bd9cc864-jrnrz.152efbee1c05309c          Pod           spec.containers{statsd}             Normal    Created                 kubelet, 0c-c4-7a-15-e1-9c   Created container
54s         54s          1         ambassador-6bd9cc864-jrnrz.152efbee290d03b5          Pod           spec.containers{statsd}             Normal    Started                 kubelet, 0c-c4-7a-15-e1-9c   Started container
8s          52s          5         ambassador-6bd9cc864-jrnrz.152efbeed05a5494          Pod           spec.containers{ambassador}         Warning   BackOff                 kubelet, 0c-c4-7a-15-e1-9c   Back-off restarting failed container
13m         13m          1         ambassador-6bd9cc864-tsp57.152efb428a418307          Pod                                               Normal    Scheduled               default-scheduler            Successfully assigned ambassador-6bd9cc864-tsp57 to 00-e0-81-ee-82-5b
13m         13m          1         ambassador-6bd9cc864-tsp57.152efb429e3e4fbf          Pod                                               Normal    SuccessfulMountVolume   kubelet, 00-e0-81-ee-82-5b   MountVolume.SetUp succeeded for volume "ambassador-token-khsjw" 
12m         13m          4         ambassador-6bd9cc864-tsp57.152efb42c480d5f3          Pod           spec.containers{ambassador}         Normal    Pulled                  kubelet, 00-e0-81-ee-82-5b   Container image "quay.io/datawire/ambassador:0.30.1" already present on machine
12m         13m          4         ambassador-6bd9cc864-tsp57.152efb42d0759bf4          Pod           spec.containers{ambassador}         Normal    Created                 kubelet, 00-e0-81-ee-82-5b   Created container
12m         13m          4         ambassador-6bd9cc864-tsp57.152efb42d8082bf4          Pod           spec.containers{ambassador}         Normal    Started                 kubelet, 00-e0-81-ee-82-5b   Started container
13m         13m          1         ambassador-6bd9cc864-tsp57.152efb42d827b021          Pod           spec.containers{statsd}             Normal    Pulled                  kubelet, 00-e0-81-ee-82-5b   Container image "quay.io/datawire/statsd:0.30.1" already present on machine
13m         13m          1         ambassador-6bd9cc864-tsp57.152efb42e41cb2c6          Pod           spec.containers{statsd}             Normal    Created                 kubelet, 00-e0-81-ee-82-5b   Created container
13m         13m          1         ambassador-6bd9cc864-tsp57.152efb42ebab4960          Pod           spec.containers{statsd}             Normal    Started                 kubelet, 00-e0-81-ee-82-5b   Started container
12m         13m          9         ambassador-6bd9cc864-tsp57.152efb43c94ad316          Pod           spec.containers{ambassador}         Warning   BackOff                 kubelet, 00-e0-81-ee-82-5b   Back-off restarting failed container
13m         13m          1         ambassador-6bd9cc864-v9tf9.152efb428e014d18          Pod                                               Normal    Scheduled               default-scheduler            Successfully assigned ambassador-6bd9cc864-v9tf9 to 00-25-90-c0-f7-c8
13m         13m          1         ambassador-6bd9cc864-v9tf9.152efb429e37c0d9          Pod                                               Normal    SuccessfulMountVolume   kubelet, 00-25-90-c0-f7-c8   MountVolume.SetUp succeeded for volume "ambassador-token-khsjw" 
12m         13m          4         ambassador-6bd9cc864-v9tf9.152efb4300999bd8          Pod           spec.containers{ambassador}         Normal    Pulled                  kubelet, 00-25-90-c0-f7-c8   Container image "quay.io/datawire/ambassador:0.30.1" already present on machine
12m         13m          4         ambassador-6bd9cc864-v9tf9.152efb431a7be58f          Pod           spec.containers{ambassador}         Normal    Created                 kubelet, 00-25-90-c0-f7-c8   Created container
12m         13m          4         ambassador-6bd9cc864-v9tf9.152efb4328718bbc          Pod           spec.containers{ambassador}         Normal    Started                 kubelet, 00-25-90-c0-f7-c8   Started container
13m         13m          1         ambassador-6bd9cc864-v9tf9.152efb4328860984          Pod           spec.containers{statsd}             Normal    Pulled                  kubelet, 00-25-90-c0-f7-c8   Container image "quay.io/datawire/statsd:0.30.1" already present on machine
13m         13m          1         ambassador-6bd9cc864-v9tf9.152efb433ee4204a          Pod           spec.containers{statsd}             Normal    Created                 kubelet, 00-25-90-c0-f7-c8   Created container
13m         13m          1         ambassador-6bd9cc864-v9tf9.152efb4349d13e8c          Pod           spec.containers{statsd}             Normal    Started                 kubelet, 00-25-90-c0-f7-c8   Started container
11m         13m          9         ambassador-6bd9cc864-v9tf9.152efb43f009f6ab          Pod           spec.containers{ambassador}         Warning   BackOff                 kubelet, 00-25-90-c0-f7-c8   Back-off restarting failed container
57s         57s          1         ambassador-6bd9cc864-z72fv.152efbed80c0ee91          Pod                                               Normal    Scheduled               default-scheduler            Successfully assigned ambassador-6bd9cc864-z72fv to 00-25-90-c0-f7-c8
57s         57s          1         ambassador-6bd9cc864-z72fv.152efbed9055bbe7          Pod                                               Normal    SuccessfulMountVolume   kubelet, 00-25-90-c0-f7-c8   MountVolume.SetUp succeeded for volume "ambassador-token-58kc9" 
27s         56s          2         ambassador-6bd9cc864-z72fv.152efbede8b3c4c3          Pod           spec.containers{ambassador}         Normal    Pulled                  kubelet, 00-25-90-c0-f7-c8   Container image "quay.io/datawire/ambassador:0.30.1" already present on machine
26s         55s          2         ambassador-6bd9cc864-z72fv.152efbedfbbe1051          Pod           spec.containers{ambassador}         Normal    Created                 kubelet, 00-25-90-c0-f7-c8   Created container
26s         55s          2         ambassador-6bd9cc864-z72fv.152efbee053c394f          Pod           spec.containers{ambassador}         Normal    Started                 kubelet, 00-25-90-c0-f7-c8   Started container
55s         55s          1         ambassador-6bd9cc864-z72fv.152efbee0551a161          Pod           spec.containers{statsd}             Normal    Pulled                  kubelet, 00-25-90-c0-f7-c8   Container image "quay.io/datawire/statsd:0.30.1" already present on machine
55s         55s          1         ambassador-6bd9cc864-z72fv.152efbee171a4ec2          Pod           spec.containers{statsd}             Normal    Created                 kubelet, 00-25-90-c0-f7-c8   Created container
55s         55s          1         ambassador-6bd9cc864-z72fv.152efbee1f32d809          Pod           spec.containers{statsd}             Normal    Started                 kubelet, 00-25-90-c0-f7-c8   Started container
13m         13m          1         ambassador-6bd9cc864.152efb428774a905                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-tsp57
13m         13m          1         ambassador-6bd9cc864.152efb428a351475                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-29sbm
13m         13m          1         ambassador-6bd9cc864.152efb428a3d27fa                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-v9tf9
57s         57s          1         ambassador-6bd9cc864.152efbed75d37778                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-6475r
57s         57s          1         ambassador-6bd9cc864.152efbed7b826566                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-z72fv
57s         57s          1         ambassador-6bd9cc864.152efbed7b8ec9a6                ReplicaSet                                        Normal    SuccessfulCreate        replicaset-controller        Created pod: ambassador-6bd9cc864-jrnrz
13m         13m          1         ambassador.152efb4282c525ab                          Deployment                                        Normal    ScalingReplicaSet       deployment-controller        Scaled up replica set ambassador-6bd9cc864 to 3
58s         58s          1         ambassador.152efbed6d89d809                          Deployment                                        Normal    ScalingReplicaSet       deployment-controller        Scaled up replica set ambassador-6bd9cc864 to 3



[root@0c-c4-7a-15-e1-9c ~]# docker run quay.io/datawire/ambassador:0.30.1
./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-05-16 01:27:44 kubewatch 0.30.1 INFO: No K8s
2018-05-16 01:27:44 kubewatch 0.30.1 INFO: generating config with gencount 1 (0 changes)
2018-05-16 01:27:46 kubewatch 0.30.1 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Read timed out. (read timeout=1)
2018-05-16 01:27:46 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.30.1", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Read timed out. (read timeout=1)", "cached": false, "timestamp": 1526434064.41099}
[2018-05-16 01:27:46.296][10][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-16 01:27:46.297][10][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-16 01:27:46.307][10][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-16 01:27:46.307][10][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 11:diagd 12:envoy 13:kubewatch
[2018-05-16 01:27:46.482][14][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-05-16 01:27:46.542][14][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-16 01:27:46.542][14][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-16 01:27:46.566][14][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-16 01:27:46.566][14][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-05-16 01:27:46.567][14][info][main] source/server/server.cc:343] all clusters initialized. initializing init manager
[2018-05-16 01:27:46.567][14][info][config] source/server/listener_manager_impl.cc:543] all dependencies initialized. starting workers
[2018-05-16 01:27:46.567][14][info][main] source/server/server.cc:359] starting main dispatch loop
2018-05-16 01:27:47 kubewatch 0.30.1 INFO: No K8s
2018-05-16 01:27:47 kubewatch 0.30.1 INFO: No K8s, idling
2018-05-16 01:27:48 diagd 0.30.1 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1526434067.274968}
[2018-05-16 01:27:56.567][14][info][main] source/server/drain_manager_impl.cc:65] shutting down parent after drain
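
Note the "No K8s" lines: run standalone under docker, kubewatch has no API server to watch, so it simply idles. Inside the pod, kubewatch does need to reach the API with the mounted service-account token, so one thing worth checking is that service account's permissions (a sketch, assuming the service account is named ambassador, as the token mount in the describe output suggests):

kubectl auth can-i list services -n kubeflow \
  --as=system:serviceaccount:kubeflow:ambassador
# A "no" here would explain kubewatch failing at startup inside the cluster
# while the same image runs cleanly outside it.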

jiaanguo commented May 16, 2018

@pdmack
These are my event logs.

2m          1d           1149      ambassador-849fb9c8c5-2d5wv.152eabc689d21167   Pod       spec.containers{ambassador}   Warning   Unhealthy   kubelet, local-cluster-1   Readiness probe failed: Get http://192.168.31.195:8877/ambassador/v0/check_ready: dial tcp 192.168.31.195:8877: getsockopt: connection refused
7m          1d           4459      ambassador-849fb9c8c5-2d5wv.152eac849a22e060   Pod       spec.containers{ambassador}   Warning   BackOff     kubelet, local-cluster-1   Back-off restarting failed container
7m          1d           1136      ambassador-849fb9c8c5-mjmgx.152eabcafd678da5   Pod       spec.containers{ambassador}   Warning   Unhealthy   kubelet, local-cluster-2   Readiness probe failed: Get http://192.168.126.69:8877/ambassador/v0/check_ready: dial tcp 192.168.126.69:8877: getsockopt: connection refused
2m          1d           4524      ambassador-849fb9c8c5-mjmgx.152eac89054a9ac4   Pod       spec.containers{ambassador}   Warning   BackOff     kubelet, local-cluster-2   Back-off restarting failed container
7m          1d           759       ambassador-849fb9c8c5-zm8rw.152eabc6f721bb05   Pod       spec.containers{ambassador}   Warning   Unhealthy   kubelet, local-cluster-3   Liveness probe failed: Get http://192.168.112.194:8877/ambassador/v0/check_alive: dial tcp 192.168.112.194:8877: getsockopt: connection refused
16m         1d           1010      ambassador-849fb9c8c5-zm8rw.152eabc7f6f57ec8   Pod       spec.containers{ambassador}   Warning   Unhealthy   kubelet, local-cluster-3   Readiness probe failed: Get http://192.168.112.194:8877/ambassador/v0/check_ready: dial tcp 192.168.112.194:8877: getsockopt: connection refused
2m          1d           4519      ambassador-849fb9c8c5-zm8rw.152eac83affa7919   Pod       spec.containers{ambassador}   Warning   BackOff     kubelet, local-cluster-3   Back-off restarting failed container
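
The probe failures above only show that port 8877 never opened before the kubelet gave up. The crashing container's own output is more telling; it can be pulled from the previous restart (a sketch, using the first pod from the listing):

kubectl logs ambassador-849fb9c8c5-2d5wv -n kubeflow -c ambassador --previous
# --previous returns the log of the last terminated instance, which survives
# the restart loop and usually contains the actual startup error.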

jiaanguo commented May 16, 2018

Running Ambassador locally seems to show no issue at all.

Digest: sha256:652d3f1de1055624f035eba33299babc7c3f30a0a78c1f206730aa31bed96bf6
Status: Downloaded newer image for quay.io/datawire/ambassador:0.30.1
./entrypoint.sh: set: line 65: can't access tty; job control turned off
2018-05-16 05:22:30 kubewatch 0.30.1 INFO: No K8s
2018-05-16 05:22:30 kubewatch 0.30.1 INFO: generating config with gencount 1 (0 changes)
2018-05-16 05:22:32 kubewatch 0.30.1 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1526448150.734372}
[2018-05-16 05:22:32.163][10][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-16 05:22:32.163][10][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-16 05:22:32.171][10][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-16 05:22:32.171][10][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 11:diagd 12:envoy 13:kubewatch
[2018-05-16 05:22:32.280][14][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-05-16 05:22:32.342][14][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-05-16 05:22:32.342][14][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-05-16 05:22:32.366][14][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-05-16 05:22:32.366][14][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-05-16 05:22:32.367][14][info][main] source/server/server.cc:343] all clusters initialized. initializing init manager
[2018-05-16 05:22:32.367][14][info][config] source/server/listener_manager_impl.cc:543] all dependencies initialized. starting workers
[2018-05-16 05:22:32.367][14][info][main] source/server/server.cc:359] starting main dispatch loop
2018-05-16 05:22:32 kubewatch 0.30.1 INFO: No K8s
2018-05-16 05:22:32 kubewatch 0.30.1 INFO: No K8s, idling
2018-05-16 05:22:34 diagd 0.30.1 INFO: Scout reports {"latest_version": "0.32.2", "application": "ambassador", "notices": [], "cached": false, "timestamp": 1526448153.014632}
[2018-05-16 05:22:42.365][14][info][main] source/server/drain_manager_impl.cc:65] shutting down parent after drain
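
One caveat on the local run: with "No K8s" kubewatch just idles, so this mostly proves the image itself is sound. The same endpoints the cluster probes hit can also be exercised locally (a sketch, assuming the diagnostic port 8877 from the probe definitions above):

docker run --rm -d --name ambassador-test -p 8877:8877 quay.io/datawire/ambassador:0.30.1
sleep 10   # give diagd a moment to come up
curl -v http://localhost:8877/ambassador/v0/check_alive
curl -v http://localhost:8877/ambassador/v0/check_ready
docker rm -f ambassador-test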

pdmack commented May 16, 2018

Sounds somewhat like emissary-ingress/emissary#240

Can you research or open issues with that project? Is this on minikube? There may be a problem with your kube-dns.
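
If kube-dns is the suspect, the standard Kubernetes DNS debugging steps apply (a sketch; busybox 1.28 is commonly used here because later busybox nslookup builds are flaky):

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- \
  nslookup kubernetes.default
# A failed lookup here would point at cluster DNS rather than Ambassador itself.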

@richarddli

@jiaanguo opened emissary-ingress/emissary#437, so we'll track this specific issue there.

jlewi commented May 16, 2018

Duplicate of emissary-ingress/emissary#437

jlewi marked this as a duplicate of emissary-ingress/emissary#437 May 16, 2018
jlewi commented May 16, 2018

/close

surajkota pushed a commit to surajkota/kubeflow that referenced this issue Jun 13, 2022
Change to update Metadata gRPC server to get latest updates