This repository has been archived by the owner on Mar 17, 2021. It is now read-only.

Provide a better error message when PV failed to be attached #557

Closed
l0rd opened this issue Mar 5, 2018 · 10 comments

@l0rd
Contributor

l0rd commented Mar 5, 2018

This is the message that is shown to the user:
[screenshot: the user only sees "PV error"]

In che-server logs there are no details:

{"@timestamp":"2018-02-01T14:56:20.083+00:00","@version":1,"message":"Failed to get the ID of the container running in the OpenShift pod","logger_name":"org.eclipse.che.api.workspace.server.WorkspaceManager","thread_name":"WorkspaceSharedPool-5","level":"ERROR","level_value":40000,"stack_trace":"org.eclipse.che.api.core.ServerException: Failed to get the ID of the container running in the OpenShift pod
	at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:1005)
	at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startEnvironmentQueue(CheEnvironmentEngine.java:797)
	at org.eclipse.che.api.environment.server.CheEnvironmentEngine.start(CheEnvironmentEngine.java:263)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.startEnvironmentAndPublishEvents(WorkspaceRuntimes.java:694)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes.access$100(WorkspaceRuntimes.java:103)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:986)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartTask.call(WorkspaceRuntimes.java:948)
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalCallable.call(CopyThreadLocalCallable.java:31)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.che.api.core.ServerException: Failed to get the ID of the container running in the OpenShift pod
	at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:401)
	at org.eclipse.che.api.environment.server.CheEnvironmentEngine.lambda$startEnvironmentQueue$6(CheEnvironmentEngine.java:768)
	at org.eclipse.che.api.environment.server.CheEnvironmentEngine.startInstance(CheEnvironmentEngine.java:942)
	... 11 common frames omitted
Caused by: org.eclipse.che.plugin.openshift.client.exception.OpenShiftException: Failed to get the ID of the container running in the OpenShift pod
	at org.eclipse.che.plugin.openshift.client.OpenShiftConnector.createContainer(OpenShiftConnector.java:706)
	at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.createContainer(MachineProviderImpl.java:648)
	at org.eclipse.che.plugin.docker.machine.MachineProviderImpl.startService(MachineProviderImpl.java:357)
	... 13 common frames omitted
"}

But in the OpenShift event logs we can see that the problem is "Unable to mount volumes for pod"

[screenshot: OpenShift event "Unable to mount volumes for pod"]
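As a rough illustration of what a better error message could include, here is a minimal sketch of collecting the warning events of the workspace pod so they can be appended to the exception message instead of the generic "Failed to get the ID of the container running in the OpenShift pod". It assumes the fabric8 Kubernetes client that the OpenShift connector already uses; the helper name and wiring are hypothetical:

import io.fabric8.kubernetes.api.model.Event;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: gather the warning events of the workspace pod so that the
// OpenShiftException can carry the real cause (e.g. "FailedMount: Unable to mount
// volumes for pod ...") instead of a generic message.
public class PodEventReporter {

    public static String describePodFailure(String namespace, String podName) {
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            List<Event> events = client.events()
                    .inNamespace(namespace)
                    .withField("involvedObject.name", podName) // only events of this pod
                    .list()
                    .getItems();
            return events.stream()
                    .filter(e -> "Warning".equals(e.getType()))
                    .map(e -> e.getReason() + ": " + e.getMessage())
                    .collect(Collectors.joining("; "));
        }
    }
}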

@l0rd
Contributor Author

l0rd commented Mar 5, 2018

We need to verify how Che 6 behaves. Maybe the error is handled better in Che 6.

@ibuziuk
Member

ibuziuk commented Apr 5, 2018

I was able to reproduce the issue by starting a pod that executes a blocking operation before creating the workspace:

apiVersion: "v1"
kind: "Pod"
metadata:
  name: "mypod"
  labels:
    name: "mypod"
spec:
  containers:
    -
      name: "mypod"
      image: "registry.access.redhat.com/rhel7-atomic"
      command:
        - /bin/bash
        - '-c'
        - >-
          echo "Sleeping for a long time..."; sleep 10000; echo "Sleep finished"
      ports:
        -
          containerPort: 80
          name: "http-server"
      volumeMounts:
        -
          mountPath: "/projects"
          subPath: "worksworkspacecf3x7fobl5q5ults/projects"
          name: "claim-che-workspace"
  volumes:
    -
      name: "claim-che-workspace"
      persistentVolumeClaim:
        claimName: "claim-che-workspace"

OpenShift events:

[screenshot: OpenShift events]

Che server logs:

{"@timestamp":"2018-04-05T16:53:23.286+00:00","@version":1,"message":"Workspace 'ibuziuk@redhat.com/nbbiz' with id 'workspaceaicvnnmikpxrw1sa' created by user 'ibuziuk@redhat.com'","logger_name":"org.eclipse.che.api.workspace.server.WorkspaceManager","thread_name":"http-nio-8080-exec-8","level":"INFO","level_value":20000,"req_id":"6f9dadae-135a-43bf-82c1-ebd5a6eeb175","identity_id":"4f871a"}
{"@timestamp":"2018-04-05T16:53:25.335+00:00","@version":1,"message":"Starting workspace 'ibuziuk@redhat.com/nbbiz' with id 'workspaceaicvnnmikpxrw1sa' by user 'ibuziuk@redhat.com'","logger_name":"org.eclipse.che.api.workspace.server.WorkspaceRuntimes","thread_name":"http-nio-8080-exec-5","level":"INFO","level_value":20000,"req_id":"6f9dadae-135a-43bf-82c1-ebd5a6eeb175","identity_id":"4f871a"}
{"@timestamp":"2018-04-05T17:00:05.067+00:00","@version":1,"message":"Current reconnect backoff is 1000 milliseconds (T0)","logger_name":"io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager","thread_name":"scheduleReconnect|Executor for Watch 9692063","level":"INFO","level_value":20000}
{"@timestamp":"2018-04-05T17:01:25.756+00:00","@version":1,"message":"Failed to start Kubernetes runtime of workspace workspaceaicvnnmikpxrw1sa. Cause: Waiting for Kubernetes environment 'default' of the workspace'workspaceaicvnnmikpxrw1sa' reached timeout","logger_name":"org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime","thread_name":"WorkspaceSharedPool-6","level":"WARN","level_value":30000,"req_id":"6f9dadae-135a-43bf-82c1-ebd5a6eeb175","identity_id":"71a"}
{"@timestamp":"2018-04-05T17:01:26.372+00:00","@version":1,"message":"Workspace 'ibuziuk@redhat.com:nbbiz' with id 'workspaceaicvnnmikpxrw1sa' start failed","logger_name":"org.eclipse.che.api.workspace.server.WorkspaceRuntimes","thread_name":"WorkspaceSharedPool-6","level":"INFO","level_value":20000,"req_id":"6f9dadae-135a-43bf-82c1-ebd5a6eeb175","identity_id":"54f871a"}

[screenshot]

BTW, this approach of mounting the volume in a dummy pod and executing a blocking operation does not reproduce the problem on minishift (it is possible to mount the same volume from different pods without any problems).

@garagatyi

BTW, this approach of mounting the volume in a dummy pod and executing a blocking operation does not reproduce the problem on minishift (it is possible to mount the same volume from different pods without any problems).

Minishift is a single node installation. You can mount the same volume to any number of pods there.

@ghost

ghost commented Apr 7, 2018

We used to have events in machine logs in Che 6. @akorneta @garagatyi should we maybe bring them back? It is a critical piece for Che 6 on prem as well. It makes support's life easier.

@ibuziuk
Member

ibuziuk commented Apr 7, 2018

Hmmm... I did not know about that. Why were OpenShift events removed?

@ibuziuk
Member

ibuziuk commented Apr 9, 2018

It looks like publishing OpenShift events and tracking the pod "Failed" state were removed as part of the following PR [1]. The related upstream issue for bringing this functionality back is [2].

[1] eclipse-che/che#7629
[2] eclipse-che/che#7653

@ghost

ghost commented Apr 9, 2018

Raised priority on this one

@ibuziuk
Member

ibuziuk commented Apr 9, 2018

@eivantsov thanks, those issues are coupled but not fully related. E.g. AbnormalStopHandler will not be able to handle the volume mount / attach case, since it tracks only the state of the pod and throws an exception only when the pod has the "Failed" status. In case of FailedAttachVolume / FailedMount the pod status is still "ContainerCreating". It looks like there should be some UnrecoverableEventHandler which would track unrecoverable events and fail fast if such an event occurs before the timeout, as in the sketch below.
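A rough sketch of what such an UnrecoverableEventHandler could look like (a sketch only, not the actual implementation: it assumes the fabric8 Watcher API, and the exact set of unrecoverable reasons and the way the failure is propagated are made up for illustration):

import io.fabric8.kubernetes.api.model.Event;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Fails the workspace start as soon as an unrecoverable event is seen, instead of
// waiting for the pod to reach the "Failed" status (it never does on mount failures,
// it stays in "ContainerCreating") or for the start timeout to expire.
public class UnrecoverableEventHandler implements Watcher<Event> {

    // Hypothetical set of reasons treated as unrecoverable.
    private static final Set<String> UNRECOVERABLE_REASONS =
            new HashSet<>(Arrays.asList("FailedMount", "FailedAttachVolume", "FailedScheduling"));

    private final CompletableFuture<Void> startFuture;

    public UnrecoverableEventHandler(CompletableFuture<Void> startFuture) {
        this.startFuture = startFuture;
    }

    @Override
    public void eventReceived(Action action, Event event) {
        if (UNRECOVERABLE_REASONS.contains(event.getReason())) {
            // Fail fast with the real cause instead of the generic timeout message.
            startFuture.completeExceptionally(new RuntimeException(
                    event.getReason() + ": " + event.getMessage()));
        }
    }

    @Override
    public void onClose(KubernetesClientException cause) {
        // The watch is closed when the workspace start completes or fails; nothing to do here.
    }
}

The watch would be registered before the pod is created, e.g. with client.events().inNamespace(namespace).watch(new UnrecoverableEventHandler(startFuture)), and closed once the machine is running.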

@ghost

ghost commented Apr 9, 2018

@ibuziuk I see.

Yes, the idea is to get the OpenShift events related to a workspace pod (pulling image, container created, etc.), something that you have in the OpenShift console and that helps to understand why a pod failed to start.

@akorneta @sleshchenko you are welcome to share your thoughts

@ibuziuk
Member

ibuziuk commented Apr 19, 2018

The PR with the fix has been merged to master: eclipse-che/che#9426
The issue is going to be closed once the fix is on osio.

@l0rd l0rd closed this as completed May 11, 2018