Between midnight and ~03:00 UTC, users provisioned on cluster starter-us-east-2 cannot create Che workspaces #2544
Comments
tasks don't have severity, is this a bug? |
Not yet - still investigating - removed bug label. |
Seeing a series of these errors:
|
The pattern is consistent: Build log:
|
I just saw the creation of a Che workspace fail 5 times out of 5 - this error resulted - #2154 |
Tried this at the appropriate time. Created a workspace before midnight UTC and then another one two minutes after. Got this error: "Could not start workspace newprojectname-tvfkw. Reason: Start of environment 'default' failed. Error: Failed to get the ID of the container running in the OpenShift pod" |
Checking again at midnight UTC - this time - found this in the Che log:
{"@timestamp":"2018-03-27T00:02:17.197+00:00","@Version":1,"message":"Workspace 'ldimaggi@redhat.com/0dsiu' with id 'workspacemypusb6e60pgdxfv' created by user 'ldimaggi@redhat.com'","logger_name":"org.eclipse.che.api.workspace.server.WorkspaceManager","thread_name":"http-nio-8080-exec-1","level":"INFO","level_value":20000,"req_id":"821c3366-ddac-443b-b8d8-3c7ae8727a66","identity_id":"20ddc23a-bb62-4834-9130-9af2f54e85b1"} |
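For anyone sifting through the Che logs, here is a rough sketch of how entries like the one above could be filtered down to the suspect window. It assumes each log line is a single JSON object with an ISO 8601 `@timestamp` field, as in the entry above. The function name `in_failure_window` and the window bounds (00:00 to 03:00 UTC, taken from the issue title, where the end is only approximate) are illustrative assumptions, not part of Che.

```python
import json
from datetime import datetime, time, timezone

# Assumed window from the issue title; the 03:00 end is approximate.
WINDOW_START = time(0, 0)
WINDOW_END = time(3, 0)


def in_failure_window(log_line: str) -> bool:
    """Return True if the entry's @timestamp falls inside the assumed UTC window."""
    entry = json.loads(log_line)
    # Parse the ISO 8601 timestamp and normalize it to UTC before comparing.
    stamp = datetime.fromisoformat(entry["@timestamp"]).astimezone(timezone.utc)
    return WINDOW_START <= stamp.time() < WINDOW_END


# Abbreviated copy of the log entry above (most fields omitted).
sample = '{"@timestamp":"2018-03-27T00:02:17.197+00:00","level":"INFO"}'
print(in_failure_window(sample))
```

Normalizing to UTC first means entries logged with a non-zero offset are still compared against the same window.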
@ldimaggi What about events in OSO (starter-us-east-2)? |
No events were listed/displayed. |
This really looks like a duplicate of #2154. Our plan to mitigate/investigate is to:
|
Opened a ticket with the SRE team. We have a theory on what may be causing this. Will report back here once I have more info. |
This is the same problem as is defined in #2154 - the situation has improved - but the problem is still present. |
Updated March 22, 2018 - The pattern is consistent - starts at midnight UTC and continues for 3+ hours - only affects the starter-us-east-2 cluster.
The problem was first noted as affecting the creation of new Che workspaces starting after 19:00 Boston time (EST) here - #2154 (comment)
Since then - the same pattern has been seen in running build pipelines - E2E tests that create/run build pipelines are failing at 19:00 Boston time.
Question to be investigated - Are backups or some other system maintenance actions being performed on the starter-us-east-2 cluster at midnight UTC?
The starter-us-east-2a cluster does not seem to be affected by the issue.