Job FailedToExecute and FailedToComplete due to failing to pull the status of the containers #75
Comments
For Memverge: This is ticket 3431
Additional note on the same jobs on the 23* opcenter: Ru submitted the same job on the 23* opcenter on 4/16 and got a different error. I have attached log bundles for some of the jobs, but there are about 200 jobs in total. She cancelled them because they seemed to hang.
I believe that in this opcenter, the error I keep seeing for all the files appears in stderr and does not come from the opcenter. I have checked their buckets, and the files exist with no issue. No folders with the same names exist. I believe she has the 200+ jobs write to the same output file, which could be a concern.
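If the shared output file does turn out to be a problem, one possible workaround is to give each job its own output name. The snippet below is only a minimal sketch: `per_job_output_key`, the `results/output.tsv` path, and the `.jobNNNN` naming scheme are hypothetical and are not taken from Ru's actual job scripts.

```python
from pathlib import PurePosixPath


def per_job_output_key(shared_key: str, job_index: int) -> str:
    """Insert a zero-padded job index before the file extension.

    Hypothetical helper: both the function and the naming scheme are
    assumptions for illustration, not part of any opcenter tooling.
    """
    path = PurePosixPath(shared_key)
    new_name = f"{path.stem}.job{job_index:04d}{path.suffix}"
    return str(path.with_name(new_name))


# Example: 200 jobs that would otherwise all write to one object.
keys = [per_job_output_key("results/output.tsv", i) for i in range(200)]
```

With this scheme no two jobs share an output key, and the per-job files can be merged in a separate step once all jobs have finished.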
[Update from Slack]
Many jobs were loading the image at the same time, which caused a network traffic issue and prevented the jobs from executing.
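One way to reduce the number of simultaneous image pulls is to stagger job submission on the user side. The following is a rough sketch under assumptions: `staggered_delays` and its default batch size, interval, and jitter are hypothetical values chosen for illustration, and the actual submission command is deliberately left out.

```python
import random


def staggered_delays(n_jobs, batch_size=10, batch_interval=30.0,
                     jitter=5.0, seed=None):
    """Return one start delay (in seconds) per job.

    Jobs are grouped into batches of `batch_size`. Each batch starts
    `batch_interval` seconds after the previous one, and every job gets
    a small random jitter so that jobs within a batch do not all pull
    the image at the same instant. All parameter values are assumptions.
    """
    rng = random.Random(seed)
    delays = []
    for i in range(n_jobs):
        base = (i // batch_size) * batch_interval
        delays.append(base + rng.uniform(0.0, jitter))
    return delays


# Example: plan start times for 200 jobs; a submission loop would
# wait until each delay has elapsed before submitting that job.
delays = staggered_delays(200, seed=0)
```

With the default values, the last batch of the 200 jobs starts about 570 seconds after the first, so at most 10 jobs begin loading the image within any 30-second window.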
Of course, these files don't necessarily correspond to |
Closing this issue for now, unless we see it reappear. Both east coast opcenters should now be on 2.5.5.
For Memverge: This is ticket 3508
See some sample jobs below:
Failed to complete
https://54.81.85.209/#/opcenter/jobs/8hzjn4rh4euino8yyw45z
fagent.log
From the log we can see that the container exited gracefully with code 0, but the opcenter never received the status showing that it had finished, resulting in Failed to Complete.
Failed to execute
https://54.81.85.209/#/opcenter/jobs/v9o7wky28d58w3obrhmiw
fagent.log
The container had started running, but the opcenter did not get its status, resulting in Failed to Execute.