
Jobs do not seem to restart after forceful server reset #19

Open
offlinehacker opened this issue May 5, 2019 · 5 comments

3 participants
@offlinehacker

commented May 5, 2019

I had job parallelism set too high and my server became unresponsive. After resetting the server and redeploying with lower parallelism, the jobs do not seem to restart. If I click Retry, nothing happens and the response is a 404.

@offlinehacker

Author

commented May 5, 2019

It now seems that if I re-trigger the build with a new commit, the failed and stalled jobs are shared with the previous build, and now CI is stuck. Take a look here: https://hercules-ci.com/github/xtruder/kubenix/jobs/8 and here: https://hercules-ci.com/github/xtruder/kubenix/jobs/9

@roberth

Contributor

commented May 5, 2019

We've had to delay the features that would recover from this. I've manually reset your tasks.

@domenkozar

Member

commented May 6, 2019

This is an annoying one. We don't yet have "agent pings" that would let us check agent liveness.

We're going to add a "Cancel" button so you can recover manually, but the automatic fix based on agent liveness is scheduled for the next sprint.
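For illustration, here is a minimal sketch of how heartbeat-based liveness detection ("agent pings") could work in general. Everything in it is hypothetical: `LivenessTracker`, `ping`, `assign`, `reap_stale_jobs`, and the 30-second timeout are made-up names and values, not Hercules CI's API or implementation. Each agent pings periodically, and any running job whose agent has been silent for longer than the timeout is released so a scheduler could requeue it instead of leaving it stuck, as happened in this issue.

```python
import time
from dataclasses import dataclass, field


@dataclass
class LivenessTracker:
    """Hypothetical heartbeat tracker; not Hercules CI's actual code."""

    timeout_s: float  # how long an agent may stay silent before it is presumed dead
    last_ping: dict = field(default_factory=dict)     # agent id -> time of last ping
    running_jobs: dict = field(default_factory=dict)  # job id -> agent id running it

    def ping(self, agent_id, now=None):
        # Record that the agent was alive at time `now` (monotonic clock by default).
        self.last_ping[agent_id] = time.monotonic() if now is None else now

    def assign(self, job_id, agent_id):
        # Remember which agent is running which job.
        self.running_jobs[job_id] = agent_id

    def reap_stale_jobs(self, now=None):
        """Unassign and return jobs whose agent has not pinged within timeout_s,
        so that a scheduler could requeue them."""
        now = time.monotonic() if now is None else now
        stale = []
        # Iterate over a copy so entries can be deleted safely inside the loop.
        for job_id, agent_id in list(self.running_jobs.items()):
            last = self.last_ping.get(agent_id)
            if last is None or now - last > self.timeout_s:
                stale.append(job_id)
                del self.running_jobs[job_id]
        return stale


# Usage: agent-a goes silent (e.g. after a forced server reset), agent-b keeps pinging.
tracker = LivenessTracker(timeout_s=30)
tracker.ping("agent-a", now=0)
tracker.ping("agent-b", now=0)
tracker.assign("job-8", "agent-a")
tracker.assign("job-9", "agent-b")
tracker.ping("agent-b", now=25)
print(tracker.reap_stale_jobs(now=40))  # ['job-8']
```

Passing `now` explicitly keeps the sketch deterministic; a real service would rely on the monotonic clock default and run the reaper periodically.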

@offlinehacker

Author

commented May 6, 2019

It would also be nice to see logs after a job is canceled.

@domenkozar

Member

commented May 14, 2019

There won't be any logs: if a job is cancelled before it reports the build-finished event, there are no logs to show. This will change once log streaming (#17) is implemented.

Note that the "Cancel" workaround is planned for sprint #4, so expect a fix soon. We had to postpone agent liveness for another sprint or two.
