
Workflow timeout a better scenario #208

Closed · DavidGOrtega opened this issue Aug 6, 2020 · 0 comments · Fixed by #583
Labels: cml-runner (subcommand), p1-important (high priority)

DavidGOrtega (Contributor) commented Aug 6, 2020

GitHub Actions' maximum workflow timeout is 72 hours, which is very limited for training a model.
Depending on how the vendor's runners handle this, a nice approach would be to restart the workflow so the run can still get the green light.

However, two possible scenarios come to mind (if not more):

  1. The runner is able to finish the job (training).
  2. The runner stops because the workflow fails.

In both cases the solution would be a mechanism to restart the workflow, with a cache to save the intermediate models/state, as sketched below.
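A minimal sketch of how that mechanism could look on GitHub Actions (not CML's actual implementation; the workflow file name `train.yml`, the `train.py` script, the `.checkpoints` directory, and the `DONE` marker file are all hypothetical): intermediate state is cached across runs, and an unfinished run re-dispatches itself before the workflow limit is reached.

```yaml
name: train
on: workflow_dispatch
permissions:
  actions: write        # lets the job re-dispatch this workflow
jobs:
  train:
    # the issue targets self-hosted (cml-runner) machines, where jobs
    # can run longer than on GitHub-hosted runners
    runs-on: self-hosted
    timeout-minutes: 4200   # stop safely below the 72 h workflow limit
    steps:
      - uses: actions/checkout@v3
      - uses: actions/cache@v3
        with:
          path: .checkpoints
          # per-run key plus a restore prefix, so each run saves a fresh
          # cache while restoring the most recent previous one
          key: training-state-${{ github.run_id }}
          restore-keys: training-state-
      - name: Train, resuming from a cached checkpoint if present
        run: python train.py --checkpoint-dir .checkpoints
      - name: Re-dispatch the workflow if training did not finish
        # assumes train.py writes .checkpoints/DONE on completion
        if: ${{ hashFiles('.checkpoints/DONE') == '' }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh workflow run train.yml
```

The self-restart works because workflow_dispatch is one of the few events that API calls made with the repository's GITHUB_TOKEN are still allowed to trigger.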

This is related to #174 and #161.

DavidGOrtega changed the title from "Workflow timeout" to "Workflow timeout a better scenario" Aug 6, 2020
DavidGOrtega added the cml-runner (subcommand) and p1-important (high priority) labels Feb 23, 2021
casperdcl mentioned this issue Jun 1, 2021
DavidGOrtega self-assigned this Jun 1, 2021