Spot instances a better scenario #161

Closed
DavidGOrtega opened this issue Jul 16, 2020 · 2 comments
Labels
cml-runner Subcommand

Comments

@DavidGOrtega
Contributor

Spot instances can be interrupted at any time, leaving only a very short window in which to handle the interruption.
According to this conversation, one solution is to attach storage or use the DVC cache, combined with a mechanism that restarts the workflow and continues training from the last checkpoint.
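
To make the checkpoint idea concrete, here is a minimal sketch of a training loop that saves its progress after every epoch and resumes from the last saved state when the workflow is restarted. Everything in it is an assumption for illustration and not part of CML: the `checkpoint.json` path (which would live on the attached storage or in a DVC-tracked directory), the names `train`, `load_checkpoint`, `save_checkpoint` and `request_stop`, and the idea that the interruption reaches the process as `SIGTERM`. Cloud providers announce spot interruptions in different ways, so the real trigger may differ.

```python
import json
import os
import signal
import tempfile

# Hypothetical location; in practice this would sit on attached storage
# or inside a directory tracked by the DVC cache.
CHECKPOINT = "checkpoint.json"

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only record that a stop was requested."""
    global stop_requested
    stop_requested = True


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "state": 0.0}


def save_checkpoint(path, ckpt):
    """Write to a temp file, then rename, so an interruption in the
    middle of a write cannot leave a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(ckpt, f)
    os.replace(tmp, path)


def train(path, total_epochs, should_stop=lambda: stop_requested):
    """Run until total_epochs is reached or a stop is requested."""
    ckpt = load_checkpoint(path)
    while ckpt["epoch"] < total_epochs:
        if should_stop():
            break
        ckpt["state"] += 1.0  # placeholder for one real training epoch
        ckpt["epoch"] += 1
        save_checkpoint(path, ckpt)
    return ckpt


def main():
    # Assumption: the interruption is delivered to the process as SIGTERM.
    signal.signal(signal.SIGTERM, request_stop)
    print(train(CHECKPOINT, total_epochs=10))
```

Calling `main()` a second time after an interruption picks up at the epoch stored in the checkpoint instead of starting over. Saving once per epoch is a simplification; with a very short interruption window, a long epoch would need more frequent saves.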

@DavidGOrtega
Contributor Author

@DavidGOrtega changed the title from "Spot instances interruption handling" to "Spot instances a better scenario" on Jul 16, 2020
@DavidGOrtega
Contributor Author

GitLab is having issues with self-hosted runners. If the self-hosted runner disconnects, the GitLab pipeline gets stuck forever.

https://gitlab.com/gitlab-org/gitlab/-/issues/229851
