SP7-item5 : "phase 3" - Synchronous Training Experiment Resuming #39

Closed
srcansiz opened this issue Jun 22, 2021 · 1 comment
Labels
done issue is completed, it meets the DoD and was merged to the next release integration branch

Comments

@srcansiz
Member

In GitLab by @sssilvar on Jun 22, 2021, 10:33

An experiment should be able to resume from the last successfully completed round (its checkpoint) instead of restarting from the first round.

Usage

# Example of failure due to client timeout
In []: experiment.run()
Out []: RuntimeError "Client not responding (Timeout Error)"

# Resume training (executing again)
In []: experiment.run()
Out []: warning: "Resuming experiment from round X..."
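
The behaviour requested above could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual implementation: `ResumableExperiment`, `train_round`, the `breakpoint.json` file, and its `completed_rounds` field are all hypothetical names. The idea is to record progress only after a round succeeds, so that calling `run()` again picks up at the first round that has not completed.

```python
import json
import tempfile
import warnings
from pathlib import Path


class ResumableExperiment:
    """Hypothetical stand-in for an experiment; not the project's real class."""

    def __init__(self, rounds, train_round, checkpoint_path):
        self.rounds = rounds                  # total number of rounds to run
        self.train_round = train_round        # callable(round_index); may raise
        self.checkpoint_path = Path(checkpoint_path)

    def _completed_rounds(self):
        # Number of rounds that finished successfully in earlier calls.
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())["completed_rounds"]
        return 0

    def run(self):
        start = self._completed_rounds()
        if start > 0:
            warnings.warn(f"Resuming experiment from round {start}...")
        for round_index in range(start, self.rounds):
            # If this raises, the checkpoint still points at the last good round.
            self.train_round(round_index)
            # Record progress only after the round succeeded.
            self.checkpoint_path.write_text(
                json.dumps({"completed_rounds": round_index + 1})
            )
        return self.rounds


def demo():
    """Fail once at round 2, then resume; returns the rounds attempted and warnings."""
    attempted = []
    failed_once = []

    def train_round(i):
        attempted.append(i)
        if i == 2 and not failed_once:
            failed_once.append(True)
            raise RuntimeError("Client not responding (Timeout Error)")

    with tempfile.TemporaryDirectory() as tmp:
        exp = ResumableExperiment(4, train_round, Path(tmp) / "breakpoint.json")
        try:
            exp.run()                         # fails during round 2
        except RuntimeError:
            pass
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            exp.run()                         # resumes at round 2
    return attempted, [str(w.message) for w in caught]
```

In this sketch rounds are counted from 0, and the failed round is retried on resume because its progress was never written to the checkpoint file.
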
@srcansiz
Member Author

In GitLab by @mvesin on Jan 20, 2023, 16:41

Done with breakpoints.
