[Train/Tune] Restore an experiment from a different machine/path #40585

Closed
justinvyu opened this issue Oct 23, 2023 · 0 comments · Fixed by #40647
Assignees
Labels
bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), train (Ray Train Related Issue), tune (Tune-related issues)

Comments

@justinvyu
Contributor

What happened + What you expected to happen

  1. Run an experiment on machine A.
  2. Copy the contents of the experiment directory to machine B, where the directory's path differs from the one on machine A.
  3. Try to restore the experiment for further training or analysis.
  4. The restore fails because the checkpoints store absolute paths, which still point to the original location on machine A.
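
The failure mode in the steps above can be illustrated with a small standalone sketch. It does not use Ray and is not Ray's implementation; `save_metadata`, `load_checkpoint`, `restore_after_move`, and the `meta.json` layout are hypothetical names invented for this illustration. It only shows why metadata that records an absolute path breaks once the directory moves, while a path recorded relative to the experiment directory keeps working.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_metadata(exp_dir: Path, absolute: bool) -> None:
    """Write a checkpoint file plus a metadata file that points to it (hypothetical layout)."""
    ckpt = exp_dir / "checkpoint_000000" / "model.txt"
    ckpt.parent.mkdir(parents=True)
    ckpt.write_text("weights")
    # Record either the full path or only the path relative to the experiment dir.
    recorded = str(ckpt) if absolute else str(ckpt.relative_to(exp_dir))
    (exp_dir / "meta.json").write_text(json.dumps({"checkpoint": recorded}))


def load_checkpoint(exp_dir: Path) -> str:
    """Resolve the recorded checkpoint path against the current experiment dir."""
    recorded = json.loads((exp_dir / "meta.json").read_text())["checkpoint"]
    # Joining with an absolute path discards exp_dir, so a stale absolute path stays stale.
    return (exp_dir / recorded).read_text()


def restore_after_move(absolute: bool) -> str:
    """Save at one path ("machine A"), move the directory ("machine B"), then restore."""
    root = Path(tempfile.mkdtemp())
    try:
        path_a = root / "machine_a" / "exp"
        path_b = root / "machine_b" / "other" / "exp"
        path_a.mkdir(parents=True)
        save_metadata(path_a, absolute)
        path_b.parent.mkdir(parents=True)
        shutil.move(str(path_a), str(path_b))
        return load_checkpoint(path_b)
    finally:
        shutil.rmtree(root)
```

Calling `restore_after_move(absolute=False)` succeeds, whereas `restore_after_move(absolute=True)` raises `FileNotFoundError` because the recorded path still refers to the pre-move location.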

See user issues:

Versions / Dependencies

2.7.1

Reproduction script

Same as this issue: #28082

This is a regression introduced in 2.7.0.

A temporary workaround is to make sure the experiment directory path on the new machine matches the path on the original machine, or to use ray<2.7.
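
One possible way to make the paths match without copying the data again is to create a symlink at the original path that points to the new location. This is only a sketch under that assumption: the issue does not confirm that restoring through a symlink works, and `link_original_path` is a hypothetical helper, not part of Ray.

```python
import os
import tempfile
from pathlib import Path


def link_original_path(original_path: Path, new_path: Path) -> None:
    """Make `original_path` resolve to `new_path` by creating a directory symlink."""
    # Refuse to overwrite anything that already lives at the original path.
    if original_path.exists() or original_path.is_symlink():
        raise FileExistsError(f"{original_path} already exists")
    original_path.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(new_path, original_path, target_is_directory=True)
```

For example, if the experiment was saved under one directory on machine A and copied to a different directory on machine B, calling `link_original_path(Path(<path on A>), Path(<path on B>))` on machine B lets the absolute paths stored in the checkpoints resolve again.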

Issue Severity

Medium: It is a significant difficulty but I can work around it.

@justinvyu added the bug, P1, tune, and train labels on Oct 23, 2023
@justinvyu self-assigned this on Oct 23, 2023