-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tune/train] Get rid of downsyncing path and download just the result.json / result.csv on result #38598
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. The callback downloads look good.
f"matching {match_prefix}.*{match_suffix}" | ||
) | ||
|
||
restore_latest_match("basic-variant-state", ".json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The searcher state file is a bit more annoying. We only have this searcher.restore_from_dir
API, so probably we should just download to have minimal API changes.
The filename is not always basic_variant depending on the searcher type.
Glob self._search_alg.CKPT_FILE_TMPL.format("*")
instead.
And if _search_alg
is a SearchGenerator
, you need to download _search_alg.searcher.CKPT_FILE_TMPL
as well...
Or, maybe let's just download all jsons in there as a hack for 2.7 😄
Actually, it's a mix of pkls and jsons. Maybe just convert it all to pkl and download all pkls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh geez.. Yeah maybe let's just download all the *.json *.pkl at the root for now.
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Eric Liang <ekhliang@gmail.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
I tested it out and it seems to work (I just try restoring after deleting the local cache dir in the test case). Updated the PR with logic to skip downloading if a trial has no checkpoint. PTAL, I think it's good to go assuming that tests pass. |
…persistence/no_sync_down
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Tests look good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok to merge after tests pass.
….json / result.csv on result (ray-project#38598) Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
….json / result.csv on result (ray-project#38598) Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
experiment_state.json
andbasic-variant-state.json
files, as well as theresult.json
andprogress.csv
files for each trial dir. The download of the result files is deferred until trial initialization.Related issue number
Closes #38576