Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] get checkpoints paths for a trial after tuning #6643

Merged
merged 6 commits into from
Jan 17, 2020

Conversation

hhbyyh
Copy link
Contributor

@hhbyyh hhbyyh commented Dec 30, 2019

Add a new API in Analysis to fetch the checkpoint paths for a trial.

Checks

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20202/
Test PASSed.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jan 3, 2020

cc @richardliaw Is this close to what you mentioned? Thanks.

"""
from ray.tune.checkpoint_manager import Checkpoint

checkpoints = trial.checkpoint_manager.best_checkpoints()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is odd because the user may not have access to the Trial object... maybe we can load the path instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought this can be the next step after get_best_trial or just using self.trials. We can load from disk check points given the trial logdir.

Shall we support both? i.e. trial can be a trial or a path, and we take different approaches according to its type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I assume this should be a static method that can operate without trials in memory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry for the slow reply - supporting both would be ideal. get_best_trial is only available for live experiments (not offline analysis).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jan 9, 2020

Thanks for the comments. Updated to find checkpoints from logdir.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20540/
Test FAILed.

@hhbyyh
Copy link
Contributor Author

hhbyyh commented Jan 15, 2020

Sorry for the late reply. Updated to use the checkpoint file. The getting by path part has many hard coded strings, shall we move that part code to TrainableUtil, for reuse and easier change management?

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20640/
Test FAILed.

@richardliaw
Copy link
Contributor

The getting by path part has many hard coded strings, shall we move that part code to TrainableUtil, for reuse and easier change management?

Yes, this sounds like a good idea to me!

Arguments:
trial(Trial): The log directory of a trial, or a trial instance.
metric (str): key for trial info to return, e.g. "mean_accuracy".
"training_iteration" is used by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"training_iteration" is used by default.
"training_iteration" is used by default.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20679/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20681/
Test FAILed.

@richardliaw richardliaw merged commit 5f36e6e into ray-project:master Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants