[tune] get checkpoints paths for a trial after tuning #6643

hhbyyh · 2019-12-30T23:47:17Z

Add a new API in Analysis to fetch the checkpoint paths for a trial.

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://ray.readthedocs.io/en/latest/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.

AmplabJenkins · 2019-12-30T23:50:22Z

Can one of the admins verify this patch?

AmplabJenkins · 2019-12-31T04:20:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20202/
Test PASSed.

hhbyyh · 2020-01-03T05:34:00Z

cc @richardliaw Is this close to what you mentioned? Thanks.

richardliaw · 2020-01-03T05:43:59Z

python/ray/tune/analysis/experiment_analysis.py

+        """
+        from ray.tune.checkpoint_manager import Checkpoint
+
+        checkpoints = trial.checkpoint_manager.best_checkpoints()


this is odd because the user may not have access to the Trial object... maybe we can load the path instead?

Oh, I thought this could be the next step after get_best_trial, or used with self.trials directly. We can also load checkpoints from disk given the trial logdir.

Shall we support both? i.e. trial can be a trial or a path, and we take different approaches according to its type.

Also I assume this should be a static method that can operate without trials in memory

Ah sorry for the slow reply - supporting both would be ideal. get_best_trial is only available for live experiments (not offline analysis).
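A minimal sketch of the "support both" idea discussed in this thread, not the code merged in this PR. It assumes checkpoints live in directories named checkpoint_<iteration> directly under the trial logdir, and that a trial object exposes a logdir attribute; the function name checkpoint_dirs and the naming pattern are hypothetical and chosen only for illustration.

```python
import os
import re

# Assumed naming convention, for illustration only: "checkpoint_<iteration>".
_CHECKPOINT_DIR = re.compile(r"^checkpoint_(\d+)$")


def checkpoint_dirs(trial_or_logdir):
    """Return checkpoint directories for a trial, sorted by iteration.

    Accepts either a logdir path (str) or any object with a `logdir`
    attribute, so it also works offline without trials in memory.
    """
    if isinstance(trial_or_logdir, str):
        logdir = trial_or_logdir
    else:
        logdir = trial_or_logdir.logdir  # hypothetical Trial-like object

    found = []
    for name in os.listdir(logdir):
        match = _CHECKPOINT_DIR.match(name)
        path = os.path.join(logdir, name)
        if match and os.path.isdir(path):
            # Keep the numeric iteration so sorting is numeric, not lexical.
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]
```

Dispatching on the argument type keeps a single entry point for both live experiments (pass the trial) and offline analysis (pass the logdir).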

hhbyyh · 2020-01-09T02:32:45Z

Thanks for the comments. Updated to find checkpoints from logdir.

AmplabJenkins · 2020-01-09T03:20:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20540/
Test FAILed.

hhbyyh · 2020-01-15T03:26:19Z

Sorry for the late reply. Updated to use the checkpoint file. The lookup-by-path part has many hard-coded strings; shall we move that code to TrainableUtil for reuse and easier change management?

AmplabJenkins · 2020-01-15T07:58:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20640/
Test FAILed.

richardliaw · 2020-01-15T09:24:28Z

The lookup-by-path part has many hard-coded strings; shall we move that code to TrainableUtil for reuse and easier change management?

Yes, this sounds like a good idea to me!
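To illustrate the refactoring idea (one place for the hard-coded strings), here is a small sketch. CheckpointNaming, its PREFIX value, and both methods are hypothetical names invented for this example; they are not the actual TrainableUtil API.

```python
import os


class CheckpointNaming:
    """Hypothetical helper that owns the checkpoint naming convention."""

    PREFIX = "checkpoint_"  # assumed directory prefix, defined in one place

    @staticmethod
    def dir_name(iteration):
        """Build the directory name for a given training iteration."""
        return "{}{}".format(CheckpointNaming.PREFIX, iteration)

    @staticmethod
    def iteration_of(path):
        """Return the iteration encoded in a checkpoint path, or None."""
        name = os.path.basename(os.path.normpath(path))
        suffix = name[len(CheckpointNaming.PREFIX):]
        if name.startswith(CheckpointNaming.PREFIX) and suffix.isdigit():
            return int(suffix)
        return None
```

With the prefix defined once, both the code that writes checkpoints and the code that looks them up by path would change together if the convention changes.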

richardliaw · 2020-01-15T09:25:00Z

python/ray/tune/analysis/experiment_analysis.py

+        Arguments:
+            trial(Trial): The log directory of a trial, or a trial instance.
+            metric (str): key for trial info to return, e.g. "mean_accuracy".
+            "training_iteration" is used by default.


Suggested change

"training_iteration" is used by default.

"training_iteration" is used by default.

AmplabJenkins · 2020-01-16T04:33:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20679/
Test FAILed.

AmplabJenkins · 2020-01-16T04:50:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/20681/
Test FAILed.

richardliaw reviewed Jan 3, 2020

ujvl reviewed Jan 3, 2020

richardliaw self-assigned this Jan 7, 2020

hhbyyh force-pushed the checkpointPath branch from fc1f4c8 to cf6741b on January 9, 2020 02:30

hhbyyh added 3 commits January 14, 2020 18:23

access checkpoints path

c1716bf

load path

f98d621

use checkpoint file

9c8c850

hhbyyh force-pushed the checkpointPath branch from cf6741b to 9c8c850 on January 15, 2020 03:25

richardliaw reviewed Jan 15, 2020

hhbyyh added 3 commits January 15, 2020 14:17

move to TrainableUtil

dcbb144

update test names

b1fc9ef

style

ed3e6e5

richardliaw approved these changes Jan 17, 2020

richardliaw merged commit 5f36e6e into ray-project:master Jan 17, 2020