Update hyperopt: Choose best model from validation data; For stopped Ray Tune trials, run evaluate at search end #1612
Conversation
debug=debug
)
trial['eval_stats'] = json.dumps(eval_stats, cls=NumpyEncoder)
except NotImplementedError:
Curious, when does this actually happen?
It doesn't actually happen the way we run by default, which is to use the local backend to do batch evaluation.
If one tries to use the ray backend to do batch evaluation, control goes through this code:
def batch_evaluation(self, model, dataset, collect_predictions=False, **kwargs):
    raise NotImplementedError(
        'Ray backend does not support batch evaluation at this time.'
    )
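For reference, a minimal sketch of how the calling code can tolerate this; the names below (RayBackendStub, evaluate_trial) are illustrative, not the actual executor code.

import json
import logging

logger = logging.getLogger(__name__)


class RayBackendStub:
    """Hypothetical stand-in for a backend without batch evaluation support."""

    def batch_evaluation(self, model, dataset, collect_predictions=False, **kwargs):
        raise NotImplementedError(
            'Ray backend does not support batch evaluation at this time.'
        )


def evaluate_trial(backend, model, dataset, trial):
    try:
        eval_stats = backend.batch_evaluation(model, dataset)
        trial['eval_stats'] = json.dumps(eval_stats)
    except NotImplementedError:
        # Fall through with empty eval_stats instead of failing the whole hyperopt run.
        logger.warning('Backend does not support batch evaluation; skipping eval_stats.')
        trial['eval_stats'] = json.dumps({})
    return trial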
Thanks for the explanation, @amholler!
Update hyperopt as discussed recently in the AutoML community: choose the best model based on validation data, and for stopped Ray Tune trials, run evaluation when the search ends.
Details for the first item:
** The metric_score is constrained to stats computed during training; it cannot be drawn from
stats that are only computed during the post-train overall stats model evaluation. If needed,
additional stats can be added to the set computed during training.
** The post-train overall stats evaluation for the best model is computed on the validation set.
An overall stats evaluation of the best model on the test set can be performed by the user as a
separate step after the hyperparameter optimization job is completed (see the sketch after this list).
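As an illustration of that separate step, here is a minimal sketch using the LudwigModel API; the model directory and dataset paths are placeholders, and the exact return values of evaluate() may differ across Ludwig versions.

# Hedged sketch: run the test-set evaluation yourself after hyperopt finishes.
from ludwig.api import LudwigModel

# Placeholder path to the best trial's saved model directory.
best_model = LudwigModel.load('results/hyperopt/best_trial/model')

# Evaluate on the held-out test data (placeholder file name).
test_results = best_model.evaluate(dataset='test.csv')
print(test_results)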
Details for the second item:
This change loads and evaluates each stopped trial's best model after the overall Ray Tune run completes.
During trial execution, the trial periodically calls tune.report with training_stats set to
train_stats[TRAINING] and eval_stats set to train_stats[VALIDATION]. When the trial's training
completes normally, it then calls tune.report to report the trial's final results, which include
training_stats set to all 3 train_stats (train_stats[TRAINING], train_stats[VALIDATION],
train_stats[TEST]) and eval_stats set to the output of running an overall stats evaluation of the
trial's best model on the eval_set.
However, if the trial is stopped before training completes, that final tune.report is never executed,
meaning that the overall stats evaluation is not computed and reported as eval_stats. Also,
train_stats[TEST] is not reported in training_stats (see the sketch below).
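For illustration, the pre-existing reporting pattern looks roughly like the sketch below. This is not Ludwig's actual trainable, just a minimal Ray Tune function showing why a stopped trial never reaches its final report; the metric payloads are placeholders.

from ray import tune


def trial_fn(config):
    train_stats = {'training': {}, 'validation': {}, 'test': {}}
    for epoch in range(config['epochs']):
        # ... one training epoch would update train_stats here ...
        tune.report(
            training_stats=train_stats['training'],
            eval_stats=train_stats['validation'],
        )
    # Only reached if the trial is NOT stopped early (e.g., by a time budget).
    tune.report(
        training_stats=train_stats,
        eval_stats={'placeholder': 'overall stats evaluation on eval_set'},
    )


if __name__ == '__main__':
    tune.run(trial_fn, config={'epochs': 2})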
The updated code changes the periodic tune.report to include all 3 train_stats
(train_stats[TRAINING], train_stats[VALIDATION], train_stats[TEST]), with eval_stats set to empty.
And when the overall Ray Tune run completes, for any stopped trials, it loads and evaluates the
trial's best model, setting eval_stats for that trial in ordered_trials, which is returned and
persisted in hyperopt_statistics.json. A sketch of this flow follows below.
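The sketch below illustrates the updated flow; the helper names (report_progress, finalize_stopped_trials, load_best_model, evaluate_model) are placeholders, not the actual functions touched by this PR.

import json


def report_progress(tune_report, train_stats):
    # Periodic report now carries all three splits; eval_stats stays empty until a
    # proper evaluation has been run.
    tune_report(
        training_stats={
            'training': train_stats['training'],
            'validation': train_stats['validation'],
            'test': train_stats['test'],
        },
        eval_stats={},
    )


def finalize_stopped_trials(ordered_trials, load_best_model, evaluate_model, validation_set):
    # After the overall Ray Tune run completes, backfill eval_stats for trials that
    # were stopped before their final report.
    for trial in ordered_trials:
        if not trial.get('eval_stats'):
            model = load_best_model(trial)
            eval_stats = evaluate_model(model, validation_set)
            trial['eval_stats'] = json.dumps(eval_stats)
    return ordered_trials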