New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Dask integration #2023
Add Dask integration #2023
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2023 +/- ##
==========================================
- Coverage 92.26% 92.18% -0.09%
==========================================
Files 124 125 +1
Lines 9559 9810 +251
==========================================
+ Hits 8820 9043 +223
- Misses 739 767 +28
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR! I understand that it's WIP but left a single comment since I noticed something while taking a quick look.
Thanks for taking a quick look @hvy! I've pushed some updates and think this PR is ready for review whenever you get a chance |
This pull request has not seen any recent activity. |
Feel free to let me know if there's any additional information or context I can provide to help assist in the code review process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments, could you check them?
optuna/integration/dask.py
Outdated
ext.storages[name] = optuna.storages.get_storage(storage) | ||
|
||
|
||
def _use_basestorage_doc(func: Callable) -> Callable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Do we need this function to copy docstring from BaseStorage? Actually, other implementation classes of BaseStorage does not have docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's certainly not required. I primarily added this to make adding docstrings to DaskStorage
easy. I'm happy to remove it, or move it to a more generic utilities module if you think this could be useful in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being so late with the review, and again, thanks for the PR. I have been a bit busy with reviewing the multi-objective refactoring and implementing a BoTorch wrapper for Optuna, so @keisuke-umezawa jumped in as well. I in any case left some comments and questions if you could take another look.
The one part I'd like to clarify and make sure I understand is around the extension and the registration of the handlers through the dictionary. I couldn't find much documentation so started to look into the source code, but am wondering if this is a common patter among dask users or maybe the internals of dask. It at a quick glance looks like quite a lot of boiler-plate like code for managing the dispatches.
|
||
self._started = asyncio.ensure_future(_register()) | ||
else: | ||
self.client.run_on_scheduler(_register_with_scheduler, storage=storage, name=self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is my understanding correct that this line is intended to be executed if and only if either the joblib backend is not "dask"
or when n_jobs
is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line (and a few lines up where we also call self.client.run_on_scheduler(_register_with_scheduler, ...)
) will be called anytime a user creates a new DaskStorage
instance. This is where we handle adding new storage instances to the _OptunaSchedulerExtension.storages
dictionary on the scheduler process
examples/dask_simple.py
Outdated
clf.fit(X_train, y_train) | ||
y_pred = clf.predict(X_test) | ||
score = accuracy_score(y_test, y_pred) | ||
return score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the score is identical for various hyperparameter configurations and that the best trial is usually among the first one or two. Should we work on it a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is because of how small the iris dataset is. I updated the example to use a larger dataset and now it takes several iterations to find the best trial. Below is the output from running the updated version of this example
Details:
Dask dashboard is available at http://127.0.0.1:8787/status
examples/dask_simple.py:41: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
storage = optuna.integration.dask.DaskStorage()
[I 2021-07-22 17:40:06,912] A new study created in memory with name: no-name-e485b064-1e1a-4e32-b0ab-b1706c2ab60e
/Users/james/projects/optuna/optuna/optuna/study/study.py:1156: ExperimentalWarning: DaskStudy is experimental (supported from v2.9.0). The interface can change in the future.
study = study_class(study_name=study_name, storage=storage, sampler=sampler, pruner=pruner)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,315] Trial 0 finished with value: 0.9333333333333333 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 0 with value: 0.9333333333333333.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,337] Trial 1 finished with value: 0.9266666666666666 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 0 with value: 0.9333333333333333.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,354] Trial 4 finished with value: 0.9222222222222223 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 6 with value: 0.94.
[I 2021-07-22 17:40:07,355] Trial 6 finished with value: 0.94 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 6 with value: 0.9422222222222222.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,378] Trial 5 finished with value: 0.9333333333333333 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 3 with value: 0.9422222222222222.
[I 2021-07-22 17:40:07,384] Trial 7 finished with value: 0.9333333333333333 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 3 with value: 0.9422222222222222.
[I 2021-07-22 17:40:07,402] Trial 2 finished with value: 0.9333333333333333 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 3 with value: 0.9422222222222222.
[I 2021-07-22 17:40:07,418] Trial 3 finished with value: 0.9422222222222222 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 3 with value: 0.9422222222222222.
[I 2021-07-22 17:40:07,645] Trial 8 finished with value: 0.9288888888888889 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 3 with value: 0.9422222222222222.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,680] Trial 9 finished with value: 0.9444444444444444 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,685] Trial 10 finished with value: 0.9466666666666667 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:07,693][I 2021-07-22 17:40:07,693] Trial 11 finished with value: 0.9333333333333333 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
Trial 14 finished with value: 0.9288888888888889 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 9 with value: 0.9466666666666667.
[I 2021-07-22 17:40:07,725] Trial 12 finished with value: 0.94 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:07,767] Trial 13 finished with value: 0.9288888888888889 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:07,789] Trial 15 finished with value: 0.9266666666666666 and parameters: {'max_depth': 8, 'n_estimators': 11}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:08,108] Trial 16 finished with value: 0.9133333333333333 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,156] Trial 17 finished with value: 0.9266666666666666 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,168] Trial 19 finished with value: 0.9155555555555556 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:08,169] Trial 18 finished with value: 0.9133333333333333 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,204] Trial 21 finished with value: 0.9066666666666666 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.[I 2021-07-22 17:40:08,204] Trial 20 finished with value: 0.9133333333333333 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:08,256] Trial 23 finished with value: 0.9022222222222223 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
[I 2021-07-22 17:40:08,269] Trial 22 finished with value: 0.8955555555555555 and parameters: {'max_depth': 5, 'n_estimators': 35}. Best is trial 10 with value: 0.9466666666666667.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,897] Trial 24 finished with value: 0.9555555555555556 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 24 with value: 0.9555555555555556.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,929] Trial 28 finished with value: 0.9488888888888889 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 24 with value: 0.9555555555555556.
[I 2021-07-22 17:40:08,956] Trial 26 finished with value: 0.9533333333333334 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 29 with value: 0.9577777777777777.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,964] Trial 29 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 29 with value: 0.9577777777777777.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:08,986] Trial 25 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 25 with value: 0.9577777777777777.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,036] Trial 30 finished with value: 0.9555555555555556 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 25 with value: 0.9622222222222222.
[I 2021-07-22 17:40:09,070] Trial 27 finished with value: 0.9622222222222222 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 27 with value: 0.9622222222222222.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,138] Trial 31 finished with value: 0.9555555555555556 and parameters: {'max_depth': 10, 'n_estimators': 91}. Best is trial 27 with value: 0.9622222222222222.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,594] Trial 32 finished with value: 0.9511111111111111 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 27 with value: 0.9622222222222222.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,624] Trial 33 finished with value: 0.9488888888888889 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,641] Trial 34 finished with value: 0.9488888888888889 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,656] Trial 36 finished with value: 0.9622222222222222 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,673] Trial 35 finished with value: 0.9644444444444444 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:09,719] Trial 37 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:09,759] Trial 38 finished with value: 0.9555555555555556 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:09,818] Trial 39 finished with value: 0.9555555555555556 and parameters: {'max_depth': 9, 'n_estimators': 81}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,167] Trial 40 finished with value: 0.8666666666666667 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,203] Trial 41 finished with value: 0.8822222222222222 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,206] Trial 44 finished with value: 0.8511111111111112 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:10,216] Trial 42 finished with value: 0.86 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,226] Trial 43 finished with value: 0.86 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,291] Trial 45 finished with value: 0.8488888888888889 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:10,319] Trial 46 finished with value: 0.8555555555555555 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,420] Trial 47 finished with value: 0.8666666666666667 and parameters: {'max_depth': 3, 'n_estimators': 70}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:10,961] Trial 48 finished with value: 0.9488888888888889 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,025] Trial 49 finished with value: 0.9377777777777778 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,031] Trial 50 finished with value: 0.9422222222222222 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,054] Trial 51 finished with value: 0.94 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,094] Trial 52 finished with value: 0.9422222222222222 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,125] Trial 53 finished with value: 0.9466666666666667 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:11,157] Trial 54 finished with value: 0.94 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,263] Trial 55 finished with value: 0.9466666666666667 and parameters: {'max_depth': 7, 'n_estimators': 100}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,785] Trial 56 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,891] Trial 57 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,927] Trial 58 finished with value: 0.9577777777777777 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:11,939] Trial 59 finished with value: 0.9533333333333334 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,032] Trial 62 finished with value: 0.9533333333333334 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:12,057] Trial 61 finished with value: 0.9533333333333334 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:12,132] Trial 60 finished with value: 0.9555555555555556 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,274] Trial 63 finished with value: 0.9644444444444444 and parameters: {'max_depth': 9, 'n_estimators': 86}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,817] Trial 64 finished with value: 0.9644444444444444 and parameters: {'max_depth': 9, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,871] Trial 65 finished with value: 0.9533333333333334 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,939] Trial 68 finished with value: 0.9622222222222222 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:12,945] Trial 66 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,001] Trial 69 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:13,089] Trial 67 finished with value: 0.9555555555555556 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:13,108] Trial 70 finished with value: 0.9622222222222222 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,221] Trial 71 finished with value: 0.96 and parameters: {'max_depth': 10, 'n_estimators': 75}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,557] Trial 72 finished with value: 0.9377777777777778 and parameters: {'max_depth': 8, 'n_estimators': 57}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,622] Trial 73 finished with value: 0.9422222222222222 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,694] Trial 75 finished with value: 0.9488888888888889 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,703] Trial 74 finished with value: 0.9444444444444444 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,783] Trial 76 finished with value: 0.9488888888888889 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:13,845] Trial 77 finished with value: 0.9466666666666667 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
[I 2021-07-22 17:40:13,872] Trial 78 finished with value: 0.9466666666666667 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:13,984] Trial 79 finished with value: 0.9444444444444444 and parameters: {'max_depth': 8, 'n_estimators': 60}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,407] Trial 80 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 78}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,468] Trial 81 finished with value: 0.9488888888888889 and parameters: {'max_depth': 10, 'n_estimators': 78}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,611] Trial 82 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,634] Trial 83 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 35 with value: 0.9644444444444444.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,691] Trial 85 finished with value: 0.96 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:14,739] Trial 84 finished with value: 0.9688888888888889 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:14,781] Trial 86 finished with value: 0.9577777777777777 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:14,856] Trial 87 finished with value: 0.9533333333333334 and parameters: {'max_depth': 10, 'n_estimators': 95}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:15,262] Trial 89 finished with value: 0.9533333333333334 and parameters: {'max_depth': 9, 'n_estimators': 78}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:15,288] Trial 88 finished with value: 0.9511111111111111 and parameters: {'max_depth': 9, 'n_estimators': 95}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:15,424] Trial 90 finished with value: 0.9555555555555556 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:15,453] Trial 91 finished with value: 0.9622222222222222 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
/Users/james/projects/dask/distributed/distributed/protocol/pickle.py:75: ExperimentalWarning: DaskStorage is experimental (supported from v2.9.0). The interface can change in the future.
return pickle.loads(x)
[I 2021-07-22 17:40:15,552] Trial 92 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,555] Trial 93 finished with value: 0.9555555555555556 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,602] Trial 94 finished with value: 0.9644444444444444 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,677] Trial 95 finished with value: 0.9488888888888889 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,816] Trial 97 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,821] Trial 96 finished with value: 0.9644444444444444 and parameters: {'max_depth': 9, 'n_estimators': 83}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,869] Trial 98 finished with value: 0.96 and parameters: {'max_depth': 9, 'n_estimators': 78}. Best is trial 84 with value: 0.9688888888888889.
[I 2021-07-22 17:40:15,892] Trial 99 finished with value: 0.9577777777777777 and parameters: {'max_depth': 9, 'n_estimators': 78}. Best is trial 84 with value: 0.9688888888888889.
Best params: {'max_depth': 10, 'n_estimators': 95}
This pull request has not seen any recent activity. |
@jrbourbeau, don't mean to stress but do you have any concerns or questions with the review state so far? Just leaving a heads up that we've made some recent changes to the base storage class in order to better support multi objective optimization, #1980 that we probably should follow up on here. Some singular methods are made plural for instance, "direction" -> "directions", "value" -> "values". Don't hesitate to let me know if things are unclear. |
This pull request has not seen any recent activity. |
Thanks @c-bata. I'm back in the office and will roll back the |
Alright, the |
James, TODO for when this is done. Can we also add Optuna to dask-examples?
…On Tue, Nov 8, 2022 at 1:19 PM James Bourbeau ***@***.***> wrote:
Alright, the DaskStudy-related changes have been reverted both here and
over in optuna/optuna-examples#46
<optuna/optuna-examples#46>. @c-bata
<https://github.com/c-bata> @g-votte <https://github.com/g-votte> let me
know what you think about the current state of this PR and if there are any
additional comments you have.
—
Reply to this email directly, view it on GitHub
<#2023 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTEZYQ55FLTFU22MLE3WHKRVXANCNFSM4TVX7CQA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Yeah, definitely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrbourbeau Thank you for the update!
The source code looks clear and is easy to read 💯 Looks good overall. I have some comments and questions though.
optuna.integration.dask
module only importsdistributed
package. So My question is, ifdistributed
package should be listed as an optional dependency, notdask[distributed]
.- I suggest you remove
_use_basestorage_doc
from this PR. I would say it is not required since storage methods except for__init__()
are not intended to be called by Optuna users directly. At least, it should not be defined indask.py
, so perhaps we can work on it in the following pull request with other storage classes.
Although there may be room for improvements regarding unit tests, it is acceptable for now since DaskStorage
is an experimental feature. It would be nice if DaskStorage
could be tested in the same test scenarios written in storages_tests/test_samplers.py
as the same with other storage implementations, but it should be addressed as a follow-up to keep the PR changes small.
"set_trial_system_attr", | ||
"get_trial", | ||
"get_all_trials", | ||
"get_n_trials", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you deliberately avoid to implement get_best_trial
, get_trial_params
, get_trial_user_attr
, and get_trial_system_attrs
, or is it better to support them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is, these methods work since they internally only reference other abstract storage methods which are implemented in this PR. So the existing implementation in BaseStorage
should be sufficient. Though, you're correct that things might be sub-optimal as we're sending intermediate results back and forth between the client and scheduler processes.
Definitely open to thoughts from others, but my personal preference would be to hold off on adding these methods in this PR. Then, after this PR is merged, we can do some benchmarking to see how much benefit we'd get in practice by adding these derivative methods. If it turns out to be a sizable performance impact, then adding these methods should be relatively straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 👍 In terms of performance impact, it obviously depends on storage implementations. Probably you’ve already noticed, but for example, RDBStorage
overrides these methods to issue more efficient SQL queries.
optuna/optuna/storages/_rdb/storage.py
Lines 402 to 422 in f02f87d
def get_trial_user_attrs(self, trial_id: int) -> Dict[str, Any]: | |
with _create_scoped_session(self.scoped_session) as session: | |
# Ensure trial exists. | |
models.TrialModel.find_or_raise_by_id(trial_id, session) | |
attributes = models.TrialUserAttributeModel.where_trial_id(trial_id, session) | |
user_attrs = {attr.key: json.loads(attr.value_json) for attr in attributes} | |
return user_attrs | |
def get_trial_system_attrs(self, trial_id: int) -> Dict[str, Any]: | |
with _create_scoped_session(self.scoped_session) as session: | |
# Ensure trial exists. | |
models.TrialModel.find_or_raise_by_id(trial_id, session) | |
attributes = models.TrialSystemAttributeModel.where_trial_id(trial_id, session) | |
system_attrs = {attr.key: json.loads(attr.value_json) for attr in attributes} | |
return system_attrs |
optuna/optuna/storages/_rdb/storage.py
Lines 951 to 967 in f02f87d
def get_best_trial(self, study_id: int) -> FrozenTrial: | |
with _create_scoped_session(self.scoped_session) as session: | |
_directions = self.get_study_directions(study_id) | |
if len(_directions) > 1: | |
raise RuntimeError( | |
"Best trial can be obtained only for single-objective optimization." | |
) | |
direction = _directions[0] | |
if direction == StudyDirection.MAXIMIZE: | |
trial = models.TrialModel.find_max_value_trial(study_id, 0, session) | |
else: | |
trial = models.TrialModel.find_min_value_trial(study_id, 0, session) | |
trial_id = trial.trial_id | |
return self.get_trial(trial_id) |
Anyway, as you said, adding these methods should be relatively straightforward. So you can work on it as a follow-up.
"set_trial_system_attr", | ||
"get_trial", | ||
"get_all_trials", | ||
"get_n_trials", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 👍 In terms of performance impact, it obviously depends on storage implementations. Probably you’ve already noticed, but for example, RDBStorage
overrides these methods to issue more efficient SQL queries.
optuna/optuna/storages/_rdb/storage.py
Lines 402 to 422 in f02f87d
def get_trial_user_attrs(self, trial_id: int) -> Dict[str, Any]: | |
with _create_scoped_session(self.scoped_session) as session: | |
# Ensure trial exists. | |
models.TrialModel.find_or_raise_by_id(trial_id, session) | |
attributes = models.TrialUserAttributeModel.where_trial_id(trial_id, session) | |
user_attrs = {attr.key: json.loads(attr.value_json) for attr in attributes} | |
return user_attrs | |
def get_trial_system_attrs(self, trial_id: int) -> Dict[str, Any]: | |
with _create_scoped_session(self.scoped_session) as session: | |
# Ensure trial exists. | |
models.TrialModel.find_or_raise_by_id(trial_id, session) | |
attributes = models.TrialSystemAttributeModel.where_trial_id(trial_id, session) | |
system_attrs = {attr.key: json.loads(attr.value_json) for attr in attributes} | |
return system_attrs |
optuna/optuna/storages/_rdb/storage.py
Lines 951 to 967 in f02f87d
def get_best_trial(self, study_id: int) -> FrozenTrial: | |
with _create_scoped_session(self.scoped_session) as session: | |
_directions = self.get_study_directions(study_id) | |
if len(_directions) > 1: | |
raise RuntimeError( | |
"Best trial can be obtained only for single-objective optimization." | |
) | |
direction = _directions[0] | |
if direction == StudyDirection.MAXIMIZE: | |
trial = models.TrialModel.find_max_value_trial(study_id, 0, session) | |
else: | |
trial = models.TrialModel.find_min_value_trial(study_id, 0, session) | |
trial_id = trial.trial_id | |
return self.get_trial(trial_id) |
Anyway, as you said, adding these methods should be relatively straightforward. So you can work on it as a follow-up.
Co-authored-by: Masashi Shibata <c-bata@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jrbourbeau !
LGTM! Let me merge this PR after 1 more approval. @amylase Could you review this PR?
hvy is unassigned from this pull request since he is busy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrbourbeau Could you resolve a conflict?
Merge conflicts resolved 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Just checking in -- @amylase do you think you'll have bandwidth to take a look at the changes here? @c-bata I'm not familiar with the Optuna team's development practices around merging PRs. Just for my own understanding, are two core dev approvals needed? |
Yes, two core-dev approvals are required. However, this PR has had file conflicts many times when merging other PRs. I understand it increases your work. I will ask other maintainers today whether I can merge this PR with only my 1 approval. Thank you for your patience 🙇 |
Merged 👍 Thank you for your pull request! |
Thanks all for your efforts in reviewing! |
Motivation
This PR adds initial native Dask support for Optuna. See the discussion over in #1766 for additional context.
Description of the changes
This PR adds a new
DaskStorage
class, which is adapted from https://github.com/jrbourbeau/dask-optuna, to OptunaStill TODO:
DaskStorage
Closes #1766