Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix time complexity of the samplers comparison table #3593

Merged
merged 5 commits into from Jun 3, 2022

Conversation

HideakiImamura
Copy link
Member

Motivation

This is a follow-up of #3571. This PR fixes the time complexity following private discussion with @knshnb and @not522.

Description of the changes

@HideakiImamura HideakiImamura added document Documentation related. v3 Issue/PR for Optuna version 3. labels May 27, 2022
@HideakiImamura
Copy link
Member Author

@knshnb @not522 Could you review this PR?

Copy link
Member

@not522 not522 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your great work! Let me discuss each sampler.

RandomSampler

I think O(d) would be more reasonable. It samples values d times in each trial.

GridSampler

It requires O(n) + O(number of grids) computation. O(number of grids) term is comparable when n is large. Then, the total complexity is O(n).

O(dn) is correct. _same_search_space is the bottleneck. @knshnb Thank you for pointing this out!

def _same_search_space(self, search_space: Mapping[str, Sequence[GridValueType]]) -> bool:
if set(search_space.keys()) != set(self._search_space.keys()):
return False
for param_name in search_space.keys():
if len(search_space[param_name]) != len(self._search_space[param_name]):
return False
for i, param_value in enumerate(search_space[param_name]):
if param_value != self._search_space[param_name][i]:
return False
return True

I will check other samplers later.

@knshnb
Copy link
Member

knshnb commented May 30, 2022

Thank you for the PR! This will be helpful information for users.
Let me discuss the following two points.

The dimension of the search space

This is not a strong opinion, but

since all samplers take O(d) to call the :func:~optuna.samplers.BaseSampler.sample_independent

might not be informative enough because the time complexity of sample_independent depends on each sampler. I think d term naturally appears in each sampling algorithm itself (e.g., random sampling in RandomSampler and crossover in NSGAIISampler) and we can directly incorporate it in the Time complexity column.

The number of objectives

Several samplers have ✅ in the Multi-objective optimization column, but it seems only the NSGAIISampler row describes multi-objective optimization. I think adding time complexity for the multi-objective case in TPESampler and BoTorchSampler would be better. Or if it is too complicated, we can skip it now and just add a note that only multi-objective case is considered in only NSGAIISampler.

I'm sorry that I'm not familiar with some samplers and I cannot give a concrete suggestion now. If you have any problems, let me discuss the detail of each sampler.

@HideakiImamura
Copy link
Member Author

HideakiImamura commented May 31, 2022

@not522 @knshnb Thank you for your careful reviews. I basically follow you reviews. PTAL.

@knshnb

we can directly incorporate it in the Time complexity column.

It is the user's responsibility to recognize that sample_independent is called for each parameter in the objective function, so that is where O(d) is taken for all samplers.

@HideakiImamura HideakiImamura added this to the v3.0.0-b1 milestone Jun 2, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #3593 (c0c2fb7) into master (0d65e1a) will decrease coverage by 0.16%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3593      +/-   ##
==========================================
- Coverage   91.00%   90.83%   -0.17%     
==========================================
  Files         160      161       +1     
  Lines       12307    12412     +105     
==========================================
+ Hits        11200    11275      +75     
- Misses       1107     1137      +30     
Impacted Files Coverage Δ
optuna/storages/_rdb/models.py 97.86% <0.00%> (-0.94%) ⬇️
optuna/storages/_rdb/storage.py 93.77% <0.00%> (-0.21%) ⬇️
optuna/integration/botorch.py 97.80% <0.00%> (ø)
optuna/storages/_in_memory.py 100.00% <0.00%> (ø)
optuna/importance/_fanova/_tree.py 99.45% <0.00%> (ø)
optuna/importance/_fanova/_evaluator.py 98.11% <0.00%> (ø)
optuna/storages/_rdb/alembic/versions/v3.0.0.c.py 63.95% <0.00%> (ø)
optuna/importance/_fanova/_fanova.py 96.07% <0.00%> (+7.94%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Member

@knshnb knshnb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the useful discussions and great work!

Copy link
Member

@not522 not522 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@not522 not522 merged commit 7e362b7 into optuna:master Jun 3, 2022
@HideakiImamura HideakiImamura deleted the fix-samplers-time-comp branch June 9, 2023 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
document Documentation related. v3 Issue/PR for Optuna version 3.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants