Fix time complexity of the samplers comparison table #3593

HideakiImamura · 2022-05-27T08:08:29Z

Motivation

This is a follow-up to #3571. This PR fixes the time complexity entries following a private discussion with @knshnb and @not522.

Description of the changes

HideakiImamura · 2022-05-27T08:08:46Z

@knshnb @not522 Could you review this PR?

docs/source/reference/samplers/index.rst

not522

Thank you for your great work! Let me discuss each sampler.

RandomSampler

I think O(d) would be more reasonable. It samples values d times in each trial.
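To make the O(d) claim concrete, here is a minimal standalone sketch of per-trial random sampling. It is not Optuna's implementation: the function `sample_trial` and the dictionary-of-bounds search space are hypothetical names used only for illustration. It shows that one trial costs one draw per parameter and never looks at past trials.

```python
import random
from typing import Dict, Tuple


def sample_trial(
    search_space: Dict[str, Tuple[float, float]], rng: random.Random
) -> Dict[str, float]:
    """Draw one uniform value per parameter: d draws for d parameters."""
    params = {}
    for name, (low, high) in search_space.items():
        # One independent draw per parameter; no trial history is consulted.
        params[name] = rng.uniform(low, high)
    return params


# Hypothetical 3-dimensional search space (d = 3).
space = {"x": (-1.0, 1.0), "y": (0.0, 10.0), "z": (5.0, 6.0)}
trial_params = sample_trial(space, random.Random(0))
print(len(trial_params))  # one sampled value per parameter
```

Because the loop body does constant work and runs once per parameter, the cost per trial grows linearly in d and is independent of the number of finished trials n.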

GridSampler

~~It requires O(n) + O(number of grids) computation. O(number of grids) term is comparable when n is large. Then, the total complexity is O(n).~~

O(dn) is correct. _same_search_space is the bottleneck. @knshnb Thank you for pointing this out!

optuna/optuna/samplers/_grid.py

Lines 254 to 267 in e972dbb

    
    def _same_search_space(self, search_space: Mapping[str, Sequence[GridValueType]]) -> bool:
        if set(search_space.keys()) != set(self._search_space.keys()):
            return False

        for param_name in search_space.keys():
            if len(search_space[param_name]) != len(self._search_space[param_name]):
                return False

            for i, param_value in enumerate(search_space[param_name]):
                if param_value != self._search_space[param_name][i]:
                    return False

        return True

I will check other samplers later.
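The O(dn) figure for GridSampler can be illustrated with a small counting sketch. It re-implements the comparison above as a free function and assumes, hypothetically, that the check runs once for each of the n previously recorded trials and that each parameter has a fixed, small number of grid values. The call site is not shown in this thread, so treat `count_comparisons` and the stored-history layout as assumptions.

```python
from typing import Mapping, Sequence, Tuple


def same_search_space(
    candidate: Mapping[str, Sequence[object]],
    reference: Mapping[str, Sequence[object]],
) -> Tuple[bool, int]:
    """Return (is_same, number_of_value_comparisons performed)."""
    comparisons = 0
    if set(candidate.keys()) != set(reference.keys()):
        return False, comparisons
    for name in candidate.keys():
        if len(candidate[name]) != len(reference[name]):
            return False, comparisons
        for i, value in enumerate(candidate[name]):
            comparisons += 1
            if value != reference[name][i]:
                return False, comparisons
    return True, comparisons


def count_comparisons(d: int, n: int, values_per_param: int = 3) -> int:
    """Total value comparisons when the check runs against n stored trials."""
    grid = {f"p{j}": list(range(values_per_param)) for j in range(d)}
    # Hypothetical history: n past trials that each stored a copy of the grid.
    history = [dict(grid) for _ in range(n)]
    total = 0
    for stored in history:
        _, used = same_search_space(stored, grid)
        total += used
    return total


print(count_comparisons(d=4, n=10))  # 4 parameters * 3 values * 10 trials
```

Doubling either d or n doubles the count, which matches the O(dn) entry when the number of grid values per parameter is treated as a constant.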

knshnb · 2022-05-30T10:59:33Z

Thank you for the PR! This will be helpful information for users.
Let me discuss the following two points.

The dimension of the search space

This is not a strong opinion, but the note "since all samplers take O(d) to call the :func:`~optuna.samplers.BaseSampler.sample_independent`" might not be informative enough, because the time complexity of sample_independent depends on each sampler. I think the d term naturally appears in each sampling algorithm itself (e.g., random sampling in RandomSampler and crossover in NSGAIISampler), so we can incorporate it directly in the Time complexity column.

The number of objectives

Several samplers have ✅ in the Multi-objective optimization column, but it seems only the NSGAIISampler row describes multi-objective optimization. I think adding the time complexity for the multi-objective case in TPESampler and BoTorchSampler would be better. If that is too complicated, we can skip it for now and just add a note that the multi-objective case is considered only for NSGAIISampler.

I'm sorry that I'm not familiar with some samplers and cannot give a concrete suggestion now. If anything is unclear, I'm happy to discuss the details of each sampler.

HideakiImamura · 2022-05-31T03:25:05Z

@not522 @knshnb Thank you for your careful reviews. I have basically followed your reviews. PTAL.

@knshnb

we can directly incorporate it in the Time complexity column.

It is the user's responsibility to recognize that sample_independent is called once for each parameter in the objective function, which is where the O(d) cost arises for every sampler.

codecov-commenter · 2022-06-02T09:22:08Z

Codecov Report

Merging #3593 (c0c2fb7) into master (0d65e1a) will decrease coverage by 0.16%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3593      +/-   ##
==========================================
- Coverage   91.00%   90.83%   -0.17%     
==========================================
  Files         160      161       +1     
  Lines       12307    12412     +105     
==========================================
+ Hits        11200    11275      +75     
- Misses       1107     1137      +30

Impacted Files	Coverage Δ
optuna/storages/_rdb/models.py	`97.86% <0.00%> (-0.94%)`	⬇️
optuna/storages/_rdb/storage.py	`93.77% <0.00%> (-0.21%)`	⬇️
optuna/integration/botorch.py	`97.80% <0.00%> (ø)`
optuna/storages/_in_memory.py	`100.00% <0.00%> (ø)`
optuna/importance/_fanova/_tree.py	`99.45% <0.00%> (ø)`
optuna/importance/_fanova/_evaluator.py	`98.11% <0.00%> (ø)`
optuna/storages/_rdb/alembic/versions/v3.0.0.c.py	`63.95% <0.00%> (ø)`
optuna/importance/_fanova/_fanova.py	`96.07% <0.00%> (+7.94%)`	⬆️


knshnb

LGTM! Thanks for the useful discussions and great work!

not522

LGTM!

Fix time complexity

99be910

HideakiImamura added the document (Documentation related) and v3 (Issue/PR for Optuna version 3) labels May 27, 2022

HideakiImamura assigned not522 and knshnb May 27, 2022

not522 reviewed May 30, 2022


HideakiImamura added 2 commits May 31, 2022 12:15

Follow review comments

0045911

Fix grid sampler

5aad077

Fix grid sampler2

6e662a1

HideakiImamura added this to the v3.0.0-b1 milestone Jun 2, 2022

Follow mob review comments

c0c2fb7

knshnb approved these changes Jun 2, 2022


not522 approved these changes Jun 3, 2022


not522 merged commit 7e362b7 into optuna:master Jun 3, 2022

HideakiImamura deleted the fix-samplers-time-comp branch June 9, 2023 02:23
