Add QMC sampler #2423

Merged
merged 83 commits into from
Jan 27, 2022
Changes from 81 commits
08d446b
add sobol sampler
Sep 8, 2020
5ebd943
Merge branch 'master' into sobol-sampler
kstoneriv3 Sep 20, 2020
227a410
modify import
kstoneriv3 Sep 21, 2020
f66ce66
update comments
kstoneriv3 Sep 22, 2020
e18f7bc
merge
kstoneriv3 Feb 24, 2021
1b63ac8
remove sobol_seq
kstoneriv3 Feb 26, 2021
9b77379
Merge branch 'master' into sobol-sampler
kstoneriv3 Feb 26, 2021
da4afd5
re-design qmc sampler
kstoneriv3 Feb 28, 2021
a22163e
Merge branch 'master' into sobol-sampler
kstoneriv3 Feb 28, 2021
3324814
Merge branch 'master' into sobol-sampler
kstoneriv3 Feb 28, 2021
5b9316c
Merge branch 'master' of https://github.com/optuna/optuna
kstoneriv3 Feb 28, 2021
38146d8
Merge branch 'master' into feature/qmc-sampler
kstoneriv3 Feb 28, 2021
af638c8
modify format
kstoneriv3 Feb 28, 2021
04dbb26
format with black
kstoneriv3 Feb 28, 2021
44d2c08
remove comments
kstoneriv3 Feb 28, 2021
68f493e
adopt to mypy
kstoneriv3 Feb 28, 2021
1c79e13
Apply suggestions from code review
kstoneriv3 Mar 2, 2021
e69eabc
Merge branch 'master' of https://github.com/optuna/optuna
kstoneriv3 Mar 2, 2021
d06d3d0
Merge branch 'master' into feature/qmc-sampler
kstoneriv3 Mar 2, 2021
d734437
fix typos
kstoneriv3 Mar 2, 2021
bafae2b
add docstring
kstoneriv3 Mar 2, 2021
d63fab9
add comment
kstoneriv3 Mar 2, 2021
bd87bf1
update
kstoneriv3 Mar 3, 2021
2a1a3ce
add reseed
kstoneriv3 Mar 3, 2021
e74026b
make it more stateless
kstoneriv3 Mar 3, 2021
a832f29
update
kstoneriv3 Mar 3, 2021
9004da8
add some tests for qmc
kstoneriv3 Mar 3, 2021
417ad68
update
kstoneriv3 Mar 4, 2021
2c2af67
support categorical
kstoneriv3 Mar 5, 2021
0eff706
before dropping support for categorical
kstoneriv3 Mar 6, 2021
3132568
drop support for Latin hypercubes and relative sampling of categorica…
kstoneriv3 Mar 6, 2021
08095fb
require scipy 1.7.0
kstoneriv3 Mar 9, 2021
42b4a7f
reflect review
kstoneriv3 Mar 13, 2021
9013512
update a bit
kstoneriv3 Mar 17, 2021
a4db2c5
add tests and fix bugs
kstoneriv3 Mar 19, 2021
1c4f4a6
Merge branch 'master' of https://github.com/optuna/optuna
kstoneriv3 Mar 19, 2021
1e05676
Merge branch 'master' into feature/qmc-sampler
kstoneriv3 Mar 19, 2021
6e27391
remove unused imports
kstoneriv3 Mar 19, 2021
e5c3964
add tests and fix a bug
kstoneriv3 Mar 19, 2021
d0d6518
Merge branch 'master' into feature/qmc-sampler
Jun 24, 2021
7c4b3f0
Merge branch 'master' into feature/qmc-sampler
Jul 4, 2021
31a2e00
skip tests for python 3.6
Jul 4, 2021
92765aa
fix flake8
Jul 4, 2021
84703d4
import OrderedDict from collections, not from typing
Jul 4, 2021
ce81c33
temporarily skip a test to run CI
Jul 6, 2021
ebe8095
define empty QMCSampler for python 3.6
Jul 6, 2021
cf66127
fix empty QMCSampler class
Jul 6, 2021
d6a5d15
fix formatter
Jul 6, 2021
bbeba12
fix formatter
Jul 6, 2021
013bf56
fix mypy
Jul 6, 2021
121e5b2
Update optuna/samplers/_qmc.py
kstoneriv3 Jul 29, 2021
dfb86cd
Update optuna/samplers/_qmc.py
kstoneriv3 Jul 29, 2021
1c958fd
Update optuna/samplers/_qmc.py
kstoneriv3 Jul 29, 2021
d0771a3
fix random seed
Jul 29, 2021
bbe7e06
fix default argument of scramble
Jul 29, 2021
e72d243
Merge branch 'master' into feature/qmc-sampler
Jul 29, 2021
fad5610
Merge branch 'master' into feature/qmc-sampler
Oct 20, 2021
9a02e7d
remove init argument "search_space"
Oct 20, 2021
0e41caa
fix mypy
Oct 20, 2021
4a6ef96
add and modify tests for logger methods
Oct 22, 2021
d933179
Apply suggestions from code review
kstoneriv3 Oct 25, 2021
10948e2
modify according to reviews
Oct 26, 2021
afac48f
remove unnecessary comment out
Oct 26, 2021
7f1dfa8
replace numpy with np
Nov 6, 2021
30dc72c
remove unnecessary array slicing
Nov 6, 2021
1a1793e
fix according to comments
Nov 6, 2021
5734766
merge master
Nov 6, 2021
4ea95e2
remove remaining print statement for debug
Nov 6, 2021
d652971
fix by skipping test for python 3.6
Nov 7, 2021
8f64754
Merge branch 'master' into feature/qmc-sampler
Nov 8, 2021
b1b28eb
modify the version at experimental decorator from 3.0.0a1 to 3.0.0
Nov 8, 2021
b9e4393
remove caching of QMCEngine
Nov 12, 2021
9d8d75f
simplify the storage key used in `QMCSampler._find_sample_id`
Nov 12, 2021
74d320f
tiny fix following the removal of caching of QMCEngine
Nov 12, 2021
a6a9099
Merge branch 'master' into feature/qmc-sampler
Jan 24, 2022
f5125b3
Apply suggestions from code review
kstoneriv3 Jan 24, 2022
a0e3c24
Merge branch 'feature/qmc-sampler' of github.com:kstoneriv3/optuna in…
Jan 24, 2022
ee532db
modify as suggested in the reviews
Jan 24, 2022
4f1f281
Apply suggestions from code review
kstoneriv3 Jan 24, 2022
2679f5e
reflect changes recommended in the reviews
Jan 24, 2022
73dbf53
fix import order
Jan 24, 2022
7c58199
Update optuna/samplers/_qmc.py
kstoneriv3 Jan 25, 2022
3edc8b3
change default qmc_type to sobol
Jan 25, 2022
1 change: 1 addition & 0 deletions docs/source/reference/samplers.rst
@@ -20,5 +20,6 @@ The :mod:`~optuna.samplers` module defines a base class for parameter sampling a
optuna.samplers.PartialFixedSampler
optuna.samplers.NSGAIISampler
optuna.samplers.MOTPESampler
optuna.samplers.QMCSampler
optuna.samplers.IntersectionSearchSpace
optuna.samplers.intersection_search_space
2 changes: 2 additions & 0 deletions optuna/samplers/__init__.py
@@ -3,6 +3,7 @@
from optuna.samplers._grid import GridSampler
from optuna.samplers._nsga2.sampler import NSGAIISampler
from optuna.samplers._partial_fixed import PartialFixedSampler
from optuna.samplers._qmc import QMCSampler
from optuna.samplers._random import RandomSampler
from optuna.samplers._search_space import intersection_search_space
from optuna.samplers._search_space import IntersectionSearchSpace
@@ -18,6 +19,7 @@
"MOTPESampler",
"NSGAIISampler",
"PartialFixedSampler",
"QMCSampler",
"RandomSampler",
"TPESampler",
"intersection_search_space",
330 changes: 330 additions & 0 deletions optuna/samplers/_qmc.py
@@ -0,0 +1,330 @@
import sys
from typing import Any
from typing import Dict
from typing import Optional
from typing import Sequence

import numpy as np

import optuna
from optuna import logging
from optuna._experimental import experimental
from optuna._imports import _LazyImport
from optuna._transform import _SearchSpaceTransform
from optuna.distributions import BaseDistribution
from optuna.distributions import CategoricalDistribution
from optuna.samplers import BaseSampler
from optuna.study import Study
from optuna.trial import FrozenTrial
from optuna.trial import TrialState


_logger = logging.get_logger(__name__)

_SUGGESTED_STATES = (TrialState.COMPLETE, TrialState.PRUNED)


@experimental("3.0.0")
class QMCSampler(BaseSampler):
"""A Quasi Monte Carlo Sampler that generates low-discrepancy sequences.

Quasi Monte Carlo (QMC) sequences are designed to have lower discrepancies than
standard random sequences. They are known to perform better than the standard
random sequences in hyperparameter optimization.

For further information about the use of QMC sequences for hyperparameter optimization,
please refer to the following paper:

- `Bergstra, James, and Yoshua Bengio. Random search for hyper-parameter optimization.
Journal of machine learning research 13.2, 2012.
<https://jmlr.org/papers/v13/bergstra12a.html>`_

We use the QMC implementations in SciPy. For the details of the QMC algorithm,
see the SciPy API references on `scipy.stats.qmc
<https://scipy.github.io/devdocs/reference/stats.qmc.html>`_.

.. note::
If your search space contains categorical parameters, the sampler samples them
with its `independent_sampler` instead of the QMC algorithm.

.. note::
The search space of the sampler is determined by either previous trials in the study or
the first trial that this sampler samples.

If there are previous trials in the study, :class:`~optuna.samplers.QMCSampler` infers its
search space using the trial which was created first in the study.

Otherwise (if the study has no previous trials), :class:`~optuna.samplers.QMCSampler`
samples the first trial using its `independent_sampler` and then infers the search space
in the second trial.

As mentioned above, the search space of the :class:`~optuna.samplers.QMCSampler` is
determined by the first trial of the study. Once the search space is determined, it cannot
be changed afterwards.

.. note::
`QMCSampler` is not supported for Python 3.6 as it depends on the `scipy.stats.qmc` module,
which only supports Python 3.7 or later.

Args:
qmc_type:
The type of QMC sequence to be sampled. This must be one of
`"halton"` and `"sobol"`. Default is `"halton"`.

.. note::
The Sobol' sequence is designed to have the low-discrepancy property when the number of
samples is :math:`n=2^m` for a positive integer :math:`m`. When the number of trials
suggested by `QMCSampler` can be specified in advance, it is recommended to set it
to a power of two.

scramble:
If this option is :obj:`True`, scrambling (randomization) is applied to the QMC
sequences.

seed:
A seed for `QMCSampler`. This argument is used only when `scramble` is :obj:`True`.
If this is :obj:`None`, the seed is initialized randomly. Default is :obj:`None`.

.. note::
When using multiple :class:`~optuna.samplers.QMCSampler` instances in parallel and/or
distributed optimization, all the samplers must share the same seed when
`scramble` is enabled. Otherwise, the low-discrepancy property of the samples
will be degraded.

independent_sampler:
A :class:`~optuna.samplers.BaseSampler` instance that is used for independent
sampling. The first trial of the study and the parameters not contained in the
relative search space are sampled by this sampler.

If :obj:`None` is specified, :class:`~optuna.samplers.RandomSampler` is used
as the default.

.. seealso::
:class:`~optuna.samplers` module provides built-in independent samplers
such as :class:`~optuna.samplers.RandomSampler` and
:class:`~optuna.samplers.TPESampler`.

warn_independent_sampling:
If this is :obj:`True`, a warning message is emitted when
the value of a parameter is sampled by using an independent sampler.

Note that the parameters of the first trial in a study are sampled via an
independent sampler in most cases, so no warning messages are emitted in such cases.

warn_asyncronous_seeding:
If this is :obj:`True`, a warning message is emitted when the scrambling
(randomization) is applied to the QMC sequence and the random seed of the sampler is
not set manually.

.. note::
When using parallel and/or distributed optimization without manually
setting the seed, the seed is set randomly for each instance of
:class:`~optuna.samplers.QMCSampler` on different workers, which results in
asynchronous seeding of the multiple samplers used in the optimization.

.. seealso::
See parameter ``seed`` in :class:`~optuna.samplers.QMCSampler`.


Raises:
ValueError:
If ``qmc_type`` is not one of 'halton' and 'sobol'.


Example:

Optimize a simple quadratic function by using :class:`~optuna.samplers.QMCSampler`.

.. testcode::

import optuna


def objective(trial):
x = trial.suggest_float("x", -1, 1)
y = trial.suggest_int("y", -1, 1)
return x ** 2 + y


sampler = optuna.samplers.QMCSampler()
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=8)

"""

def __init__(
self,
*,
qmc_type: str = "halton",
scramble: bool = False, # default is False for simplicity in distributed environment.
seed: Optional[int] = None,
independent_sampler: Optional[BaseSampler] = None,
warn_asyncronous_seeding: bool = True,
warn_independent_sampling: bool = True,
) -> None:

version = sys.version_info
if version < (3, 7, 0):
version_txt = str(version[0]) + "." + str(version[1]) + "." + str(version[2])
message = (
f"`QMCSampler` is not supported for Python {version_txt}. "
"Consider using Python 3.7 or later."
)
raise ValueError(message)

self._scramble = scramble
# Use an explicit `None` check so that `seed=0` is respected.
self._seed = seed if seed is not None else np.random.PCG64().random_raw()
self._independent_sampler = independent_sampler or optuna.samplers.RandomSampler(seed=seed)
self._initial_search_space: Optional[Dict[str, BaseDistribution]] = None
self._warn_independent_sampling = warn_independent_sampling

if qmc_type in ("halton", "sobol"):
self._qmc_type = qmc_type
else:
message = (
f'The `qmc_type`, "{qmc_type}", is not valid. '
'It must be one of "halton" and "sobol".'
)
raise ValueError(message)

if seed is None and scramble and warn_asyncronous_seeding:
# Sobol/Halton sequences without scrambling do not use seed.
self._log_asyncronous_seeding()

def reseed_rng(self) -> None:

# We must not reseed `self._seed` as shown below. Otherwise, workers would have different
# seeds under parallel execution because `self.reseed_rng()` is called when starting each
# parallel executor.
# >>> self._seed = np.random.MT19937().random_raw()

self._independent_sampler.reseed_rng()

def infer_relative_search_space(
self, study: Study, trial: FrozenTrial
) -> Dict[str, BaseDistribution]:

if self._initial_search_space is not None:
return self._initial_search_space

past_trials = study.get_trials(deepcopy=False, states=_SUGGESTED_STATES)
# The initial trial is sampled by the independent sampler.
if len(past_trials) == 0:
return {}
# If an initial trial was already made,
# construct search_space of this sampler from the initial trial.
first_trial = min(past_trials, key=lambda t: t.number)
self._initial_search_space = self._infer_initial_search_space(first_trial)
return self._initial_search_space

def _infer_initial_search_space(self, trial: FrozenTrial) -> Dict[str, BaseDistribution]:

search_space: Dict[str, BaseDistribution] = {}
for param_name, distribution in trial.distributions.items():
if isinstance(distribution, CategoricalDistribution):
continue
search_space[param_name] = distribution

return search_space

@staticmethod
def _log_asyncronous_seeding() -> None:
_logger.warning(
"No seed is provided for `QMCSampler` and the seed is set randomly. "
"If you are running multiple `QMCSampler`s in a parallel and/or distributed "
"environment, the same seed must be used in all samplers to ensure that the resulting "
"samples are taken from the same QMC sequence."
)

def _log_independent_sampling(self, trial: FrozenTrial, param_name: str) -> None:
_logger.warning(
f"The parameter '{param_name}' in trial#{trial.number} is sampled independently "
# The `f` prefix is required here; without it the class name is not interpolated.
f"by using `{self._independent_sampler.__class__.__name__}` instead of `QMCSampler` "
"(optimization performance may be degraded). "
"`QMCSampler` does not support dynamic search space or `CategoricalDistribution`. "
"You can suppress this warning by setting `warn_independent_sampling` "
"to `False` in the constructor of `QMCSampler`, "
"if this independent sampling is intended behavior."
)

def sample_independent(
self,
study: Study,
trial: FrozenTrial,
param_name: str,
param_distribution: BaseDistribution,
) -> Any:

if self._initial_search_space is not None:
if self._warn_independent_sampling:
self._log_independent_sampling(trial, param_name)

return self._independent_sampler.sample_independent(
study, trial, param_name, param_distribution
)

def sample_relative(
self, study: Study, trial: FrozenTrial, search_space: Dict[str, BaseDistribution]
) -> Dict[str, Any]:

if search_space == {}:
return {}

sample = self._sample_qmc(study, search_space)
trans = _SearchSpaceTransform(search_space)
sample = trans.bounds[:, 0] + sample * (trans.bounds[:, 1] - trans.bounds[:, 0])
return trans.untransform(sample[0, :])

def after_trial(
self,
study: "optuna.Study",
trial: "optuna.trial.FrozenTrial",
state: TrialState,
values: Optional[Sequence[float]],
) -> None:
self._independent_sampler.after_trial(study, trial, state, values)

def _sample_qmc(self, study: Study, search_space: Dict[str, BaseDistribution]) -> np.ndarray:

# Lazy import because the `scipy.stats.qmc` is slow to import.
qmc_module = _LazyImport("scipy.stats.qmc")

sample_id = self._find_sample_id(study, search_space)
d = len(search_space)

if self._qmc_type == "halton":
qmc_engine = qmc_module.Halton(d, seed=self._seed, scramble=self._scramble)
elif self._qmc_type == "sobol":
qmc_engine = qmc_module.Sobol(d, seed=self._seed, scramble=self._scramble)
else:
raise ValueError("Invalid `qmc_type`")

forward_size = sample_id # `sample_id` starts from 0.
qmc_engine.fast_forward(forward_size)
sample = qmc_engine.random(1)

return sample

def _find_sample_id(self, study: Study, search_space: Dict[str, BaseDistribution]) -> int:

qmc_id = ""
qmc_id += self._qmc_type
# Sobol/Halton sequences without scrambling do not use seed.
if self._scramble:
qmc_id += f" (scramble=True, seed={self._seed})"
else:
qmc_id += " (scramble=False)"
key_qmc_id = qmc_id + "'s last sample id"
Comment on lines +310 to +317
Contributor Author

@keisuke-umezawa

Even if multiple QMC samplers exist in one study, the search space they create is the same as it is created by looking at the first trial. Therefore, we can assume that the same QMC sampler is assigned to a certain study.

I am wondering if we should leave the dependence of key_qmc_id on self._qmc_type, self._scramble, and self._seed. Let me know if this is what you intended!

Member

Here, I meant that we can use some constant string such as qmc_sampler:sample_id
e.g. https://github.com/optuna/optuna/blob/master/optuna/integration/botorch.py#L471-L473

Why do we need to create a different key for each scramble and seed pair?

Contributor Author

With the current implementation, users can use different scramble options and seeds (of the QMC sequence, not the random seed) in different workers in distributed settings. Maybe it makes sense to warn users if they use different scramble options and seeds, because that is unlikely to be an intended use case.

Member

OK, I understand. But if we really want to warn users about this, there may be a better way to share the information. We do not need to add it in this PR; we can update it in the future.
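
To make the trade-off in this thread concrete, here is a minimal sketch comparing the two keying schemes. It is not the PR's code: a plain `dict` stands in for the study's system attributes, and `CONSTANT_KEY`, `config_dependent_key`, and `next_sample_id` are hypothetical names introduced only for illustration.

```python
from typing import Dict, Optional

# Hypothetical constant key, following the `qmc_sampler:sample_id` suggestion above.
CONSTANT_KEY = "qmc_sampler:sample_id"


def config_dependent_key(qmc_type: str, scramble: bool, seed: Optional[int]) -> str:
    """Build a counter key that differs per (qmc_type, scramble, seed) configuration."""
    qmc_id = qmc_type
    # Unscrambled Sobol/Halton sequences do not use the seed, so it is left out.
    if scramble:
        qmc_id += f" (scramble=True, seed={seed})"
    else:
        qmc_id += " (scramble=False)"
    return qmc_id + "'s last sample id"


def next_sample_id(system_attrs: Dict[str, int], key: str) -> int:
    """Read, increment, and write back a counter kept in a dict (a stand-in for storage)."""
    sample_id = system_attrs[key] + 1 if key in system_attrs else 0
    system_attrs[key] = sample_id
    return sample_id


# Two samplers with different seeds: each configuration gets its own counter.
attrs: Dict[str, int] = {}
key_a = config_dependent_key("sobol", scramble=True, seed=1)
key_b = config_dependent_key("sobol", scramble=True, seed=2)
ids_a = [next_sample_id(attrs, key_a) for _ in range(3)]
ids_b = [next_sample_id(attrs, key_b) for _ in range(2)]

# With one constant key, the same two samplers would share a single counter.
shared: Dict[str, int] = {}
shared_a = [next_sample_id(shared, CONSTANT_KEY) for _ in range(3)]
shared_b = [next_sample_id(shared, CONSTANT_KEY) for _ in range(2)]
```

With per-configuration keys, each differently seeded sequence is enumerated from its own index 0; with a constant key, the second sampler continues from where the first one stopped.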


# TODO(kstoneriv3): Here, we ideally assume that the following block is
# an atomic transaction. Without such an assumption, the current implementation
# only ensures that each `sample_id` is sampled at least once.
Member

Actually, it is not an atomic transaction. Is it okay?

Contributor Author

@kstoneriv3 kstoneriv3 Nov 6, 2021

Atomicity is unnecessary for single-threaded optimization, but it makes a difference in parallel or distributed settings. As long as evaluating the objective function takes much longer than the storage access here, the transaction is "almost" atomic. Experimentally, these operations are almost atomic unless the objective function can be evaluated very quickly and there are more than 100 workers. If the block is not atomic, multiple workers may end up evaluating the same sample_id (and therefore the same hyperparameters), but no sample_id is skipped. So we don't strictly need atomicity, but without it, optimization performance might degrade due to duplicated evaluations of the same hyperparameters.
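
The "duplicates but no skips" behavior described here can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions: a plain `dict` replaces the study storage, the key name `KEY` is hypothetical, and the interleaving (both workers read before either writes) is forced by hand rather than produced by real threads.

```python
from typing import Dict, List

KEY = "qmc_sampler:sample_id"  # hypothetical key name, for illustration only


def read_next(storage: Dict[str, int]) -> int:
    """Step 1: read the last id and compute the next one (nothing is written yet)."""
    return storage[KEY] + 1 if KEY in storage else 0


def write_back(storage: Dict[str, int], sample_id: int) -> None:
    """Step 2: persist the id that this worker is about to evaluate."""
    storage[KEY] = sample_id


def run_interleaved(n_rounds: int) -> List[int]:
    """Simulate two workers that both read before either of them writes."""
    storage: Dict[str, int] = {}
    evaluated: List[int] = []
    for _ in range(n_rounds):
        id_a = read_next(storage)
        id_b = read_next(storage)  # worker B reads before worker A has written
        write_back(storage, id_a)
        write_back(storage, id_b)
        evaluated += [id_a, id_b]
    return evaluated


ids = run_interleaved(3)
unique_ids = sorted(set(ids))
```

In this interleaving every id is evaluated twice, yet the set of evaluated ids is still contiguous, which matches the claim that non-atomic updates cost duplicated evaluations rather than gaps in the sequence.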

system_attrs = study._storage.get_study_system_attrs(study._study_id)
if key_qmc_id in system_attrs.keys():
sample_id = system_attrs[key_qmc_id]
sample_id += 1
else:
sample_id = 0
study._storage.set_study_system_attr(study._study_id, key_qmc_id, sample_id)

return sample_id
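
The sampler above is index-based: `_find_sample_id` picks an index, `_sample_qmc` produces the point at that index in the unit cube, and `sample_relative` rescales it to the parameter bounds. The following pure-Python sketch illustrates that idea with an unscrambled Halton sequence built from radical inverses. It is not SciPy's implementation (the PR uses `scipy.stats.qmc` with `fast_forward`), and `radical_inverse`, `halton_point`, and `scale_to_bounds` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse of `index` in the given base, a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(sample_id: int, primes: Sequence[int]) -> List[float]:
    """Point number `sample_id` of an unscrambled Halton sequence, one prime per dimension."""
    return [radical_inverse(sample_id, p) for p in primes]


def scale_to_bounds(
    unit_point: Sequence[float], bounds: Sequence[Tuple[float, float]]
) -> List[float]:
    """Map a point in the unit cube to [low, high] per dimension: low + u * (high - low)."""
    return [low + u * (high - low) for u, (low, high) in zip(unit_point, bounds)]


# Any point can be computed directly from its index, without keeping engine state.
bounds = [(-1.0, 1.0), (0.0, 10.0)]
points = [scale_to_bounds(halton_point(i, (2, 3)), bounds) for i in range(4)]
```

Because each point depends only on its index, workers that share a counter in storage can draw from the same sequence without sharing an in-memory engine.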
4 changes: 3 additions & 1 deletion setup.py
@@ -1,4 +1,5 @@
import os
import sys
from typing import Dict
from typing import List
from typing import Optional
@@ -34,7 +35,8 @@ def get_install_requires() -> List[str]:
"colorlog",
"numpy",
"packaging>=20.0",
"scipy!=1.4.0",
# TODO(kstoneriv3): remove this after deprecation of Python 3.6
"scipy!=1.4.0" if sys.version[:3] == "3.6" else "scipy>=1.7.0",
"sqlalchemy>=1.1.0",
"tqdm",
"PyYAML", # Only used in `optuna/cli.py`.