Move validation logic from _run_trial to study.tell #3144

Merged
merged 102 commits into from Apr 6, 2022

Conversation

himkt
Member

@himkt himkt commented Dec 3, 2021

🔗 #3132

This PR makes the behavior of optuna tell consistent with study.optimize when an objective value is nan or a list containing nan. Inside study.optimize, observing nan as an objective value is treated as a special case: Optuna does not raise an exception but instead sets the trial state to FAIL and logs a message. I think it would be natural for the Optuna CLI to behave the same way as study.optimize.

values, values_conversion_failure_message = _check_and_convert_to_values(
    len(study.directions), value_or_values, trial.number
)
if values_conversion_failure_message is not None:
    state = TrialState.FAIL
else:
    state = TrialState.COMPLETE

elif values_conversion_failure_message is not None:
    _logger.warning(values_conversion_failure_message)
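
For readers unfamiliar with the helper, here is a minimal, self-contained sketch of what a check-and-convert function of this shape might do. It is not Optuna's `_check_and_convert_to_values`: the name `check_and_convert`, the exact checks, and most of the messages are assumptions made for illustration. Only the return shape (values or None, plus an optional failure message) and the nan message are taken from the snippet and log output in this PR.

```python
import math
from collections.abc import Sequence
from typing import Any, List, Optional, Tuple


def check_and_convert(
    n_objectives: int, value_or_values: Any, trial_number: int
) -> Tuple[Optional[List[float]], Optional[str]]:
    """Return (values, None) on success or (None, failure_message) on failure."""
    # Normalize a scalar into a one-element list so both cases share one path.
    if isinstance(value_or_values, Sequence) and not isinstance(value_or_values, str):
        raw = list(value_or_values)
    else:
        raw = [value_or_values]

    if len(raw) != n_objectives:
        return None, (
            f"Trial {trial_number} failed, because the number of values {len(raw)} "
            f"did not match the number of objectives {n_objectives}."
        )

    values: List[float] = []
    for v in raw:
        try:
            as_float = float(v)
        except (TypeError, ValueError):
            return None, f"Trial {trial_number} failed, because {v!r} could not be cast to float."
        if math.isnan(as_float):
            return None, f"Trial {trial_number} failed, because the objective function returned {v}."
        values.append(as_float)
    return values, None


# The caller maps a failure message to a FAIL state instead of raising.
values, message = check_and_convert(1, float("nan"), trial_number=0)
state = "FAIL" if message is not None else "COMPLETE"
print(state, values, message)
```

Returning the message rather than raising leaves the decision (log and fail, or raise) to the caller, which is the choice this PR is about.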

Motivation

  • Make optuna tell consistent with study.optimize
    • One may think optuna tell should only be consistent with study.tell. Any feedback is highly appreciated
      • If so, we should guide users to run optuna tell ... --state fail when they observe nan
  • When nan is passed to optuna tell, Optuna sets the trial state to FAIL

Description of the changes

  • abf53a6 Add test case
  • 84413f7 Invoke _check_and_convert_to_values in _Tell.take_action

Behavior

> optuna create-study --storage sqlite:///example.db --study-name test
[I 2021-12-03 22:09:54,814] A new study created in RDB with name: test
test
> optuna ask --storage sqlite:///sample.db --study-name test
/home/himkt/work/github.com/himkt/optuna/optuna/cli.py:736: ExperimentalWarning: 'ask' is an experimental CLI command. The interface can change in the future.
  warnings.warn(
[I 2021-12-03 22:10:01,697] A new study created in RDB with name: test
[I 2021-12-03 22:10:01,715] Asked trial 0 with parameters {}.
{"number": 0, "params": {}}
> optuna tell --study-name test --storage sqlite:///sample.db --trial-number 0 --values nan
/home/himkt/work/github.com/himkt/optuna/optuna/cli.py:852: ExperimentalWarning: 'tell' is an experimental CLI command. The interface can change in the future.
  warnings.warn(
[I 2021-12-03 22:10:09,387] Told trial 0 with values None and state TrialState.FAIL.
[W 2021-12-03 22:10:09,387] Trial 0 failed, because the objective function returned nan.
>>> import optuna
>>> study = optuna.load_study(storage="sqlite:///sample.db", study_name="test")
>>> study.get_trials()
[FrozenTrial(number=0, values=None, datetime_start=datetime.datetime(2021, 12, 3, 22, 10, 1, 708719), datetime_complete=datetime.datetime(2021, 12, 3, 22, 10, 9, 371719), params={}, distributions={}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=1, state=TrialState.FAIL, value=None)]
>>>

@codecov-commenter

codecov-commenter commented Dec 3, 2021

Codecov Report

Merging #3144 (921b9be) into master (6fd45a6) will increase coverage by 0.03%.
The diff coverage is 95.31%.

@@            Coverage Diff             @@
##           master    #3144      +/-   ##
==========================================
+ Coverage   91.83%   91.86%   +0.03%     
==========================================
  Files         156      157       +1     
  Lines       12292    12316      +24     
==========================================
+ Hits        11288    11314      +26     
+ Misses       1004     1002       -2     
Impacted Files Coverage Δ
optuna/cli.py 20.47% <0.00%> (-0.10%) ⬇️
optuna/multi_objective/trial.py 91.03% <ø> (-0.19%) ⬇️
optuna/study/_tell.py 97.93% <97.93%> (ø)
optuna/multi_objective/study.py 97.58% <100.00%> (+0.03%) ⬆️
optuna/study/_optimize.py 98.63% <100.00%> (+0.32%) ⬆️
optuna/study/study.py 96.15% <100.00%> (+0.13%) ⬆️
optuna/samplers/nsgaii/_sampler.py 95.33% <0.00%> (-0.85%) ⬇️
optuna/storages/__init__.py 100.00% <0.00%> (ø)
optuna/visualization/_utils.py 98.46% <0.00%> (ø)
optuna/visualization/_contour.py 98.11% <0.00%> (ø)
... and 2 more

@himkt
Member Author

himkt commented Dec 7, 2021

@g-votte and @hvy suggested moving the validation logic inside Study.tell. I read the code of tell, and I'm not sure whether we should do so. In the current logic of Study.tell, state is immutable, and I feel this design is clean. But if we don't change this design, Study.tell and the CLI optuna tell would behave differently, or we would force users to handle nan values on their own.

if not isinstance(trial, (trial_module.Trial, int)):
    raise TypeError("Trial must be a trial object or trial number.")
if state == TrialState.COMPLETE:
    if values is None:
        raise ValueError(
            "No values were told. Values are required when state is TrialState.COMPLETE."
        )
elif state in (TrialState.PRUNED, TrialState.FAIL):
    if values is not None:
        raise ValueError(
            "Values were told. Values cannot be specified when state is "
            "TrialState.PRUNED or TrialState.FAIL."
        )
else:
    raise ValueError(f"Cannot tell with state {state}.")
if isinstance(trial, trial_module.Trial):
    trial_number = trial.number
    trial_id = trial._trial_id
elif isinstance(trial, int):
    trial_number = trial
    try:
        trial_id = self._storage.get_trial_id_from_study_id_trial_number(
            self._study_id, trial_number
        )
    except NotImplementedError as e:
        warnings.warn(
            "Study.tell may be slow because the trial was represented by its number but "
            f"the storage {self._storage.__class__.__name__} does not implement the "
            "method required to map numbers back. Please provide the trial object "
            "to avoid performance degradation."
        )
        trials = self.get_trials(deepcopy=False)
        if len(trials) <= trial_number:
            raise ValueError(
                f"Cannot tell for trial with number {trial_number} since it has not been "
                "created."
            ) from e
        trial_id = trials[trial_number]._trial_id
    except KeyError as e:
        raise ValueError(
            f"Cannot tell for trial with number {trial_number} since it has not been "
            "created."
        ) from e
else:
    assert False, "Should not reach."
frozen_trial = self._storage.get_trial(trial_id)
if state == TrialState.PRUNED:
    # Register the last intermediate value if present as the value of the trial.
    # TODO(hvy): Whether a pruned trials should have an actual value can be discussed.
    assert values is None
    last_step = frozen_trial.last_step
    if last_step is not None:
        values = [frozen_trial.intermediate_values[last_step]]
if values is not None:
    values, values_conversion_failure_message = _check_and_convert_to_values(
        len(self.directions), values, trial_number
    )
    # When called from `Study.optimize` and `state` is pruned, the given `values` contains
    # the intermediate value with the largest step so far. In this case, the value is
    # allowed to be NaN and errors should not be raised.
    if state != TrialState.PRUNED and values_conversion_failure_message is not None:
        raise ValueError(values_conversion_failure_message)
try:
    # Sampler defined trial post-processing.
    study = pruners._filter_study(self, frozen_trial)
    self.sampler.after_trial(study, frozen_trial, state, values)
except Exception:
    raise
finally:
    if values is not None:
        self._storage.set_trial_values(trial_id, values)
    self._storage.set_trial_state(trial_id, state)

@hvy
Member

hvy commented Dec 8, 2021

Thanks a lot @himkt for quickly addressing the issue with an RFC.

Just to clarify, do you mean that making state mutable (by defaulting to None), would make the behavior unnecessarily complicated, from a user-perspective, or from a code complexity point of view?
I was imagining moving https://github.com/optuna/optuna/blob/release-v3.0.0-a0/optuna/study/_optimize.py#L223-L230 to Study.tell and having all the validation done there. Study.optimize would merely add logging on top of that. Would you like me to prototype on that or have you already tried perhaps?

My understanding is that the validation logic (currently only done in Study.optimize) isn't trivial and is costly for ask-and-tell users to implement, especially via the CLI. I personally find it acceptable to move more logic into Study.tell, even though it would make the API/behavior more involved and position it as a rather high-level API. (I also understand that always writing the specified state is simpler to use in some cases. Maybe we could even go as far as introducing an additional "strict" option to control this behavior, or exposing a different public API for value validation. Edit: actually, passing an explicit state may act "strictly".) Perhaps starting with the CLI is a fine first approach, but I would argue that the interfaces are better kept consistent between the CLI and Study.tell, at least for this topic.
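
One way to read the proposal in this comment (state defaulting to None, with an explicitly passed state acting "strictly") is sketched below. This is a hedged illustration of the idea only, not the merged implementation: `resolve_state`, `_is_invalid`, the error messages, and the reduced `TrialState` enum are hypothetical stand-ins.

```python
import enum
import math
from typing import List, Optional, Sequence, Tuple


class TrialState(enum.Enum):
    # Stand-in for optuna.trial.TrialState, reduced to the finished states.
    COMPLETE = 1
    PRUNED = 2
    FAIL = 3


def _is_invalid(values: Optional[Sequence[float]]) -> bool:
    return values is None or any(math.isnan(v) for v in values)


def resolve_state(
    state: Optional[TrialState], values: Optional[Sequence[float]]
) -> Tuple[TrialState, Optional[List[float]]]:
    if state is None:
        # Lenient path: infer the state from the values, the way Study.optimize
        # treats an objective's return value.
        if _is_invalid(values):
            return TrialState.FAIL, None
        assert values is not None
        return TrialState.COMPLETE, list(values)
    if state is TrialState.COMPLETE:
        # Strict path: the caller asserted success, so bad values are an error.
        if _is_invalid(values):
            raise ValueError("Values must be valid when state is TrialState.COMPLETE.")
        assert values is not None
        return state, list(values)
    # PRUNED or FAIL: the caller must not pass values.
    if values is not None:
        raise ValueError("Values cannot be specified when state is PRUNED or FAIL.")
    return state, None


print(resolve_state(None, [float("nan")]))
```

Under this reading, CLI users who pass only values get the lenient path, while callers who pass an explicit state keep the existing raise-on-error behavior.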

@himkt himkt force-pushed the cli-tell-nan branch 2 times, most recently from 829b30b to c143c3a Compare December 13, 2021 15:08
@himkt
Member Author

himkt commented Dec 13, 2021

Sorry for the inactivity, @hvy.

Just to clarify, do you mean that making state mutable (by defaulting to None), would make the behavior unnecessarily complicated, from a user-perspective, or from a code complexity point of view?

I meant it from the point of view of code complexity. I was not sure how to report a conversion failure back to study._optimize.
But later I noticed we don't have to do so: it's enough to simply do the logging in tell.

I also found that _check_and_convert_to_values is already called inside study.tell.
So I simply tried updating it in c143c3a (many tests will fail; I'll check them).

if values is not None:
    values, values_conversion_failure_message = _check_and_convert_to_values(
        len(self.directions), values, trial_number
    )
    # When called from `Study.optimize` and `state` is pruned, the given `values` contains
    # the intermediate value with the largest step so far. In this case, the value is
    # allowed to be NaN and errors should not be raised.
    if state != TrialState.PRUNED and values_conversion_failure_message is not None:
        raise ValueError(values_conversion_failure_message)

@hvy hvy self-assigned this Dec 15, 2021
@himkt himkt force-pushed the cli-tell-nan branch 3 times, most recently from 02955ec to 582dc5b Compare December 19, 2021 13:28
@himkt himkt changed the title [RFC] Adding validation for nan in optuna tell [RFC] Move validation logic from _run_trial to study.tell Dec 20, 2021
@hvy
Member

hvy commented Dec 21, 2021

📝 Just had a discussion with @himkt outside this PR. This work is ready for review.
📝 Make sure that Study.tell's behavior is consistent with Study.optimize's return value handling from the objective function (best-effort, i.e. putting aside the catch parameter in Study.optimize which adds a layer of behavior control above the abstraction of Study.tell).

@HideakiImamura HideakiImamura self-assigned this Dec 23, 2021
@HideakiImamura HideakiImamura mentioned this pull request Dec 24, 2021
3 tasks
@HideakiImamura HideakiImamura added bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself. optuna.study Related to the `optuna.study` submodule. This is automatically labeled by github-actions. v3 Issue/PR for Optuna version 3. labels Dec 24, 2021
Member

@HideakiImamura HideakiImamura left a comment

Thanks for an important PR that resolves #3132. Also, I think this PR partially resolves #3205. I agree with the idea of making sure that Study.tell's behavior is consistent with Study.optimize's return value handling.

I have a minor comment. Could you take a look?

optuna/study/study.py Outdated Show resolved Hide resolved
@HideakiImamura
Member

I confirmed that this PR makes the behavior of Study.optimize and Study.tell consistent as follows.

code: This is the same as nan-tell.py provided in #3205.

import optuna
import numpy as np


# In-memory storage
study = optuna.create_study()
trial = study.ask()
try:
    study.tell(trial, float(np.nan))
except Exception as e:
    print(e)
finally:
    print(f"In-memory : Trial state is {study.trials[-1].state}")

# RDB storage (SQLite)
study = optuna.create_study(storage='sqlite:///../sqlite.db')
trial = study.ask()
try:
    study.tell(trial, float(np.nan))
except Exception as e:
    print(e)
finally:
    print(f"SQLite    : Trial state is {study.trials[-1].state}")

# RDB storage (MySQL)
study = optuna.create_study(storage="mysql+pymysql://root:test@localhost/mysql")
trial = study.ask()
try:
    study.tell(trial, float(np.nan))
except Exception as e:
    print(e)
finally:
    print(f"MySQL     : Trial state is {study.trials[-1].state}")

# RDB storage (PostgreSQL)
study = optuna.create_study(storage="postgresql+psycopg2://postgres:test@localhost:15432/postgres")
trial = study.ask()
try:
    study.tell(trial, float(np.nan))
except Exception as e:
    print(e)
finally:
    print(f"PostgreSQL: Trial state is {study.trials[-1].state}")

output:

(venv) mamu@HideakinoMacBook-puro 3205-3206 % python nan-tell.py
[I 2021-12-24 16:02:58,636] A new study created in memory with name: no-name-91d1fb55-6f9c-4c51-861a-42892b705424
[W 2021-12-24 16:02:58,637] Trial 0 failed, because the objective function returned nan.
In-memory : Trial state is TrialState.FAIL
[I 2021-12-24 16:02:58,709] A new study created in RDB with name: no-name-ccbae3aa-b661-4e01-913e-d2d489e4cb90
[W 2021-12-24 16:02:58,754] Trial 0 failed, because the objective function returned nan.
SQLite    : Trial state is TrialState.FAIL
[I 2021-12-24 16:02:58,907] A new study created in RDB with name: no-name-92237945-fb07-46b8-92cd-b0349805fb78
[W 2021-12-24 16:02:59,011] Trial 0 failed, because the objective function returned nan.
MySQL     : Trial state is TrialState.FAIL
[I 2021-12-24 16:02:59,322] A new study created in RDB with name: no-name-ffaba79b-3556-42da-b23a-29962316e788
[W 2021-12-24 16:02:59,439] Trial 0 failed, because the objective function returned nan.
PostgreSQL: Trial state is TrialState.FAIL

Member

@hvy hvy left a comment

Looks good; in my opinion the code becomes simpler with this change. I still left some comments, if you could take a look. 🙏

optuna/study/_optimize.py Outdated Show resolved Hide resolved
        frozen_trial.state == TrialState.FAIL
        and func_err is not None
        and not isinstance(func_err, catch)
    ):
        raise func_err
    return trial
Member

You can simplify the internal logic even further (at _optimize_sequential, the caller of _run_trial) now with the return value of Study.tell.

Suggested change
-     return trial
+     return frozen_trial

Member

Please note that the suggested change here is just a part of my suggestion in the comment.

Member Author

Thank you for the suggestion! It really makes sense to me.
aae3a54, aaf638d, and ad4e8b5 (ad4e8b5 and 6f413c6 for multi-objective) update the logic to use FrozenTrial.

Member

Thanks a lot, you can simplify it even further. There is a line in _optimize_sequential that obtains a frozen trial from the previously returned Trial. Now that we return a frozen trial, we can skip this line.

frozen_trial = copy.deepcopy(study._storage.get_trial(trial._trial_id))

optuna/study/_optimize.py Outdated Show resolved Hide resolved
optuna/study/_optimize.py Show resolved Hide resolved
optuna/study/study.py Show resolved Hide resolved
Comment on lines 633 to 628
if state is None and values is None:
    self._storage.set_trial_state(trial_id, TrialState.FAIL)
    _logger.warning("You must specify either state or values.")
    return self._storage.get_trial(trial_id)
Member

Should this always be executed without taking skip_if_finished into account? I wonder whether something like the following could replace it.

Suggested change
- if state is None and values is None:
-     self._storage.set_trial_state(trial_id, TrialState.FAIL)
-     _logger.warning("You must specify either state or values.")
-     return self._storage.get_trial(trial_id)
+ if state is None and values is None:
+     state = TrialState.FAIL
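
The suggestion amounts to assigning the state and falling through, so that the rest of tell (including any skip_if_finished handling) runs on a single path instead of being bypassed by an early write and return. A toy sketch of that control flow follows. The dict-based storage, the string states, and the placement of the finished check are assumptions for illustration, not Optuna's implementation.

```python
from typing import Any, Dict, List, Optional

Storage = Dict[int, Dict[str, Any]]


def tell(
    storage: Storage,
    trial_id: int,
    values: Optional[List[float]] = None,
    state: Optional[str] = None,
    skip_if_finished: bool = False,
) -> Dict[str, Any]:
    # Normalize the arguments first, without touching storage.
    if state is None and values is None:
        state = "FAIL"
    elif state is None:
        state = "COMPLETE"

    trial = storage[trial_id]
    # Single shared path: the finished check applies to every case,
    # including the "nothing was told" case above.
    if trial["state"] != "RUNNING":
        if skip_if_finished:
            return trial
        raise ValueError(f"Trial {trial_id} has already finished.")

    trial["values"] = values if state == "COMPLETE" else None
    trial["state"] = state
    return trial


storage: Storage = {
    0: {"state": "RUNNING", "values": None},
    1: {"state": "COMPLETE", "values": [0.5]},
}
print(tell(storage, 0))                         # nothing told: falls through to FAIL
print(tell(storage, 1, skip_if_finished=True))  # finished trial is left untouched
```

With an early write-and-return in place of the assignment, the second call would have overwritten a finished trial regardless of skip_if_finished.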

Member

The logging message might be lacking information for the user; I think adding some context would help (apart from the comment above).

Member Author

Your suggestion is applied in adecbc4. I'll update the message as well.

Member Author

I updated the doc in 6fa50b4. I'm not sure whether it improves usability, so please give me feedback. 🙏

Member

Thanks. Sorry for my vague comment but I meant that we could omit the warning altogether, if we get rid of the early return statement (which you did).

optuna/study/study.py Outdated Show resolved Hide resolved
@himkt himkt force-pushed the cli-tell-nan branch 2 times, most recently from cb6e986 to 03d80ea Compare January 9, 2022 03:00
@himkt himkt marked this pull request as draft January 9, 2022 03:49
@himkt himkt marked this pull request as ready for review January 9, 2022 11:01
@himkt
Member Author

himkt commented Jan 9, 2022

Thank you so much @hvy for the careful review. I revised my PR based on your comments/suggestions!

@hvy
Member

hvy commented Jan 11, 2022

Thanks a lot, I'll go through the changes today!

Member

@HideakiImamura HideakiImamura left a comment

A minor comment

optuna/study/_optimize.py Outdated Show resolved Hide resolved
@himkt
Member Author

himkt commented Apr 1, 2022

I talked with @HideakiImamura and we decided to stop moving existing tests to test_study.py. We need to discuss where to put tests for methods in Study (e.g. Study.optimize, Study.tell, ...), and that should be discussed elsewhere (this PR is already huge). This is done in 474cae8 and bc2b690.

Member

@HideakiImamura HideakiImamura left a comment

Thanks for the update. I have several comments around tests. PTAL.

tests/study_tests/test_optimize.py Outdated Show resolved Hide resolved
@@ -912,6 +913,66 @@ def objective(trial: Trial) -> float:
assert states == []


@pytest.mark.parametrize("storage_mode", STORAGE_MODES)
def test_run_trial(storage_mode: str) -> None:
Member

It looks like this test should be placed in test_optimize.py, since it calls _run_trial rather than Study.optimize.

Member Author

Totally right, sorry for my carelessness. 🙇 8d62268



@pytest.mark.parametrize("storage_mode", STORAGE_MODES)
def test_run_trial_automatically_fail(storage_mode: str) -> None:
Member

Ditto.



@pytest.mark.parametrize("storage_mode", STORAGE_MODES)
def test_run_trial_pruned(storage_mode: str) -> None:
Member

Ditto.

himkt and others added 3 commits April 1, 2022 20:50
Co-authored-by: Hideaki Imamura <38826298+HideakiImamura@users.noreply.github.com>
@HideakiImamura HideakiImamura dismissed their stale review April 4, 2022 04:05

After my approval, the code was changed. I will re-review.

@himkt
Member Author

himkt commented Apr 4, 2022

https://github.com/optuna/optuna/runs/5813743672?check_suite_focus=true#step:8:128
Tests passed in my local environment but CI isn't happy...

@himkt
Member Author

himkt commented Apr 4, 2022

3bd7bd3 solves it. @hvy @HideakiImamura PTAL. 🙏


# Test trial with invalid objective value: None
def func_none(_: Trial) -> float:
logging.enable_propagation()
Member

The line is repeated, a small oversight.

Suggested change
- logging.enable_propagation()

tests/study_tests/test_optimize.py Show resolved Hide resolved
@hvy
Member

hvy commented Apr 5, 2022

I ran the local test script and it looked good too. 👍

Member

@HideakiImamura HideakiImamura left a comment

I checked all of the test cases. Thanks for the long running work. LGTM!

Member

@hvy hvy left a comment

LGTM!

Thank you very much for the effort. 🙇
While code complexity is still somewhat high, it turned out a lot better than expected.
Tests look great in my opinion and should make future refactoring a lot easier.

There are some nits, such as adding a comment about the in-memory system attribute for warning propagation, and maybe a request to check in unit tests that a trial is not modified when Study.tell fails with e.g. ValueError. These are not significant, or are not something introduced by this PR.
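
The invariant mentioned here (a tell that raises must leave the trial untouched) could be checked with a test of roughly the following shape. This is only a sketch of the testing pattern: `tell_strict` and the dict-based storage are hypothetical stand-ins, not Optuna code, and a real test would go through Study.tell and the storage fixtures used elsewhere in the test suite.

```python
import copy
import math
from typing import Any, Dict, List

Storage = Dict[int, Dict[str, Any]]


def tell_strict(storage: Storage, trial_id: int, values: List[float]) -> None:
    # Hypothetical stand-in for a strict tell: validate everything first ...
    if any(math.isnan(v) for v in values):
        raise ValueError("Values must not contain nan.")
    # ... and only then write, so a failed call has no side effects.
    storage[trial_id]["values"] = list(values)
    storage[trial_id]["state"] = "COMPLETE"


def test_failed_tell_leaves_trial_unmodified() -> None:
    storage: Storage = {0: {"state": "RUNNING", "values": None}}
    before = copy.deepcopy(storage)  # snapshot to compare against afterwards
    try:
        tell_strict(storage, 0, [float("nan")])
    except ValueError:
        pass
    else:
        raise AssertionError("tell_strict should have raised ValueError")
    assert storage == before


test_failed_tell_leaves_trial_unmodified()
```

Taking a deep copy before the call and comparing afterwards catches partial writes, such as values being stored before the state update fails.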


if value is not None and math.isnan(value):
    value = None
    failure_message = f"The objective function returned {original_value}."
Member

Slightly misleading message for a user of Study.tell, but I think it's acceptable.

@hvy hvy merged commit 217dd5e into optuna:master Apr 6, 2022
@hvy hvy added this to the v3.0.0-b0 milestone Apr 6, 2022
@himkt
Member Author

himkt commented Apr 7, 2022

Thank you so much for the informative reviews, @hvy. ❤️ And thank you for your patience!
I created the followup #3454 to apply your comments. 🩹

@himkt himkt deleted the cli-tell-nan branch April 7, 2022 04:46
@HideakiImamura HideakiImamura removed the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Apr 8, 2022
Labels
compatibility Change that breaks compatibility. optuna.study Related to the `optuna.study` submodule. This is automatically labeled by github-actions. v3 Issue/PR for Optuna version 3.
4 participants