New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify the univariate and multivariate TPE #2618
Unify the univariate and multivariate TPE #2618
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2618 +/- ##
==========================================
- Coverage 91.66% 91.64% -0.03%
==========================================
Files 140 139 -1
Lines 11522 11311 -211
==========================================
- Hits 10562 10366 -196
+ Misses 960 945 -15
Continue to review full report at Codecov.
|
This pull request has not seen any recent activity. |
I took a benchmark between this PR and the current master. In summary, the changes made by this PR do not significantly impair the performance of the algorithm. Environments:optuna: this PR (9d25958) and the current master (be407fd) Each algorithm was run 100 times with the same settings, and the mean and variance of the performance were plotted. ResultsWith NopPruner
With MedianPruner
With HyperbandPruner
|
@HideakiImamura Could you please assign another reviewer instead of me? |
@c-bata Sure! Let me remove the assignment. |
Assigned @keisuke-umezawa ! |
Could you reduce the diff now that #2615 is merged? |
Sorry, could you merge the latest master again? I just merge #2616 . |
@hvy Thanks for your careful reviews. I updated the code following your suggestions. PTAL. I take a benchmark with the NAS bench (C) to test the floating value. |
@hvy @keisuke-umezawa I have benchmarked and updated the results using the latest version, modified based on the discussion during the previous mob review. There appears to be generally no difference between this PR and master. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the updated benchmarks and plots. They look promising 👍
@hvy Thank you for your helpful and insightful reviews. I updated the code following your suggestions.
PTAL. |
Note: Currently, the _ParzenEstimator calculates mus, sigmas, and weights, taking into account observations outside of [low, high] during init. The restriction to the [low, high] range is done when the sample method is called. This mechanism gives unexpected results in the estimation of sigmas. This is due to the fact that when calculating the sigmas in _calculate_numerical_params, the mus is sorted and then the low and high are added. This can be avoided by adding low and high and then sorting mus, but changing the algorithm is out of the scope of this PR and will not be addressed here. We will leave this issue for future work. |
# If ``multivariate`` = True and ``group`` = True, we ignore the trials that are not | ||
# included in each subspace. | ||
# If ``multivariate`` = False, we consider such trials. | ||
if multivariate and any([param_name not in trial.params for param_name in param_names]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, the procedure of calculation does not depend on multivariate
, because multivariate
is always true when any([param_name not in trial.params for param_name in param_names])
is true. If my understanding is correct, I think we can omit multivariate
from the arguments of this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion. Here, we want to check any([param_name not in trial.params for param_name in param_names])
only when multivariate = True
. In the current master, we don't check any([param_name not in trial.params for param_name in param_names])
when multivariate = False
, so for the consistency, I include multivariate
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code comment like: If multivariate = False, we skip the check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the calculation of sigmas, LGTM! I will check it again after the discussion of sigmas is settled.
sorted_mus_with_endpoints = np.asarray([], dtype=float) | ||
prior_mu = 0.5 * (low + high) | ||
prior_sigma = 1.0 * (high - low) | ||
|
||
if consider_prior: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can decrease the number of nest from 2 to 1 by the following code.
if consider_prior:
mus = np.empty(n_observations + 1)
mus[:n_observations] = observations
mus[n_observations] = prior_mu
sigmas = np.empty(n_observations + 1)
sigmas[n_observations] = prior_sigma
else:
mus = observations
sigmas = np.empty(n_observations)
if multivariate:
assert sigmas0 is not None
sigmas[:n_observations] = sigmas0 * (high - low)
else:
assert sigmas0 is None
sorted_indices = np.argsort(mus)
sorted_mus = mus[sorted_indices]
sorted_mus_with_endpoints = np.empty(len(mus) + 2, dtype=float)
sorted_mus_with_endpoints[0] = low
sorted_mus_with_endpoints[1:-1] = sorted_mus
sorted_mus_with_endpoints[-1] = high
sorted_sigmas = np.maximum(
sorted_mus_with_endpoints[1:-1] - sorted_mus_with_endpoints[0:-2],
sorted_mus_with_endpoints[2:] - sorted_mus_with_endpoints[1:-1],
)
sigmas[:n_observations] = sorted_sigmas[np.argsort(sorted_indices)]
If it is better for readability, could you use the above snippet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for your insightful review!
@keisuke-umezawa Thanks for your reviews! I updated the codes following your suggestions. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates. The code around the mus and sigmas are simpler now.
By the way, are the benchmarks difficult to run? I'm wondering if we should run another round since there has been some changes.
@hvy Thanks for your review! I applied all of your suggestions. PTAL. By the way, I can re-run the benchmark experiments easily. This PR will be merged after the release of v2.8.0, so I think we should have the latest benchmark result. |
Sounds great, let's do that. 👍 |
Note: Benchmarks have been updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you for the long PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work and thanks, LGTM given the benchmarks and the many discussions!
Updated the label to |
Depends on #2615 and #2616.
Part of works for #2614.
Motivation
The univariate TPE is a special case of the multivariate TPE, but the implementation of them in Optuna is overwrapped now. This PR aims to resolve the redundancy.
It seems to be difficult to support full backward compatibility including the behavior when the seed is fixed. The reason is that mus, sigmas, and weights in multivariate TPE are arranged in the order of observation, while those in current univariate TPE are arranged in the order of ascending mus. In this PR, we will adapt our logic to the multivariate TPE. We will verify through benchmarking experiments that the performance of the univariate TPE is not significantly impaired by this change.
Description of the changes
TODOs
Benchmark Results
I took a benchmark between this PR and the current master. In summary, the changes made by this PR do not significantly impair the performance of the algorithm.
Environments:
optuna: this PR (9d25958) and the current master (be407fd)
python: 3.8
kurobako: 0.2.9
algorithms: multivariate-tpe-master-PRUNER, multivariate-tpe-this-PR-PRUNER, tpe-master-PRUNER, tpe-this-PR-PRUNER
Each algorithm was run 100 times with the same settings, and the mean and variance of the performance were plotted.
Results
With NopPruner
With MedianPruner
With HyperbandPruner