
Reduce the time complexity of HSSP 2d from O(NK^2 log K) to O((N - K)K) #5346

Open · wants to merge 9 commits into master
Conversation

@nabenabe0928 nabenabe0928 (Collaborator) commented Mar 19, 2024

Motivation

As HSSP is the bottleneck of MO-TPE, I improved the time complexity of HSSP for 2D from $O(NK^2 \log K)$ to $O((N-K)K)$, where $N$ is the number of trials to be considered and $K$ is the subset size.
I further sped up the implementation using NumPy vectorization.

Description of the changes

  • A new HSSP algorithm specialized for 2D setups, and
  • a speedup via NumPy vectorization.

Another breaking change concerns reproducibility when a study contains inf, -inf, or nan values.
However, this actually fixes a bug in the original implementation: hypervolume contributions were computed incrementally, so inf - inf = nan yielded nan instead of an inf contribution.
The new implementation correctly reports inf hypervolume contributions.
You can check such behavior using:

import numpy as np


solutions = [[-np.inf, np.inf], [0, 1], [1, 0], [np.inf, -np.inf]]

In this example, the reference point is [np.inf, np.inf], so the hypervolume contributions of solutions[0] and solutions[3] cannot be computed by the original implementation: they evaluate to nan via (np.inf - np.inf) * np.inf.
The contributions of solutions[1] and solutions[2], on the other hand, are inf.
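To make the failure mode concrete, here is a minimal sketch (not the PR's code) of computing each point's rectangle against this reference point in one shot:

```python
import numpy as np

solutions = np.array([[-np.inf, np.inf], [0.0, 1.0], [1.0, 0.0], [np.inf, -np.inf]])
reference_point = np.array([np.inf, np.inf])

# Rectangle spanned by each solution and the reference point: the extreme
# points produce inf - inf = nan in one coordinate, which poisons the product.
diffs = reference_point - solutions
contribs = np.prod(diffs, axis=-1)
print(contribs)  # [nan inf inf nan]
```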

Verification Code
import time

import numpy as np

from optuna._hypervolume.hssp import _solve_hssp


def _is_pareto_front_2d(unique_lexsorted_loss_values: np.ndarray) -> np.ndarray:
    n_trials = unique_lexsorted_loss_values.shape[0]
    cummin_value1 = np.minimum.accumulate(unique_lexsorted_loss_values[:, 1])
    on_front = np.ones(n_trials, dtype=bool)
    on_front[1:] = cummin_value1[1:] < cummin_value1[:-1]
    return on_front


def _is_pareto_front_for_unique_sorted(unique_lexsorted_loss_values: np.ndarray) -> np.ndarray:
    (n_trials, n_objectives) = unique_lexsorted_loss_values.shape
    if n_objectives == 1:
        on_front = np.zeros(len(unique_lexsorted_loss_values), dtype=bool)
        on_front[0] = True
        return on_front
    elif n_objectives == 2:
        return _is_pareto_front_2d(unique_lexsorted_loss_values)
    else:
        # NOTE: `_is_pareto_front_nd` (the general n-objective routine) is not
        # defined in this snippet; the verification below only exercises the 2D path.
        return _is_pareto_front_nd(unique_lexsorted_loss_values)


def is_pareto_front(loss_values: np.ndarray, assume_unique_lexsorted: bool = True) -> np.ndarray:
    if assume_unique_lexsorted:
        return _is_pareto_front_for_unique_sorted(loss_values)

    unique_lexsorted_loss_values, order_inv = np.unique(loss_values, axis=0, return_inverse=True)
    on_front = _is_pareto_front_for_unique_sorted(unique_lexsorted_loss_values)
    return on_front[order_inv]
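As a quick sanity check of the cumulative-minimum trick used in `_is_pareto_front_2d`, here is a tiny standalone example with hypothetical points (minimization, already unique and lexsorted):

```python
import numpy as np

def is_pareto_front_2d(unique_lexsorted: np.ndarray) -> np.ndarray:
    # After lexsorting, a point is on the front iff its second objective
    # strictly improves on the running minimum of all earlier points.
    cummin = np.minimum.accumulate(unique_lexsorted[:, 1])
    on_front = np.ones(len(unique_lexsorted), dtype=bool)
    on_front[1:] = cummin[1:] < cummin[:-1]
    return on_front

pts = np.array([[0.0, 3.0], [1.0, 1.0], [2.0, 2.0], [3.0, 0.0]])
print(is_pareto_front_2d(pts))  # [ True  True False  True]
```

Here [2, 2] is dominated by [1, 1], so it is the only point off the front.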


def _solve_hssp_2d(
    rank_i_loss_vals: np.ndarray,
    rank_i_indices: np.ndarray,
    subset_size: int,
    reference_point: np.ndarray,
) -> np.ndarray:
    # The time complexity is O(subset_size * rank_i_loss_vals.shape[0]).
    assert rank_i_loss_vals.shape[-1] == 2 and subset_size <= rank_i_loss_vals.shape[0]
    n_trials = rank_i_loss_vals.shape[0]
    order = np.argsort(rank_i_loss_vals[:, 0])
    sorted_loss_vals = rank_i_loss_vals[order]
    rectangle_diagonal_points = np.repeat(reference_point[np.newaxis, :], n_trials, axis=0)

    subset_indices = np.zeros(subset_size, dtype=int)
    for i in range(subset_size):
        contribs = np.prod(rectangle_diagonal_points - sorted_loss_vals, axis=-1)
        max_index = np.argmax(contribs)
        subset_indices[i] = rank_i_indices[order[max_index]]
        rectangle_diagonal_points[: max_index + 1, 0] = np.minimum(
            sorted_loss_vals[max_index, 0], rectangle_diagonal_points[: max_index + 1, 0]
        )
        rectangle_diagonal_points[max_index:, 1] = np.minimum(
            sorted_loss_vals[max_index, 1], rectangle_diagonal_points[max_index:, 1]
        )

        keep = np.ones(n_trials - i, dtype=bool)
        keep[max_index] = False
        # Remove the chosen point.
        order = order[keep]
        rectangle_diagonal_points = rectangle_diagonal_points[keep]
        sorted_loss_vals = sorted_loss_vals[keep]

    return subset_indices


if __name__ == "__main__":
    rng = np.random.RandomState()
    for _ in range(1000):
        Y = rng.normal(size=(10000, 2)) + 10.0
        Y = np.unique(Y, axis=0)
        on_front = is_pareto_front(loss_values=Y)
        Y_pareto = Y[on_front]
        reference_point = np.max(Y, axis=0) * 1.1
        n_sols = Y_pareto.shape[0]

        start = time.time()
        ans = _solve_hssp(Y_pareto.copy(), np.arange(n_sols), int(n_sols * 0.3), reference_point.copy())
        runtime1 = time.time() - start

        start = time.time()
        output = _solve_hssp_2d(Y_pareto.copy(), np.arange(n_sols), int(n_sols * 0.3), reference_point.copy())
        runtime2 = time.time() - start
        print(runtime1 / runtime2)
        assert np.all(ans == output)

The figure below shows the solutions ($v_i$ where $v_{1,1} \leq v_{2,1} \leq v_{3,1} \leq v_{4,1}$ and $v_{1,2} \geq v_{2,2} \geq v_{3,2} \geq v_{4,2}$. Recall that $\{v_i\}$ is a non-dominated set) and their diagonal points ($d_i$).
Note that I defined $d_i$ (pink dots) so that the hypervolume contribution (the gray rectangle) of the $i$-th solution, given the current state, equals the area $(d_{i,1} - v_{i,1})(d_{i,2} - v_{i,2})$ of the rectangle spanned by $v_i$ and $d_i$.
In the first figure, we have not picked any point.
In the second figure, we picked $v_2$ as a member of the subset.
Then the diagonal point for $v_1$ becomes $d_1 \leftarrow [\min(v_{2,1}, d_{1,1}), d_{1, 2}]$ and the diagonal points for $v_3, v_4$ become $d_3 \leftarrow [d_{3,1}, \min(v_{2, 2}, d_{3,2})]$ and $d_4 \leftarrow [d_{4,1}, \min(v_{2, 2}, d_{4,2})]$.
In the third figure, we picked $v_3$ as another member of the subset.
Then the diagonal point for $v_1$ becomes $d_1 \leftarrow [\min(v_{3,1}, d_{1,1}), d_{1, 2}]$ and the diagonal point for $v_4$ becomes $d_4 \leftarrow [d_{4,1}, \min(v_{3, 2}, d_{4,2})]$.

In principle, when $v_i$ is picked, $d_j$ will be updated as $d_j \leftarrow [\min(v_{i,1}, d_{j,1}), d_{j,2}]$ for $j < i$ and $d_j \leftarrow [d_{j,1}, \min(v_{i,2}, d_{j,2})]$ for $j > i$.

These updates can be done in $O(N)$ time, and we repeat them $K$ times, so the time complexity is $O(NK)$.
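The update rule above can be sketched numerically as follows (a hypothetical 4-point front with reference point [5, 5]; unlike the actual implementation, the picked point is simply left in place rather than removed):

```python
import numpy as np

# Hypothetical non-dominated front, sorted by the first objective (minimization),
# with all diagonal points initialized to the reference point [5, 5].
v = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]])
d = np.full_like(v, 5.0)

def pick(i: int) -> None:
    # When v[i] is picked: d[j][0] shrinks for j < i, d[j][1] shrinks for j > i.
    d[:i, 0] = np.minimum(v[i, 0], d[:i, 0])
    d[i + 1:, 1] = np.minimum(v[i, 1], d[i + 1:, 1])

pick(1)  # pick v_2 = [2, 3], mirroring the second figure
print(d)  # d_1 -> [2, 5]; d_3 and d_4 -> [5, 3]; the picked row is unused afterwards
```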

[Figure: the solutions $v_i$ and their diagonal points $d_i$ over the three steps described above]

@nabenabe0928 nabenabe0928 (Collaborator, Author) commented Mar 19, 2024

Benchmarking using MO-TPE:

Benchmarking Code
from __future__ import annotations

import time

import optuna


def objective2d(trial: optuna.Trial) -> tuple[float, float]:
    x = trial.suggest_float("x", -5, 5)
    y = trial.suggest_float("y", -5, 5)
    return x ** 2 + y ** 2, (x - 2) ** 2 + (y - 2) ** 2


if __name__ == "__main__":
    start = time.time()
    # optuna.logging.set_verbosity(optuna.logging.CRITICAL)
    n_trials = 1000
    sampler = optuna.samplers.TPESampler(seed=42)
    study = optuna.create_study(sampler=sampler, directions=["minimize"]*2)
    study.optimize(objective2d, n_trials=n_trials)
    print(f"n_obj=2, {study.sampler}", time.time() - start)

The results (in seconds) are here:

| This PR | Original |
| ------- | -------- |
| 53      | 121      |


codecov bot commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.66%. Comparing base (181d65f) to head (75d7b6d).
Report is 90 commits behind head on master.

❗ Current head 75d7b6d differs from pull request most recent head 94a5770. Consider uploading reports for the commit 94a5770 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5346      +/-   ##
==========================================
+ Coverage   89.52%   89.66%   +0.13%     
==========================================
  Files         194      210      +16     
  Lines       12626    13309     +683     
==========================================
+ Hits        11303    11933     +630     
- Misses       1323     1376      +53     


@nabenabe0928 nabenabe0928 changed the title Reduce the time complexity of HSSP 2d from O(NK^2 log K) to O(NK) Reduce the time complexity of HSSP 2d from O(NK^2 log K) to O((N - K)K) Mar 20, 2024
@nabenabe0928 nabenabe0928 added this to the v4.0.0 milestone Mar 20, 2024
@nabenabe0928 nabenabe0928 added the enhancement Change that does not break compatibility and not affect public interfaces, but improves performance. label Mar 20, 2024
@eukaryo eukaryo (Collaborator) commented Mar 21, 2024

@gen740 @contramundum53 Could you review this PR?

@nabenabe0928
Collaborator Author

@contramundum53 I unassigned you as discussed online for now!

Contributor

github-actions bot commented Apr 2, 2024

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Apr 2, 2024
@nabenabe0928 nabenabe0928 removed the stale Exempt from stale bot labeling. label Apr 3, 2024
@HideakiImamura
Member

@eukaryo Could you review this PR?

@eukaryo
Collaborator

eukaryo commented Apr 10, 2024

I have interpreted HSSP as referring to the Hypervolume Subset Selection Problem.

@nabenabe0928
Collaborator Author

> I have interpreted HSSP as referring to the Hypervolume Subset Selection Problem.

Yes, HSSP is the Hypervolume Subset Selection Problem.

@eukaryo eukaryo (Collaborator) left a comment

LGTM. I have reviewed the algorithm and your implementation. The logic is sound and the code is well-written.
While the inherent complexity of the algorithm naturally leads to challenging code, your implementation is clear and relatively easy to follow.

@eukaryo eukaryo removed their assignment Apr 10, 2024
Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Apr 17, 2024
@nabenabe0928 nabenabe0928 removed the stale Exempt from stale bot labeling. label Apr 18, 2024
@eukaryo eukaryo assigned not522 and unassigned gen740 Apr 23, 2024
@eukaryo
Collaborator

eukaryo commented Apr 23, 2024

@not522 Could you review this PR?

Contributor

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Apr 30, 2024
@not522 not522 removed the stale Exempt from stale bot labeling. label Apr 30, 2024
Contributor

github-actions bot commented May 8, 2024

This pull request has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label May 8, 2024
@not522 not522 removed the stale Exempt from stale bot labeling. label May 8, 2024
Labels
enhancement Change that does not break compatibility and not affect public interfaces, but improves performance.
6 participants