Add a super quick f-ANOVA algorithm named PED-ANOVA #5212

Merged
merged 41 commits into from Feb 20, 2024

nabenabe0928
Collaborator

Motivation

f-ANOVA is quite slow when there are many trials, so I implemented a fast algorithm for f-ANOVA.

Paper: PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

Description of the changes

I added a super quick f-ANOVA algorithm.

I compared the runtimes of PED-ANOVA with the Cython implementation of f-ANOVA:

| Setting | f-ANOVA | PED-ANOVA |
| --- | --- | --- |
| dim=2, n_trials=100 | 0.06 | 0.04 |
| dim=2, n_trials=1000 | 0.7 | 0.01 |
| dim=2, n_trials=10000 | 16.09 | 0.04 |
| dim=8, n_trials=100 | 0.09 | 0.02 |
| dim=8, n_trials=1000 | 1.6 | 0.02 |
| dim=8, n_trials=10000 | 92.7 | 0.1 |
| dim=32, n_trials=100 | 0.2 | 0.06 |
| dim=32, n_trials=1000 | 2.3 | 0.1 |
| dim=32, n_trials=10000 | 93.1 | 0.44 |
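
To make the comparison easier to read, the runtime figures above can be turned into speedup ratios. The snippet below is only a small arithmetic sketch over the reported numbers; the dictionary name `runtimes` and its layout are illustrative and not part of the PR.

```python
# (dim, n_trials) -> (f-ANOVA runtime, PED-ANOVA runtime), copied from the figures above.
runtimes = {
    (2, 100): (0.06, 0.04),
    (2, 1000): (0.7, 0.01),
    (2, 10000): (16.09, 0.04),
    (8, 100): (0.09, 0.02),
    (8, 1000): (1.6, 0.02),
    (8, 10000): (92.7, 0.1),
    (32, 100): (0.2, 0.06),
    (32, 1000): (2.3, 0.1),
    (32, 10000): (93.1, 0.44),
}

# Speedup = f-ANOVA runtime divided by PED-ANOVA runtime (unit-free).
speedups = {key: fanova / ped for key, (fanova, ped) in runtimes.items()}

for (dim, n_trials), ratio in sorted(speedups.items()):
    print(f"dim={dim:>2}, n_trials={n_trials:>5}: {ratio:7.1f}x")
```

On these figures the ratio ranges from 1.5x (dim=2, n_trials=100) to roughly 927x (dim=8, n_trials=10000), so the gap widens as the number of trials grows.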

* Separate a file for efficient Parzen estimator
* Raise value error when there are 2< samples better than baseline
* Fix an import error and remove zero weights from Parzen estimator
* Rename step to n_steps
* Add a small number to categorical weights to avoid numerical errors
* Rename efficient pe to scott pe
* Add documentation for PED-ANOVA
* Keep only quantile filters for now
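
The commit notes above mention a Parzen estimator, a Scott-style bandwidth, `n_steps`, and quantile filters. As a rough illustration of the underlying idea from the PED-ANOVA paper (the Pearson divergence between the density of the top-quantile trials and a baseline density), here is a stdlib-only sketch. It is not the code added in this PR: the function names, the `1.06 * std * n^(-1/5)` bandwidth constant, the grid discretization, and the choice of all trials as the baseline are assumptions made for the example.

```python
import math
import random
import statistics


def scott_bandwidth(samples):
    """Rule-of-thumb bandwidth for a 1-D Gaussian kernel (assumed constant 1.06)."""
    std = statistics.pstdev(samples)
    return 1.06 * max(std, 1e-12) * len(samples) ** (-0.2)


def parzen_pmf(samples, grid):
    """Gaussian Parzen estimate evaluated on `grid`, normalized to sum to 1."""
    h = scott_bandwidth(samples)
    dens = [sum(math.exp(-0.5 * ((g - s) / h) ** 2) for s in samples) for g in grid]
    total = sum(dens)
    return [d / total for d in dens]


def pearson_divergence(p_top, p_base, eps=1e-12):
    """Discrete Pearson divergence: sum_i q_i * (p_i / q_i - 1)^2."""
    return sum(q * (p / (q + eps) - 1.0) ** 2 for p, q in zip(p_top, p_base))


def importances(params, values, low, high, quantile=0.1, n_steps=50):
    """params: name -> list of floats; values: objective values (lower is better)."""
    # Quantile filter: keep the best `quantile` fraction of trials (at least 2).
    n_top = max(2, math.ceil(quantile * len(values)))
    order = sorted(range(len(values)), key=values.__getitem__)
    top = order[:n_top]
    grid = [low + (high - low) * i / (n_steps - 1) for i in range(n_steps)]
    raw = {}
    for name, xs in params.items():
        p_top = parzen_pmf([xs[i] for i in top], grid)
        p_all = parzen_pmf(xs, grid)  # baseline: all trials (an assumption here)
        raw[name] = pearson_divergence(p_top, p_all)
    total = sum(raw.values())
    # Normalize so the importances sum to 1.
    return {name: v / total for name, v in raw.items()}


rng = random.Random(0)
n = 300
params = {
    "x": [rng.uniform(-5, 5) for _ in range(n)],
    "y": [rng.uniform(-5, 5) for _ in range(n)],
}
# The objective depends strongly on x and only weakly on y.
values = [x * x + 0.01 * y * y for x, y in zip(params["x"], params["y"])]
result = importances(params, values, -5.0, 5.0)
print(result)
```

Because each parameter is handled with a one-dimensional density on a fixed grid, the cost grows with the number of trials times `n_steps` per parameter, with no tree ensemble to fit, which is consistent with the runtime gap reported above.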
@github-actions github-actions bot added the optuna.importance Related to the `optuna.importance` submodule. This is automatically labeled by github-actions. label Jan 29, 2024
@eukaryo
Collaborator

eukaryo commented Jan 30, 2024

Just a quick note: I wrote a simple example and confirmed that the current PR works as expected.

import time

import optuna


def objective(trial):
    xs = [trial.suggest_float(f"x_{d}", -5.0, 5.0) for d in range(4)]
    norm = sum([x_i * x_i for x_i in xs]) ** 0.5
    if norm <= 1:
        ws = [5.0**a for a in [-3, 0, -1, -2]]
    else:
        ws = [5.0**a for a in [0, -1, -2, -3]]
    return sum([ws[i] * xs[i] * xs[i] for i in range(4)])


sampler = optuna.samplers.TPESampler(seed=12345)
study = optuna.create_study(directions=["minimize"], sampler=sampler)
study.optimize(objective, n_trials=100)
print("")
print(f"{study.best_trial.params=}")

t1 = time.time()
print("")
print(f"{optuna.importance.FanovaImportanceEvaluator(seed=12345).evaluate(study)=}")
t2 = time.time()
print("")
print(
    f"{optuna.importance.PedAnovaImportanceEvaluator(is_lower_better=True).evaluate(study)=}"
)
t3 = time.time()
print("")
print(f"(not fast) F-Anova consumed {t2-t1} seconds.")
print(f"         PED-Anova consumed {t3-t2} seconds.")


codecov bot commented Feb 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (60498c5) 89.38% compared to head (17006ee) 89.37%.
Report is 189 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5212      +/-   ##
==========================================
- Coverage   89.38%   89.37%   -0.01%     
==========================================
  Files         206      209       +3     
  Lines       15097    13119    -1978     
==========================================
- Hits        13494    11725    -1769     
+ Misses       1603     1394     -209     


@nabenabe0928
Collaborator Author

Add the documentation here.

@nabenabe0928
Collaborator Author

Follow-up: Add a tutorial on how `evaluate_on_local` leads to different results.

@HideakiImamura HideakiImamura added the feature Change that does not break compatibility, but affects the public interfaces. label Feb 16, 2024
@HideakiImamura HideakiImamura added this to the v3.6.0 milestone Feb 16, 2024
Member

@HideakiImamura HideakiImamura left a comment

Thanks for the update. Almost LGTM! I left a minor comment about the file name. PTAL.

@nabenabe0928
Collaborator Author

Handle #5212 (comment) and experimental warning

specified search_space. `evaluate_on_local=True` is especially useful when users
modify search space during optimization.

Example:
Member

Could you please reduce the indentation by one level? In the generated document, the example code is included in the arguments section.

Screenshot 2024-02-19 at 16 58 06

@eukaryo
Collaborator

eukaryo commented Feb 19, 2024

LGTM!
Screenshot 2024-02-19 175005

@eukaryo eukaryo removed their assignment Feb 19, 2024
Member

@HideakiImamura HideakiImamura left a comment

LGTM.

@HideakiImamura
Member

Thank you for your long-running work! Let me merge this PR.

@HideakiImamura HideakiImamura merged commit beacca8 into optuna:master Feb 20, 2024
20 checks passed
@nabenabe0928 nabenabe0928 deleted the feat/add-ped-anova branch April 11, 2024 17:46
Labels
feature Change that does not break compatibility, but affects the public interfaces. optuna.importance Related to the `optuna.importance` submodule. This is automatically labeled by github-actions.
3 participants