Optuna is compatible with most ML libraries, and it's easy to use Optuna with those. Please refer to examples.
There are two ways to realize it.
First, callable classes can be used for that purpose as follows:
import optuna
class Objective(object):
def __init__(self, min_x, max_x):
# Hold this implementation specific arguments as the fields of the class.
self.min_x = min_x
self.max_x = max_x
def __call__(self, trial):
# Calculate an objective value by using the extra arguments.
x = trial.suggest_float("x", self.min_x, self.max_x)
return (x - 2) ** 2
# Execute an optimization by using an `Objective` instance.
study = optuna.create_study()
study.optimize(Objective(-100, 100), n_trials=100)
Second, you can use lambda
or functools.partial
for creating functions (closures) that hold extra arguments. Below is an example that uses lambda
:
import optuna
# Objective function that takes three arguments.
def objective(trial, min_x, max_x):
x = trial.suggest_float("x", min_x, max_x)
return (x - 2) ** 2
# Extra arguments.
min_x = -100
max_x = 100
# Execute an optimization by using the above objective function wrapped by `lambda`.
study = optuna.create_study()
study.optimize(lambda trial: objective(trial, min_x, max_x), n_trials=100)
Please also refer to sklearn_addtitional_args.py example, which reuses the dataset instead of loading it in each trial execution.
Yes, it's possible.
In the simplest form, Optuna works with in-memory storage:
study = optuna.create_study()
study.optimize(objective)
If you want to save and resume studies, it's handy to use SQLite as the local storage:
study = optuna.create_study(study_name="foo_study", storage="sqlite:///example.db")
study.optimize(objective) # The state of `study` will be persisted to the local SQLite file.
Please see rdb
for more details.
There are two ways of persisting studies, which depends if you are using in-memory storage (default) or remote databases (RDB). In-memory studies can be saved and loaded like usual Python objects using pickle
or joblib
. For example, using joblib
:
study = optuna.create_study()
joblib.dump(study, "study.pkl")
And to resume the study:
study = joblib.load("study.pkl")
print("Best trial until now:")
print(" Value: ", study.best_trial.value)
print(" Params: ")
for key, value in study.best_trial.params.items():
print(f" {key}: {value}")
If you are using RDBs, see rdb
for more details.
By default, Optuna shows log messages at the optuna.logging.INFO
level. You can change logging levels by using optuna.logging.set_verbosity
.
For instance, you can stop showing each trial result as follows:
optuna.logging.set_verbosity(optuna.logging.WARNING)
study = optuna.create_study()
study.optimize(objective)
# Logs like '[I 2020-07-21 13:41:45,627] Trial 0 finished with value:...' are disabled.
Please refer to optuna.logging
for further details.
Optuna saves hyperparameter values with its corresponding objective value to storage, but it discards intermediate objects such as machine learning models and neural network weights. To save models or weights, please use features of the machine learning library you used.
We recommend saving optuna.trial.Trial.number
with a model in order to identify its corresponding trial. For example, you can save SVM models trained in the objective function as follows:
def objective(trial):
svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
clf = sklearn.svm.SVC(C=svc_c)
clf.fit(X_train, y_train)
# Save a trained model to a file.
with open("{}.pickle".format(trial.number), "wb") as fout:
pickle.dump(clf, fout)
return 1.0 - accuracy_score(y_valid, clf.predict(X_valid))
study = optuna.create_study()
study.optimize(objective, n_trials=100)
# Load the best model.
with open("{}.pickle".format(study.best_trial.number), "rb") as fin:
best_clf = pickle.load(fin)
print(accuracy_score(y_valid, best_clf.predict(X_valid)))
To make the parameters suggested by Optuna reproducible, you can specify a fixed random seed via seed
argument of ~optuna.samplers.RandomSampler
or ~optuna.samplers.TPESampler
as follows:
sampler = TPESampler(seed=10) # Make the sampler behave in a deterministic way.
study = optuna.create_study(sampler=sampler)
study.optimize(objective)
However, there are two caveats.
First, when optimizing a study in distributed or parallel mode, there is inherent non-determinism. Thus it is very difficult to reproduce the same results in such condition. We recommend executing optimization of a study sequentially if you would like to reproduce the result.
Second, if your objective function behaves in a non-deterministic way (i.e., it does not return the same value even if the same parameters were suggested), you cannot reproduce an optimization. To deal with this problem, please set an option (e.g., random seed) to make the behavior deterministic if your optimization target (e.g., an ML library) provides it.
Trials that raise exceptions without catching them will be treated as failures, i.e. with the ~optuna.trial.TrialState.FAIL
status.
By default, all exceptions except ~optuna.exceptions.TrialPruned
raised in objective functions are propagated to the caller of ~optuna.study.Study.optimize
. In other words, studies are aborted when such exceptions are raised. It might be desirable to continue a study with the remaining trials. To do so, you can specify in ~optuna.study.Study.optimize
which exception types to catch using the catch
argument. Exceptions of these types are caught inside the study and will not propagate further.
You can find the failed trials in log messages.
[W 2018-12-07 16:38:36,889] Setting status of trial#0 as TrialState.FAIL because of \
the following error: ValueError('A sample error in objective.')
You can also find the failed trials by checking the trial states as follows:
study.trials_dataframe()
number | state | value | ... | params | system_attrs |
0 | TrialState.FAIL | ... | 0 | Setting status of trial#0 as TrialState.FAIL because of the following error: ValueError('A test error in objective.') | |
1 | TrialState.COMPLETE | 1269 | ... | 1 |
The catch
argument in ~optuna.study.Study.optimize
.
Trials that return NaN
(float('nan')
) are treated as failures, but they will not abort studies.
Trials which return NaN
are shown as follows:
[W 2018-12-07 16:41:59,000] Setting status of trial#2 as TrialState.FAIL because the \
objective function returned nan.
Since parameters search spaces are specified in each call to the suggestion API, e.g. ~optuna.trial.Trial.suggest_float
and ~optuna.trial.Trial.suggest_int
, it is possible to, in a single study, alter the range by sampling parameters from different search spaces in different trials. The behavior when altered is defined by each sampler individually.
Note
Discussion about the TPE sampler. optuna/optuna#822
If your optimization target supports GPU (CUDA) acceleration and you want to specify which GPU is used, the easiest way is to set CUDA_VISIBLE_DEVICES
environment variable:
# On a terminal.
#
# Specify to use the first GPU, and run an optimization.
$ export CUDA_VISIBLE_DEVICES=0
$ optuna study optimize foo.py objective --study-name foo --storage sqlite:///example.db
# On another terminal.
#
# Specify to use the second GPU, and run another optimization.
$ export CUDA_VISIBLE_DEVICES=1
$ optuna study optimize bar.py objective --study-name bar --storage sqlite:///example.db
Please refer to CUDA C Programming Guide for further details.
When you test objective functions, you may prefer fixed parameter values to sampled ones. In that case, you can use ~optuna.trial.FixedTrial
, which suggests fixed parameter values based on a given dictionary of parameters. For instance, you can input arbitrary values of x and y to the objective function x + y as follows:
def objective(trial):
x = trial.suggest_float("x", -1.0, 1.0)
y = trial.suggest_int("y", -5, 5)
return x + y
objective(FixedTrial({"x": 1.0, "y": -1})) # 0.0
objective(FixedTrial({"x": -1.0, "y": -4})) # -5.0
Using ~optuna.trial.FixedTrial
, you can write unit tests as follows:
# A test function of pytest
def test_objective():
assert 1.0 == objective(FixedTrial({"x": 1.0, "y": 0}))
assert -1.0 == objective(FixedTrial({"x": 0.0, "y": -1}))
assert 0.0 == objective(FixedTrial({"x": -1.0, "y": 1}))
If the memory footprint increases as you run more trials, try to periodically run the garbage collector. Specify gc_after_trial
to True
when calling ~optuna.study.Study.optimize
or call gc.collect
inside a callback.
def objective(trial):
x = trial.suggest_float("x", -1.0, 1.0)
y = trial.suggest_int("y", -5, 5)
return x + y
study = optuna.create_study()
study.optimize(objective, n_trials=10, gc_after_trial=True)
# `gc_after_trial=True` is more or less identical to the following.
study.optimize(objective, n_trials=10, callbacks=[lambda study, trial: gc.collect()])
There is a performance trade-off for running the garbage collector, which could be non-negligible depending on how fast your objective function otherwise is. Therefore, gc_after_trial
is False
by default. Note that the above examples are similar to running the garbage collector inside the objective function, except for the fact that gc.collect
is called even when errors, including ~optuna.exceptions.TrialPruned
are raised.
Note
~optuna.integration.ChainerMNStudy
does currently not provide gc_after_trial
nor callbacks for ~optuna.integration.ChainerMNStudy.optimize
. When using this class, you will have to call the garbage collector inside the objective function.