
Timing of Pruning #276

Closed
Hiroshiba opened this issue Dec 17, 2018 · 5 comments

@Hiroshiba

commented Dec 17, 2018

I'd like to ask about the timing of pruning in cases like these:

  1. Different intervals of trial.report (e.g., every 100 iterations vs. every 1000 iterations).
  2. When multiple training runs are executing at the same time.
  3. Over-fitting.

Optuna is a great project. Thanks.

@toshihikoyanase

Collaborator

commented Dec 18, 2018

Thank you for your interest in Optuna. I'll try to answer your questions, though I'm not sure I fully understand them. If you have further questions, please feel free to contact us.

  1. Different intervals of trial.report (e.g., every 100 iterations vs. every 1000 iterations).

trial.report(value, step) saves an intermediate value to storage, and trial.should_prune(step) reads the value at step and checks the pruning condition. So we need to call trial.report(value, step) before invoking trial.should_prune(step). This is the only requirement relating the pruning interval to the trial.report interval.

As long as that requirement is satisfied, you can use different intervals for trial.report and trial.should_prune. For example, we can report values every 5 iterations and check the pruning condition every 10 iterations as follows:

import optuna
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection


def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, test_x, train_y, test_y = \
        sklearn.model_selection.train_test_split(iris.data, iris.target, test_size=0.25)

    alpha = trial.suggest_loguniform('alpha', 1e-5, 1e-1)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        if step % 5 == 0:
            intermediate_value = 1.0 - clf.score(test_x, test_y)
            trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if step % 10 == 0 and trial.should_prune(step):
            raise optuna.structs.TrialPruned()

    return 1.0 - clf.score(test_x, test_y)

The above example is based on examples/pruning/simple.py.

If you use the integration modules for XGBoost and LightGBM, they invoke trial.report and trial.should_prune at every iteration. Currently, Optuna does not provide options to change the pruning interval for these modules, so that is possible future work.

If you use the Chainer integration, you can change the interval of trial.report and trial.should_prune via the pruner_trigger argument. Please refer to the reference documentation for further details.

  2. When multiple training runs are executing at the same time

I think your question is about parallel execution of trials. Optuna's pruner is called within each trial individually, and trials running in other processes do not affect the timing of pruning condition checks.

  3. Over-fitting

Let me confirm my understanding of your question. I think it is about the relationship between Optuna's pruning mechanism and over-fitting of the models trained in your objective functions.

If so, the pruning mechanism does not detect over-fitting, because the pruners do not compare current values with previous values within the same trial. So I think you need to use an early-stopping mechanism provided by your ML library, such as chainer.training.triggers.EarlyStoppingTrigger, for that purpose.
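To make the distinction concrete, here is a minimal, library-agnostic sketch of the kind of within-trial check an early-stopping mechanism performs and Optuna's pruners do not. The function name and logic below are purely illustrative, not part of the Optuna or Chainer API.

```python
# A patience-based early-stopping check: stop once the loss has failed
# to improve on its best value for `patience` consecutive steps.
# This compares values WITHIN one trial, which pruners do not do.

def early_stop_index(losses, patience=3):
    """Return the step index at which training would stop, or None."""
    best = float("inf")
    bad_steps = 0
    for i, loss in enumerate(losses):
        if loss < best:
            best = loss
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                return i
    return None

# The loss starts over-fitting after step 3, so training stops at step 6.
print(early_stop_index([0.9, 0.5, 0.3, 0.2, 0.25, 0.3, 0.35]))  # -> 6
```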

@Hiroshiba

Author

commented Dec 18, 2018

Thank you for the details.
I had misunderstood: I thought Optuna might prune using time-series information within a trial.
That answers my 1st and 3rd questions.

I'm sorry I didn't make the 2nd question clear enough.
I wanted to know whether there is a difference between sequential execution and simultaneous (parallel) execution.

I executed trials in parallel at the same time.
Some trials' objective values were obviously worse than the others', but all trials ran to the end without being pruned.

Is there a way to execute more efficiently?

@toshihikoyanase

Collaborator

commented Dec 18, 2018

I wanted to know whether there is a difference between sequential execution and simultaneous execution.

The difference may come from the implementation of MedianPruner. Currently, MedianPruner takes only completed trials into account and ignores running trials when it calculates the pruning threshold (more specifically, the median of the intermediate values of past trials). So some trials may never become targets of pruning when we use parallel optimization.

Let me explain with a simple example:
(a) Sequentially execute 100 trials.
(b) Execute 100 trials in parallel, simultaneously.
We assume that both (a) and (b) use MedianPruner(n_startup_trials=10), and that each trial takes the same amount of time.

In case (a), the pruner is activated from the 11th trial onward, so 90 trials are targets of the pruner.
In case (b), the pruner is never activated, because the study has no completed trials while the trials are being evaluated. So pruning does not happen at all, and all 100 trials run to completion.
This is a toy example, but similar phenomena can be seen in real applications.
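The toy example above can be sketched numerically. This is a simplified model of the assumed behavior, not Optuna internals: a trial can be a pruning target only if at least n_startup_trials trials have already completed when it runs.

```python
# Simplified model of MedianPruner's startup condition: pruning can fire
# only after `n_startup_trials` trials have COMPLETED.

def pruning_active(n_completed_trials, n_startup_trials=10):
    return n_completed_trials >= n_startup_trials

# (a) Sequential: when the k-th trial runs, k - 1 trials have completed.
sequential_targets = sum(pruning_active(k - 1) for k in range(1, 101))

# (b) Fully parallel: every trial starts while 0 trials have completed.
parallel_targets = sum(pruning_active(0) for _ in range(100))

print(sequential_targets, parallel_targets)  # -> 90 0
```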

If that does not match your case, something may be wrong with the pruning mechanism. If you give us further information, such as sample code and error logs to reproduce the phenomenon, it would be a great help to us.

@Hiroshiba

Author

commented Dec 18, 2018

Thanks a lot.
I now understand the pruning timing.

I am training deep-learning tasks that take 6 hours each to finish, so I want to run trials more efficiently.

That covers everything I was interested in. Thank you.

@Hiroshiba Hiroshiba closed this Dec 18, 2018

@toshihikoyanase

Collaborator

commented Dec 18, 2018

I have trained the deep-learning tasks that take 6 hours to end, so I want to run trials more efficiently.

I have good news for you: we plan to add a new pruner that significantly accelerates deep-learning tasks in parallel computing environments.
Please try it when it is merged into master. The corresponding PR is #236.

Thank you.
