Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make running trials in timeline plot visible #731

Conversation

nabenabe0928
Copy link
Collaborator

@nabenabe0928 nabenabe0928 commented Dec 12, 2023

Contributor License Agreement

This repository (optuna-dashboard) and Goptuna share common code.
This pull request may therefore be ported to Goptuna.
Make sure that you understand the consequences concerning licenses and check the box below if you accept the term before creating this pull request.

  • I agree this patch may be ported to Goptuna by other Goptuna contributors.

Reference Issues/PRs

This PR makes the running trials in the timeline plot visible.
Currently, running trials are handled as a run duration of zero.
For this reason, we cannot see running trials in the timeline plot although it actually exists.
In this PR, I aim to make them appear in the timeline plot.
Plus, I revised the code so that trials, which were killed before they completed, will handle the max time in the plot as the max completed/pruned/failed timing.

What does this implement/fix? Explain your changes.

This PR makes the following changes:

  1. Running trials will appear on the timeline plot, and
  2. The maximum of the x-axis is handled as either the current date time (if the current running durations are not too long) or the maximum completed time.

Copy link

codecov bot commented Dec 12, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (c38d0d9) 67.20% compared to head (1c0477e) 68.22%.
Report is 64 commits behind head on main.

Files Patch % Lines
optuna_dashboard/_app.py 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #731      +/-   ##
==========================================
+ Coverage   67.20%   68.22%   +1.02%     
==========================================
  Files          35       35              
  Lines        2293     2329      +36     
==========================================
+ Hits         1541     1589      +48     
+ Misses        752      740      -12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The last version could get maxRunDuration of zero when we have
waiting or failed trials without any complete or pruned trials.
It leads to false in the judge of hasRunning.
However, in this case, we should be able to say that we have
running trials.
I fixed this issue in this commit.
@nabenabe0928 nabenabe0928 force-pushed the bug-fix/make-running-trials-in-timeline-plot-visible branch from b8f9ae5 to 895e327 Compare December 12, 2023 13:41
@nabenabe0928 nabenabe0928 marked this pull request as ready for review December 12, 2023 14:10
@c-bata
Copy link
Member

c-bata commented Dec 13, 2023

@keisuke-umezawa Could you review this PR?

@nabenabe0928
Copy link
Collaborator Author

nabenabe0928 commented Dec 13, 2023

I checked the behavior with the following (checking the dashboard while the following is running to check the green bar grows over time):

import time

import optuna


def objective(trial: optuna.Trial) -> float:
    x = trial.suggest_float("x", -5, 5)
    time.sleep(30)
    return x


if __name__ == "__main__":
    study = optuna.create_study(storage="sqlite:///demo.db")
    study.optimize(objective, n_trials=100)

Another check (check dashboard after the run finishes so that we can see that the green bar does not squash the other bars after a while):

def create_study_with_various_trials(storage: str) -> None:
    rng = np.random.RandomState(42)

    def _objective_intermediate_values(trial: optuna.Trial) -> float:
        x = trial.suggest_categorical("x", choices=["a", "b", "c"])
        rnd = rng.random()
        if rnd < 0.25:
            trial.report(trial.number, step=0)
            trial.report(trial.number + 1, step=1)
        elif rnd < 0.5:
            trial.report(trial.number, step=0)
            raise optuna.TrialPruned()
        elif rnd < 0.75:
            raise optuna.TrialPruned()
        else:
            raise ValueError("Unexpected Error.")

        return 0.0

    study = optuna.create_study(study_name="various-trials", storage=storage)
    study.optimize(_objective_intermediate_values, n_trials=40, catch=(Exception,))
    study.enqueue_trial({"x": "a"})
    study.ask({"x": CategoricalDistribution(["a", "b", "c"])})


if __name__ == "__main__":
    create_study_with_various_trials(storage="sqlite:///demo.db")

Comment on lines 83 to 86
const start = t.datetime_start?.getTime() ?? new Date().getTime()
const now = new Date().getTime()
// This is an ad-hoc handling to check if the trial is running.
return now - start < maxRunDuration * 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Why do we need those lines for ad-hoc handling?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it can be just simply with const hasRunning = trials.some((t) => t.state === runningKey), but do you have some reasons that you need to write those lines?

Copy link
Collaborator Author

@nabenabe0928 nabenabe0928 Dec 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove it as well, but then the problem is that the timeline for RUNNING trials, which was not somehow killed properly (it is basically preemption kill, which often happens in our internal cluster) then the timeline plot for the other trials will not be visible because they will be squashed by the running trials, which are not really running because of preemption kill.

Copy link
Member

@keisuke-umezawa keisuke-umezawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left come questions and comments. Could you check them?

optuna_dashboard/ts/components/GraphTimeline.tsx Outdated Show resolved Hide resolved
const completes = bars.map((b, i) => b.datetime_complete ?? starts[i])
const runDurations = bars.map((b) => {
const start = b.datetime_start?.getTime() ?? new Date().getTime()
const complete = b.datetime_complete?.getTime() ?? start
Copy link
Member

@keisuke-umezawa keisuke-umezawa Dec 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this since we can use the current max time for running trials?

const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start is necessary for waiting trials.

Comment on lines 124 to 128
return Math.max(
1,
!isRunning ? complete - start : maxDatetime.getTime() - start
)
})
Copy link
Member

@keisuke-umezawa keisuke-umezawa Dec 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] Why do we need to get max here? If we fix the code as I said in the above, those lines can be just complete - start?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it seems that we can use the following lines:

    const runDurations = bars.map((b, i) => {
      const start = starts[i].getTime()
      const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime()
      return complete - start
    }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lower bound 1 is necessary to be able to recognize at least bars exist in case runDurations=0.
Otherwise, I think your code is better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually, const complete = b.datetime_complete?.getTime() ?? maxDatetime.getTime() is incorrect when we think of trialState===WAITING.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand that you want to show waiting and in some cases also running trials.

In the optuna, it uses the current time for complete if a trial does not have it when it is waiting or running.
ref: https://github.com/optuna/optuna/blob/master/optuna/visualization/_timeline.py#L85-L86 Are you planning to make back port to optuna itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the backport here

Copy link
Member

@keisuke-umezawa keisuke-umezawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wrongly added approved. I changed it to ng 🙇

Copy link
Member

@keisuke-umezawa keisuke-umezawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on it!
I just add a nits comment. If you fix it, you can merge it.
Btw, it makes some differences with optuna itself, and it is recommended to make back port to optuna.

Comment on lines 124 to 128
return Math.max(
1,
!isRunning ? complete - start : maxDatetime.getTime() - start
)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand that you want to show waiting and in some cases also running trials.

In the optuna, it uses the current time for complete if a trial does not have it when it is waiting or running.
ref: https://github.com/optuna/optuna/blob/master/optuna/visualization/_timeline.py#L85-L86 Are you planning to make back port to optuna itself?

@nabenabe0928
Copy link
Collaborator Author

@keisuke-umezawa I made a backport to Optuna! Are there any other things I should do?

@keisuke-umezawa
Copy link
Member

@nabenabe0928
The implementation in optuna side is not fixed yet, and it will make some differences between optuna and optuna-dashboad later. But, I believe that you will make a back port again from optuna to optuna-dashboard. So, I will merge this PR.

@keisuke-umezawa keisuke-umezawa merged commit 6752b17 into optuna:main Jan 21, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants