-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Fix progress bars being displayed as partially completed in Jupyter notebooks #46289
Conversation
Signed-off-by: sjl <sjl@anyscale.com>
@@ -177,7 +177,6 @@ def __init__(self, state: ProgressBarState, pos_offset: int): | |||
desc=state["desc"] + " " + str(state["pos"]), | |||
total=state["total"], | |||
position=pos_offset + state["pos"], | |||
leave=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes it so that we show the overall progress bar. we are matching the parameters used with default tqdm: https://github.com/ray-project/ray/blob/master/python/ray/data/_internal/progress_bar.py#L59-L62
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, this has the effect of leaving the progress bar in the output after completion:
personally, i don't think it's a bad outcome to leave the progress bar output after completion, as it makes it easy for me to track overall progress. but happy to hear from others, or if there are users who prefer to hide it after completion by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful if we persisted progress bars, but can we configure this at the Ray Data level? If we change the configuration here, it'd affect other libraries that use tqdm_ray
.
Btw, is this necessary to fix the issue, or is this an orthogonal change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this change, the progress bars would disappear after completion, which is the intended behavior when setting leave=False
in tqdm
: https://tqdm.github.io/docs/tqdm/#tqdm-objects
agree that it makes sense to have some configuration which can be set separately. i am thinking i can add a leave
parameter to tqdm_ray
, which can be set from Ray Data's ProgressBar
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, still a bit confused -- if we just made the process.update_bar(state)
change and didn't modify leave
, would we still be able to close the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we did not modify leave
, there is another issue where all of the progress bars will go away after completion, due to leave=False
being set (this is the current behavior). The fix in this PR causes the bars to update and disappear after the bar completes:
so to account for this new issue, we can have leave=True
to keep the bars around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, gotcha. Yeah, that looks janky
Don't have a strong opinion about whether we leave=True
for just Jupyter or for all Ray Data progress bars. @raulchen what's your opinion?
Let's definitely add a configuration for leave
in tqdm_ray.tqdm
and only configure for Ray Data. Want to avoid changing the behavior for other libraries and users who might use tqdm_ray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally I actually prefer always leaving the progress bars. because it's convenient to inspect some stats.
One minor concern is that it might be confusing for some users, making them the pipeline still running.
What about we update the progress bar message in the end to something like Dataset execution finished in .. seconds
, Dataset execution failed, see above stack trace for error message
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also no longer need to set leave
in tqdm_ray
or in Ray Data, since we will always be using leave=True
, and this is the default value in tqdm
.
Signed-off-by: sjl <sjl@anyscale.com>
Signed-off-by: sjl <sjl@anyscale.com>
Why are these changes needed?
There is a bug with progress bars generated from Ray Data on Jupyter notebooks, where the progress bar is left partially complete after the dataset finishes executing:
This PR fixes the bug so that progress bars are marked as fully completed after execution finishes. We also add a "success" or "failure" message in the bar after execution terminates:
The progress bar output for the same code outside Jupyter notebook on master:
The same code with this PR:
Related issue number
Closes #44983
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.