logger: add progress bar while training #64

Closed
discdiver opened this issue Mar 21, 2021 · 15 comments

Comments

@discdiver

Very cool product! Thank you for it!

It would be nice if a progress bar like tqdm updated on stdout while training.

@dberenbaum
Contributor

Thanks, @discdiver! Could you give more detail about what you would want to see? For example:

  1. Progress within a step/epoch
  2. Number of steps/epochs completed out of total number of steps/epochs
  3. Model performance metrics

@discdiver
Author

Those would all be great @dberenbaum! I was envisioning something like TensorFlow Keras's progress bars/info:

Fit model on training data
Epoch 1/2
782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483
Epoch 2/2
782/782 [==============================] - 2s 3ms/step - loss: 0.1676 - sparse_categorical_accuracy: 0.9500 - val_loss: 0.1631 - val_sparse_categorical_accuracy: 0.9488

Example from the docs

@pared
Contributor

pared commented Mar 22, 2021

@discdiver
This is a good idea. I have some thoughts on how we could do that.

  1. As we are working with steps, I think this should be our measurement of progress.
  2. We need to know how many steps we have to go, so introducing this kind of feature will require us to provide that information on init. I am thinking about something like dvclive.init(..., progress_max_steps=100), which would tell dvclive that we are going to have at most 100 steps and that it should refresh the progress bar on each next_step call.
  3. Similarly to Keras, we could dump logged metrics during bar update.
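
A minimal sketch of the three points above, for illustration only. `ToyLive` is a hypothetical stand-in and not dvclive's actual API; `progress_max_steps` is the parameter name floated in this comment, and `bar_width` is an extra assumption made here to keep the output short.

```python
class ToyLive:
    """Hypothetical stand-in for a metrics logger; not dvclive's real API."""

    def __init__(self, progress_max_steps=None, bar_width=20):
        self.progress_max_steps = progress_max_steps
        self.bar_width = bar_width
        self.step = 0
        self.metrics = {}

    def log(self, name, value):
        # Remember the latest value of each metric for the current step.
        self.metrics[name] = value

    def render(self):
        """Build one progress line: bar and step counter, then the metrics."""
        parts = []
        if self.progress_max_steps:
            done = min(self.step, self.progress_max_steps)
            filled = self.bar_width * done // self.progress_max_steps
            bar = "=" * filled + "." * (self.bar_width - filled)
            parts.append(f"{done}/{self.progress_max_steps} [{bar}]")
        else:
            # Without a known maximum, only the step count can be shown.
            parts.append(f"step {self.step}")
        parts += [f"{name}: {value:.4f}" for name, value in self.metrics.items()]
        return " - ".join(parts)

    def next_step(self):
        # Refresh the progress line once per completed step.
        self.step += 1
        line = self.render()
        print(line)
        self.metrics = {}
        return line


live = ToyLive(progress_max_steps=4, bar_width=8)
lines = []
for loss in (0.9, 0.6, 0.4, 0.3):
    live.log("loss", loss)
    lines.append(live.next_step())
```

The bar can only be drawn because the maximum number of steps is supplied up front; when it is omitted, the sketch falls back to a plain step counter.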

@dberenbaum
Contributor

Related to #61:

> Use stdout to make a very simplistic plot or to print out the metrics being written to tsv

It seems like the info could include:

  1. All metrics being logged at each step
  2. Number of steps completed / total steps (if provided)
  3. Timer for each step and for total run time (personally, I like to see not only how much time elapsed, but also when each step started so that if I'm separately logging memory usage, for example, I can align spikes with my training process).

What do you think about also borrowing the syntax from tqdm of having a function to wrap an iterable, so it looks something like:

for step in dvclive.range(100, path="logs", ...):
    ...
    dvclive.log("metric", value)

This might make a good candidate for https://github.com/iterative/enhancement-proposals.
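
One way the iterable-wrapping idea could look, as a sketch only. `live_range` and `on_step_end` are hypothetical names chosen for this illustration; nothing here is an existing dvclive function.

```python
def live_range(n, on_step_end):
    """Yield 0..n-1, calling on_step_end(done, total) after each loop body."""
    for step in range(n):
        yield step
        # Execution resumes here once the caller's loop body has finished,
        # so this is the point where a step can be marked as complete.
        on_step_end(step + 1, n)


seen = []
for step in live_range(3, lambda done, total: seen.append(f"{done}/{total}")):
    pass  # training and metric logging would go here
```

Because the wrapper owns the iteration, it knows both the total and when each step ends. One caveat of this generator approach: if the caller breaks out of the loop early, the hook is not called for the interrupted step.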

@dberenbaum
Contributor

> • Timer for each step and for total run time (personally, I like to see not only how much time elapsed, but also when each step started so that if I'm separately logging memory usage, for example, I can align spikes with my training process).

AKA "relative time" and "wall time" (this is the terminology in both mlflow and tensorboard).
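
A small sketch of recording both kinds of time per step, using only the standard library. `StepTimer` and its field names are assumptions made for illustration, not names taken from dvclive, mlflow, or tensorboard.

```python
import time
from datetime import datetime, timezone


class StepTimer:
    """Hypothetical helper that stamps each step with wall and relative time."""

    def __init__(self):
        # A monotonic clock is unaffected by system clock adjustments,
        # which makes it suitable for measuring elapsed time.
        self._start = time.monotonic()

    def stamp(self, step):
        return {
            "step": step,
            # Wall time: when the step happened, for aligning with other logs.
            "wall_time": datetime.now(timezone.utc).isoformat(),
            # Relative time: seconds elapsed since the timer was created.
            "relative_time": time.monotonic() - self._start,
        }


timer = StepTimer()
records = [timer.stamp(step) for step in range(3)]
```

Wall time is what lets a spike in a separately collected memory log be matched to a training step; relative time is what a "total run time" display would use.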

@pared pared self-assigned this Mar 29, 2021
@pared pared moved this from To do to In progress in DVC 23 March - 06 April 2021 Mar 29, 2021
@dberenbaum
Contributor

> What do you think about also borrowing the syntax from tqdm of having a function to wrap an iterable, so it looks something like:

Probably better to put this in a different issue.

@pared
Contributor

pared commented Mar 30, 2021

I guess I might have rushed a little bit with implementing that feature.

As @pmrowla mentioned, displaying a progress bar might not be what one should expect from an external library that takes care of metrics logging. It does not make sense with the TF integration (TF has its own progress bar), though it might make sense for the xgboost integration. But that does not mean the whole library should take care of it.

Instead of providing logging capabilities that would need to be adjusted in many cases for a particular use case, maybe we should provide an easy way of obtaining information about logged metrics, the step, and so on from dvclive, and provide a tutorial in the docs showing how to easily set up tqdm with dvclive?

I think that we could just make next_step return the dictionary that is logged into the summary JSON file (all metrics logged during the step + step number + timestamp), giving the user every means to handle progress reporting as it suits them.

xgboost integration could be modified so that one can provide a callback for the progress bar.
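
A sketch of what "next_step returns the summary dictionary" might look like. `ToyLive` and `summary_path` are hypothetical; the exact keys dvclive writes to its summary file are not confirmed by this thread, so `step` and `timestamp` are assumed names.

```python
import json
import os
import tempfile
import time


class ToyLive:
    """Hypothetical logger whose next_step returns what it writes to disk."""

    def __init__(self, summary_path):
        self.summary_path = summary_path
        self.step = 0
        self.metrics = {}

    def log(self, name, value):
        self.metrics[name] = value

    def next_step(self):
        # Same content goes to the JSON file and back to the caller.
        summary = {"step": self.step, "timestamp": time.time(), **self.metrics}
        with open(self.summary_path, "w") as f:
            json.dump(summary, f)
        self.step += 1
        self.metrics = {}
        return summary


with tempfile.TemporaryDirectory() as tmp:
    live = ToyLive(os.path.join(tmp, "summary.json"))
    live.log("loss", 0.58)
    summary = live.next_step()
    with open(live.summary_path) as f:
        on_disk = json.load(f)
```

The caller can then print `summary`, feed it to a progress bar, or ignore it, without the logger deciding how progress should be displayed.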

@dberenbaum
Contributor

> As @pmrowla mentioned, displaying a progress bar might not be what one should expect from an external library that takes care of metrics logging.

I'm not sure I'm following. Printing metrics in realtime to stdout and/or a file (or both) is what I would expect.

> That does not make sense with tf integration (TF has its own progress bar)

Providing an option to print relevant info to stdout seems useful in most cases as long as it can be silenced in scenarios where it's undesirable. The fact that this functionality exists in several ML frameworks already likely influenced this feature request and shows that it's generally expected behavior.

> provide a tutorial in docs showing how to easily set up tqdm with dvclive

tqdm instructions seem insufficient because the metrics also should be output at each step, which seems out of scope for tqdm (maybe I'm missing some tqdm functionality). The visual progress bar is more of a nice additional visual of steps/epochs completed so far, not an indicator of process time or anything else. Maybe it's better to first implement plain text output with the info indicated in the above comments?

> I think that we could just make next_step return the dictionary that is logged into the summary JSON file (all metrics logged during the step + step number + timestamp), giving the user every means to handle progress reporting as it suits them.

Sounds like a good way to implement it, but I don't see why we wouldn't provide a default implementation for users out of the box.

@pmrowla
Contributor

pmrowla commented Mar 31, 2021

I think the assumption would be that users would configure it to update/display the metrics in the trailer part of a tqdm progress bar (after the visual "bar" itself)

so in the same way that TF does it

782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483

@dberenbaum
Contributor

> I think the assumption would be that users would configure it to update/display the metrics in the trailer part of a tqdm progress bar (after the visual "bar" itself)
>
> so in the same way that TF does it
>
> 782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483

If tensorflow does this for users, why should dvclive ask users to implement it themselves? What utility is dvclive providing at that point? It seems like this defeats the purpose of dvclive, which is largely to provide framework-agnostic metrics logging (whether that be in the terminal, a text file, a plot, etc.).

@pmrowla
Contributor

pmrowla commented Apr 1, 2021

@dberenbaum Isn't tensorflow a framework for the entire training process, and not just a logging mechanism? TF can display a progress bar because TF runs the actual loop.

for i in range(epochs):  # a progress bar makes sense at the level of this `for` statement (or anywhere outside the loop)
    ...  # do training things
    dvclive.log()
    dvclive.next_step()  # tells dvclive that one iteration has ended (dvclive has no control over the `for` loop itself)

My understanding is that for TF, a model.fit() call is essentially this loop. They can display progress for model.fit() because TF creates the loop and can see every time there is a new loop iteration. dvclive is just a logger that you can call anywhere; technically, it does not even have to be inside a loop context.

If all we want to do is just call print() and output the same data we log to a file, then sure that's fine. But displaying an actual percentage/completion based progress bar like TF doesn't make sense for dvclive in this context.

> Instead of providing logging capabilities that would need to be adjusted in many cases for a particular use case, maybe we should provide an easy way of obtaining information about logged metrics, the step, and so on from dvclive, and provide a tutorial in the docs showing how to easily set up tqdm with dvclive?
>
> I think that we could just make next_step return the dictionary that is logged into the summary JSON file (all metrics logged during the step + step number + timestamp), giving the user every means to handle progress reporting as it suits them.

This seems like the right track to me. We should define a progress callback parameter for next_step so that the user can retrieve a dictionary containing all of the tracked metric values at each step. The user can then use that information however they want, whether it's by just printing it to stdout or by drawing it in a visual progress bar.

If the user's training happens in a loop context and they want to display a TF style progress bar, they can wrap their loop in a tqdm pbar, and use the data from our callback to fill in the pbar description fields (and update them with each loop iteration).
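
A rough sketch of the callback idea, using only the standard library. `ToyLive`, the `callback` parameter, and `format_trailer` are hypothetical names for illustration; this is not how dvclive's next_step is actually defined.

```python
def format_trailer(summary):
    """Render 'name: value' pairs in the style of the Keras trailer."""
    return " - ".join(
        f"{name}: {value:.4f}"
        for name, value in summary.items()
        if name != "step"
    )


class ToyLive:
    """Hypothetical logger that hands each step's metrics to a callback."""

    def __init__(self):
        self.step = 0
        self.metrics = {}

    def log(self, name, value):
        self.metrics[name] = value

    def next_step(self, callback=None):
        summary = {"step": self.step, **self.metrics}
        if callback is not None:
            # The user decides what to do with the data: print, plot, etc.
            callback(summary)
        self.step += 1
        self.metrics = {}


epochs = 2
lines = []


def show(summary):
    # The loop owner knows the total, so the "Epoch i/n" prefix lives here.
    line = f"Epoch {summary['step'] + 1}/{epochs} - {format_trailer(summary)}"
    lines.append(line)
    print(line)


live = ToyLive()
for loss, acc in [(0.5821, 0.8361), (0.1676, 0.9500)]:
    live.log("loss", loss)
    live.log("accuracy", acc)
    live.next_step(callback=show)
```

With a tqdm bar wrapped around the loop, the callback could instead pass the dictionary to the bar's `set_postfix` method so that the metrics appear after the visual bar.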

@dberenbaum
Contributor

> If all we want to do is just call print() and output the same data we log to a file, then sure that's fine.

Let's start there. I think of it as having stdout be one of several output options. To me, it's a really useful one since I can see the progress from the terminal where I executed the model training script instead of having to open something else.
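
Treating stdout as one output among several could be sketched like this. `emit` and `sinks` are hypothetical names, and an in-memory `io.StringIO` stands in for the TSV file to keep the example self-contained.

```python
import io
import sys


def emit(step, metrics, sinks):
    """Write one tab-separated row of step and metrics to every sink."""
    values = [f"{metrics[name]:.4f}" for name in sorted(metrics)]
    row = "\t".join([str(step)] + values)
    for sink in sinks:
        sink.write(row + "\n")
    return row


log_file = io.StringIO()  # stands in for a TSV file opened for writing
for step, loss in enumerate([0.58, 0.17]):
    # The same row goes to the log and to the terminal.
    emit(step, {"loss": loss}, sinks=[log_file, sys.stdout])
```

Silencing terminal output is then just a matter of leaving `sys.stdout` out of the list of sinks.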

@discdiver
Author

Just came across the rich Python package, which has some very nice progress bars for terminals.

@dberenbaum
Contributor

Thanks, @discdiver! Rich is actually used in dvc exp show, so utilizing it in dvclive also would make sense.

@efiop efiop added this to To do in DVC 6 - 20 April 2021 via automation Apr 6, 2021
@efiop efiop moved this from In progress to Done in DVC 23 March - 06 April 2021 Apr 6, 2021
@pared pared moved this from To do to In progress in DVC 6 - 20 April 2021 Apr 12, 2021
@pared pared removed this from In progress in DVC 6 - 20 April 2021 Apr 20, 2021
@pared pared changed the title from "Feature Request: Add progress bar while training" to "logger: add progress bar while training" Jun 10, 2021
@pared pared removed their assignment Nov 3, 2021
@daavoo
Contributor

daavoo commented Apr 25, 2022

Closing this in favor of #206.

As for the original request for progress bars, responsibility for them lies more with the ML framework than with DVCLive (the logger).

@daavoo daavoo closed this as completed Apr 25, 2022