logger: add progress bar while training #64

discdiver · 2021-03-21T17:57:56Z

Very cool product! Thank you for it!

It would be nice if a progress bar like tqdm updated on stdout while training.

dberenbaum · 2021-03-21T23:52:04Z

Thanks, @discdiver! Could you give more detail about what you would want to see? For example:

Progress within a step/epoch
Number of steps/epochs completed out of total number of steps/epochs
Model performance metrics

discdiver · 2021-03-22T01:02:10Z

Those would all be great @dberenbaum! I was envisioning something like TensorFlow Keras's progress bars/info:

Fit model on training data
Epoch 1/2
782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483
Epoch 2/2
782/782 [==============================] - 2s 3ms/step - loss: 0.1676 - sparse_categorical_accuracy: 0.9500 - val_loss: 0.1631 - val_sparse_categorical_accuracy: 0.9488

Example from the docs

pared · 2021-03-22T09:30:10Z

@discdiver
This is a good idea. I have some thoughts on how we could do that.

As we are working with steps, I think steps should be our measure of progress.
We need to know how many steps we have to go, so introducing this kind of feature will require us to provide that information on init. I am thinking about something like dvclive.init(..., progress_max_steps=100), which would tell dvclive that we are going to have at most 100 steps and that it should refresh the progress bar on each next_step call.
Similarly to Keras, we could dump the logged metrics during each bar update.
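
A minimal sketch of how such a step-based bar could be rendered, assuming the total number of steps is known up front. The `render_progress` helper and its parameters are hypothetical and only illustrate the idea; they are not part of dvclive, and `progress_max_steps` above is a proposal rather than an existing option.

```python
def render_progress(step, max_steps, metrics, width=30):
    """Return a one-line, Keras-style text progress bar (hypothetical helper)."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    step = min(step, max_steps)  # never draw past the end of the bar
    filled = width * step // max_steps
    bar = "=" * filled + "." * (width - filled)
    parts = [f"{step}/{max_steps} [{bar}]"]
    # Append the metrics logged for this step after the bar itself.
    parts += [f"{name}: {value:.4f}" for name, value in metrics.items()]
    return " - ".join(parts)


if __name__ == "__main__":
    total = 5
    for step in range(1, total + 1):
        loss = 1.0 / step  # placeholder value standing in for a real metric
        print(render_progress(step, total, {"loss": loss}))
```

The bar only needs the current step, the declared maximum, and the metrics logged during that step, which is the information the proposal would require at init time and at each next_step call.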

dberenbaum · 2021-03-22T19:17:45Z

Related to #61:

Use stdout to make a very simplistic plot or to print out the metrics being written to tsv

It seems like the info could include:

All metrics being logged at each step
Number of steps completed / total steps (if provided)
Timer for each step and for total run time (personally, I like to see not only how much time elapsed, but also when each step started so that if I'm separately logging memory usage, for example, I can align spikes with my training process).

What do you think about also borrowing the syntax from tqdm of having a function to wrap an iterable, so it looks something like:

for step in dvclive.range(100, path="logs", ...):
    ...
    dvclive.log("metric", value)

This might make a good candidate for https://github.com/iterative/enhancement-proposals.

dberenbaum · 2021-03-25T20:37:26Z

Timer for each step and for total run time (personally, I like to see not only how much time elapsed, but also when each step started so that if I'm separately logging memory usage, for example, I can align spikes with my training process).

AKA "relative time" and "wall time" (this is the terminology in both mlflow and tensorboard).
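
A rough sketch of recording both kinds of time per step, using only the standard library. The `StepTimer` class and its field names are hypothetical; here "wall time" is taken to mean the absolute timestamp at which a step is recorded and "relative time" the seconds elapsed since the run started.

```python
import time


class StepTimer:
    """Hypothetical helper: records wall time and relative time per step."""

    def __init__(self, wall_clock=time.time, monotonic=time.monotonic):
        # The clocks are injectable so the timer can be tested deterministically.
        self._wall_clock = wall_clock
        self._monotonic = monotonic
        self._run_start = monotonic()
        self._step_start = self._run_start

    def mark(self):
        """Close the current step and return its timing record."""
        now = self._monotonic()
        record = {
            "wall_time": self._wall_clock(),         # seconds since the epoch
            "relative_time": now - self._run_start,  # seconds since the run started
            "step_duration": now - self._step_start, # seconds spent in this step
        }
        self._step_start = now
        return record


if __name__ == "__main__":
    timer = StepTimer()
    for step in range(3):
        time.sleep(0.01)  # stands in for one training step
        print(step, timer.mark())
```

Elapsed time uses a monotonic clock so it is unaffected by system clock adjustments, while the wall-clock timestamp is what allows alignment with separately logged data such as memory usage.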

dberenbaum · 2021-03-29T13:31:26Z

What do you think about also borrowing the syntax from tqdm of having a function to wrap an iterable, so it looks something like:

Probably better to put this in a different issue.

pared · 2021-03-30T15:11:41Z

I guess I might have rushed a little bit with implementing that feature.

As @pmrowla mentioned, displaying a progress bar might not be what one should expect from an external library taking care of metrics logging. That does not make sense with tf integration (TF has its own progress bar), though it might make sense in the case of the xgboost integration. But that does not mean the whole lib should take care of that.

Instead of providing logging capabilities that would need to be in many cases adjusted for a particular use case, maybe we should provide an easy way of obtaining information about metrics logged, step and so on from dvclive, and provide a tutorial in docs showing how to easily set up tqdm with dvclive?

I think that we could just make next_step return the dictionary which is logged into summary JSON file (all metrics logged during step + step nr + timestamp), giving all means to handle progressing as it suits the user.

xgboost integration could be modified so that one can provide a callback for the progress bar.
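
A minimal sketch of the "next_step returns the summary" idea, using a stand-in class rather than dvclive itself. `SketchLogger` and the exact keys of the returned dictionary are assumptions made for illustration; the real summary format may differ.

```python
import time


class SketchLogger:
    """Hypothetical stand-in for a metrics logger; not dvclive's actual API."""

    def __init__(self):
        self._step = 0
        self._metrics = {}

    def log(self, name, value):
        # Remember the latest value of each metric for the current step.
        self._metrics[name] = value

    def next_step(self):
        """Finish the current step and return what was logged during it."""
        summary = dict(self._metrics, step=self._step, timestamp=time.time())
        self._metrics = {}
        self._step += 1
        return summary


if __name__ == "__main__":
    logger = SketchLogger()
    for epoch in range(3):
        logger.log("loss", 1.0 / (epoch + 1))  # placeholder metric value
        summary = logger.next_step()
        print(summary)  # the caller decides how to display progress
```

With a return value like this, a user who wraps their loop in tqdm could pass the metrics to the bar's `set_postfix` method, while others could simply print the dictionary.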

dberenbaum · 2021-03-30T18:09:08Z

As @pmrowla mentioned, displaying a progress bar might not be what one should expect from an external library taking care of metrics logging.

I'm not sure I'm following. Printing metrics in realtime to stdout and/or a file (or both) is what I would expect.

That does not make sense with tf integration (TF has its own progress bar)

Providing an option to print relevant info to stdout seems useful in most cases as long as it can be silenced in scenarios where it's undesirable. The fact that this functionality exists in several ML frameworks already likely influenced this feature request and shows that it's generally expected behavior.

provide a tutorial in docs showing how to easily set up tqdm with dvclive

tqdm instructions seem insufficient because the metrics also should be output at each step, which seems out of scope for tqdm (maybe I'm missing some tqdm functionality). The visual progress bar is more of a nice additional visual of steps/epochs completed so far, not an indicator of process time or anything else. Maybe it's better to first implement plain text output with the info indicated in the above comments?

I think that we could just make next_step return the dictionary which is logged into summary JSON file (all metrics logged during step + step nr + timestamp), giving all means to handle progressing as it suits the user.

Sounds like a good way to implement it, but I don't see why we wouldn't provide a default implementation for users out of the box.

pmrowla · 2021-03-31T03:19:38Z

I think the assumption would be that users would configure it to update/display the metrics in the trailer part of a tqdm progress bar (after the visual "bar" itself)

so in the same way that TF does it

782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483

dberenbaum · 2021-03-31T18:22:16Z

I think the assumption would be that users would configure it to update/display the metrics in the trailer part of a tqdm progress bar (after the visual "bar" itself)

so in the same way that TF does it
782/782 [==============================] - 3s 3ms/step - loss: 0.5821 - sparse_categorical_accuracy: 0.8361 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9483

If tensorflow does this for users, why should dvclive ask users to implement it themselves? What utility is dvclive providing at that point? It seems like this defeats the purpose of dvclive, which is largely to provide framework-agnostic metrics logging (whether that be in the terminal, a text file, a plot, etc.).

pmrowla · 2021-04-01T01:16:50Z

@dberenbaum Isn't tensorflow a framework for the entire training process, and not just a logging mechanism? TF can display a progress bar because TF runs the actual loop.

for i in range(epochs):  # a progress bar makes sense at the level of this `for` statement (or any level outside this loop)
    ...  # do training things
    dvclive.log()
    dvclive.next_step()  # tells dvclive that one iteration has ended (dvclive has no control over the `for` loop itself)

My understanding is that for TF, a model.fit() call is essentially this loop. They can display progress for model.fit() because TF creates the loop and can see every time there is a new loop iteration. dvclive is just a logger that you can call anywhere; technically, it does not even have to be inside a loop context.

If all we want to do is just call print() and output the same data we log to a file, then sure that's fine. But displaying an actual percentage/completion based progress bar like TF doesn't make sense for dvclive in this context.

Instead of providing logging capabilities that would need to be in many cases adjusted for a particular use case, maybe we should provide an easy way of obtaining information about metrics logged, step and so on from dvclive, and provide a tutorial in docs showing how to easily set up tqdm with dvclive?

I think that we could just make next_step return the dictionary which is logged into summary JSON file (all metrics logged during step + step nr + timestamp), giving all means to handle progressing as it suits the user.

This seems like the right track to me. We should define a progress callback parameter for next_step so that the user can retrieve a dictionary containing all of the tracked metric values at each step. The user can then use that information however they want, whether it's by just printing it to stdout or by drawing it in a visual progress bar.

If the user's training happens in a loop context and they want to display a TF style progress bar, they can wrap their loop in a tqdm pbar, and use the data from our callback to fill in the pbar description fields (and update them with each loop iteration).
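
A rough sketch of the callback variant, again with a stand-in class. The `progress_callback` parameter name, the `CallbackLogger` class, and the `format_summary` helper are all hypothetical; nothing here is confirmed dvclive API.

```python
class CallbackLogger:
    """Hypothetical logger sketch; `progress_callback` is an assumed parameter."""

    def __init__(self):
        self._step = 0
        self._metrics = {}

    def log(self, name, value):
        self._metrics[name] = value

    def next_step(self, progress_callback=None):
        # Hand the step's metrics to the caller, who decides how to show them.
        summary = {"step": self._step, **self._metrics}
        if progress_callback is not None:
            progress_callback(summary)
        self._metrics = {}
        self._step += 1


def format_summary(summary):
    """Format one step as plain text, with metrics to four decimal places."""
    metrics = [f"{k}: {v:.4f}" for k, v in summary.items() if k != "step"]
    return " - ".join([f"step {summary['step']}"] + metrics)


if __name__ == "__main__":
    logger = CallbackLogger()
    for epoch in range(3):
        logger.log("loss", 1.0 / (epoch + 1))  # placeholder metric value
        logger.next_step(progress_callback=lambda s: print(format_summary(s)))
```

Here the callback just prints a line per step; the same hook could instead update the description fields of a progress bar wrapped around the user's own loop.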

dberenbaum · 2021-04-01T13:26:19Z

If all we want to do is just call print() and output the same data we log to a file, then sure that's fine.

Let's start there. I think of it as having stdout be one of several output options. To me, it's a really useful one since I can see the progress from the terminal where I executed the model training script instead of having to open something else.

discdiver · 2021-04-04T15:21:59Z

Just came across the rich Python package, which has some very nice progress bars for terminals.

dberenbaum · 2021-04-05T14:36:51Z

Thanks, @discdiver! Rich is actually used in dvc exp show, so utilizing it in dvclive also would make sense.

daavoo · 2022-04-25T19:17:49Z

Closing this in favor of #206.

About the original request for progress bars: responsibility for the progress bar lies more with the ML framework than with DVCLive (the logger).

dberenbaum added this to To do in DVC 23 March - 06 April 2021 Mar 23, 2021

pared self-assigned this Mar 29, 2021

pared moved this from To do to In progress in DVC 23 March - 06 April 2021 Mar 29, 2021

dberenbaum mentioned this issue Mar 29, 2021

Iterable wrapper for dvclive #68

Closed

efiop added this to To do in DVC 6 - 20 April 2021 via automation Apr 6, 2021

efiop moved this from In progress to Done in DVC 23 March - 06 April 2021 Apr 6, 2021

pared moved this from To do to In progress in DVC 6 - 20 April 2021 Apr 12, 2021

pared removed this from In progress in DVC 6 - 20 April 2021 Apr 20, 2021

pared added the enhancement label May 20, 2021

pared changed the title ~~Feature Request: Add progress bar while training~~ logger: add progress bar while training Jun 10, 2021

pared added the feature request label Jun 10, 2021

pared removed their assignment Nov 3, 2021

daavoo closed this as completed Apr 25, 2022
