logger: Add notifier to `next_step`? #90

daavoo · 2021-06-15T10:12:31Z

Depending on the type of model being trained, the time between calls to next_step can vary significantly. In common deep learning scenarios, e.g. the Keras callback, next_step is called at the end of each epoch, which can mean long gaps (possibly hours) between calls.

It could be useful to have built-in support for optionally sending a notification each time next_step is called.

Without changing dvclive, the user could simply call a separate library (e.g. https://github.com/liiight/notifiers) after next_step:

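```python
import dvclive
from notifiers import notify
from tensorflow.keras.callbacks import Callback


class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            dvclive.log(metric, value)
        dvclive.next_step()
        # Push a notification once this epoch's metrics have been logged
        notify('pushover', user='foo', token='bar', message=f'epoch: {epoch}')
```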

But building the notification step into MetricLogger would have some benefits, such as access to internals (e.g. _metrics) and configuration options, in addition to hiding complexity from the end user.
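To make the "access to internals" point concrete, here is a minimal sketch of a logger that owns its notifier. `NotifyingLogger` is a hypothetical stand-in for MetricLogger, not dvclive's API; `notify` is any callable that delivers a text message (in practice it could wrap the notifiers call from the snippet above).

```python
from typing import Callable, Dict


class NotifyingLogger:
    """Hypothetical logger that sends a notification on every step.

    Illustrative stand-in for dvclive's MetricLogger; NOT dvclive's API.
    """

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify                 # delivers one text message
        self._step = 0
        self._metrics: Dict[str, float] = {}  # metrics logged in the current step

    def log(self, name: str, value: float) -> None:
        self._metrics[name] = value

    def next_step(self) -> None:
        # The notifier lives inside the logger, so it reads the internal
        # metrics directly instead of the user re-collecting them.
        summary = ", ".join(f"{k}={v:.4g}" for k, v in sorted(self._metrics.items()))
        self._notify(f"step {self._step} finished: {summary}")
        self._step += 1
        self._metrics = {}  # start collecting the next step's metrics


messages = []
logger = NotifyingLogger(notify=messages.append)
logger.log("loss", 0.25)
logger.log("acc", 0.9)
logger.next_step()
print(messages)  # ['step 0 finished: acc=0.9, loss=0.25']
```

The user-facing code then shrinks to `log` and `next_step`; the delivery channel is configured once when the logger is created.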

However, I'm not sure whether this feature is worth implementing inside dvclive, or whether it would be better to keep dvclive as lightweight as possible.


dmpetrov · 2021-06-15T14:44:22Z

@daavoo I'm trying to understand the motivation behind this? 😄
Could you please elaborate on this? Do you want to update the files more often? What does notify() mean?

EDIT: do you have any references in other ml logger frameworks to this functionality?

daavoo · 2021-06-15T15:25:53Z

> @daavoo I'm trying to understand the motivation behind this? 😄
> Could you please elaborate on this? Do you want to update the files more often? What does notify() mean?

Sorry about the lack of clarity.

The motivation comes from working with deep learning models that take a long time to train (e.g. hours or days). Under those circumstances we always ended up writing some sort of "notification" code to complement or integrate into the ML logger. The main reason was to be able to monitor the training loop remotely (i.e. without having to watch stdout in a terminal).

This notification code takes care of sending a message to some platform (e.g. e-mail, or a Slack / Discord / Telegram channel) containing information such as the number of the finished epoch (a.k.a. step in dvclive) and the associated metrics. We also used it to report exceptions that occurred during the training loop.

notify() is usually a function that sends information as a message to an app.
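The two jobs described above (a per-epoch summary and a report when training crashes) can be sketched roughly as follows. `format_epoch_message`, `notify_on_failure` and the `send` callable are hypothetical names for illustration; `send` stands for whatever function actually posts to e-mail, Slack, Telegram, etc.

```python
import traceback
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

Send = Callable[[str], None]  # hypothetical: delivers one message to some app


def format_epoch_message(epoch: int, metrics: Dict[str, float]) -> str:
    """Build the text of a per-epoch notification."""
    lines = [f"Epoch {epoch} finished"]
    lines += [f"  {name}: {value:.4f}" for name, value in sorted(metrics.items())]
    return "\n".join(lines)


@contextmanager
def notify_on_failure(send: Send) -> Iterator[None]:
    """Send the traceback if the wrapped block raises, then re-raise."""
    try:
        yield
    except Exception:
        send("Training failed:\n" + traceback.format_exc())
        raise


sent = []
with notify_on_failure(sent.append):
    for epoch in range(2):
        metrics = {"loss": 1.0 / (epoch + 1)}  # placeholder values
        sent.append(format_epoch_message(epoch, metrics))
print(sent[1])  # "Epoch 1 finished" followed by "  loss: 0.5000"
```

Re-raising after notifying keeps the failure visible to the caller, so the notification complements rather than swallows the error.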

> EDIT: do you have any references in other ml logger frameworks to this functionality?

I think other ML loggers usually have an associated UI with a view that is automatically updated as plots/information are logged (related to this Studio issue: iterative/studio-support#13).

In addition to that, some ml loggers also provide "notification" utilities:

Beyond existing functionality in other ml loggers, I have found different teams and open source communities solving this problem, including some I work/have worked with:

daavoo · 2021-07-14T08:04:58Z

I've just discovered another open-source tool focused on this kind of functionality:

https://github.com/labmlai/labml

pared · 2021-07-14T14:32:52Z

Related to #91

daavoo · 2021-09-09T20:19:13Z

Another open source tool:

https://github.com/aporia-ai/mlnotify

daavoo · 2021-10-21T13:15:21Z

Interesting integration between DagsHub and New Relic highlighting alerts as one of the main features:

https://dagshub.com/blog/real-time-machine-learning-monitroing-new-relic-dagshub/

dberenbaum · 2022-02-10T16:22:14Z

Related to #91 (comment), I think the most useful integration here would be making it dead simple to send full reports (similar to the html today) through supported channels.

For example, the Slack API could probably be used to generate a message with the metrics and plot images, and similarly for email (personally, I would prioritize Slack because it's more collaborative and probably easier for users to set up).

The local html generated now could just be one report/alert format in that case (and the cml markdown report another).

daavoo · 2022-02-10T18:06:01Z

> Related to #91 (comment), I think the most useful integration here would be making it dead simple to send full reports (similar to the html today) through supported channels.
>
> For example, the Slack API could probably be used to generate a message with the metrics and plot images, and similarly for email (personally, I would prioritize Slack because it's more collaborative and probably easier for users to set up).
>
> The local html generated now could just be one report/alert format in that case (and the cml markdown report another).

That would be the way to go, and it was the original idea behind using https://github.com/liiight/notifiers.

For metrics it is very feasible. However, the images / rendered plots would be trickier, because most channels don't support sending images directly. We could rely on `cml publish` to host the images and send the link (as in the CML markdown report), but this would make CML a dependency for every channel.

dberenbaum · 2022-02-10T18:31:00Z

> For metrics it is very feasible. However, the images / rendered plots would be trickier, because most channels don't support sending images directly. We could rely on `cml publish` to host the images and send the link (as in the CML markdown report), but this would make CML a dependency for every channel.

Rather than wrapping a general-purpose text-based notifier with support for many providers, it might be more useful to focus on providers in which we can send the entire report, including images/rendered plots. AFAIK this should be feasible without hosting in Slack (https://api.slack.com/methods/files.upload) and email (https://docs.python.org/3/library/email.examples.html).

I'm not sure text-based alerts add enough value (we could instead have a doc or blog post showing how to use dvclive + https://github.com/liiight/notifiers). Full reports with plots seem like a more unique feature, and they extend dvclive's initial value prop of lightweight live monitoring for model training, providing serverless alerting and reporting anywhere without needing to access the training machine. Since a lot of training happens in headless environments anyway, this seems pretty useful to me. What do you think?
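For the email case, the standard library alone is enough to ship a rendered plot inside the message, with no image hosting. A minimal sketch, assuming the plot has already been rendered to PNG bytes; the function name, subject line, addresses and `loss.png` filename are illustrative assumptions:

```python
from email.message import EmailMessage
from typing import Dict


def build_report_email(
    metrics: Dict[str, float],
    plot_png: bytes,
    sender: str,
    recipient: str,
) -> EmailMessage:
    """Build an email with metrics as text and a rendered plot attached."""
    msg = EmailMessage()
    msg["Subject"] = "DVCLive training report"  # illustrative subject
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text body: one "name: value" line per metric
    msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(metrics.items())))
    # The image travels inside the message itself, so no hosting is needed
    msg.add_attachment(plot_png, maintype="image", subtype="png", filename="loss.png")
    return msg
```

Sending would then be something like `smtplib.SMTP(host).send_message(msg)` against a reachable SMTP server (the host is an assumption about the user's setup).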

daavoo · 2022-02-11T18:20:13Z

> I'm not sure text-based alerts add enough value (we could instead have a doc or blog post showing how to use dvclive + https://github.com/liiight/notifiers). Full reports with plots seem like a more unique feature, and they extend dvclive's initial value prop of lightweight live monitoring for model training, providing serverless alerting and reporting anywhere without needing to access the training machine. Since a lot of training happens in headless environments anyway, this seems pretty useful to me. What do you think?

I think it's useful and would directly add value to DVCLive.

I'm a little "worried" about how easy it would be to maintain, because report providers sound like a set of integrations that could grow orthogonally to the ML framework integrations.

So far, looking at slack and email APIs, it doesn't look that bad.

dberenbaum · 2022-02-16T13:44:16Z

@shcheklein mentioned that it might be worthwhile to look into RSS feed aggregators. There are some parallels in how RSS expects a particular schema of elements (https://validator.w3.org/feed/docs/rss2.html) and can publish them in a consistent format, so maybe it can give some ideas for how to implement.
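To illustrate the RSS parallel, each training step could map to one `<item>` in an RSS 2.0 feed. A minimal sketch using only the standard library; `build_rss`, the channel title/description and the one-item-per-step mapping are assumptions, while the required `<channel>` elements (title, link, description) follow the RSS 2.0 spec:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(steps: List[Dict[str, float]], link: str) -> str:
    """Render one RSS 2.0 <item> per training step (hypothetical mapping)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required <channel> elements
    ET.SubElement(channel, "title").text = "Training run"
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Metrics logged at each step"
    for step, metrics in enumerate(steps):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"Step {step}"
        ET.SubElement(item, "description").text = ", ".join(
            f"{k}={v}" for k, v in sorted(metrics.items())
        )
    return ET.tostring(rss, encoding="unicode")


print(build_rss([{"loss": 1.0}, {"loss": 0.5}], "https://example.com/run"))
```

The fixed schema is the interesting part: any reader that understands the format can display the run without knowing anything about dvclive.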

casperdcl · 2022-04-04T19:24:50Z

So it's about tidying up this sort of thing?

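```python
from dvclive import Live
# tqdm.contrib.slack, .telegram and .discord each provide `trange`; Telegram shown
from tqdm.contrib.telegram import trange

live = Live()
epochs = 10

with trange(live.get_step(), epochs, unit="epoch") as pbar:
    for epoch in pbar:
        ...  # train for one epoch and compute `loss`
        live.log("loss", loss)
        pbar.set_postfix(loss=loss)
        live.next_step()
```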

i.e. providing a callback interface?

```python
# proposed interface
live.set_callback(
    on_log=lambda name, metric: pbar.set_postfix({name: metric}),
    on_step=lambda new_step: print(f"starting epoch {new_step:>5d}", file=some_log))
```

Or is it more advanced? `live.notify_slack(on_step=True, channel="#...", token="...")`
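A minimal sketch of how the callback variant could be wired, using a toy class. `CallbackLive` is a hypothetical stand-in for `dvclive.Live`, and `set_callback`, `on_log` and `on_step` are the names floated in the comment, not an existing dvclive API:

```python
from typing import Callable, Optional

OnLog = Callable[[str, float], None]
OnStep = Callable[[int], None]


class CallbackLive:
    """Toy logger exposing the proposed hooks (hypothetical, not dvclive)."""

    def __init__(self) -> None:
        self._step = 0
        self._on_log: Optional[OnLog] = None
        self._on_step: Optional[OnStep] = None

    def set_callback(self, on_log: Optional[OnLog] = None,
                     on_step: Optional[OnStep] = None) -> None:
        self._on_log = on_log
        self._on_step = on_step

    def log(self, name: str, value: float) -> None:
        # A real logger would persist the value here before notifying
        if self._on_log is not None:
            self._on_log(name, value)

    def next_step(self) -> None:
        self._step += 1
        if self._on_step is not None:
            self._on_step(self._step)  # receives the new step number


live = CallbackLive()
live.set_callback(on_step=lambda new_step: print(f"starting epoch {new_step:>5d}"))
live.next_step()
```

The logger stays agnostic about delivery: whether a callback updates a progress bar or posts to Slack is entirely the user's choice.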

dberenbaum · 2022-05-09T13:53:49Z

Sorry @casperdcl, I missed this comment. It's closer to the latter advanced usage. Probably channel, token, etc. can be set in environment variables, and the method can be something like `live.make_report(type="slack")`.
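One way to keep the growing set of report providers manageable is a small registry where each provider declares the environment variables it needs. A minimal sketch; `make_report`, `register`, `PROVIDERS` and the `DVCLIVE_SLACK_*` variable names are all hypothetical, and the `type` argument from the comment is called `kind` here to avoid shadowing the Python builtin:

```python
import os
from typing import Callable, Dict, Mapping, Tuple

Provider = Callable[[str, Dict[str, str]], None]  # (report text, settings)
PROVIDERS: Dict[str, Provider] = {}
# Environment variables each provider reads (names made up for illustration)
REQUIRED_ENV: Dict[str, Tuple[str, ...]] = {
    "slack": ("DVCLIVE_SLACK_TOKEN", "DVCLIVE_SLACK_CHANNEL"),
}


def register(kind: str) -> Callable[[Provider], Provider]:
    """Decorator that adds a provider to the registry under `kind`."""
    def decorator(fn: Provider) -> Provider:
        PROVIDERS[kind] = fn
        return fn
    return decorator


def make_report(text: str, kind: str, env: Mapping[str, str] = os.environ) -> None:
    """Send `text` through the provider registered under `kind`."""
    if kind not in PROVIDERS:
        raise ValueError(f"unknown report type: {kind!r}")
    names = REQUIRED_ENV.get(kind, ())
    missing = [name for name in names if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    PROVIDERS[kind](text, {name: env[name] for name in names})


@register("slack")
def send_slack(text: str, settings: Dict[str, str]) -> None:
    # A real provider would call the Slack Web API; this sketch only prints
    print(f"[slack -> {settings['DVCLIVE_SLACK_CHANNEL']}] {text}")
```

Adding a channel then means registering one function and listing its variables, without touching the core logger.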

dberenbaum · 2023-03-06T22:19:22Z

I don't think we are likely to do this now that we have live metrics in Studio and other solutions exist for alerting.

This was referenced Jun 15, 2021:

integrations: CML #91 (closed)

Add live metrics support iterative/studio-support#13 (closed)

shcheklein added the feature request label Jun 16, 2021

dberenbaum mentioned this issue Feb 4, 2022:

Better support plots functionality iterative/vscode-dvc#1274 (closed)

dberenbaum mentioned this issue Feb 10, 2022:

live: Add report option. #215 (closed)

dberenbaum added the p3-nice-to-have label Mar 6, 2023

dberenbaum closed this as not planned Mar 6, 2023
