Add a way to save models #105

dberenbaum · 2021-07-02T15:10:33Z

Integrations should include an option to save model to file. See #69.

Edited with tasklist:

pared · 2021-07-05T13:18:48Z

If we decide we want to control it at the integration level, #69 can be merged as is.

dberenbaum · 2021-07-05T17:24:46Z

Do you think it's important to be consistent across integrations so that we don't have to document what each one does individually? Also, might it be important to have a generic method in case we have some change we want to make across integrations (for example, add some logic to track the output if inside a dvc project)?

dberenbaum · 2021-07-08T20:55:28Z

Looking back at #69, I don't think it needs to be in dvclive.init(), and I think it could be merged as is, although we should also add similar functionality for xgboost.

I was thinking we might need something like dvclive.save_model(save_func, model_file) if we have some generic logic we need to apply, but we can add that later if needed.
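For illustration only, here is a minimal sketch of what a generic dvclive.save_model(save_func, model_file) could look like. Nothing below is an existing dvclive API: the function body, the directory creation, and the existence check are assumed examples of the "generic logic" mentioned above.

```python
from pathlib import Path
from typing import Callable, Union


def save_model(save_func: Callable[[str], None], model_file: Union[str, Path]) -> Path:
    """Hypothetical generic helper: call the framework's own save function
    and apply shared logic around it."""
    path = Path(model_file)
    # Shared logic every integration would get for free:
    # make sure the target directory exists before saving.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Delegate the actual serialization to the caller-supplied function,
    # e.g. `model.save` in Keras or `booster.save_model` in XGBoost.
    save_func(str(path))
    # Fail loudly if the save function did not produce the expected file.
    if not path.exists():
        raise FileNotFoundError(f"save_func did not create {path}")
    # A later version could add DVC-specific handling of `path` here.
    return path
```

An integration would then pass its framework's save method, for example save_model(model.save, "output_model.h5") in Keras, and any cross-integration change would live in one place.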

pared · 2021-07-12T10:53:12Z

> dvclive.save_model(save_func, model_file)

If the user can register their own callback, that would take a lot of work off our hands. I'm not sure whether that feature would be used much, though. If one needs to write the saving function anyway, why spend even more time reading how to register it in DVCLive and how it is handled internally, instead of just calling it manually when needed?

pared · 2021-07-12T11:00:15Z

Looking at @daavoo's comment, we can see that it would be easier for us to require the user to provide the saving method. But my original concern remains: will it be used at all?

dberenbaum · 2021-07-12T14:29:38Z

> will it be used at all?

DvcLiveCallback(model_file="output_model.h5") seems useful since it saves users from a separate call or from remembering how to save in keras or other frameworks.

I agree that dvclive.save_model(save_func, model_file) does not seem particularly useful. In the future, we might try to do more with dvclive, like make it add the model file as a dvc-tracked output. In that case, we might need a feature like this (or a similar decorator) to implement across the various integrations, but we can worry about it when we need it.

Edit: tldr #105 seems fine for keras, and it should be enough to do the same for xgboost and any other integrations.

Edit: I meant #69 in the first edit 🤦

daavoo · 2021-07-14T11:24:24Z

I reached a similar conclusion when reviewing how to add this functionality to both keras and MMCV integrations:

#69 (comment)
#110 (comment)

With the current set of dvclive features, I don't really see why a user would choose to use the "native model saving" logic of our integrations.

Looking at what other ML loggers do, I kind of like what wandb does in pytorch-lightning, which is searching for the model already saved by another callback, although this might not apply to all ML frameworks.
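As a rough sketch of the "reuse a model saved by another callback" idea, the logger could look in the checkpoint directory and pick the newest file instead of saving anything itself. This is not how wandb implements it; the function name, the `*.ckpt` pattern, and the use of modification time are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_latest_model(checkpoint_dir: str, pattern: str = "*.ckpt") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    # Only consider regular files directly inside the checkpoint directory.
    candidates = [p for p in Path(checkpoint_dir).glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # The newest modification time is used as a proxy for "latest checkpoint".
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

This only works for frameworks whose checkpoint callback writes to a known location, which is the limitation noted above.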

In addition, other ML loggers tend to use the saved model with some log_artifact method, and I'm not sure what this would translate to in dvc<>dvclive terms. Maybe it is what we already do with checkpoints? Or a potential new feature that takes care of calling dvc add?

dberenbaum · 2021-07-14T12:53:59Z

> In addition, other ML loggers tend to use the saved model with some log_artifact method, and I'm not sure what this would translate to in dvc<>dvclive terms. Maybe it is what we already do with checkpoints? Or a potential new feature that takes care of calling dvc add?

If using dvc checkpoints, then yes, the model output is already tracked at each step, although this requires some manual stage setup that dvclive might be able to help make easier. Having dvclive take care of calling dvc add or otherwise ensuring that the output is tracked by dvc and not git would be nice.
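To make the "have dvclive call dvc add" idea concrete, here is a minimal sketch under stated assumptions. The helper names are hypothetical and not part of dvclive; the only real pieces are the `dvc add <path>` CLI command and the fact that a DVC project has a `.dvc` directory at its root. It does not cover the checkpoints case, where the model is already a stage output.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def find_dvc_root(start: str) -> Optional[Path]:
    """Walk upward from `start`; return the first directory containing `.dvc`."""
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        if (directory / ".dvc").is_dir():
            return directory
    return None


def dvc_add_command(model_file: str) -> List[str]:
    """Build the CLI invocation that tracks `model_file` with DVC."""
    return ["dvc", "add", model_file]


def track_model(model_file: str) -> bool:
    """Run `dvc add` on the model only when it lives inside a DVC project."""
    model_path = Path(model_file).resolve()
    root = find_dvc_root(str(model_path.parent))
    if root is None:
        # Not inside a DVC project: leave the file untracked.
        return False
    # Requires the `dvc` executable on PATH; raises if the command fails.
    subprocess.run(dvc_add_command(str(model_path)), cwd=root, check=True)
    return True
```

Keeping the project detection separate from the subprocess call means the integration can skip tracking silently when the user is not working in a DVC project.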

daavoo · 2021-08-06T11:45:33Z

All existing integrations already support saving models so I'm closing this issue.

We can now start requiring model saving capabilities from new integrations.
Future improvements regarding saving models could be discussed in a separate issue as they come.

daavoo added the feature request label Jul 12, 2021

daavoo mentioned this issue Jul 12, 2021

Save checkpoint for Keras #69

Merged

daavoo self-assigned this Jul 14, 2021

daavoo added the discussion requires active participation to reach a conclusion label Jul 14, 2021

daavoo mentioned this issue Jul 20, 2021

summary: Add option to store **best** value? #89

Open

dberenbaum mentioned this issue Jul 29, 2021

Dvclive user guide iterative/dvc.org#2664

Merged

daavoo closed this as completed Aug 6, 2021

daavoo mentioned this issue Sep 30, 2021

dvclive integration? iterative/mlem#2

Open
