Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to save models #105

Closed
3 tasks done
dberenbaum opened this issue Jul 2, 2021 · 9 comments
Closed
3 tasks done

Add a way to save models #105

dberenbaum opened this issue Jul 2, 2021 · 9 comments
Assignees
Labels
discussion requires active participation to reach a conclusion feature request

Comments

@dberenbaum
Copy link
Collaborator

dberenbaum commented Jul 2, 2021

Integrations should include an option to save model to file. See #69.


Edited with tasklist:

@pared
Copy link
Contributor

pared commented Jul 5, 2021

If we decide we want to control it on integration-level #69 can be merged as is.

@dberenbaum
Copy link
Collaborator Author

Do you think it's important to be consistent across integrations so that we don't have to document what each one does individually? Also, might it be important to have a generic method in case we have some change we want to make across integrations (for example, add some logic to track the output if inside a dvc project)?

@dberenbaum
Copy link
Collaborator Author

Looking back at #69, I don't think it needs to be in dvclive.init(), and I think it could be merged as is, although we should also add similar functionality for xgboost.

I was thinking we might need something like dvclive.save_model(save_func, model_file) if we have some generic logic we need to apply, but we can add that later if needed.

@pared
Copy link
Contributor

pared commented Jul 12, 2021

dvclive.save_model(save_func, model_file)

If user can register own callback, that would take a lot of work out of our way. Not sure whether that feature would be used a lot. If one needs to write saving funciton anyway, why spend even more time reading how to register it in DVCLive and how it would be handled inside, instead of manually calling that every now and then.

@pared
Copy link
Contributor

pared commented Jul 12, 2021

Looking at @daavoo's comment we can see that it would be easier for us to require user to provide saving method. But my original concern remains: will it be used at all?

@dberenbaum
Copy link
Collaborator Author

dberenbaum commented Jul 12, 2021

will it be used at all?

DvcLiveCallback(model_file="output_model.h5") seems useful since it saves users from a separate call or from remembering how to save in keras or other frameworks.

I agree that dvclive.save_model(save_func, model_file) does not seem particularly useful. In the future, we might try to do more with dvclive, like make it add the model file as a dvc-tracked output. In that case, we might need a feature like this (or a similar decorator) to implement across the various integrations, but we can worry about it when we need it.

Edit: tldr #105 seems fine for keras, and it should be enough to do the same for xgboost and any other integrations.

Edit: I meant #69 in the first edit 🤦

@daavoo daavoo self-assigned this Jul 14, 2021
@daavoo
Copy link
Contributor

daavoo commented Jul 14, 2021

I reached a similar conclusion when reviewing how to add this functionality to both keras and MMCV integrations:

#69 (comment)
#110 (comment)

With the current set of dvclive features I don't really see why an user would choose to use the "native model saving" logic of our integrations.

Looking at what other ML Loggers do, I kind of like what wandb does in pytorch-lighting which is searching for the model already saved by other callback although this might not apply to all ML Frameworks.

In addition, other ML Loggers tend to use the saved model with some log_artifact method and I'm not sure to what would this be translated in dvc<>dvclive terms. Maybe it is what we already do with checkpoints? Or a potential new feature taking care of calling dvc add?

@daavoo daavoo added the discussion requires active participation to reach a conclusion label Jul 14, 2021
@dberenbaum
Copy link
Collaborator Author

In addition, other ML Loggers tend to use the saved model with some log_artifact method and I'm not sure to what would this be translated in dvc<>dvclive terms. Maybe it is what we already do with checkpoints? Or a potential new feature taking care of calling dvc add?

If using dvc checkpoints, then yes, the model output is already tracked at each step, although this requires some manual stage setup that dvclive might be able to help make easier. Having dvclive take care of calling dvc add or otherwise ensuring that the output is tracked by dvc and not git would be nice.

@daavoo
Copy link
Contributor

daavoo commented Aug 6, 2021

All existing integrations already support saving models so I'm closing this issue.

We can now start requiring model saving capabilities to new integrations.
Future improvements regarding saving models could be discussed in a separate issue as they come.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion requires active participation to reach a conclusion feature request
Projects
None yet
Development

No branches or pull requests

3 participants