integrations: Add DVC Live integration to Ray Tune #237

Open
MarkoMFilip opened this issue Apr 6, 2022 · 8 comments
Labels
A: frameworks Area: ML Framework integration feature request p2-medium

Comments

@MarkoMFilip

Ray Tune provides integrations for other loggers, and it would be good if DVCLive had its own. Concretely, Ray's documentation on integrating MLflow with Ray Tune gives examples of two specific functions that let people run hyperparameter optimization with Tune and track the experiments with MLflow at the same time. If we want to use DVC's experiments and checkpoints with Ray Tune, it would be good to have a similar integration available.

@daavoo daavoo added A: frameworks Area: ML Framework integration feature request labels Apr 6, 2022
@grizzlybearg

Hi @daavoo @MarkoMFilip has there been any progress with this??

@daavoo
Contributor

daavoo commented Feb 9, 2023

> Hi @daavoo @MarkoMFilip has there been any progress with this??

Hi @grizzlybearg, there has not been direct progress, but since the issue was opened we have added some features (mainly in https://github.com/iterative/dvclive/releases/tag/1.1.0) that should allow implementing something similar to the integrations described in https://docs.ray.io/en/latest/tune/examples/tune-mlflow.html.

I might try to set up a draft PR tomorrow, since I have checked the code for the MLflowLoggerCallback and it looks simple enough.
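For readers following along, here is a rough sketch of what a callback shaped like MLflowLoggerCallback might look like for DVCLive. It is not the draft PR and not part of either library: `DVCLiveLoggerSketch`, `dvclive_factory`, and `RecordingLive` are hypothetical names, and the `dvclive/<trial_id>` directory layout is an assumption. The three hooks mirror those of `ray.tune.logger.LoggerCallback`, which a real integration would presumably subclass. The class is kept plain here so the example runs without Ray or DVCLive installed.

```python
from numbers import Number
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Tuple


def dvclive_factory(trial_dir: str) -> Any:
    """Create a real dvclive.Live; imported lazily so dvclive stays optional here."""
    from dvclive import Live

    return Live(dir=trial_dir)


class DVCLiveLoggerSketch:
    """Hypothetical per-trial logger with the same hooks as Ray's LoggerCallback."""

    def __init__(self, live_factory: Callable[[str], Any] = dvclive_factory) -> None:
        self._live_factory = live_factory
        self._lives: Dict[str, Any] = {}  # one Live-like object per trial id

    def log_trial_start(self, trial: Any) -> None:
        # Assumption: each trial gets its own output directory.
        live = self._live_factory(f"dvclive/{trial.trial_id}")
        live.log_params(dict(trial.config))
        self._lives[trial.trial_id] = live

    def log_trial_result(self, iteration: int, trial: Any, result: Dict[str, Any]) -> None:
        live = self._lives[trial.trial_id]
        for name, value in result.items():
            # Log numeric values only; flags and strings are skipped.
            if isinstance(value, Number) and not isinstance(value, bool):
                live.log_metric(name, value)
        live.next_step()

    def log_trial_end(self, trial: Any, failed: bool = False) -> None:
        live = self._lives.pop(trial.trial_id, None)
        if live is not None:
            live.end()


class RecordingLive:
    """Stand-in for dvclive.Live that records calls instead of writing files."""

    def __init__(self, trial_dir: str) -> None:
        self.dir = trial_dir
        self.calls: List[Tuple[str, Any]] = []

    def log_params(self, params: Dict[str, Any]) -> None:
        self.calls.append(("log_params", params))

    def log_metric(self, name: str, val: Any) -> None:
        self.calls.append(("log_metric", (name, val)))

    def next_step(self) -> None:
        self.calls.append(("next_step", None))

    def end(self) -> None:
        self.calls.append(("end", None))


created: List[RecordingLive] = []


def recording_factory(trial_dir: str) -> RecordingLive:
    live = RecordingLive(trial_dir)
    created.append(live)
    return live


# Drive the hooks by hand with a fake trial, in the order Tune would call them.
logger = DVCLiveLoggerSketch(live_factory=recording_factory)
trial = SimpleNamespace(trial_id="t0", config={"lr": 0.01})
logger.log_trial_start(trial)
logger.log_trial_result(1, trial, {"loss": 0.5, "done": False, "hostname": "node-1"})
logger.log_trial_end(trial)
```

With the default factory, each trial would write to its own DVCLive directory. Whether that plays well with DVC experiments on a cluster is a separate question that this sketch does not address.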

@grizzlybearg

Thanks @daavoo

@bastienboutonnet

@daavoo I was trying to see if there had been a PR for this, as I'd really love to be able to use DVCLive with Ray Tune. I'm not so keen on the other ML monitoring platforms out there. I couldn't find anything related here. Could it be that I'm looking in the wrong place?

With some guidelines, I'd love to help out if the idea has not been implemented further.

@daavoo
Contributor

daavoo commented Jun 30, 2023

Hi @bastienboutonnet, are you using Ray Tune alongside an existing ML framework (e.g., Keras, PyTorch Lightning)?

@dberenbaum dberenbaum added p1-important Include in the next sprint and removed p2-medium labels Jun 30, 2023
@bastienboutonnet

@daavoo We are currently using huggingface transformer trainers

@daavoo
Contributor

daavoo commented Jun 30, 2023

> @daavoo We are currently using huggingface transformer trainers

Thanks! Tried to set up a quick example following https://huggingface.co/blog/ray-tune and passing:

from dvclive.huggingface import DVCLiveCallback

trainer.add_callback(DVCLiveCallback(save_dvc_exp=True))

But I think I need to look into it in more detail 😓. It appears that there is a bug where Ray tries to deserialize the internal DVC Repo instance used by DVCLive.
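One general pattern for this class of problem, offered as an illustration rather than a confirmed fix, is to keep the unpicklable object out of the pickled state and rebuild it lazily on the receiving side. The sketch below uses only the standard library. `EagerHolder`, `LazyHolder`, and `make_resource` are hypothetical names, and a `threading.Lock` stands in for the object that cannot be serialized. Whether DVCLive's callback can be restructured this way is an assumption.

```python
import pickle
import threading
from typing import Any, Callable, Optional


def make_resource() -> Any:
    """Stand-in for an unpicklable object (a lock here, not a real DVC Repo)."""
    return threading.Lock()


class EagerHolder:
    """Creates the resource up front, so the whole object cannot be pickled."""

    def __init__(self) -> None:
        self.resource = make_resource()


class LazyHolder:
    """Keeps only the factory in pickled state and rebuilds the resource on demand."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._resource: Optional[Any] = None

    @property
    def resource(self) -> Any:
        if self._resource is None:
            self._resource = self._factory()
        return self._resource

    def __getstate__(self) -> dict:
        state = self.__dict__.copy()
        state["_resource"] = None  # never ship the live object to another process
        return state


# The eager variant fails as soon as it is serialized.
try:
    pickle.dumps(EagerHolder())
    eager_ok = True
except TypeError:
    eager_ok = False

# The lazy variant survives a round trip and recreates its resource afterwards.
lazy = LazyHolder(make_resource)
first = lazy.resource
clone = pickle.loads(pickle.dumps(lazy))
```

After the round trip, `clone` holds no resource until `clone.resource` is first accessed, at which point a fresh one is created in the receiving process.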

@dberenbaum
Collaborator

@bastienboutonnet @grizzlybearg @MarkoMFilip or others watching this issue, do you already use DVC and Ray Tune? Do you use them together at all, and if so, how?

Since Ray will often be running on a distributed cluster, the typical DVCLive workflow of writing metrics and plots to local files and using Git to sync them won't work. Even locally, each trial writes to its own run folder, which violates DVC's assumptions. A couple of options would be to:

  1. Launch Ray from within DVC and sync back each trial's results. Sync the metrics and plots data to a central store (like cloud storage or DVC Studio), keeping track of the experiment associated with those metrics so they can be synced back to the Git/DVC repo.
  2. Launch DVC from within Ray inside each remote trial. Each trial clones the repo and pulls data, then runs the trial, commits the result, and pushes back to DVC and Git storage.
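To make option 2 more concrete, the steps each remote trial would run can be sketched as a command plan. This is only an outline under assumptions: `trial_commands` and `run_trial` are hypothetical helpers, the repository URL is a placeholder, and the plan assumes the project has a DVC pipeline whose parameters can be overridden with `dvc exp run --set-param`. The example only builds the commands and does not execute them.

```python
import subprocess
from typing import Dict, List, Tuple

Command = Tuple[str, List[str]]  # (working directory, argv)


def trial_commands(
    repo_url: str,
    workdir: str,
    exp_name: str,
    params: Dict[str, object],
    git_remote: str = "origin",
) -> List[Command]:
    """Build the clone, pull, run, and push steps for one trial."""
    run = ["dvc", "exp", "run", "--name", exp_name]
    for key, value in sorted(params.items()):
        run += ["--set-param", f"{key}={value}"]
    return [
        (".", ["git", "clone", repo_url, workdir]),  # each trial clones the repo
        (workdir, ["dvc", "pull"]),  # fetch the tracked data
        (workdir, run),  # run the trial as a DVC experiment
        (workdir, ["dvc", "exp", "push", git_remote, exp_name]),  # share the result
    ]


def run_trial(commands: List[Command]) -> None:
    """Execute the plan in order, stopping at the first failing command."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


# Placeholder URL and parameter name, for illustration only.
plan = trial_commands(
    "https://example.com/org/repo.git",
    "trial_0",
    "trial-0",
    {"train.lr": 0.01},
)
```

Inside a Ray Tune trainable, `run_trial(plan)` would be called with the trial's sampled config as `params`. Authentication for the Git and DVC remotes is left out of this sketch.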

Related discussions: #676, #638

cc @aguschin


5 participants