How to use DVC for multiple experiments with same code #2324

Closed · fabiocapsouza opened this issue Jul 25, 2019 · 2 comments

@fabiocapsouza commented Jul 25, 2019

I am new to DVC and I have a question about a use case for which I could not find an example.
The project I am working on requires the same experiment (code) to be run for multiple scenarios (for example, 4 datasets). Each scenario requires separate training and evaluation runs, but all of them use the same stages (differing only in input files) and produce the same output files and metrics.
I'd like to have a snapshot of the metrics for all scenarios whenever the code changes (though sometimes only for some scenarios: I don't want to re-run all 4 experiments if a change produces bad results for one of them).

Does DVC currently support this workflow? How should I organize my pipeline(s)/metrics and execute my experiments?

Is building separate pipelines, one for each scenario, the right way to do it?

Thanks in advance,

efiop added the question label on Jul 25, 2019
efiop (Contributor) commented Jul 25, 2019

Hi @fabiocapsouza !

There are currently 2 ways to go about it:

  1. You could use git branches for each experiment and then merge the one you like the most into master. Our get-started guide briefly covers this: https://dvc.org/doc/get-started/compare-experiments (see the first sketch after this list).

  2. Build separate pipelines using the same code. You could do that quite conveniently by using separate directories for your experiments and just copying dvcfiles over (see the second sketch below).
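
For example, here is a minimal sketch of the branch-per-scenario workflow. The script name, dataset paths, and metrics file are hypothetical; substitute your own:

```bash
# One branch per scenario; each branch wires the pipeline to a different dataset.
git checkout -b scenario-a
dvc run -d train.py -d data/a.csv \
        -o model.pkl -M metrics.json \
        python train.py data/a.csv
git add . && git commit -m "Run scenario A"

# Repeat on scenario-b, scenario-c, ... with their own datasets,
# then merge the branch you like best into master.
```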
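
And a sketch of the directory-per-scenario layout, copying the dvcfile over and pointing it at another dataset (again, all file names are illustrative):

```bash
# Same code, separate directory and dvcfile per scenario.
mkdir -p scenarios/a scenarios/b

cd scenarios/a
dvc run -f Dvcfile \
        -d ../../train.py -d ../../data/a.csv \
        -o model.pkl -M metrics.json \
        python ../../train.py ../../data/a.csv

# Copy the dvcfile over and edit the dataset it depends on
# (by hand, or with sed as below):
cp Dvcfile ../b/Dvcfile
cd ../b
sed -i 's|data/a.csv|data/b.csv|g' Dvcfile
dvc repro   # the dependency changed, so the stage re-runs for scenario B
```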

In both cases, metrics comparison will work pretty much the same.
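
For instance, with a metrics.json declared via -M as in the sketches above:

```bash
# Branch layout: list metrics across all branches
dvc metrics show -a

# Directory layout: list every metrics file in the current workspace
dvc metrics show
```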

Another option for your scenario would be to implement "best experiment selection" in your own code, so that it goes through the list of input datasets and selects the best one.

There is also another feature that we are considering implementing: a build matrix (#1018). If you are familiar with Travis CI, it is a pretty similar idea. Please take a look to see if you would find it helpful for your scenario, and don't hesitate to leave a comment there 🙂

efiop (Contributor) commented Sep 9, 2019

Closing as inactive. Please feel free to reopen 🙂
