add hyper-parameter optimization api #5929
Replies: 5 comments
-
Hi @schlichtanders Thank you for the detailed description - an interesting perspective on the hyperparameters and good ideas! We are discussing experimentation scenarios in DVC, and it looks like DVC needs special support for some cases. A recent discussion example - #2379. I'd love to discuss this from the point of view of the hyperparameter tuning case and hyperparameter optimization packages. Could you please clarify a few things:
The major question I have: why do we need two abstractions, branches AND subfolders? Additional questions:
-
Related: https://discuss.dvc.org/t/best-practice-for-hyperparameters-sweep/244
-
I made some progress and created a small example, however I currently have no time to complete it. Nevertheless, here is the link: the idea is simple: after defining two helper functionalities, a hyperparameter search is just a little wrapper script which calls another .dvc file using the two helpers.
I hope I find time in November/December to finish this and answer all your questions.
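The linked example isn't finished, but the "two helpers plus a little wrapper script" idea described above might look roughly like the sketch below. Every name here (write_params, read_metric, myalgorithm.dvc, the params.json/metrics.json file convention) is a hypothetical placeholder, not part of the actual example:

```python
import json
import subprocess

def write_params(params, path="params.json"):
    """Helper 1 (hypothetical name): write the current hyperparameter
    set to a file so a .dvc stage can depend on it."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2)

def read_metric(path="metrics.json", key="loss"):
    """Helper 2 (hypothetical name): read back the metric the stage
    produced, from the same file `dvc metrics` would track."""
    with open(path) as f:
        return json.load(f)[key]

def hyperparameter_search(candidates):
    """The 'little wrapper script': one `dvc repro` per candidate set."""
    results = {}
    for i, params in enumerate(candidates):
        write_params(params)
        subprocess.run(["dvc", "repro", "myalgorithm.dvc"], check=True)
        results[i] = read_metric()
    return results
```

With helpers like these, the search loop itself stays trivial; all versioning is delegated to DVC via the repeated `dvc repro` calls.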
-
I had two thoughts related to a potential API for hyperparameters, on how to choose whether to store the resulting models or not ("treat it as cache" vs. "treat it as an optimal decision"). I posted them in another thread: #2379 (comment). If the API allowed such flexibility, the exact decision could easily be delegated to other libraries. Unfortunately, I don't have anything more concrete than this wish/feature request yet.
-
To add to the discussion, we recently had an issue trying to integrate DVC with Ray Tune's hyperparameter optimization process. The problem is that Ray Tune wants complete control of the experiment: from taking in a parameters file that contains every parameter to be examined, to creating permutations of these parameters and feeding them one by one to the network, to creating training runs for each parameter permutation, to creating and registering checkpoints, to the final decision on which particular parameter permutation produced optimal results. On the one hand this is great, because that's what automation is supposed to do, but on the other it conflicts somewhat with the way we'd like to version control our data. Namely, we want to be able to version control (with DVC Experiments) every individual experiment, where a single "experiment" is one training run of the network with a single set of parameters. However, thus far we were not able to find an easy way to interface Ray Tune with DVC.
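One conceivable bridge, sketched under the assumption that the optimizer only needs a Python callable per trial: have each trial write its parameter permutation to a params file and reproduce the pipeline, so DVC sees exactly one versioned run per permutation. All names below (dvc_trial, params.json, metrics.json) are hypothetical, and the Ray Tune wiring is shown only as comments, since that integration is precisely what is still unsolved:

```python
import json
import subprocess

def dvc_trial(config, params_path="params.json", metrics_path="metrics.json"):
    """Run one parameter permutation as its own DVC-versioned run.

    `config` is the single parameter set the optimizer hands over;
    writing it to a tracked params file and reproducing the pipeline
    makes DVC see exactly one experiment per permutation.
    """
    with open(params_path, "w") as f:
        json.dump(config, f, indent=2)
    # One versioned training run per call; exact flags depend on the pipeline.
    subprocess.run(["dvc", "repro"], check=True)
    with open(metrics_path) as f:
        return json.load(f)

# Hypothetical Ray Tune wiring (requires ray[tune]):
#
#   from ray import tune
#
#   def trainable(config):
#       metrics = dvc_trial(config)
#       tune.report(**metrics)
#
#   tune.run(trainable, config={"lr": tune.grid_search([0.01, 0.1])})
```

The design choice here is to let Ray Tune keep the orchestration it insists on, while the unit it controls (one trial) is itself a full DVC reproduction, so every individual run remains version controlled.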
-
Dear DVC folk,
Motivation
You mention it yourself in your documentation: fully versioned hyperparameter optimization comes to mind when using DVC.
Little Research
I just did some quick research, and it becomes apparent very quickly that this needs a specific implementation for DVC.
All the existing hyperparameter optimizers, like Python's hyperopt
Suggestions how to integrate to DVC
It seems to me the following is needed for hyperparameter optimization to be a natural addition to DVC:
each triggered hyperoptimization orchestration should have its own git branch / subfolder
each single hyperoptimization run should have its own subbranch under that subfolder
a file-based hyper-parameter API, probably based on json
dvc metrics and dvc repro already work
it would be unbelievably awesome to not reinvent the wheel entirely, but to provide wrappers around existing hyperoptimization packages like hyperopt or smac or others
the principal idea is simple: instead of running a concrete algorithm with the specific framework, you run a wrapper which calls dvc repro myalgorithm.dvc on a previously specified routine myalgorithm.dvc
wrapping existing optimization frameworks has several advantages
Of course, more details will pop up while actually implementing this, e.g. how to integrate hyperoptimization with .dvc pipeline files as neatly as possible (for instance, we may want to commit both a single run.dvc as well as a hyperopt.dvc to the same repository -- these need to interact seamlessly)
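The wrapper approach above can be illustrated with a minimal sketch, assuming a params.json / metrics.json file convention and a stage file named myalgorithm.dvc (all placeholders, not an existing DVC API). The hyperopt wiring is shown as comments, since that integration is exactly what this proposal is about:

```python
import json
import subprocess

def objective(params):
    """One hyperoptimization run: write the params file the stage
    depends on, reproduce the stage, and read the resulting metric."""
    with open("params.json", "w") as f:
        json.dump(params, f, indent=2)   # the file-based hyperparameter API
    subprocess.run(["dvc", "repro", "myalgorithm.dvc"], check=True)
    with open("metrics.json") as f:      # the file that dvc metrics tracks
        return json.load(f)["loss"]

# Hypothetical hyperopt wiring (pip install hyperopt):
#
#   from hyperopt import fmin, tpe, hp
#
#   best = fmin(
#       fn=objective,
#       space={"lr": hp.loguniform("lr", -7, 0)},
#       algo=tpe.suggest,
#       max_evals=50,
#   )
```

Since hyperopt (and smac, etc.) only needs an objective callable, this is the sense in which existing frameworks can be wrapped rather than reimplemented: DVC handles reproduction and versioning inside the callable, the framework handles the search.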
What do you think about this suggested approach?