
Model parameter tracking #2379

Closed
hhoeflin opened this issue Aug 8, 2019 · 21 comments
Labels: question, research

Comments

hhoeflin commented Aug 8, 2019

I usually save various parameters for my deep learning projects in YAML files (e.g. learning rate ranges to search, how to pre-process the data for training/testing, etc.). It would be nice to have an easy way to track and show them. I wanted to use dvc metrics show for that, so that I could show the input parameters for a run next to the output metrics for that run. I think that would be very handy for tracking what has actually changed.
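
For concreteness, a purely hypothetical parameter file of the kind I mean might look like this (names and values are just examples):

# hypothetical parameter file; names and values are made up
cat > my_file.yaml <<'EOF'
lr_range: [0.001, 0.1]    # learning rate ranges to search
batch_size: 64
preprocess:
  normalize: true
  augment: false
EOF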

However, when I try to add that parameter YAML file with dvc metrics add, I get an error:

ERROR: failed to add metric file 'my_file.yaml' - unable to find DVC-file with output 'my_file.yaml'

This error is understandable - as this is a parameter file, there is no previous step that produced it. So this may need to be handled somewhat differently.

Does such a feature make sense from your perspective? Would it be possible to add this?

Thanks

pared (Contributor) commented Aug 8, 2019

@hhoeflin hi, and thanks for sharing your use case!

Does such a feature make sense from your perspective? Would it be possible to add this?

This is a problem we are aware of. Hyperparameter search is part of many people's workflows, and unfortunately we have not yet established how we would like to tackle it. Would you be interested in discussing this subject?

You intend to put parameters inside a YAML file, which is perfectly understandable, but what about the produced outputs? If you produce a few models that differ by the params used to train them, would you like to preserve all of them? Or choose the best (according to one of the metrics) and preserve only that one?

hhoeflin (Author) commented Aug 8, 2019

Yes, I would definitely be interested in discussing this subject more.

So first, I would characterize my parameters as ones that hold for the entire "branch", or at least for the current commit.

I assume the setting you describe refers to fitting e.g. 10 models with 10 different learning rates, where even the performance after each epoch could be a separate metric?
When we have outputs like this, I think they could be captured inside the output-metrics file, e.g. for every AUC that is output, the relevant parameters for that model are output as well (e.g. the learning rate and the epoch where it was achieved; the epoch here may even be non-numeric, since when just the best model was selected it could be something like "best").

So for me, the metric files should be flexible and support all of this. Of course, such outputs can get rather big rather quickly, which is why I think that at this scope an export into CSV files would be needed, so that the results can easily be plotted and explored in other tools.
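
As a rough sketch (the file name and fields are just made up), the metrics file for one such model could then look something like this, with the relevant parameters stored next to each AUC:

# hypothetical metrics file written by the training script
cat > metrics.json <<'EOF'
[
  {"lr": 0.1, "epoch": 1,      "AUC": 0.70},
  {"lr": 0.1, "epoch": 2,      "AUC": 0.75},
  {"lr": 0.1, "epoch": "best", "AUC": 0.90}
]
EOF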

pared (Contributor) commented Aug 8, 2019

I assume the setting you describe refers to fitting e.g. 10 models with 10 different learning rates, where even the performance after each epoch could be a separate metric?

Yes, in the extreme case I was thinking about exploring parameters like
lr: [0.1, 0.01 ...]
param1: [1, 2, 3]
param2: [0.1, 0.4, 0.7]
In the end we would be exploring len(lr) * len(param1) * len(param2) models.
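
Just as a sketch of the scale (train.py is only a placeholder here), that would mean something like:

# one run per parameter combination: 2 * 3 * 3 = 18 models in this sketch
for lr in 0.1 0.01; do
  for param1 in 1 2 3; do
    for param2 in 0.1 0.4 0.7; do
      python train.py --lr "$lr" --param1 "$param1" --param2 "$param2"
    done
  done
done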

export into csv files would be needed

That is a good point: to compare results, the user would probably want to explore, with some visualization tool (for example TensorBoard), how training went across epochs, right?

What about preserving the "best model" (for convenience, let's say we already know how to choose the best)?

  • For many hyperparameters, you probably don't want to store all models, right?
  • Do you usually save only the single best one, or, let's say, the 3 best?

hhoeflin (Author) commented Aug 8, 2019

At the moment this really depends on the application. Often, disk space permitting, I keep all of them, at least for a while. Later I may keep only one.

My reference to "best", however, was also intended in regard to metrics. I think metrics are currently intended to be numerical, right? With this, when describing models as they evolve through epochs, you may have something like
epoch: [1, 2, ..., "best"]
AUC: [0.7, 0.75, ..., 0.9]
so "parameters" such as the epoch may be a mix of numerical and string values. In any case, metrics should support both storing results per epoch and storing the best one. From this perspective, the epoch is actually just another parameter of an existing model.

I think one difficulty here is also how to impose enough structure that a metrics JSON file can be reliably serialized to a CSV file.

pared (Contributor) commented Aug 8, 2019

@hhoeflin Actually, AFAIK the type of the metric does not matter, because we do not provide any metrics logic that would be type-related. So, for example, dvc run -m metric 'echo best >> metric' is feasible, and running dvc metrics show will display it properly.

The thing is that introducing logic related to hyperparameter search would probably require us to introduce some behaviour based on the metrics values.

hhoeflin (Author) commented Aug 8, 2019

For me the important thing is not so much logic around hyperparameter search, but the ability to

  • check in "global" parameters
  • check in metrics together with their related parameters, as tables of values
  • export all of these, both within branches and across branches, in convenient formats for further processing and displaying

There is also the question of how the output across several branches would best be merged (e.g. by including additional columns naming the branch, and maybe the commit, that the metrics came from). Another question is how these global parameters could best be stored in there.
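
A very rough sketch of the kind of merged export I mean (purely illustrative; the metrics file name is made up and the CSV quoting is naive):

# collect the metrics file from the tip of every branch into one CSV,
# with extra columns naming the branch and the commit it came from
echo "branch,commit,metrics" > all_metrics.csv
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
  commit=$(git rev-parse --short "$branch")
  metrics=$(git show "$branch:metrics.json" 2>/dev/null | tr -d '\n')
  echo "$branch,$commit,$metrics" >> all_metrics.csv
done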

I have just started thinking about these issues and don't really have good answers yet.

pared (Contributor) commented Aug 8, 2019

Also, there has been an issue hanging around for some time now that might be related to this particular use case:
#1018
What would you think about such a "build matrix" for running multiple experiments?

hhoeflin (Author) commented Aug 8, 2019

I see why someone would want to do this, but for me this is not a priority, for the following reasons:
a) I prefer to keep such looping logic in Python code rather than breaking it down to run in a "makefile" sort of way, as suggested with a build matrix.
b) DVC already contributes something important with proper data version control and metrics tracking, and I think it is better to focus on that. For people wanting a build matrix, I would suggest looking at solutions like snakemake, which can be incorporated into a dvc run step. With a build matrix, dvc would start strongly duplicating solutions like snakemake.

ghost commented Aug 8, 2019

There are already various tools to do hyperparameter optimization: https://en.wikipedia.org/wiki/Hyperparameter_optimization#Open-source_software

I don't think DVC will bring any value by trying to solve this problem again; it is better to seek a smooth integration with other tools.

As I understand from what @hhoeflin expressed here (#2379 (comment)), the idea is to have something like the following tracking UI (screenshot omitted), where you can have metrics + parameters + experiment.

I think this is a really valid concern and an interesting one (cc: @iterative/engineering).

By the way, @hhoeflin, you can use MLflow alongside DVC.

pared (Contributor) commented Aug 9, 2019

@MrOutis
What I had in mind when saying "hyperparameter optimization" was actually what you are calling experiments. I am not saying we should reimplement AutoML libraries, but we do not (yet) have any approach for tackling many experiments (a branch per experiment won't be appropriate if our parameter space has, say, 100 tuples).

dmpetrov (Member) commented Aug 9, 2019

@hhoeflin thank you for bringing up this issue - it is an important scenario that we are thinking about. There are still a few open questions.

I'd like to make sure I understand the question correctly. Could you please clarify a few things:

  1. As far as I understand, this scenario does not necessarily include hyperparameter tuning/search. It might include just a couple of experiments with an independent/custom set of parameters. Is this correct?
  2. After training a few models, you need a way to compare metrics and choose the best one (whatever "the best" means). Would you like to track only the final result of the model (auc=0.724, err=0.234), or do you need the entire AUC graphs?
  3. "within branches/across branches" - what does that mean? Are you okay with creating a commit (or branch) for each of the models/runs?

btw... what command did you run to get this error:
ERROR: failed to add metric file 'my_file.yaml' - unable to find DVC-file with output 'my_file.yaml'

hhoeflin (Author) commented Aug 9, 2019

@dmpetrov

1) For me, statement 1 is correct. If hyperparameter search could be captured as well, that would be nice though.
2) I would like the option to have the entire AUC graphs.
3) By "within branches" I mean that, for a certain type of model, you may have made several changes over time and put in commits for them. You would like to track how this evolved (i.e. how the metric changes when I go back in the history of the current branch). The other scenario is "across branches", where I want to compare the performance of the current model I have in branch A with a completely different implementation that I keep in branch B.

hope that helps!

P.S. I used "dvc metrics add"

dmpetrov (Member) commented Aug 9, 2019

Thank you, @hhoeflin !

It is clear with (1).

Re (2) - currently DVC handles "static" metrics pretty well. However, if you'd like to see a graph difference, you need to implement that yourself. We need to think more about supporting this scenario - it is one of the open questions.

Re (3) - so, you'd like to have a commit for each set of parameters. Is that correct? If so, there is a discussion about this in #1691 and it looks like it will be implemented soon.
The other scenario, across branches, works (again) only for "static" metrics. We need to think about how to generalize it to AUCs/graphs.

Today we only show the metrics difference between branches/commits, without any association to parameters. To track the metrics history properly, we need to have this metrics-params association (this needs to be done based on some config baseline). It looks like we need to introduce a new concept of a (model) config and config changes (parameters) into DVC. @hhoeflin do you agree with this? Do you see any simpler solution that does not introduce this concept?

hhoeflin (Author) commented Aug 9, 2019

@dmpetrov

For (3), this is what I was thinking.

Thinking about it, I am not sure such rather complex behaviour necessarily needs to be provided through the command-line interface. An alternative approach could be to expose/expand/provide examples for Python classes that can iterate within branches, as well as across the entire repo, and let the user perform operations on selected files: e.g. iterate through a repo, read all YAML files of a certain type, return them as a dict with (branch, commit) as the key, and leave it to the user what to do with this.
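
As a rough command-line equivalent of the kind of iteration I mean (the file name params.yaml is just a placeholder):

# walk every commit of every branch and print the parameter file it contains
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads); do
  for commit in $(git rev-list "$branch"); do
    echo "### $branch $commit"
    git show "$commit:params.yaml" 2>/dev/null
  done
done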

shcheklein added the "question" and "research" labels on Aug 9, 2019
dmpetrov (Member):

If you are talking about the metrics-params association - it looks like the association has to be created for proper visualization anyway (Python or command line). I think we should implement this in DVC first and then extract it into an API.

@hhoeflin one more question for you if you don't mind :)
If your goal is to try different parameters without modifying any code or data - have you considered using a different directory for each of the runs (the directory might contain the config, outputs and metrics)? Which approach looks more appealing to you: each run as a Git commit, or each run as a directory?
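
Roughly this kind of layout, just to illustrate (all names are made up):

runs/
  lr0.1_bs64/        # one directory per run
    params.yaml      # config used for this run
    metrics.json     # metrics produced by this run
    model.pt         # output model
  lr0.01_bs64/
    params.yaml
    metrics.json
    model.pt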

shcheklein (Member):

@hhoeflin one of the workarounds that comes to my mind is to just dump all input/global parameters, along with the actual metrics (like AUC), into the "final" metrics file (JSON, CSV, TSV - it does not matter for DVC, and it provides some interface to work with all these formats). This way you will be able to see them with dvc metrics show as a single table that includes everything.

Is that a reasonable approach, or would we still be missing something?
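
Just to illustrate what I mean (the file and field names are made up, and the exact dvc run flags are only a sketch):

# train.py writes a single metrics file that also contains the input parameters
dvc run -d train.py -d params.yaml -m summary.json python train.py
cat summary.json
# {"lr": 0.01, "batch_size": 64, "AUC": 0.90, "err": 0.23}
dvc metrics show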

pared (Contributor) commented Oct 7, 2019

Related: https://discuss.dvc.org/t/best-practice-for-hyperparameters-sweep/244

alexvoronov:

I was looking for best practices on how to organize a hyperparameter search, and was pointed to this nice thread. From the discussion here I see that I'm not alone, but I also see that there is no "silver bullet" yet. Here is what I'm missing:

I think I’d like to separate a generic workflow/pipeline/DAG definition from each of its “instances” (with the hashes and all).

Reading through the links in this thread, it looks like some combination of snakemake for pipeline definition, with the addition of makepp's hash-based dependency tracking, all built on top of DVC, might lead to some sort of solution for me.

alexvoronov:

What about preserving the "best model" (for convenience, let's say we already know how to choose the best)?

  • For many hyperparameters, you probably don't want to store all models, right?
  • Do you usually save only the single best one, or, let's say, the 3 best?

I have a few thoughts about which models to save; I wonder if they resonate with anyone else too:

1. Cache metaphor

Hyperparameter values can be seen as a key, and the model can be seen as a stored value, in some hashtable or cache. If there is no stored model, we (re)compute the model and its metrics (e.g. if we come up with a new metric that we need for all the models we have tried so far). What to store and what to recompute depends on the application (e.g. we can store the 5 best models seen so far, or everything, or nothing).

If the model file is tracked by DVC, then the .dvc file has enough information to recompute the model. Those model files are parametrized by specific values of the hyperparameters. Here I come back to a generic workflow/pipeline that can create instances for specific parameters.

2. Optimize what to store and what to recompute

We can assign a "price" to computing time and to storage (and to data transfer, storage duration, etc.). If storage is very cheap, we save all models. If storage is expensive, we recompute everything. We can also look at the time it took to compute the model, and at the resulting model size, to make the decision on whether to store the model, or to delete it and recompute it in the future if needed.
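
A tiny sketch of the cache metaphor (purely illustrative; train.py and the paths are made up):

# use a hash of the hyperparameter values as the "key" for the stored model
key=$(sha1sum params.yaml | cut -d' ' -f1)
if [ -f "models/$key/model.pt" ]; then
  echo "cache hit: reusing models/$key/model.pt"
else
  mkdir -p "models/$key"
  python train.py --params params.yaml --out "models/$key/model.pt"  # recompute
fi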

ssYkse commented Oct 29, 2019

Hi! I am also on the lookout for a way to track my parameters with dvc.

TL;DR: I need a few parameters tracked for the download > explore > preprocess stages, and I use MLflow for hyperparameter optimization tracking. This seems different from what others need, who want DVC to handle the hyperparameter optimization itself.

My current approach: I have a shell script which produces the pipelines (there are many stages, I only show one here):

### Download Pipe ### ...
### Explore Pipe ### ...
...
### Package Pipe ###
dvc run \
        -f ./pipelines/mnist/package.dvc \
        -d src/package_model.py \
        -d models/raw \
        -d params/auto/explore.yaml \
        -o models/packaged \
        --no-exec \
        'mean=$(cat params/auto/explore.yaml | grep "mean" | sed "s/[^0-9\.]//g") && \
        std=$(cat params/auto/explore.yaml | grep "std" | sed "s/[^0-9\.]//g") && \
        mlflow run . -e package \
                        -P model_dir_in=models/raw \
                        -P model_dir_out=models/packaged \
                        -P mean=$mean \
                        -P std=$std'
### Build Docker Pipe ### ...

Then, I have a folder which contains the parameters (and params/auto for those that get generated by stages; for example, params/auto/explore.yaml contains the mean and std of the MNIST dataset, which are used for preprocessing, and then again later, when packaging the model into a Docker container, to scale the input data to what the model expects).

This works, however the whole grep'n'sed is very error-prone. I need to make three changes (-d params/auto/explore.yaml, mean=$..., and later in the command -P mean=$mean) to implement a new parameter.
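
A less fragile variant I am considering (assuming PyYAML is available in the environment) would be to read the values with a real YAML parser instead of grep/sed:

# read single values from the YAML with a real parser instead of grep/sed
mean=$(python -c "import yaml; print(yaml.safe_load(open('params/auto/explore.yaml'))['mean'])")
std=$(python -c "import yaml; print(yaml.safe_load(open('params/auto/explore.yaml'))['std'])")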

If you are interested in what I have: https://github.com/ssYkse/mlflowworkshop
I did this MLflow workshop for work, and am now trying to add DVC, since my previous approach wasn't nice. Sorry, the README is in German, but the code is in English. If you start with dvc_pipe_builder.sh and MLproject you should be able to understand what the project is all about.

dmpetrov (Member):

All of these are implemented in the upcoming DVC 1.0.

Closing. Please let me know if something is not implemented yet.
