exp show: Include deps and outs.
#7089
data_dep = first(x for x in dvc.index.deps if "copy.py" in x.fspath)
data_hash = data_dep.hash_info.value[:7]
Didn't know how to do something like the ANY usage above for the type of assertions below.
Looks good so far! Should deps come at the end (after params)? The noise would be less of an issue if they are on the right.
Done
Use `repo.index.deps` to collect dependencies associated with each experiment.
By the way, Studio has data files in between metrics and params, right? Do you know why this order was preferred, and what do you think? I like them at the end, but consistency with Studio makes sense.
I don't know Studio preferences. I think that in our case it makes sense to introduce them at the end as they didn't exist before and table can't get noisy without
@daavoo It looks like the data columns are in a random order and it sometimes changes on repeated calls to
Curious: what was the sorting in the end? I think the most natural would be as defined in dvc.yaml |
Alphabetical (it is what's currently used in Studio, afaik) |
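A minimal sketch of how alphabetical sorting gives a stable column order, assuming the column names are gathered into a set across experiments. The function name and the sample paths are hypothetical, not DVC's actual code:

```python
def ordered_data_columns(experiments):
    """Collect dep/out names from all experiments and return them sorted."""
    names = set()
    for exp in experiments:
        names.update(exp.get("deps", {}))
        names.update(exp.get("outs", {}))
    # Iteration order of a set of strings can differ between interpreter
    # runs (hash randomization), so sort to get a stable, alphabetical order.
    return sorted(names)


# Hypothetical experiments; only the keys matter for the ordering.
experiments = [
    {"deps": {"src/train.py": {}, "data/prepared": {}}, "outs": {"model.pkl": {}}},
    {"deps": {"copy.py": {}}, "outs": {"model.pkl": {}}},
]
columns = ordered_data_columns(experiments)
print(columns)
```

Sorting by the order in `dvc.yaml`, as suggested above, would need the stage definitions as an extra input; alphabetical order only needs the names themselves.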
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
iterative/dvc.org#3220
Use `repo.index.deps` / `repo.index.outs` to collect the dependencies / outputs associated with each experiment.
Closes #6434
What's considered a dep
Currently, anything in `repo.index.deps` that is not a param dependency or an imported `.dvc` (because if the `.dvc` is used in the pipeline it would be a duplicated column).
Studio filters git tracked files but I think that showing those files (i.e. `src` deps) is also valuable.
I think that more complicated internal filtering (i.e. removing intermediate deps) is not worthwhile and would be problematic when considering all use cases.
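The rule above could be sketched roughly as follows. This is an illustration only: `Dep`, `is_param`, and `from_imported_dvc` are hypothetical stand-ins rather than DVC's actual classes or attributes, and the paths other than `copy.py` are made up. In the PR itself the entries come from `repo.index.deps`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """Hypothetical stand-in for one entry of repo.index.deps."""

    path: str
    is_param: bool = False           # assumption: marks a params dependency
    from_imported_dvc: bool = False  # assumption: dep of an imported .dvc file


def collect_dep_columns(deps):
    """Keep deps that are neither params nor part of an imported .dvc."""
    return [d.path for d in deps if not d.is_param and not d.from_imported_dvc]


deps = [
    Dep("copy.py"),                           # git tracked source file: kept
    Dep("params.yaml", is_param=True),        # already shown as a param column
    Dep("data.xml", from_imported_dvc=True),  # would duplicate a pipeline dep
]
dep_columns = collect_dep_columns(deps)
print(dep_columns)
```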
The table can get noisy, but we provide the `--only-changed` flag (we could consider making it the default) and the new improved filtering in #7141, which should make it easy to customize the table.
JSON output
For `--json` output, this PR adds new `deps` and `outs` fields:

    {
        "baseline": {
            "data": {
                "deps": {
                    "copy.py": {
                        "hash": "561f068574ab2a132d304dca3dd6510d",
                        "size": 310,
                        "nfiles": None,
                    }
                },
                "metrics": {"metrics.yaml": {"data": {"foo": 1}}},
                "outs": {
                    "model.pkl": {
                        "hash": "fb7792b6596fd12502dd132c0aba0568",
                        "size": 2000,
                        "nfiles": None,
                    }
                },
                "params": {"params.yaml": {"data": {"foo": 1}}},
                "queued": False,
                "running": False,
                "executor": None,
                "timestamp": None,
            }
        }
    }

Table
For the table, it creates a new type of colored columns and shows the `hash` (let the debate begin).
After some testing, it looks like the optimal value to show in the data columns varies highly between use cases.
Given the limitations of the CLI, I opted for showing `hash` as it's the value that allows, IMO, to easily identify differences between rows.
From example-get-started:
`dvc exp show --only-changed`
`dvc exp show --all-branches --only-changed`
Some deps might not be relevant; filtering with #7141:
`dvc exp show --all-branches --only-changed --drop '.+prepared|model'`
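To illustrate why a short hash makes differences between rows easy to spot, here is a rough sketch that builds table cells from the `deps` / `outs` fields of the JSON structure described earlier and keeps only the columns whose value differs between rows. The helper names, the `exp-1` row, and its all-zero hash are made up for illustration; the 7-character truncation mirrors the `[:7]` used in the PR's test. This is not how `--only-changed` is implemented in DVC.

```python
def data_cells(exp_data, width=7):
    """Map each dep/out name to a shortened hash, as a table cell would show it."""
    cells = {}
    for kind in ("deps", "outs"):
        for name, info in exp_data.get(kind, {}).items():
            cells[name] = info["hash"][:width]
    return cells


def only_changed(rows):
    """Keep only the columns whose value differs between at least two rows."""
    names = sorted({name for cells in rows.values() for name in cells})
    changed = [
        name
        for name in names
        if len({cells.get(name) for cells in rows.values()}) > 1
    ]
    return {rev: {n: cells.get(n) for n in changed} for rev, cells in rows.items()}


copy_py = {"hash": "561f068574ab2a132d304dca3dd6510d", "size": 310, "nfiles": None}
baseline = {
    "deps": {"copy.py": copy_py},
    "outs": {
        "model.pkl": {
            "hash": "fb7792b6596fd12502dd132c0aba0568",
            "size": 2000,
            "nfiles": None,
        }
    },
}
# Hypothetical experiment: same dep, different output (placeholder hash).
exp_1 = {
    "deps": {"copy.py": copy_py},
    "outs": {"model.pkl": {"hash": "0" * 32, "size": 2000, "nfiles": None}},
}

rows = {"baseline": data_cells(baseline), "exp-1": data_cells(exp_1)}
table = only_changed(rows)
print(table)
```

In this sketch the unchanged `copy.py` column is dropped and only `model.pkl` remains, which is the kind of noise reduction the flag is meant to provide.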