Skip to content

Conversation

@daavoo
Copy link
Contributor

@daavoo daavoo commented Dec 3, 2021

iterative/dvc.org#3220


Use repo.index.deps / repo.index.outs to collect dependencies / outputss associated with each experiment.

Closes #6434

What's considered a dep

Currently, anything in repo.index.deps that is not a param dependency or an imported .dvc (because if the .dvc is used in the pipeline it would be a duplicated column).

Studio filters git tracked files but I think that showing those files (i.e. src deps) is also valuable.

I think that more complicated internal filtering (i.e. considering removing intermediate deps) it's not worthy and problematic when considering all use cases.

The table can get noisy but we provide the --only-changed flag (we could consider making it the default) and new improved filtering #7141 that should make it easy to customize the table.

JSON output

For --json output, this P.R. adds new deps and outs fields:

{
        "baseline": {
            "data": {
                "deps": {
                    "copy.py": {
                        "hash": "561f068574ab2a132d304dca3dd6510d",
                        "size": 310,
                        "nfiles": None,
                    }
                },
                "metrics": {"metrics.yaml": {"data": {"foo": 1}}},
                "outs": {
                    "model.pkl": {
                        "hash": "fb7792b6596fd12502dd132c0aba0568",
                        "size": 2000,
                        "nfiles": None,
                    }
                },
                "params": {"params.yaml": {"data": {"foo": 1}}},
                "queued": False,
                "running": False,
                "executor": None,
                "timestamp": None,
            }
        }
    }

Table

For the table, it creates a new type of colored columns and shows the hash (let the debate begin).

After some testing, it looks like the optimal value for showing in the data column highly varies between use cases.

Given the limitations of the CLI, I opted for showing hash as it's the value that allows, IMO, to easily identify differences between rows.

From example-get-started:

  • dvc exp show --only-changed

Captura de pantalla 2021-12-22 a las 20 11 16

  • dvc exp show --all-branches --only-changed

Captura de pantalla 2021-12-22 a las 20 11 45

Some deps might not be relevant, filtering with #7141 :

  • dvc exp show --all-branches --only-changed --drop '.+prepared|model'

Captura de pantalla 2021-12-22 a las 20 15 31

@daavoo daavoo changed the title [WIP]: exp show: Include deps columns. [WIP] exp show: Include deps columns. Dec 3, 2021
@daavoo daavoo changed the title [WIP] exp show: Include deps columns. exp show: Include deps columns. Dec 21, 2021
@daavoo daavoo marked this pull request as ready for review December 21, 2021 22:26
@daavoo daavoo requested a review from a team as a code owner December 21, 2021 22:26
@daavoo daavoo requested a review from dtrifiro December 21, 2021 22:26
@daavoo daavoo changed the title exp show: Include deps columns. [WIP] exp show: Include deps columns. Dec 21, 2021
@daavoo daavoo requested review from dberenbaum, shcheklein and skshetry and removed request for skshetry December 21, 2021 22:37
@daavoo daavoo force-pushed the exp-show-deps branch 2 times, most recently from b96420c to 901f9fb Compare December 22, 2021 11:50
Comment on lines +556 to +479
data_dep = first(x for x in dvc.index.deps if "copy.py" in x.fspath)
data_hash = data_dep.hash_info.value[:7]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know how to something like the ANY usage above but for the type of assertions bellow

@dberenbaum
Copy link
Contributor

Looks good so far!

Should deps come at the end (after params)? The noise would be less of an issue if they are on the right.

@daavoo daavoo force-pushed the exp-show-deps branch 2 times, most recently from 21782db to ce64f63 Compare December 22, 2021 19:31
@daavoo
Copy link
Contributor Author

daavoo commented Dec 22, 2021

Looks good so far!

Should deps come at the end (after params)? The noise would be less of an issue if they are on the right.

Done

@daavoo daavoo changed the title [WIP] exp show: Include deps columns. [WIP] exp show: Include deps and outs. Dec 23, 2021
@daavoo daavoo force-pushed the exp-show-deps branch 3 times, most recently from 63c1eb6 to b11e30b Compare January 3, 2022 15:49
@daavoo daavoo changed the title [WIP] exp show: Include deps and outs. exp show: Include deps and outs. Jan 3, 2022
dtrifiro
dtrifiro previously approved these changes Jan 10, 2022
@daavoo daavoo force-pushed the exp-show-deps branch 2 times, most recently from b26f8d6 to ea3821b Compare January 19, 2022 17:33
@daavoo daavoo requested a review from skshetry January 19, 2022 18:07
dberenbaum
dberenbaum previously approved these changes Jan 19, 2022
Use `repo.index.deps` to collect dependencies associated with each experiment.
@dberenbaum
Copy link
Contributor

Should deps come at the end (after params)? The noise would be less of an issue if they are on the right.

By the way, Studio has data files in between metrics and params, right? Do you know why this order was preferred, and what do you think? I like them at the end, but consistency with Studio makes sense.

@daavoo
Copy link
Contributor Author

daavoo commented Jan 26, 2022

By the way, Studio has data files in between metrics and params, right? Do you know why this order was preferred, and what do you think? I like them at the end, but consistency with Studio makes sense.

I don't know Studio preferences. I think that in our case it makes sense to introduce them at the end as they didn't exist before and table can't get noisy without --only-changed flag.

@daavoo daavoo merged commit 5e80fcc into main Jan 26, 2022
@daavoo daavoo deleted the exp-show-deps branch January 26, 2022 17:45
@daavoo daavoo added A: experiments Related to dvc exp A: diff/show enhancement Enhances DVC labels Jan 26, 2022
@dberenbaum
Copy link
Contributor

@daavoo It looks like the data columns are in a random order and it sometimes changes on repeated calls to exp show. Can we sort in alphabetical order or some other predictable way that will make sense to users?

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 28, 2022

or some other predictable way

Curious: what was the sorting in the end? I think the most natural would be as defined in dvc.yaml

@daavoo
Copy link
Contributor Author

daavoo commented Jan 28, 2022

or some other predictable way

Curious: what was the sorting in the end? I think the most natural would be as defined in dvc.yaml

Alphabetical (it is what's currently used in Studio, afaik)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: experiments Related to dvc exp enhancement Enhances DVC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

exp show: Include data files.

6 participants