-
Notifications
You must be signed in to change notification settings - Fork 1.3k
exp show: Add --only-changed option
#6867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dberenbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the name or description. There are two possible meaning of changed values:
- Whether the table column has 0 variance/all the same values. This is the current feature AFAIK.
- Whether the experiment value changed from its parent commit. I think this is technically closer to what was requested in #5966 and #6778.
The behavior will be similar since I think exp show always includes the parent commit with its associated experiments. I think the only time it will be different is if two parent commits have different values but the experiments don't modify those values.
Since this is pretty minor, the current functionality makes sense and is intuitive. However, I do think we should be precise about the meaning, especially since we might have other functionality to show changes from the parent commit (we could use highlighting for that, for example).
What do you think? Is there a better option name or description to summarize the behavior?
|
How does this interact with other |
This is the last operation applied before rendering the table. It is applied to the We might need to specify that (in the current state of the P.R.) |
Do you think this makes sense from a user perspective? I would probably expect |
|
One question that I have, what do the users care more about: looking through the entire history of parameters/metrics or just a few recent experiments? I know there are use-cases for both of those but just wanted to understand which one do you guys think is more important. Looking at the current complexity of |
Before introducing this command, I have always considered both However, I understand that users might expect |
dvc/compare.py
Outdated
| {k: self._columns[k][i] for k in keys} for i in range(len(self)) | ||
| ] | ||
|
|
||
| def drop_duplicates(self, axis: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My plan for TabularData was to evolve the APIs to mirror those of like pandas and then split it into a separate library. I love that you are also taking the inspiration.
I did make a mistake on the internals of it, and by locking the type to just str instead of arbitrary data (and, no support for NaN :(), which I did mostly for performance reasons as we'd have to coerce during rendering. If you are fighting too much with the internals, please feel free to change it. As I said, my plan is to split it as a pure-python lightweight pandas library in the future when we improve APIs and internals π (we could separate it today too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very good to hear.
In fact, preserving the original type was one of the next steps I was about to bring up for discussion. It is not a fully blocker for the parallel coordinates but it would definitely make it easier to integrate.
I also felt that TabularData was becoming a good candidate to be one of the core components of the future experiments.api; I assume you also felt that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arbitrary-type support would allow us to support custom serializers for NaNs (instead of hacking around with fill_value), float values (instead of putting the responsibility on the caller) and composite objects (dict/list). Though at the expense of performance.
I also felt that TabularData was becoming a good candidate to be one of the core components of the future experiments.api
I am not really sure what experiments.api is, or what its scope is tbh. One thing to keep in mind though, the CLI that we render is actually a nested table, a table for each commit, and each table having their n number of experiments, which is flattened later. This is why the --include/--exclude and --sort-by does not really translate well to the TabularData. pandas does support nested tables, but I don't think it's worth complicating it here. I am not sure if we need to distinguish between that for experiments.api.
I'm not fully understanding the meaning of point number 2. Could you please share an example of what would be expected behavior with some fake data? |
Let's say the output with the current functionality looks like: Technically, no experiments being shown change |
Good point. Should we discuss in a separate issue/discussion? |
Referenced in #6451 |
1b67f26 to
a6f470c
Compare
pared
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exp show --only-changed--only-changed option
a6f470c to
dd986f1
Compare
|
@daavoo Please rebase |
This method mimics pandas `drop_duplicates` when axis=rows. For axis=cols this method will drop columns with 0-variance (only 1 unique value). NaNs (`_fill_value`) won't be counted as unique values. So the following column (`-` would be the `_fill_value`) will be dropped: foo - foo foo
Use TabularData `drop_duplicates` to only show metrics/params with values varying across the selected experiments. Fixes #5966
dd986f1 to
c597e98
Compare
That's all I do these days π§ |
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
dvc.orgP.R: treeverse/dvc.org#2966Closes #5966
Also, pre-requisite for #4455
Demo: