Added diff-view switch #4862

Merged
merged 23 commits into from
Oct 21, 2021

Conversation

marijncv
Contributor

@marijncv marijncv commented Oct 1, 2021

Signed-off-by: Marijn Valk <marijncv@hotmail.com>

What changes are proposed in this pull request?

Added a checkbox for switching between the diff-only view and the regular view. Closes #4819.

How is this patch tested?

Create a couple of runs with params, metrics, and tags, some of which differ across runs and some of which are the same across runs. Check and uncheck the checkbox and verify that the columns whose value is the same across all runs disappear.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Added a checkbox that will filter out all columns for which every run has the same value
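
The filtering rule in this release note can be sketched in a few lines. This is a minimal Python illustration of the idea, not the PR's JavaScript implementation. The `runs` structure and the `constant_columns` helper are hypothetical names introduced for this example, and treating a missing value as different from a present one is an assumption.

```python
def constant_columns(runs):
    """Return the column names whose value is identical across all runs.

    `runs` is a list of dicts mapping column name -> value (hypothetical shape).
    A column missing from a run is treated as having the value None.
    """
    columns = sorted({key for run in runs for key in run})
    constant = []
    for col in columns:
        values = {run.get(col) for run in runs}
        if len(values) == 1:
            constant.append(col)
    return constant


runs = [
    {"p1": "0", "p2": "a", "m1": 0.0, "m2": 0.31},
    {"p1": "0", "p2": "b", "m1": 0.0, "m2": 0.77},
    {"p1": "0", "p2": "a", "m1": 0.0, "m2": 0.12},
]

# Columns the diff-only view would hide, and the ones it would keep.
hidden = constant_columns(runs)
visible = [col for col in sorted(runs[0]) if col not in hidden]
print(hidden, visible)
```

Here `p1` and `m1` never change, so only `p2` and `m2` would remain visible.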

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Marijn Valk <marijncv@hotmail.com>
@github-actions github-actions bot added the area/uiux and rn/feature labels Oct 1, 2021
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
@marijncv marijncv marked this pull request as ready for review October 2, 2021 20:26
@harupy
Member

harupy commented Oct 4, 2021

Thanks for the PR, it looks great! Can we keep the source column to make it easier to jump to notebook/scripts?

diff-columns.mov

image

Script to generate test data:

import mlflow
import random


def random_float():
    return random.random()


def random_string():
    return random.choice(["a", "b", "c"])


for _ in range(100):
    with mlflow.start_run():
        mlflow.log_param("p1", 0)
        mlflow.log_param("p2", random_string())

        mlflow.log_metric("m1", 0)
        mlflow.log_metric("m2", random_float())

        mlflow.set_tag("t1", 0)
        mlflow.set_tag("t2", random_string())

@marijncv
Contributor Author

marijncv commented Oct 4, 2021

Removed the source column from the diff view (it will always be visible unless unchecked by the user). Also moved the getCategorizedColumnsDiffView function to ExperimentViewUtil.

@marijncv marijncv changed the title from "Added diff-view checkbox (WIP)" to "Added diff-view switch" Oct 4, 2021
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
@dbczumar
Collaborator

dbczumar commented Oct 5, 2021

@marijncv Thanks a bunch for addressing comments from @harupy ; this is looking great! Before finalizing the PR, we've asked one of our UI/UX designers for input about the placement and text associated with the toggle. We'll provide a design mock with this information in the next few days.

@marijncv
Contributor Author

marijncv commented Oct 6, 2021

@dbczumar sounds good, looking forward to the design mock!

@dbczumar
Collaborator

dbczumar commented Oct 12, 2021

> @dbczumar sounds good, looking forward to the design mock!

Hi @marijncv , apologies for the delay. Here is the mock:

diff_only_mock

Please ignore the slightly different UI styling used in our mockup tools. For consistency with the rest of the MLflow UI, we can keep using the existing toggle element from your PR. The only practical differences are the text next to the toggle and the text displayed on hover.

@marijncv
Contributor Author

Thanks for the design @dbczumar! I've updated the PR accordingly

@harupy
Member

harupy commented Oct 12, 2021

@marijncv Thanks for the updates! @dbczumar @jinzhang21 It looks like this now:

diff-column-view.mov
  • The toggle button looks great!
  • Diff columns are recomputed when we load more runs.

@harupy
Member

harupy commented Oct 12, 2021

I found a bug that's related to column selection:

column-selection-bug.mov

In this video, I did the following:

  1. Enable the diff view (this hides the User column).
  2. Select the User column in the column selector, but it doesn't show up.
  3. Unselect the User column.
  4. Select the User column again and it shows up this time.

Signed-off-by: Marijn Valk <marijncv@hotmail.com>
@marijncv
Contributor Author

@harupy thanks for pointing out the bug! I think it's fixed with my latest commit.

But I'm curious about a difference between your video and the behavior on my machine. For me, the column moves to the end of the list when I check or uncheck it, but for you that doesn't seem to be the case (it just returns to its old position). Do you have any idea why that could be?

@harupy
Member

harupy commented Oct 12, 2021

@marijncv Thanks for the quick fix!

Can you take a screen recording of what happens on your machine and share it with us?

@marijncv
Contributor Author

mlflow.mp4

In the video I use the switch and then select username. It shows up, but at the end of the list of columns instead of in its original position.

@harupy
Member

harupy commented Oct 12, 2021

@marijncv Thanks for the video. Let me pull the latest commit and try again.

@harupy
Member

harupy commented Oct 12, 2021

column-select-2.mov

On my machine, the column shows up in the right position.

Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
…View.js

correct comment

Co-authored-by: Harutaka Kawamura <hkawamura0130@gmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
@harupy
Member

harupy commented Oct 20, 2021

Hey @marijncv, thanks for patiently applying our feedback!

bug-2.mov

We found another bug. Here are the steps to reproduce the issue.

  • Enable the diff-only view
  • Load more runs
  • Disable the diff-only view

The m1, p1, and t1 columns should show up after disabling the diff-only view but remain hidden. With the newly loaded runs, the result of getCategorizedUncheckedKeysDiffView doesn't contain m1, p1, and t1, which changes the result of getRestoredCategorizedUncheckedKeys.

Code to populate data
import mlflow
import random


def random_float():
    return random.random()


def random_string():
    return random.choice(["a", "b", "c"])


for _ in range(100):
    with mlflow.start_run():
        mlflow.log_param("p3", random_float())
        mlflow.log_metric("m3", random_float())
        mlflow.set_tag("t3", random_float())


for _ in range(100):
    with mlflow.start_run():
        mlflow.log_param("p1", 0)
        mlflow.log_param("p2", random_float())

        mlflow.log_metric("m1", 0)
        mlflow.log_metric("m2", random_float())

        mlflow.set_tag("t1", 0)
        mlflow.set_tag("t2", random_float())
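
One way to read the bug described above is sketched below, in Python rather than the PR's JavaScript. It assumes that disabling the diff-only view restores columns by subtracting the currently computed constant columns from the hidden set. That assumption, the `constant_columns` helper, and the data shapes are illustrative and not taken from the PR's code. Under that assumption, loading more runs changes the constant set, so the subtraction no longer undoes what enabling the view added. Remembering the columns that were hidden before the switch was enabled removes the dependency on the currently loaded runs.

```python
def constant_columns(runs):
    """Columns whose value is the same in every run (missing counts as None)."""
    columns = {key for run in runs for key in run}
    return {col for col in columns if len({run.get(col) for run in runs}) == 1}


# First page of runs: p1 is constant, p2 varies.
page_1 = [{"p1": "0", "p2": "0.3"}, {"p1": "0", "p2": "0.9"}]
# "Load more" brings in older runs that never logged p1 or p2.
page_2 = [{"p3": "0.5"}, {"p3": "0.1"}]

# Columns the user unchecked by hand before enabling the switch.
user_hidden = set()

# Enable the diff-only view: hide the user's choices plus the constant columns.
hidden = user_hidden | constant_columns(page_1)

# Load more runs while the diff-only view is on.
all_runs = page_1 + page_2

# Fragile restore: subtract whatever is constant now.
fragile_restore = hidden - constant_columns(all_runs)

# Sturdier restore: go back to the snapshot taken before enabling the switch.
snapshot_restore = set(user_hidden)

print(fragile_restore)   # p1 stays hidden
print(snapshot_restore)  # nothing stays hidden
```

Whether the actual fix uses a snapshot like this is not stated in the thread; the sketch only illustrates why a restore that depends on recomputed diff keys is sensitive to newly loaded runs.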

Comment on lines 624 to 628
      [COLUMN_TYPES.ATTRIBUTES]: _.concat(
        categorizedUncheckedKeys[COLUMN_TYPES.ATTRIBUTES],
        attributeKeyList.filter((v, index) => {
          return allEqual(attributes[index]);
        }),
Member

@harupy harupy Oct 20, 2021

Is it possible to hide attribute columns only when they are empty?

Here's an example:

| a1 | a2 | a3 |
| --- | --- | --- |
| - | - | 1 |
| - | 1 | 1 |
| - | 1 | 1 |

For the table above, the diff switch should hide the a1 column.
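
The rule proposed here, hiding an attribute column only when every row is empty, might look like the following Python sketch. The rows mirror the example table, with `None` standing in for `-`. `empty_columns` is a hypothetical helper, not a function from the PR.

```python
def empty_columns(rows):
    """Return the names of columns that are empty (None or "") in every row."""
    columns = sorted({key for row in rows for key in row})
    return [
        col
        for col in columns
        if all(row.get(col) in (None, "") for row in rows)
    ]


rows = [
    {"a1": None, "a2": None, "a3": "1"},
    {"a1": None, "a2": "1", "a3": "1"},
    {"a1": None, "a2": "1", "a3": "1"},
]

print(empty_columns(rows))  # ['a1']
```

Note that `a3` is constant but not empty, so under this rule it stays visible.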

Contributor Author

Yes, it's possible. With that addition, should I also consider the Models attribute for removal if all of its values are empty? Right now only Run Name, User, and Version are considered.

Version and User will never be empty, so they can probably be left out of consideration altogether. On the other hand, I can still imagine that a user would like to hide these columns if they contain the same value for each row.

What do you think?

Member

> should I also add the Models attribute to be considered to be removed if all values are empty?

Yes!

> Version and User will never be empty so they can probably be left out of consideration all together.

User and Source, right? Version can be empty (e.g. when running MLflow code in a non-git directory). It makes sense to exclude them from consideration.

Member

@harupy harupy Oct 20, 2021

> But on the other hand I can still imagine that the user would like to hide these columns if they contain the same value for each row.

I do understand that some users prefer to hide constant attribute columns to obtain more space for param, metric, and tag columns. On the other hand, the version, user, and run-name columns seem useful even if they are constant.

  • user: tells us who created the displayed runs
  • version: tells us which code created the displayed runs
  • run-name: ??? (I couldn't come up with a useful use case)

That said, one could argue that users can still show attribute columns again after turning on the diff switch.

@dbczumar Any thoughts on this?

@marijncv
Contributor Author

marijncv commented Oct 20, 2021

> Hey @marijncv, thanks for patiently applying our feedback!
> We found another bug. Here are the steps to reproduce the issue.

No worries, it's a great learning experience :). Will look into this today.

Edit: it should be addressed by 9990804

Signed-off-by: Marijn Valk <marijncv@hotmail.com>
* Obtain the categorized columns for which the values in them
* have only a single value (or are undefined)
*/
static getCategorizedUncheckedKeysDiffView({
Member

@harupy harupy Oct 20, 2021

btw I'm considering how we can improve/simplify this function. Here's my attempt:

Commit: harupy@6f61045
Branch: https://github.com/harupy/mlflow/tree/improve-diff-column-search

Contributor Author

@marijncv marijncv Oct 20, 2021

I like it! We could even add a short-circuit in the loop over runInfos if we find there are no longer any columns to consider.

And then for attributes we can add a dropNonEmptyColumns function and incorporate it in the same loop over runInfos
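
A rough Python sketch of the short-circuit idea follows. It is not the code from the linked commit: `drop_diff_columns` here is a stand-in with an assumed signature, `find_constant_columns` and the row shape are hypothetical, and a missing value is treated as different from a present one. The loop compares each row with the previous one, drops columns whose value changed, and stops early once no candidate columns remain.

```python
def drop_diff_columns(columns, prev_row, curr_row):
    """Keep only the columns whose value is unchanged between two rows."""
    return [col for col in columns if prev_row.get(col) == curr_row.get(col)]


def find_constant_columns(columns, rows):
    """Scan rows pairwise; return (constant columns, number of rows inspected)."""
    if not rows:
        return [], 0
    candidates = list(columns)
    inspected = 1
    for prev_row, curr_row in zip(rows, rows[1:]):
        if not candidates:
            break  # short-circuit: nothing left that could still be constant
        candidates = drop_diff_columns(candidates, prev_row, curr_row)
        inspected += 1
    return candidates, inspected


rows = [
    {"p1": "0", "p2": "a"},
    {"p1": "0", "p2": "b"},
    {"p1": "0", "p2": "b"},
]
print(find_constant_columns(["p1", "p2"], rows))

# When every column varies, the scan stops after the second row.
varying = [{"x": str(i)} for i in range(100)]
print(find_constant_columns(["x"], varying))
```

The early exit matters most for tables where all columns differ near the top, since the remaining rows never need to be visited.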

Contributor Author

I applied your proposal and integrated the points I mentioned above. I'm unsure whether the toAttributesMap method should be made more generic (i.e. include all attributes). Also, the comments in dropNonEmptyColumns might be overkill since it's so similar to dropDiffColumns.

Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Collaborator

@dbczumar dbczumar left a comment

LGTM! Thanks @marijncv !

@harupy
Member

harupy commented Oct 21, 2021

@marijncv I found that several unrelated CI checks failed. We can just ignore them in this PR.

Comment on lines 639 to 664
const dropNonEmptyColumns = (columns, prevRow, currRow) => {
  // # What each argument represents:
  // | a | b | c | d | e | <- columns
  // | --- | --- | --- | --- | --- |
  // | - | 1 | - | 1 | 1 | <- prevRow
  // | - | - | 1 | 1 | 2 | <- currRow
  // | ? | ? | ? | ? | ? |

  // a: may be an empty column, we need to take a look at the next row
  // b: is not an empty column, we don't need to take a look at the next row
  // c: is not an empty column
  // d: is not an empty column
  // e: is not an empty column

  // A value is empty when it is missing or has zero length (e.g. '' or []).
  const isEmpty = (value) => !value || value.length === 0;

  return columns.filter((col) => {
    const prevValue = prevRow[col];
    const currValue = currRow[col];
    // Case a: keep the column only while both rows are empty.
    // Cases b, c, d & e: drop the column.
    return isEmpty(prevValue) && isEmpty(currValue);
  });
};
Member

@harupy harupy Oct 21, 2021

For detecting empty attribute columns, I don't think we need to compare the previous and current rows. We can just take a look at the current row.
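
The simplification suggested here can be sketched in Python as follows (the PR itself is JavaScript). Emptiness is a property of a single value, so the filter only needs the current row: a column survives as a candidate while every row seen so far is empty for it. `keep_empty_columns` and `is_empty` are hypothetical names for this illustration, and the rows reuse the `a` to `e` example from the code comment above.

```python
def is_empty(value):
    """Treat None, empty strings, and empty lists as empty."""
    return value is None or value == "" or value == []


def keep_empty_columns(columns, curr_row):
    """Keep only the candidate columns that are empty in the current row."""
    return [col for col in columns if is_empty(curr_row.get(col))]


rows = [
    {"a": None, "b": "1", "c": None, "d": "1", "e": "1"},
    {"a": None, "b": None, "c": "1", "d": "1", "e": "2"},
]

candidates = ["a", "b", "c", "d", "e"]
for row in rows:
    candidates = keep_empty_columns(candidates, row)

print(candidates)  # ['a']
```

Column `b` is dropped by the first row and column `c` by the second, without ever comparing two rows against each other.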

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy
Member

harupy commented Oct 21, 2021

@marijncv I pushed a commit to fix a couple of minor issues.

Member

@harupy harupy left a comment

LGTM! Thanks for the contribution!

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy merged commit 9e7c94d into mlflow:master Oct 21, 2021
Labels
area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server rn/feature Mention under Features in Changelogs.
Development

Successfully merging this pull request may close these issues.

[FR] Diff-only view for runs table on experiments page
3 participants