Fix truncation issues when using explainable evaluator #6555

Conversation

dsgibbons
Contributor

Related Issues/PRs

#6554

What changes are proposed in this pull request?

All changes relate to default_evaluator.py.

  1. Use str(f) so that non-string feature names can be truncated: truncated_feature_names = [truncate_str_from_middle(str(f), 20) for f in self.feature_names]
  2. Use str to prevent the truncation mapping from being applied to short non-string names like 1, 2, etc.: if truncated_name != str(self.feature_names[i]) (see the sketch after this list)
  3. When creating shap_predict_fn, don't create a new DataFrame from x if x is already a DataFrame. If the column names of x don't match the column names passed to DataFrame(x, columns=feature_names), the resulting DataFrame is filled with NaNs.
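
For illustration, a minimal self-contained sketch of points 1 and 2 (the real code lives in default_evaluator.py; the helper below is only a stand-in that approximates MLflow's truncate_str_from_middle):

def truncate_str_from_middle(s, max_len):
    # Stand-in: keep the head and tail of the string, elide the middle.
    if len(s) <= max_len:
        return s
    half = (max_len - 3) // 2
    return s[:half] + "..." + s[-(max_len - 3 - half):]

feature_names = [1, 2, "feature_with_a_very_long_name_indeed"]
truncated = [truncate_str_from_middle(str(f), 20) for f in feature_names]
# str() on both sides keeps short non-string names (1, 2, ...) out of the
# renaming map, since truncation leaves them unchanged.
renaming = {t: f for f, t in zip(feature_names, truncated) if t != str(f)}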

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Not yet, but I will write a test to make sure that evaluating with non-string or very long feature names does not result in NaNs. Once I have written this test, I will remove the draft tag from this PR.
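
A rough sketch of what such a test might look like (names and fixtures here are hypothetical, not the ones that ended up in test_evaluation.py):

import numpy as np
import pandas as pd
import pytest

@pytest.mark.parametrize(
    "columns",
    [
        [0, 1, 2, 3],  # non-string feature names
        ["f1_" + "x" * 30, "f2_" + "x" * 30, "f3", "f4"],  # very long names
    ],
)
def test_explainability_handles_unusual_feature_names(columns):
    eval_df = pd.DataFrame(np.random.rand(20, 4), columns=columns)
    # The real test would train a small model, call mlflow.evaluate with
    # log_model_explainability=True, and assert that no NaNs leak into the
    # data handed to the SHAP explainer; this placeholder only checks the
    # DataFrame itself is NaN-free.
    assert not eval_df.isna().any().any()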

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Click the Details link on the Preview docs check.
  2. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions github-actions bot added the area/models and rn/none labels Aug 23, 2022
@harupy
Member

harupy commented Aug 23, 2022

@dsgibbons Thanks for the PR! Can you add tests?

@dsgibbons
Contributor Author

Yes, I was planning to add a test tomorrow, and then I'll mark it ready for review.

@harupy
Member

harupy commented Aug 23, 2022

Got it, thanks!

@dsgibbons
Contributor Author

Introduced a few more alterations:

  • added EvaluationDatasetWithSavedConstructor to test_evaluation.py to remove repeated lines from the various pytest.fixture datasets.
  • added a new test case using the iris_pandas_df_unusual_cols_dataset that specifically checks non-string columns and very long feature names.
  • added a manual check for NaN errors to default_evaluator.py.

Ideally, I'd prefer that the default_evaluator halt execution rather than allowing exceptions to just be written to the logger. Maybe there is a better way we can handle exceptions in default_evaluator - perhaps by allowing the user to decide whether or not the program should be halted if an exception is encountered.

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones. This leads to scikit-learn throwing lots of warnings during evaluation. Should this be raised as an issue?

@dsgibbons dsgibbons marked this pull request as ready for review August 24, 2022 00:13
@harupy
Member

harupy commented Aug 24, 2022

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones.

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

@dsgibbons
Contributor Author

dsgibbons commented Aug 24, 2022

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones.

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

Perhaps you're right. I was somewhat basing that on the code example from #6503 with log_model_explainability=True. This also happens in test_pandas_df_regressor_evaluation.

Comment on lines 636 to 638
# NaN errors should break evaluation
if str(e).find("NaN") != -1:
    raise e
Member

Why should NaN errors break evaluation?

Contributor Author

I was not sure how else to catch the NaNs created by the old truncation logic. The NaNs don't carry through to the results passed via the evaluate function. If the user could override the exception handler to raise exceptions, then this would be easier to test against.

Member

Does the NaN error occur in the updated code?

Contributor Author

No, but there should probably be a way to catch NaN errors in the tests.

Contributor Author
dsgibbons commented Aug 29, 2022

How about this instead?

except Exception as e:
    # Shap evaluation might fail on some edge cases, e.g., unsupported input data values
    # or unsupported model on specific shap explainer. Catch exception to prevent it
    # breaking the whole `evaluate` function.

    if not self.evaluator_config.get("ignore_exceptions", True):
        raise e

and then the test can just include evaluator_config={"ignore_exceptions": False} in the evaluate call.

@dsgibbons
Contributor Author

dsgibbons commented Aug 24, 2022

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

When used with explainability, DefaultEvaluator forces the test pd.DataFrame to have labelled columns, even if the columns were originally unlabelled. Should this logic be changed to be consistent with the above comment?

Editing the _shap_predict_fn as follows would resolve this issue:

def _shap_predict_fn(x, predict_fn, feature_names):
    if isinstance(x, pd.DataFrame):
        # Reuse the existing DataFrame rather than re-wrapping it;
        # pd.DataFrame(x, columns=...) with mismatched names would
        # produce all-NaN columns. Note this renames x in place.
        df = x
        df.columns = feature_names
    else:
        df = pd.DataFrame(x, columns=feature_names)
    return predict_fn(df)

@harupy
Member

harupy commented Aug 25, 2022

@WeichenXu123 Could you also take a look? I think you're more familiar with this feature than I am.

@WeichenXu123
Collaborator

WeichenXu123 commented Aug 29, 2022

To simplify the code, I think we should stringify the feature names here:

self._feature_names = list(self._features_data.columns)

And I propose that if a column name in the pd.DataFrame is not a string, we generate the dataset.feature_name as feature_{str(raw_column_name)}.

@dsgibbons @harupy
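
A rough sketch of that proposal (hypothetical; it follows the snippet quoted above, wrapping non-string column names as feature_{str(raw_column_name)}):

self._feature_names = [
    c if isinstance(c, str) else f"feature_{c}"
    for c in self._features_data.columns
]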

@dsgibbons
Contributor Author

@WeichenXu123 are you OK with sklearn models raising warnings about being tested on labelled columns after being trained on unlabelled columns?

@WeichenXu123
Collaborator

WeichenXu123 commented Aug 30, 2022

@WeichenXu123 are you OK with sklearn models raising warnings about being tested on labelled columns after being trained on unlabelled columns?

We should address it. We can address this by:

Adding a new bool flag is_dataset_column_unlabeled to EvaluationDataset and setting it to True in this case; then, in the shap_predict_fn code, if this flag is True, we don't attach auto-generated column names to the dataframe.

If the dataset input is a numpy array, we also set is_dataset_column_unlabeled to True.

@dsgibbons
Contributor Author

dsgibbons commented Aug 31, 2022

@WeichenXu123 In the tests, the EvaluationDataset is instantiated after the model has already been trained. The is_dataset_column_unlabeled flag can't tell us if the model was trained on unlabelled columns. It looks like we would need to check against model.feature_names_in_ instead. This can't be done inside EvaluationDataset.

By the way, this sklearn warning problem is not just in _shap_predict_fn - we get the same warning when not using explainability in DefaultEvaluator._generate_model_predictions. I'll make a push soon with my current changes. These changes won't fix the sklearn warnings.

Edit: maybe I misunderstood - did you want to adapt EvaluationDataset to accept an extra argument for is_dataset_column_unlabeled in its __init__? We can then compute this value using model.feature_names_in_ inside the evaluate function in models/evaluation/base.py.

@dsgibbons dsgibbons marked this pull request as draft August 31, 2022 23:29
@dsgibbons
Contributor Author

dsgibbons commented Aug 31, 2022

I'm in a bit of a mess as I made a mistake when syncing with the latest changes from master. Is it OK if I open a new pull request with a clean branch?

Edit: Never mind, I was able to remove the problematic commit.

@dsgibbons dsgibbons force-pushed the 6554-truncation-issues-with-explainability branch from 2261221 to 6c86cdb Compare August 31, 2022 23:57
@dsgibbons
Contributor Author

To summarise, the NaN errors have been removed, so this pull request fixes the issue raised in #6554. My most recent push is basically equivalent to yesterday's but contains some simplifications to remove redundant isinstance checks.

The only remaining issue is that sklearn will raise warnings when trained on unlabelled columns but tested on labelled columns. As far as I can tell, there is no way around this unless the evaluate function in base.py directly accesses model.feature_names_in_. I believe, however, that this should be left as a future issue/pull request, as it is unrelated to #6554.

@WeichenXu123
Collaborator

If the local variable x has truncated column names, then you get NaNs from pd.DataFrame(x, columns=feature_names).

@dsgibbons Thanks for pointing this out!
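
For anyone following along, a minimal demonstration of the pandas behaviour being described:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# Re-wrapping an existing DataFrame selects columns by name, so names
# that don't match the originals come back as all-NaN columns:
print(pd.DataFrame(df, columns=["x", "y"]))
#     x   y
# 0 NaN NaN
# 1 NaN NaN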

@WeichenXu123
Collaborator

Can we fix the warning X has feature names, but LogisticRegression was fitted without feature names?

We can do it in a follow-up PR.

@dsgibbons
Contributor Author

Sounds good - I'll address those comments tomorrow!

@dsgibbons
Contributor Author

Looks like an unrelated Docker issue is failing the Python check.

Collaborator
WeichenXu123 left a comment

LGTM

@WeichenXu123 WeichenXu123 merged commit 5fa68ff into mlflow:master Sep 5, 2022
@WeichenXu123
Collaborator

@dsgibbons

Would you like to file a PR to fix the warning?

X has feature names, but LogisticRegression was fitted without feature names

We can save the original column names in DatasetEvaluator and, when calling predict / predict_proba, restore them to address the issue. :)
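
A rough sketch of that idea (predict_with_original_columns and original_columns are hypothetical names; the real attribute and call sites would live in the evaluator):

def predict_with_original_columns(model, df, original_columns):
    # Restore the labels the model was fitted with (possibly none, or
    # integer positions) so sklearn does not warn about feature names.
    restored = df.copy()
    restored.columns = original_columns
    return model.predict(restored)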

@dsgibbons
Contributor Author

dsgibbons commented Sep 5, 2022

@dsgibbons

Would you like to file a PR to fix the warning?

X has feature names, but LogisticRegression was fitted without feature names

We can save the original column names in DatasetEvaluator and, when calling predict / predict_proba, restore them to address the issue. :)

Yes, I will address this later this week!

Comment on lines +353 to +354
"f3longnamelongnamelongname": eval_X[:, 2],
"f4longnamelongnamelongname": eval_X[:, 3],
Member
harupy commented Sep 5, 2022

Suggested change
- "f3longnamelongnamelongname": eval_X[:, 2],
- "f4longnamelongnamelongname": eval_X[:, 3],
+ # Column names longer than the truncation threshold
+ "f3_" + "x" * 20: eval_X[:, 2],
+ "f4_" + "x" * 20: eval_X[:, 3],

Nit: I'd use + "x" * 20 here. It's a bit hard to tell if f3longnamelongnamelongname has more than 20 characters.

@WeichenXu123
Collaborator

@dsgibbons Also remember to update the config #6555 (comment) :)

@dsgibbons
Contributor Author

@dsgibbons Also remember to update the config #6555 (comment) :)

Yes makes sense, I'll do that. I'm guessing I can just open up a fresh PR for these extra changes?

Comment on lines +679 to +684
def test_default_explainer_pandas_df_longname_cols(
    multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
):
    _evaluate_explainer_with_exceptions(
        multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
    )
Member
harupy commented Sep 5, 2022

[Three screenshots of SHAP plots logged by this test, showing truncated feature names and layout issues]

@WeichenXu123

The SHAP plots logged in this test look like this. Is this the expected result?

Collaborator

Looks like the layout has some issues when feature names are long. We should fix it.

Member

Is (f_3) expected?

Collaborator

Did we make some changes to SHAP plot rendering recently? I remember I previously adjusted the plot layout well.

Member

This line says For duplicated truncated name, attach "(f_{feature_index})" at the end, but there are no duplicates in the truncated names.

# For duplicated truncated name, attach "(f_{feature_index})" at the end

Collaborator

Is (f_3) expected?

Yes. After names are truncated, conflicts might happen. E.g.,

longlonglonglongabcdefghijklmnlonglonglonglong and longlonglonglongxyzxyzxyzxyzxlonglonglonglong

both become longlonglonglong...longlonglonglong after truncation, so the suffix (f_{feature_index}) is used to de-duplicate them.

Looks like the layout has some issue when feature name is long

I will try to fix it.
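
To make the collision concrete, a small self-contained sketch (truncate_middle is a hypothetical stand-in for MLflow's truncate_str_from_middle):

def truncate_middle(s, max_len):
    if len(s) <= max_len:
        return s
    half = (max_len - 3) // 2
    return s[:half] + "..." + s[-(max_len - 3 - half):]

names = [
    "longlonglonglongabcdefghijklmnlonglonglonglong",
    "longlonglonglongxyzxyzxyzxyzxlonglonglonglong",
]
truncated = [truncate_middle(n, 20) for n in names]
assert truncated[0] == truncated[1]  # both collapse to the same string
# De-duplicate by appending the feature index only where needed:
deduped = [
    f"{t}(f_{i})" if truncated.count(t) > 1 else t
    for i, t in enumerate(truncated)
]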

Member
harupy commented Sep 5, 2022

@WeichenXu123 In the current implementation, we always add the suffix even when there is no duplicate. Can you fix it?

Collaborator

@harupy
Truncation fixing PR: #6705

Collaborator

[Attached plots: shap_beeswarm_plot_on_data_diabetes_dataset, shap_feature_importance_plot_on_data_diabetes_dataset, shap_summary_plot_on_data_diabetes_dataset]

In my test, plot rendering works well.
@harupy Could you provide your testing code?

Member
harupy commented Sep 6, 2022

@WeichenXu123 You can run test_default_explainer_pandas_df_longname_cols.

In my test, plot rendering works well.

It's because you plotted different data.

prithvikannan pushed a commit to prithvikannan/mlflow that referenced this pull request Sep 6, 2022

* fix truncation issues when using explainability
* remove blank line and redundant dict comprehension (#mlflow-6554)
* remove unused import mlflow#6554
* remove EvaluationDatasetWithSavedConstructor from test_evaluation.py mlflow#6554
* fix NaN errors resulting from mismatched column names. sklearn warnings persist for now mlflow#6554
* simplified fix for truncation NaNs mlflow#6554
* rename stringify function and move df renaming to helper function mlflow#6554
* minor docstring reformat

Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
prithvikannan pushed a commit to prithvikannan/mlflow that referenced this pull request Sep 7, 2022
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 10, 2022
…d data mlflow#6555
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 10, 2022
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 11, 2022
@dsgibbons dsgibbons deleted the 6554-truncation-issues-with-explainability branch September 11, 2022 00:52
nnethery pushed a commit to nnethery/mlflow that referenced this pull request Feb 1, 2024