Fix truncation issues when using explainable evaluator #6555

Conversation

dsgibbons
Contributor

Related Issues/PRs

#6554

What changes are proposed in this pull request?

All changes relate to default_evaluator.py.

  1. Use str(f) so that non-string feature names can be truncated: truncated_feature_names = [truncate_str_from_middle(str(f), 20) for f in self.feature_names]
  2. Use str to prevent the truncation mapping from being applied to short non-string names like 1, 2, etc.: if truncated_name != str(self.feature_names[i]) (see the sketch after this list)
  3. When creating shap_predict_fn, don't create a new DataFrame from x if x is already a DataFrame. If the column names of x don't match the column names passed to DataFrame(x, columns=feature_names), the resulting DataFrame is filled with NaNs.
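
For illustration, a minimal self-contained sketch of points 1 and 2 (the real code lives in default_evaluator.py; the helper below is only a stand-in that approximates MLflow's truncate_str_from_middle):

def truncate_str_from_middle(s, max_len):
    # Stand-in: keep the head and tail of the string, elide the middle.
    if len(s) <= max_len:
        return s
    half = (max_len - 3) // 2
    return s[:half] + "..." + s[-(max_len - 3 - half):]

feature_names = [1, 2, "feature_with_a_very_long_name_indeed"]
truncated = [truncate_str_from_middle(str(f), 20) for f in feature_names]
# str() on both sides keeps short non-string names (1, 2, ...) out of the
# renaming map, since truncation leaves them unchanged.
renaming = {t: f for f, t in zip(feature_names, truncated) if t != str(f)}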

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Not yet, but I will write a test to make sure that evaluating with non-string or very long feature names does not result in NaNs. Once I have written this test, I will remove the draft tag from this PR.
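
A rough sketch of what such a test might look like (names and fixtures here are hypothetical, not the ones that ended up in test_evaluation.py):

import numpy as np
import pandas as pd
import pytest

@pytest.mark.parametrize(
    "columns",
    [
        [0, 1, 2, 3],  # non-string feature names
        ["f1_" + "x" * 30, "f2_" + "x" * 30, "f3", "f4"],  # very long names
    ],
)
def test_explainability_handles_unusual_feature_names(columns):
    eval_df = pd.DataFrame(np.random.rand(20, 4), columns=columns)
    # The real test would train a small model, call mlflow.evaluate with
    # log_model_explainability=True, and assert that no NaNs leak into the
    # data handed to the SHAP explainer; this placeholder only checks the
    # DataFrame itself is NaN-free.
    assert not eval_df.isna().any().any()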

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Click the Details link on the Preview docs check.
  2. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions github-actions bot added the area/models and rn/none labels Aug 23, 2022
@harupy
Member

harupy commented Aug 23, 2022

@dsgibbons Thanks for the PR! Can you add tests?

@dsgibbons
Contributor Author

Yes, I was planning to add a test tomorrow, and then I'll mark it ready for review.

@harupy
Member

harupy commented Aug 23, 2022

Got it, thanks!

@dsgibbons
Contributor Author

Introduced a few more alterations:

  • added EvaluationDatasetWithSavedConstructor to test_evaluation.py to remove repeated lines from the various pytest.fixture datasets.
  • added a new test case using the iris_pandas_df_unusual_cols_dataset that specifically checks non-string columns and very long feature names.
  • added a manual check for NaN errors to default_evaluator.py.

Ideally, I'd prefer that the default_evaluator halt execution rather than allowing exceptions to just be written to the logger. Maybe there is a better way we can handle exceptions in default_evaluator - perhaps by allowing the user to decide whether or not the program should be halted if an exception is encountered.

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones. This leads to scikit-learn throwing lots of warnings during evaluation. Should this be raised as an issue?

@dsgibbons dsgibbons marked this pull request as ready for review August 24, 2022 00:13
@harupy
Member

harupy commented Aug 24, 2022

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones.

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

@dsgibbons
Contributor Author

dsgibbons commented Aug 24, 2022

One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones.

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

Perhaps you're right. I was somewhat basing that on the code example from #6503 with log_model_explainability=True. This also happens in test_pandas_df_regressor_evaluation.

Comment on lines 636 to 638
# NaN errors should break evaluation
if str(e).find("NaN") != -1:
    raise e
Member

Why should NaN errors break evaluation?

Contributor Author

I was not sure how else to catch the NaNs created by the old truncation logic. The NaNs don't carry through to the results passed via the evaluate function. If the user could override the exception handler to raise exceptions, then this would be easier to test against.

Member

Does the NaN error occur in the updated code?

Contributor Author

No, but there should probably be a way to catch NaN errors in the tests.

Contributor Author
dsgibbons commented Aug 29, 2022

How about this instead?

except Exception as e:
    # Shap evaluation might fail on some edge cases, e.g., unsupported input data values
    # or unsupported model on specific shap explainer. Catch exception to prevent it
    # breaking the whole `evaluate` function.

    if not self.evaluator_config.get("ignore_exceptions", True):
        raise e

and then the test can just include evaluator_config={"ignore_exceptions": False} in the evaluate call.

@dsgibbons
Contributor Author

dsgibbons commented Aug 24, 2022

Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.

When used with explainability, DefaultEvaluator forces the test pd.DataFrame to have labelled columns, even if the columns were originally unlabelled. Should this logic be changed to be consistent with the above comment?

Editing the _shap_predict_fn as follows would resolve this issue:

def _shap_predict_fn(x, predict_fn, feature_names):
    if isinstance(x, pd.DataFrame):
        # Reuse the existing DataFrame rather than re-wrapping it;
        # pd.DataFrame(x, columns=...) with mismatched names would
        # produce all-NaN columns. Note this renames x in place.
        df = x
        df.columns = feature_names
    else:
        df = pd.DataFrame(x, columns=feature_names)
    return predict_fn(df)

@harupy
Member

harupy commented Aug 25, 2022

@WeichenXu123 Could you also take a look? I think you're more familiar with this feature than I am.

@WeichenXu123
Collaborator

WeichenXu123 commented Aug 29, 2022

To simplify the code, I think we should stringify the feature names here:

self._feature_names = list(self._features_data.columns)

And I propose that if a column name in the pd.DataFrame is not a string, we generate the dataset.feature_name as feature_{str(raw_column_name)}.

@dsgibbons @harupy
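
A rough sketch of that proposal (hypothetical; it follows the snippet quoted above, wrapping non-string column names as feature_{str(raw_column_name)}):

self._feature_names = [
    c if isinstance(c, str) else f"feature_{c}"
    for c in self._features_data.columns
]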

@dsgibbons
Contributor Author

@WeichenXu123 are you OK with sklearn models raising warnings about being tested on labelled columns after being trained on unlabelled columns?

@WeichenXu123
Collaborator

WeichenXu123 commented Aug 30, 2022

@WeichenXu123 are you OK with sklearn models raising warnings about being tested on labelled columns after being trained on unlabelled columns?

We should address it. We can address this by:

Adding a new bool flag is_dataset_column_unlabeled to EvaluationDataset and setting it to True in this case; then, in the shap_predict_fn code, if this flag is True, we don't attach auto-generated column names to the dataframe.

If the dataset input is a numpy array, we also set is_dataset_column_unlabeled to True.

@dsgibbons
Contributor Author

dsgibbons commented Aug 31, 2022

@WeichenXu123 In the tests, the EvaluationDataset is instantiated after the model has already been trained. The is_dataset_column_unlabeled flag can't tell us if the model was trained on unlabelled columns. It looks like we would need to check against model.feature_names_in_ instead. This can't be done inside EvaluationDataset.

By the way, this sklearn warning problem is not just in _shap_predict_fn - we get the same warning when not using explainability in DefaultEvaluator._generate_model_predictions. I'll make a push soon with my current changes. These changes won't fix the sklearn warnings.

Edit: maybe I misunderstood - did you want to adapt EvaluationDataset to accept an extra argument for is_dataset_column_unlabeled in its __init__? We can then compute this value using model.feature_names_in_ inside the evaluate function in models/evaluation/base.py.

@dsgibbons dsgibbons marked this pull request as draft August 31, 2022 23:29
@dsgibbons
Contributor Author

dsgibbons commented Aug 31, 2022

I'm in a bit of a mess as I made a mistake when syncing with the latest changes from master. Is it OK if I open a new pull request with a clean branch?

Edit: Never mind, I was able to remove the problematic commit.

@dsgibbons dsgibbons force-pushed the 6554-truncation-issues-with-explainability branch from 2261221 to 6c86cdb Compare August 31, 2022 23:57
@dsgibbons
Contributor Author

To summarise, the NaN errors have been removed, so this pull request fixes the issue raised in #6554. My most recent push is basically equivalent to yesterday's but contains some simplifications to remove redundant isinstance checks.

The only remaining issue is that sklearn will raise warnings when trained on unlabelled columns but tested on labelled columns. As far as I can tell, there is no way around this unless the evaluate function in base.py directly accesses model.feature_names_in_. I believe, however, that this should be left as a future issue/pull request, as it is unrelated to #6554.

@WeichenXu123
Collaborator

If the local variable x has truncated column names, then you get NaNs from pd.DataFrame(x, columns=feature_names).

@dsgibbons Thanks for pointing this out!
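
For anyone following along, a minimal demonstration of the pandas behaviour being described:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# Re-wrapping an existing DataFrame selects columns by name, so names
# that don't match the originals come back as all-NaN columns:
print(pd.DataFrame(df, columns=["x", "y"]))
#     x   y
# 0 NaN NaN
# 1 NaN NaN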

@WeichenXu123
Collaborator

Can we fix the warning X has feature names, but LogisticRegression was fitted without feature names?

We can do it in a follow-up PR.

@dsgibbons
Contributor Author

Sounds good - I'll address those comments tomorrow!

@dsgibbons
Contributor Author

Looks like an unrelated Docker issue is failing the Python check.

Collaborator
WeichenXu123 left a comment

LGTM

@WeichenXu123 WeichenXu123 merged commit 5fa68ff into mlflow:master Sep 5, 2022
@WeichenXu123
Collaborator

@dsgibbons

Would you like to file a PR to fix the warning?

X has feature names, but LogisticRegression was fitted without feature names

We can save the original column names in DatasetEvaluator and, when calling predict / predict_proba, restore them to address the issue. :)
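
A rough sketch of that idea (predict_with_original_columns and original_columns are hypothetical names; the real attribute and call sites would live in the evaluator):

def predict_with_original_columns(model, df, original_columns):
    # Restore the labels the model was fitted with (possibly none, or
    # integer positions) so sklearn does not warn about feature names.
    restored = df.copy()
    restored.columns = original_columns
    return model.predict(restored)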

@dsgibbons
Contributor Author

dsgibbons commented Sep 5, 2022

@dsgibbons

Would you like to file a PR to fix the warning?

X has feature names, but LogisticRegression was fitted without feature names

We can save the original column names in DatasetEvaluator and, when calling predict / predict_proba, restore them to address the issue. :)

Yes, I will address this later this week!

Comment on lines +353 to +354
"f3longnamelongnamelongname": eval_X[:, 2],
"f4longnamelongnamelongname": eval_X[:, 3],
Member
harupy commented Sep 5, 2022

Suggested change
- "f3longnamelongnamelongname": eval_X[:, 2],
- "f4longnamelongnamelongname": eval_X[:, 3],
+ # Column names longer than the truncation threshold
+ "f3_" + "x" * 20: eval_X[:, 2],
+ "f4_" + "x" * 20: eval_X[:, 3],

Nit: I'd use + "x" * 20 here. It's a bit hard to tell if f3longnamelongnamelongname has more than 20 characters.

@WeichenXu123
Collaborator

@dsgibbons Also remember to update the config #6555 (comment) :)

@dsgibbons
Contributor Author

@dsgibbons Also remember to update the config #6555 (comment) :)

Yes makes sense, I'll do that. I'm guessing I can just open up a fresh PR for these extra changes?

Comment on lines +679 to +684
def test_default_explainer_pandas_df_longname_cols(
    multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
):
    _evaluate_explainer_with_exceptions(
        multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
    )
Member
harupy commented Sep 5, 2022

[Three screenshots of SHAP plots logged by this test, showing truncated feature names and layout issues]

@WeichenXu123

The SHAP plots logged in this test look like this. Is this the expected result?

Collaborator

Looks like the layout has some issues when feature names are long. We should fix it.

Member

Is (f_3) expected?

Collaborator

Did we make some changes to SHAP plot rendering recently? I remember I previously adjusted the plot layout well.

Member

This line says For duplicated truncated name, attach "(f_{feature_index})" at the end, but there are no duplicates in the truncated names.

# For duplicated truncated name, attach "(f_{feature_index})" at the end

Collaborator

Is (f_3) expected?

Yes. After names are truncated, conflicts might happen. E.g.,

longlonglonglongabcdefghijklmnlonglonglonglong and longlonglonglongxyzxyzxyzxyzxlonglonglonglong

both become longlonglonglong...longlonglonglong after truncation, so the suffix (f_{feature_index}) is used to de-duplicate them.

Looks like the layout has some issue when feature name is long

I will try to fix it.
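
To make the collision concrete, a small self-contained sketch (truncate_middle is a hypothetical stand-in for MLflow's truncate_str_from_middle):

def truncate_middle(s, max_len):
    if len(s) <= max_len:
        return s
    half = (max_len - 3) // 2
    return s[:half] + "..." + s[-(max_len - 3 - half):]

names = [
    "longlonglonglongabcdefghijklmnlonglonglonglong",
    "longlonglonglongxyzxyzxyzxyzxlonglonglonglong",
]
truncated = [truncate_middle(n, 20) for n in names]
assert truncated[0] == truncated[1]  # both collapse to the same string
# De-duplicate by appending the feature index only where needed:
deduped = [
    f"{t}(f_{i})" if truncated.count(t) > 1 else t
    for i, t in enumerate(truncated)
]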

Member
harupy commented Sep 5, 2022

@WeichenXu123 In the current implementation, we always add the suffix even when there is no duplicate. Can you fix it?

Collaborator

@harupy
Truncation fixing PR: #6705

Collaborator

[Attached plots: shap_beeswarm_plot_on_data_diabetes_dataset, shap_feature_importance_plot_on_data_diabetes_dataset, shap_summary_plot_on_data_diabetes_dataset]

In my test, plot rendering works well.
@harupy Could you provide your testing code?

Member
harupy commented Sep 6, 2022

@WeichenXu123 You can run test_default_explainer_pandas_df_longname_cols.

In my test, plot rendering works well.

It's because you plotted different data.

prithvikannan pushed a commit to prithvikannan/mlflow that referenced this pull request Sep 6, 2022

* fix truncation issues when using explainability
* remove blank line and redundant dict comprehension (#mlflow-6554)
* remove unused import mlflow#6554
* remove EvaluationDatasetWithSavedConstructor from test_evaluation.py mlflow#6554
* fix NaN errors resulting from mismatched column names. sklearn warnings persist for now mlflow#6554
* simplified fix for truncation NaNs mlflow#6554
* rename stringify function and move df renaming to helper function mlflow#6554
* minor docstring reformat

Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
prithvikannan pushed a commit to prithvikannan/mlflow that referenced this pull request Sep 7, 2022
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 10, 2022
…d data mlflow#6555
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 10, 2022
dsgibbons added a commit to dsgibbons/mlflow that referenced this pull request Sep 11, 2022
@dsgibbons dsgibbons deleted the 6554-truncation-issues-with-explainability branch September 11, 2022 00:52
nnethery pushed a commit to nnethery/mlflow that referenced this pull request Feb 1, 2024