Fix truncation issues when using explainable evaluator #6555
Conversation
Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
@dsgibbons Thanks for the PR! Can you add tests?
Yes, I was planning to add a test tomorrow, and then I'll mark it ready to review.
Got it, thanks!
Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Introduced a few more alterations:
One more thing to note is that we often fit a model with unlabelled columns, but then test with labelled ones. This leads to scikit-learn throwing lots of warnings during evaluation. Should this be raised as an issue?
Is this really true? If we train a model with unlabelled columns, I think we should test with unlabelled columns.
Perhaps you're right. I was somewhat basing that on the code example from #6503.
# NaN errors should break evaluation
if str(e).find("NaN") != -1:
    raise e
Why should NaN errors break evaluation?
I was not sure how else to catch the NaNs created by the old truncation logic. The NaNs don't carry through to the results passed via the evaluate function. If the user could override the exception handler to raise exceptions, then this would be easier to test against.
Does the NaN error occur in the updated code?
No, but there should probably be a way to catch NaN errors in the tests.
How about this instead?
except Exception as e:
    # Shap evaluation might fail on some edge cases, e.g., unsupported input data values
    # or unsupported model on specific shap explainer. Catch exception to prevent it
    # breaking the whole `evaluate` function.
    if not self.evaluator_config.get("ignore_exceptions", True):
        raise e
and then the test can just include evaluator_config={"ignore_exceptions": False} in the evaluate call.
…lflow#6554 Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
When used with explainability, Editing the
@WeichenXu123 Could you also take a look? I think you're more familiar with this feature than I am.
To simplify the code, I think we should stringify the feature names here: mlflow/mlflow/models/evaluation/base.py, line 368 (at 9c2010c).
And I propose that if "column_name" in the pd.DataFrame is not a string, we generate the dataset.feature_name to be
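The stringification idea can be sketched in plain pandas (illustrative, not the mlflow code): coerce every column name to a string up front, so downstream truncation and renaming logic only ever deals with str:

```python
# Sketch: stringify all DataFrame column names before feature-name handling.
import pandas as pd

df = pd.DataFrame({0: [1, 2], 1.5: [3, 4], "name": [5, 6]})

# rename accepts a mapper function; str converts 0 -> "0" and 1.5 -> "1.5".
df = df.rename(columns=str)

print(list(df.columns))
```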
@WeichenXu123 are you OK with sklearn models raising warnings about being tested on labelled columns after being trained on unlabelled columns?
We should address it. We can do this by: add a new bool flag, and if the dataset input is a numpy array, also set
@WeichenXu123 In the tests, the

By the way, this sklearn warning problem is not just in

Edit: maybe I misunderstood - did you want to adapt EvaluationDataset to accept an extra argument for
I'm in a bit of a mess, as I made a mistake when syncing with the latest changes from master. Is it OK if I open a new pull request with a clean branch?

Edit: Never mind, I was able to remove the problematic commit.
…gs persist for now mlflow#6554 Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
To summarise, the NaN errors have been removed, so this pull request fixes the issue raised in #6554. My most recent push is basically equivalent to yesterday's but contains some simplifications to remove redundant

The only remaining issue is that sklearn will raise warnings when trained on unlabelled columns but tested on labelled columns. There is no way around this as far as I can tell unless the
@dsgibbons Thanks for pointing this out!
Can we fix the warning? We can do it in a follow-up PR.
Sounds good - I'll address those comments tomorrow!
…low#6554 Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Looks like an unrelated docker issue is failing the python check.
LGTM
Would you like to file a PR to fix the warning? We can save the original column name in DatasetEvaluator and, when calling predict / predict_proba, restore the original dataset column name to address the issue. :)
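A hedged sketch of the follow-up idea above: keep the original column names around and restore them on the data handed to predict / predict_proba, so sklearn sees the same names it was fitted with. All names here (original_feature_names, _restore_columns) are illustrative, not mlflow APIs:

```python
# Sketch: restore original column names on data before calling predict.
import numpy as np
import pandas as pd

# Assume these were saved when the evaluation dataset was first built.
original_feature_names = ["sepal length (cm)", "sepal width (cm)"]

def _restore_columns(x):
    # x may arrive as a plain array (e.g. from a shap explainer); rebuild a
    # DataFrame carrying the names the model was originally fitted on.
    if isinstance(x, pd.DataFrame):
        return x.set_axis(original_feature_names, axis=1)
    return pd.DataFrame(x, columns=original_feature_names)

restored = _restore_columns(np.zeros((2, 2)))
print(list(restored.columns))
```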
Yes, I will address this later this week!
"f3longnamelongnamelongname": eval_X[:, 2],
"f4longnamelongnamelongname": eval_X[:, 3],
Suggested change:

-    "f3longnamelongnamelongname": eval_X[:, 2],
-    "f4longnamelongnamelongname": eval_X[:, 3],
+    # Column names longer than the truncation threshold
+    "f3_" + "x" * 20: eval_X[:, 2],
+    "f4_" + "x" * 20: eval_X[:, 3],

Nit: I'd use + "x" * 20 here. It's a bit hard to tell if f3longnamelongnamelongname has more than 20 characters.
@dsgibbons Also remember to update the config #6555 (comment) :)
Yes, makes sense, I'll do that. I'm guessing I can just open a fresh PR for these extra changes?
def test_default_explainer_pandas_df_longname_cols(
    multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
):
    _evaluate_explainer_with_exceptions(
        multiclass_logistic_regressor_model_uri, iris_pandas_df_longname_cols_dataset
    )
Looks like the layout has an issue when the feature name is long. We should fix it.
Is (f_3) expected?
Did we make some change to the shap plot rendering recently? I remember I previously adjusted the plot layout so it rendered well.
This line says "For duplicated truncated name, attach "(f_{feature_index})" at the end", but there are no duplicates in the truncated names:

# For duplicated truncated name, attach "(f_{feature_index})" at the end
> Is (f_3) expected?

Yes. After names are truncated, conflicts might happen, e.g., longlonglonglongabcdefghijklmnlonglonglonglong and longlonglonglongxyzxyzxyzxyzxlonglonglonglong both become longlonglonglong...longlonglonglong after truncation, so the suffix (f_{feature_index}) is used to de-duplicate them.

> Looks like the layout has some issue when feature name is long

I will try to fix it.
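The truncation and de-duplication scheme described above can be sketched as follows. This is an illustrative implementation, not the exact mlflow code (truncate_middle and truncate_feature_names are hypothetical names); it appends the "(f_{feature_index})" suffix only on actual conflicts, which is the behaviour the review asks for:

```python
# Sketch: middle-truncate long feature names; de-duplicate only on conflict.
def truncate_middle(s, max_len=20):
    # Keep the head and tail, replacing the middle with "..." when too long.
    if len(s) <= max_len:
        return s
    half = (max_len - 3) // 2
    return s[:half] + "..." + s[len(s) - (max_len - 3 - half):]

def truncate_feature_names(names, max_len=20):
    truncated = [truncate_middle(str(n), max_len) for n in names]
    out = []
    for i, t in enumerate(truncated):
        # Attach "(f_{feature_index})" only when truncation caused a clash.
        if truncated.count(t) > 1:
            t = f"{t}(f_{i})"
        out.append(t)
    return out

names = [
    "longlonglonglongabcdefghijklmnlonglonglonglong",
    "longlonglonglongxyzxyzxyzxyzxlonglonglonglong",
    "short",
]
print(truncate_feature_names(names))
```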
@WeichenXu123 In the current implementation, we always add the suffix even when there is no duplicate. Can you fix it?
In my test, plot rendering works well. @harupy Could you provide your testing code?
@WeichenXu123 You can run test_default_explainer_pandas_df_longname_cols.

> In my test plot rendering works well.

It's because you plotted different data.
* fix truncation issues when using explainability
* remove blank line and redundant dict comprehension (#mlflow-6554)
* remove unused import mlflow#6554
* remove EvaluationDatasetWithSavedConstructor from test_evaluation.py mlflow#6554
* fix NaN errors resulting from mismatched column names. sklearn warnings persist for now mlflow#6554
* simplified fix for truncation NaNs mlflow#6554
* rename stringify function and move df renaming to helper function mlflow#6554
* minor docstring reformat

Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Signed-off-by: Prithvi Kannan <prithvi.kannan@databricks.com>
…d data mlflow#6555 Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Signed-off-by: Daniel Gibbons <daniel.gibbons04@gmail.com>
Related Issues/PRs
#6554
What changes are proposed in this pull request?

All changes relate to default_evaluator.py:

- Use str(f) to allow non-string features to be truncated:
  truncated_feature_names = [truncate_str_from_middle(str(f), 20) for f in self.feature_names]
- Compare against str to prevent the truncation mapping being applied to short non-string names like 1, 2, etc.:
  if truncated_name != str(self.feature_names[i])
- In shap_predict_fn, don't create a new DataFrame from x if x is already a DataFrame. If there is a mismatch between the column names of x and the column names passed to DataFrame(x, columns=feature_names), then the resulting DataFrame is filled with NaNs.

How is this patch tested?
Not yet, but I will write a test to make sure that evaluating with non-string or very long feature names does not result in NaNs. Once I have written this test, I will remove the draft tag from this PR.
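The column-mismatch NaN behaviour described in the change list can be reproduced in plain pandas: constructing a DataFrame from an existing DataFrame with a different set of column names reindexes the columns, silently filling every mismatched column with NaN.

```python
# Sketch: why DataFrame(x, columns=...) NaN-fills on mismatched column names.
import pandas as pd

x = pd.DataFrame({"feature_a": [1.0, 2.0], "feature_b": [3.0, 4.0]})

# These names don't match x's columns, e.g. because truncation renamed them.
rebuilt = pd.DataFrame(x, columns=["feature_a_truncated", "feature_b_truncated"])

print(rebuilt.isna().all().all())  # every value is NaN
```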
Does this PR change the documentation?

Changed docs can be previewed via the Details link on the Preview docs check.

Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface

- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language

- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations

- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes