Evaluation Default evaluator #5092
Conversation
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
if not _matplotlib_initialized:
    import matplotlib.pyplot as pyplot

    pyplot.rcParams["figure.dpi"] = 144
I don't think we should mutate rcParams. Can we just specify dpi and figsize when saving plots?
It seems we cannot. We would need to set it when creating the figure, and doing that would require modifying the shap source code.
Maybe we could try this:
https://stackoverflow.com/a/35394637
Yes, I just found it as well. We can use with matplotlib.rc_context(...).
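As a minimal sketch of the rc_context approach (the dpi and figsize values here are illustrative assumptions, not the PR's exact settings):

import matplotlib
import matplotlib.pyplot as pyplot

# rc_context applies the overrides only inside the block; the global
# rcParams are restored automatically on exit, so nothing is mutated.
with matplotlib.rc_context({"figure.dpi": 288, "figure.figsize": (6.0, 4.0)}):
    pyplot.plot([0, 1], [0, 1])
    pyplot.savefig("example_plot.png")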
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
_last_failed_evaluator = None


def _get_last_failed_evaluator():
Where do we use this function?
In the usage log, we want to record which evaluator an error was raised from (TODO), for debugging purposes, so I added this.
Why do we only store the last failed evaluator?
The evaluate API raises an error as soon as it hits the first failed evaluator (the API is designed that way): if any evaluator fails, evaluate raises an error; otherwise it returns all of the merged successful results.
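A hypothetical sketch of the pattern being described; apart from _get_last_failed_evaluator, the names and driver loop below are assumptions for illustration:

_last_failed_evaluator = None


def _get_last_failed_evaluator():
    # Return the name of the evaluator that raised most recently, for usage logging.
    return _last_failed_evaluator


def _evaluate_all(model, evaluators):
    global _last_failed_evaluator
    _last_failed_evaluator = None
    merged_results = {}
    for name, evaluator in evaluators.items():
        try:
            merged_results.update(evaluator.evaluate(model))
        except Exception:
            # Record which evaluator failed before re-raising, so the usage
            # logger can report it; the first failure aborts the whole run.
            _last_failed_evaluator = name
            raise
    return merged_results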
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
sampled_X = sampled_X.rename(columns=truncated_feature_name_map, copy=False)

if algorithm:
    supported_algos = ['exact', 'permutation', 'partition']
Note: in shap 0.40, the shap.Explainer algorithm argument only works reliably when set to 'exact', 'permutation', or 'partition'; the remaining algorithms (listed in the shap docs) were all buggy in my testing. So I limit it to these three values.
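For reference, a rough sketch of passing the algorithm argument to shap.Explainer (the model and data are placeholders, not the PR's code):

import shap
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Per the note above, only 'exact', 'permutation', and 'partition' behaved
# reliably in shap 0.40.
explainer = shap.Explainer(model.predict, X, algorithm="permutation")
shap_values = explainer(X)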
try:
    pyplot.clf()
    do_plot()
    pyplot.savefig(artifact_file_local_path)
Can we set the dpi value to something higher than the default of 100? The images look really blurry and out of place in the tracking UI. 300 might be a better option.
pyplot.savefig(artifact_file_local_path, dpi=300)
Or perhaps save in .svg format to ensure the plots scale well at different browser zoom levels?
I increased the dpi to 288 (previously, in Databricks notebooks, it was 72).
Saving the figure as SVG is not a good choice: the Pillow package does not support SVG, and we need to return the artifact as a Pillow Image instance.
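A minimal sketch of the PNG-plus-Pillow constraint described above (the file name and plot are placeholders):

import matplotlib.pyplot as pyplot
from PIL import Image

pyplot.plot([0, 1], [0, 1])
# Save as PNG at the higher dpi; Pillow cannot load SVG, and the artifact
# must be returned as a PIL Image instance.
pyplot.savefig("artifact.png", dpi=288)
image_artifact = Image.open("artifact.png")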
sk_metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix,
    display_labels=self.label_list,
).plot(cmap="Blues")
We need to scale these plots for class counts > 6. The multiclass example with 10 classes is illegible, with numeric values wrapping and overwriting one another. Setting figsize overrides on the canvas and changing the dpi at figure save time will produce a much clearer visualization.
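A hypothetical sketch of that suggestion; the size formula, dpi, and dummy data below are assumptions, not an agreed design:

import matplotlib.pyplot as pyplot
import numpy as np
import sklearn.metrics as sk_metrics

label_list = list(range(10))  # e.g. a 10-class problem
confusion_matrix = np.random.randint(0, 50, size=(10, 10))

# Grow the canvas with the class count so cell annotations stop overlapping.
side = max(6.0, 0.8 * len(label_list))
fig, ax = pyplot.subplots(figsize=(side, side))
sk_metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix,
    display_labels=label_list,
).plot(cmap="Blues", ax=ax)
fig.savefig("confusion_matrix.png", dpi=288)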
The autologging code also generates a confusion matrix plot for the training dataset and has a similar issue. Let's do a follow-up PR to improve both cases.
Created a follow-up ticket for this:
https://databricks.atlassian.net/browse/ML-19817
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
LGTM. Nice implementation! Sorry for the delayed review, Weichen.
def evaluate(
    *,
What does the "*" mean here? Do you intend to make it a nameless positional arg like *args? It may be worth spelling it out even if it's not used, and putting it at the end of the argument list, e.g.
def _evaluate(model, model_type, dataset, actual_run_id, evaluator_name_list, evaluator_name_to_conf_map, *unused_args):
No. The * is Python syntax, not an argument: it means all of the following arguments must be keyword-only arguments. https://www.python.org/dev/peps/pep-3102/
Making all arguments keyword-only is a proposal raised by @dbczumar, so that in the future, if the MLflow containerized model provides a "model_type" property, we can remove the "model_type" argument here while keeping backwards compatibility.
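A small illustration of the bare * (the function body and calls are made up for demonstration):

def evaluate(*, model, model_type, dataset):
    # Everything after the bare `*` is keyword-only (PEP 3102).
    return f"evaluating {model} ({model_type}) on {dataset}"


evaluate(model="m", model_type="classifier", dataset="d")  # OK
# evaluate("m", "classifier", "d")  # TypeError: takes 0 positional arguments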
Wait, removing the model_type argument is a breaking change, right?
In the future, we can change the API to be:

def evaluate(
    *, model, dataset, run_id, evaluators=None, evaluator_config=None, **kwargs
):
    if "model_type" in kwargs:
        model_type = kwargs["model_type"]
        logger.warning("deprecation: ....")
    else:
        model_type = model.model_type
    ...

Then we can keep backwards compatibility.
I didn't know about this usage pattern. Interesting!
Another way to keep backward compatibility is to use a keyword arg for model_type, e.g.

def evaluate(
    model, dataset, run_id, model_type=None, evaluators=None, evaluator_config=None, **kwargs
):

What's the downside of going this more "conventional" way of specifying optional arguments?
In the current evaluate implementation, the model_type argument is required. In the future we will make it optional and deprecated, and then remove the argument.
@@ -496,33 +685,72 @@ def evaluate(
        a nested dictionary whose key is the evaluator name.
    :return: An :py:class:`mlflow.models.evaluation.EvaluationDataset` instance containing
        evaluation results.
Nice! I really like that the documentation here spells out the expected outcomes and limitations. Hopefully we'll make it shorter and easier to navigate over time.
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
What changes are proposed in this pull request?
Prototype of the Evaluation Default evaluator.
Test code:
How is this patch tested?
(Details)
Does this PR change the documentation?
1. Check the ci/circleci: build_doc check. If it's successful, proceed to the next step, otherwise fix it.
2. Click Details on the right to open the job page of CircleCI.
3. Click the Artifacts tab.
4. Click docs/build/html/index.html.
.Release Notes
Is this a user-facing change?
Implement built-in default evaluator for model evaluation.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes