Add pos_label to mlflow.evaluate #6696
Conversation
LGTM! Thanks @harupy !
@@ -147,7 +147,7 @@ def _get_binary_sum_up_label_pred_prob(positive_class_index, positive_class, y,
     return y_bin, y_pred_bin, y_prob_bin


-def _get_classifier_per_class_metrics(y, y_pred):
+def _get_classifier_per_class_metrics(y, y_pred, *, pos_label=1):
Why are we defaulting to 1? I'm looking at https://github.com/mlflow/mlflow/pull/6117/files, and it seems that in that PR we default to None?
@jimmyxu-db I chose 1 to preserve the current behavior of the default evaluator.

In mlflow.sklearn.eval_and_log_metrics, pos_label is used for both binary and multi-class classification models. pos_label = None with average = 'weighted' works for both model types.
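For reference (not part of this diff), a minimal sklearn sketch of what the two parameter combinations do; the labels are made up:

from sklearn.metrics import precision_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# average='binary': the score is computed only for the class given by pos_label.
print(precision_score(y_true, y_pred, average="binary", pos_label=1))  # 1.0

# average='weighted': per-class scores are averaged, weighted by class support;
# pos_label is ignored here, which is why this form also works for multiclass labels.
print(precision_score(y_true, y_pred, average="weighted"))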
We can set the default to None (and use average = 'weighted') if that makes more sense. @WeichenXu123 Thoughts?
We can set the default to None (and use average = 'weighted') if that makes more sense.

For the binary case, shouldn't we only set average = 'binary'? Also, pos_label = 1 is consistent with the sklearn metric functions' default value (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html), so I propose using 1 as the default.
For binary case, we should only set average = 'binary' ?

It depends on how balanced the dataset is. If it's imbalanced, other options such as weighted might make more sense.
How about we pass in both pos_label and average to allow users to specify them? I can see two options here:

This sounds good to me.

The downside for this is multiclass metrics, it will fail.

It actually won't fail in the default evaluator. The default evaluator logs 1-vs-rest metrics for each class, while eval_and_log_metrics just uses <sklearn_score_function>(y_true, y_pred, average="weighted", pos_label=None). @WeichenXu123 Do you know why we chose this approach?
mlflow/mlflow/models/evaluation/default_evaluator.py (lines 186 to 197 in 1f7f9dc):

def _get_classifier_per_class_metrics_collection_df(y, y_pred, labels):
    per_class_metrics_list = []
    for positive_class_index, positive_class in enumerate(labels):
        (y_bin, y_pred_bin, _,) = _get_binary_sum_up_label_pred_prob(
            positive_class_index, positive_class, y, y_pred, None
        )
        per_class_metrics = {"positive_class": positive_class}
        per_class_metrics.update(_get_classifier_per_class_metrics(y_bin, y_pred_bin))
        per_class_metrics_list.append(per_class_metrics)
    return pd.DataFrame(per_class_metrics_list)
The returned dataframe looks like this:
positive_class true_negatives false_positives false_negatives true_positives recall precision f1_score roc_auc precision_recall_auc
0 0 2774 205 137 184 0.573209 0.473008 0.518310 0.858065 0.371424
1 1 2753 209 191 147 0.434911 0.412921 0.423631 0.831169 0.385339
2 2 2752 208 158 182 0.535294 0.466667 0.498630 0.847666 0.409305
3 3 2823 153 160 164 0.506173 0.517350 0.511700 0.818308 0.510070
4 4 2784 168 263 85 0.244253 0.335968 0.282862 0.761322 0.271138
5 5 2730 238 244 88 0.265060 0.269939 0.267477 0.781107 0.319155
6 6 2791 189 207 113 0.353125 0.374172 0.363344 0.805106 0.340894
7 7 2757 220 221 102 0.315789 0.316770 0.316279 0.748290 0.268311
8 8 2799 185 232 84 0.265823 0.312268 0.287179 0.751020 0.225234
9 9 2729 233 195 143 0.423077 0.380319 0.400560 0.828694 0.388790
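To make the 1-vs-rest scheme above concrete, here is a small hand-rolled sketch (illustrative labels only, not taken from the evaluator code):

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 2, 2, 1, 0, 1])

# For each class c, relabel samples as 1 (== c) / 0 (!= c) and compute binary metrics,
# which is roughly what the binarization helper above does per positive class.
for c in np.unique(y_true):
    y_bin = (y_true == c).astype(int)
    y_pred_bin = (y_pred == c).astype(int)
    print(c, recall_score(y_bin, y_pred_bin, average="binary", pos_label=1))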
Hmm, I got a little confused. Does that mean evaluate has a different behavior than eval_and_log_metrics? In particular, I expect to see a single output from the evaluation result (which will be sorted by AutoML to select the best model), not one output per label.
Does that mean evaluate has a different behavior than eval_and_log_metrics?
Correct.
Quick script to check the differences:

import mlflow
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_classes=10, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = LogisticRegression(solver="liblinear").fit(X_train, y_train)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")
    model_uri = mlflow.get_artifact_uri("model")
    result = mlflow.evaluate(
        model_uri,
        X_test,
        targets=y_test,
        model_type="classifier",
        dataset_name="multiclass-classification-dataset",
        evaluators="default",
        evaluator_config={"log_model_explainability": False},
    )

with mlflow.start_run():
    mlflow.sklearn.eval_and_log_metrics(model, X_test, y_test, prefix="test_")
@yxiong We discussed this issue. We'll update the default evaluator so that it logs the same metrics as eval_and_log_metrics, and also update the default value of pos_label to None.
@@ -147,7 +147,7 @@ def _get_binary_sum_up_label_pred_prob(positive_class_index, positive_class, y,
     return y_bin, y_pred_bin, y_prob_bin


-def _get_classifier_per_class_metrics(y, y_pred):
+def _get_classifier_per_class_metrics(y, y_pred, *, pos_label=1):
How about we pass in both pos_label and average to allow users to specify them? I can see two options here:

1. Make the default consistent with sklearn (e.g. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html). The downside is that multiclass metrics will fail.
2. Set average to weighted by default so multiclass metrics won't fail, but change it back to binary when pos_label is specified. Otherwise it'll produce the wrong metrics for binary classification (see "Pass in the pos_label parameter to eval_and_log_metrics" #5807; the pitfall is illustrated in the sketch below).

(2) is what we did before with eval_and_log_metrics, but I think (1) is more intuitive.

This is important to get right! Please confirm the chosen approach with me before merging this PR.
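To illustrate the pitfall in option (2) with made-up numbers (my own sketch, not taken from #5807):

from sklearn.metrics import recall_score

# Imbalanced binary data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Recall of the positive class alone -- what a user specifying pos_label=1 expects.
print(recall_score(y_true, y_pred, average="binary", pos_label=1))  # 0.5

# Support-weighted average over both classes; the majority class dominates, so keeping
# average='weighted' after pos_label is specified would report a misleading number.
print(recall_score(y_true, y_pred, average="weighted"))  # 0.8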
LGTM in general. Please add some more tests to make sure the behavior is correct.
mlflow/models/evaluation/base.py (outdated)
@@ -926,6 +926,12 @@ def evaluate(
     - **log_metrics_with_dataset_info**: A boolean value specifying whether or not to include
       information about the evaluation dataset in the name of each metric logged to MLflow
       Tracking during evaluation, default value is True.
+    - **pos_label**: The positive label used to compute binary classification metrics such as
+      precision, recall, f1, etc (default: ``None``). This parameter is only used for binary
Should pos_label default to 1 for binary classification models?
eval_and_log_metrics uses None by default regardless of the model type.
In eval_and_log_metrics, average is set to weighted by default, in which case pos_label doesn't matter. I think that's wrong, and we do not need to stay backward compatible with it. Setting the pos_label default to 1 makes sense.
That makes sense, I will update the code.
- **pos_label**: The positive label to use when computing classification metrics such as
  precision, recall, f1, etc. for binary classification models (default: ``1``). For
  multiclass classification and regression models, this parameter will be ignored.
- **average**: The averaging method to use when computing classification metrics such as
  precision, recall, f1, etc. for multiclass classification models
  (default: ``'weighted'``). For binary classification and regression models, this
  parameter will be ignored.
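Assuming the config keys documented above land as shown, usage would look roughly like this (a sketch only; model_uri and eval_df are placeholders, and the keys are passed through evaluator_config rather than as top-level arguments):

import mlflow

result = mlflow.evaluate(
    model_uri,                  # placeholder: URI of a previously logged classifier
    eval_df,                    # placeholder: evaluation data containing the target column
    targets="label",
    model_type="classifier",
    dataset_name="my-dataset",
    evaluators="default",
    evaluator_config={
        "pos_label": 1,         # used for binary classification metrics
        "average": "weighted",  # used for multiclass classification metrics
    },
)
print(result.metrics)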
@yxiong @WeichenXu123 Added average and updated the pos_label doc. Could you check?
I think it's fine
LGTM!
@pytest.mark.parametrize("average", [None, "weighted", "macro", "micro"])
def test_evaluation_multiclass_classification_with_average(average):
@yxiong Added a test for average.
Thanks for addressing my comments. LGTM!
"example_count": len(y_true), | ||
"accuracy_score": sk_metrics.accuracy_score(y_true, y_pred), | ||
"recall_score": sk_metrics.recall_score( | ||
y_true, y_pred, average=average, pos_label=pos_label | ||
), | ||
"precision_score": sk_metrics.precision_score( | ||
y_true, y_pred, average=average, pos_label=pos_label | ||
), | ||
"f1_score": sk_metrics.f1_score(y_true, y_pred, average=average, pos_label=pos_label), |
Changed the metric names to align with eval_and_log_metrics.
if not is_binomial:
    metrics["f1_score_micro"] = sk_metrics.f1_score(y, y_pred, average="micro", labels=labels)
    metrics["f1_score_macro"] = sk_metrics.f1_score(y, y_pred, average="macro", labels=labels)

def _get_binary_classifier_metrics(*, y_true, y_pred, y_proba=None, labels=None, pos_label=1):
Enforce keyword form to avoid swapping y_true and y_pred by mistake.
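A tiny illustration (my own, not from the diff) of what the keyword-only marker buys:

def _metrics_sketch(*, y_true, y_pred, pos_label=1):
    # Trimmed-down signature for illustration; the real function computes the metrics.
    return {"pos_label": pos_label}

# _metrics_sketch([0, 1], [1, 1])                # TypeError: positional arguments are rejected,
_metrics_sketch(y_true=[0, 1], y_pred=[1, 1])    # so the two arrays can't be silently swapped.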
@yxiong @dbczumar @WeichenXu123 Here's the comparison of logged metrics between mlflow.evaluate and mlflow.sklearn.eval_and_log_metrics:

import shutil  # needed by divider() below
import mlflow
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.datasets import load_iris, load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from pprint import pprint
import logging
import warnings

logging.getLogger("mlflow").setLevel(logging.CRITICAL)
warnings.simplefilter("ignore")

client = mlflow.MlflowClient()
evaluator_config = {
    "log_metrics_with_dataset_info": False,
    "log_model_explainability": False,
    "metric_prefix": "test_",
}


def divider(title, length=80):
    length = shutil.get_terminal_size(fallback=(80, 24))[0] if length is None else length
    rest = length - len(title) - 2
    left = rest // 2 if rest % 2 else (rest + 1) // 2
    return "\n{} {} {}\n".format("=" * left, title, "=" * (rest - left))


# Binary
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

with mlflow.start_run() as run:
    info = mlflow.sklearn.log_model(model, "model")
    result = mlflow.evaluate(
        info.model_uri,
        X_test,
        targets=y_test,
        model_type="classifier",
        dataset_name="breast_cancer",
        evaluators="default",
        evaluator_config=evaluator_config,
    )
print(divider("mlflow.evaluate - binary classification"))
pprint(client.get_run(run.info.run_id).data.metrics)

with mlflow.start_run() as run:
    mlflow.sklearn.eval_and_log_metrics(model, X_test, y_test, prefix="test_")
print(divider("mlflow.sklearn.eval_and_log_metrics - binary classification"))
pprint(client.get_run(run.info.run_id).data.metrics)

# Multi-class
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

with mlflow.start_run() as run:
    info = mlflow.sklearn.log_model(model, "model")
    result = mlflow.evaluate(
        info.model_uri,
        X_test,
        targets=y_test,
        model_type="classifier",
        dataset_name="iris",
        evaluators="default",
        evaluator_config=evaluator_config,
    )
print(divider("mlflow.evaluate - multiclass classification"))
pprint(client.get_run(run.info.run_id).data.metrics)

with mlflow.start_run() as run:
    mlflow.sklearn.eval_and_log_metrics(model, X_test, y_test, prefix="test_")
print(divider("eval_and_log_metrics - multiclass classification"))
pprint(client.get_run(run.info.run_id).data.metrics)

# Regression
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = LinearRegression().fit(X_train, y_train)

with mlflow.start_run() as run:
    info = mlflow.sklearn.log_model(model, "model")
    result = mlflow.evaluate(
        info.model_uri,
        X_test,
        targets=y_test,
        model_type="regressor",
        dataset_name="diabetes",
        evaluators="default",
        evaluator_config=evaluator_config,
    )
print(divider("mlflow.evaluate - regression"))
pprint(client.get_run(run.info.run_id).data.metrics)

with mlflow.start_run() as run:
    mlflow.sklearn.eval_and_log_metrics(model, X_test, y_test, prefix="test_")
print(divider("mlflow.sklearn.eval_and_log_metrics - regression"))
pprint(client.get_run(run.info.run_id).data.metrics)
We still have a few differences:
@yxiong Once this PR is merged, the difference between
I think that's ok. @jimmyxu-db Please take note of this naming difference during our migration.
* Add pos_label to mlflow.evaluate
* fix dataset_name
* fix _get_classifier_per_class_metrics_collection_df
* fix failed tests
* use evaluator_config
* remove average
* revert
* remove pos_label
* revert changes in examples/pipelines/sklearn_regression
* default value
* add **
* Update mlflow/models/evaluation/base.py
* fix metrics to log in default evaluator
* fix doc
* mention default value
* add _score
* fix
* add average and sett pos_label to 1 by default
* remove _get_classifier_metrics
* fix doc
* refactor
* add test
* rename dataset
* fix
* test pos_label=None
* fix ternary
* fix
* fix tests
* remove redundant function parameters

Signed-off-by: harupy <hkawamura0130@gmail.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Add pos_label to mlflow.evaluate to provide a way to specify the positive label to use when computing metrics for binary classification.

How is this patch tested?
I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.
Unit tests
Manual test
Does this PR change the documentation?

See the Details link on the Preview docs check.

Release Notes
Is this a user-facing change?
Add pos_label to mlflow.evaluate to specify the positive label to use when computing metrics for binary classification.

What component(s), interfaces, languages, and integrations does this PR affect?
Components

- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface

- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language

- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations

- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes