Use `iloc` when computing faithfulness metric #11117

harupy · 2024-02-14T03:30:13Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11117/merge

Checkout with GitHub CLI

gh pr checkout 11117

Related Issues/PRs

Resolve #11108

What changes are proposed in this pull request?

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

github-actions · 2024-02-14T03:30:32Z

Documentation preview for 83ccbdc will be available when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/7897503095.

harupy · 2024-02-14T04:00:03Z

mlflow/metrics/genai/genai_metric.py

@@ -31,7 +31,7 @@ def _format_args_string(grading_context_columns: Optional[List[str]], eval_value
    args_dict = {}
    for arg in grading_context_columns:
        if arg in eval_values:
-            args_dict[arg] = eval_values[arg][indx]
+            args_dict[arg] = eval_values[arg].iloc[indx]


It looks like the existing code assumes that `inputs`, `outputs`, and `eval_values[arg]` share the same index, and it throws when they don't.

mlflow/mlflow/metrics/genai/genai_metric.py

Lines 277 to 300 in cc184ee

grading_payloads = []
for indx, (input, output) in enumerate(zip(inputs, outputs)):
    try:
        arg_string = _format_args_string(grading_context_columns, eval_values, indx)
    except Exception as e:
        raise MlflowException(
            f"Values for grading_context_columns are malformed and cannot be "
            f"formatted into a prompt for metric '{name}'.\n"
            f"Required columns: {grading_context_columns}\n"
            f"Values: {eval_values}\n"
            f"Error: {e!r}\n"
            f"Please check the following: \n"
            "- predictions and targets (if required) are provided correctly\n"
            "- grading_context_columns are mapped correctly using the evaluator_config "
            "parameter\n"
            "- input and output data are formatted correctly."
        )
    grading_payloads.append(
        evaluation_context["eval_prompt"].format(
            input=(input if include_input else None),
            output=output,
            grading_context_columns=arg_string,
        )
    )

In the new code, they don't have to have the same indices.
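A minimal sketch of the idea behind the fix, assuming a simplified helper: `pick_context_values` is a hypothetical stand-in for `_format_args_string` that returns a dict instead of the formatted prompt string. The `isinstance` branch is an assumption inspired by the "if pandas.Series" commit, not the merged code.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def pick_context_values(
    grading_context_columns: Optional[List[str]],
    eval_values: Dict[str, Any],
    indx: int,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the indx-th value of each context column."""
    args_dict = {}
    for arg in grading_context_columns or []:
        if arg in eval_values:
            values = eval_values[arg]
            if isinstance(values, pd.Series):
                # Position-based lookup: the Series index labels
                # (e.g. left over from sampling) no longer matter.
                args_dict[arg] = values.iloc[indx]
            else:
                # Plain lists are already indexed by position.
                args_dict[arg] = values[indx]
    return args_dict


# A Series whose index does not start at 0, as produced by sampling.
context = pd.Series(["ctx a", "ctx b"], index=[7, 3])
print(pick_context_values(["context"], {"context": context}, 0))
# {'context': 'ctx a'}
```

Because `indx` comes from `enumerate`, it is always a position, so `iloc` is the lookup that matches its meaning.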


B-Step62 · 2024-02-14T07:43:41Z

tests/metrics/genai/test_genai_metrics.py

@@ -769,6 +769,14 @@ def test_faithfulness_metric():
            examples=[mlflow_example],
        )

+    faithfulness_metric.eval_fn(
+        # Inputs with different indices


nit: I guess the error is rather caused by the index not starting from 0?

for indx, (input, output) in enumerate(zip(inputs, outputs)):
    try:
        arg_string = _format_args_string(grading_context_columns, eval_values, indx)

`indx` always starts from 0 (because we use `enumerate`), so `eval_values[arg][indx]` raises a `KeyError` on the first iteration if the Series has no index label 0.
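A small reproduction of that failure mode, assuming a toy DataFrame: `df.iloc[[2, 3]]` stands in for `df.sample(...)` so the surviving labels are deterministic. The column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"inputs": ["q0", "q1", "q2", "q3"], "context": ["c0", "c1", "c2", "c3"]}
)
# Simulated sample: rows keep their original labels 2 and 3, so label 0 is gone.
sampled = df.iloc[[2, 3]]
context = sampled["context"]

label_errors = []
positional = []
for indx, _ in enumerate(sampled["inputs"]):
    try:
        context[indx]  # label-based on an integer index: looks for label 0, 1, ...
    except KeyError:
        label_errors.append(indx)
    positional.append(context.iloc[indx])  # position-based: always valid here

print(label_errors, positional)
# [0, 1] ['c2', 'c3']
```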

Exactly. `reset_index(drop=True)` should also work, as @ai-learner-00 explained in #11108.
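A sketch of that caller-side workaround, assuming the user resets the index of the sampled DataFrame before passing it to evaluation. The data is illustrative; `drop=True` discards the old labels instead of keeping them as a new `index` column.

```python
import pandas as pd

df = pd.DataFrame(
    {"inputs": ["q0", "q1", "q2", "q3"], "context": ["c0", "c1", "c2", "c3"]}
)
sampled = df.iloc[[2, 3]]  # labels 2 and 3, as after sampling

# Renumber rows 0..n-1 so label-based and position-based lookups agree again.
normalized = sampled.reset_index(drop=True)
print(list(normalized.index))  # [0, 1]
print(normalized["context"][0])  # c2
```

The `iloc` change makes this step unnecessary on the library side, but resetting the index remains a harmless workaround for older versions.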

Yup, I think the current solution is fine, please feel free to merge. I just wanted to make sure the comment above is accurate :)

B-Step62

LGTM!


Use iloc

5a82b63

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

github-actions bot added the rn/none label (List under Small Changes in Changelogs.) Feb 14, 2024

harupy mentioned this pull request Feb 14, 2024

[BUG] Metric 'faithfulness': Error: Values for grading_context_columns are malformed when sampling from a pandas dataframe #11108

Closed

23 tasks

harupy commented Feb 14, 2024


harupy requested a review from mlflow-automation February 14, 2024 04:02

mlflow-automation requested review from B-Step62, BenWilson2, daniellok-db, serena-ruan and WeichenXu123 and removed request for mlflow-automation February 14, 2024 04:02

harupy added 2 commits February 14, 2024 13:12

if pandas.Series

1cba6ff

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

Import pandass

83ccbdc

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

B-Step62 reviewed Feb 14, 2024


B-Step62 approved these changes Feb 14, 2024


harupy requested a review from B-Step62 February 14, 2024 09:44

harupy merged commit a030e40 into mlflow:master Feb 14, 2024
37 checks passed

harupy deleted the use-iloc branch February 14, 2024 10:42

annzhang-db pushed a commit to annzhang-db/mlflow that referenced this pull request Feb 14, 2024

Use iloc when computing faithfulness metric (mlflow#11117)

20d0367

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

sateeshmannar pushed a commit to StateFarmIns/mlflow that referenced this pull request Feb 20, 2024

Use iloc when computing faithfulness metric (mlflow#11117)

c755aa9

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

artjen pushed a commit to artjen/mlflow that referenced this pull request Mar 26, 2024

Use iloc when computing faithfulness metric (mlflow#11117)

054ae64

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com> Signed-off-by: Arthur Jenoudet <arthur.jenoudet@databricks.com>
