Add tests for pyfunc predict and serving #10192

serena-ruan · 2023-10-27T09:39:28Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10192/merge

Checkout with GitHub CLI

gh pr checkout 10192

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add more tests that pass on the master branch before merging the llm_signature branch, to make sure the merge doesn't break existing behavior.

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Recording a strange behavior I found while writing this PR:

Code to repro

import mlflow
from mlflow.models.signature import infer_signature


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input, params=None):
        return model_input

data1 = {"query": ["sentence_1", "sentence_2"]}
data2 = [{"query": "sentence"}, {"query": "sentence"}]
signature1 = infer_signature(data1)
print(f"signature for '{data1}': {signature1}")
# signature for '{'query': ['sentence_1', 'sentence_2']}': inputs: 
#   ['query': string]
# outputs: 
#   None
# params: 
#   None

signature2 = infer_signature(data2)
print(f"signature for '{data2}': {signature2}")
# signature for '[{'query': 'sentence'}, {'query': 'sentence'}]': inputs: 
#   ['query': string]
# outputs: 
#   None
# params: 
#   None

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=MyModel(),
        artifact_path="test_model",
        signature=signature1,
    )
print(f"model_uri: {model_info.model_uri}")

loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
result1 = loaded_model.predict(data1)
print(f"prediction for '{data1}': {result1}")
# prediction for '{'query': ['sentence_1', 'sentence_2']}':                       query
# 0  [sentence_1, sentence_2]

result2 = loaded_model.predict(data2)
print(f"prediction for '{data2}': {result2}")
# prediction for '[{'query': 'sentence'}, {'query': 'sentence'}]':       query
# 0  sentence
# 1  sentence

Serve the model, and call the REST API
Result:

// for data1
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"inputs": {"query": ["sentence_1", "sentence_2"]}}'
>> {"predictions": [{"query": "sentence_1"}, {"query": "sentence_2"}]}

// for data2
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"instances": [{"query": "sentence"}, {"query": "sentence"}]}'
>> {"predictions": [{"query": "sentence"}, {"query": "sentence"}]}

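For data1, the serving endpoint's result is not consistent with the batch inference result: batch inference returns a single row whose value is the whole list, while serving returns one row per sentence. For data2, the two results are consistent.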
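
One way to picture the difference is to look at the two tabular readings of data1. The sketch below uses only pandas and is not MLflow's internal conversion code. It only shows that reading the dictionary as a single record gives one row holding a list (the shape of the batch result above), while reading it as column-oriented data gives one row per sentence (the shape of the serving result). The variable names are illustrative.

```python
import pandas as pd

data1 = {"query": ["sentence_1", "sentence_2"]}

# Reading 1: the whole dict is one record, so the list stays in a single cell.
as_single_record = pd.DataFrame([data1])

# Reading 2: the dict maps column names to columns, so each list item is a row.
as_columns = pd.DataFrame(data1)

print(as_single_record.shape)
print(as_columns.to_dict(orient="records"))
```

Under this reading, data2 is already a list of records, so both interpretations agree on two rows, which matches the consistent results reported for it.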

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>

github-actions · 2023-10-27T09:39:50Z

Documentation preview for 2bb6e63 will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/6665530088.

serena-ruan · 2023-10-27T09:40:02Z

mlflow/models/utils.py

+                    isinstance(value, np.ndarray) and value.dtype.type == np.str_
+                    # size & shape constraint makes some data batch inference result not
+                    # consistent with serving result.
+                    and value.size == 1 and value.shape == ()
                    for value in pf_input.values()


This part corresponds to the problem I described in the PR description.
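
To make the constraint concrete, here is a small sketch of which NumPy values satisfy the condition in the diff above. The helper name `is_scalar_string_array` is hypothetical and only wraps the same four checks. Note that `value.shape == ()` already implies `value.size == 1`, so only 0-d string arrays pass.

```python
import numpy as np


def is_scalar_string_array(value):
    # Hypothetical helper: the same checks as the condition in the diff,
    # applied to a single value.
    return (
        isinstance(value, np.ndarray)
        and value.dtype.type == np.str_
        and value.size == 1
        and value.shape == ()
    )


scalar = np.array("sentence")  # 0-d array, shape ()
one_item = np.array(["sentence"])  # 1-d array, shape (1,)
two_items = np.array(["sentence_1", "sentence_2"])  # 1-d array, shape (2,)

for name, value in [("scalar", scalar), ("one_item", one_item), ("two_items", two_items)]:
    print(name, value.shape, is_scalar_string_array(value))
```

A value such as the two-sentence list in data1 therefore does not match the condition, which is consistent with it being handled differently from a single string.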

harupy · 2023-10-27T11:31:21Z

mlflow/utils/proto_json_utils.py

@@ -365,10 +366,15 @@ def parse_tf_serving_input(inp_dict, schema=None):
    import numpy as np

    def cast_schema_type(input_data):
+        input_data = deepcopy(input_data)


Why do we need to add deepcopy here?

I think we need it: cast_schema_type changes input_data in place if it's a dictionary or list.
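
The concern can be illustrated without MLflow. In the sketch below, `cast_in_place` is a hypothetical stand-in for a casting step that overwrites values inside the container it receives; it is not the actual `cast_schema_type` code. Wrapping the call in `deepcopy` keeps the caller's object unchanged, including nested dictionaries that a shallow copy would still share.

```python
from copy import deepcopy


def cast_in_place(rows):
    # Hypothetical casting step: overwrites each value in the nested dicts.
    for row in rows:
        for key, value in row.items():
            row[key] = str(value)
    return rows


def cast_on_copy(rows):
    # Same cast, applied to a deep copy so the caller's data is untouched.
    return cast_in_place(deepcopy(rows))


original = [{"query": 1}, {"query": 2}]

casted = cast_on_copy(original)
after_copy_cast = deepcopy(original)  # snapshot: original is still integers

cast_in_place(original)  # mutates the caller's list of dicts
print(after_copy_cast, casted, original)
```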

harupy

LGTM!

add tests for pyfunc predict and serving

2bb6e63

Signed-off-by: Serena Ruan <serena.rxy@gmail.com>

github-actions bot added the rn/none label Oct 27, 2023

serena-ruan requested review from harupy, BenWilson2 and dbczumar October 27, 2023 09:41

harupy reviewed Oct 27, 2023

harupy approved these changes Oct 30, 2023

serena-ruan merged commit dc070be into mlflow:master Oct 30, 2023
39 checks passed

serena-ruan deleted the add_scoring_tests branch October 30, 2023 08:10

KonakanchiSwathi pushed a commit to KonakanchiSwathi/mlflow that referenced this pull request Nov 29, 2023

Add tests for pyfunc predict and serving (mlflow#10192)

14bdb7f

Signed-off-by: Serena Ruan <serena.rxy@gmail.com> Signed-off-by: swathi <konakanchi.swathi@gmail.com>
