Support optional inputs in model signatures #8438
Conversation
Documentation preview for 96030fb will be available here when this CircleCI job completes successfully. More info
Could you update the PR description to attach example code that shows the case your PR supports?
@WeichenXu123 PR not ready yet. I'll have all that in when requesting review.
if input_schema and len(input_schema.optional_input_names()) > 0:
    raise MlflowException(
        message="Cannot apply UDF without column names specified when"
        " model signature contains optional columns.",
        error_code=INVALID_PARAMETER_VALUE,
    )
Automatic parameter filling relies on the model signature to determine which columns to pass when the UDF is applied to the dataframe.
With optional columns, we can neither:
- include any of them, as that would implicitly require those columns to exist in the dataframe, raising an error within pyspark, nor
- exclude any of them, as they would then never be selected from the dataframe at all.
Given this, it seems reasonable to disable this convenience in this case. Users can still pass the list of columns to the udf manually, as follows:
import pandas as pd
import mlflow
from mlflow.models import ModelSignature

test_signature = {
    "inputs": '[{"name": "a", "type": "long"}, {"name": "b", "type": "long"}, {"name": "c", "type": "long", "optional": "True"}]',
}
signature = ModelSignature.from_dict(test_signature)
...
udf = mlflow.pyfunc.spark_udf(...)
data = spark.createDataFrame(
    pd.DataFrame(columns=["a", "b"], data={"a": [1, 2], "b": [2, 3]})
)
res = data.withColumn("response", udf(*data.columns))  # <-- calling udf() with no columns would throw an exception
Makes sense!
LGTM
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
Add an optional boolean parameter to ColSpec to specify whether the column in question is required for model inference or can be omitted, and update the rest of the code accordingly. optional defaults to False for backwards compatibility.

Example usage (pyfunc):
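(The original snippet was not captured in this copy of the PR description; the sketch below is illustrative only. It assumes a trivial EchoModel and column names "a"/"b", which are hypothetical, and uses the optional keyword on ColSpec introduced by this PR.)

import pandas as pd
import mlflow
from mlflow.models import ModelSignature
from mlflow.types import Schema, ColSpec

# Hypothetical model that simply echoes its input back.
class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return model_input

# Signature with a required column "a" and an optional column "b"
# (the optional flag on ColSpec is the parameter added by this PR).
input_schema = Schema(
    [
        ColSpec("long", "a"),
        ColSpec("long", "b", optional=True),
    ]
)
signature = ModelSignature(inputs=input_schema)

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        "model", python_model=EchoModel(), signature=signature
    )

loaded = mlflow.pyfunc.load_model(model_info.model_uri)

# "b" is optional, so a dataframe containing only "a" passes schema enforcement.
loaded.predict(pd.DataFrame({"a": [1, 2]}))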
Example (spark udf):
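(This snippet was also lost in extraction; a minimal sketch under the same assumptions, reusing the hypothetical model_info logged in the pyfunc example above.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

udf = mlflow.pyfunc.spark_udf(spark, model_info.model_uri, result_type="long")

# The dataframe omits the optional column "b".
df = spark.createDataFrame(pd.DataFrame({"a": [1, 2]}))

# Column names must be passed explicitly when the signature contains
# optional columns; calling udf() with no arguments raises an MlflowException.
res = df.withColumn("response", udf(*df.columns))
res.show()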
How is this patch tested?
Does this PR change the documentation?
Release Notes
Optional input columns can now be specified in model signatures. These columns can be omitted from input dataframes at prediction time.
Is this a user-facing change?
Optional input columns can now be specified in model signatures. These columns can be omitted from input dataframes at prediction time.