Convert transformers scalar string output to list of strings for batch inference #8546

BenWilson2 · 2023-05-26T21:28:38Z

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Changes transformers models that previously returned a scalar str to return a single-element List[str] instead, so that the output shape supports batch inference processing.
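As a rough illustration of the behavior change described above, the sketch below wraps a scalar string in a single-element list and leaves list outputs untouched. The helper name `ensure_list_output` is hypothetical and is not taken from the MLflow code base; it only shows the shape of the conversion, not the actual implementation in this PR.

```python
from typing import Any, List, Union


def ensure_list_output(output: Union[str, List[Any]]) -> List[Any]:
    """Hypothetical helper: wrap a scalar string in a single-element list."""
    if isinstance(output, str):
        # A bare string becomes a one-element list, matching the batch shape.
        return [output]
    # Lists (already batch-shaped) pass through unchanged.
    return output


# A single prediction and a batch prediction now share the same container type.
single = ensure_list_output("MLflow is great")
batch = ensure_list_output(["first answer", "second answer"])
```

With this shape, downstream code can iterate over the result without first checking whether it received a str or a list.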

How is this patch tested?

Existing unit/integration tests
New unit/integration tests
Manual tests (describe details, including test results, below)

Does this PR change the documentation?

No. You can skip the rest of this section.
Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

Changes transformers models that previously returned a scalar str to return a single-element List[str] instead, so that the output shape supports batch inference processing.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

dbczumar · 2023-05-26T23:16:19Z

tests/transformers/test_transformers_model_export.py

-                "inputs": '[{"name": "sequences", "type": "string"}, {"name": '
-                '"candidate_labels", "type": "string"}, {"name": '
-                '"hypothesis_template", "type": "string"}]',
-                "outputs": '[{"name": "sequence", "type": "string"}, {"name": "labels", '
-                '"type": "string"}, {"name": "scores", "type": "double"}]',


To confirm, this is all just no-op formatting, right? No fundamental reason for this particular key ordering?

BenWilson2 · 2023-05-29

Correct. I'm going to file a follow-up PR next week to convert all of these to dicts instead of JSON-encoded dicts, to minimize the chance that arbitrary key ordering causes issues in these tests as well.
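The ordering concern in this thread can be sketched as follows: two JSON-encoded signature strings that differ only in key order fail a raw string comparison but compare equal once parsed. The strings below are abbreviated, illustrative variants of the ones in the diff, not the actual test fixtures.

```python
import json

# Same schema, different key order inside each column spec (illustrative values).
expected = '[{"name": "sequence", "type": "string"}, {"name": "scores", "type": "double"}]'
actual = '[{"type": "string", "name": "sequence"}, {"type": "double", "name": "scores"}]'

# Comparing the raw strings is sensitive to key order.
strings_match = expected == actual

# Comparing the parsed structures ignores key order within each dict.
parsed_match = json.loads(expected) == json.loads(actual)
```

Note that the order of the list elements (the columns themselves) still matters after parsing; only the key order within each dict is ignored.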

dbczumar · 2023-05-26T23:17:44Z

docs/source/models.rst

@@ -2528,7 +2528,7 @@ to formats that are compatible with json serialization and casting to Pandas Dat
    types that can be loaded as ``pyfunc``.

    In the current version, text-based large language
-    models are supported for use with ``pyfunc``, while computer vision, audio, multi-modal, timeseries,


Shall we move audio next to text-based? For example: "large language models for text and audio processing are supported..."

BenWilson2 · 2023-05-29

Ah, good catch. Updated!

dbczumar

LGTM! Thanks @BenWilson2!

mlflow-automation · 2023-05-26T23:47:37Z

Documentation preview for b55838f will be available here when this CircleCI job completes successfully.

More info

Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/5112968146.

WeichenXu123

LGTM after addressing @dbczumar's comments :)

force all scalar outputs to be lists for transformers

b7eccbd

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 requested review from harupy, WeichenXu123, dbczumar, serena-ruan and mengxr May 26, 2023 21:29

dbczumar reviewed May 26, 2023

dbczumar approved these changes May 26, 2023

github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) and rn/bug-fix (Mention under Bug Fixes in Changelogs) labels May 26, 2023

WeichenXu123 approved these changes May 29, 2023

PR feedback

b55838f

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

BenWilson2 merged commit 6168589 into mlflow:master May 30, 2023
35 checks passed

BenWilson2 deleted the adjust-str-return-type branch May 30, 2023 13:12
