Convert transformers scalar string output to list of strings for batch inference #8546
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
"inputs": '[{"name": "sequences", "type": "string"}, {"name": '
'"candidate_labels", "type": "string"}, {"name": '
'"hypothesis_template", "type": "string"}]',
"outputs": '[{"name": "sequence", "type": "string"}, {"name": "labels", '
'"type": "string"}, {"name": "scores", "type": "double"}]',
To confirm, this is all just no-op formatting, right? No fundamental reason for this particular key ordering?
Correct. I'm going to file a follow-up PR next week to convert all of these to dicts instead of JSON-encoded dicts, to minimize the chances of arbitrary ordering creating issues in these tests as well.
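To illustrate the ordering concern, here is a minimal sketch (not the actual MLflow test code) of why comparing JSON-encoded strings is fragile while comparing parsed structures is not. The helper name `schemas_equal` and the two sample strings are hypothetical and exist only for this example.

```python
import json

# Two JSON-encoded schemas with identical content but different key order.
expected = (
    '[{"name": "sequences", "type": "string"}, '
    '{"name": "candidate_labels", "type": "string"}]'
)
actual = (
    '[{"type": "string", "name": "sequences"}, '
    '{"type": "string", "name": "candidate_labels"}]'
)


def schemas_equal(a: str, b: str) -> bool:
    """Compare two JSON-encoded schemas after parsing (hypothetical helper).

    Dict comparison ignores key order; the order of list entries still matters.
    """
    return json.loads(a) == json.loads(b)


# Raw string comparison is sensitive to key order.
print(expected == actual)
# Comparison of the parsed structures is not.
print(schemas_equal(expected, actual))
```

Storing the expected values as dicts from the start, as proposed above, would remove the need for the parsing step in the tests.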
@@ -2528,7 +2528,7 @@ to formats that are compatible with json serialization and casting to Pandas Dat
types that can be loaded as ``pyfunc``.

In the current version, text-based large language
models are supported for use with ``pyfunc``, while computer vision, audio, multi-modal, timeseries,
Shall we move "audio" next to "text-based"? For example: "large language models for text and audio processing are supported..."
ah good catch. Updated!
LGTM! Thanks @BenWilson2 !
Documentation preview for b55838f will be available here when this CircleCI job completes successfully.
LGTM after addressing @dbczumar 's comments :)
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Changes the output of transformers models that previously would return str to return List[str] with a single element to support batch inference processing.
How is this patch tested?
Does this PR change the documentation?
Release Notes
Is this a user-facing change?
Changes the output of transformers models that previously would return str to return List[str] with a single element to support batch inference processing.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes
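For readers skimming the description, the output change can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the function name `ensure_list_output` is hypothetical, and the real conversion lives inside the transformers flavor's pyfunc wrapper.

```python
from typing import List, Union


def ensure_list_output(output: Union[str, List[str]]) -> List[str]:
    """Wrap a scalar string in a single-element list (hypothetical helper).

    List outputs are passed through unchanged, so callers can treat
    single-input and batch results the same way.
    """
    if isinstance(output, str):
        return [output]
    return list(output)


# A scalar result becomes a list with one element.
print(ensure_list_output("a translated sentence"))  # ['a translated sentence']
# A batch result is left as a list.
print(ensure_list_output(["first", "second"]))  # ['first', 'second']
```

Callers that previously indexed into or compared against a bare string would need to read the first element of the returned list instead.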