Add sentence-transformers as a named flavor #8479

Merged: 5 commits merged on May 23, 2023

Conversation

BenWilson2
Member

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Add basic serialization (save_model, load_model, log_model) and signature default assignment for the sentence-transformers package.
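To make the save/load round trip concrete, here is a minimal stdlib-only sketch of the pattern. It is an illustration under stated assumptions, not the code from this PR: `ToyEncoder`, the flavor name, the `model.data` directory name, and the JSON metadata file are all hypothetical stand-ins (MLflow itself writes a YAML `MLmodel` file and delegates model data to the library's own `save`).

```python
import json
import tempfile
from pathlib import Path

FLAVOR_NAME = "toy_sentence_encoder"  # hypothetical flavor name, not MLflow's
MODEL_DATA_DIR = "model.data"  # hypothetical sub-directory for model files


class ToyEncoder:
    """Hypothetical stand-in for a sentence-transformers model."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    def encode(self, sentences):
        # One count per vocabulary word, per sentence.
        return [[s.lower().split().count(w) for w in self.vocab] for s in sentences]

    def save(self, path):
        Path(path).mkdir(parents=True, exist_ok=True)
        (Path(path) / "vocab.json").write_text(json.dumps(self.vocab))

    @classmethod
    def load(cls, path):
        return cls(json.loads((Path(path) / "vocab.json").read_text()))


def save_model(model, path):
    """Write the model data plus a small metadata file naming the flavor."""
    root = Path(path)
    root.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing model
    model.save(str(root / MODEL_DATA_DIR))
    meta = {"flavors": {FLAVOR_NAME: {"data": MODEL_DATA_DIR}}}
    (root / "MLmodel.json").write_text(json.dumps(meta, indent=2))


def load_model(path):
    """Read the metadata file, locate the model data, and rebuild the model."""
    root = Path(path)
    meta = json.loads((root / "MLmodel.json").read_text())
    conf = meta["flavors"][FLAVOR_NAME]
    return ToyEncoder.load(str(root / conf["data"]))


def round_trip(sentences):
    """Save a toy model to a temporary directory, reload it, and encode."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "saved"
        save_model(ToyEncoder(["hello", "world"]), target)
        return load_model(target).encode(sentences)
```

The point of the metadata file is that `load_model` needs only the directory path: everything else (which flavor, where the model data lives) is recorded at save time.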

How is this patch tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests (describe details, including test results, below)

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Introduce a sentence-transformers flavor.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) and rn/feature (Mention under Features in Changelogs) labels on May 20, 2023
Collaborator

mlflow-automation commented May 20, 2023

Documentation preview for ae418dd will be available here when this CircleCI job completes successfully.

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
extra_pip_requirements: Optional[Union[List[str], str]] = None,
conda_env=None,
metadata: Dict[str, Any] = None,
**kwargs,
Collaborator

The `**kwargs` argument is not used. Shall we remove it?

Member Author

Thanks. I forgot to remove that while evaluating ser/deser behavior.

:param model: A trained ``sentence-transformers`` model.
:param path: Local path destination for the serialized model to be saved.
:param inference_config:
A dict of valid overrides that can be applied to a ``sentence-transformer`` model instance
Collaborator

Suggested change
- A dict of valid overrides that can be applied to a ``sentence-transformer`` model instance
+ A dict of valid inference configs that can be applied to a ``sentence-transformer`` model instance and override default inference configs


model.save(str(model_data_path))

pyfunc.add_to_model(
Collaborator

We should also define def _load_pyfunc(path) for this flavor.

Member Author

We'll have a follow-on PR for pyfunc implementation
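For readers unfamiliar with the convention under discussion: a flavor module typically exposes `_load_pyfunc(path)`, which returns an object with a `predict` method so the model can be served through the generic pyfunc interface. A rough sketch of that shape follows. It is not the implementation from the follow-on PR; `ToyEncoder` and `_ToyEncoderWrapper` are hypothetical stand-ins used so the example runs without external packages.

```python
import json
import tempfile
from pathlib import Path


class ToyEncoder:
    """Hypothetical stand-in for a sentence-transformers model."""

    def __init__(self, vocab):
        self.vocab = list(vocab)

    def encode(self, sentences):
        # One count per vocabulary word, per sentence.
        return [[s.lower().split().count(w) for w in self.vocab] for s in sentences]

    def save(self, path):
        Path(path).mkdir(parents=True, exist_ok=True)
        (Path(path) / "vocab.json").write_text(json.dumps(self.vocab))

    @classmethod
    def load(cls, path):
        return cls(json.loads((Path(path) / "vocab.json").read_text()))


class _ToyEncoderWrapper:
    """Hypothetical wrapper that adapts `encode` to a generic `predict` call."""

    def __init__(self, model):
        self.model = model

    def predict(self, data):
        # Accept a single string or an iterable of strings.
        sentences = [data] if isinstance(data, str) else list(data)
        return self.model.encode(sentences)


def _load_pyfunc(path):
    """Load the native model from `path` and wrap it for generic inference."""
    return _ToyEncoderWrapper(ToyEncoder.load(path))


def demo():
    """Save a toy model, load it through `_load_pyfunc`, and run `predict`."""
    with tempfile.TemporaryDirectory() as tmp:
        ToyEncoder(["a", "b"]).save(tmp)
        return _load_pyfunc(tmp).predict("a b a")
```

Keeping the wrapper separate from the native model means `load_model` can still return the library's own object, while `_load_pyfunc` serves callers that only know about `predict`.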

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Collaborator

WeichenXu123 left a comment

LGTM

BenWilson2 merged commit fcbd01e into mlflow:master on May 23, 2023
26 checks passed
BenWilson2 deleted the sentence-transformers branch on May 23, 2023 at 13:01
BenWilson2 added a commit to BenWilson2/mlflow that referenced this pull request May 23, 2023
* WIP

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* Add basic serialization functionality for sentence-transformers

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* fix docs linting

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* fix lint and test

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

* remove useless kwargs entries

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>

---------

Signed-off-by: Ben Wilson <benjamin.wilson@databricks.com>
Labels
area/models (MLmodel format, model serialization/deserialization, flavors), rn/feature (Mention under Features in Changelogs)
3 participants