Add model signature #2698

Merged

Conversation

tomasatdatabricks
Contributor

@tomasatdatabricks tomasatdatabricks commented Apr 13, 2020

What changes are proposed in this pull request?

This PR introduces signatures and input examples for MLflow models and allows users to log them with their models.

A model signature defines the schema of a model's inputs and outputs. The schema is based on a simplified data
frame model: a list of columns, each with a mandatory type and an optional name. The currently
supported data types are the basic scalar types (bool, integer, long, float, double), string, and binary string. Neither arrays (including multidimensional arrays) nor structured types (struct, map) are supported at this time.
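
As a rough illustration of the column-based schema described above, here is a standalone sketch that uses only the Python standard library. It is not the code added by this PR: the `Column` class, the `schema_to_json` helper, the type names, and the JSON layout are all hypothetical and chosen only to show a list of typed, optionally named columns.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Scalar types listed in the PR description. The exact names are illustrative.
SUPPORTED_TYPES = {"bool", "integer", "long", "float", "double", "string", "binary"}


@dataclass(frozen=True)
class Column:
    """Hypothetical column spec: the type is mandatory, the name is optional."""

    type: str
    name: Optional[str] = None

    def __post_init__(self):
        # Arrays and structured types (struct, map) are rejected here.
        if self.type not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported column type: {self.type!r}")

    def to_dict(self) -> dict:
        if self.name is None:
            return {"type": self.type}
        return {"name": self.name, "type": self.type}


def schema_to_json(columns: List[Column]) -> str:
    """Serialize a schema (an ordered list of columns) to a JSON string."""
    return json.dumps([column.to_dict() for column in columns])


inputs = [Column("double", "sepal_length"), Column("string", "species")]
outputs = [Column("long")]  # an unnamed output column
signature = {"inputs": schema_to_json(inputs), "outputs": schema_to_json(outputs)}
```

Keeping the schema as an ordered list means unnamed columns can still be matched by position.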

An input example specifies one or more valid inputs (observations) to the model. The example is stored
as JSON, so examples are limited to JSON-serializable content. The one exception is binary data, which is
allowed provided the signature is defined: in that case the binary data is base64 encoded before the
output is produced.
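
The base64 step can be sketched as follows. This is a minimal standard-library illustration, not the PR's implementation: the helper names `encode_example` and `decode_example` are hypothetical, and the set of binary column names stands in for what the signature would provide.

```python
import base64
import json


def encode_example(rows, binary_columns):
    """Serialize rows to JSON, base64-encoding the columns known to be binary."""
    encoded_rows = []
    for row in rows:
        encoded_rows.append(
            {
                name: base64.b64encode(value).decode("ascii")
                if name in binary_columns
                else value
                for name, value in row.items()
            }
        )
    return json.dumps(encoded_rows)


def decode_example(payload, binary_columns):
    """Inverse of encode_example: restore bytes for the binary columns."""
    return [
        {
            name: base64.b64decode(value) if name in binary_columns else value
            for name, value in row.items()
        }
        for row in json.loads(payload)
    ]


example = [{"id": 7, "image": b"\x00\x01"}]
payload = encode_example(example, binary_columns={"image"})
```

Without knowing which columns are binary, a reader of the JSON could not tell a base64 string from an ordinary string, which is why the exception depends on the signature being defined.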

Users can construct a schema by hand or infer it from a dataset. Input examples are provided as is
(pandas.DataFrame, numpy.ndarray, dictionary, or list); they are JSON serialized and stored as part of
the model artifacts.
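
To give a feel for what inferring a schema from a dataset involves, here is a simplified sketch over a list of dictionaries. It is an assumption-laden illustration rather than the inference logic in this PR: the `infer_columns` function and the Python-to-schema type mapping are hypothetical, and real inference over pandas or numpy data would work from their dtypes instead.

```python
# Hypothetical mapping from Python value types to schema type names.
_PY_TO_SCHEMA = {
    bool: "bool",
    int: "long",
    float: "double",
    str: "string",
    bytes: "binary",
}


def infer_columns(rows):
    """Infer one (name, type) column per key from a non-empty list of dict rows."""
    if not rows:
        raise ValueError("Cannot infer a schema from an empty dataset")
    columns = {}
    for row in rows:
        for name, value in row.items():
            # Exact type lookup, so True is "bool" and not "long".
            col_type = _PY_TO_SCHEMA.get(type(value))
            if col_type is None:
                raise TypeError(f"Unsupported value type for column {name!r}")
            # A column must keep the same type across all rows.
            if columns.setdefault(name, col_type) != col_type:
                raise TypeError(f"Column {name!r} mixes incompatible types")
    return [{"name": name, "type": col_type} for name, col_type in columns.items()]


inferred = infer_columns([{"x": 1.5, "ok": True, "label": "a"}])
```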

How is this patch tested?

Unit tests in tests/models/test_model_signatures_and_examples.

Release Notes

Added definitions and utilities that allow users to include a model signature definition, as well as example(s) of input data, with their models. Both can be inspected at model deployment time and provide additional information about the model interface. In addition, model signatures can be used by MLflow model deployment, e.g. when parsing data (JSON) or producing typed output (Spark UDF).
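
One way a signature could help when parsing JSON at deployment time is sketched below. This is a hypothetical standard-library example, not MLflow's deployment code: `parse_with_schema`, the `_coerce` helper, and the schema layout (a list of name/type dictionaries) are assumptions made for illustration.

```python
import base64
import json


def _coerce(value, col_type):
    """Check a parsed JSON value against a schema type, converting where safe."""
    if col_type == "binary":
        return base64.b64decode(value)  # binary travels as base64 text
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if col_type in ("float", "double") and is_number:
        return float(value)  # JSON does not distinguish 1 from 1.0
    expected = {"bool": bool, "integer": int, "long": int, "string": str}.get(col_type)
    if expected is None or type(value) is not expected:
        raise TypeError(f"Value {value!r} does not match type {col_type!r}")
    return value


def parse_with_schema(payload, schema):
    """Parse a JSON list of records and validate each column named in the schema."""
    parsed = []
    for row in json.loads(payload):
        parsed.append({col["name"]: _coerce(row[col["name"]], col["type"]) for col in schema})
    return parsed


schema = [{"name": "x", "type": "double"}, {"name": "label", "type": "string"}]
records = parse_with_schema('[{"x": 1, "label": "a"}]', schema)
```

With the schema in hand, the integer `1` in the payload is read as the double the model expects, and a mismatched value fails early instead of reaching the model.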

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Users can now include a model signature definition and/or input example data with models logged with MLflow. These provide additional information to users of the model as well as to model utilities (safer parsing, type checking).

What component(s) does this PR affect?

  • UI
  • CLI
  • API
  • REST-API
  • Examples
  • Docs
  • Tracking
  • Projects
  • Artifacts
  • Models
  • Model Registry
  • Scoring
  • Serving
  • R
  • Java
  • Python

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@tomasatdatabricks tomasatdatabricks changed the title from "[WIP] Add model signature" to "Add model signature" Apr 14, 2020
@aarondav
Contributor

Is this ready for review?

@aarondav aarondav added the needs author feedback Issue is waiting for the author to respond label Apr 15, 2020
@tomasatdatabricks
Contributor Author

It's ready for review. I am still fixing 2 test issues but that is separate.

@aarondav aarondav removed the needs author feedback Issue is waiting for the author to respond label Apr 15, 2020
@aarondav aarondav added the needs author feedback Issue is waiting for the author to respond label Apr 15, 2020
.. automodule:: mlflow.models.utils
    :members:
    :undoc-members:
    :show-inheritance:

Documentation looks weird around the experimental warning:

image

.. automodule:: mlflow.types.schema
    :members:
    :undoc-members:
    :show-inheritance:

Doesn't look production-ready:

image

@tomasatdatabricks tomasatdatabricks merged commit 00164c3 into mlflow:master Apr 24, 2020
@smurching smurching added the rn/feature Mention under Features in Changelogs. label Jun 19, 2020
avflor pushed a commit to avflor/mlflow that referenced this pull request Aug 22, 2020
* Added model signatures and examples.

* wip

* Updated tests, cleanup

* fix

* minor fix

* fix and cleanup

* cleanup

* nits

* Removed changes in example (does not work at this point cause the example installs mlflow from pip)

* Reverted changes in examples.

* fix

* lint

* Updated docstring.

* updated module docstring

* lint

* fix

* lint

* Fixed windows tests.

* lint

* debug

* debug

* debug

* fix

* lint

* lint

* Fix Java Model

* Added a test for model to yaml and model to json.

* lint

* Uncommented nightly

* fix

* Addressed some early review feedback.

* Addressed review comments.

* addressed review comments.

* addressed review comments.

* addressed review comments.

* fix broken import

* Addressed review comments.

* Addressed some review comments

* refactored schema out to the top level

* Updated comments and in code documentation

* fixed imports

* Fixed imports

* lint

* lint

* Renamed input example file to just input_example.json

* Do not output signature as null if there is no signature

* Improve repr of model signature

* Addressed review comments.

* lint

* Fixed tests.

* Fixed docs.

* lint

* lint

* fixed test

* Fixed windows test

* Fixed project test

* Fixed tests

* Addressed review comments.

* Updated docs. Cleaned up saving example.

* addressed review comments.

* Reverted unwanted change.

* Fixed typo in a comment

* lint

* lint

* lint

* Addressed review comments.

* lint