Add model signature #2698
Merged: tomasatdatabricks merged 68 commits into mlflow:master from tomasatdatabricks:tomas_model_signature on Apr 24, 2020
Is this ready for review?
aarondav added the "needs author feedback" label (Issue is waiting for the author to respond) on Apr 15, 2020
It's ready for review. I am still fixing two test issues, but that is separate.
aarondav removed the "needs author feedback" label (Issue is waiting for the author to respond) on Apr 15, 2020
aarondav added the "needs author feedback" label (Issue is waiting for the author to respond) on Apr 15, 2020
aarondav reviewed Apr 23, 2020
.. automodule:: mlflow.models.utils
    :members:
    :undoc-members:
    :show-inheritance:
aarondav reviewed Apr 23, 2020
.. automodule:: mlflow.types.schema
    :members:
    :undoc-members:
    :show-inheritance:
aarondav reviewed Apr 23, 2020
aarondav approved these changes on Apr 24, 2020
avflor pushed a commit to avflor/mlflow that referenced this pull request on Aug 22, 2020
* Added model signatures and examples.
* wip
* Updated tests, cleanup
* fix
* minor fix
* fix and cleanup
* cleanup
* nits
* Removed changes in example (does not work at this point because the example installs mlflow from pip)
* Reverted changes in examples.
* fix
* lint
* Updated docstring.
* updated module docstring
* lint
* fix
* lint
* Fixed windows tests.
* lint
* debug
* debug
* debug
* fix
* lint
* lint
* Fix Java Model
* Added a test for model to yaml and model to json.
* lint
* Uncommented nightly
* fix
* Addressed some early review feedback.
* Addressed review comments.
* addressed review comments.
* addressed review comments.
* addressed review comments.
* fix broken import
* Addressed review comments.
* Addressed some review comments
* refactored schema out to the top level
* Updated comments and in-code documentation
* fixed imports
* Fixed imports
* lint
* lint
* Renamed input example file to just input_example.json
* Do not output signature as null if there is no signature
* Improve repr of model signature
* Addressed review comments.
* lint
* Fixed tests.
* Fixed docs.
* lint
* lint
* fixed test
* Fixed windows test
* Fixed project test
* Fixed tests
* Addressed review comments.
* Updated docs. Cleaned up saving example.
* addressed review comments.
* Reverted unwanted change.
* Fixed typo in a comment
* lint
* lint
* lint
* Addressed review comments.
* lint
What changes are proposed in this pull request?
This PR introduces signatures and input examples for MLflow models and allows users to log them with their models.
A model signature defines the schema of the model's input and output. The schema is based on a simplified data
frame model: a list of columns, each with a mandatory type and an optional name. The currently
supported data types are the basic scalar types (bool, integer, long, float, double), string, and binary string. Arrays (including multidimensional arrays) and structured types (struct, map) are not supported at this time.
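The column-based schema described above can be pictured with a few plain data classes. This is a minimal, hypothetical sketch: `ColumnSpec`, `Schema`, and `Signature` here are illustrative stand-ins, not the classes this PR adds to `mlflow.types.schema`, and the type names are simply the ones listed in the description.

```python
from dataclasses import dataclass
from typing import List, Optional

# Scalar types listed in the PR description; arrays and structs are deliberately absent.
SUPPORTED_TYPES = {"bool", "integer", "long", "float", "double", "string", "binary"}


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical column of the simplified data frame model: mandatory type, optional name."""
    type: str
    name: Optional[str] = None

    def __post_init__(self):
        # Reject anything outside the supported scalar types.
        if self.type not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported column type: {self.type!r}")


@dataclass
class Schema:
    """Hypothetical schema: an ordered list of column specs."""
    columns: List[ColumnSpec]

    def column_names(self) -> List[Optional[str]]:
        return [c.name for c in self.columns]


@dataclass
class Signature:
    """Hypothetical signature pairing an input schema with an optional output schema."""
    inputs: Schema
    outputs: Optional[Schema] = None
```

For example, `Signature(inputs=Schema([ColumnSpec("double", "x"), ColumnSpec("string", "label")]), outputs=Schema([ColumnSpec("double")]))` describes two named input columns and one unnamed output column, while `ColumnSpec("array")` raises `ValueError` because arrays are not supported.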
An input example specifies one or more valid inputs (observations) to the model. The example is stored
as JSON, so examples are limited to JSON-serializable content, with one exception: binary data is
allowed provided the model signature is defined. In that case, the binary data is base64 encoded
before the example is written out.
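The binary-data rule above can be illustrated with the standard library alone. This is a hedged sketch, not MLflow's saving code: `serialize_input_example` is a hypothetical helper, and its `binary_columns` argument stands in for the columns a signature would declare as binary (passing `None` models "no signature").

```python
import base64
import json
from typing import Any, Dict, List, Optional, Set


def serialize_input_example(rows: List[Dict[str, Any]],
                            binary_columns: Optional[Set[str]] = None) -> str:
    """Hypothetical helper: dump example rows to JSON, base64-encoding binary columns."""
    encoded = []
    for row in rows:
        out = {}
        for key, value in row.items():
            if isinstance(value, (bytes, bytearray)):
                # Without a signature declaring this column binary, bytes cannot be stored.
                if binary_columns is None or key not in binary_columns:
                    raise TypeError(f"Column {key!r} holds binary data but no binary column is declared")
                # base64 turns arbitrary bytes into JSON-safe ASCII text.
                value = base64.b64encode(bytes(value)).decode("ascii")
            out[key] = value
        encoded.append(out)
    return json.dumps(encoded)
```

For example, `serialize_input_example([{"x": 1.5, "img": b"\x00\x01"}], {"img"})` returns `'[{"x": 1.5, "img": "AAE="}]'`, while the same call without `{"img"}` raises `TypeError`.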
Users can construct a schema by hand or infer it from a dataset. Input examples are provided as is
(pandas.DataFrame, numpy.ndarray, dictionary, or list), serialized to JSON, and stored as part of
the model artifacts.
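Inferring a schema from a dataset amounts to mapping each column's values onto one of the supported types. The sketch below works on a list of dict rows rather than a pandas.DataFrame so that it runs with the standard library only. `infer_schema` is hypothetical, and the Python-to-type mapping (for example, `int` to `long`) is an assumption made for illustration, not necessarily the mapping MLflow uses.

```python
from typing import Any, Dict, List, Tuple

# Assumed mapping from Python values to the PR's scalar type names.
# bool is checked before int because bool is a subclass of int.
_TYPE_ORDER = [(bool, "bool"), (int, "long"), (float, "double"),
               (str, "string"), ((bytes, bytearray), "binary")]


def infer_column_type(values: List[Any]) -> str:
    """Return the single type name shared by all values, or raise TypeError."""
    inferred = set()
    for v in values:
        for py_type, type_name in _TYPE_ORDER:
            if isinstance(v, py_type):
                inferred.add(type_name)
                break
        else:
            raise TypeError(f"Unsupported value {v!r}")
    if len(inferred) != 1:
        raise TypeError(f"Mixed or missing types: {sorted(inferred)}")
    return inferred.pop()


def infer_schema(rows: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Hypothetical inference: (column name, type name) pairs in first-row column order."""
    if not rows:
        raise ValueError("Cannot infer a schema from an empty dataset")
    names = list(rows[0].keys())
    return [(n, infer_column_type([r[n] for r in rows])) for n in names]
```

For example, `infer_schema([{"age": 31, "score": 0.5}, {"age": 40, "score": 1.0}])` returns `[("age", "long"), ("score", "double")]`. Rejecting mixed columns keeps the sketch strict; a real implementation might instead widen `long` to `double`.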
How is this patch tested?
Unit tests in tests/models/test_model_signatures_and_examples.
Release Notes
Added definitions and utilities that allow users to include a model signature and example(s) of input data with their models. Both can be inspected at model deployment time and provide additional information about the model interface. In addition, model signatures can be used by MLflow model deployment tools, e.g. when parsing JSON input data or producing typed output (Spark UDF).
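To make the "parsing data (JSON)" use concrete, here is a hedged sketch of how a signature could drive typed parsing of JSON records. It is not MLflow's scoring-server code: `parse_records` and its schema format (a list of `(name, type)` pairs) are hypothetical, and binary columns are assumed to arrive base64 encoded, mirroring how input examples are stored.

```python
import base64
import json
from typing import Any, Dict, List, Tuple


def _convert(value: Any, type_name: str, column: str) -> Any:
    """Check one JSON value against a declared column type and coerce it."""
    if type_name == "bool":
        if isinstance(value, bool):
            return value
    elif type_name in ("integer", "long"):
        # JSON has one number type, so accept 3 or 3.0 but not 3.5 or true.
        if not isinstance(value, bool) and (
                isinstance(value, int) or (isinstance(value, float) and value.is_integer())):
            return int(value)
    elif type_name in ("float", "double"):
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    elif type_name == "string":
        if isinstance(value, str):
            return value
    elif type_name == "binary":
        # Assumed wire format: base64 text, as used for stored input examples.
        if isinstance(value, str):
            return base64.b64decode(value, validate=True)
    else:
        raise ValueError(f"Unknown type {type_name!r} for column {column!r}")
    raise TypeError(f"Column {column!r} expects {type_name}, got {value!r}")


def parse_records(payload: str, schema: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Hypothetical parser: decode a JSON list of records and type-check each column."""
    records = json.loads(payload)
    if not isinstance(records, list):
        raise ValueError("Expected a JSON list of records")
    parsed = []
    for record in records:
        missing = [name for name, _ in schema if name not in record]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        parsed.append({name: _convert(record[name], t, name) for name, t in schema})
    return parsed
```

For example, `parse_records('[{"n": 3, "x": 1, "img": "AAE="}]', [("n", "long"), ("x", "double"), ("img", "binary")])` returns `[{"n": 3, "x": 1.0, "img": b"\x00\x01"}]`; the integer `1` is widened to a float because the signature says `double`, and a string in the `n` column would raise `TypeError`.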
Is this a user-facing change?
Users can now include a model signature and/or input example data with models logged with MLflow. These provide additional information to the users of the model as well as to model utilities (safer parsing, type checking).
What component(s) does this PR affect?
How should the PR be classified in the release notes? Choose one:

- rn/breaking-change: The PR will be mentioned in the "Breaking Changes" section
- rn/none: No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/feature: A new user-facing feature worth mentioning in the release notes
- rn/bug-fix: A user-facing bug fix worth mentioning in the release notes
- rn/documentation: A user-facing documentation change worth mentioning in the release notes