Add warning in MLflow pytorch docs to include signature #5347
Add one line in the model signature introduction section and add link to detailed section in the introduction. Signed-off-by: Yogita Mehta <yogita.mehta@databricks.com>
…g sphinx build locally. Signed-off-by: Yogita Mehta <yogita.mehta@databricks.com>
…e while logging model to avoid float precision errors. Signed-off-by: Yogita Mehta <yogita.mehta@databricks.com>
autoformat
Signed-off-by: mlflow-automation <mlflow-automation@users.noreply.github.com>
Good start - a few notes
mlflow/pytorch/__init__.py
Outdated
.. warning::

    Log the model with signature to avoid inference errors. Pytorch float precision default
    is float32, while numpy float precision default is float64. Adding the signature will
I would maybe re-frame this a bit and say:
For models without signatures, the MLflow Model Server relies on the default inferred data type from NumPy. However, PyTorch often expects different defaults, particularly when parsing floats.
Updated as suggested.
One minor nit, but looks great otherwise!
mlflow/pytorch/__init__.py
Outdated
    For models without signatures, the MLflow Model Server relies on the default inferred
    data type from NumPy. However, PyTorch often expects different defaults, particularly
    when parsing floats. Include the signature to ensure that the model is logged with the
    correct data type so that the MLflow model server can correctly provide valid input
Suggested change:
- correct data type so that the MLflow model server can correctly provide valid input
+ correct data type so that the MLflow model server correctly provides valid input
Just to clarify @andreakress - one thing I think would be good to emphasize is that by logging a signature, the user is making it possible for the model server to provide valid input. Correctly inferring the correct data types without the signature is an impossible problem. Maybe just me, but, in your suggested phrasing, it kind of feels like we're saying that there's a bug where it won't provide it correctly.
WDYT of something like this:
If the model is logged without a signature, the MLflow Model Server relies on the default inferred data type from NumPy. However, PyTorch often expects different defaults, particularly when parsing floats. You must include the signature to ensure that the model is logged with the correct data type so that the MLflow model server can correctly provide valid input.
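To make the mismatch concrete, here is a small sketch using NumPy alone. The request payload and variable names are illustrative assumptions, not taken from the MLflow Model Server code. Floats parsed from a request become float64 by default, while a PyTorch model built with default settings typically holds float32 weights, so the input needs an explicit dtype. That dtype is the information a signature records.

```python
import json

import numpy as np

# Hypothetical request body, standing in for what a client might send.
payload = json.loads('{"inputs": [[1.0, 2.5], [3.0, 4.5]]}')

# Without dtype information, NumPy infers float64 for Python floats.
inferred = np.array(payload["inputs"])

# With a declared dtype (what a signature would supply), the data can be cast.
declared_dtype = np.dtype("float32")
cast = inferred.astype(declared_dtype)

print(inferred.dtype, cast.dtype)  # float64 float32
```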
Sounds good!
Thanks for helping iterate on this one!
Approved with one suggestion.
What changes are proposed in this pull request?
Updates the MLflow PyTorch documentation to add a warning recommending that users include a signature when logging a model, to avoid float precision errors.
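As a rough illustration of what the warning recommends, the sketch below logs a PyTorch model together with a signature inferred from float32 example data. The helper names `make_example_batch` and `log_with_signature`, the batch shape, and the artifact path "model" are assumptions for illustration. `infer_signature` and the `signature` argument of `mlflow.pytorch.log_model` are existing MLflow APIs, though exact argument names may differ between MLflow versions.

```python
import numpy as np


def make_example_batch(n_rows=2, n_features=4):
    """Build a float32 batch, matching PyTorch's default float dtype."""
    return np.zeros((n_rows, n_features), dtype=np.float32)


def log_with_signature(model, example_input):
    """Log a PyTorch model with a signature inferred from float32 data.

    Assumes `model` is a torch.nn.Module that accepts a float32 tensor of
    example_input's shape. MLflow and PyTorch are imported lazily so that
    make_example_batch can be used without them installed.
    """
    import mlflow
    import mlflow.pytorch
    import torch
    from mlflow.models.signature import infer_signature

    model.eval()
    with torch.no_grad():
        example_output = model(torch.from_numpy(example_input)).numpy()

    # Both arrays are float32, so the signature records float32 tensor specs.
    signature = infer_signature(example_input, example_output)

    with mlflow.start_run():
        mlflow.pytorch.log_model(model, "model", signature=signature)
    return signature
```

For instance, `log_with_signature(torch.nn.Linear(4, 1), make_example_batch())` would log a small linear model whose signature declares float32 inputs and outputs.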
How is this patch tested?
Building docs locally and verifying the change.
Does this PR change the documentation?
Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the next step, otherwise fix it.
Click Details on the right to open the job page of CircleCI.
Click the Artifacts tab.
Click docs/build/html/index.html.
Release Notes
Is this a user-facing change?
Updates the MLflow PyTorch documentation.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes