Refactor deployment docs and tutorials #10726
Conversation
Documentation preview for 32c1d40 will be available here when this CircleCI job completes successfully.
@@ -124,24 +124,19 @@ MLflow offers support for a variety of deployment targets. For detailed informat
        </a>
      </div>
      <div class="simple-card">
-       <a href="deploy-model-to-ray-serve.html">
+       <a href="../plugins.html#deployment-plugins">
The Ray Serve plugin is not actively maintained. We will probably replace this with a deployment guide for LLMs, but for the time being, just link to the community plugins.
Please open the experiment named "wine-quality" on the left, then click the run named "default-params" in the table.
In this case, you should see parameters including ``alpha`` and ``l1_ratio`` and metrics like ``training_score`` and ``mean_absolute_error_X_test``.
Step 4: Running Hyperparameter Tuning
Is this step necessary?
I initially had the same thought for the original tutorial, but ended up not removing this and positioned this page as an end-to-end tutorial. The main reason is that the pure deployment steps are pretty much covered by the partner's docs, so the main audience I expect here is people who already know about k8s but not much about MLflow. Hence I thought it's not a bad idea to stretch a bit and demonstrate MLflow's capabilities in a realistic scenario - a single training run indeed doesn't look nice in the UI. For people who already have knowledge of MLflow, I added an info box telling them to skip to Step 6 :)
I think that giving the UI some runs is a good idea :)
Sounds good, let's keep this step
=================================
Using MLServer as the Inference Server
--------------------------------------
By default, MLflow deployment uses `Flask <https://flask.palletsprojects.com/en/1.1.x/>`_, a widely used WSGI web application framework for Python, |
It might be nice to provide either a link or a brief description of what WSGI is (it can help to inform why we're saying this about Flask :) ) since the reading audience might not be familiar with the differences between a standard Web Server Gateway Interface and something else (like ASGI or the built-in optimizations for inference serving that MLServer from Seldon has).
What do you think about some brief education for readers on these topics so that they know why we're talking about and supporting MLServer on k8s?
The comparison is briefly described here in the local deployment guide: https://output.circle-artifacts.com/output/job/0576e156-79b5-48d6-a81d-8ceafb03de8b/artifacts/0/docs/build/html/deployment/deploy-model-locally.html#serving-frameworks
But I didn't add low-level details on why, so I will write more specific details there and put a link to it here.
Perfect! :D
MLflow provides an easy-to-use interface for deploying models as a Flask-based inference server. You can deploy the same inference
server to a Kubernetes cluster by containerizing it using the ``mlflow models build-docker`` command. However, this approach may not be scalable
and could be unsuitable for production use cases. Flask is not designed for high performance, and manually managing multiple instances of
Can we say why Flask isn't ideally suited for ML inference (it's blocking), and how async gateways are far better optimized given the potentially long-running nature of inference (depending on the model architecture, size, and optimization of the underlying library) and the scalability issues of a synchronous, blocking web gateway?
Sure! Will add a brief explanation of why.
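For context, the containerization step the excerpt refers to would look roughly like the sketch below. This is a hedged sketch only: it assumes the ``mlflow.models.build_docker`` Python API mirrors the ``mlflow models build-docker`` CLI and exposes an ``enable_mlserver`` flag, and the image name is hypothetical.

import mlflow.models

# Build a Docker image that serves the model; the run ID placeholder matches
# the one used elsewhere in this guide.
mlflow.models.build_docker(
    model_uri="runs:/<run_id_for_your_best_run>/model",
    name="mlflow-wine-model",  # hypothetical image name
    enable_mlserver=True,  # package the MLServer-based server instead of the Flask one (assumed flag)
)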
warnings.filterwarnings("ignore")

alphas = [0.2, 0.5, 1.0]
Grid Search is terrible. Can we do a Random Search instead?
A fun paper to read on the topic: https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
Yeah, no reason not to do it :)
.. code-block:: python

    from itertools import product
Can we use https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html instead? Autologging automatically creates child runs.
Sure (will use RandomizedSearchCV as per above comment)
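Roughly something like the following sketch. It is illustrative only: synthetic data from ``make_regression`` stands in for the wine-quality dataset used in the tutorial, and the experiment/run names are placeholders.

import mlflow
from scipy.stats import uniform
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the wine-quality data (11 feature columns).
X, y = make_regression(n_samples=200, n_features=11, noise=0.1, random_state=42)

# Autologging records parameters/metrics and creates child runs for the
# parameter-search candidates automatically.
mlflow.sklearn.autolog()

search = RandomizedSearchCV(
    ElasticNet(),
    param_distributions={"alpha": uniform(0.01, 1.0), "l1_ratio": uniform(0.0, 1.0)},
    n_iter=10,
    cv=5,
    random_state=42,
)

mlflow.set_experiment("wine-quality")
with mlflow.start_run(run_name="hyperparameter-tuning"):
    search.fit(X, y)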
Step 4: Running Hyperparameter Tuning
-------------------------------------
We might want to add a ref to https://www.mlflow.org/docs/latest/traditional-ml/hyperparameter-tuning-with-child-runs/notebooks/hyperparameter-tuning-with-child-runs.html
For people who want to see a more in-depth guide on hyperparameter tuning
Great idea, will add!
mlflow models serve -m runs:/<run_id_for_your_best_run>/model -p 1234 --enable-mlserver
This command starts a local server listening on port 1234. You can send a request to the server using ``curl`` command: |
Suggested change:
- This command starts a local server listening on port 1234. You can send a request to the server using ``curl`` command:
+ This command starts a local server listening on port 1234. You can send a request to the server using a ``curl`` command:
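For a quick smoke test from Python rather than curl, something like the sketch below could also work. This is a hedged sketch assuming the server started above listens on localhost:1234 and accepts the default ``/invocations`` scoring payload; the column names and values are placeholders, not the tutorial's real feature columns.

import requests

# Placeholder payload -- replace the columns/values with the feature columns
# your model was actually trained on.
payload = {
    "dataframe_split": {
        "columns": ["feature_1", "feature_2"],
        "data": [[0.5, 0.2]],
    }
}

response = requests.post("http://localhost:1234/invocations", json=payload, timeout=10)
print(response.status_code, response.json())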
For this tutorial, we'll push the image to `Docker Hub <https://hub.docker.com/>`_, but you can use any other Docker registry,
such as `Amazon ECR <https://aws.amazon.com/ecr/>`_ or a private registry.

If you don't have a Docker Hub account yet, create one at https://hub.docker.com/signup. |
link
By default, MLflow stores the model in the local file system, so you need to configure MLflow to store the model in remote storage.
Please refer to `Artifact Store <../../../tracking.html#artifact-stores>`_ for setup instructions.

After configuring the artifact store, repeat the model training steps. |
Or load the best model from the model URI and re-log it from its in-memory object?
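Something like this minimal sketch is what I have in mind: load the already-trained best model from its run and re-log it so the new copy lands in the remote artifact store, instead of re-running training. The run ID is a placeholder, and the run name "re-log-best-model" is illustrative only.

import mlflow

# Load the best model from its existing run.
model = mlflow.sklearn.load_model("runs:/<run_id_for_your_best_run>/model")

# Re-log it under a new run; the artifact now goes to the configured remote store.
with mlflow.start_run(run_name="re-log-best-model"):
    mlflow.sklearn.log_model(model, artifact_path="model")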
Fantastic work here @B-Step62 !! This is a HUGE improvement with a ton of great detail and a very easy to read step by step guide to something that is quite complex for most users.
After addressing the remaining comments, let's get this merged so we can push it out with the next site push (probably early next week when I get time :) )
LGTM!
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
Follow-up on #10675. Refactoring the existing Kubernetes deployment guide to include concrete and runnable steps.