Improve the code of quickstart 2 - hyperparam search #10344
Conversation
Documentation preview for b22b208 will be available here when this CircleCI job completes successfully. More info
Looks good, just have some comments about cleaning up.
@@ -24,16 +27,18 @@ Set up
------

- Install MLflow. See the :ref:`introductory quickstart <quickstart-1>` for instructions
Do we want to display ``:ref:`` to the user?
Yeah, that's weird Sphinx stuff.
@@ -76,68 +79,79 @@ Now load the dataset and split it into training, validation, and test sets.
)
This isn't part of this change, but I was wondering if we should do this more cleanly:
- Split the data into inputs and labels first, before running the split, instead of in the middle
- Or shuffle and split manually based on predefined split percentages (60/15/25, etc.) instead of using two ``train_test_split`` calls, since with two calls the proportion of the dataset allocated to each set is not immediately apparent
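The second suggestion could be sketched roughly like this (a minimal, hypothetical example with stand-in arrays, not the quickstart's actual data loading; variable names follow the train/valid/test convention):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))  # stand-in features
y = rng.normal(size=100)       # stand-in labels

# Shuffle once, then slice at explicit boundaries: 60% train, 15% val, 25% test.
# The proportions are immediately visible, unlike chained train_test_split calls.
n = len(X)
idx = rng.permutation(n)
train_end = int(0.60 * n)
val_end = int(0.75 * n)

train_x, train_y = X[idx[:train_end]], y[idx[:train_end]]
valid_x, valid_y = X[idx[train_end:val_end]], y[idx[train_end:val_end]]
test_x, test_y = X[idx[val_end:]], y[idx[val_end:]]
```

Each set's share of the data can then be read directly off the slice boundaries.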
Agreed, it would be clearer.
# Evaluate the model
predicted_qualities = model.predict(test_x)
rmse = np.sqrt(mean_squared_error(test_y, predicted_qualities))
eval_result = model.evaluate(test_x, test_y, batch_size=64)
I think for hyperparameter tuning we shouldn't encourage the user to optimize the objective function against the test dataset, but rather against the validation dataset. Optimizing on the test dataset in some sense pollutes the results of testing.
We can optionally use the test data at the end of optimization to see how well the 'best' model actually does.
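The pattern being suggested could look something like this sketch (using scikit-learn's ``Ridge`` with stand-in data as a hypothetical example, not the quickstart's Keras model): tune against the validation set, then touch the test set exactly once at the end.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Rows are already in random order, so plain slicing is a valid split here.
train_x, valid_x, test_x = X[:120], X[120:160], X[160:]
train_y, valid_y, test_y = y[:120], y[120:160], y[160:]

# Tune the hyperparameter against the *validation* set only.
best_alpha, best_rmse = None, float("inf")
for alpha in (0.01, 0.1, 1.0, 10.0):
    model = Ridge(alpha=alpha).fit(train_x, train_y)
    rmse = np.sqrt(mean_squared_error(valid_y, model.predict(valid_x)))
    if rmse < best_rmse:
        best_alpha, best_rmse = alpha, rmse

# Only the final, chosen model ever sees the test set.
final_model = Ridge(alpha=best_alpha).fit(train_x, train_y)
test_rmse = np.sqrt(mean_squared_error(test_y, final_model.predict(test_x)))
```

Because the test set never influences which hyperparameters are chosen, ``test_rmse`` remains an honest estimate of generalization.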
very good call, fixed
.. image:: ../../_static/images/quickstart_mlops/mlflow_registry_transitions.png
   :width: 800px
   :align: center
   :alt: Screenshot of MLflow tracking UI models page showing the registered model
I think this image is not rendering
nice catch
(Note that specifying the port as above will be necessary if you are running the tracking server on the
same machine at the default port of **5000**.)

You could also have used a ``runs:/<run_id>`` URI to serve a model, or any supported URI described in :ref:`artifact-stores`.
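For reference, serving from a run URI might look like this (a hedged sketch: the run ID is a placeholder, and the ``model`` artifact path assumes the model was logged under that name):

```shell
# Serve a logged model directly from a run's artifacts.
# <run_id> is a placeholder; "model" is the assumed artifact path.
mlflow models serve -m "runs:/<run_id>/model" --port 5002
```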
I think this ``:ref:`` is also rendered to the user; not sure if we want that.
On the website that will render correctly; the Markdown preview doesn't work 100% well with Sphinx.
Signed-off-by: chenmoneygithub <chen.qian@databricks.com>
Transition the model to **Staging** by choosing the **Stage** dropdown:

.. image:: ../../_static/images/quickstart_mlops/register_model_button.png
   :width: 800px
Can we set this to a percentage value? 70% at full screen will render roughly the same as 800px would, and fixed-width sizes have issues with screen resizing.
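Concretely, the reviewer's suggestion would amount to something like this (a sketch; Sphinx's image directive accepts percentage widths, which are taken relative to the enclosing element):

```rst
.. image:: ../../_static/images/quickstart_mlops/register_model_button.png
   :width: 70%
   :align: center
```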
Related Issues/PRs
#xxx

What changes are proposed in this pull request?
A few improvements, mostly centered around the code part: