In this quickstart, you will:
- Deploy the model to a REST API
- Build a container image suitable for deployment to a cloud platform

As an ML Engineer or MLOps professional, you can use MLflow to compare, share, and deploy the best models produced
by the team. In this quickstart, you will use the MLflow Tracking UI to compare the results of a hyperparameter
sweep, choose the best run, and register it as a model. Then, you will deploy the model to a REST API. Finally,
you will create a Docker container image suitable for deployment to a cloud platform.

.. image:: ../../_static/images/quickstart/quickstart_tracking_overview.png
    :width: 800px

Set up
------

- Install MLflow. See the :ref:`introductory quickstart <quickstart-1>` for instructions
- Run the tracking server: ``mlflow ui --port 5000``

Run a hyperparameter sweep
--------------------------

This example tries to optimize the RMSE metric of a Keras deep learning model on a wine quality dataset. It has
two hyperparameters that it tries to optimize: ``learning_rate`` and ``momentum``. We will use the
`Hyperopt <https://github.com/hyperopt/hyperopt>`_ library to run a hyperparameter sweep across
different values of ``learning_rate`` and ``momentum`` and record the results in MLflow.


Before running the hyperparameter sweep, let's set the ``MLFLOW_TRACKING_URI`` environment variable to the URI of
our MLflow tracking server:

.. code-block:: bash
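
    # Assumed URI: the tracking server started in the set-up step listens on localhost:5000
    export MLFLOW_TRACKING_URI=http://localhost:5000
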
Import the following packages:

.. code-block:: python

    import keras
    import numpy as np
    import pandas as pd
    from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
    from sklearn.model_selection import train_test_split

    import mlflow
    from mlflow.models import infer_signature

Now load the dataset and split it into training, validation, and test sets.
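
A sketch of the loading and splitting steps, assuming the wine-quality CSV hosted in the MLflow repository and
simple train/validation/test splits (adjust the URL, column names, and split sizes to your data):

.. code-block:: python

    # Load the wine-quality dataset (assumed location in the MLflow repository)
    data = pd.read_csv(
        "https://raw.githubusercontent.com/mlflow/mlflow/master/tests/datasets/winequality-white.csv",
        sep=";",
    )

    # Split the data into training, validation, and test sets
    train, test = train_test_split(data, test_size=0.25, random_state=42)
    train_x = train.drop(["quality"], axis=1).values
    train_y = train[["quality"]].values.ravel()
    test_x = test.drop(["quality"], axis=1).values
    test_y = test[["quality"]].values.ravel()
    train_x, valid_x, train_y, valid_y = train_test_split(
        train_x, train_y, test_size=0.2, random_state=42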
    )
    signature = infer_signature(train_x, train_y)

Then let's define the model architecture and train the model. The ``train_model`` function uses MLflow to track the
parameters, results, and model itself of each trial as a child run.

.. code-block:: python

    def train_model(params, epochs, train_x, train_y, valid_x, valid_y, test_x, test_y):
        # Define model architecture
        model = keras.Sequential(
            [
                keras.Input([train_x.shape[1]]),
                keras.layers.Normalization(mean=np.mean(train_x), variance=np.var(train_x)),
                keras.layers.Dense(64, activation="relu"),
                keras.layers.Dense(1),
            ]
        )

        # Compile model
        model.compile(
            optimizer=keras.optimizers.SGD(
                learning_rate=params["lr"], momentum=params["momentum"]
            ),
            loss="mean_squared_error",
            metrics=[keras.metrics.RootMeanSquaredError()],
        )

        # Train model with MLflow tracking
        with mlflow.start_run(nested=True):
            # Fit model
            model.fit(
                train_x,
                train_y,
                validation_data=(valid_x, valid_y),
                epochs=epochs,
                verbose=0,
                batch_size=64,
            )

            # Evaluate the model
            eval_result = model.evaluate(test_x, test_y, batch_size=64)
            eval_rmse = eval_result[1]

            # Log parameters and results
            mlflow.log_params(params)
            mlflow.log_metric("eval_rmse", eval_rmse)

            # Log model
            mlflow.tensorflow.log_model(model, "model", signature=signature)

            # Hyperopt requires the optimization target under the "loss" key
            return {"loss": eval_rmse, "status": STATUS_OK, "model": model}

The ``objective`` function takes in the hyperparameters and returns the results of the ``train_model``
function for that set of hyperparameters.

.. code-block:: python

    def objective(params):
        # MLflow will track the parameters and results for each run
        result = train_model(
            params,
            epochs=3,
            train_x=train_x,
            train_y=train_y,
            valid_x=valid_x,
            valid_y=valid_y,
            test_x=test_x,
            test_y=test_y,
        )
        return result

Next, we will define the search space for Hyperopt. In this case, we want to try different values of
``learning_rate`` and ``momentum``. Hyperopt samples a value from the range we define for each of
``lr`` and ``momentum``.

.. code-block:: python
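
    space = {
        # Assumed, illustrative log-uniform range for the learning rate; adjust to your needs
        "lr": hp.loguniform("lr", np.log(1e-5), np.log(1e-1)),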
"momentum": hp.uniform("momentum", 0.0, 1.0),
}
Finally, we will run the hyperparameter sweep using Hyperopt, passing in the ``objective`` function and search space.
Hyperopt will try different hyperparameter combinations and return the results of the best one. We will
store the best parameters, model, and evaluation metrics in MLflow.

.. code-block:: python
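
    trials = Trials()

    # A parent run wraps the search (assumed) so that each Hyperopt trial is logged
    # as a nested child run
    with mlflow.start_run():
        # Conduct the hyperparameter search using Hyperopt
        best = fmin(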
            fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=8,
            trials=trials,
        )

        # Fetch the details of the best run
        best_run = sorted(trials.results, key=lambda x: x["loss"])[0]

        # Log the best parameters, loss, and model
        mlflow.log_params(best)
        mlflow.log_metric("eval_rmse", best_run["loss"])
        mlflow.tensorflow.log_model(best_run["model"], "model", signature=signature)

        # Print out the best parameters and corresponding loss
        print(f"Best parameters: {best}")
        print(f"Best eval rmse: {best_run['loss']}")

Compare the results
-------------------

Open the MLflow UI in your browser at the ``MLFLOW_TRACKING_URI``. You should see a nested list of runs. In the
default **Table view**, choose the **Columns** button and add the **Metrics | eval_rmse** column and
the **Parameters | lr** and **Parameters | momentum** columns. To sort by RMSE ascending, click the **eval_rmse**
column header. The best run typically has an RMSE on the **test** dataset of ~0.70. You can see the parameters
of the best run in the **Parameters** column.

.. image:: ../../_static/images/quickstart_mlops/mlflow_ui_table_view.png
    :width: 800px
    :align: center
    :alt: Screenshot of MLflow tracking UI table view showing runs


Choose **Chart view**. Choose the **Parallel coordinates** graph and configure it to show the **lr** and
**momentum** coordinates and the **eval_rmse** metric. Each line in this graph represents a run and connects
that run's hyperparameter values to its evaluated error metric.

.. raw:: html

    <!-- Screenshot of MLflow tracking UI parallel coordinates graph showing runs -->

The red lines on this graph are runs that fared poorly. The lowest one is a baseline run with both **lr**
and **momentum** set to 0.0. That baseline run has an RMSE of ~0.89. The other red lines show that
high **momentum** can also lead to poor results with this problem and architecture.

The lines shading towards blue are runs that fared better. Hover your mouse over individual runs to see their details.

Register your best model
------------------------

Choose the best run and register it as a model. In the **Table view**, choose the best run. In the
**Run Detail** page, open the **Artifacts** section and select the **Register Model** button. In the
**Register Model** dialog, enter a name for the model, such as ``wine-quality``, and click **Register**.
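
If you prefer to register the model programmatically instead of through the UI, the MLflow client API offers an
equivalent call. The sketch below assumes you copy the best run's ID from the Tracking UI; the run ID shown is
only a placeholder.

.. code-block:: python

    import mlflow

    # Placeholder: the run ID of the best run, copied from the Tracking UI
    run_id = "<best-run-id>"

    # Register the model logged under the "model" artifact path of that run.
    # Repeated calls with the same name create new versions of the registered model.
    result = mlflow.register_model(model_uri=f"runs:/{run_id}/model", name="wine-quality")
    print(result.name, result.version)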

Now, your model is available for deployment. You can see it in the **Models** page of the MLflow UI.
Open the page for the model you just registered.

You can add a description for the model, add tags, and easily navigate back to the source run that generated
this model. You can also transition the model to different stages. For example, you can transition the model
to **Staging** to indicate that it is ready for testing. You can transition it to **Production** to indicate
that it is ready for deployment.
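
You can make this transition from the UI, as shown next, or script it with the MLflow client. The sketch below
assumes that version 1 is the version you just registered.

.. code-block:: python

    from mlflow import MlflowClient

    client = MlflowClient()
    # Move version 1 of the "wine-quality" registered model into the Staging stage
    client.transition_model_version_stage(
        name="wine-quality", version="1", stage="Staging"
    )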

Transition the model to **Staging** by choosing the **Stage** dropdown:

Serve the model locally
----------------------------

MLflow allows you to easily serve models produced by any run or model version. You can serve the model
you just registered by running:

.. code-block:: bash

    mlflow models serve -m "models:/wine-quality/Staging" --port 5002

(Note that specifying the port as above will be necessary if you are running the tracking server on the
same machine at the default port of **5000**.)

You could also have used a ``runs:/<run_id>`` URI to serve a model, or any supported URI described in :ref:`artifact-stores`.

To test the model, you can send a request to the REST API using the ``curl`` command:
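
.. code-block:: bash

    # The column names below are assumed from the standard wine-quality feature schema;
    # check them against the model signature shown in the MLflow UI.
    curl -d '{"dataframe_split": {
    "columns": ["fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"],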
"data": [[7,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8]]}}' \
-H 'Content-Type: application/json' -X POST localhost:5002/invocations
Inferencing is done with a JSON ``POST`` request to the **invocations** path on **localhost** at the specified port.
The ``columns`` key specifies the names of the columns in the input data. The ``data`` value is a list of lists,
where each inner list is a row of data. For brevity, the above only requests one prediction of wine
quality (on a scale of 3-8). The response is a JSON object with a **predictions** key that contains a list of
predictions, one for each row of data. In this case, the response is:

.. code-block:: json
{"predictions": [{"0": 5.310967445373535}]}
The schema for input and output is available in the MLflow UI in the **Artifacts | Model** description. The schema
is available because the training code used the ``mlflow.models.infer_signature`` method and passed the result to
the ``mlflow.tensorflow.log_model`` method. Passing the signature to ``log_model`` is highly recommended, as it
provides clear error messages if the input request is malformed.

Build a container image for your model
---------------------------------------

Most routes toward deployment will use a container to package your model, its dependencies, and relevant portions of
the runtime environment. You can use MLflow to build a Docker image for your model.

.. code-block:: bash

    mlflow models build-docker --model-uri "models:/wine-quality/1" --name "qs_mlops"

This command builds a Docker image named ``qs_mlops`` that contains your model and its dependencies. The ``model-uri``
in this case specifies a version number (``/1``) rather than a lifecycle stage (``/Staging``), but you can use
whichever integrates best with your workflow. It will take several minutes to build the image. Once it completes,
you can run the image to provide real-time inferencing locally, on-premises, on a bespoke Internet server, or on a
cloud platform. You can run it locally with:

.. code-block:: bash

    docker run -p 5002:8080 qs_mlops

This `Docker run command <https://docs.docker.com/engine/reference/commandline/run/>`_ runs the image you just built
and maps port **5002** on your local machine to port **8080** in the container. You can now send requests to the
model using the same ``curl`` command as before:

.. code-block:: bash
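
    # Same request as before (reconstructed here); the container's port 8080 is mapped to local port 5002
    curl -d '{"dataframe_split": {
    "columns": ["fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"],
    "data": [[7,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8]]}}' \
    -H 'Content-Type: application/json' -X POST localhost:5002/invocations
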
Deploying to a cloud platform
-----------------------------

Virtually all cloud platforms allow you to deploy a Docker image. The process varies considerably, so you will have
to consult your cloud provider's documentation for details.

In addition, some cloud providers have built-in support for MLflow. For instance:

- `Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/index.html>`_
- `Google Cloud <https://cloud.google.com/doc>`_

all support MLflow. Cloud platforms generally support multiple workflows for deployment: command-line,
SDK-based, and Web-based. You can use MLflow in any of these workflows, although the details will vary between
platforms and versions. Again, you will need to consult your cloud provider's documentation for details.
