# ✍️ Exercise: Intro to MLFlow - Part III

Now that we have logged models into MLFlow, it's time to learn how to register them and deploy them to a production environment.


- Load a regression dataset
- Train a model
- Log the model into MLFlow
- Register the model
- Stage the model into production/development
- Deploy the model using MLFlow

In [1]:
from sklearn import datasets


# Load the diabetes dataset as NumPy arrays (features X, target y)
diabetes_dataset = datasets.load_diabetes()
X = diabetes_dataset.data
y = diabetes_dataset.target

## Exercise I: Split the Data into Train and Test Sets

💡 Remember that we need to split our data into train and test sets. We can use the [`train_test_split` function](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from `sklearn.model_selection` to do this. We should store the split in `X_train`, `X_test`, `y_train`, `y_test`.

In [2]:
from sklearn.model_selection import train_test_split


RANDOM_STATE = 42
TEST_SIZE = 0.2

# 👇 Add the relevant code below to split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
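
As a quick sanity check on the split, the expected set sizes can be worked out by hand. The sketch below assumes the scikit-learn diabetes dataset has 442 samples, that the test share is rounded up, and that the remainder goes to training. The helper name `expected_split_sizes` is hypothetical.

```python
import math


def expected_split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test set size is rounded up and the training set
    receives the remaining samples.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 442 samples with TEST_SIZE = 0.2
n_train, n_test = expected_split_sizes(442, 0.2)
print(n_train, n_test)  # 353 89
```

Comparing these numbers with `X_train.shape[0]` and `X_test.shape[0]` is a cheap way to confirm that the split behaved as intended.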

## Exercise II: Train a Linear Regression Model

Then, train a [**linear regression model** using the scikit-learn library](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).

1. 👉 Initialize the model by calling the `LinearRegression` class.
2. 👉 Train the model using the `fit` method.

In [3]:
from sklearn.linear_model import LinearRegression


# Add code to train the model 👇
model = LinearRegression()
model.fit(X_train, y_train)



## Exercise III: Compute the Error of the Model

Finally, measure the model's error using the [`root_mean_squared_error` function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.root_mean_squared_error.html) from the `sklearn.metrics` module. Because this is a regression task, we report an error metric (lower is better) rather than a classification accuracy.

1. 👉 Compute the predictions by passing `X_test` to the `predict` method of the model.
2. 👉 Compute the RMSE by passing `y_test` and the predictions to `root_mean_squared_error`.
3. 👉 Print the RMSE.

In [4]:
from sklearn.metrics import root_mean_squared_error


# Add code to calculate the root mean squared error 👇
y_pred = model.predict(X_test)
rmse = root_mean_squared_error(y_test, y_pred)
rmse

53.85344583676592
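
RMSE is the square root of the mean of the squared residuals, so it is expressed in the same units as the target. The sketch below recomputes it with the standard library only, on made-up toy values (not the diabetes data), to illustrate what the metric measures. The function name `rmse_by_hand` is hypothetical.

```python
import math


def rmse_by_hand(y_true, y_pred):
    """Square root of the mean squared difference between two sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only: residuals are 2, 0, 0, -2
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_pred = [1.0, 5.0, 7.0, 11.0]
print(rmse_by_hand(toy_true, toy_pred))  # 1.4142135623730951
```

Here the squared residuals sum to 8, their mean is 2, and the RMSE is the square root of 2.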

## Exercise IV: Create a Run and Log the Model and Metrics

1. 👉 Connect to MLFlow.
2. 👉 Set the experiment "Diabetes Linear Regression".

In [6]:
import mlflow


EXPERIMENT_NAME = "Diabetes Linear Regression"
MLFLOW_TRACKING_URI = "http://localhost:5000"


# Connect to MLFlow 👇
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

<Experiment: artifact_location='mlflow-artifacts:/3', creation_time=1771609788786, experiment_id='3', last_update_time=1771609788786, lifecycle_stage='active', name='Diabetes Linear Regression', tags={}>


1. 👉 Log the root mean squared error metric using the `mlflow.log_metrics` function (or `mlflow.log_metric` for a single value).
2. 👉 Log the model using the `mlflow.sklearn.log_model` function.

In [None]:
# launch a run to log the model
with mlflow.start_run() as run:
    
    # Add code to log the model and the root mean squared error 👇
    mlflow.log_metrics({"rmse": rmse})
    mlflow.sklearn.log_model(model, "model", input_example=X_test[:1])





🏃 View run masked-shad-72 at: http://localhost:5000/#/experiments/3/runs/f4c7031372d542deba3e0e41d6531ba0
🧪 View experiment at: http://localhost:5000/#/experiments/3


## Exercise V: Register the model

Registering a model in MLFlow is a way to keep track of the different versions of the same model. Each registered model has numbered versions that track how the model changes over time.

1. 👉 Get the **run ID** of the model you want to register using `run.info.run_id`.
2. 👉 Register the model using the `mlflow.register_model` function.

In [7]:
# register the model for this run
MODEL_NAME = "diabetes_prediction"  # change this to your model name


# Compute model path: models stored in a run follow this convention
model_path = f"runs:/{run.info.run_id}/model"  # uses the run ID of the run created above


# Register the model 
mlflow.register_model(model_path, MODEL_NAME)

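If this cell fails with `NameError: name 'run' is not defined`, the logging cell above has most likely not been executed in the current session. Run it first so that `run` exists.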
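
The two URI schemes used in this exercise follow a fixed pattern: `runs:/<run_id>/<artifact_path>` points at a model logged in a run, and `models:/<model_name>/<model_version>` points at a registered version. The helpers below are hypothetical conveniences, not part of the MLflow API. They only assemble the strings, and version `1` is an assumed example value.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI of a model logged under `artifact_path` in the given run."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    return f"runs:/{run_id}/{artifact_path}"


def registered_model_uri(name: str, version: int) -> str:
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


# Run ID taken from the example run output above; version 1 is assumed
print(run_model_uri("f4c7031372d542deba3e0e41d6531ba0"))
print(registered_model_uri("diabetes_prediction", 1))
```

The first form is what `mlflow.register_model` receives in this exercise. The second form identifies a registered version once registration has succeeded.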

## Exercise VI: Deploy a model

Deploying a model is a complex task that involves many steps. MLFlow simplifies this process by providing a set of tools to deploy models to different platforms. In this exercise, we will deploy a model to a local server. 

First, you need to connect the terminal to the MLFlow Server by setting the `MLFLOW_TRACKING_URI` environment variable. 

```bash
export MLFLOW_TRACKING_URI=http://localhost:5000
```

Then, you can deploy the model using the `mlflow models serve` command **in your terminal**:

```bash
mlflow models serve --model-uri models:/<model_name>/<model_version> --port 5001
```

Here `<model_name>` is the name of the registered model and `<model_version>` is the version you want to deploy; both are shown in the MLFlow UI. The `--port` argument sets the port the model server listens on. Choose a port other than `5000`, which is already used by the MLFlow server.
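
Before sending requests, it can help to confirm that something is actually listening on the chosen port. The sketch below uses only the standard library to attempt a TCP connection. The helper name `is_port_open` and the one-second timeout are assumptions for illustration, and a successful connection only shows that a process is listening, not that the model loaded correctly.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # 5000 is the tracking server, 5001 the served model in this exercise
    for port in (5000, 5001):
        state = "open" if is_port_open("localhost", port) else "closed"
        print(f"port {port}: {state}")
```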

## BONUS: Make a request to the model

Finally, make a request to the model using the `requests` library. You can use the following code to make a request to the model:

In [8]:
# Define the URL and headers
URL = 'http://localhost:5001/invocations'
HEADERS = {'Content-Type': 'application/json'}

Define the body of the request:

In [12]:
# Define the input payload
import json

input_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # arbitrary example input (10 features)
payload = {'inputs': [input_vector]}  # wrap the input vector in a dictionary under the key 'inputs'
json_payload = json.dumps(payload)  # convert the payload to a JSON string
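
The `inputs` key holds a list of rows, so several rows can be sent in one request. The sketch below builds such a batch payload and checks that each row has the 10 features of the diabetes dataset. The helper `build_payload` and its validation are hypothetical additions, not part of MLflow.

```python
import json

N_FEATURES = 10  # the diabetes dataset has 10 feature columns


def build_payload(rows):
    """Serialize a list of feature rows into a JSON body under 'inputs'."""
    for i, row in enumerate(rows):
        if len(row) != N_FEATURES:
            raise ValueError(f"row {i} has {len(row)} values, expected {N_FEATURES}")
    return json.dumps({"inputs": [[float(v) for v in row] for row in rows]})


# Two arbitrary example rows
batch = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    [0.0] * N_FEATURES,
]
body = build_payload(batch)
print(len(json.loads(body)["inputs"]))  # 2
```

Validating the row length on the client side gives a clearer error than a failed request would.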

Send the POST request to the model:

In [14]:
import requests


response = requests.post(URL, headers=HEADERS, data=json_payload)

If this call raises a `ConnectionError` with `[Errno 111] Connection refused`, nothing is listening on port `5001`. Check that `mlflow models serve` is still running in your terminal and send the request again.

Check the status code and the response of the request. If the status code is `200`, the request was successful.

In [11]:
print(f"Status code: {response.status_code}")
print(f"Response body: {response.json()}")

Status code: 200
Response body: {'predictions': [1299.54418238401]}


Check that the model trained in the notebook generates the same predictions as the model deployed in the server.

In [12]:
model.predict([input_vector])

array([1299.54418238])
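
The two outputs above differ only in how many digits are printed. When comparing them in code, a tolerance-based check is safer than `==`, because the served value has gone through a JSON round trip. This is a minimal sketch: the helper name `predictions_match` and the tolerance are assumptions, and the numbers are the ones printed above.

```python
import math


def predictions_match(served, local, rel_tol=1e-9):
    """True if both sequences have equal length and element-wise close values."""
    if len(served) != len(local):
        return False
    return all(math.isclose(s, l, rel_tol=rel_tol) for s, l in zip(served, local))


served_predictions = [1299.54418238401]  # as printed from response.json()["predictions"]
local_predictions = [1299.54418238]      # truncated value as printed by model.predict
print(predictions_match(served_predictions, local_predictions))  # True
```

In the notebook, the local side would typically be `model.predict([input_vector]).tolist()` rather than a hand-copied number.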