# Week4 Assignments
**Please use the "mlops_eng2" conda environment to complete all assignments for this week.**

In this week's assignments, you'll gain more hands-on experience deploying ML models, especially using KServe. The assignments are split into two separate notebooks. This notebook is the first part containing Assignment 1. The second part containing Assignments 2-5 can be found in `week4_assignments_part2.ipynb`. 

### Guidelines for submitting the assignments
For some assignments, a code skeleton is provided. Please put your solutions between the `### START CODE HERE` and `### END CODE HERE` code comments. Please **do not change any code outside the `### START CODE HERE` and `### END CODE HERE` comments**. Unlike in previous weeks, you don't need to return the ".ipynb" notebooks, because the code you write in the notebooks will be exported to Python scripts. The notebooks contain the instructions and some code that helps you check whether you're progressing correctly. Please return the following files in your submission: 
- The Python scripts (`part1_answer.py` and `part2_answer.py`; these files are created as you progress through the assignments).
- `model-settings.json`
- All the `.yaml` files in the "manifests" directory.

***Important!*** When submitting the files, please **do not** change the file names or put any of them in any sub-folder. The screenshot below shows an expected submission:

<img src="./images/submission-example.png" width=700/>

## Assignment 1: Use MLServer to deploy a model locally (2 points)

In [1]:
import requests
from mlserver.codecs import PandasCodec
import subprocess
import os
import time

from utils.mlserver_utils import prepare_request_data, run_mlserver
from utils.common_utils import train

In [2]:
# In previous weekly assignments, we used LightGBM version 4.0.0, which is not compatible with the default runtime provided by KServe.
# Although we don't need KServe in this particular assignment, we use LightGBM version 3.3.5 in all of this week's assignments
# to avoid switching between models trained with different LightGBM versions.
import lightgbm
assert lightgbm.__version__ == "3.3.5", "Your lightgbm version is not 3.3.5"

### Preparation
Let's first train a LightGBM regression model for predicting the bike sharing demand (the use case in Week1 assignments) and upload the model to the MLflow service. After running the next code cell, you should see the S3 URI of your model printed. 

In [3]:
params = {"num_leaves": 63, "learning_rate": 0.05, "random_state": 42}

model_s3_uri = train(model_type="lgbm", model_params=params, freshness_tag="old")

print(f"Your model S3 URI is {model_s3_uri}")

Model found, skip training and use the existing model s3://mlflow/7/c6593f7cd39f4444acd8581588de91af/artifacts/lgbm-bike
Your model S3 URI is s3://mlflow/7/c6593f7cd39f4444acd8581588de91af/artifacts/lgbm-bike


## Assignment 1 instructions

Next, let's take a look at what MLServer is. In short, [MLServer](https://mlserver.readthedocs.io/en/latest/index.html) is an open-source inference server implementation for ML models. It provides an easy way to expose a model through an HTTP or gRPC endpoint. Reading the following MLServer documentation should be enough to complete the assignment:
- [Getting started with MLServer](https://mlserver.readthedocs.io/en/latest/getting-started/index.html#). This page shows an example of using the MLServer SDK to implement a custom model server. You don't need to implement your own model server to complete this assignment, because MLServer has an out-of-the-box inference server implementation for models registered to MLflow (see the second link). 
- [Serving MLflow models](https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html).

By running the previous code cell, you already trained a LightGBM model for predicting bike sharing demand and uploaded it to the MLflow service. In this assignment, you need to configure MLServer to serve your LightGBM model as a local inference service. Specifically, you have two tasks:

1. Add configuration to the empty [model-settings.json](./assignment1/model-settings.json) file (it will be created later) so that MLServer's MLflow runtime serves your LightGBM model. The inference service name should be ***bike-demand-predictor***. The configuration can be adapted from the [one provided in this MLServer doc](https://mlserver.readthedocs.io/en/latest/examples/mlflow/README.html#serving).
1. Suppose you are in the directory where this notebook is located. What command should be used to start an MLServer inference service that serves the LightGBM model? Please assign your command as a string to the `command_to_start_mlserver` variable in a code cell below.  
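
As a rough illustration of task 1, the sketch below builds a settings dictionary with the same fields as the MLServer MLflow example (`name`, `implementation`, and `parameters.uri`) and prints it as JSON. The helper name `build_model_settings` and the placeholder URI are assumptions made for this sketch and are not part of the assignment skeleton; use the S3 URI printed by the training cell instead of the placeholder.

```python
import json


def build_model_settings(name: str, model_uri: str) -> dict:
    """Return a settings dict shaped like the MLServer MLflow example."""
    return {
        "name": name,
        # Runtime class named in the MLServer "Serving MLflow models" example.
        "implementation": "mlserver_mlflow.MLflowRuntime",
        "parameters": {"uri": model_uri},
    }


# Placeholder URI (assumption): replace it with your own model's S3 URI.
settings = build_model_settings("bike-demand-predictor", "s3://<your-model-uri>")
print(json.dumps(settings, indent=4))
```

The printed JSON is only a starting point; check it against the linked MLServer documentation before copying it into `model-settings.json`.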

**Notes**:
You might want to test your command in a terminal first. MLServer loads the model from your MinIO storage service, so you need to set the following environment variables to give MLServer the correct credentials and the correct MinIO service endpoint:
```bash
# Run the following command in a terminal
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export MLFLOW_S3_ENDPOINT_URL=http://mlflow-minio.local
```
These environment variables are only available in the terminal session where you defined them, so you need to start your MLServer inference service in the same terminal session where you defined the above environment variables. 
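
The same scoping rule applies when a process is started from Python: a child process only sees the variables in the environment it is given. The sketch below is a minimal illustration, assuming a child Python process as a stand-in for the real server command; it copies the current environment, adds the three variables listed above, and confirms that the child can read one of them.

```python
import os
import subprocess
import sys

# Copy the current environment and add the MinIO settings listed above.
env = os.environ.copy()
env.update(
    {
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "MLFLOW_S3_ENDPOINT_URL": "http://mlflow-minio.local",
    }
)

# Stand-in for the real command (assumption): a child Python process
# that prints one of the variables it received.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MLFLOW_S3_ENDPOINT_URL'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
print(child_output)
```

Passing `env=` explicitly keeps the credentials scoped to the child process instead of changing the environment of the whole notebook kernel.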

In [4]:
# create the model-settings.json file
open("model-settings.json", "a").close()

Suppose you are in the directory where this notebook is located. What command should be used to start an MLServer inference service that serves the LightGBM model? Please assign your command as a string to the `command_to_start_mlserver` variable in the code cell below.

After completing and running the next code cell, you should see a new file `part1_answer.py` created. This Python file should contain the code you write in the next code cell. 

In [5]:
%%writefile part1_answer.py
# TODO: Put your command to start mlserver in the variable below
# command_to_start_mlserver = "your command as a string here"
### START CODE HERE
command_to_start_mlserver = 'mlserver start ./week4_assignments/model-settings.json'
### END CODE HERE

Overwriting part1_answer.py


In [6]:
# Prepare the request data following the V2 inference protocol
encoded_request_data = prepare_request_data()
print(encoded_request_data)

{'parameters': {'content_type': 'pd'}, 'inputs': [{'name': 'season', 'shape': [5, 1], 'datatype': 'INT64', 'data': [4, 4, 4, 4, 4]}, {'name': 'holiday', 'shape': [5, 1], 'datatype': 'INT64', 'data': [0, 0, 0, 0, 0]}, {'name': 'workingday', 'shape': [5, 1], 'datatype': 'INT64', 'data': [1, 1, 1, 1, 1]}, {'name': 'weather', 'shape': [5, 1], 'datatype': 'INT64', 'data': [2, 2, 2, 2, 2]}, {'name': 'temp', 'shape': [5, 1], 'datatype': 'FP64', 'data': [11.48, 11.48, 10.66, 10.66, 10.66]}, {'name': 'atemp', 'shape': [5, 1], 'datatype': 'FP64', 'data': [13.635, 12.88, 12.12, 12.12, 12.88]}, {'name': 'humidity', 'shape': [5, 1], 'datatype': 'INT64', 'data': [52, 52, 56, 56, 56]}, {'name': 'windspeed', 'shape': [5, 1], 'datatype': 'FP64', 'data': [15.0013, 19.0012, 16.9979, 19.0012, 12.998]}, {'name': 'hour', 'shape': [5, 1], 'datatype': 'INT32', 'data': [0, 1, 2, 3, 4]}, {'name': 'day', 'shape': [5, 1], 'datatype': 'INT32', 'data': [13, 13, 13, 13, 13]}, {'name': 'month', 'shape': [5, 1], 'data
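
To make the structure of this payload easier to read, here is a simplified sketch that builds a request of the same shape from plain Python lists. The function `encode_columns` is hypothetical and is not how `prepare_request_data` is implemented. It also only distinguishes `INT64` from `FP64`, whereas the real payload above contains `INT32` columns as well.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def encode_columns(columns: Dict[str, List[Number]]) -> dict:
    """Encode named columns as one V2 input per column, each with shape [n, 1]."""
    inputs = []
    for name, values in columns.items():
        # Simplification (assumption): any float in the column makes it FP64,
        # otherwise the column is treated as INT64.
        datatype = "FP64" if any(isinstance(v, float) for v in values) else "INT64"
        inputs.append(
            {
                "name": name,
                "shape": [len(values), 1],
                "datatype": datatype,
                "data": list(values),
            }
        )
    # The "pd" content type marks the inputs as columns of a pandas DataFrame.
    return {"parameters": {"content_type": "pd"}, "inputs": inputs}


payload = encode_columns(
    {
        "season": [4, 4, 4, 4, 4],
        "temp": [11.48, 11.48, 10.66, 10.66, 10.66],
    }
)
print(payload)
```

Each column becomes its own entry in `inputs`, which is why the printed request above has one `name`/`shape`/`datatype`/`data` group per feature.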

You can run the code cell below to check if your configuration and command are correct. If everything is OK, you should see some predictions returned. 

In [7]:
from part1_answer import command_to_start_mlserver

os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://mlflow-minio.local"

response = run_mlserver(command_to_start_mlserver, encoded_request_data)
print()
print("Response:")
print(response.json())

2024-11-28 15:58:45,243 [mlserver.parallel] DEBUG - Starting response processing loop...
2024-11-28 15:58:45,244 [mlserver.rest] INFO - HTTP server running on http://0.0.0.0:8080
2024-11-28 15:58:45,264 [mlserver.metrics] INFO - Metrics server running on http://0.0.0.0:8082
2024-11-28 15:58:45,264 [mlserver.metrics] INFO - Prometheus scraping endpoint can be accessed on http://0.0.0.0:8082/metrics
2024-11-28 15:58:45,265 [mlserver.grpc] INFO - gRPC server running on http://0.0.0.0:8081


INFO:     Started server process [5889]
INFO:     Waiting for application startup.
INFO:     Started server process [5889]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     Uvicorn running on http://0.0.0.0:8082 (Press CTRL+C to quit)


INFO:     127.0.0.1:65407 - "POST /v2/models/bike-demand-predictor/infer HTTP/1.1" 404 Not Found
2024-11-28 15:58:59,456 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2024-11-28 15:58:59,554 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2024-11-28 15:58:59,554 [mlserver.grpc] INFO - Waiting for gRPC server shutdown
2024-11-28 15:58:59,554 [mlserver.grpc] INFO - gRPC server shutdown complete

Response:
{'error': 'Model bike-demand-predictor not found'}


INFO:     Shutting down
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [5889]
INFO:     Application shutdown complete.
INFO:     Finished server process [5889]


Example output:
```text
{'model_name': 'bike-demand-predictor',
 'id': 'f021577e-16fb-4686-8f1e-70f3ae2a7b76',
 'parameters': {'content_type': 'np'},
 'outputs': [{'name': 'output-1',
   'shape': [5, 1],
   'datatype': 'FP64',
   'parameters': {'content_type': 'np'},
   'data': [37.289116222680455, 
   19.406971833185164, 
   10.248384070712056, 
   9.602077884278172, 
   9.602077884278172]}]}
```
The id may vary. The key point is that the response should follow the same format as the example output. 
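
If you want to work with the predictions rather than the raw response, a small helper along the following lines can pull them out. This is a sketch under the assumption that the response body has the format of the example output; the function name `extract_predictions` is hypothetical. It also raises an error when the server returns an `error` field instead of `outputs`, as happens when the model name is not found.

```python
from typing import List


def extract_predictions(response_body: dict, output_name: str = "output-1") -> List[float]:
    """Return the data of the named output from a V2 inference response."""
    if "error" in response_body:
        # The server reports problems (e.g. unknown model name) in an "error" field.
        raise RuntimeError(f"Inference failed: {response_body['error']}")
    for output in response_body.get("outputs", []):
        if output["name"] == output_name:
            return list(output["data"])
    raise KeyError(f"No output named {output_name!r} in the response")


# Response body copied from the example output above.
example_response = {
    "model_name": "bike-demand-predictor",
    "id": "f021577e-16fb-4686-8f1e-70f3ae2a7b76",
    "parameters": {"content_type": "np"},
    "outputs": [
        {
            "name": "output-1",
            "shape": [5, 1],
            "datatype": "FP64",
            "parameters": {"content_type": "np"},
            "data": [
                37.289116222680455,
                19.406971833185164,
                10.248384070712056,
                9.602077884278172,
                9.602077884278172,
            ],
        }
    ],
}

predictions = extract_predictions(example_response)
print(predictions)
```

In the notebook, the argument would be `response.json()` from the check cell rather than the hard-coded example.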

Now you can go to the [second part](./week4_assignments_part2.ipynb) of this week's assignments. 