# Week4 Assignments (part2)
This is the second part of this week's assignments. **Please run this notebook using the `mlops_eng2` environment** (it should be created when you follow the tutorials).

### Guidelines for doing Assignments 2-5
- In 2a), you'll need to write some Python code, so please put your code between the `### START CODE HERE` and `### END CODE HERE` comments. Please **do not change any code other than those between the `### START CODE HERE` and `### END CODE HERE` comments**. 
- In other assignments, you'll need to complete some configurations in YAML files. In each YAML file, please write your configurations between the `### START CONF HERE` and `### END CONF HERE` comments. Again, please **do not change any text other than those between the `### START CONF HERE` and `### END CONF HERE` comments**. 
- You will use a command `kubectl -n kserve-inference get isvc <name-of-inference-service>` (or `kubectl -n kserve-inference get ig <name-of-inference-graph>`) a few times when running this notebook. This command checks whether your inference service (or inference graph) deployed to KServe is ready. It takes some time (up to a few minutes) for a inference service/graph to become ready, so you may need to run the same command a few times to follow the readiness of your inference service/graph. You can also use the "-w" option to continuously watch the status of the inference service/graph (`kubectl get isvc <name-of-inference-service> -n kserve-inference -w`) and then terminate the code cell when the inference service/graph is ready.

# Assignment 2: Deploy a model to KServe (3 points)
In this assignment, you need to deploy your LightGBM model for predicting bike sharing demand as an inference service to KServe. You can use the model you just trained before starting the first assignment. 

Similar to the tutorial, the deployed inference service should run in the "kserve-inference" namespace and the service account name containing the credentials for accessing the MinIO storage service is also "kserve-sa". 

In [112]:
from utils.kserve_utils import send_request
from utils.common_utils import train

In [113]:
# Make sure you're using the correct version of lightgbm
import lightgbm
assert lightgbm.__version__ == "3.3.5", "Your lightgbm version is not 3.3.5"

## 2a) Use Python SDK to deploy your LightGBM model
Complete the `deploy_model` function that uses the KServe SDK to deploy your LightGBM model. If there is no model deployed, your function should create a new inference service; if there is an inference service existing, your function should be able to update it. 

**Hint**: Using the LightGBM server provided by KServe doesn't work because the model saved by MLflow is in the pickled format, which is different from the format supported by KServe's LightGBM server. You can check [here](https://github.com/kserve/kserve/issues/2483) on how to use KServe SDK to deploy a model uploaded to MLflow.

After complete and run the next code cell, you should see the code in the code cell exported to a Python script named `part2_answer.py`. 

In [114]:
%%writefile part2_answer.py

from kubernetes import client
from kserve import KServeClient
from kserve import constants
from kserve import V1beta1InferenceService
from kserve import V1beta1InferenceServiceSpec
from kserve import V1beta1PredictorSpec
from kserve import V1beta1ModelSpec
from kserve import V1beta1ModelFormat

def deploy_model(model_name: str, model_uri: str):
    """
    Args:
        model_name: the name of the deployed inference service
        model_uri: the S3 URI of the model saved in MLflow
    """
    
    namespace = "kserve-inference"
    service_account_name="kserve-sa"
    kserve_version="v1beta1"
    api_version = constants.KSERVE_GROUP + "/" + kserve_version
    
    print(f"MODEL URI: {model_uri}")
    
    modelspec = V1beta1ModelSpec(
        storage_uri=model_uri,
        model_format=V1beta1ModelFormat(name="mlflow"),
        protocol_version="v2"
    )
    
    isvc = V1beta1InferenceService(

        ### START CODE HERE
        api_version=api_version,
        kind="InferenceService",
        metadata=client.V1ObjectMeta(name=model_name, namespace=namespace),
        ### END CODE HERE

        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                ### START CODE HERE
                model=modelspec,
                service_account_name=service_account_name
                ### END CODE HERE
            )
        )
    )
    kserve = KServeClient()

    ### START CODE HERE
    # Check if the model is already deployed, if yes, update it; if no, create a new service
    try:
        existing_service = kserve.get(model_name, namespace=namespace)
        print(f"Service {model_name} already exists, updating...")
        kserve.patch(isvc)
    except Exception as e:
        print(f"Service {model_name} not found, creating a new one...")
        kserve.create(isvc, namespace=namespace)
    ### END CODE HERE
    

Overwriting part2_answer.py


In [115]:
from part2_answer import deploy_model

model_name = "bike-lgbm-2a"

params = {"num_leaves": 63, "learning_rate": 0.05, "random_state": 42}
model_uri = train(model_type="lgbm", model_params=params, freshness_tag="old")

# Test the deploy_model function
deploy_model(model_name, model_uri)

Model found, skip training and use the existing model s3://mlflow/7/c6593f7cd39f4444acd8581588de91af/artifacts/lgbm-bike
MODEL URI: s3://mlflow/7/c6593f7cd39f4444acd8581588de91af/artifacts/lgbm-bike
Service bike-lgbm-2a not found, creating a new one...


In [8]:
# Check if the "bike-lgbm-2a" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm-2a

NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
bike-lgbm-2a   http://bike-lgbm-2a.kserve-inference.example.com   True           100                              bike-lgbm-2a-predictor-00001   16m


Expected output:

```text
NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
bike-lgbm-2a   http://bike-lgbm-2a.kserve-inference.example.com   True           100                              bike-lgbm-2a-predictor-default-00001   72s
```

In [9]:
# Make sure there is one pod running for the "bike-lgbm" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm-2a

NAME                                                       READY   STATUS    RESTARTS   AGE
bike-lgbm-2a-predictor-00001-deployment-774ffbcd5c-dkjf4   2/2     Running   0          16m


Example output:

```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-2a-predictor-default-00001-deployment-6499598b7-wc28j   2/2     Running   0          65s
```

In [10]:
# Send a request to the inference service
send_request(model_name=model_name)

{'model_name': 'bike-lgbm-2a', 'id': '2f6fbd16-244e-458f-a696-d2caf6098bf7', 'parameters': {}, 'outputs': [{'name': 'output-1', 'shape': [2, 1], 'datatype': 'FP64', 'data': [51.00457318737209, 35.13687405851507]}]}


{'model_name': 'bike-lgbm-2a',
 'id': '2f6fbd16-244e-458f-a696-d2caf6098bf7',
 'parameters': {},
 'outputs': [{'name': 'output-1',
   'shape': [2, 1],
   'datatype': 'FP64',
   'data': [51.00457318737209, 35.13687405851507]}]}

Example output:

```text
{'model_name': 'bike-lgbm-2a',
 'id': 'eddb6d4b-e517-421a-8420-d02db301428b',
 'parameters': {},
 'outputs': [{'name': 'output-1',
   'shape': [2, 1],
   'datatype': 'FP64',
   'data': [51.00457318737209, 35.13687405851507]}]}
```
**Note**: The id varies. The important point is that the response has the correct fields as shown in the above expected output. 

*P.S.* KServe also uses MLServer to serve the models uploaded to the MLflow service, which means your inference service also uses the V2 inference protocol.

Next, let's train another model with different hyperparameters and see if your `deploy_model` function can update the existing inference service. 

In [11]:
new_params = {"num_leaves": 31, "learning_rate": 0.01, "random_state": 42}
new_model_s3_uri = train(model_type="lgbm", model_params=new_params, freshness_tag="new")

deploy_model(model_name, new_model_s3_uri)

No model found, start training...


INFO:botocore.credentials:Found credentials in environment variables.
Registered model 'Week4LgbmBikeDemand' already exists. Creating a new version of this model...
2024/11/28 15:02:59 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week4LgbmBikeDemand, version 2
Created version '2' of model 'Week4LgbmBikeDemand'.
  model_info = MLFLOW_CLIENT.get_latest_versions(registered_model_name)[0]


The trained model is located at s3://mlflow/7/46df22f2a80042468a81d304db06896b/artifacts/lgbm-bike
MODEL URI: s3://mlflow/7/46df22f2a80042468a81d304db06896b/artifacts/lgbm-bike
Service bike-lgbm-2a already exists, updating...
Service bike-lgbm-2a not found, creating a new one...


RuntimeError: Exception when calling CustomObjectsApi->create_namespaced_custom_object:                 (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Audit-Id': '2eb8aaf5-317c-40a4-aee3-c1f08dc4c0be', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '7de4939b-bceb-4511-b9e3-2bd606c1798f', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b71e55c5-88fe-4fee-8b3a-546ff7e59472', 'Date': 'Thu, 28 Nov 2024 13:02:59 GMT', 'Content-Length': '274'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"inferenceservices.serving.kserve.io \"bike-lgbm-2a\" already exists","reason":"AlreadyExists","details":{"name":"bike-lgbm-2a","group":"serving.kserve.io","kind":"inferenceservices"},"code":409}




In [12]:
# Check if the updated inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm-2a

NAME           URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION            AGE
bike-lgbm-2a   http://bike-lgbm-2a.kserve-inference.example.com   True           100                              bike-lgbm-2a-predictor-00001   17m


In [13]:
send_request(model_name=model_name)

# The output data should be different from the previous response

{'model_name': 'bike-lgbm-2a', 'id': '8f5caa1f-3943-47ba-8212-33bac9900853', 'parameters': {}, 'outputs': [{'name': 'output-1', 'shape': [2, 1], 'datatype': 'FP64', 'data': [51.00457318737209, 35.13687405851507]}]}


{'model_name': 'bike-lgbm-2a',
 'id': '8f5caa1f-3943-47ba-8212-33bac9900853',
 'parameters': {},
 'outputs': [{'name': 'output-1',
   'shape': [2, 1],
   'datatype': 'FP64',
   'data': [51.00457318737209, 35.13687405851507]}]}

In [116]:
# Clean up by removing the "bike-lgbm-2a" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm-2a

inferenceservice.serving.kserve.io "bike-lgbm-2a" deleted


Expected output:

```text
inferenceservice.serving.kserve.io "bike-lgbm-2a" deleted
```

## 2b) Use a YAML file to deploy the model
Instead of using the KServe SDK, now you need to use a YAML file to deploy your LightGBM model again. Please complete the configuration in [manifests/bike-lgbm-basic.yaml](./manifests/bike-lgbm-basic.yaml). You can use whichever LightGBM model in this assignment. 

**Hint**: You can check from [this KServe doc](https://kserve.github.io/website/0.11/modelserving/v1beta1/mlflow/v2/#deploy-with-inferenceservice) on how to use a YAML manifest to deploy a model stored in MLflow.

In [1]:
# Deploy the LightGBM model for bike demand prediction as an inference service named "bike-lgbm"
!kubectl apply -f manifests/bike-lgbm-basic.yaml

inferenceservice.serving.kserve.io/bike-lgbm unchanged


Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm created
```

In [27]:
# Make sure that the "bike-lgbm" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm

NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION         AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-00002   70m


Example output:

```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   2m24s
```

In [37]:
# Make sure there is one pod running for the "bike-lgbm" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm

NAME                                                    READY   STATUS             RESTARTS      AGE
bike-lgbm-predictor-00001-deployment-8c67c9584-8v4cg    0/2     CrashLoopBackOff   5 (52s ago)   4m58s
bike-lgbm-predictor-00002-deployment-557755fdc8-hcf2t   0/2     Terminating        4             4m9s
bike-lgbm-predictor-00002-deployment-557755fdc8-r8nbl   0/2     Completed          1 (8s ago)    29s
bike-lgbm-predictor-00002-deployment-c64b4d8c6-d6wj4    0/2     CrashLoopBackOff   8 (84s ago)   18m


Example output: 

```text
NAME                                                           READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-9d7b87595-k9kpk   2/2     Running   0          70s
```

In [38]:
# Send some requests to the "bike-lgbm" inference service
send_request(model_name="bike-lgbm")

KeyboardInterrupt: 

Example output:

```text
{'model_name': 'bike-lgbm',
 'id': '85c9e931-0879-4f88-a84c-137063e35064',
 'parameters': {},
 'outputs': [{'name': 'output-1',
   'shape': [2, 1],
   'datatype': 'FP64',
   'data': [51.00457318737209, 35.13687405851507]}]}
```

**Note**: Please don't delete the "bike-lgbm" inference service, you will need it in Assignment3 later. 

# Assignment 3: Canary deployment in KServe (2 points)
In this assignment, your task is to deploy the new model to KServe using the canary deployment strategy. 

First, you need to make sure there's already a "bike-lgbm" inference service running in KServe.

In [64]:
!kubectl -n kserve-inference get isvc bike-lgbm

Error from server (NotFound): inferenceservices.serving.kserve.io "bike-lgbm" not found


Example output: 

```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-default-00001   47m
```

Now, your task is to complete the configuration in [manifests/bike-lgbm-canary.yaml](./manifests/bike-lgbm-canary.yaml) to deploy a LightGBM model using canary deployment (Please use a different LightGBM model than the one you used in Assignment 2b). Your new inference service should receive **30%** of the user traffic. 

In [59]:
# Update the "bike-lgbm" inference service to use the new model
!kubectl apply -f manifests/bike-lgbm-canary.yaml

inferenceservice.serving.kserve.io/bike-lgbm configured


Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm configured
```

In [60]:
# Check that the traffic is splitted between the old and the new version
!kubectl -n kserve-inference get isvc bike-lgbm

NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION         AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True           100                              bike-lgbm-predictor-00002   94m


Example output:

```text
NAME        URL                                             READY   PREV   LATEST   PREVROLLEDOUTREVISION               LATESTREADYREVISION                 AGE
bike-lgbm   http://bike-lgbm.kserve-inference.example.com   True    70     30       bike-lgbm-predictor-default-00001   bike-lgbm-predictor-default-00002   61m
```

In [61]:
# Check there are two pods (one old and one new one) running 
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=bike-lgbm

NAME                                                   READY   STATUS             RESTARTS         AGE
bike-lgbm-predictor-00001-deployment-8c67c9584-8v4cg   0/2     CrashLoopBackOff   9 (3m46s ago)    26m
bike-lgbm-predictor-00002-deployment-c64b4d8c6-d6wj4   0/2     CrashLoopBackOff   12 (2m24s ago)   40m


Example output:

```text
NAME                                                           READY   STATUS    RESTARTS   AGE
bike-lgbm-predictor-default-00001-deployment-cc96598f-rr2xz    2/2     Running   0          63m
bike-lgbm-predictor-default-00002-deployment-6d9f5bbff-8mhn8   2/2     Running   0          3m36s
```

In [62]:
# Clean up by removing the "bike-lgbm" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm

inferenceservice.serving.kserve.io "bike-lgbm" deleted


Expected output:

```text
inferenceservice.serving.kserve.io "bike-lgbm" deleted
```

# Assignment 4: Horizontal autoscaling (2 points)

In this assignment, your task is to complete the configuration in [manifests/bike-lgbm-scale.yaml](./manifests/bike-lgbm-scale.yaml) to deploy your LightGBM model to KServe and configure the horizontal autoscaling feature for the deployed inference service. Specifically, the horizontal autoscaling of the inference service should satisfy the following requirements:
1. The inference service should have ae least **2** pods running;
2. The inference service can have at most **8** pods running when it's being scaled up;
3. The target of the auto-scaling is that each pod running the inference service should receive **5** requests per second.

You can use whichever LightGBM model you trained before. 

**Hint**: "rps" should be used as the scaling metric. 

*rps (requests per second) VS concurrency: These two metrics may look similar at the first glance. Both of them are metrics used to measure service performance. rps quantifies the number of requests a service can process within a specific time frame, often a second, whereas concurrency focuses on how many tasks a service can handle simultaneously.*

In [94]:
# Deploy an inference service named "bike-lgbm-scale"
!kubectl apply -f manifests/bike-lgbm-scale.yaml

inferenceservice.serving.kserve.io/bike-lgbm-scale created
horizontalpodautoscaler.autoscaling/bike-lgbm-scale configured


Expected output:

```text
inferenceservice.serving.kserve.io/bike-lgbm-scale created
```

In [98]:
# Make sure the "bike-lgbm-scale" inference service is ready
!kubectl -n kserve-inference get isvc bike-lgbm-scale

NAME              URL   READY     PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
bike-lgbm-scale         Unknown                                                                 22s


Example output:

```text
NAME              URL                                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                       AGE
bike-lgbm-scale   http://bike-lgbm-scale.kserve-inference.example.com   True           100                              bike-lgbm-scale-predictor-default-00001   90s
```


In [100]:
# Make sure there are two pods (replicas) running for the "bike-lgbm-scale" inference service
# Please note that this command only check that your inference service is running, but it doesn't check if the scaling configuration is correct,
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferenceservice=bike-lgbm-scale

NAME                                                          READY   STATUS    RESTARTS     AGE
bike-lgbm-scale-predictor-00001-deployment-5f7f6bcfcc-qzzwj   2/2     Running   1 (2s ago)   26s


Example output:

```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-scale-predictor-default-00001-deployment-66df7bcd67mr   2/2     Running   0          7m26s
bike-lgbm-scale-predictor-default-00001-deployment-66df7bcjb6qx   2/2     Running   0          7m27s
```

In [101]:
# Clean up by removing the "bike-lgbm-scale" inference service
!kubectl -n kserve-inference delete isvc bike-lgbm-scale

inferenceservice.serving.kserve.io "bike-lgbm-scale" deleted


Expected output:

```text
inferenceservice.serving.kserve.io "bike-lgbm-scale" deleted
```

# Assignment 5: Inference graph in KServe (3 points)

## 5a) Inference graph for ensemble
So far you already have two LightGBM models for predicting bike sharing demand. One was trained using the hyperparameters of {learning_rate=0.05, num_leaves=63} (denoted by Model A) and another {learning_rate=0.01, num_leaves=31} (denoted by Model B). 

You need to first complete the configuration in [manifests/bike-lgbm-graph.yaml](./manifests/bike-lgbm-graph.yaml) to deploy two inference services named "bike-lgbm-1" and "bike-lgbm-2". The "bike-lgbm-1" and "bike-lgbm-2" inference services should serve Model A and B, respectively.

Next, you need to complete [manifests/inference-graph1.yaml](./manifests/inference-graph1.yaml) to deploy an inference graph that includes one ensemble routing node. With this inference graph, a user will receive two predictions (one from each inference service) when they send a request.  


In [119]:
# Deploy the "bike-lgbm-1" and "bike-lgbm-2" inference services
!kubectl apply -f manifests/bike-lgbm-graph.yaml

inferenceservice.serving.kserve.io/bike-lgbm-1 configured
inferenceservice.serving.kserve.io/bike-lgbm-2 configured


Expected output:
```text
inferenceservice.serving.kserve.io/bike-lgbm-1 created
inferenceservice.serving.kserve.io/bike-lgbm-2 created
```

In [122]:
# Make sure the two inference services are ready
!kubectl -n kserve-inference get isvc bike-lgbm-1 bike-lgbm-2

NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION           AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-00002   11m
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-00001   11m


Example output:
```text
NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                   AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-default-00001   105m
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-default-00001   105m
```

In [123]:
# Make sure there are two pods running for the two inference services, respectively
!kubectl -n kserve-inference get pods -l "serving.kserve.io/inferenceservice in (bike-lgbm-1,bike-lgbm-2)"

NAME                                                      READY   STATUS                  RESTARTS        AGE
bike-lgbm-1-predictor-00001-deployment-659476f684-c2mdk   0/2     Init:CrashLoopBackOff   6 (4m23s ago)   11m
bike-lgbm-1-predictor-00002-deployment-99d8648ff-hg62g    2/2     Running                 0               80s
bike-lgbm-2-predictor-00001-deployment-6448f76548-2j2vl   2/2     Running                 0               11m


Example output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
bike-lgbm-1-predictor-default-00001-deployment-794547df56-48dhk   2/2     Running   0          109m
bike-lgbm-2-predictor-default-00001-deployment-cf7b449b5-rjc2q    2/2     Running   0          109m
```

In [132]:
# Deploy the inference graph named "my-graph1"
!kubectl apply -f manifests/inference-graph1.yaml

inferencegraph.serving.kserve.io/my-graph1 created


Expected output:
```text
inferencegraph.serving.kserve.io/my-graph1 created
```

In [134]:
# Make sure the "my-graph1" inference graph is ready
!kubectl -n kserve-inference get ig my-graph1

NAME        URL                                             READY   AGE
my-graph1   http://my-graph1.kserve-inference.example.com   True    19s


Example output:
```text
NAME        URL                                             READY   AGE
my-graph1   http://my-graph1.kserve-inference.example.com   True    102s
```

In [135]:
# Also make sure there is one pod running for the "ensemble" inference graph
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferencegraph=my-graph1

NAME                                          READY   STATUS    RESTARTS   AGE
my-graph1-00001-deployment-86846555b9-w7dzf   2/2     Running   0          20s


Example output:
```text
NAME                                          READY   STATUS    RESTARTS   AGE
my-graph1-00001-deployment-7c4d7cfbf9-tr5tz   2/2     Running   0          2m7s
```

Now, let's send a request to the "my-graph1" inference graph

In [141]:
# Send a request
from utils.kserve_utils import send_request

send_request(to_ig=True, ig_name="my-graph1")

AssertionError: The request was not correctly processed, got status code 404

The output (i.e., the response) is expected to contain a prediction from the "bike-lgbm-1" inference service and another prediction from the "bike-lgbm-2" inference service. 

Example output:
```text
{'bike-lgbm-v1': {'id': '3f16b921-e18a-4b51-93c3-f084bf11d07c',
  'model_name': 'bike-lgbm-1',
  'outputs': [{'data': [51.00457318737209, 35.13687405851507],
    'datatype': 'FP64',
    'name': 'output-1',
    'shape': [2, 1]}],
  'parameters': {}},
 'bike-lgbm-v2': {'id': 'ef1c1ec3-9880-467d-8773-1bf614d8e0bf',
  'model_name': 'bike-lgbm-2',
  'outputs': [{'data': [97.60649891558708, 94.67018085698945],
    'datatype': 'FP64',
    'name': 'output-1',
    'shape': [2, 1]}],
  'parameters': {}}}
```

**Note**: Do not delete the "bike-lgbm-1" and "bike-lgbm-2" inference services. They're still needed in the next assignment. 

## 5b) More complicated inference graph
In this assignment, you need to deploy a more complex inference graph containing more than one routing node. 

First let's train two XGBoost models, denoted by Model C and Model D. 

In [142]:
old_xgb_model_s3_uri = train(
    model_type="xgb",
    model_params={
        "max_depth": 6,
        "learning_rate": 0.05,
        "objective": "reg:absoluteerror",
        "random_state": 42,
    },
    freshness_tag="old",
)

new_xgb_model_s3_uri = train(
    model_type="xgb",
    model_params={
        "max_depth": 6,
        "learning_rate": 0.01,
        "objective": "reg:absoluteerror",
        "random_state": 42,
    },
    freshness_tag="new",
)

print("First xgb model URI:", old_xgb_model_s3_uri)
print("Second xgb model URI:", new_xgb_model_s3_uri)

No model found, start training...


2024/11/28 22:37:22 INFO mlflow.tracking.fluent: Experiment with name 'week4-xgb-bike-demand' does not exist. Creating a new experiment.
INFO:botocore.credentials:Found credentials in environment variables.
Successfully registered model 'Week4XgbBikeDemand'.
2024/11/28 22:37:32 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week4XgbBikeDemand, version 1
Created version '1' of model 'Week4XgbBikeDemand'.
  model_info = MLFLOW_CLIENT.get_latest_versions(registered_model_name)[0]


The trained model is located at s3://mlflow/8/4506ddda675f4c3c997d22bae1f4c03e/artifacts/xgb-bike
No model found, start training...


Registered model 'Week4XgbBikeDemand' already exists. Creating a new version of this model...
2024/11/28 22:37:35 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week4XgbBikeDemand, version 2
Created version '2' of model 'Week4XgbBikeDemand'.
  model_info = MLFLOW_CLIENT.get_latest_versions(registered_model_name)[0]


The trained model is located at s3://mlflow/8/a3de4770e6704e9cbeb89d7e900737dd/artifacts/xgb-bike
First xgb model URI: s3://mlflow/8/4506ddda675f4c3c997d22bae1f4c03e/artifacts/xgb-bike
Second xgb model URI: s3://mlflow/8/a3de4770e6704e9cbeb89d7e900737dd/artifacts/xgb-bike


Similar to Assignment 5a, your tasks are to
1. Complete [manifests/bike-xgb-graph.yaml](./manifests/bike-xgb-graph.yaml) to deploy two more inference services named "bike-xgb-1" and "bike-xgb-2" that serve Models C and D, respectively. 
2. Complete [manifests/inference-graph2.yaml](./manifests/inference-graph2.yaml) to deploy an inference graph containing two Ensemble routing nodes. 
The requests that will be sent to the inference graph look like:
    ```python
    {
      'inputs': ...,
      'modelType': 'lgbm'
    }
    ```
    The inference graph should satisfy the following requirements:
    - If there is a field named "modelType" in the request and its value is "lgbm", the request should be forwarded to an ensemble consisting of the "bike-lgbm-1" and "bike-lgbm-2" inference services (These two inference services should be created in Assignment 5a). At this time, the user should receive one prediction from the "bike-lgbm-1" inference service and another prediction from the "bike-lgbm-2" inference service. 
    - If the value of "modelType is "xgb", the request should be forwarded to another ensemble consisting of the "bike-xgb-1" and "bike-xgb-2" inference services. At this time, the user should receive one prediction from the "bike-xgb-1" inference service and another prediction from the "bike-xgb-2" inference service.
    - Otherwise an error message should be returned, complaining that the request can't be processed. 
    
    The behavior of the inference graph is illustrated in the figure below:

    <img src="./images/complex-inference-graph.png" width=600/>

**Hints**:
You may notice that you need to route requests from one routing node to another (instead of from a routing node to an inference service). Below is an example of configuring a routing node to forward requests to another routing node:
```yaml
...
spec: 
  nodes: 
    # The first routing node
    root: 
      routerType: ...
      steps: 
      # This routing node forwards requests to the second routing node named "ensembleNode"
      - nodeName: ensembleNode

    # The second routing node
    ensembleNode:
      routerType: ...
      steps:
      ...
```
You can use `"[@this].#(modelType==\"...\")"` as the condition that determines which ensemble a request should be routed. 


In [143]:
# Deploy the third inference service named "bike-lgbm-3"
!kubectl apply -f manifests/bike-xgb-graph.yaml

inferenceservice.serving.kserve.io/bike-xgb-1 created
inferenceservice.serving.kserve.io/bike-xgb-2 created


Expected output:

```text
inferenceservice.serving.kserve.io/bike-xgb-1 created
inferenceservice.serving.kserve.io/bike-xgb-2 created
```

In [147]:
# Make sure the all of the four inference services are ready
!kubectl -n kserve-inference get isvc bike-lgbm-1 bike-lgbm-2 bike-xgb-1 bike-xgb-2

NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION           AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-00002   82m
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-00001   82m
bike-xgb-1    http://bike-xgb-1.kserve-inference.example.com    True           100                              bike-xgb-1-predictor-00001    2m4s
bike-xgb-2    http://bike-xgb-2.kserve-inference.example.com    True           100                              bike-xgb-2-predictor-00001    2m4s


Example output:

```text
NAME          URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION           AGE
bike-lgbm-1   http://bike-lgbm-1.kserve-inference.example.com   True           100                              bike-lgbm-1-predictor-00001   9m21s
bike-lgbm-2   http://bike-lgbm-2.kserve-inference.example.com   True           100                              bike-lgbm-2-predictor-00001   9m20s
bike-xgb-1    http://bike-xgb-1.kserve-inference.example.com    True           100                              bike-xgb-1-predictor-00001    56s
bike-xgb-2    http://bike-xgb-2.kserve-inference.example.com    True           100                              bike-xgb-2-predictor-00001    56s
```


In [148]:
# Make sure there are three pods running for the three inference services, respectively
!kubectl -n kserve-inference get pods -l "serving.kserve.io/inferenceservice in (bike-lgbm-1,bike-lgbm-2,bike-xgb-1, bike-xgb-2)"

NAME                                                      READY   STATUS    RESTARTS   AGE
bike-lgbm-1-predictor-00002-deployment-99d8648ff-hg62g    2/2     Running   0          71m
bike-lgbm-2-predictor-00001-deployment-6448f76548-2j2vl   2/2     Running   0          82m
bike-xgb-1-predictor-00001-deployment-66559f8c57-h4wg7    2/2     Running   0          2m8s
bike-xgb-2-predictor-00001-deployment-6ffb7b49c4-whjzj    2/2     Running   0          2m7s


Example output:

```text
NAME                                                      READY   STATUS    RESTARTS   AGE
bike-lgbm-1-predictor-00001-deployment-68668c6667-5rb89   2/2     Running   0          9m48s
bike-lgbm-2-predictor-00001-deployment-7d69db959f-bl75c   2/2     Running   0          9m47s
bike-xgb-1-predictor-00001-deployment-5f7fdffbf8-l5q2p    2/2     Running   0          83s
bike-xgb-2-predictor-00001-deployment-7797598c87-sxn49    2/2     Running   0          83s
```

In [154]:
# Deploy the second inference graph ("my-graph2")
!kubectl apply -f manifests/inference-graph2.yaml

inferencegraph.serving.kserve.io/my-graph2 created


Expected output:

```text
inferencegraph.serving.kserve.io/my-graph2 created
```

In [156]:
# Make sure the inference graph named "my-graph2" is ready
!kubectl -n kserve-inference get ig my-graph2

NAME        URL                                             READY   AGE
my-graph2   http://my-graph2.kserve-inference.example.com   True    16s


Expected output:

```text
NAME        URL                                             READY   AGE
my-graph2   http://my-graph2.kserve-inference.example.com   True    35s
```

In [157]:
# Also make sure there is one pod running for the "ensemble" inference graph
!kubectl -n kserve-inference get pods -l  serving.kserve.io/inferencegraph=my-graph2

NAME                                          READY   STATUS    RESTARTS   AGE
my-graph2-00001-deployment-77df99c7db-wwgpd   2/2     Running   0          19s


Example output:

```text
NAME                                          READY   STATUS    RESTARTS   AGE
my-graph2-00001-deployment-557678dfbd-zglcg   2/2     Running   0          49s
```

In [158]:
# Send some requests
send_request(to_ig=True, ig_name="my-graph2", model_type="lgbm")

AssertionError: The request was not correctly processed, got status code 500

The response is expected to contain predictions from both "bike-lgbm-1" and "bike-lgbm-2" inference services.

Example output:

```text
{'bike-lgbm-v1': {'id': '09be4efc-65f4-40ce-be91-e326bcf74ca5',
  'model_name': 'bike-lgbm-1',
  'outputs': [{'data': [51.00457318737209, 35.13687405851507],
    'datatype': 'FP64',
    'name': 'output-1',
    'shape': [2, 1]}],
  'parameters': {}},
 'bike-lgbm-v2': {'id': 'd3c53425-e426-4a79-8517-477994a7c49b',
  'model_name': 'bike-lgbm-2',
  'outputs': [{'data': [97.60649891558708, 94.67018085698945],
    'datatype': 'FP64',
    'name': 'output-1',
    'shape': [2, 1]}],
  'parameters': {}}}
```

In [159]:
send_request(to_ig=True, ig_name="my-graph2", model_type="xgb")

AssertionError: The request was not correctly processed, got status code 500

The response should only have predictions from the "bike-xgb-1" and "bike-xgb-2" inference services.

Example output:
```text
{'bike-xgb-v1': {'id': '51e997c0-74e2-49b7-a273-3c489c848486',
  'model_name': 'bike-xgb-1',
  'outputs': [{'data': [112.01314544677734, 97.72420501708984],
    'datatype': 'FP32',
    'name': 'predict',
    'shape': [2, 1]}],
  'parameters': {}},
 'bike-xgb-v2': {'id': '71d65beb-c085-4787-8151-2a3f0fe6f504',
  'model_name': 'bike-xgb-2',
  'outputs': [{'data': [137.37649536132812, 131.63937377929688],
    'datatype': 'FP32',
    'name': 'predict',
    'shape': [2, 1]}],
  'parameters': {}}}
```

In [160]:
send_request(to_ig=True, ig_name="my-graph2", model_type="random")

{'error': 'Failed to process request', 'cause': 'invalid route type: '}


{'error': 'Failed to process request', 'cause': 'invalid route type: '}

An error message should be returned.

Expected output:

```text
{'error': 'Failed to process request',
 'cause': 'None of the routes matched with the switch condition'}
```

Clean up

In [161]:
# Delete all of the three inference services
!kubectl -n kserve-inference delete isvc bike-lgbm-1 bike-lgbm-2 bike-xgb-1 bike-xgb-2

inferenceservice.serving.kserve.io "bike-lgbm-1" deleted
inferenceservice.serving.kserve.io "bike-lgbm-2" deleted
inferenceservice.serving.kserve.io "bike-xgb-1" deleted
inferenceservice.serving.kserve.io "bike-xgb-2" deleted


Expected output:
```text
inferenceservice.serving.kserve.io "bike-lgbm-1" deleted
inferenceservice.serving.kserve.io "bike-lgbm-2" deleted
inferenceservice.serving.kserve.io "bike-lgbm-3" deleted
```

In [162]:
# Delete all inference graphs
!kubectl -n kserve-inference delete ig my-graph1 my-graph2

inferencegraph.serving.kserve.io "my-graph1" deleted
inferencegraph.serving.kserve.io "my-graph2" deleted


Expected output:
```text
inferencegraph.serving.kserve.io "my-graph1" deleted
inferencegraph.serving.kserve.io "my-graph2" deleted
```

### Wrap-up
Please make sure you have the following files in your submission:
- part1_answer.py and part2_answer.py
- model-settings.json 
- all YAML files listed in the "manifests" directory

When submit the files, please **do not** change the file names or put any of them in any sub-folder. The screenshot below shows an expected submission:

<img src="./images/submission-example.png" width=700/>