# Inference graph
In some complex use cases, deploying a single model is not enough. For example, you may want to chain multiple models into a pipeline in which one model takes another model's inferences as input. In other cases, you may want to make a final decision based on the outputs of several models, or route user requests to different models based on user information. KServe offers a feature called "inference graph" that makes it easy to combine multiple models in these ways.

We highly recommend reading about the inference graph concept in [this KServe doc](https://github.com/kserve/kserve/blob/master/docs/samples/graph/README.md) before working through the example below.

**Correction to the KServe inference graph doc**: 
In sections 2.4 (Ensemble Node) and 2.5 (Splitter Node), the `routes` field in the YAML examples should be `steps`. For example, instead of
```yaml
...
root:
  routerType: Ensemble
  routes:
  - serviceName: sklearn-iris
    name: sklearn-iris
  - serviceName: xgboost-iris
    name: xgboost-iris
...
``` 
the example should be
```yaml
...
root:
  routerType: Ensemble
  steps:
  - serviceName: sklearn-iris
    name: sklearn-iris
  - serviceName: xgboost-iris
    name: xgboost-iris
...
```

## Inference graph example
This example shows how to use an inference graph to route users to one of two inference services based on user information.

First, let's deploy two inference services that predict red wine quality. Before deploying them, replace the two `storageUri` values in [manifests/redwine-model-ig.yaml](./manifests/redwine-model-ig.yaml) with the S3 URIs of your own models, as shown below:

```yaml
apiVersion: "serving.kserve.io/v1beta1"
...
spec:
  predictor:
    serviceAccountName: kserve-sa 
    model:
      modelFormat: 
        name: sklearn
      storageUri: s3://... # change to the S3 URI of your model that was trained using the hyperparameters alpha=0.5 and l1_ratio=0.5. This can be the one you trained when following the first week's MLflow tutorial

---
spec:
  predictor:
    serviceAccountName: kserve-sa 
    model:
      modelFormat: 
        name: sklearn
      storageUri: s3://... # change to the S3 URI of your model that was trained using the hyperparameters alpha=0.7 and l1_ratio=0.7. This can be the one you trained when following this week's tutorial of canary deployment
```
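If you would rather update the manifest from the notebook than edit it by hand, here is a minimal sketch that replaces the `storageUri` values in the order they appear in the file. It assumes each `storageUri:` key sits on its own line, as in the excerpt above. The helper name `set_storage_uris` is hypothetical, and any trailing comment on a `storageUri` line is dropped.

```python
import re

# Matches a whole "storageUri: <value>" line; group 1 keeps the indentation and key.
_STORAGE_URI_LINE = re.compile(r"^([ \t]*storageUri:[ \t]*).*$", re.MULTILINE)


def set_storage_uris(manifest_text: str, uris: list[str]) -> str:
    """Replace each storageUri value, in file order, with the next URI from `uris`."""
    found = _STORAGE_URI_LINE.findall(manifest_text)
    if len(found) != len(uris):
        raise ValueError(f"manifest has {len(found)} storageUri entries, got {len(uris)} URIs")
    replacements = iter(uris)
    # Keep the original indentation and key, swap in the new value.
    return _STORAGE_URI_LINE.sub(lambda m: m.group(1) + next(replacements), manifest_text)
```

For example, with `path = pathlib.Path("manifests/redwine-model-ig.yaml")`, calling `path.write_text(set_storage_uris(path.read_text(), [uri_a, uri_b]))` would write `uri_a` into the first `storageUri` (the alpha=0.5 model in the excerpt above) and `uri_b` into the second (the alpha=0.7 model). `uri_a` and `uri_b` are placeholders for your own S3 URIs.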

```python
# Deploy two inference services for predicting red wine quality
!kubectl apply -f manifests/redwine-model-ig.yaml
```

Expected output: 
```text
inferenceservice.serving.kserve.io/redwine1 created
inferenceservice.serving.kserve.io/redwine2 created
```

```python
# Ensure the "redwine1" inference service is ready
!kubectl -n kserve-inference get isvc redwine1
```

Expected output:
```text
NAME       URL                                            READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
redwine1   http://redwine1.kserve-inference.example.com   True           100                              redwine1-predictor-default-00001   107s
```

```python
# Ensure there is one pod running for the "redwine1" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=redwine1
```

Expected output:
```text
NAME                                                           READY   STATUS    RESTARTS   AGE
redwine1-predictor-default-00001-deployment-7d868c5755-gsdc9   2/2     Running   0          3m20s
```

```python
# Similarly, make sure the "redwine2" inference service is ready
!kubectl -n kserve-inference get isvc redwine2
```

Expected output:
```text
NAME       URL                                            READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
redwine2   http://redwine2.kserve-inference.example.com   True           100                              redwine2-predictor-default-00001   3m54s
```

```python
# Also ensure there is one pod running for the "redwine2" inference service
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferenceservice=redwine2
```

Expected output:
```text
NAME                                                           READY   STATUS    RESTARTS   AGE
redwine2-predictor-default-00001-deployment-57768c7955-bqnb8   2/2     Running   0          4m28s
```
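Instead of re-running `kubectl get` until `READY` shows `True`, you can block until the resources report their `Ready` condition with `kubectl wait`. The sketch below assumes `kubectl` is on your `PATH` and configured for this cluster; the helper names and the 300-second default timeout are illustrative choices, not part of the tutorial.

```python
import subprocess


def build_wait_cmd(kind: str, names: list[str], namespace: str,
                   timeout_s: int = 300) -> list[str]:
    """Build a `kubectl wait` command that blocks until every named resource is Ready."""
    return [
        "kubectl", "-n", namespace, "wait",
        "--for=condition=Ready",
        f"--timeout={timeout_s}s",
        *[f"{kind}/{name}" for name in names],
    ]


def wait_ready(kind: str, names: list[str], namespace: str = "kserve-inference",
               timeout_s: int = 300) -> None:
    """Run `kubectl wait`; raises CalledProcessError if the timeout expires first."""
    subprocess.run(build_wait_cmd(kind, names, namespace, timeout_s), check=True)


# Usage (requires access to the cluster):
# wait_ready("isvc", ["redwine1", "redwine2"])
# wait_ready("ig", ["mygraph"])
```

The same helper works for the inference graph deployed later, since `ig` is the short name used with `kubectl get` in this tutorial.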

Next, let's deploy an inference graph on top of the "redwine1" and "redwine2" inference services. Before deploying it, let's look at [manifests/inference-graph.yaml](./manifests/inference-graph.yaml), which specifies the inference graph:
```yaml
spec: 
  nodes: 
    root: 
      routerType: Switch
      steps: 
      - serviceName: redwine1
        condition: "[@this].#(userId==1)"
      - serviceName: redwine2
        condition: "[@this].#(userId==2)"
```
This inference graph contains a single routing node, `root`, of type `Switch`. The node directs each request to one of the inference services listed under `steps`, depending on which `condition` the request satisfies. Suppose the requests have the following form:
```python
{
  "userId": 1,
  "instances": [...] #some wine-related chemical attributes
}
```

With this graph, requests whose userId equals 1 are routed to the "redwine1" inference service, and requests whose userId equals 2 are routed to the "redwine2" inference service. If a request has no userId, or its userId is anything other than 1 or 2, the routing node returns the request unchanged. The conditions are written in [GJSON syntax](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). Take `[@this].#(userId==1)` as an example: `[@this]` wraps the request body in a one-element array, and `.#(userId==1)` queries that array for an element whose "userId" field equals 1. The condition therefore matches only when the request's userId is 1.
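To make the routing rule concrete, here is a plain-Python emulation of what the `Switch` node does with the two conditions above. It is only an illustration: KServe's router evaluates the GJSON expressions itself, and `STEPS` and `route` are hypothetical names. Steps are checked in order and the first match wins, which is how we understand KServe's Switch router to behave; `None` stands for "no match, the request is returned unchanged".

```python
from typing import Any, Callable, Optional

# Each step pairs a serviceName with a predicate that mirrors its GJSON condition.
# "[@this].#(userId==1)" wraps the request in a one-element array and looks for an
# element whose userId equals 1, so it matches exactly when request["userId"] == 1.
STEPS: list[tuple[str, Callable[[dict[str, Any]], bool]]] = [
    ("redwine1", lambda req: req.get("userId") == 1),
    ("redwine2", lambda req: req.get("userId") == 2),
]


def route(request: dict[str, Any]) -> Optional[str]:
    """Return the serviceName of the first matching step, or None if no step matches."""
    for service_name, condition in STEPS:
        if condition(request):
            return service_name
    return None  # no condition matched: the Switch node returns the request as-is


print(route({"userId": 1, "instances": []}))  # redwine1
print(route({"userId": 3, "instances": []}))  # None
```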

```python
# Deploy inference graph
!kubectl apply -f manifests/inference-graph.yaml
```

Expected output:
```text
inferencegraph.serving.kserve.io/mygraph created
```

```python
# Make sure the inference graph named "mygraph" is ready
!kubectl -n kserve-inference get ig mygraph
```

Expected output:
```text
NAME      URL                                           READY   AGE
mygraph   http://mygraph.kserve-inference.example.com   True    15s
```

```python
# There should be one pod running for the "mygraph" inference graph
!kubectl -n kserve-inference get pod -l serving.kserve.io/inferencegraph=mygraph
```

Expected output:
```text
NAME                                        READY   STATUS    RESTARTS   AGE
mygraph-00001-deployment-549797bdf8-rh4gc   2/2     Running   0          18m
```

Next, let's send some requests to the inference graph.

```python
import requests

# Prepare the input data, request headers, and URL
input_sample = [
    [7.8, 0.58, 0.02, 2, 0.073, 9, 18, 0.9968, 3.36, 0.57, 9.5],
    [8.9, 0.22, 0.48, 1.8, 0.077, 29, 60, 0.9968, 3.39, 0.53, 9.9],
]
ig_name = "mygraph"
# The Host header tells the ingress gateway which inference graph should receive the request
headers = {"Host": f"{ig_name}.kserve-inference.example.com"}
url = "http://kserve-gateway.local:30200"
```

```python
req_data = {
    "userId": 1,
    "instances": input_sample
}

result = requests.post(url, json=req_data, headers=headers)
print(result.json())
```

Expected output: `{'predictions': [5.657319539336507, 5.569620109646147]}`

```python
req_data = {
    "userId": 2,
    "instances": input_sample
}

result = requests.post(url, json=req_data, headers=headers)
print(result.json())
```

Expected output: `{'predictions': [5.74274844741817, 5.566989419987943]}`

The output differs from the previous one even though the input data is identical. This is because the two requests carry different userIds, so the graph routed them to different inference services ("redwine1" and "redwine2"), which serve models trained with different hyperparameters.

```python
req_data = {
    "userId": 3,
    "instances": input_sample
}

result = requests.post(url, json=req_data, headers=headers)
print(result.json())
```

Expected output: 
```text
{'userId': 3, 'instances': [[7.8, 0.58, 0.02, 2, 0.073, 9, 18, 0.9968, 3.36, 0.57, 9.5], [8.9, 0.22, 0.48, 1.8, 0.077, 29, 60, 0.9968, 3.39, 0.53, 9.9]]}
```

The userId of this request is neither 1 nor 2, so the request matches none of the conditions in the inference graph. As a result, it was not forwarded to any inference service; the routing node returned it unchanged.
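Because an unmatched request comes back unchanged, a client can tell the two cases apart by the shape of the response body. The sketch below assumes, based on the outputs above, that a routed response contains a `predictions` key while an unrouted one echoes the original request. `describe_response` is a hypothetical helper, and the check is a client-side heuristic, not a KServe API.

```python
from typing import Any


def describe_response(request: dict[str, Any], response: dict[str, Any]) -> str:
    """Classify an inference graph response as routed, unrouted, or unknown."""
    if "predictions" in response:
        return "routed"    # an inference service produced predictions
    if response == request:
        return "unrouted"  # no Switch condition matched; the request was echoed back
    return "unknown"       # anything else, e.g. an error payload
```

In the notebook, this would be called as `describe_response(req_data, result.json())` after each request.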

```python
# Delete the inference graph
!kubectl -n kserve-inference delete -f manifests/inference-graph.yaml
```

Expected output:
```text
inferencegraph.serving.kserve.io "mygraph" deleted
```

```python
# Delete the "redwine1" and "redwine2" inference services
!kubectl -n kserve-inference delete -f manifests/redwine-model-ig.yaml
```

Expected output:
```text
inferenceservice.serving.kserve.io "redwine1" deleted
inferenceservice.serving.kserve.io "redwine2" deleted
```