# Deploy the Custom Predictor on KServe

## Install KServe
If kfserving is installed, uninstall it first, then install the KServe SDK with the commands below. Restart the kernel after installing the SDK.

In [1]:
!pip uninstall kfserving -y
!pip install kserve==0.7 -q
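
After restarting the kernel, it can be useful to confirm which of the two packages is actually present. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of the KServe SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the state of the old and new SDK packages.
for pkg in ("kfserving", "kserve"):
    print(pkg, installed_version(pkg) or "not installed")
```

If the install step succeeded, `kfserving` should be reported as not installed and `kserve` should show the pinned version.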



## Import kubernetes.client and kserve packages 

In [61]:
import logging

import kserve
from kserve import KServeClient, constants
from kserve import V1beta1InferenceService, V1beta1InferenceServiceSpec, V1beta1PredictorSpec
from kubernetes import client
from kubernetes.client import V1Container, V1ResourceRequirements


## Declare Namespace
Specify the namespace in which the InferenceService will be deployed.



In [84]:
namespace = 'kubeflow-user-example-com'
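
Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters). As a rough sketch, the value can be checked locally before it is sent to the cluster; the helper `is_valid_namespace` is a hypothetical name introduced here for illustration.

```python
import re

# RFC 1123 label: lowercase alphanumerics or '-', alphanumeric at both ends.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_namespace(ns: str) -> bool:
    """Check that a string is usable as a Kubernetes namespace name."""
    return len(ns) <= 63 and _DNS_LABEL.fullmatch(ns) is not None


print(is_valid_namespace("kubeflow-user-example-com"))
print(is_valid_namespace("Kubeflow_User"))
```

This only validates the format of the name; it does not check that the namespace exists in the cluster.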


## Define the InferenceService
Define the InferenceService by setting its key parameters: the API version, kind, metadata, and spec. The predictor parameter takes a V1beta1PredictorSpec object that runs a custom container image.

In [85]:
name = 'tg-gcn-kserve'
kserve_version = 'v1beta1'
api_version = constants.KSERVE_GROUP + '/' + kserve_version
print(api_version)

isvc = V1beta1InferenceService(
    api_version=api_version,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(
        name=name,
        namespace=namespace,
        # Disable Istio sidecar injection for this InferenceService.
        annotations={'sidecar.istio.io/inject': 'false'},
    ),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            containers=[
                V1Container(
                    image="nzarayeneh/kserve-base:latest",
                    name="kserve-base",
                    # CPU and memory requests and limits for the predictor container.
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "100m", "memory": "200Mi"},
                        limits={"cpu": "500m", "memory": "500Mi"},
                    ),
                )
            ]
        )
    ),
)
isvc

serving.kserve.io/v1beta1


{'api_version': 'serving.kserve.io/v1beta1',
 'kind': 'InferenceService',
 'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'},
              'cluster_name': None,
              'creation_timestamp': None,
              'deletion_grace_period_seconds': None,
              'deletion_timestamp': None,
              'finalizers': None,
              'generate_name': None,
              'generation': None,
              'labels': None,
              'managed_fields': None,
              'name': 'tg-gcn-kserve',
              'namespace': 'kubeflow-user-example-com',
              'owner_references': None,
              'resource_version': None,
              'self_link': None,
              'uid': None},
 'spec': {'explainer': None,
          'predictor': {'active_deadline_seconds': None,
                        'affinity': None,
                        'automount_service_account_token': None,
                        'batcher': None,
                        'canary_traffic_per
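
For comparison, the same definition can be written as a plain dictionary in the shape of a Kubernetes manifest. This is only a sketch of a roughly equivalent resource, built from the values used above; the helper name `build_isvc_manifest` is hypothetical and nothing here calls the KServe SDK.

```python
import json
from typing import Any, Dict


def build_isvc_manifest(name: str, namespace: str, image: str,
                        container_name: str) -> Dict[str, Any]:
    """Build an InferenceService manifest with one custom predictor container."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"sidecar.istio.io/inject": "false"},
        },
        "spec": {
            "predictor": {
                "containers": [
                    {
                        "name": container_name,
                        "image": image,
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "200Mi"},
                            "limits": {"cpu": "500m", "memory": "500Mi"},
                        },
                    }
                ]
            }
        },
    }


manifest = build_isvc_manifest(
    name="tg-gcn-kserve",
    namespace="kubeflow-user-example-com",
    image="nzarayeneh/kserve-base:latest",
    container_name="kserve-base",
)
print(json.dumps(manifest, indent=2))
```

Seeing the manifest form makes it easier to relate the SDK objects to the fields of the resource that ends up in the cluster.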

## Create InferenceService 
With the InferenceService defined, create it by calling the create method of the KServeClient.



In [86]:
KServe = KServeClient()
KServe.create(isvc)

## Check the InferenceService
Run the following command to watch the InferenceService until it is ready (or times out).

In [88]:
KServe.get(name, namespace=namespace, watch=True, timeout_seconds=300)


NAME                 READY      PREV                      LATEST                    URL                                                              
tg-gcn-kserve        Unknown                                                                                                                         
tg-gcn-kserve        Unknown                                                                                                                         
tg-gcn-kserve        Unknown    0                         100                                                                                        
tg-gcn-kserve        Unknown    0                         100                                                                                        
tg-gcn-kserve        Unknown    0                         100                                                                                        
tg-gcn-kserve        Unknown    0                         100                                       
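
Readiness can also be checked programmatically from the dictionary that `get` returns. The sketch below assumes the object follows the usual Kubernetes convention of a `status.conditions` list containing a condition of type `Ready`; the helper name `ready_condition_is_true` and the two sample dictionaries are hypothetical and are not output from this cluster.

```python
from typing import Any, Dict


def ready_condition_is_true(isvc: Dict[str, Any]) -> bool:
    """Return True if the object has a Ready condition with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in conditions
    )


# Hypothetical examples of a pending and a ready InferenceService.
pending = {"status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}}
ready = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}

print(ready_condition_is_true(pending))
print(ready_condition_is_true(ready))
```

An object with no `status` field yet, which is what a freshly created resource may look like, is treated as not ready.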

## Perform Inference 
Next, send an inference request to the deployed model to get predictions. This notebook assumes that you are running it inside your Kubeflow cluster, so it uses the internal URL of the InferenceService.

In [89]:
import requests

isvc_resp = KServe.get(name, namespace=namespace)
isvc_resp

{'apiVersion': 'serving.kserve.io/v1beta1',
 'kind': 'InferenceService',
 'metadata': {'annotations': {'sidecar.istio.io/inject': 'false'},
  'creationTimestamp': '2022-06-09T19:12:38Z',
  'finalizers': ['inferenceservice.finalizers'],
  'generation': 1,
  'managedFields': [{'apiVersion': 'serving.kserve.io/v1beta1',
    'fieldsType': 'FieldsV1',
    'fieldsV1': {'f:metadata': {'f:annotations': {'.': {},
       'f:sidecar.istio.io/inject': {}}},
     'f:spec': {'.': {}, 'f:predictor': {'.': {}, 'f:containers': {}}}},
    'manager': 'OpenAPI-Generator',
    'operation': 'Update',
    'time': '2022-06-09T19:12:35Z'},
   {'apiVersion': 'serving.kserve.io/v1beta1',
    'fieldsType': 'FieldsV1',
    'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {},
       'v:"inferenceservice.finalizers"': {}}},
     'f:status': {'.': {},
      'f:address': {'.': {}, 'f:url': {}},
      'f:components': {'.': {},
       'f:predictor': {'.': {},
        'f:address': {'.': {}, 'f:url': {}},
        'f:late

In [91]:
isvc_url = isvc_resp['status']['address']['url']

print(isvc_url)

inference_input = {
    "nodes": [
      {"primary_id": 7, "type": "Paper"}, 
      {"primary_id": 999, "type": "Paper"}
    ]
}

response = requests.post(isvc_url, json=inference_input)
print(response.text)

http://tg-gcn-kserve.kubeflow-user-example-com.svc.cluster.local/v1/models/tg-gcn-kserve:predict
{"predictions": [{"primary_id": "7", "label": 3}, {"primary_id": "999", "label": 2}]}


You should see two predictions returned, one for each node sent for inference. In this case, the model predicts label `3` for primary_id 7 and label `2` for primary_id 999.
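
To work with the predictions rather than just print them, the response body can be parsed into a mapping from node ID to label. This is a small sketch that uses the response text shown above as its input; the helper name `parse_predictions` is hypothetical.

```python
import json
from typing import Dict


def parse_predictions(body: str) -> Dict[str, int]:
    """Map each primary_id in a prediction response to its predicted label."""
    payload = json.loads(body)
    return {item["primary_id"]: item["label"] for item in payload["predictions"]}


# Response text copied from the output above.
body = '{"predictions": [{"primary_id": "7", "label": 3}, {"primary_id": "999", "label": 2}]}'
labels = parse_predictions(body)
print(labels)
```

Note that the response returns each `primary_id` as a string, even though the request sent them as integers.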

To learn more about sending inference requests, please check out the [KServe guide](https://kserve.github.io/website/0.7/get_started/first_isvc/#3-determine-the-ingress-ip-and-ports).
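
The internal URL printed above follows a recognizable pattern: the service name and namespace form the cluster-local host, followed by the predict path for the model. The sketch below rebuilds that URL from its parts. It is inferred from the output in this notebook and is an assumption, not a documented guarantee; in practice, reading the URL from the InferenceService status as done above is the safer choice. The helper name `internal_predict_url` is hypothetical.

```python
def internal_predict_url(name: str, namespace: str) -> str:
    """Compose a cluster-internal predict URL matching the pattern seen above."""
    host = f"{name}.{namespace}.svc.cluster.local"
    return f"http://{host}/v1/models/{name}:predict"


url = internal_predict_url("tg-gcn-kserve", "kubeflow-user-example-com")
print(url)
```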



## Delete InferenceService
When you are done with your InferenceService, you can delete it by running the following.

In [None]:
KServe.delete(name, namespace=namespace)
