# Launch a Seldon Deployment
> Get an ML endpoint up and running on your cluster!

- toc: true 
- badges: true
- comments: true
- categories: [kubernetes, docker]

### Reqs
* access to a kubernetes cluster 
    * If you are coming from [Launch a local kubernetes cluster](https://ntorba.github.io/writing/jupyter/2020/07/17/local-kubernetes.html), you are ready to follow this example. If not, work through that post first, then come back here!

### Goal
* Launch first seldon deployment with grpc or rest 

### Steps
1. Define a seldon python component
2. Build docker image
3. Run a container based on docker image to test the endpoint
4. Define SeldonDeployment yaml file 
5. `kubectl apply` SeldonDeployment to the kubernetes cluster. 

### Define Python Component
I'm taking this example code directly from [seldon-core irisClassifier example](https://github.com/SeldonIO/seldon-core/blob/master/examples/models/sklearn_iris/sklearn_iris.ipynb). 
First, we train a model based on the iris dataset included in the sklearn package, then we serve that trained model in the seldon endpoint.

In [4]:
#hide_output
!mkdir iris_classifier

mkdir: iris_classifier: File exists


In [1]:
%%writefile iris_classifier/train_iris.py
#collapse_show
#hide_output
import joblib
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn import datasets


OUTPUT_FILE = "iris_classifier/IrisClassifier.sav"


print("Loading iris data set...")
iris = datasets.load_iris()
X, y = iris.data, iris.target
print("Dataset loaded!")

clf = LogisticRegression(solver="liblinear", multi_class="ovr")
p = Pipeline([("clf", clf)])
print("Training model...")
p.fit(X, y)
print("Model trained!")

print(f"Saving model in {OUTPUT_FILE}")
joblib.dump(p, OUTPUT_FILE)
print("Model saved!")




Overwriting iris_classifier/train_iris.py


In [6]:
#hide_output
!python iris_classifier/train_iris.py

Loading iris data set...
Dataset loaded!
Training model...
Model trained!
Saving model in iris_classifier/IrisClassifier.sav
Model saved!


Next, we define the seldon python component that will be used to serve the model. 
Seldon has a few [components](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/python/python_component.html). In this example, we only use the Model component. Seldon components hold the logic that is embedded in the serving endpoint that seldon creates. The model component must have a predict function, which is called whenever the endpoint receives a request. 
What makes seldon so useful is that this is the only python code we need to write to serve the model. Seldon provides the rest of the logic, wrapping the component in a web server that serves it.

An important note about this section: the file is named `IrisClassifier.py`, in CamelCase. This is deliberate, and you should not change it. The file name and the python component class name **must match**. 

In [7]:
%%writefile iris_classifier/IrisClassifier.py
#collapse_show
#hide_output
import joblib

class IrisClassifier(object):

    def __init__(self):
        self.model = joblib.load('IrisClassifier.sav')

    def predict(self,X,features_names):
        return self.model.predict_proba(X)

Overwriting iris_classifier/IrisClassifier.py
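
To see why the file name and class name have to agree, here is a minimal sketch of a name-based lookup: import a module by name, then fetch the attribute with the same name from it. This is only an illustration of the idea, and seldon's own loader may differ in detail. `load_component` and `DemoClassifier` are hypothetical names made up for this sketch; the stand-in class returns fixed probabilities so the example runs without a trained model.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_component(model_name, search_dir):
    """Import module `model_name` from `search_dir` and return the class of the same name."""
    sys.path.insert(0, search_dir)
    try:
        importlib.invalidate_caches()  # the module file may have just been written
        module = importlib.import_module(model_name)
    finally:
        sys.path.remove(search_dir)
    # Raises AttributeError if the class name differs from the file name.
    return getattr(module, model_name)


# Throwaway stand-in component (hypothetical, not the real IrisClassifier).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "DemoClassifier.py").write_text(
        "class DemoClassifier:\n"
        "    def predict(self, X, features_names):\n"
        "        return [[1.0, 0.0, 0.0] for _ in X]\n"
    )
    component_cls = load_component("DemoClassifier", tmp)
    demo_prediction = component_cls().predict([[5.9, 4.0, 2.0, 1.0]], None)

print(demo_prediction)
```

If the class inside the file were renamed, the `getattr` call would fail, which is the kind of error a mismatched name produces.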


### Build Docker Image
After defining a python component, there are two ways to create the docker image necessary for deployment. 
* [define a Dockerfile](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/python/python_wrapping_docker.html) which launches the seldon microservice
* use [s2i](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/wrappers/s2i.html) to build the image directly from source code. 

I prefer manually defining a Dockerfile because it provides more control over the process. However, s2i is a great tool that works just as well. 



#### Write requirements.txt 
We must write a requirements.txt file that lists every python package the docker image needs.

In [8]:
%%writefile iris_classifier/requirements.txt
#hide_output
sklearn
seldon-core

Overwriting iris_classifier/requirements.txt


#### Define Dockerfile
The Dockerfile follows the example provided [here](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/python/python_wrapping_docker.html). 
We start from the python:3.7-slim base image, copy in the code from the current directory (which includes the python component we defined earlier), install the requirements, then expose port 5000 for the microservice. 
Next, we define the seldon-specific environment variables. 
* MODEL_NAME must match the python file name (which must also match the python component class name). 
* API_TYPE can be either REST or GRPC.
* SERVICE_TYPE is the type of seldon component, MODEL for this example (explore the other seldon components [here]()).
* PERSISTENCE: 0 or 1. Defaults to 0. If it is set to 1, the component class will be periodically persisted to redis. This is unnecessary in our case because the component's state will not change.
    * this is more pertinent for components like routers, whose state can update over long-running jobs. 

In [50]:
%%writefile iris_classifier/Dockerfile
#collapse_show
#hide_output
FROM python:3.7-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 5000

# Define environment variable
ENV MODEL_NAME IrisClassifier 
ENV API_TYPE REST
ENV SERVICE_TYPE MODEL 
ENV PERSISTENCE 0

# seldon-core-microservice is a command line tool installed with the seldon-core python library. You can use this locally as well!
CMD exec seldon-core-microservice $MODEL_NAME $API_TYPE --service-type $SERVICE_TYPE --persistence $PERSISTENCE

Overwriting iris_classifier/Dockerfile
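
As a rough sketch of how the environment variables feed the `CMD` line, the snippet below assembles the same command in python from a dictionary that mirrors the `ENV` lines in the Dockerfile. `microservice_command` is a hypothetical helper written for this post, not part of seldon-core. The only default it applies is PERSISTENCE = 0, as described above.

```python
import shlex


def microservice_command(env):
    """Build the seldon-core-microservice argument list from environment-style settings."""
    return [
        "seldon-core-microservice",
        env["MODEL_NAME"],    # python file / class name of the component
        env["API_TYPE"],      # REST or GRPC
        "--service-type",
        env["SERVICE_TYPE"],  # MODEL for this example
        "--persistence",
        env.get("PERSISTENCE", "0"),
    ]


# Values copied from the ENV lines of the Dockerfile.
dockerfile_env = {
    "MODEL_NAME": "IrisClassifier",
    "API_TYPE": "REST",
    "SERVICE_TYPE": "MODEL",
}

command = microservice_command(dockerfile_env)
print(shlex.join(command))
```

Printing the command like this is a cheap way to check what the container will execute before you build the image.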


To test this example, let's build and run the docker image! 

#### Docker Build
Pass the iris_classifier dir, which holds everything the image needs, then pass -t to tag the image with a name that refers to your preferred docker image registry (I'm running one locally).

In [51]:
#hide_output
!docker build iris_classifier/ -t localhost:5000/iris_ex:latest

Sending build context to Docker daemon  11.78kB
Step 1/10 : FROM python:3.7-slim
 ---> b386e7420fc3
Step 2/10 : COPY . /app
 ---> 4ff1fc2d09e5
Step 3/10 : WORKDIR /app
 ---> Running in 0e5b783b9df2
Removing intermediate container 0e5b783b9df2
 ---> 840bd996fe26
Step 4/10 : RUN pip install -r requirements.txt
 ---> Running in 6f1af0205271
Collecting sklearn
  Downloading sklearn-0.0.tar.gz (1.1 kB)
Collecting seldon-core
  Downloading seldon_core-1.2.2-py3-none-any.whl (108 kB)
Collecting scikit-learn
  Downloading scikit_learn-0.23.2-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
Collecting Flask<2.0.0
  Downloading Flask-1.1.2-py2.py3-none-any.whl (94 kB)
Collecting Flask-cors<4.0.0
  Downloading Flask_Cors-3.0.8-py2.py3-none-any.whl (14 kB)
Collecting prometheus-client<0.9.0,>=0.7.1
  Downloading prometheus_client-0.8.0-py2.py3-none-any.whl (53 kB)
Collecting Flask-OpenTracing<1.2.0,>=1.1.0
  Downloading Flask-OpenTracing-1.1.0.tar.gz (8.2 kB)
Collecting opentracing<2.4.0,>=2.2.0
  Downlo

### Test Image
You can test your newly created image by running the image and hitting the endpoint. 
You may ask yourself at this point, "if I have a working docker image, what do I need kubernetes for?" 
This is a great question. For simple use cases, this docker image itself is all you need, and you could run it as a standalone service. If the load is small and you can run it without any load balancing functionalities, you are good to go. 
However, kubernetes is a container orchestration engine. That means it is built to handle complex containerized applications and will make your life much easier if you need to handle more complex operations for applications that need to serve on a large scale. 

In [52]:
!docker run --name "iris_predictor" -d --rm -p 5001:5000 localhost:5000/iris_ex:latest

4d88f1163a71622fc2b67f33b8af4e95c2c8dafa9da43e2fe8c06e4322b7591c


You could also remove the -d argument from the above command and run this command in a separate window to see the log output while sending requests to the endpoint. Test the endpoint with the curl below! 

In [53]:
import numpy as np
import grpc 
from seldon_core.proto import prediction_pb2
from seldon_core.proto import prediction_pb2_grpc


### Test Rest Endpoint
!curl -s http://localhost:5001/predict -H "Content-Type: application/json" -d '{"data":{"ndarray":[[5.964,4.006,2.081,1.031]]}}'


### Test GRPC Endpoint
# data = np.array([[5.964,4.006,2.081,1.031]])

# datadef = prediction_pb2.DefaultData(
#     tensor=prediction_pb2.Tensor(shape=data.shape, values=data.flatten())
# )
# request = prediction_pb2.SeldonMessage(data=datadef)
# with grpc.insecure_channel("localhost:5001") as channel:
#     stub = prediction_pb2_grpc.ModelStub(channel)
#     response = stub.Predict(request=request)
# print(response)
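
If you would rather send the REST request from python than from curl, the sketch below builds the same JSON body and posts it with the standard library. `build_payload` and `predict` are hypothetical helper names. The URL and port are the ones used in the curl command above, and the network call is left commented out so the snippet runs even when no container is listening.

```python
import json
import urllib.request


def build_payload(rows):
    """Wrap rows of features in the {"data": {"ndarray": ...}} body used by the curl call."""
    return {"data": {"ndarray": [[float(value) for value in row] for row in rows]}}


def predict(url, rows, timeout=5.0):
    """POST the payload to the endpoint and return the decoded JSON response."""
    body = json.dumps(build_payload(rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload([[5.964, 4.006, 2.081, 1.031]])
print(json.dumps(payload))

# With the container from the docker run cell listening on port 5001:
# print(predict("http://localhost:5001/predict", [[5.964, 4.006, 2.081, 1.031]]))
```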

If you see successful output, you have your first seldon-core-microservice up and running! Now, we will deploy this as a simple inference graph on our kubernetes cluster. 
First, let's take down the running docker container:

In [54]:
!docker container rm iris_predictor --force

iris_predictor


Next, we need to define our deployment configuration file. Here is a seldon config file for our deployment: 

In [55]:
%%writefile iris_classifier/sklearn_iris_deployment.yaml
#hide_output
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: seldon-deployment-example
spec:
  name: sklearn-iris-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: localhost:5000/iris_ex:latest
          imagePullPolicy: IfNotPresent
          name: sklearn-iris-classifier
    graph:
      children: []
      endpoint:
        type: REST
      name: sklearn-iris-classifier
      type: MODEL
    name: sklearn-iris-predictor
    replicas: 1

Overwriting iris_classifier/sklearn_iris_deployment.yaml


Some important notes about the deployment config: 
* apiVersion: this routes our request to the appropriate endpoint of the kubernetes api, which was installed by helm earlier in this tutorial
* kind: tells kubernetes what kind of resource to create. 
* metadata: adds identifying information, like the name, to the deployment
* spec: 
    * predictors: this is a list of predictors to deploy. It is a list because you have the option to create multiple inference graphs in the same spec. This is useful for things like canary deployments, where you only want a new graph to receive a small percentage of traffic
        * componentSpecs: adds information about the containers that need to be pulled to create our graph. In our case, we only need a single container to serve our model. If we were creating a more complex inference graph (maybe with a transformer, a router, and another model), we would need to include the docker containers that house them in this section
        * graph: this is where you define the flow of components. This is easy in our case: there is only one component, so we define one endpoint with no children. If there were more components, we would list the child components in the children attribute of the head of the graph. Seldon graphs are built implicitly through the children attribute of each node in the graph. 
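
To make the implicit graph structure concrete, here is a small sketch that represents graph nodes as plain python dictionaries and walks the `children` attribute to list every node. The helper names (`model_node`, `graph_node_names`) and the `feature-transformer` node are hypothetical and only illustrate the nesting. The single-node graph mirrors the one in the config above, where the graph node and the container share the name `sklearn-iris-classifier`.

```python
def model_node(name, node_type="MODEL", children=None):
    """Build one graph node shaped like the `graph` section of the config."""
    return {
        "name": name,
        "type": node_type,
        "endpoint": {"type": "REST"},
        "children": children or [],
    }


def graph_node_names(node):
    """Depth-first list of node names, following each node's `children`."""
    names = [node["name"]]
    for child in node["children"]:
        names.extend(graph_node_names(child))
    return names


# The graph from this post: one model, no children.
single_graph = model_node("sklearn-iris-classifier")

# A hypothetical two-step graph: a transformer that feeds the model.
chained_graph = model_node(
    "feature-transformer",
    node_type="TRANSFORMER",
    children=[model_node("sklearn-iris-classifier")],
)

# Every graph node needs a container, so compare against the container names.
container_names = {"sklearn-iris-classifier"}
missing = [n for n in graph_node_names(chained_graph) if n not in container_names]

print(graph_node_names(single_graph))
print(graph_node_names(chained_graph))
print(missing)
```

In the chained case the check reports `feature-transformer` as missing, because the componentSpecs in this post only define the classifier container.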
        
There is one last step before we can deploy our graph: we must push our docker image to a registry! I am running a local registry with my kind cluster, thanks to the script given [here](https://kind.sigs.k8s.io/docs/user/local-registry/). You can also push to DockerHub. 

In [56]:
!docker push localhost:5000/iris_ex:latest

The push refers to repository [localhost:5000/iris_ex]

With our docker image in a registry, it is available to our cluster, so we can deploy!

In [57]:
!kubectl apply -f iris_classifier/sklearn_iris_deployment.yaml
from time import sleep
sleep(5) # give the cluster some time to create the deployment before checking the rollout

seldondeployment.machinelearning.seldon.io/seldon-deployment-example created
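
A fixed `sleep(5)` is a guess at how long the cluster needs. As an alternative sketch, you could poll until kubectl reports a deployment with the expected label. `wait_until` and `deployment_name` are hypothetical helpers. The label is the one used in the rollout command in this post, and the demo at the bottom uses a fake probe so the snippet runs without a cluster.

```python
import subprocess
import time


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call `probe` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up after {timeout} seconds")
        time.sleep(interval)


def deployment_name(label="seldon-deployment-id=seldon-deployment-example"):
    """Return the first deployment name matching `label`, or "" if none exists yet."""
    completed = subprocess.run(
        ["kubectl", "get", "deploy", "-l", label,
         "-o", "jsonpath={.items[*].metadata.name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    names = completed.stdout.split()
    return names[0] if names else ""


# Demo with a fake probe that succeeds on the third attempt.
fake_results = iter(["", "", "seldon-demo"])
found = wait_until(lambda: next(fake_results), timeout=5.0, interval=0.01)
print(found)

# Against a real cluster:
# name = wait_until(deployment_name, timeout=60.0, interval=2.0)
```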


You can check the status of your deployment. 

In [58]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-deployment-example \
                                 -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "seldon-92a927e5e90d7602e08ba9b9304f70e8" rollout to finish: 0 of 1 updated replicas are available...
deployment "seldon-92a927e5e90d7602e08ba9b9304f70e8" successfully rolled out


Once the deployment is ready, you will need to port-forward the pod to your localhost in order to send it a request. That can be done with the kubectl port-forward command: 
```bash 
kubectl port-forward $(kubectl get pods -l seldon-app=seldon-deployment-example-sklearn-iris-predictor -o jsonpath='{.items[0].metadata.name}') 9000:9000
```

You must run this command in a separate window because it will need to run while we curl the endpoint. 

In [59]:
# dir(prediction_pb2_grpc) 

In [63]:
import numpy as np
import grpc 
from seldon_core.proto import prediction_pb2
from seldon_core.proto import prediction_pb2_grpc


### Test REST endpoint
res = !curl -s http://localhost:9000/predict -H "Content-Type: application/json" -d '{"data":{"ndarray":[[5.964,4.006,2.081,1.031]]}}'
print(res)

### Test GRPC endpoint
# data = np.array([[5.964,4.006,2.081,1.031]])

# datadef = prediction_pb2.DefaultData(tensor=prediction_pb2.Tensor(shape=data.shape, values=data.flatten()))
# request = prediction_pb2.SeldonMessage(data=datadef)
# with grpc.insecure_channel("localhost:9000") as channel:
#     stub = prediction_pb2_grpc.ModelStub(channel)
#     print(dir(stub))
#     response = stub.Predict(request=request)
# print(response)

['{"data":{"names":["t:0","t:1","t:2"],"ndarray":[[0.9548873249364169,0.04505474761561406,5.792744796895234e-05]]},"meta":{}}']
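
The response is a JSON string, so it is easy to pull the most likely class out of it. The sketch below parses the exact response shown above and picks the highest probability in each row. `top_classes` is a hypothetical helper, and the names `t:0`, `t:1`, `t:2` are simply the column names the endpoint returned.

```python
import json

# The response body printed by the curl call above.
raw_response = (
    '{"data":{"names":["t:0","t:1","t:2"],'
    '"ndarray":[[0.9548873249364169,0.04505474761561406,5.792744796895234e-05]]},'
    '"meta":{}}'
)


def top_classes(response_text):
    """Return (column name, probability) of the most likely class for each row."""
    message = json.loads(response_text)
    names = message["data"]["names"]
    results = []
    for row in message["data"]["ndarray"]:
        best = max(range(len(row)), key=row.__getitem__)
        results.append((names[best], row[best]))
    return results


predictions = top_classes(raw_response)
print(predictions)
```

For this request the first column wins with a probability of about 0.95.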


You have successfully created a seldon endpoint on kubernetes! 

In [64]:
## Cleanup
!kubectl delete -f iris_classifier/sklearn_iris_deployment.yaml


seldondeployment.machinelearning.seldon.io "seldon-deployment-example" deleted


### Conclusion 
In this quick example, we scratched the surface of seldon-core by deploying a simple model endpoint on kubernetes. 
If you are hungry for more, check out the other posts in the [Seldon Super Series](). There, you can find notebooks similar to this one that deploy more complex inference graphs, or dive into the underlying kubernetes concepts that seldon uses to make this possible! 

### Next Up
* other seldon components 
* seldon graph construction 
* multi-component inference graph
* operators and custom resources 