Iter8-kfserving

Iter8-kfserving enables metrics-driven experiments, progressive delivery, and automated rollouts for ML models served over Kubernetes and OpenShift clusters.

The picture below illustrates metrics-driven progressive canary release of a KFServing model using iter8-kfserving.

Quick start on Minikube

Steps 1 to 7 demonstrate metrics-driven progressive canary release of a KFServing model using iter8-kfserving. This demo uses KFServing v0.5.0-rc2.

Before you begin, you will need Minikube, Kustomize v3, and Go 1.13+.
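
A quick way to confirm the prerequisites are on your PATH is sketched below. The helper name `missing_tools` is hypothetical and not part of iter8-kfserving. It only checks that each command exists, not that Kustomize is v3 or that Go is 1.13 or newer.

```shell
#!/bin/sh
# Hypothetical helper (not part of iter8-kfserving): print the names of any
# tools that cannot be found on PATH, and return non-zero if any are missing.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v exits non-zero when the tool cannot be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Print the missing tools (empty line if none), without the leading space
  echo "${missing# }"
  [ -z "$missing" ]
}

# Example (assumes the tool names used in this quick start):
#   missing_tools minikube kustomize go || echo "install the tools listed above"
```

If the function prints nothing and returns success, all the named commands were found.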

Step 1: Start Minikube with sufficient resources.

minikube start --cpus 6 --memory 12288 --kubernetes-version=v1.17.11 --driver=docker

Step 2: Install KFServing, kfserving-monitoring, and iter8-kfserving.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/platformsetup.sh | /bin/bash

Step 3: In a separate terminal, set up a Minikube tunnel. If prompted, enter your password.

minikube tunnel --cleanup

Step 4: Create a KFServing v1beta1 inferenceservice with a default model. Update it with a canary model. This step may take a couple of minutes.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/inferenceservicesetup.sh | /bin/bash

Step 5: In a separate terminal, generate prediction requests for the inferenceservice.

curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/predictionrequests.sh | /bin/bash

Step 6: Create the iter8-kfserving canary experiment.

kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/experiment.yaml

In this step, you are creating an iter8 experiment resource object in the Kubernetes cluster, which looks as follows.

apiVersion: iter8.tools/v2alpha1
kind: Experiment
metadata:
  name: experiment-1
spec:
  target: default/my-model
  strategy:
    type: Canary
  criteria:
    indicators:
    - 95th-percentile-tail-latency
    objectives:
    - metric: mean-latency
      upperLimit: 1000
    - metric: error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 15
    maxIterations: 12

The above spec asks iter8 to perform a canary release experiment for the inferenceservice named my-model in the default namespace. During the experiment, the default and canary model versions are assessed every 15 seconds over 12 iterations. When the experiment completes, the canary version is considered successful (the winner) if its mean-latency is at most 1000 msec and its error-rate is at most 1%. If the canary is successful, it is rolled out, i.e., 100% of the traffic is shifted to it.
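
The arithmetic behind these criteria can be sketched in a few lines of shell. This is only an illustration of the thresholds and duration in the spec above, not how iter8 itself evaluates experiments. The function names `satisfies_objectives` and `experiment_duration_seconds` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a version's observed metrics satisfy the
# two objectives from the spec (mean-latency <= 1000, error-rate <= 0.01).
# This mirrors the thresholds only; it is not iter8's implementation.
satisfies_objectives() {
  mean_latency_ms="$1"   # observed mean-latency, in milliseconds
  error_rate="$2"        # observed error-rate, as a fraction (0.01 = 1%)
  # awk performs the floating-point comparisons that plain sh cannot
  awk -v lat="$mean_latency_ms" -v err="$error_rate" \
    'BEGIN { exit !(lat <= 1000 && err <= 0.01) }'
}

# Hypothetical helper: total assessment time implied by the duration block,
# computed as intervalSeconds x maxIterations.
experiment_duration_seconds() {
  echo $(( $1 * $2 ))
}

# Example:
#   satisfies_objectives 230.090 0.000 && echo "objectives met"
#   experiment_duration_seconds 15 12   # prints 180
```

With intervalSeconds set to 15 and maxIterations set to 12, the assessment period works out to 180 seconds.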

Step 7: In a separate terminal, periodically describe the experiment.

Install iter8ctl. To install the iter8ctl binary in a different directory, change GOBIN below.

GO111MODULE=on GOBIN=/usr/local/bin go get github.com/iter8-tools/iter8ctl@v0.1.0-alpha

Then describe the experiment every 15 seconds.

while clear; do
  kubectl get experiment experiment-1 -o yaml | iter8ctl describe -f -
  sleep 15
done
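
The loop above runs until you interrupt it. A bounded variant that stops after a fixed number of runs could look like the sketch below. The helper `repeat_every` and the wrapper `describe_experiment` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times, pausing between runs.
# Usage: repeat_every COUNT INTERVAL_SECONDS COMMAND [ARGS...]
repeat_every() {
  count="$1"
  interval="$2"
  shift 2
  i=1
  while [ "$i" -le "$count" ]; do
    # Keep going even if a single run fails
    "$@" || true
    # Skip the pause after the final run
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example (assumes kubectl and iter8ctl are installed as in this step):
#   describe_experiment() {
#     kubectl get experiment experiment-1 -o yaml | iter8ctl describe -f -
#   }
#   repeat_every 12 15 describe_experiment
```

Twelve runs at 15-second intervals match the maxIterations and intervalSeconds values in the experiment spec.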

You should see output similar to the following.

******
Experiment name: experiment-1
Experiment namespace: default
Experiment target: default/my-model

******
Number of completed iterations: 10

******
Winning version: canary

******
Objectives
+--------------------------+---------+--------+
|        OBJECTIVE         | DEFAULT | CANARY |
+--------------------------+---------+--------+
| mean-latency <= 1000.000 | true    | true   |
+--------------------------+---------+--------+
| error-rate <= 0.010      | true    | true   |
+--------------------------+---------+--------+

******
Metrics
+--------------------------------+---------+---------+
|             METRIC             | DEFAULT | CANARY  |
+--------------------------------+---------+---------+
| request-count                  | 132.294 |  73.254 |
+--------------------------------+---------+---------+
| 95th-percentile-tail-latency   | 298.582 | 294.597 |
| (milliseconds)                 |         |         |
+--------------------------------+---------+---------+
| mean-latency (milliseconds)    | 229.529 | 230.090 |
+--------------------------------+---------+---------+
| error-rate                     |   0.000 |   0.000 |
+--------------------------------+---------+---------+

The experiment should complete after 12 iterations (about 3 minutes). Once the experiment completes, inspect the InferenceService object.

kubectl get isvc/my-model

You should see 100% of the traffic shifted to the canary model, similar to the output below.

# output of the above command should be similar to the below
NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
my-model   http://my-model.default.example.com   True           100                              my-model-predictor-default-zwjbq   5m
