Iter8-kfserving enables metrics-driven experiments, progressive delivery, and automated rollouts for ML models served over Kubernetes and OpenShift clusters.
The picture below illustrates metrics-driven progressive canary release of a KFServing model using iter8-kfserving.
- Quick start on Minikube
- Installation
- Anatomy of an iter8 experiment
- Progressive canary release experiment
- Describe experiments using iter8ctl
- Iter8 metrics
- Concurrent experiments
- Reference
- Wiki with roadmap and developer documentation
- Contributing
Steps 1 to 7 demonstrate metrics-driven progressive canary release of a KFServing model using iter8-kfserving. This demo uses KFServing v0.5.0-rc2.
Before you begin, you will need Minikube, Kustomize v3, and Go 1.13+.
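A quick way to confirm the prerequisites are on your PATH before starting is sketched below. The helper name missing_tools is hypothetical and not part of iter8-kfserving. It only checks that the commands exist, not that Kustomize is v3 or that Go is 1.13+.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three tools this quick start relies on.
missing=$(missing_tools minikube kustomize go)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  printf '%s\n' "$missing"
else
  echo "All prerequisites found on PATH."
fi
```

If anything is reported missing, install it before moving on to Step 1.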
Step 1: Start Minikube with sufficient resources.
minikube start --cpus 6 --memory 12288 --kubernetes-version=v1.17.11 --driver=docker
Step 2: Install KFServing, kfserving-monitoring, and iter8-kfserving.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/platformsetup.sh | /bin/bash
Step 3: In a separate terminal, set up a Minikube tunnel. If prompted, enter your password.
minikube tunnel --cleanup
Step 4: Create a KFServing v1beta1 inferenceservice with a default model. Update it with a canary model. This step may take a couple of minutes.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/inferenceservicesetup.sh | /bin/bash
Step 5: In a separate terminal, generate prediction requests for the inferenceservice.
curl -L https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/predictionrequests.sh | /bin/bash
Step 6: Create the iter8-kfserving canary experiment.
kubectl apply -f https://raw.githubusercontent.com/iter8-tools/iter8-kfserving/main/samples/quickstart/experiment.yaml
apiVersion: iter8.tools/v2alpha1
kind: Experiment
metadata:
  name: experiment-1
spec:
  target: default/my-model
  strategy:
    type: Canary
  criteria:
    indicators:
    - 95th-percentile-tail-latency
    objectives:
    - metric: mean-latency
      upperLimit: 1000
    - metric: error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 15
    maxIterations: 12
The above spec asks iter8 to perform a canary release experiment for the inferenceservice named my-model in the default namespace. During the experiment, the default and canary model versions will be assessed every 15 seconds over 12 iterations. When the experiment completes, the canary version will be considered successful (the winner) if its mean-latency is within 1000 msec and its error-rate is within 1%. If the canary is successful, it will be rolled out, i.e., 100% of the traffic will be shifted to it.
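To make the two objectives concrete, the sketch below applies the same upper limits (mean-latency at most 1000 msec, error-rate at most 0.01) to a pair of observed values. The function meets_objectives is a hypothetical illustration of the criteria as described here, not iter8's actual analytics code. The sample inputs are the canary values from the example output in Step 7.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if both objectives hold.
# $1 = mean latency in milliseconds, $2 = error rate as a fraction (0.01 = 1%).
meets_objectives() {
  awk -v lat="$1" -v err="$2" 'BEGIN { exit !(lat <= 1000 && err <= 0.01) }'
}

# Canary values taken from the sample iter8ctl output in Step 7.
if meets_objectives 230.090 0.000; then
  echo "canary satisfies both objectives"
else
  echo "canary violates at least one objective"
fi
```

awk is used because POSIX shell arithmetic only handles integers, while the error rate is fractional.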
Step 7: In a separate terminal, periodically describe the experiment.
Install iter8ctl. You can change the directory where the iter8ctl binary is installed by changing GOBIN below.
GO111MODULE=on GOBIN=/usr/local/bin go get github.com/iter8-tools/iter8ctl@v0.1.0-alpha
Periodically describe the experiment.
while clear; do
  kubectl get experiment experiment-1 -o yaml | iter8ctl describe -f -
  sleep 15
done
You should see output similar to the following.
******
Experiment name: experiment-1
Experiment namespace: default
Experiment target: default/my-model
******
Number of completed iterations: 10
******
Winning version: canary
******
Objectives
+--------------------------+---------+--------+
| OBJECTIVE                | DEFAULT | CANARY |
+--------------------------+---------+--------+
| mean-latency <= 1000.000 | true    | true   |
+--------------------------+---------+--------+
| error-rate <= 0.010      | true    | true   |
+--------------------------+---------+--------+
******
Metrics
+--------------------------------+---------+---------+
| METRIC                         | DEFAULT | CANARY  |
+--------------------------------+---------+---------+
| request-count                  | 132.294 | 73.254  |
+--------------------------------+---------+---------+
| 95th-percentile-tail-latency   | 298.582 | 294.597 |
| (milliseconds)                 |         |         |
+--------------------------------+---------+---------+
| mean-latency (milliseconds)    | 229.529 | 230.090 |
+--------------------------------+---------+---------+
| error-rate                     | 0.000   | 0.000   |
+--------------------------------+---------+---------+
The experiment should complete after 12 iterations (~3 mins). Once the experiment completes, inspect the InferenceService object.
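The ~3 minute estimate follows directly from the spec's duration fields: intervalSeconds multiplied by maxIterations. A minimal sketch of that arithmetic is below. The function name experiment_duration_seconds is hypothetical, and the result should be read as an approximation rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: approximate experiment time in seconds.
# $1 = intervalSeconds, $2 = maxIterations (both from the experiment spec).
experiment_duration_seconds() {
  echo $(( $1 * $2 ))
}

total=$(experiment_duration_seconds 15 12)
echo "Expected duration: ${total}s, about $(( total / 60 )) minutes"
```

With the quick start values (15 and 12) this gives 180 seconds. Changing either field in the spec scales the wait accordingly.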
kubectl get isvc/my-model
You should see 100% of the traffic shifted to the canary model, similar to the output below.
NAME       URL                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                AGE
my-model   http://my-model.default.example.com   True           100                              my-model-predictor-default-zwjbq   5m
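If you would rather check the rollout in a script than by eye, one option is to read the LATEST column out of the table. The sketch below is a hypothetical helper, not part of iter8-kfserving or KFServing. It assumes the column-aligned layout that kubectl prints by default and that the header contains the LATEST and PREVROLLEDOUTREVISION columns shown above. Empty columns such as PREV are why it slices by header position instead of splitting on whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get isvc` table output on stdin and
# print the LATEST traffic percentage from the first data row.
latest_traffic() {
  awk '
    NR == 1 {
      start = index($0, "LATEST")                 # first match is the LATEST column
      stop  = index($0, "PREVROLLEDOUTREVISION")  # the next column starts here
    }
    NR == 2 {
      value = substr($0, start, stop - start)     # slice the LATEST cell
      gsub(/ /, "", value)                        # drop the padding spaces
      print value
    }
  '
}

# Usage against a live cluster:
#   kubectl get isvc/my-model | latest_traffic
```

Once the canary has been rolled out, the helper should print 100 for the inferenceservice in this quick start.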