# Deployment strategy: Canary deployment
Let's consider a scenario where you have previously deployed a model to production. Now, you have developed a new model and want it to replace the old version. A straightforward approach is to delete the old model and deploy the new one directly, which means all users switch to the new model at the same time. However, this approach is risky: if the new model doesn't perform as well as expected, every user is affected.

**Canary deployment** is one approach to minimize this risk and ensure a smooth transition. In canary deployment, user requests are gradually shifted to the new model. In other words, we first try out the new model on a small portion of real users and then direct more users to it if it performs better than the old one.
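
To make the idea of "a small portion of real users" concrete, here is a minimal simulation of percentage-based routing. It is only a sketch of the concept, not how KServe routes traffic internally; the function names `route_request` and `simulate` are made up for this illustration.

```python
import random
from collections import Counter


def route_request(canary_percent: int, rng: random.Random) -> str:
    """Pick which model version serves one request.

    canary_percent is the share of traffic (0-100) sent to the new model.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # rng.random() is uniform in [0, 1), so this branch is taken for
    # roughly canary_percent out of every 100 requests.
    return "new" if rng.random() * 100 < canary_percent else "old"


def simulate(canary_percent: int, n_requests: int = 10_000, seed: int = 0) -> Counter:
    """Count how many simulated requests each model version receives."""
    rng = random.Random(seed)
    return Counter(route_request(canary_percent, rng) for _ in range(n_requests))


# With a 10% canary, about one request in ten reaches the new model.
counts = simulate(canary_percent=10)
print(counts)
```

Because only a small share of requests reaches the new model, a bad model affects few users, and raising the percentage step by step widens the exposure only after the earlier steps looked fine.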

### Example of canary deployment
KServe provides a convenient way to employ canary deployment. The following example shows how to use canary deployment in KServe. 

*Credits: This example is adapted from [this KServe doc](https://kserve.github.io/website/0.10/modelserving/v1beta1/rollout/canary-example/).*

#### Deploy the first version of a model
We first deploy a redwine model to KServe.

Remember to replace the "storageUri" in [manifests/redwine-model.yaml](./manifests/redwine-model.yaml) with your own sklearn redwine model's S3 URI (e.g., the one you trained when following the first week's MLflow tutorial).

In [1]:
# Deploy the first version
!kubectl apply -f manifests/redwine-model.yaml

inferenceservice.serving.kserve.io/redwine-week4 created


Expected output:
```text
inferenceservice.serving.kserve.io/redwine-week4 created
```

In [3]:
# Check if the "redwine-week4" inference service is ready.
!kubectl get isvc redwine-week4 -n kserve-inference

NAME            URL                                                 READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION             AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   True           100                              redwine-week4-predictor-00001   78s


Expected output:
```text
NAME            URL                                                 READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                     AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   True           100                              redwine-week4-predictor-default-00001   17s
```

#### Train a new model
Now let's train a new red wine model with different hyperparameters.

In [5]:
import os
import logging

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

# Set an environment variable named "MLFLOW_S3_ENDPOINT_URL" so that the MLflow client knows where to save artifacts.
# MLFLOW_S3_ENDPOINT_URL is the URL of the MinIO storage service
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://mlflow-minio.local"

# Configure the credentials needed for accessing the MinIO storage service
os.environ["AWS_ACCESS_KEY_ID"] = "minioadmin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "minioadmin"

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

MLFLOW_TRACKING_URI = "http://mlflow-server.local"
MLFLOW_EXPERIMENT_NAME = "week4-red-wine-quality"


def main():
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    

    data = pd.read_csv(csv_url, sep=";")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    
    # Changed hyperparameters: both were 0.5 for the previous model
    alpha = 0.7
    l1_ratio = 0.7

    logger.info(f"Using MLflow tracking URI: {MLFLOW_TRACKING_URI}")
    mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

    logger.info(f"Using MLflow experiment: {MLFLOW_EXPERIMENT_NAME}")
    mlflow.set_experiment(MLFLOW_EXPERIMENT_NAME)

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)

        logger.info("Fitting model...")

        lr.fit(train_x, train_y)

        logger.info("Finished fitting")

        logger.info("Elasticnet model (alpha=%f, l1_ratio=%f):" %
                    (alpha, l1_ratio))

        logger.info("Logging parameters to MLflow")
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)

        logger.info("Logging trained model")
        artifact_name = "wine-quality"
        logged_model_info = mlflow.sklearn.log_model(
            lr, artifact_name, registered_model_name="Week4ElasticnetWineModel")
        print("The S3 URI of the logged model:", mlflow.get_artifact_uri(artifact_path=artifact_name))

main()

INFO:__main__:Using MLflow tracking URI: http://mlflow-server.local
INFO:__main__:Using MLflow experiment: week4-red-wine-quality
INFO:__main__:Fitting model...
INFO:__main__:Finished fitting
INFO:__main__:Elasticnet model (alpha=0.700000, l1_ratio=0.700000):
INFO:__main__:Logging parameters to MLflow
INFO:__main__:Logging trained model
Registered model 'Week4ElasticnetWineModel' already exists. Creating a new version of this model...
2024/02/19 00:40:53 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Week4ElasticnetWineModel, version 4


The S3 URI of the logged model: s3://mlflow/49/bb610e857a3342bdbf87cd4dee4843c8/artifacts/wine-quality


Created version '4' of model 'Week4ElasticnetWineModel'.


Now we can deploy the new version of the redwine model. Remember to also replace the "storageUri" field in [manifests/redwine-model-v2.yaml](./manifests/redwine-model-v2.yaml) with the S3 URI of the new model (the URI should be printed in the output after running the previous cell).

In [6]:
# Update the "redwine-week4" inference service using the newer model version.
# Notice that redwine-model.yaml and redwine-model-v2.yaml have the same namespace and name in the metadata field, 
# so K8s knows it should update the inference service instead of creating a new one.
!kubectl apply -f manifests/redwine-model-v2.yaml

inferenceservice.serving.kserve.io/redwine-week4 configured


Expected output:
```text
inferenceservice.serving.kserve.io/redwine-week4 configured
```

In [8]:
# Check that the updated "redwine-week4" inference service is ready
!kubectl get isvc redwine-week4 -n kserve-inference

NAME            URL                                                 READY     PREV   LATEST   PREVROLLEDOUTREVISION           LATESTREADYREVISION             AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   Unknown   90     10       redwine-week4-predictor-00001   redwine-week4-predictor-00001   2m34s


Expected output: 
```text
NAME            URL                                                 READY   PREV   LATEST   PREVROLLEDOUTREVISION                   LATESTREADYREVISION                     AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   True    90     10       redwine-week4-predictor-default-00001   redwine-week4-predictor-default-00002   3m19s
```
From the output, you can see the incoming traffic is split between the new model (10%) and the previous model (90%).
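
If you prefer to read the split programmatically rather than from the table, the sketch below extracts it from the inference service's JSON representation (for example, from `kubectl get isvc redwine-week4 -n kserve-inference -o json`). It assumes the Knative-style traffic list sits under `status.components.predictor.traffic`; check this against your KServe version. The `sample` dict is hand-written to mirror the expected output above and is not real cluster output, and `traffic_split` is a hypothetical helper.

```python
def traffic_split(isvc: dict) -> dict:
    """Map each revision name to its share of traffic in percent.

    Assumes traffic targets are listed under
    status.components.predictor.traffic (an assumption, see above).
    """
    traffic = (
        isvc.get("status", {})
        .get("components", {})
        .get("predictor", {})
        .get("traffic", [])
    )
    split = {}
    for target in traffic:
        name = target.get("revisionName", "<unknown>")
        # Sum percentages in case a revision appears in several targets.
        split[name] = split.get(name, 0) + target.get("percent", 0)
    return split


# Hand-written sample mirroring the expected output above (not real output).
sample = {
    "status": {
        "components": {
            "predictor": {
                "traffic": [
                    {"revisionName": "redwine-week4-predictor-default-00001", "percent": 90},
                    {"revisionName": "redwine-week4-predictor-default-00002", "percent": 10},
                ]
            }
        }
    }
}

print(traffic_split(sample))
```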

Let's take a closer look at what happened. 

The content of redwine-model-v2.yaml is almost the same as redwine-model.yaml. Only the `storageUri` is updated and a new field, `canaryTrafficPercent`, is added. `canaryTrafficPercent` indicates the percentage of user traffic that should be directed to the new model; the remaining traffic keeps going to the previous model.
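
The difference between the two manifests can be sketched by building them as plain Python dicts. The field layout below follows the KServe v1beta1 `InferenceService` API as used in the KServe canary example; the tutorial's actual YAML files may contain extra fields (for instance, a service account for S3 access), so treat this as an illustration rather than a copy of them. `build_isvc_manifest` is a hypothetical helper and the URIs are placeholders.

```python
import json
from typing import Optional


def build_isvc_manifest(storage_uri: str, canary_traffic_percent: Optional[int] = None) -> dict:
    """Build an InferenceService manifest as a plain dict.

    The structure is assumed from the KServe v1beta1 API; the real
    manifests in this tutorial may differ in detail.
    """
    predictor = {
        "model": {
            "modelFormat": {"name": "sklearn"},
            "storageUri": storage_uri,
        }
    }
    if canary_traffic_percent is not None:
        # Share of traffic (0-100) routed to the newly deployed revision.
        predictor["canaryTrafficPercent"] = canary_traffic_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        # Same name and namespace in both versions, so applying the second
        # manifest updates the existing service instead of creating a new one.
        "metadata": {"name": "redwine-week4", "namespace": "kserve-inference"},
        "spec": {"predictor": predictor},
    }


# Placeholder URIs: substitute the S3 URIs of your own models.
v1 = build_isvc_manifest("s3://<bucket>/<path-to-old-model>")
v2 = build_isvc_manifest("s3://<bucket>/<path-to-new-model>", canary_traffic_percent=10)
print(json.dumps(v2, indent=2))
```

Only two things differ between `v1` and `v2`: the `storageUri` value and the presence of `canaryTrafficPercent`.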

Check the pods running for the "redwine-week4" inference service:

In [12]:
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferenceservice=redwine-week4

NAME                                                        READY   STATUS    RESTARTS   AGE
redwine-week4-predictor-00001-deployment-7447bff94d-zxlsl   2/2     Running   0          3m3s
redwine-week4-predictor-00002-deployment-69858f8d84-cp4fl   2/2     Running   0          42s


Expected output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
redwine-week4-predictor-default-00001-deployment-65f7fddfb6n9pj   2/2     Running   0          2m17s
redwine-week4-predictor-default-00002-deployment-77dbdbd7b8xndb   2/2     Running   0          94s
```
You can see there are two pods: the one with "00001" in its name is serving the old model, and the other one is serving the new model.

If the new model performs well, we can direct more traffic to it by increasing the `canaryTrafficPercent` value in redwine-model-v2.yaml and applying the manifest again. Finally, we can direct all traffic to the new model by removing the `canaryTrafficPercent` field (see [manifests/redwine-model-v2-fully-rollout.yaml](./manifests/redwine-model-v2-fully-rollout.yaml)).

Remember to also replace the `storageUri` with your own new red wine model's S3 URI in redwine-model-v2-fully-rollout.yaml. 
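
The gradual promotion described above can be thought of as a loop over increasing percentages with a check at every step. The sketch below is purely illustrative: `run_canary_rollout` and the health check are hypothetical, the error rates are made-up numbers, and nothing here talks to a cluster. In a real rollout each step would correspond to editing `canaryTrafficPercent` and re-applying the manifest.

```python
from typing import Callable, List


def run_canary_rollout(steps: List[int], is_healthy: Callable[[int], bool]) -> int:
    """Walk through increasing canary percentages.

    Returns 100 if every step passed the health check (full rollout),
    or 0 if a step failed and all traffic goes back to the old model.
    """
    for percent in steps:
        # In practice: set canaryTrafficPercent to this value and re-apply.
        print(f"canaryTrafficPercent -> {percent}")
        if not is_healthy(percent):
            print("health check failed, sending all traffic back to the old model")
            return 0
    # All steps passed: the canaryTrafficPercent field can be removed.
    return 100


# Made-up error rates observed at each step, for illustration only.
observed_error_rate = {10: 0.010, 25: 0.012, 50: 0.080}
result = run_canary_rollout(
    steps=[10, 25, 50],
    is_healthy=lambda percent: observed_error_rate[percent] < 0.05,
)
print("final share of traffic on the new model:", result)
```

What counts as "healthy" depends on your application; prediction error on labelled feedback, latency, or request error rate are common choices.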

In [13]:
!kubectl apply -f manifests/redwine-model-v2-fully-rollout.yaml

inferenceservice.serving.kserve.io/redwine-week4 configured


Expected output:
```text
inferenceservice.serving.kserve.io/redwine-week4 configured
```

Check the "redwine-week4" inference service again:

In [14]:
!kubectl get isvc redwine-week4 -n kserve-inference

NAME            URL                                                 READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION             AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   True           100                              redwine-week4-predictor-00002   3m10s


Expected output:
```text
NAME            URL                                                 READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                     AGE
redwine-week4   http://redwine-week4.kserve-inference.example.com   True           100                              redwine-week4-predictor-default-00002   17h
```
Now 100% of the traffic is directed to the new model.

Check the pods running the "redwine-week4" inference service again:

In [15]:
!kubectl -n kserve-inference get pods -l serving.kserve.io/inferenceservice=redwine-week4

NAME                                                        READY   STATUS        RESTARTS   AGE
redwine-week4-predictor-00001-deployment-7447bff94d-zxlsl   2/2     Terminating   0          3m11s
redwine-week4-predictor-00002-deployment-69858f8d84-cp4fl   2/2     Running       0          50s


Expected output:
```text
NAME                                                              READY   STATUS    RESTARTS   AGE
redwine-week4-predictor-default-00002-deployment-77dbdbd7b8xndb   2/2     Running   0          3m8s
```
Notice that the pod serving the old model (with "00001" in its name) is terminated, and only the pod with "00002" in its name remains and continues serving the new model.

In [16]:
# Clean up by deleting the "redwine-week4" inference service
!kubectl delete isvc redwine-week4 -n kserve-inference

inferenceservice.serving.kserve.io "redwine-week4" deleted


# Next step
You've learned how to apply canary deployment when releasing a model to production. You can now go to [the next tutorial](./3_horizontal_scaling.ipynb) and see how to scale a model in response to increased traffic.