# Penguins Endpoint

This notebook contains routines and examples for interacting with a SageMaker Endpoint.

This notebook is part of the [Machine Learning School](https://www.ml.school) program.

In [2]:
import sys
from pathlib import Path

CODE_FOLDER = Path("code")
sys.path.append(f"./{CODE_FOLDER}")

In [3]:
import boto3
import json
import numpy as np
import random
import sagemaker
import pandas as pd

from constants import *
from time import sleep
from datetime import datetime
from threading import Thread, Event
from sagemaker import ModelPackage
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.s3 import S3Uploader, S3Downloader


ENDPOINT = "penguins-endpoint"
MODEL_PACKAGE_GROUP = "penguins"
GROUND_TRUTH_LOCATION = f"{S3_LOCATION}/monitoring/groundtruth" 

## Deploy Latest Model From Registry

Let's get the latest approved model from the Model Registry and deploy it to an endpoint.

We can use `boto3` to query the list of approved models and get the latest one. Check the [boto3 SageMaker Client API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) for a list of everything you can do using the API.

In [5]:
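response = sagemaker_client.list_model_packages(
    ModelPackageGroupName=MODEL_PACKAGE_GROUP,
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    # Request newest-first explicitly so the single result is the latest package.
    SortOrder="Descending",
    MaxResults=1,
)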

package = response["ModelPackageSummaryList"][0]
package

{'ModelPackageGroupName': 'penguins',
 'ModelPackageVersion': 27,
 'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:325223348818:model-package/penguins/27',
 'CreationTime': datetime.datetime(2023, 7, 13, 13, 7, 45, 946000, tzinfo=tzlocal()),
 'ModelPackageStatus': 'Completed',
 'ModelApprovalStatus': 'Approved'}
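
The cell above indexes the first element of `ModelPackageSummaryList`, which raises an `IndexError` when the group has no approved packages. Below is a minimal sketch of a more defensive selection. The helper name `latest_package` and the sample summaries are hypothetical; only the `CreationTime` and `ModelPackageVersion` keys come from the response shown above.

```python
from datetime import datetime, timezone


def latest_package(summaries):
    """Return the most recently created package summary, or None if the list is empty."""
    if not summaries:
        return None

    return max(summaries, key=lambda summary: summary["CreationTime"])


# Hypothetical summaries shaped like the entries of "ModelPackageSummaryList".
summaries = [
    {"ModelPackageVersion": 26, "CreationTime": datetime(2023, 7, 12, tzinfo=timezone.utc)},
    {"ModelPackageVersion": 27, "CreationTime": datetime(2023, 7, 13, tzinfo=timezone.utc)},
]

latest = latest_package(summaries)
print(latest["ModelPackageVersion"])
```

Picking the maximum by `CreationTime` on the client side also works when the response holds more than one summary, regardless of the order in which they arrive.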

Using the ARN of the model package from the Model Registry, we can deploy the model by creating a [ModelPackage](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.ModelPackage) instance and calling its `deploy()` method. The model information lives in the Model Registry, so we only need to provide the endpoint name and the instance configuration.

In [11]:
model_package = ModelPackage(
    model_package_arn=package["ModelPackageArn"], 
    sagemaker_session=sagemaker_session,
    role=role, 
)

model_package.deploy(
    endpoint_name=ENDPOINT,
    initial_instance_count=1, 
    instance_type="ml.m5.large",
)

----!

### Testing the Endpoint

We can create a [Predictor](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor) using the endpoint name and test our model.

The endpoint expects a payload in CSV format. Notice that the model expects data that has already been transformed: sending the original rows from our dataset will not work.

In [12]:
predictor = Predictor(endpoint_name=ENDPOINT)

payload = """
0.6569590202313976, -1.0813829646495108, 1.2097102831892812, 0.9226343641317372, 1.0, 0.0, 0.0
-0.7751048801481084, 0.8822689351285553,  -1.2168066120762704, 0.9226343641317372, 0.0, 1.0, 0.0
-0.837387834894918, 0.3386660813829646, -0.26237731892812, -1.92351941317372, 0.0, 0.0, 1.0
"""

response = predictor.predict(payload, initial_args={"ContentType": "text/csv"})
response = json.loads(response.decode("utf-8"))

print(json.dumps(response, indent=2))
print(f"\nSpecies: {np.argmax(response['predictions'], axis=1)}")

{
  "predictions": [
    [
      0.0330601558,
      0.0195660368,
      0.947373867
    ],
    [
      0.874199688,
      0.109029919,
      0.0167703032
    ],
    [
      0.93585372,
      0.044174917,
      0.0199713483
    ]
  ]
}

Species: [2 0 0]
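
The indices printed above can be mapped to species names. The following sketch does this in plain Python using the probabilities from the response. The index order in `SPECIES` is an assumption: the custom endpoint later in this notebook reports Chinstrap as 1 and Gentoo as 2, which suggests Adelie is 0. The helper `to_species` is hypothetical.

```python
# Assumed class order; verify it against the label encoding used in training.
SPECIES = ["Adelie", "Chinstrap", "Gentoo"]


def to_species(predictions):
    """Return a (species, probability) pair for the most likely class of each row."""
    results = []
    for probabilities in predictions:
        index = max(range(len(probabilities)), key=probabilities.__getitem__)
        results.append((SPECIES[index], probabilities[index]))

    return results


# Probabilities copied from the endpoint response shown above.
predictions = [
    [0.0330601558, 0.0195660368, 0.947373867],
    [0.874199688, 0.109029919, 0.0167703032],
    [0.93585372, 0.044174917, 0.0199713483],
]

labels = to_species(predictions)
for species, probability in labels:
    print(f"{species}: {probability:.3f}")
```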


### Deleting the Endpoint

Let's now delete the endpoint.

In [13]:
predictor.delete_endpoint()

## Testing Custom Endpoint

The following function waits for the endpoint to be ready to serve traffic and then returns a [Predictor](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html) configured with a JSON serializer and deserializer, so requests and responses are automatically converted to and from JSON. Check [Serializers](https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html) and [Deserializers](https://sagemaker.readthedocs.io/en/stable/api/inference/deserializers.html) for a list of supported serializers and deserializers.

In [4]:
def _get_predictor(endpoint_name):
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={
            "Delay": 10,
            "MaxAttempts": 30
        }
    )
    
    return Predictor(
        endpoint_name=endpoint_name,
        serializer=JSONSerializer(),
        deserializer=JSONDeserializer()
    )
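
With `Delay` set to 10 and `MaxAttempts` set to 30, the waiter gives up after roughly five minutes. To illustrate the polling pattern behind those two settings, here is a simplified sketch. It is not the boto3 implementation; `wait_until` and its parameters are hypothetical, and the status sequence is simulated.

```python
import time


def wait_until(check, delay=10, max_attempts=30, sleep=time.sleep):
    """Call check() until it returns True, pausing `delay` seconds between attempts.

    Returns the number of attempts used, or raises TimeoutError when
    max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt

        # Do not sleep after the final attempt.
        if attempt < max_attempts:
            sleep(delay)

    raise TimeoutError(f"Condition not met after {max_attempts} attempts")


# Simulated endpoint statuses; a real check would query the endpoint status.
statuses = iter(["Creating", "Creating", "InService"])
attempts = wait_until(lambda: next(statuses) == "InService", delay=0)
print(f"Endpoint ready after {attempts} attempts")
```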

In [5]:
predictor = _get_predictor(ENDPOINT)

Running one example through the endpoint.

In [6]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 3450.0,
})

{'species': 'Chinstrap', 'prediction': 1, 'confidence': 0.572802365}

Running another example.

In [7]:
predictor.predict({
    "island": "Biscoe",
    "culmen_length_mm": 48.6,
    "culmen_depth_mm": 16.0,
    "flipper_length_mm": 230.0,
    "body_mass_g": 5800.0,
})

{'species': 'Gentoo', 'prediction': 2, 'confidence': 0.98487407}

### Deleting the Endpoint

Let's now delete the endpoint.

In [8]:
predictor.delete_endpoint()

## Generating Traffic

To test the monitoring functionality, we need to generate some traffic to the endpoint. We will repeatedly send every sample from the dataset to the endpoint to simulate real prediction requests.

In [11]:
def generate_traffic(predictor):
    
    def _predict(data, predictor, stop_traffic_thread):
        for index, row in data.iterrows():
            predictor.predict(row.to_dict(), inference_id=str(index))
            
            sleep(1)

            if stop_traffic_thread.is_set():
                break

    def _generate_prediction_data(data, predictor, stop_traffic_thread):
        while True:
            print(f"Generating {data.shape[0]} predictions...")
            _predict(data, predictor, stop_traffic_thread)
            
            if stop_traffic_thread.is_set():
                break

                
    stop_traffic_thread = Event()
    
    data = pd.read_csv(DATA_FILEPATH).dropna()
    data.drop(["sex", "species"], axis=1, inplace=True)
    
    traffic_thread = Thread(
        target=_generate_prediction_data,
        args=(data, predictor, stop_traffic_thread,)
    )
    
    traffic_thread.start()
    
    return stop_traffic_thread, traffic_thread
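
The thread-and-event pattern used above can be tried without a live endpoint. The sketch below is a simplified variant under stated assumptions: `FakePredictor` is a hypothetical stand-in that only records requests, the two rows are made up, and `Event.wait()` replaces `sleep()` so the thread reacts to the stop signal without finishing a full pause.

```python
from threading import Event, Thread


class FakePredictor:
    """Hypothetical stand-in that records requests instead of calling an endpoint."""

    def __init__(self):
        self.requests = []
        self.received = Event()

    def predict(self, payload, inference_id=None):
        self.requests.append((inference_id, payload))
        self.received.set()


def send_rows(rows, predictor, stop_event, delay=0.01):
    """Send every row in a loop until stop_event is set."""
    while not stop_event.is_set():
        for index, row in enumerate(rows):
            predictor.predict(row, inference_id=str(index))

            # wait() doubles as an interruptible sleep: it returns True
            # as soon as the event is set, so the thread stops promptly.
            if stop_event.wait(delay):
                return


# Made-up rows standing in for the dataset.
rows = [
    {"island": "Dream", "body_mass_g": 3450.0},
    {"island": "Biscoe", "body_mass_g": 5800.0},
]

fake = FakePredictor()
stop_event = Event()
thread = Thread(target=send_rows, args=(rows, fake, stop_event))
thread.start()

fake.received.wait(timeout=5)  # Block until at least one request has arrived.
stop_event.set()
thread.join()

print(f"Sent {len(fake.requests)} requests")
```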


Let's wait for the endpoint to be ready and create a predictor.

In [12]:
predictor = _get_predictor(ENDPOINT)


We can now start generating traffic.

In [13]:
stop_traffic_thread, traffic_thread = generate_traffic(predictor)

Generating 334 predictions...


### Introducing a Violation

Let's make a prediction for a penguin and include extra fields in the request. This should be flagged by the monitoring job.

In [None]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    
    # These two columns are not in the baseline data,
    # so they will be reported by the monitoring job
    # as a violation.
    "name": "Johnny",
    "height": 28.0
})
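
To see which fields of a request fall outside the expected schema before the monitoring job reports them, a simple set difference is enough. This is only a local illustration and not how the monitoring job computes violations. `BASELINE_COLUMNS` is assumed to be the five fields used in the earlier requests, and `unexpected_fields` is a hypothetical helper.

```python
# Assumed baseline schema: the five fields sent in the earlier requests.
BASELINE_COLUMNS = {
    "island",
    "culmen_length_mm",
    "culmen_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
}


def unexpected_fields(payload, baseline_columns=BASELINE_COLUMNS):
    """Return the sorted payload keys that are not part of the baseline schema."""
    return sorted(set(payload) - set(baseline_columns))


payload = {
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    "name": "Johnny",
    "height": 28.0,
}

extra = unexpected_fields(payload)
print(extra)
```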

### Deleting Resources

First, let's stop the thread generating traffic.

In [21]:
stop_traffic_thread.set()
traffic_thread.join()

Let's now delete the endpoint.

In [8]:
predictor.delete_endpoint()

## Generating Ground Truth Data

To monitor the quality of our model, we need ground truth data for the samples captured by the endpoint. We can simulate this by assigning a random ground truth label to every sample. Check [Ingest Ground Truth Labels and Merge Them With Predictions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html) for more information about this.

In [17]:
def generate_ground_truth_data(predictor, ground_truth_location):
    
    def _generate_ground_truth_record(inference_id):
        random.seed(inference_id)

        return {
            "groundTruthData": {
                "data": random.choice(["Adelie", "Chinstrap", "Gentoo"]),
                "encoding": "CSV",
            },
            "eventMetadata": {
                "eventId": str(inference_id),
            },
            "eventVersion": "0",
        }


    def _upload_ground_truth(records, upload_time):
        records = [json.dumps(r) for r in records]
        data = "\n".join(records)
        uri = f"{ground_truth_location}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"

        print(f"Uploading ground truth data to {uri}...")

        S3Uploader.upload_string_as_file_body(data, uri)    

                
    def _generate_ground_truth_data(max_records, stop_ground_truth_thread):
        while True:
            records = [_generate_ground_truth_record(i) for i in range(max_records)]
            _upload_ground_truth(records, datetime.utcnow())

            if stop_ground_truth_thread.is_set():
                break

            sleep(30)

                
    stop_ground_truth_thread = Event()
    data = pd.read_csv(DATA_FILEPATH).dropna()
    
    groundtruth_thread = Thread(
        target=_generate_ground_truth_data,
        args=(len(data), stop_ground_truth_thread,)
    )
    
    groundtruth_thread.start()
    
    return stop_ground_truth_thread, groundtruth_thread
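
To inspect what gets uploaded without touching S3, the record and URI logic can be run on its own. The sketch below reuses the record structure and the path format from the function above. The helper names `ground_truth_record` and `ground_truth_uri` are hypothetical, and the timestamp is fixed for illustration.

```python
import json
import random
from datetime import datetime


def ground_truth_record(inference_id):
    """Build one ground truth record; seeding makes the label repeatable per id."""
    random.seed(inference_id)

    return {
        "groundTruthData": {
            "data": random.choice(["Adelie", "Chinstrap", "Gentoo"]),
            "encoding": "CSV",
        },
        "eventMetadata": {"eventId": str(inference_id)},
        "eventVersion": "0",
    }


def ground_truth_uri(location, upload_time):
    """Build the hour-partitioned URI used for the upload."""
    return f"{location}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"


# One JSON document per line (JSON Lines).
records = [ground_truth_record(i) for i in range(3)]
body = "\n".join(json.dumps(record) for record in records)

uri = ground_truth_uri(
    "s3://mlschool/penguins/monitoring/groundtruth",
    datetime(2023, 7, 13, 13, 7, 13),  # Fixed timestamp for illustration.
)

print(uri)
print(body)
```

Because the random generator is seeded with the inference id, every upload assigns the same label to the same `eventId`, which keeps the simulated ground truth consistent across repeated uploads.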

In [None]:
predictor = _get_predictor(ENDPOINT)

stop_ground_truth_thread, groundtruth_thread = generate_ground_truth_data(
    predictor, 
    GROUND_TRUTH_LOCATION
)

In [20]:
stop_ground_truth_thread.set()
groundtruth_thread.join()

Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/07/13/13/0713.jsonl...
