# Penguins Endpoint

This notebook contains routines and examples to interact with a SageMaker Endpoint.

This notebook is part of the [Machine Learning School](https://www.ml.school) program.

In [2]:
import sys
from pathlib import Path

CODE_FOLDER = Path("code")
sys.path.append(f"./{CODE_FOLDER}")

In [3]:
import boto3
import json
import numpy as np
import random
import sagemaker
import pandas as pd

from constants import *
from time import sleep
from datetime import datetime
from threading import Thread, Event
from pathlib import Path
from sagemaker import ModelPackage
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.s3 import S3Uploader, S3Downloader


def get_predictor(endpoint_name, json_predictor=False):
    """
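    Wait for the endpoint to be ready to serve traffic and return a Predictor.

    The waiter polls every 10 seconds for up to 30 attempts (about 5 minutes).
    With json_predictor=True, the Predictor uses a JSON serializer and
    deserializer; otherwise it uses the SDK defaults.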
    """
    
    waiter = sagemaker_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={
            "Delay": 10,
            "MaxAttempts": 30
        }
    )
    
    return (
        Predictor(endpoint_name=endpoint_name, serializer=JSONSerializer(), deserializer=JSONDeserializer()) 
        if json_predictor 
        else Predictor(endpoint_name=endpoint_name)
    )

## Deploy Latest Model From Registry

We can deploy a model from the Model Registry using SageMaker's SDK or the [boto3 SageMaker Client API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html).

Let's create a function that we can use to test whether the model is working properly.

In [4]:
def test_endpoint(predictor):
    """
    This is a simple function that sends a request to the endpoint and prints out
    the results.
    
    Notice the payload we need to provide the model is in CSV format. The model
    expects data that's already transformed. We can't provide the original data
    from our dataset because the model will not work with it.
    """

    payload = """
    0.6569590202313976, -1.0813829646495108, 1.2097102831892812, 0.9226343641317372, 1.0, 0.0, 0.0
    -0.7751048801481084, 0.8822689351285553,  -1.2168066120762704, 0.9226343641317372, 0.0, 1.0, 0.0
    -0.837387834894918, 0.3386660813829646, -0.26237731892812, -1.92351941317372, 0.0, 0.0, 1.0
    """

    response = predictor.predict(payload, initial_args={"ContentType": "text/csv"})
    response = json.loads(response.decode("utf-8"))

    print(json.dumps(response, indent=2))
    print(f"\nSpecies: {np.argmax(response['predictions'], axis=1)}")
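
The payload in `test_endpoint` is hand-written CSV. As a minimal sketch, assuming the transformed features are already available as a list of rows, a payload of the same kind could be built programmatically. The helper name `to_csv_payload` is hypothetical and not part of this notebook.

```python
def to_csv_payload(rows):
    """Join rows of already-transformed features into CSV, one sample per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


# Two samples in the 7-column layout used by the payload above.
rows = [
    [0.6569590202313976, -1.0813829646495108, 1.2097102831892812, 0.9226343641317372, 1.0, 0.0, 0.0],
    [-0.7751048801481084, 0.8822689351285553, -1.2168066120762704, 0.9226343641317372, 0.0, 1.0, 0.0],
]

payload = to_csv_payload(rows)
print(payload)
```

The resulting string can be passed to `predictor.predict` with the same `ContentType` of `text/csv`.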

To deploy a model from the Model Registry, we need to find its model package ARN. Let's query the list of approved models and get the latest one.

In [5]:
response = sagemaker_client.list_model_packages(
    ModelPackageGroupName=MODEL_PACKAGE_GROUP,
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)

package = response["ModelPackageSummaryList"][0]
package

{'ModelPackageGroupName': 'penguins',
 'ModelPackageVersion': 45,
 'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:325223348818:model-package/penguins/45',
 'CreationTime': datetime.datetime(2023, 7, 31, 17, 51, 24, 53000, tzinfo=tzlocal()),
 'ModelPackageStatus': 'Completed',
 'ModelApprovalStatus': 'Approved'}
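
If you prefer not to rely on the order in which the API returns results, one option is to pick the newest package on the client side. The sketch below is only an illustration: the `latest_package` helper and the sample summaries are hypothetical, and only the `CreationTime` and `ModelPackageVersion` keys come from the response shown above.

```python
from datetime import datetime, timezone


def latest_package(summaries):
    """Return the summary with the most recent CreationTime, or None if empty."""
    return max(summaries, key=lambda s: s["CreationTime"], default=None)


# Hypothetical summaries shaped like the entries in "ModelPackageSummaryList".
summaries = [
    {"ModelPackageVersion": 44, "CreationTime": datetime(2023, 7, 30, 12, 0, tzinfo=timezone.utc)},
    {"ModelPackageVersion": 45, "CreationTime": datetime(2023, 7, 31, 17, 51, tzinfo=timezone.utc)},
]

latest = latest_package(summaries)
print(latest["ModelPackageVersion"])
```

With this approach you would request more than one result from `list_model_packages` and pass the whole `ModelPackageSummaryList` to the helper.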

### Deploying the Model using SageMaker's SDK

Using the ARN of the model package from the Model Registry, we can deploy the model by creating a [ModelPackage](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.ModelPackage) instance and calling its `deploy()` function. The model information lives in the Model Registry, so we don't need to specify anything else.

In [6]:
model_package_arn = package["ModelPackageArn"]

model_package = ModelPackage(
    model_package_arn=model_package_arn, 
    sagemaker_session=sagemaker_session,
    role=role, 
)

model_package.deploy(
    endpoint_name=ENDPOINT,
    initial_instance_count=1, 
    instance_type="ml.m5.large",
)

----!

Let's test the endpoint to make sure it works.

In [15]:
predictor = get_predictor(ENDPOINT)
test_endpoint(predictor)

{
  "predictions": [
    [
      0.0133925341,
      0.0399835221,
      0.946623921
    ],
    [
      0.8760373,
      0.087144725,
      0.036817953
    ],
    [
      0.989405215,
      0.00671506068,
      0.00387977599
    ]
  ]
}

Species: [2 0 0]
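
The `Species` line comes from taking the index of the largest probability in each row. As a dependency-free sketch of what `np.argmax(..., axis=1)` computes, using the probabilities copied from the response above (the `argmax` helper is hypothetical):

```python
# One row per sample, one column per class, as returned by the endpoint.
predictions = [
    [0.0133925341, 0.0399835221, 0.946623921],
    [0.8760373, 0.087144725, 0.036817953],
    [0.989405215, 0.00671506068, 0.00387977599],
]


def argmax(row):
    """Return the index of the largest value in row."""
    return max(range(len(row)), key=row.__getitem__)


# Most likely class index for each sample, and its probability.
species = [argmax(row) for row in predictions]
confidence = [max(row) for row in predictions]

print(species)
print(confidence)
```

The indices only identify columns of the model output. Mapping them to species names depends on the label encoding used during training, which this notebook does not show.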


We can now delete the endpoint using the predictor.

In [16]:
predictor.delete_endpoint()

### Deploying the Model using Boto3

We can also deploy the model using the [boto3 SageMaker Client API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html). We can do this in three steps:


1. Create a model. See the [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_model.html) function.
2. Create the configuration of the endpoint. This configuration will specify the model you created in the first step. See the [create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_endpoint_config.html) function.
3. Create the endpoint using the endpoint configuration that you created before. See the [create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_endpoint.html) function.



In [47]:
model_package_arn = package["ModelPackageArn"]
model_name = "penguins-model"
endpoint_config_name = "penguins-endpoint-config"

sagemaker_client.create_model(
    ModelName=model_name, 
    ExecutionRoleArn=role, 
    Containers=[{
        "ModelPackageName": model_package_arn
    }] 
)

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_name,
            "InstanceType": "ml.m5.large",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "VariantName": "AllTraffic",
        }
    ]
)

sagemaker_client.create_endpoint(
    EndpointName=ENDPOINT, 
    EndpointConfigName=endpoint_config_name,
)

{'EndpointArn': 'arn:aws:sagemaker:us-east-1:325223348818:endpoint/penguins-endpoint',
 'ResponseMetadata': {'RequestId': 'ee1a1e05-60d3-4119-9664-0bdce6d67c6d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ee1a1e05-60d3-4119-9664-0bdce6d67c6d',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '85',
   'date': 'Tue, 25 Jul 2023 13:31:56 GMT'},
  'RetryAttempts': 0}}

Let's test the endpoint to make sure it works.

In [48]:
test_endpoint(get_predictor(ENDPOINT))

{
  "predictions": [
    [
      0.0330601558,
      0.0195660368,
      0.947373867
    ],
    [
      0.874199688,
      0.109029919,
      0.0167703032
    ],
    [
      0.93585372,
      0.044174917,
      0.0199713483
    ]
  ]
}

Species: [2 0 0]


We can now delete the endpoint, the endpoint configuration, and the model.

In [49]:
sagemaker_client.delete_endpoint(EndpointName=ENDPOINT)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sagemaker_client.delete_model(ModelName=model_name)

{'ResponseMetadata': {'RequestId': 'e84f4a89-401c-4a34-b49a-7c14bb306d5a',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'e84f4a89-401c-4a34-b49a-7c14bb306d5a',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Tue, 25 Jul 2023 13:34:11 GMT'},
  'RetryAttempts': 0}}

## Testing Custom Endpoint

In this section, we test the endpoint that uses a custom inference script. We will use `get_predictor` with `json_predictor=True` to create a predictor that serializes requests to JSON and deserializes JSON responses.

In [4]:
predictor = get_predictor(ENDPOINT, json_predictor=True)

Running one example through the endpoint.

In [5]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 3450.0,
})

{'prediction': 'Adelie', 'confidence': 0.48664245}

Running another example.

In [6]:
predictor.predict({
    "island": "Biscoe",
    "culmen_length_mm": 48.6,
    "culmen_depth_mm": 16.0,
    "flipper_length_mm": 230.0,
    "body_mass_g": 5800.0,
})

{'prediction': 'Gentoo', 'confidence': 0.862531245}

Let's now delete the endpoint.

In [7]:
predictor.delete_endpoint()

## Generating Traffic

To test the monitoring functionality, we need to generate some traffic to the endpoint. We will repeatedly send every sample from the dataset to the endpoint to simulate real prediction requests.

In [27]:
def generate_traffic(predictor):
    
    def _predict(data, predictor, stop_traffic_thread):
        for index, row in data.iterrows():
            predictor.predict(row.to_dict(), inference_id=str(index))
            
            sleep(1)

            if stop_traffic_thread.is_set():
                break

    def _generate_prediction_data(data, predictor, stop_traffic_thread):
        while True:
            print(f"Generating {data.shape[0]} predictions...")
            _predict(data, predictor, stop_traffic_thread)
            
            if stop_traffic_thread.is_set():
                break

                
    stop_traffic_thread = Event()
    
    data = pd.read_csv(DATA_FILEPATH).dropna()
    data.drop(["sex"], axis=1, inplace=True)
    
    traffic_thread = Thread(
        target=_generate_prediction_data,
        args=(data, predictor, stop_traffic_thread,)
    )
    
    traffic_thread.start()
    
    return stop_traffic_thread, traffic_thread
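
The loop in `_predict` calls `sleep(1)` and then checks the event, so stopping can take up to a second per request. A possible variation, sketched here without any SageMaker calls, is to use `Event.wait(timeout)`, which returns as soon as the event is set. The `_worker` function and the `sent` list are hypothetical stand-ins for the prediction loop.

```python
from threading import Event, Thread


def _worker(stop_event, sent, interval=0.01):
    """Pretend to send requests until stop_event is set."""
    while True:
        sent.append(len(sent))  # stand-in for predictor.predict(...)

        # wait() returns True as soon as the event is set, so the loop
        # exits without sleeping for the full interval.
        if stop_event.wait(timeout=interval):
            break


stop_event = Event()
sent = []

thread = Thread(target=_worker, args=(stop_event, sent))
thread.start()

# Signal the thread to stop, then wait for it to finish.
stop_event.set()
thread.join(timeout=5)

print(f"Thread alive: {thread.is_alive()}, requests sent: {len(sent)}")
```

Whichever form you use, set the event and join the thread before deleting the endpoint. Otherwise the thread keeps sending requests to an endpoint that no longer exists.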


Let's wait for the endpoint to be ready and create a predictor.

In [28]:
predictor = get_predictor(ENDPOINT, json_predictor=True)

We can now start generating traffic.

In [33]:
stop_traffic_thread, traffic_thread = generate_traffic(predictor)

Generating 334 predictions...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2136.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2206.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2237.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2307.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2337.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2407.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2438.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2508.jsonl...
Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2538.jsonl...
Uploading ground truth da

### Introducing a Violation

Let's make a prediction for a penguin and include extra fields in the request. This should be flagged by the monitoring job.

In [22]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    
    # These two columns are not in the baseline data,
    # so they will be reported by the monitoring job
    # as a violation.
    "name": "Johnny",
    "height": 28.0
})

{'prediction': 'Gentoo', 'confidence': 0.390927851}

Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
Generating 334 predictions...
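
To see locally which fields of a request fall outside the expected schema, a rough check could compare the payload keys against the feature names used in this notebook. This is only an illustration: `BASELINE_COLUMNS` and `extra_fields` are hypothetical, and the sketch does not reproduce how the monitoring job evaluates its constraints.

```python
# Feature names taken from the requests shown in this notebook. The real
# baseline is computed by the monitoring job and is not reproduced here.
BASELINE_COLUMNS = {
    "island",
    "culmen_length_mm",
    "culmen_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
}


def extra_fields(payload):
    """Return the payload keys that are not part of the baseline columns."""
    return sorted(set(payload) - BASELINE_COLUMNS)


payload = {
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    "name": "Johnny",
    "height": 28.0,
}

print(extra_fields(payload))
```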


### Deleting Resources

First, let's stop the thread generating traffic.

In [30]:
stop_traffic_thread.set()
traffic_thread.join()

Let's now delete the endpoint.

In [24]:
predictor.delete_endpoint()

## Generating Ground Truth Data

To monitor our model, we need to generate ground truth data for the samples captured by the endpoint. We can simulate this by generating a random ground truth label for every sample. See [Ingest Ground Truth Labels and Merge Them With Predictions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-merge.html) for more information.

In [31]:
def generate_ground_truth_data(predictor, ground_truth_location):
    
    def _generate_ground_truth_record(inference_id):
        random.seed(inference_id)

        return {
            "groundTruthData": {
                "data": random.choice(["Adelie", "Chinstrap", "Gentoo"]),
                "encoding": "CSV",
            },
            "eventMetadata": {
                "eventId": str(inference_id),
            },
            "eventVersion": "0",
        }


    def _upload_ground_truth(records, upload_time):
        records = [json.dumps(r) for r in records]
        data = "\n".join(records)
        uri = f"{ground_truth_location}/{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"

        print(f"Uploading ground truth data to {uri}...")

        S3Uploader.upload_string_as_file_body(data, uri)    

                
    def _generate_ground_truth_data(max_records, stop_ground_truth_thread):
        while True:
            records = [_generate_ground_truth_record(i) for i in range(max_records)]
            _upload_ground_truth(records, datetime.utcnow())

            if stop_ground_truth_thread.is_set():
                break

            sleep(30)

                
    stop_ground_truth_thread = Event()
    data = pd.read_csv(DATA_FILEPATH).dropna()
    
    groundtruth_thread = Thread(
        target=_generate_ground_truth_data,
        args=(len(data), stop_ground_truth_thread,)
    )
    
    groundtruth_thread.start()
    
    return stop_ground_truth_thread, groundtruth_thread

In [32]:
predictor = get_predictor(ENDPOINT, json_predictor=True)

stop_ground_truth_thread, groundtruth_thread = generate_ground_truth_data(
    predictor, 
    GROUND_TRUTH_LOCATION
)

Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2106.jsonl...
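
To make the uploaded format concrete, here is a small offline sketch that builds a few records and the object key suffix without touching S3. It reuses the record shape and the key format from `generate_ground_truth_data`; the `ground_truth_record` helper and the fixed upload time are chosen only for illustration.

```python
import json
import random
from datetime import datetime


def ground_truth_record(inference_id):
    """Build one record in the same shape used by generate_ground_truth_data."""
    random.seed(inference_id)  # same id -> same simulated label
    return {
        "groundTruthData": {
            "data": random.choice(["Adelie", "Chinstrap", "Gentoo"]),
            "encoding": "CSV",
        },
        "eventMetadata": {"eventId": str(inference_id)},
        "eventVersion": "0",
    }


# One JSON document per line (JSON Lines).
body = "\n".join(json.dumps(ground_truth_record(i)) for i in range(3))

# Object key suffix for a fixed, illustrative upload time.
upload_time = datetime(2023, 8, 2, 19, 21, 6)
key = f"{upload_time:%Y/%m/%d/%H/%M%S}.jsonl"

print(body)
print(key)
```

Because the random generator is seeded with the inference ID, every upload assigns the same simulated label to a given `eventId`.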


### Deleting Resources

First, let's stop the thread generating ground truth data.

In [34]:
stop_ground_truth_thread.set()
groundtruth_thread.join()

Uploading ground truth data to s3://mlschool/penguins/monitoring/groundtruth/2023/08/02/19/2709.jsonl...


Let's now delete the endpoint.

In [35]:
predictor.delete_endpoint()

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-27-af02a3504fde>", line 15, in _generate_prediction_data
  File "<ipython-input-27-af02a3504fde>", line 5, in _predict
  File "/usr/local/lib/python3.8/site-packages/sagemaker/base_predictor.py", line 185, in predict
    response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationError: An error occurred (ValidationError) when calling the InvokeE