# Penguins Endpoint

This notebook contains routines and examples to interact with a SageMaker Endpoint.

This notebook is part of the [Machine Learning School](https://www.ml.school) program.

In [61]:
import boto3
import json
import numpy as np
import sagemaker
import pandas as pd

from time import sleep
from threading import Thread, Event
from pathlib import Path
from sagemaker import ModelPackage
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
from sagemaker.s3 import S3Downloader


BUCKET = "mlschool"
S3_FILEPATH = f"s3://{BUCKET}/penguins"
LOCAL_FILEPATH = Path().resolve() / "data.csv"

ENDPOINT = "penguins-endpoint"
MODEL_PACKAGE_GROUP = "penguins"
DATA_CAPTURE_DESTINATION = f"{S3_FILEPATH}/monitoring/data-capture"

sagemaker_client = boto3.client("sagemaker")
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.session.Session()

## Deploy Latest Model From Registry

Let's get the latest approved model from the Model Registry and deploy it to an endpoint.

We can use `boto3` to query the list of approved models and get the latest one. Check the [boto3 SageMaker Client API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html) for a list of everything you can do using the API.

In [11]:
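response = sagemaker_client.list_model_packages(
    ModelPackageGroupName=MODEL_PACKAGE_GROUP,
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    # Ask for the newest package first so that MaxResults=1
    # returns the latest approved model.
    SortOrder="Descending",
    MaxResults=1,
)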

package = response["ModelPackageSummaryList"][0]
package

{'ModelPackageGroupName': 'penguins',
 'ModelPackageVersion': 21,
 'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:325223348818:model-package/penguins/21',
 'CreationTime': datetime.datetime(2023, 7, 12, 11, 18, 21, 69000, tzinfo=tzlocal()),
 'ModelPackageStatus': 'Completed',
 'ModelApprovalStatus': 'Approved'}

Using the ARN of the model package from the Model Registry, we can deploy the model by creating a [ModelPackage](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.ModelPackage) instance and calling its `deploy()` function. The model information lives in the Model Registry, so we don't need to specify anything else.

In [13]:
model_package = ModelPackage(
    model_package_arn=package["ModelPackageArn"], 
    sagemaker_session=sagemaker_session,
    role=role, 
)

model_package.deploy(
    endpoint_name=ENDPOINT,
    initial_instance_count=1, 
    instance_type="ml.m5.large",
)

----!

### Testing the Endpoint

We can create a [Predictor](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor) using the endpoint name and test our model.

The payload we need to provide the model is in CSV format. Notice how the model expects data that's already transformed. We can't provide the original data from our dataset because the model will not work with it.
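As a rough illustration, a CSV payload like this can be built from rows that have already been transformed. The `to_csv_payload` helper and the example values are hypothetical, and the column layout (four scaled numeric features followed by a three-column one-hot encoding) is an assumption based on the sample payload in the next cell.

```python
def to_csv_payload(rows):
    """Serialize rows of already-transformed features as CSV text."""
    return "\n".join(",".join(repr(float(value)) for value in row) for row in rows)


# Hypothetical, already-transformed rows: four scaled numeric features
# followed by a three-column one-hot encoding (an assumed layout).
example_rows = [
    [0.5, -1.0, 1.25, 0.75, 1.0, 0.0, 0.0],
    [-0.75, 0.5, -1.5, 0.25, 0.0, 1.0, 0.0],
]

csv_payload = to_csv_payload(example_rows)
print(csv_payload)
```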

In [15]:
predictor = Predictor(endpoint_name=ENDPOINT)

payload = """
0.6569590202313976, -1.0813829646495108, 1.2097102831892812, 0.9226343641317372, 1.0, 0.0, 0.0
-0.7751048801481084, 0.8822689351285553,  -1.2168066120762704, 0.9226343641317372, 0.0, 1.0, 0.0
-0.837387834894918, 0.3386660813829646, -0.26237731892812, -1.92351941317372, 0.0, 0.0, 1.0
"""

response = predictor.predict(payload, initial_args={"ContentType": "text/csv"})
response = json.loads(response.decode("utf-8"))

print(json.dumps(response, indent=2))
print(f"\nSpecies: {np.argmax(response['predictions'], axis=1)}")

{
  "predictions": [
    [
      0.112995617,
      0.0474319644,
      0.83957243
    ],
    [
      0.816265643,
      0.127040118,
      0.0566942506
    ],
    [
      0.97213167,
      0.0213899519,
      0.00647832826
    ]
  ]
}

Species: [2 0 0]
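The indices printed above are easier to read as species names. Here is a small sketch of that conversion, assuming the index-to-species mapping is the one the custom endpoint reports later in this notebook (0 for Adelie, 1 for Chinstrap, 2 for Gentoo); the `CLASSES` list is introduced here for illustration only.

```python
import numpy as np

# Assumed index-to-species mapping (0: Adelie, 1: Chinstrap, 2: Gentoo).
CLASSES = ["Adelie", "Chinstrap", "Gentoo"]

# Probabilities copied from the response shown above.
predictions = np.array([
    [0.112995617, 0.0474319644, 0.83957243],
    [0.816265643, 0.127040118, 0.0566942506],
    [0.97213167, 0.0213899519, 0.00647832826],
])

# The predicted class is the column with the highest probability.
indices = np.argmax(predictions, axis=1)
species = [CLASSES[index] for index in indices]

# The highest probability in each row doubles as a confidence score.
confidence = predictions.max(axis=1)

print(species)
```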


### Deleting the Endpoint

Let's now delete the endpoint.

In [16]:
predictor.delete_endpoint()

## Testing Custom Endpoint

First, let's wait for the endpoint to be ready to service traffic.

In [24]:
waiter = sagemaker_client.get_waiter("endpoint_in_service")
waiter.wait(
    EndpointName=ENDPOINT,
    WaiterConfig={
        "Delay": 10,
        "MaxAttempts": 30
    }
)

Now that the endpoint is in service, we can create a [Predictor](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html) using a JSON serializer and a deserializer to have it automatically serialize and deserialize the information to and from the endpoint. Check [Serializers](https://sagemaker.readthedocs.io/en/stable/api/inference/serializers.html) and [Deserializers](https://sagemaker.readthedocs.io/en/stable/api/inference/deserializers.html) for a list of supported serializers and deserializers.

In [25]:
predictor = Predictor(
    endpoint_name=ENDPOINT,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

Running one example through the endpoint.

In [62]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 3450.0,
})

{'species': 'Chinstrap', 'prediction': 1, 'confidence': 0.851222754}

Running another example.

In [27]:
predictor.predict({
    "island": "Biscoe",
    "culmen_length_mm": 48.6,
    "culmen_depth_mm": 16.0,
    "flipper_length_mm": 230.0,
    "body_mass_g": 5800.0,
})

{'species': 'Gentoo', 'prediction': 2, 'confidence': 0.982656479}

### Deleting the Endpoint

Let's now delete the endpoint.

In [28]:
predictor.delete_endpoint()

## Generating Traffic

To test the monitoring functionality, we need to generate some traffic to the endpoint. We will repeatedly send every sample from the dataset to the endpoint to simulate real prediction requests.

In [54]:
def generate_traffic(predictor):
    """Send every sample in the dataset to the endpoint on a background thread.

    Returns the event that stops the traffic and the thread generating it.
    """

    def _predict(data, predictor, stop_traffic_thread):
        # Send one request per row, using the row index as the inference ID.
        for index, row in data.iterrows():
            predictor.predict(row.to_dict(), inference_id=str(index))

            sleep(1)

            if stop_traffic_thread.is_set():
                break

    def _generate_prediction_data(data, predictor, stop_traffic_thread):
        # Keep cycling through the dataset until the stop event is set.
        while not stop_traffic_thread.is_set():
            print(f"Generating {data.shape[0]} predictions...")
            _predict(data, predictor, stop_traffic_thread)

    stop_traffic_thread = Event()

    # Drop rows with missing values and the columns we don't send to the endpoint.
    data = pd.read_csv(LOCAL_FILEPATH).dropna()
    data = data.drop(["sex", "species"], axis=1)

    traffic_thread = Thread(
        target=_generate_prediction_data,
        args=(data, predictor, stop_traffic_thread),
    )

    traffic_thread.start()

    return stop_traffic_thread, traffic_thread
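One possible variation, sketched here with a stand-in for the predictor, replaces `sleep(1)` with `Event.wait()`. The wait returns as soon as the event is set, so the thread stops without finishing the current pause. The `run_until_stopped` function and the `sent` list are hypothetical names used only for this illustration.

```python
import time
from threading import Event, Thread


def run_until_stopped(send_request, samples, stop_event, delay=1.0):
    """Cycle through the samples until stop_event is set."""
    while not stop_event.is_set():
        for sample in samples:
            send_request(sample)
            # Event.wait() returns True as soon as the event is set, so the
            # loop exits without sleeping through the full delay.
            if stop_event.wait(delay):
                return


# Hypothetical stand-in for predictor.predict: record each request.
sent = []
stop_event = Event()
thread = Thread(
    target=run_until_stopped,
    args=(sent.append, [{"island": "Dream"}, {"island": "Biscoe"}], stop_event, 0.01),
)
thread.start()

# Let the worker send a few requests, then stop it.
while len(sent) < 3:
    time.sleep(0.005)
stop_event.set()
thread.join(timeout=5)

print(thread.is_alive())
```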


Let's wait for the endpoint to be ready and create a predictor.

In [57]:
waiter = sagemaker_client.get_waiter("endpoint_in_service")
waiter.wait(
    EndpointName=ENDPOINT,
    WaiterConfig={
        "Delay": 10,
        "MaxAttempts": 30
    }
)

predictor = Predictor(
    endpoint_name=ENDPOINT,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 3450.0,
})

{'species': 'Chinstrap', 'prediction': 1, 'confidence': 0.851222754}

We can now start generating traffic.

In [58]:
stop_traffic_thread, traffic_thread = generate_traffic(predictor)

Generating 334 predictions...


### Checking Captured Data

Let's check the S3 location where the endpoint stores the requests and responses that it receives.

Notice that it may take a few minutes for the first few files to show up in S3. Keep running the following line until you get some.

In [59]:
files = S3Downloader.list(DATA_CAPTURE_DESTINATION)[:3]
files

['s3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/06/27/10/58-55-069-5a80dc2b-dafe-4be7-920a-eaece978479b.jsonl',
 's3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/06/27/10/59-56-068-4bdc305d-5965-490e-b0e5-5e687dac7702.jsonl',
 's3://mlschool/penguins/monitoring/data-capture/penguins-endpoint/AllTraffic/2023/06/27/11/00-56-384-6decb274-dfde-459c-9bf9-e53036a12711.jsonl']
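Instead of rerunning the cell by hand, we could poll until files appear. This is a minimal sketch of such a loop; `wait_for_files` is a hypothetical helper that accepts any function returning a list, and the example uses a fake lister so it runs without S3 access.

```python
import time


def wait_for_files(list_files, attempts=10, delay=30.0):
    """Call list_files() until it returns a non-empty list or attempts run out."""
    for attempt in range(attempts):
        files = list_files()
        if files:
            return files
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return []


# Hypothetical lister that finds nothing on the first two calls.
responses = iter([[], [], ["s3://example-bucket/capture/file.jsonl"]])
found = wait_for_files(lambda: next(responses), attempts=5, delay=0.01)
print(found)
```

In this notebook, the lister could be `lambda: S3Downloader.list(DATA_CAPTURE_DESTINATION)`.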

These files contain the data captured by the endpoint in a SageMaker-specific JSON Lines format. Each inference request is captured as a single line in the `jsonl` file, and that line contains both the input and the output of the request.

Let's read the first line from the first file:

In [60]:
if files:
    lines = S3Downloader.read_file(files[0])
    print(json.dumps(json.loads(lines.splitlines()[0]), indent=2))

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "application/json",
      "mode": "INPUT",
      "data": "{\"island\": \"Torgersen\", \"culmen_length_mm\": 39.1, \"culmen_depth_mm\": 18.7, \"flipper_length_mm\": 181.0, \"body_mass_g\": 3750.0}",
      "encoding": "JSON"
    },
    "endpointOutput": {
      "observedContentType": "application/json",
      "mode": "OUTPUT",
      "data": "{\"species\": \"Adelie\", \"prediction\": 0, \"confidence\": 0.809994876}",
      "encoding": "JSON"
    }
  },
  "eventMetadata": {
    "eventId": "705ebc5b-70c7-4eab-977a-807320b1b589",
    "inferenceId": "0",
    "inferenceTime": "2023-06-27T10:58:55Z"
  },
  "eventVersion": "0"
}
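Since the `data` fields hold JSON documents stored as strings, reading a captured record takes two decoding steps. The sketch below shows one way to do it; `parse_capture_line` is a hypothetical helper, and it assumes the record uses the `JSON` encoding shown above (other encodings would need a different decoding step).

```python
import json


def parse_capture_line(line):
    """Return (inference_id, request, response) from one captured JSON line."""
    record = json.loads(line)
    capture = record["captureData"]
    # The "data" fields are JSON documents stored as strings, so they need a
    # second decoding step.
    request = json.loads(capture["endpointInput"]["data"])
    response = json.loads(capture["endpointOutput"]["data"])
    inference_id = record["eventMetadata"].get("inferenceId")
    return inference_id, request, response


# A record shaped like the one above, trimmed to the fields used here.
example_line = json.dumps({
    "captureData": {
        "endpointInput": {
            "data": json.dumps({"island": "Torgersen", "body_mass_g": 3750.0}),
            "encoding": "JSON",
        },
        "endpointOutput": {
            "data": json.dumps({"species": "Adelie", "prediction": 0}),
            "encoding": "JSON",
        },
    },
    "eventMetadata": {"inferenceId": "0"},
})

inference_id, request, response = parse_capture_line(example_line)
print(inference_id, request["island"], response["species"])
```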


### Introducing a Violation

Let's make a prediction for a penguin and include extra fields in the request. This should be flagged by the monitoring job.

In [64]:
predictor.predict({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    
    # These two columns are not in the baseline data,
    # so they will be reported by the monitoring job
    # as a violation.
    "name": "Johnny",
    "height": 28.0
})

{'species': 'Adelie', 'prediction': 0, 'confidence': 0.683268189}
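To see which fields make this request differ from the others, we can compare its keys against the feature names used throughout this notebook. This is only a local illustration and not how the monitoring job detects violations; `EXPECTED_FEATURES` and `schema_differences` are hypothetical names.

```python
# Feature names taken from the requests used throughout this notebook.
EXPECTED_FEATURES = {
    "island",
    "culmen_length_mm",
    "culmen_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
}


def schema_differences(request, expected=EXPECTED_FEATURES):
    """Return (extra, missing) field names relative to the expected set."""
    fields = set(request)
    return sorted(fields - expected), sorted(expected - fields)


extra, missing = schema_differences({
    "island": "Dream",
    "culmen_length_mm": 46.4,
    "culmen_depth_mm": 18.6,
    "flipper_length_mm": 190.0,
    "body_mass_g": 5608.0,
    "name": "Johnny",
    "height": 28.0,
})
print(extra, missing)
```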



### Deleting Resources

First, let's stop the thread generating traffic.

In [65]:
stop_traffic_thread.set()
traffic_thread.join()

Let's now delete the endpoint.

In [66]:
predictor.delete_endpoint()