# Run Inference with the Jina Embedding Model Package from AWS Marketplace

This notebook shows you how to deploy [jina-embedding-model](link) using Amazon SageMaker.

## Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has the **AmazonSageMakerFullAccess** policy attached.
1. To deploy this ML model successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. Or your AWS account already has a subscription to [jina-embedding-model](link). If so, skip the step [Subscribe to the model package](#1.-Subscribe-to-the-model-package).

## Contents:
1. [Subscribe to the model package](#1.-Subscribe-to-the-model-package)
2. [Real-time inference](#2.-Real-time-inference)
   1. [Create an endpoint with static instances](#A.-Create-an-endpoint-with-static-instances)
   2. [Create an endpoint that automatically scales](#B.-Create-an-endpoint-that-automatically-scales)
   3. [Create a serverless endpoint](#C.-Create-an-serverless-endpoint)
   4. [Perform real-time inference](#D.-Perform-real-time-inference)
3. [Batch inference](#3.-Batch-inference)
4. [Clean-up](#4.-Clean-up)
    1. [Delete the model](#A.-Delete-the-model)
    2. [Unsubscribe from the listing (optional)](#B.-Unsubscribe-to-the-listing-(optional))

# 1. Subscribe to the model package

In [None]:
!pip install --upgrade jina-sagemaker

from jina_sagemaker import Client
import boto3

In [None]:
region = boto3.Session().region_name

# Specify the IAM role ARN if needed.
# The variable must be defined because later cells pass role=role.
role = None

# Specify the model you want to use
model_package_arn = ""
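
Because `model_package_arn` is left empty above, it can help to check the value before creating an endpoint. The following is a minimal sketch, assuming the standard SageMaker model package ARN layout (`arn:<partition>:sagemaker:<region>:<account-id>:model-package/<name>`). The helper name `is_model_package_arn` and the example ARN are hypothetical and are not part of `jina_sagemaker`.

```python
import re

# Assumed layout of a SageMaker model package ARN:
# arn:<partition>:sagemaker:<region>:<12-digit account id>:model-package/<name>
MODEL_PACKAGE_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:model-package/.+$"
)


def is_model_package_arn(arn: str) -> bool:
    """Return True if the string looks like a model package ARN (hypothetical helper)."""
    return bool(MODEL_PACKAGE_ARN_RE.match(arn))


# Hypothetical ARN, for illustration only; use the ARN from your own subscription.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model"

print(is_model_package_arn(example_arn))  # True
print(is_model_package_arn(""))           # False
```

This only checks the shape of the string; it does not confirm that the subscription exists or that the ARN matches the selected region.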

# 2. Real-time inference

To learn about real-time inference capabilities in Amazon SageMaker, refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html).

## A. Create an endpoint with static instances

In [None]:
co = Client(region_name=region)
co.create_endpoint(arn=model_package_arn, role=role, endpoint_name="my-endpoint", instance_type="ml.m5.xlarge", n_instances=1)

# If the endpoint is already created, you just need to connect to it
# co.connect_to_endpoint(endpoint_name="my-endpoint")

## B. Create an endpoint that automatically scales

In this section, we configure an autoscaling endpoint that uses step scaling, which scales a resource by a set of adjustments that vary with the size of the alarm breach. For an in-depth explanation of step scaling, refer to the [documentation](https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html).

When utilizing step scaling, it's your responsibility to configure the alarms that trigger the policy. Generally, you'll want to establish two alarms: one for triggering a step scale-in action and another for initiating a step scale-out action. Note that the upper and lower bounds specified in these policies are relative to the thresholds set in the corresponding alarms.

In [None]:
co = Client(region_name=region)
co.create_endpoint(arn=model_package_arn, role=role, endpoint_name="my-autoscaling-endpoint", instance_type="ml.m5.xlarge", n_instances=2)

# If the endpoint is already created, you just need to connect to it
# co.connect_to_endpoint(endpoint_name="my-autoscaling-endpoint")

co.register_scalable_target(max_capacity=5, min_capacity=2)

r = co.set_step_autoscaling(
    policy_name="down",
    policy_configuration={
        "AdjustmentType": "ExactCapacity",
        "StepAdjustments": [
            {
                "MetricIntervalUpperBound": 0,
                "ScalingAdjustment": 2,
            }
        ],
        "MetricAggregationType": "Average",
        "Cooldown": 10,
    },
)
down_policy = r['PolicyARN']

r = co.set_step_autoscaling(
    policy_name="up",
    policy_configuration={
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            {
                "MetricIntervalLowerBound": 0,
                "MetricIntervalUpperBound": 10,
                "ScalingAdjustment": 0,
            },
            {
                "MetricIntervalLowerBound": 10,
                "MetricIntervalUpperBound": 40,
                "ScalingAdjustment": 3,
            },
            {
                "MetricIntervalLowerBound": 40,
                "ScalingAdjustment": 4,
            },
        ],
        "MetricAggregationType": "Average",
        "Cooldown": 10,
    },
)
up_policy = r['PolicyARN']

co.set_metric_alarm(policy_arn=down_policy, 
    AlarmName="step_scaling_policy_alarm_down",
    MetricName="CPUUtilization",
    Namespace="/aws/sagemaker/Endpoints",
    Statistic="Average",
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=30,
    ComparisonOperator="LessThanOrEqualToThreshold",
    TreatMissingData="missing",
    Period=60,
    Unit="Percent",
)

co.set_metric_alarm(policy_arn=up_policy,
    AlarmName="step_scaling_policy_alarm_up",
    MetricName="CPUUtilization",
    Namespace="/aws/sagemaker/Endpoints",
    Statistic="Average",
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=60,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="missing",
    Period=10,
    Unit="Percent",
)
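
To make the relationship between alarm thresholds and step bounds concrete, here is a small pure-Python sketch that picks the step adjustment for a given CPU reading. It reuses the step values, thresholds (60 and 30), and capacity limits (2 to 5) from this section. The functions `select_adjustment` and `clamp_capacity` are hypothetical helpers, not part of `jina_sagemaker` or boto3, and the boundary handling is a simplification of the rules described in the step scaling documentation.

```python
from typing import Optional


def select_adjustment(metric_value, threshold, step_adjustments) -> Optional[int]:
    """Pick the ScalingAdjustment whose interval contains the breach size.

    Bounds are relative to the alarm threshold: breach = metric - threshold.
    Assumption of this sketch: for breach >= 0 the lower bound is inclusive
    and the upper bound exclusive; for breach < 0 it is the other way round.
    """
    breach = metric_value - threshold
    for step in step_adjustments:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if breach >= 0:
            in_interval = lower <= breach < upper
        else:
            in_interval = lower < breach <= upper
        if in_interval:
            return step["ScalingAdjustment"]
    return None


def clamp_capacity(capacity, min_capacity, max_capacity):
    """Keep the capacity within the registered scalable target limits."""
    return max(min_capacity, min(max_capacity, capacity))


up_steps = [
    {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10, "ScalingAdjustment": 0},
    {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 40, "ScalingAdjustment": 3},
    {"MetricIntervalLowerBound": 40, "ScalingAdjustment": 4},
]
down_steps = [{"MetricIntervalUpperBound": 0, "ScalingAdjustment": 2}]

# "up" policy (ChangeInCapacity), alarm threshold 60, starting from 2 instances
for cpu in (65, 75, 105):
    change = select_adjustment(cpu, 60, up_steps)
    print(cpu, change, clamp_capacity(2 + change, 2, 5))
# 65 0 2
# 75 3 5
# 105 4 5

# "down" policy (ExactCapacity), alarm threshold 30
print(select_adjustment(20, 30, down_steps))  # 2
```

With a CPU reading of 105, the requested change of 4 would give 6 instances, so the result is capped at the `max_capacity` of 5.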

## C. Create an serverless endpoint

To learn about serverless inference in Amazon SageMaker, refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html).

In [None]:
from sagemaker.serverless import ServerlessInferenceConfig


sls_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=20,
    provisioned_concurrency=10
)

co.create_endpoint(arn=model_package_arn, role=role, endpoint_name="my-sls-endpoint", sls_config=sls_config)

# If the endpoint is already created, you just need to connect to it
# co.connect_to_endpoint(endpoint_name="my-sls-endpoint")

## D. Perform real-time inference

In [8]:
import json

result = co.embed(model='jina-embedding-t-en-v1', texts=["how is the weather today"])

[-0.03703858703374863, 0.053599365055561066, 0.012735798954963684]
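
Embedding vectors like the one above are commonly compared using cosine similarity. The sketch below is a plain-Python illustration, assuming each embedding is available as a list of floats. The function name and the toy vectors are hypothetical and are not output from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only (not real embeddings)
print(round(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))                      # 0.0
```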


# 3. Batch inference

To learn about batch transform capabilities in Amazon SageMaker, refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html).

In [None]:
co.create_transform_job(
    arn=model_package_arn,
    role=role,
    n_instances=1,
    instance_type="ml.m5.xlarge",
    input_path="s3://path/to/input/data",
    output_path="s3://path/to/output",
    content_type="application/csv",
    split_type="Line",
    strategy="MultiRecord",
)
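
The job above uses `content_type="application/csv"` with `split_type="Line"`, so each line of the input file is treated as one record. The sketch below shows one way to build such a file in memory. It assumes one text per line in a single column, which this notebook does not confirm; check the model listing for the expected layout. The helper `texts_to_csv` is hypothetical.

```python
import csv
import io


def texts_to_csv(texts):
    """Serialize texts as CSV, one single-column record per line (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text in texts:
        # Collapse all runs of whitespace, including newlines, into single
        # spaces so that every record stays on exactly one line.
        writer.writerow([" ".join(text.split())])
    return buffer.getvalue()


payload = texts_to_csv(["how is the weather today", "hello, world"])
print(payload.splitlines())
# ['how is the weather today', '"hello, world"']
```

The resulting string can then be uploaded to the S3 location passed as `input_path`.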

# 4. Clean-up

## A. Delete the model

In [None]:
co.delete_endpoint()
co.close()

## B. Unsubscribe to the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: You can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.
