# Inference with Autoscaled Models

If [statically provisioned instances](./Real-time%20inference.ipynb) don't meet your needs, Amazon SageMaker offers [automatic scaling](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-inference-launches-faster-auto-scaling-for-generative-ai-models/) for your hosted models. Automatic scaling adjusts the number of instances allocated to a model as traffic changes, so your deployment stays cost-effective while still handling varying load.

## Prerequisites:

1. This notebook renders correctly in the Jupyter interface and can be run in either an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
2. Ensure that the IAM role being used has **AmazonSageMakerFullAccess**.
3. To successfully deploy the ML model, ensure that:
    1. Either your IAM role has the following three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account:
        - **aws-marketplace:ViewSubscriptions**
        - **aws-marketplace:Unsubscribe**
        - **aws-marketplace:Subscribe**
    2. Or, your AWS account already has a subscription to the model.

# Model package setup

First, subscribe to the model package(s) on AWS Marketplace.

Then install the `jina-sagemaker` package and get the model package ARN using the code below.



In [None]:
!pip install --upgrade jina-sagemaker


import boto3

region = boto3.Session().region_name

# Specify the role as required by SageMaker
role = ""

# Specify the model package name: the part of the Product ARN (shown in the AWS console) that follows the 'arn:aws:sagemaker:xx-xx-x:xxxx:model-package/' prefix
model_package_name = ""

# Mapping for product ARN
def get_arn_for_model(region_name, model_name):
    model_package_map = {
        "us-east-1": f"arn:aws:sagemaker:us-east-1:253352124568:model-package/{model_name}",
        "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{model_name}",
        "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{model_name}",
        "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{model_name}",
        "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{model_name}",
        "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{model_name}",
        "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{model_name}",
        "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{model_name}",
        "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{model_name}",
        "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{model_name}",
        "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{model_name}",
        "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{model_name}",
        "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{model_name}",
        "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{model_name}",
        "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{model_name}",
        "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{model_name}",
    }

    return model_package_map[region_name]

model_package_arn = get_arn_for_model(region, model_package_name)
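
The `model_package_name` above is the part of the Product ARN that follows `model-package/`. If you would rather paste the full ARN from the console, a small helper can strip the prefix for you. This is a minimal sketch: `package_name_from_arn` is a hypothetical helper (it is not part of `jina-sagemaker`), and the ARN used below is a made-up placeholder.

```python
def package_name_from_arn(value: str) -> str:
    """Return the model package name from a full Product ARN or a bare name."""
    marker = "model-package/"
    value = value.strip()
    if marker in value:
        # Keep only what follows the first 'model-package/' segment.
        return value.split(marker, 1)[1]
    # Already a bare package name: return it unchanged.
    return value


# Placeholder ARN for illustration only; substitute the Product ARN from your console.
example_arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model-v1"
print(package_name_from_arn(example_arn))
```

The result can be assigned to `model_package_name` before calling `get_arn_for_model`.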

# Create an endpoint that automatically scales

In this section, we configure an autoscaling endpoint that uses target tracking scaling, which scales your application based on a target metric value. For details on target tracking scaling, see the [documentation](https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking.html).

In [None]:
from jina_sagemaker import Client

client = Client(region_name=region)
client.create_endpoint(arn=model_package_arn, role=role, endpoint_name="my-autoscaling-endpoint", instance_type="ml.g4dn.xlarge", n_instances=2)

# If the endpoint is already created, you just need to connect to it
# client.connect_to_endpoint(endpoint_name="my-autoscaling-endpoint", arn=model_package_arn)

client.register_scalable_target(
    max_capacity=10,
    min_capacity=1
)
client.set_target_tracking_autoscaling(
    policy_name="my_target_tracking_policy",
    policy_configuration = {
        'TargetValue': 5,  # Target for concurrent requests per model; scale out when the metric exceeds it
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantConcurrentRequestsPerModelHighResolution'
        },
        'ScaleInCooldown': 60,
        'ScaleOutCooldown': 60
    }
)

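Adjust the `min_capacity` and `max_capacity` parameters based on your anticipated traffic and budget. With `TargetValue` set to 5, target tracking adds instances when concurrent requests per model rise above 5 and removes them when the metric stays below that target, always within the `min_capacity` and `max_capacity` bounds. The two cooldown values (60 seconds each) set how long the policy waits after one scaling activity before starting another of the same kind.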
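
To see what a configuration like this looks like at the AWS API level, the sketch below builds equivalent requests for the Application Auto Scaling API in `boto3` (`register_scalable_target` and `put_scaling_policy`). It is an assumption that `jina-sagemaker` issues calls of this shape. The variant name `AllTraffic` is also assumed and may differ for your endpoint, and the helper names are hypothetical.

```python
def build_scalable_target(endpoint_name, variant_name="AllTraffic",
                          min_capacity=1, max_capacity=10):
    """Build keyword arguments for register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        # Assumed variant name; check your endpoint configuration.
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def build_target_tracking_policy(target, policy_name, policy_configuration):
    """Build keyword arguments for put_scaling_policy (target tracking)."""
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": target["ServiceNamespace"],
        "ResourceId": target["ResourceId"],
        "ScalableDimension": target["ScalableDimension"],
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": policy_configuration,
    }


def apply_autoscaling(target, policy, region_name=None):
    """Send both requests to AWS; requires credentials and an existing endpoint."""
    import boto3  # imported lazily so the builders above work without AWS access

    autoscaling = boto3.client("application-autoscaling", region_name=region_name)
    autoscaling.register_scalable_target(**target)
    autoscaling.put_scaling_policy(**policy)


target = build_scalable_target("my-autoscaling-endpoint")
policy = build_target_tracking_policy(
    target,
    "my_target_tracking_policy",
    {
        "TargetValue": 5.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantConcurrentRequestsPerModelHighResolution"
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)
# apply_autoscaling(target, policy)  # uncomment to call AWS
```

Keeping the request builders separate from the function that calls AWS makes it easy to inspect or log the exact configuration before applying it.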

If you prefer using [step scaling](https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-step-scaling-policies.html), that’s also an option, but we won’t go into detail here.

# Perform real-time inference

You can invoke the endpoint using the APIs provided by `jina-sagemaker` in your application. As concurrency rises above the target value, the auto scaling policy scales out the resources, and it scales them back in once the load drops below the target again.

In [None]:
result = client.embed(texts=[
    "how is the weather today", 
    "what is the weather like today",
    "what's the color of an orange",
])
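
To watch the scaling policy react, the endpoint needs concurrent traffic rather than a single request. The sketch below sends batches from several threads at once. `send_concurrently` is a hypothetical helper, and it assumes the embed callable is safe to call from multiple threads, which is not confirmed for `jina-sagemaker`. The stand-in `fake_embed` exists only so the example runs without an endpoint.

```python
from concurrent.futures import ThreadPoolExecutor


def send_concurrently(embed_fn, batches, max_workers=8):
    """Call embed_fn(texts=batch) for every batch in parallel; results keep batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(embed_fn, texts=batch) for batch in batches]
        # result() re-raises any exception raised inside a worker thread.
        return [future.result() for future in futures]


def fake_embed(texts):
    # Stand-in for client.embed: returns one dummy vector per input text.
    return [[float(len(text))] for text in texts]


batches = [["how is the weather today", "what's the color of an orange"]] * 20
results = send_concurrently(fake_embed, batches)
print(len(results))

# Against a live endpoint (assumption: client.embed accepts texts=... as shown above):
# results = send_concurrently(client.embed, batches)
```

A single short burst may finish before the metric crosses the target, so in practice you would likely repeat the call in a loop for a few minutes while observing the instance count.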

# Clean-up

## Delete the endpoint

In [None]:
client.delete_endpoint()
client.close()

## Unsubscribe from the listing (optional)

If you would like to unsubscribe from the model package, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust)
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.
