# Serverless Inference with SageMaker Serverless Endpoints
> How to call an ML model endpoint hosted by SageMaker using serverless technology.

- toc: true 
- badges: true
- comments: true
- categories: [aws, ml, sagemaker]
- keyword: [aws, ml, sagemaker]
- image: images/copied_from_nb/images/2022-06-17-sagemaker-endpoint.jpeg

![](images/2022-06-17-sagemaker-endpoint.jpeg)

# About

You have trained and deployed a model using Amazon SageMaker. You have an endpoint, and now you are wondering, "After I deploy an endpoint, where do I go from there?" Your concern is valid because SageMaker endpoints are not public; they are scoped to an individual account. In this post we will learn how to make them publicly callable using AWS serverless technologies: AWS Lambda and Lambda Function URLs. We will also make the endpoint itself serverless, so that our ML inference solution is serverless end to end.

# Introduction

The following diagram shows how a model is called using serverless architecture.
<br><br>
![serverless-architecture](images/2022-06-17-sagemaker-endpoint/serverless-architecture.png)

Starting from the client, a client application calls the AWS Lambda Function URL and passes parameter values. The Lambda function parses the request and passes it to the SageMaker model endpoint. This endpoint can be hosted on an EC2 instance, or you have the option to make it serverless. Serverless endpoints behave similarly to Lambda functions. Once the endpoint receives the request, it performs the prediction and returns the predicted values to Lambda. The Lambda function parses the returned values and sends the final response back to the client.
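The flow above can be sketched as a small Lambda handler. This is only a minimal sketch under a few assumptions that are not from this post: the client sends a JSON body shaped like `{"features": [...]}`, the model accepts `text/csv` and can answer with `application/json`, and the body is not base64 encoded. The `runtime` parameter is a hypothetical hook I added so the handler can be exercised locally with a stand-in client.

```python
import json

# Endpoint name used in this post; replace it with your own.
ENDPOINT_NAME = "2022-06-17-sagemaker-endpoint-serverless"


def get_runtime_client():
    # Imported lazily so the module can be loaded without boto3 installed.
    import boto3

    return boto3.client("sagemaker-runtime")


def lambda_handler(event, context, runtime=None):
    """Forward one row of features to the endpoint and return its prediction."""
    if runtime is None:
        runtime = get_runtime_client()

    # A Function URL passes the HTTP body as a string under "body".
    # Assumed request shape: {"features": [0.1, 18.0, ...]}
    body = json.loads(event.get("body") or "{}")
    features = body.get("features")
    if not features:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'features'"})}

    # Serialize the features as a single CSV row.
    payload = ",".join(str(value) for value in features)
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Accept="application/json",
        Body=payload,
    )

    # The response body is a stream; read and decode it before parsing.
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the client creation in its own function means the handler logic can be checked on a laptop before it is deployed.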

To train a model using Amazon SageMaker you can follow my other post [Demystifying Amazon SageMaker Training for scikit-learn Lovers](https://hassaanbinaslam.github.io/myblog/aws/ml/sagemaker/2022/06/08/sagemaker-training-overview.html). There I trained a SageMaker Linear Learner model on the Boston housing dataset.

`Note that this post assumes that you have already trained a model and that it is available in the SageMaker model repository.`

# Deploying SageMaker Serverless Endpoint

Let's visit the SageMaker model repository to find our Linear Learner model. You can find the repository on the **SageMaker Inference / Model** page.

![model-repo](images/2022-06-17-sagemaker-endpoint/model-repo.png)

Note the model name `linear-learner-2022-06-16-09-10-17-207`, as we will need it in later steps.
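If you prefer to look up the model name from code instead of the console, something like the following could work. It is a rough sketch: `latest_model_name` is a hypothetical helper of mine, and the `linear-learner` prefix is assumed from the model name shown above. It takes a boto3 SageMaker client and relies on its `list_models` call.

```python
def latest_model_name(sm_client, name_contains="linear-learner"):
    """Return the most recently created model whose name contains the given text."""
    response = sm_client.list_models(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    models = response.get("Models", [])
    # None means no model matched the filter.
    return models[0]["ModelName"] if models else None
```

With AWS credentials configured, you would call it as `latest_model_name(boto3.client("sagemaker"))`.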

Click on the *model name* and then **Create endpoint**

![create-endpoint](images/2022-06-17-sagemaker-endpoint/create-endpoint.png)

This will take you to the **configure endpoint** page. Configure it as follows.
* Set **Endpoint name** to `2022-06-17-sagemaker-endpoint-serverless`. You may use any other unique string here.
* From **Attach endpoint configuration** select `create a new endpoint configuration`
* From **New endpoint configuration / Endpoint configuration** set
  * **Endpoint configuration name** to `config-2022-06-17-sagemaker-endpoint-serverless`. You may use any other name here.
  * **Type of endpoint** to `Serverless`
  * From **Production variants** click on **Add Model** and then select the model name we want to deploy. In our case it is `linear-learner-2022-06-16-09-10-17-207`. Click **Save**. 

![add-model](images/2022-06-17-sagemaker-endpoint/add-model.png)

* Then edit **Max Concurrency** and set it to 5.

![max-concurrency](images/2022-06-17-sagemaker-endpoint/max-concurrency.png)

* Click **Create endpoint configuration**

![new-endpoint-config](images/2022-06-17-sagemaker-endpoint/new-endpoint-config.png)

* Click **Create endpoint**

![endpoint-created](images/2022-06-17-sagemaker-endpoint/endpoint-created.png)
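The same console steps can also be scripted with boto3. The sketch below is not what I ran for this post, so treat it as an approximation: the variant name `AllTraffic` and the 2048 MB memory size are my assumptions, and `build_endpoint_config` and `create_serverless_endpoint` are hypothetical helper names. The model, configuration, and endpoint names match the ones used above.

```python
MODEL_NAME = "linear-learner-2022-06-16-09-10-17-207"
ENDPOINT_NAME = "2022-06-17-sagemaker-endpoint-serverless"
CONFIG_NAME = "config-2022-06-17-sagemaker-endpoint-serverless"


def build_endpoint_config(model_name, config_name, max_concurrency=5, memory_mb=2048):
    """Build the request for a serverless endpoint configuration."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                # ServerlessConfig is what makes the endpoint serverless.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,  # assumed memory size
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


def create_serverless_endpoint(sm_client, model_name, config_name, endpoint_name):
    """Create the endpoint configuration first, then the endpoint that uses it."""
    sm_client.create_endpoint_config(**build_endpoint_config(model_name, config_name))
    return sm_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
```

Here `sm_client` would be `boto3.client("sagemaker")`. Building the request in a separate function makes it easy to print and inspect before anything is created in your account.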

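It will take about a minute for the new endpoint to come into service.

A few things to keep in mind:
* Serverless endpoints can have cold start issues, which may lead to timeouts.
* The endpoint logs are written to the log group `/aws/sagemaker/Endpoints/2022-06-17-sagemaker-endpoint-serverless`.
* The logs cover the Docker container, the gunicorn server, and input iterators.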
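To check from code when the endpoint is ready, you can poll its status with the boto3 `describe_endpoint` call. This is a minimal sketch: `wait_until_in_service` is a hypothetical helper, and the polling interval and timeout are arbitrary values I picked. The `sleep` parameter exists only so the loop can be tried without actually waiting.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=10,
                          timeout_seconds=600, sleep=time.sleep):
    """Poll the endpoint status until it is InService, fails, or times out."""
    waited = 0
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} is still {status}")
        # Wait before asking again.
        sleep(poll_seconds)
        waited += poll_seconds
```

boto3 also ships a built-in waiter for this, `sm_client.get_waiter("endpoint_in_service")`, which is the simpler choice if you do not need custom handling.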

```python
# Requires the SageMaker Python SDK (`pip install sagemaker`);
# without it this import raises ModuleNotFoundError.
import sagemaker
```