Serverless Machine Learning API: Use PyTorch in AWS Lambda for Inference

Mystique Unicorn App is building a new application based on the microservice architectural pattern. One of the services used by the app is exposed as a REST API that performs machine learning inference. This particular ML model and its dependent libraries need about 3GB of storage space. The dev team has been using Lambda for most of their APIs and exposing them through Amazon API Gateway. They are interested in using the same compute and gateway services for this ML API as well.

Currently (Q3 2020), Lambda has only about 500MB (512MB) of temporary space available and about 250MB for unzipped layers. re:Invent might change these limits, but the team is keen on getting started now.
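
To make the gap concrete, here is a minimal sketch that compares the roughly 3GB footprint against the two limits mentioned above. The figures are the 2020-era limits (512MB for /tmp, which is the "about 500MB" quoted above, and 250MB for unzipped layers); the constant and function names are illustrative only and not part of this repo.

```python
# Lambda storage limits as of Q3 2020, in MB (figures from the text above).
LAMBDA_LIMITS_MB = {
    "tmp": 512,              # ephemeral /tmp storage
    "unzipped_layers": 250,  # function code plus layers, unzipped
}


def storage_options(required_mb):
    """Return the names of the Lambda storage options that can hold required_mb."""
    return [name for name, limit in LAMBDA_LIMITS_MB.items() if required_mb <= limit]


if __name__ == "__main__":
    model_and_libs_mb = 3 * 1024  # about 3GB for the model and its libraries
    print(storage_options(model_and_libs_mb))  # prints []
```

Neither built-in option can hold the model, which is why external storage is needed.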

Can you help them do that in Amazon API Gateway & AWS Lambda?

🎯Solutions

Amazon EFS is a fully managed shared file system that can be attached to Lambda functions. This allows developers to easily build and import large code libraries directly into their Lambda functions and share data across function invocations. As the files in EFS are loaded dynamically during function invocation, you can also ensure that the latest version of these libraries is always used by every new execution environment.
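
As a rough illustration of what "loaded dynamically" means on the Python side, the sketch below adds a shared directory to the module search path and imports a library from it at runtime. The helper name `import_from_shared_dir` is hypothetical; in this project the same effect is achieved by pointing PYTHONPATH at the EFS mount.

```python
import importlib
import sys


def import_from_shared_dir(module_name, lib_dir):
    """Import module_name, resolving it from lib_dir (e.g. a directory on an EFS mount)."""
    if lib_dir not in sys.path:
        # Appending keeps the runtime's bundled packages first on the search path.
        sys.path.append(lib_dir)
    importlib.invalidate_caches()  # pick up files written after interpreter startup
    return importlib.import_module(module_name)
```

For example, `torch = import_from_shared_dir("torch", "/mnt/inference/lib")` would load PyTorch from the mounted share, assuming it was installed there.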

In this article, we will build an architecture similar to the one shown above. To bootstrap our EFS with machine learning libraries and models, we will use an EC2 machine. Once the process of installing and configuring EFS is complete, the EC2 machine can be terminated.

For the machine learning part, we will use a pre-trained model open sourced by @nicolalandro and available in PyTorch Hub. This model classifies birds using a fine-grained image classifier. We will deploy this model in EFS. When we send the URL of an image to the model, it will return the bird species (broadly speaking).

🧰 Prerequisites

This demo, along with its instructions, scripts and CloudFormation template, is designed to be run in us-east-1. With a few modifications you can try it out in other regions as well (not covered here).
- 🛠 AWS CLI Installed & Configured - Get help here
- 🛠 AWS CDK Installed & Configured - Get help here
- 🛠 Python Packages. Change the commands below to suit your OS; the following is written for Amazon Linux 2
  - Python3 - yum install -y python3
  - Python Pip - yum install -y python-pip
  - Virtualenv - pip3 install virtualenv
  NOTE: Given that we are planning to run machine learning inference using Lambda, the Lambda function needs enough compute and memory to return a response in reasonable time. The automation in this repo sets up Lambda with 3008MB memory and a 5-minute timeout. In addition, we will also configure Provisioned Concurrency ² for our Lambda function to avoid cold starts.
  
  No attempt has been made to optimize these settings, as this is just a technology demonstration. Given the above and other resources like EC2, please be mindful of the costs involved in deploying and learning from this stack.

⚙️ Setting up the environment

Get the application code

git clone https://github.com/miztiik/serverless-machine-learning-api
cd serverless-machine-learning-api

🚀 Prepare the dev environment to run AWS CDK

We need cdk installed to make our deployments easier. Let's go ahead and install the necessary components.
```
# If you DON'T have cdk installed
npm install -g aws-cdk

# Make sure you are in the root directory of the repo
python3 -m venv .env
source .env/bin/activate
pip3 install -r requirements.txt
```
The very first time you deploy an AWS CDK app into an environment (account/region), you'll need to install a bootstrap stack. Otherwise, just go ahead and deploy using cdk deploy.
```
cdk bootstrap
cdk ls
# Follow on screen prompts
```
You should see an output of the available stacks,
```
vpc-stack
efs-stack
pytorch-on-efs
serverless-machine-learning-api
```
🚀 Deploying the application

Let us walk through each of the stacks,
- Stack: efs-stack We are going to create an EFS share and an /ml access point that will be used by our Lambda function. We also need a VPC to host our EFS; the dependent stack vpc-stack will be deployed for you automatically. This stack will also set the Acl & PosixUser to 1000.
  
  To enable communication with our EFS, we will also set up a dedicated security group that allows TCP connections on port 2049 from any IP within the VPC. This will allow any EC2 instance and Lambda function within the VPC to read and write to our file share.
  
  Initiate the deployment with the following command,
```
cdk deploy vpc-stack efs-stack
```
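
If a mount later fails, one quick check is whether the NFS port is reachable from inside the VPC. The sketch below is a generic TCP probe and not part of this repo; the function name `can_reach` and the host you pass to it (for example, the mount target's DNS name or IP) are assumptions for illustration.

```python
import socket


def can_reach(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Run from an EC2 instance in the same VPC, a result of `False` usually points at the security group or subnet routing rather than at EFS itself.
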
- Stack: pytorch-on-efs To bootstrap our EFS with the machine learning libraries and models, we need an instance that can write to our EFS share. We will use an EC2 instance and its user_data script to automatically download and install the libraries. The script will install torch, torchvision and numpy. The ML model will be downloaded from PyTorch Hub³.
  
  Initiate the deployment with the following command,
```
cdk deploy pytorch-on-efs
```
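
The actual user_data script lives in the repo; as a rough sketch of the idea only, the snippet below builds a pip command that installs the packages into a directory on the share, plus the TORCH_HOME setting that tells PyTorch Hub where to cache the model. The function name and the EC2-side mount path are assumptions, chosen to line up with the lib and model directories used later.

```python
import os
import sys


def build_bootstrap_plan(efs_root, packages=("torch", "torchvision", "numpy")):
    """Return the pip command and environment for installing onto the EFS mount."""
    lib_dir = os.path.join(efs_root, "lib")
    model_dir = os.path.join(efs_root, "model")
    # pip's --target flag installs packages into the given directory.
    command = [sys.executable, "-m", "pip", "install", "--target", lib_dir, *packages]
    # torch.hub caches downloaded models under $TORCH_HOME/hub.
    env = {"TORCH_HOME": model_dir}
    return command, env
```

On the instance, the command could then be executed with `subprocess.run(command, check=True)` once the share is mounted, for example at a hypothetical `/mnt/efs/ml`.
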
- Stack: serverless-machine-learning-api
  
  At this point, we are all set to configure our machine learning inference API using AWS Lambda and expose it using API Gateway. The serverless-machine-learning-api stack does just that for us. It will create the Lambda function inside the same VPC as our EFS share. The EFS share will be available to Lambda at the mount point /mnt/inference. The paths for the model and the dependent libraries are set as environment variables,
  - PYTHONPATH : /mnt/inference/lib
  - TORCH_HOME : /mnt/inference/model
  Since we are also looking to avoid cold starts, the stack will create a versioned lambda and enable a provisioned concurrency of 1.
  
  Initiate the deployment with the following command,
```
cdk deploy serverless-machine-learning-api
```
  Check the Outputs section of the stack to access the MachineLearningInferenceApiUrl
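
To show how these pieces fit together inside the function, here is a minimal handler sketch. It is not the handler shipped in this repo: `load_classifier` is a hypothetical placeholder for the real PyTorch loading code, and the response fields simply mirror the sample outputs in the testing section. The point is the caching pattern, where the slow load from EFS happens once per execution environment.

```python
import json
import os
from datetime import datetime

# Default mirrors the TORCH_HOME environment variable set by the stack.
MODEL_DIR = os.environ.get("TORCH_HOME", "/mnt/inference/model")

_classifier = None  # cached so warm invocations skip the slow load from EFS


def load_classifier():
    """Hypothetical placeholder: return a callable mapping an image URL to a label."""
    raise NotImplementedError(f"load the PyTorch model from {MODEL_DIR} here")


def lambda_handler(event, context, loader=load_classifier):
    global _classifier
    # API Gateway proxy events carry query parameters here (None when absent).
    params = event.get("queryStringParameters") or {}
    image_url = params.get("url")
    if not image_url:
        body = {"message": "missing 'url' query parameter"}
        return {"statusCode": 400, "body": json.dumps(body)}
    if _classifier is None:
        _classifier = loader()  # first call in this environment pays the load cost
    body = {
        "message": {"bird_class": _classifier(image_url)},
        "lambda_version": getattr(context, "function_version", "unknown"),
        "ts": str(datetime.now()),
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```

The extra `loader` argument exists only so the sketch can be exercised locally with a stub; Lambda itself calls the handler with `event` and `context`.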

🔬 Testing the solution

We can use a tool like curl or Postman to query the urls. The Outputs section of the respective stacks has the required information on the urls.

$ WELL_ARCHICTED_API_URL="https://r4e3y68p11.execute-api.us-east-1.amazonaws.com/prod/serverless-machine-learning-api/greeter"
$ curl ${WELL_ARCHICTED_API_URL}
{
  "message": "Hello from Miztiikal World, How is it going?",
  "api_stage": "prod",
  "lambda_version": "38",
  "ts": "2020-08-26 13:03:19.810150"
}

We need to append the image URL as a query string. Here are a couple of sample images of birds (courtesy of Wikimedia⁵). Update the ML_API_URL and try it out. You can try other bird images that are publicly accessible.

$ ML_API_URL="https://ace17f0y9c.execute-api.us-east-1.amazonaws.com/prod/ml-api/identify-bird-species"
IMG_URL_1="https://upload.wikimedia.org/wikipedia/commons/d/d2/Western_Grebe_swimming.jpg"
IMG_URL_2="https://upload.wikimedia.org/wikipedia/commons/b/b5/House_Sparrow_%28Passer_domesticus%29-_Male_in_Kolkata_I_IMG_5904.jpg"

time curl ${ML_API_URL}?url=${IMG_URL_1}

Expected Output,

{
  "message": "{'bird_class': '053.Western_Grebe'}",
  "lambda_version": "14",
  "ts": "2020-09-07 17:47:58.469903"
}
real    0m27.570s
user    0m0.015s
sys     0m0.016s

time curl ${ML_API_URL}?url=${IMG_URL_2}

Expected Output,

{
  "message": "{'bird_class': '118.House_Sparrow'}",
  "lambda_version": "14",
  "ts": "2020-09-07 17:49:46.138871"
}
real    0m2.645s
user    0m0.020s
sys     0m0.032s

It is possible that the first invocation takes noticeably longer (possibly even timing out at API Gateway), as the function has to initialize with libraries and models from EFS. Subsequent invocations should be significantly faster, at around 3 seconds.
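
The same calls can be made from Python. The sketch below, using only the standard library, percent-encodes the image URL into the query string and retries if the slow first invocation fails. The function names, retry count and wait time are illustrative assumptions, not values from this repo.

```python
import time
import urllib.parse
import urllib.request


def build_request_url(api_url, image_url):
    """Append image_url as a percent-encoded 'url' query parameter."""
    return api_url + "?" + urllib.parse.urlencode({"url": image_url})


def http_get(url, timeout=30):
    """Fetch url and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def identify_bird(api_url, image_url, fetch=http_get, attempts=3, wait_seconds=2.0):
    """Call the API, retrying when a slow first invocation fails or times out."""
    url = build_request_url(api_url, image_url)
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            if attempt == attempts:
                raise
            time.sleep(wait_seconds)
```

Calling `identify_bird(ML_API_URL, IMG_URL_1)` with the values from the curl example returns the JSON body as a string.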

Additional Learnings: You can check the logs in CloudWatch for more information, or increase the logging level of the Lambda functions by changing the environment variable from INFO to DEBUG.
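
A common way to wire up such a switch is sketched below. The variable name `LOG_LEVEL` is an assumption made for illustration; check the function's configuration for the name this repo actually uses.

```python
import logging
import os


def configure_logging(default="INFO"):
    """Set the root logger level from the LOG_LEVEL environment variable (assumed name)."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unrecognised values
    logging.getLogger().setLevel(level)
    return level
```

Calling `configure_logging()` at module import time means each new execution environment picks up the current setting.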

📒 Conclusion

Here we have demonstrated how to use an EFS share with Lambda as persistent storage. Here are a few other use cases that you can try with the same pattern,
- Media processing with ffmpeg: for example, keyframe extraction for highlights
- Custom machine learning: for example, using OpenCV to process media

🧹 CleanUp

If you want to destroy all the resources created by the stack, execute the command below to delete the stack, or delete the stack from the console. The items to clean up are,
- Resources created during Deploying The Application
- Delete CloudWatch Lambda LogGroups
- Any other custom resources, you have created for this demo
```
# Delete from cdk
cdk destroy

# Follow any on-screen prompts

# Delete the CF Stack, If you used cloudformation to deploy the stack.
aws cloudformation delete-stack \
    --stack-name "MiztiikAutomationStack" \
    --region "${AWS_REGION}"
```
This is not an exhaustive list; please carry out any other steps that may be applicable to your needs.

📌 Who is using this

This repository aims to teach new developers, Solution Architects & Ops Engineers in AWS how to use persistent storage with serverless microservices running on AWS Lambda. Building on that knowledge, these Udemy courses (course #1, course #2) help you build complete architectures in AWS.

💡 Help/Suggestions or 🐛 Bugs

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional documentation or solutions, we greatly value feedback and contributions from our community. Start here
