This guide explains how to serve a scikit-learn ML model via AWS Lambda and Amazon API Gateway. Although the example is quite specific and the ML model simple, the method can easily be adapted for more complex models. The concept of building an AWS Lambda layer can also be extended to other Python package dependencies.
In the completed setup a user is able to send HTTP requests to API Gateway, triggering the execution of a Lambda function. This Lambda function loads the saved pre-trained model, loads the dependencies from the right layer and makes a prediction based on the input parameters provided by the user. The user receives the result as the request response.
- 0. Prerequisites
- 1. Setting up an AWS Lambda layer
- 2. Training a simple ML model
- 3. Setting up the Lambda function
- 4. Configuring API Gateway
- Appendix: Memory and Timeout settings
Not many things are needed in order to get started:
Lambda layers are a way of providing your Lambda function with the dependencies it needs to execute. In our case that is the `scikit-learn` package, which in turn depends on a bunch of other packages.
Luckily we have pip to figure out the dependency tree for us. The only thing we have to do is specify the right versions of the packages. There is a potential pitfall in this step (especially if you're working on a Windows machine): you might think that you can simply reuse the packages already installed on your machine. However, AWS Lambda runs in a Linux environment, so the Python packages for Windows won't work! Instead we can ask pip to download the Linux versions of the dependencies we want by specifying `--platform manylinux1_x86_64`:
pip download --python-version 38 --abi cp38 --platform manylinux1_x86_64 --only-binary=:all: --no-binary=:none: scikit-learn
Note: This example assumes our Lambda function is running on Python 3.8. If that is not the case, simply adjust the `--python-version` and `--abi` arguments. Pip will download the packaged `.whl` files to your current working directory. By specifying `--only-binary=:all: --no-binary=:none:` we tell pip that we also want to download all the packages that the `scikit-learn` package depends on.
Now we're nearly ready to make a Lambda layer. The only thing left to do is to unpack the `.whl` files and put them into the right folder structure. Unpacking is easy: on a Linux machine call `unzip path/to/file.whl`; on Windows rename `.whl` to `.zip` and simply extract the files. Repeat this step for each package (in our case `joblib`, `numpy`, `scikit_learn`, `scipy` and `threadpoolctl`). All folders called `*.dist-info` can safely be deleted.
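The unpack-and-clean step can also be scripted. Below is a minimal sketch using only Python's standard library (a `.whl` file is just a zip archive); the `wheels` and `site-packages` paths are placeholders for your own locations:

```python
import pathlib
import shutil
import zipfile

def unpack_wheels(wheel_dir, target_dir):
    """Extract every .whl in wheel_dir into target_dir, then drop *.dist-info folders."""
    target = pathlib.Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for wheel in pathlib.Path(wheel_dir).glob("*.whl"):
        with zipfile.ZipFile(wheel) as zf:  # a wheel is a plain zip archive
            zf.extractall(target)
    for dist_info in target.glob("*.dist-info"):
        shutil.rmtree(dist_info)  # metadata folders are not needed at runtime

# Example: unpack_wheels("wheels", "python/lib/python3.8/site-packages")
```

This is OS-independent, so it also sidesteps the `unzip`-vs-rename difference between Linux and Windows mentioned above.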
In order to make our Lambda function aware of the provided packages they have to be organized into a specific folder structure. The following diagram shows the structure and where to place all the extracted packages:
python/
└── lib/
└── python3.8/
└── site-packages/
├── joblib/
├── numpy/
├── numpy.libs/
├── scikit_learn.libs/
├── scipy/
├── scipy.libs/
├── sklearn/
└── threadpoolctl.py
Note: Again we are assuming that we are using Python 3.8. If you are using a different version adjust the folder name. Zip the whole folder structure before going on to the next step.
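If you prefer to script the zipping as well, `shutil.make_archive` from Python's standard library works on any OS. A small sketch — the function name and the default archive name `layer` are my own choices, and `python` is the top-level folder from the diagram above:

```python
import shutil

def zip_layer(layer_root, archive_name="layer"):
    """Zip the layer folder structure so the archive contains the top-level python/ folder."""
    # root_dir is the directory containing python/; base_dir keeps python/ as the
    # archive's top-level entry, which is what Lambda expects for a layer.
    return shutil.make_archive(archive_name, "zip", root_dir=layer_root, base_dir="python")

# Example: zip_layer(".") produces layer.zip next to the python/ folder.
```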
In the AWS Management Console search for "AWS Lambda". In the left-hand menu under "Additional resources" choose "Layers" and click "Create layer". Give the layer a name, upload the `.zip` file and choose a runtime (in our case Python 3.8). Note: For large files consider storing the `.zip` file in an S3 bucket first and uploading it from there. Click "Create".
Congrats! You successfully created a Lambda layer!
Now on to some Data Science. Before we can serve an ML model and do inference, we have to create and train it. This toy example trains a Random Forest classifier on the Iris data set and saves the pre-trained model as a file. To view the whole code go to train.py.
First we import some dependencies and load the dataset from the pre-installed datasets in `sklearn`.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from joblib import dump
# Import the Iris dataset
iris = datasets.load_iris()
We split the data into training and testing sets ...
# Split into data and target vector
X = iris['data']
y = iris['target']
# Split into train and test data sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
... and train a Random Forest Classifier on the training set.
# Make a Random Forest classifier and train it
rfc = RandomForestClassifier()
rfc = rfc.fit(X_train,y_train)
We can validate the model's performance by checking the testing accuracy:
# Evaluate the testing accuracy
y_pred = rfc.predict(X_test)
print("Testing accuracy: ", accuracy_score(y_test,y_pred))
The output should be:
Testing accuracy: 0.98
In a last step we save the trained model as a `.joblib` file to preserve it, so we can load it later in our Lambda function.
# Store the trained model as .joblib file
dump(rfc, './myLambdaFunction/model.joblib')
Now we have nearly all pieces of the puzzle. The only thing missing is the Lambda function itself. In our case this is another Python file containing a function following a special syntax: the so-called Lambda handler. Each time the Lambda function is triggered, this handler is executed and receives its input via the `event` variable. Notice that the model is loaded outside the handler. Lambda functions that are triggered repeatedly reuse their execution context. This leads to a performance increase, as the model doesn't have to be reloaded every time.
from joblib import load

target_names = ['setosa', 'versicolor', 'virginica']
rfc = load('model.joblib')

def lambda_handler(event, context):
    sepal_length = float(event['sl'])
    sepal_width = float(event['sw'])
    petal_length = float(event['pl'])
    petal_width = float(event['pw'])
    y = rfc.predict([[sepal_length, sepal_width, petal_length, petal_width]])[0]
    c = target_names[y]
    print(c)
    return {
        'statusCode': 200,
        'body': c,
        'headers': {
            'Content-Type': 'application/json',
        },
    }
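Before zipping, you can smoke-test the handler logic locally. The sketch below substitutes a stub for the joblib-loaded model so it runs without scikit-learn installed; the stub's fixed prediction (class index 2) is purely illustrative:

```python
class StubModel:
    """Stands in for the joblib-loaded classifier during a local test."""
    def predict(self, X):
        return [2]  # always predicts class index 2 ("virginica")

target_names = ['setosa', 'versicolor', 'virginica']
rfc = StubModel()

def lambda_handler(event, context):
    # Same logic as the real handler, condensed for the local test.
    features = [[float(event[k]) for k in ('sl', 'sw', 'pl', 'pw')]]
    return {
        'statusCode': 200,
        'body': target_names[rfc.predict(features)[0]],
        'headers': {'Content-Type': 'application/json'},
    }

result = lambda_handler({'sl': 6.9, 'sw': 3.1, 'pl': 5.1, 'pw': 2.3}, None)
print(result['body'])  # virginica
```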
Zip "lambda_function.py" and "model.joblib" before going on to the next step.
In the AWS Management Console search for "AWS Lambda". In the left-hand menu choose "Functions" and click "Create function". Give the function a name and select a runtime.
Under "Function code" click the "Actions" drop-down and choose "Upload a .zip file".
Upload the .zip
file containing "lambda_function.py" and "model.joblib".
You can test the function by creating a test event: go to "Test" > "Configure Events" > "Create new test event". Provide a test name and the following body:
{
    "sl": 6.9,
    "sw": 3.1,
    "pl": 5.1,
    "pw": 2.3
}
The four input parameters correspond to the four features the model assumes as input (sepal-length, sepal-width, petal-length, petal-width). If you run the test the execution result should read:
Response:
{
    "statusCode": 200,
    "body": "virginica"
}
This is great! Our Lambda function works and is hosting an ML model! However, it has no way to communicate with the outside world, which makes it pretty useless. Thus, we need to define a trigger to activate the execution of the function and figure out a way to input data. AWS offers an integrated API service called API Gateway. We can easily define an API Gateway trigger by clicking "Add trigger" in the Lambda function designer and selecting "API Gateway".
Choose "REST API" as API type and "Open" as Security and click "Create".
Back in the console of your Lambda function, click on the link of your newly created API in the "API Gateway" section. This will bring you directly to the configuration console for your API.
There are different ways to communicate with the REST API. POST and GET requests are two of the most common. To configure your API to accept POST requests click the "Actions" drop-down menu and choose "Create Method". In the created drop-down menu choose "POST".
In the following setup process choose your Lambda function in the field "Lambda Function". You can test the execution of the API and the Lambda function by clicking "Test" in the main menu of the POST method and providing the same "Request Body" that we already used earlier:
{
    "sl": 6.9,
    "sw": 3.1,
    "pl": 5.1,
    "pw": 2.3
}
Once you have deployed your API ("Actions" drop-down > Deploy API) it is able to receive input via POST requests, trigger the Lambda function and send the result back as a response message. You can validate the whole process by using e.g. Postman to interact with the API.
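Instead of Postman you can also call the API from Python. A minimal sketch using only the standard library — the endpoint URL below is a placeholder you need to replace with your own invoke URL, and the `urlopen` call is left commented out so nothing is actually sent here:

```python
import json
import urllib.request

# Placeholder: replace with the invoke URL of your deployed API stage.
url = "https://example.execute-api.eu-central-1.amazonaws.com/default/myLambdaFunction"
payload = {"sl": 6.9, "sw": 3.1, "pl": 5.1, "pw": 2.3}

# Build a POST request carrying the feature values as a JSON body.
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(req)   # uncomment once the URL is real
# print(json.loads(response.read()))
```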
In case you would rather input the data directly from your browser via the URL you can also configure your API to accept GET requests. Click the "Actions" drop-down menu and choose "Create Method". In the created drop-down menu choose "GET". Just like in the instructions for the POST method choose your Lambda function in the field "Lambda Function".
Open the "Method Request" field and provide the input variables in the "URL Query String Parameters" section. Tick the "Required" boxes.
Go back and open the "Integration Request" field. Under the "Mapping Templates" section add a new mapping template of type "application/json" and fill in the following template:
{
    "sl": "$input.params('sl')",
    "sw": "$input.params('sw')",
    "pl": "$input.params('pl')",
    "pw": "$input.params('pw')"
}
Click "Save". You can test the API by clicking "Test" in the "Method Execution" window and provide the following as "Query Strings":
sl=6.9&sw=3.1&pl=5.1&pw=2.3
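The effect of the mapping template can be mimicked locally: it turns the URL query parameters into the JSON event our handler expects. A rough stdlib sketch (the real evaluation happens in API Gateway's VTL engine; this helper is only for illustration):

```python
from urllib.parse import parse_qs

def map_query_to_event(query_string):
    """Mimic the mapping template: pick one string value per expected parameter."""
    params = parse_qs(query_string)
    return {k: params[k][0] for k in ('sl', 'sw', 'pl', 'pw')}

event = map_query_to_event("sl=6.9&sw=3.1&pl=5.1&pw=2.3")
print(event)  # {'sl': '6.9', 'sw': '3.1', 'pl': '5.1', 'pw': '2.3'}
```

Note that the values arrive as strings — matching the quotes around `$input.params(...)` in the template — which is why the handler converts them with `float()`.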
Deploy your API ("Actions" drop-down > Deploy API) and open a new tab in your browser. Enter the following URL, replacing `<YOUR_API_URL>`:
https://<YOUR_API_URL>/myLambdaFunction/?sl=6.9&sw=3.1&pl=5.1&pw=2.3
You should receive a response in the following form:
{"statusCode": 200, "body": "virginica"}
Good job! You did it! You deployed your own Machine Learning model in the cloud and managed to make it communicate with the world. The best part is: AWS is doing all the computing for you and will ensure that your model has a high availability and scales with your application!
We haven't touched upon this topic before, as AWS took care of it automatically and it wasn't a big issue. However, once you want to serve more complex, computationally heavy models, the memory and timeout settings could become a bottleneck or even lead to failures if not configured correctly. When we set up our Lambda function it was configured with the default parameters for Memory (128 MB) and Timeout (3 seconds). This is good enough for our simple model and still guarantees a response in a reasonable amount of time.
The Memory and Timeout can be adjusted in the "Basic settings" of your Lambda function. Keep in mind that more memory enables a faster response but also increases your costs.
We can check the response times of our Lambda function by enabling active tracing and viewing the trace logs in AWS X-Ray. In the "Monitoring tools" of your Lambda function click "Edit" and switch on "Active tracing" under "AWS X-Ray". In the "Monitoring" tab of your Lambda function you can now choose "View traces in X-Ray".
We can see that providing our Lambda function with four times the available memory leads to an over four times faster response. Note: Just ignore the traces with response times > 3 seconds, those are "cold starts".
In a real-world setup we do not want to configure all the steps mentioned above manually in the Management Console. Instead we would probably want to automate the build and deployment of the entire application. As all the services used (AWS Lambda and API Gateway) are serverless, we are extra lucky and can make use of the Serverless Application Model (SAM for short), which is built on top of CloudFormation. In a simple `template.yaml` file we can define our infrastructure/resource stack and afterwards deploy it. The following command builds our application, downloading all necessary dependencies (e.g. for our Lambda layer) and storing them in a separate folder.
sam build
To deploy the application in your AWS account run:
sam deploy --guided
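The `template.yaml` mentioned above could look roughly like the sketch below. The resource names, paths, memory/timeout values and the `/predict` route are illustrative assumptions, not the project's actual template:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  SklearnLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      ContentUri: layer/            # folder containing the python/ structure
      CompatibleRuntimes:
        - python3.8

  PredictFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: myLambdaFunction/    # contains lambda_function.py and model.joblib
      Handler: lambda_function.lambda_handler
      Runtime: python3.8
      MemorySize: 512
      Timeout: 10
      Layers:
        - !Ref SklearnLayer
      Events:
        PredictApi:
          Type: Api                 # creates the API Gateway trigger
          Properties:
            Path: /predict
            Method: get
```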
Short info: As the SAM framework uses the Lambda proxy integration in API Gateway, the function was slightly modified with respect to the one shown above. Things such as extracting the query strings from the input and formatting the response in the right fashion have to be handled directly in the Lambda function, as the API is "just" a proxy.