In [1]:
#NOTE TO Proofreaders, these are cleanup functions that won't be in the final product
!mkdir simple_lambda
!ls simple_lambda
!aws lambda delete-function --function-name simple_lambda_function
!aws lambda delete-function --function-name pil_lambda_function
!aws lambda delete-function --function-name sklearn_lambda_function
!rm -rf simple_package
!rm -rf sklearn_package/*

mkdir: simple_lambda: File exists
[31mcreate_function.bash[m[m      function.zip              simple_lambda_function.py


# Deploying the PyData stack onto AWS Lambda

AWS Lambda is a serverless stack from Amazon.  It allows you to have functions that run without needing to maintain a running server.  Lambda functions can be triggered from web requests, SQS, Kenisis, and a variety of other events.  Constructing apps with Lambda functions allows you to easily scale without worrying about spinning up servers.

AWS Lambda has a well known package size limit of 50MB, which can be expanded to 500MB through some hacks.  I hadn't thought it was possible to easily deploy functions that depended on the PyData stack (pandas, numpy, scikit-learn...) because of these size limitations.  In this notebook I wal through

* a simple lambda deployment with no dependencies
* a reqular packaged lambda deployment
* individual steps necessary to dpeloy the PyData stack
* a clean scripted PyData deploy

## Implementation notes about this notebook
I use the python magics of `%%writefile` and `%%bash` extensively.  `%%writefile` allows me to write the lambda functions and bash scripts inline.  `%%bash` allows multiline shell examples.

In a serious dpeloyment system these bash scripts would probably be integrated into Ansible, Chef, or Puppet.  The aws python api could also be used to accomplish the same tasks.  Using the AWS CLI tools through bash is the most straight forward way of experimenting with the lambda platform


## Running this notebook.

The code examples assume a properly configured AWS CLI environment.  The user for the AWS CLI environment must have access to create Lambda functions.  This tutorial also assumes an environment variable AWS_ID with your AWS user_id.  There are scripts integrated which use this variable, and replace the actual account number with "AWS_ID" from the output for privacy.

Some bash commands especially towards the end take a while to run, I have used `time` before these commands.

Finally, running these commands will generate AWS charges, but they should be minimal.

In [99]:
%%writefile aws_sanitize
#!/bin/bash
#this is used to prevent my aws_id leaking into public
#I'm not completely clear why protecting my account number is necessary for security
#but all tutorials do it, so I will too.
replace='$AWS_ID'
sed -e "s/$AWS_ID/$replace/"

Overwriting aws_sanitize


In [100]:
!chmod +x ./aws_sanitize

## Simple Lambda Function

In [3]:
%%writefile simple_lambda/simple_lambda_function.py
import json

def lambda_handler(event, context):
    return {
        'statusCode': 205,
        'body': json.dumps({'event':event, 'context': dir(context),
                            'func_name': 'simpole_lambda_function'})}

Overwriting simple_lambda/simple_lambda_function.py


In [5]:
%%bash
cd simple_lambda
zip function.zip simple_lambda_function.py
aws lambda create-function \
        --function-name simple_lambda_function \
        --handler simple_lambda_function.lambda_handler \
        --zip-file fileb://function.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize

updating: simple_lambda_function.py (deflated 36%)
{
    "FunctionName": "simple_lambda_function",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:simple_lambda_function",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "simple_lambda_function.lambda_handler",
    "CodeSize": 345,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-03-31T04:22:26.604+0000",
    "CodeSha256": "3WAwFFRpNc/hkpLnheTZdmeZ2e4N3t4/qap8RdE7HLA=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "9d56b3f7-bc0b-4fd2-9f9a-d3d0238205db"
}


### Let's test the function

In [6]:
%%bash
aws lambda invoke \
    --function-name "simple_lambda_function" \
    --log-type Tail  --invocation-type  RequestResponse slf.out > /dev/null
cat slf.out  | ./aws_sanitize

{"statusCode": 205, "body": "{\"event\": {}, \"context\": [\"__class__\", \"__delattr__\", \"__dict__\", \"__dir__\", \"__doc__\", \"__eq__\", \"__format__\", \"__ge__\", \"__getattribute__\", \"__gt__\", \"__hash__\", \"__init__\", \"__init_subclass__\", \"__le__\", \"__lt__\", \"__module__\", \"__ne__\", \"__new__\", \"__reduce__\", \"__reduce_ex__\", \"__repr__\", \"__setattr__\", \"__sizeof__\", \"__str__\", \"__subclasshook__\", \"__weakref__\", \"_epoch_deadline_time_in_ms\", \"aws_request_id\", \"client_context\", \"function_name\", \"function_version\", \"get_remaining_time_in_millis\", \"identity\", \"invoked_function_arn\", \"log\", \"log_group_name\", \"log_stream_name\", \"memory_limit_in_mb\"], \"func_name\": \"simpole_lambda_function\"}"}


## Simple Package

In [7]:
!mkdir simple_package

In [95]:
%%writefile simple_package/package_lambda_function.py
import json
import requests

def lambda_handler(event, context):
    resp = requests.get("https://www.google.com")
    resp_len = len(resp.content)
    return {'resp_len': resp_len}

Overwriting simple_package/package_lambda_function.py


In [9]:
%%bash
cd simple_package
pip install requests --target .  2>&1 | tail -n 5
zip -r9 ./package_function.zip ./* | tail -n 5
aws lambda create-function \
        --function-name pil_lambda_function \
        --handler package_lambda_function.lambda_handler \
        --zip-file fileb://package_function.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize

  Using cached https://files.pythonhosted.org/packages/60/75/f692a584e85b7eaba0e03827b3d51f45f571c2e793dd731e598828d380aa/certifi-2019.3.9-py2.py3-none-any.whl
Collecting urllib3<1.25,>=1.21.1 (from requests)
  Using cached https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl
Installing collected packages: idna, chardet, certifi, urllib3, requests
Successfully installed certifi-2019.3.9 chardet-3.0.4 idna-2.8 requests-2.21.0 urllib3-1.24.1
  adding: urllib3-1.24.1.dist-info/LICENSE.txt (deflated 41%)
  adding: urllib3-1.24.1.dist-info/METADATA (deflated 64%)
  adding: urllib3-1.24.1.dist-info/RECORD (deflated 62%)
  adding: urllib3-1.24.1.dist-info/top_level.txt (stored 0%)
  adding: urllib3-1.24.1.dist-info/WHEEL (deflated 14%)
  adding: package_lambda_function.py (deflated 36%)
{
    "FunctionName": "pil_lambda_function",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:pil_lambda_f

### Let's make a script to invoke the lambda function

In [67]:
%%writefile run_function
#!/bin/bash
aws lambda invoke \
    --function-name $1 \
    --log-type Tail  --invocation-type  RequestResponse slf.out > /dev/null
cat slf.out  | ./aws_sanitize

Overwriting run_function


In [11]:
!chmod +x run_function
!./run_function pil_lambda_function

{"statusCode": 205, "body": "{\"event\": {}, \"context\": [\"__class__\", \"__delattr__\", \"__dict__\", \"__dir__\", \"__doc__\", \"__eq__\", \"__format__\", \"__ge__\", \"__getattribute__\", \"__gt__\", \"__hash__\", \"__init__\", \"__init_subclass__\", \"__le__\", \"__lt__\", \"__module__\", \"__ne__\", \"__new__\", \"__reduce__\", \"__reduce_ex__\", \"__repr__\", \"__setattr__\", \"__sizeof__\", \"__str__\", \"__subclasshook__\", \"__weakref__\", \"_epoch_deadline_time_in_ms\", \"aws_request_id\", \"client_context\", \"function_name\", \"function_version\", \"get_remaining_time_in_millis\", \"identity\", \"invoked_function_arn\", \"log\", \"log_group_name\", \"log_stream_name\", \"memory_limit_in_mb\"], \"func_name\": \"package_lambda_function\"}"}


## PyData Package with Pandas and Scikit-Learn

In [44]:
!mkdir sklearn_package

In [34]:
%%writefile sklearn_package/package_lambda_function.py
import json
import pandas
import sklearn

def lambda_handler(event, context):
    return {
        'statusCode': 205,
        'body': json.dumps({'event':event, 'context': dir(context),
                            'func_name': 'package_lambda_function'})}

Writing sklearn_package/package_lambda_function.py


In [35]:
%%bash
#note the extra options to force linux packages even if you are on OS X
pip install pandas scikit-learn \
        --platform manylinux1_x86_64\
        --python-version 37 \
        --only-binary=:all:  --target sklearn_package 2>&1 | tail -n 5

  Using cached https://files.pythonhosted.org/packages/3e/7e/5cee36eee5b3194687232f6150a89a38f784883c612db7f4da2ab190980d/scipy-1.2.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil>=2.5.0->pandas)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Installing collected packages: pytz, numpy, six, python-dateutil, pandas, scipy, scikit-learn
Successfully installed numpy-1.16.2 pandas-0.24.2 python-dateutil-2.8.0 pytz-2018.9 scikit-learn-0.20.3 scipy-1.2.1 six-1.12.0


In [36]:
%%bash
cd sklearn_package
zip -r9 sklearn_package.zip ./ | tail -n 3

  adding: sklearn/utils/tests/test_sparsefuncs.py (deflated 84%)
  adding: sklearn/utils/tests/test_stats.py (deflated 58%)
  adding: sklearn/utils/tests/test_testing.py (deflated 80%)
  adding: sklearn/utils/tests/test_utils.py (deflated 73%)
  adding: sklearn/utils/tests/test_validation.py (deflated 77%)
  adding: sklearn/utils/validation.py (deflated 74%)
  adding: sklearn/utils/weight_vector.cpython-37m-x86_64-linux-gnu.so (deflated 58%)


In [37]:
%%bash
cd sklearn_package
time aws lambda create-function \
        --function-name sklearn_lambda_function \
        --handler package_lambda_function.lambda_handler \
        --zip-file fileb://sklearn_package.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize


Connection was closed before we received a valid response from endpoint URL: "https://lambda.us-east-2.amazonaws.com/2015-03-31/functions".

real	5m14.732s
user	0m4.530s
sys	0m3.997s


## the above failed because the pacakge was too big
Instead we have to upload the package to s3, then point he lambda function at the s3 package

In [41]:
%%bash
cd sklearn_package
aws s3 mb s3://pandas-sklearn-demo  2>&1 > /dev/null
time aws s3 cp sklearn_package/sklearn_package.zip s3://pandas-sklearn-demo/sklearn_package.zip
aws lambda update-function-code \
        --function-name pandas_sklearn_demo2 \
        --s3-bucket pandas-sklearn-demo \
        --s3-key sklearn_package.zip | ../aws_sanitize


{
    "FunctionName": "pandas_sklearn_demo2",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:pandas_sklearn_demo2",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "pandas_sklearn_demo.lambda_handler",
    "CodeSize": 70547257,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T00:45:21.232+0000",
    "CodeSha256": "QkXoj0DXvQJ3kk5RhPbUfUWuWvtE4m7xXlI8qAW9/7s=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "f2fd6408-96b0-46b7-91ab-67e0d2a920f9"
}



real	0m11.197s
user	0m0.490s
sys	0m0.094s


In [82]:
%%writefile create_lambda
#!/bin/bash

#note function name must be the same as the module name
function_name=$1
ver_number=$2
package_zip=$3

ver_name="${function_name}_${ver_number}"
mod_file="${function_name}.py"
mod_name=$function_name
handler_name="${mod_name}.lambda_handler"
empty_zip_file=/tmp/empty.zip
bucket_zip="${versioned_name}.zip"

zip_file=/tmp/function.zip
cp $package_zip $zip_file
zip $zip_file $mod_file
zip $empty_zip_file $mod_file

#we use the empty_zip file because this will fail with a larger zip and the s3 upload will have to be run again
#anyway,  this will save some time.
aws lambda create-function --function-name $ver_name \
           --zip-file fileb://$empty_zip_file  --handler $handler_name\
           --runtime python3.7 \
           --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" > /dev/null

#this bucket is hardcoded.  this can probably be improved upon
#aws s3 mb s3://pandas-sklearn-demo  2>&1 > /dev/null
aws s3 cp $zip_file  s3://pandas-sklearn-demo/$bucket_zip

aws lambda update-function-code \
        --s3-bucket pandas-sklearn-demo \
        --s3-key $bucket_zip \
        --function-name $ver_name | ./aws_sanitize

./run_function $ver_name

Overwriting create_lambda


In [52]:
%%bash
chmod +x create_lambda
cd sklearn_package

#zip up the sklearn package
zip -r9 ../sklearn_package.zip ./ | tail -n 3

  adding: sklearn/utils/tests/test_validation.py (deflated 77%)
  adding: sklearn/utils/validation.py (deflated 74%)
  adding: sklearn/utils/weight_vector.cpython-37m-x86_64-linux-gnu.so (deflated 58%)


In [85]:
%%writefile sklearn_example2.py
import json
import pandas as pd
import sklearn

def lambda_handler(event, context):
    df = pd.DataFrame({'a':[5,3,2,10], 'b':[20, 30, 40, 50]})
    return {
        'statusCode': 205,
        'body': json.dumps({'event':event, 'output': repr(df.mean())})}

Overwriting sklearn_example2.py


In [86]:
!time ./create_lambda sklearn_example2 6 sklearn_package.zip

function_name sklearn_example2
ver_name sklearn_example2_6
mod_name sklearn_example2
bucket_zip .zip
  adding: sklearn_example2.py (deflated 26%)
updating: sklearn_example2.py (deflated 26%)
{
    "FunctionName": "sklearn_example2_6",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:sklearn_example2_6",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "sklearn_example2.lambda_handler",
    "CodeSize": 378,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T03:12:24.053+0000",
    "CodeSha256": "B2egszZdhKTDmdDLliRvM2ftaZ5g49I9Cpjt/bY4PQc=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "3b7a70a8-9784-47c5-a18b-71a92f863866"
}
upload: ../../../../tmp/function.zip to s3://pandas-sklearn-demo/.zip
{
    "FunctionName": "sklearn_example2_6",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:sklearn_ex

## Matplotlib example

In [110]:
%%bash
#note the extra options to force linux packages even if you are on OS X
mkdir pydata_full
rm -rf pydata_full/*
pip install pandas matplotlib \
        --platform manylinux1_x86_64\
        --python-version 37 \
        --only-binary=:all:  --target pydata_full 2>&1 | tail -n 5

pip install papermill jupyter \
        --target pydata_full 2>&1 | tail -n 5
rm pydata_full.zip
cd pydata_full
zip -r9 ../pydata_full.zip ./ | tail -n 3

  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Collecting setuptools (from kiwisolver>=1.0.1->matplotlib)
  Using cached https://files.pythonhosted.org/packages/d1/6a/4b2fcefd2ea0868810e92d519dacac1ddc64a2e53ba9e3422c3b62b378a6/setuptools-40.8.0-py2.py3-none-any.whl
Installing collected packages: pytz, six, python-dateutil, numpy, pandas, pyparsing, cycler, setuptools, kiwisolver, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.0.1 matplotlib-3.0.3 numpy-1.16.2 pandas-0.24.2 pyparsing-2.3.1 python-dateutil-2.8.0 pytz-2018.9 setuptools-40.8.0 six-1.12.0
Target directory /Users/paddy/code/lambda_py_notebooks/pydata_full/setuptools already exists. Specify --upgrade to force replacement.
Target directory /Users/paddy/code/lambda_py_notebooks/pydata_full/setuptools-40.8.0.dist-info already exists. Specify --upgrade to force replacement.
Target directory /Users/paddy/code/l

mkdir: pydata_full: File exists


In [114]:
%%writefile mpl_debug.py
import json
import matplotlib as mpl
import matplotlib.pyplot as plt
from io import StringIO, BytesIO

def lambda_handler(event, context):
    mpl.use('agg')

    fig, ax = plt.subplots(figsize=(10,7))
    ax.plot(range(20), range(20))
    buffer_ = BytesIO()
    fig.savefig(buffer_)
    
    return {'img_len': len(buffer_.getvalue())}



Overwriting mpl_debug.py


In [115]:
!time ./create_lambda mpl_debug 6 pydata_full.zip

function_name mpl_debug
ver_name mpl_debug_6
mod_name mpl_debug
bucket_zip .zip
  adding: mpl_debug.py (deflated 35%)
updating: mpl_debug.py (deflated 35%)
{
    "FunctionName": "mpl_debug_6",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:mpl_debug_6",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "mpl_debug.lambda_handler",
    "CodeSize": 750,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T05:45:06.883+0000",
    "CodeSha256": "6GQ9mrsuprqUpNswfDht6ipAxiU/+xrScFi15IltbWA=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "f985f083-d9cf-4ccd-85df-8d2e8d336c74"
}
upload: ../../../../tmp/function.zip to s3://pandas-sklearn-demo/.zip
{
    "FunctionName": "mpl_debug_6",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:mpl_debug_6",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_

In [117]:
!time ./run_function mpl_debug_6

{"img_len": 26585}

real	0m6.067s
user	0m0.878s
sys	0m0.242s


In [118]:
!pwd

/Users/paddy/code/lambda_py_notebooks
