In [None]:
%%bash
#NOTE TO Proofreaders, these are cleanup functions that won't be in the final product
rm -rf simple_lambda
mkdir simple_lambda

#delete all lambda functions with nb1_ in the name,
#all lambda functions in this document are created with the nb1_ prefix
aws lambda list-functions | grep FunctionName | cut -d ":" -f 2| cut -d "," -f 1 | grep "nb1_" | xargs -L 1 aws lambda delete-function --function-name
rm -rf simple_package
rm -rf sklearn_package/*

# Deploying the PyData stack onto AWS Lambda

AWS Lambda is a serverless compute platform from Amazon.  It lets you run functions without maintaining a running server.  Lambda functions can be triggered by web requests, SQS, Kinesis, and a variety of other events.  Building apps from Lambda functions lets you scale easily without worrying about spinning up servers.

AWS Lambda has a well-known package size limit of 50MB, which can be expanded to 500MB through some hacks.  I hadn't thought it was possible to easily deploy functions that depend on the PyData stack (pandas, numpy, scikit-learn...) because of these size limitations.  In this notebook I walk through:

* a simple lambda deployment with no dependencies
* a regular packaged lambda deployment
* individual steps necessary to deploy the PyData stack
* a clean scripted PyData deploy

## Implementation notes about this notebook
I use the IPython magics `%%writefile` and `%%bash` extensively.  `%%writefile` allows me to write the lambda functions and bash scripts inline.  `%%bash` allows multiline shell examples.

In a serious deployment system these bash scripts would probably be integrated into Ansible, Chef, or Puppet.  The AWS Python API could also be used to accomplish the same tasks.  Using the AWS CLI tools through bash is the most straightforward way of experimenting with the Lambda platform.
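
As an illustration of the Python API route mentioned above, here is a minimal sketch of creating a function from Python.  It is not what the rest of this notebook uses.  The helper names `zip_bytes_for` and `create_simple_function` are hypothetical, and `lambda_client` is assumed to be something like `boto3.client("lambda")` with working credentials.

```python
import io
import zipfile


def zip_bytes_for(files):
    """Build an in-memory zip from a {archive_name: source_text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, source in files.items():
            zf.writestr(name, source)
    return buf.getvalue()


def create_simple_function(lambda_client, function_name, role_arn, source):
    """Create a Lambda function whose module name matches the function name.

    lambda_client is assumed to be boto3.client("lambda"); it is passed in
    so that nothing here touches AWS until this function is called.
    """
    return lambda_client.create_function(
        FunctionName=function_name,
        Runtime="python3.7",  # same runtime as the CLI examples
        Role=role_arn,
        Handler=f"{function_name}.lambda_handler",
        Code={"ZipFile": zip_bytes_for({f"{function_name}.py": source})},
    )
```

Passing the client in keeps the zip-building part easy to try locally without any AWS access.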


## Running this notebook

The code examples assume a properly configured AWS CLI environment.  The user for the AWS CLI environment must have permission to create Lambda functions.  This tutorial also assumes an environment variable `AWS_ID` set to your AWS account ID.  Several scripts use this variable, and the output is filtered to replace the actual account number with `$AWS_ID` for privacy.

Some bash commands, especially towards the end, take a while to run; I have put `time` in front of these commands.

Finally, running these commands will generate AWS charges, but they should be minimal.

In [1]:
%%writefile aws_sanitize
#!/bin/bash
#this is used to prevent my aws_id leaking into public
#I'm not completely clear why protecting my account number is necessary for security
#but all tutorials do it, so I will too.
replace='$AWS_ID'
sed -e "s/$AWS_ID/$replace/g"

Overwriting aws_sanitize


In [2]:
!chmod +x ./aws_sanitize
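
For output handled in Python rather than piped through the shell, the same masking could look roughly like this.  It is a sketch under the assumption that the account ID lives in the `AWS_ID` environment variable as described above; the function name `sanitize` is hypothetical.

```python
import os


def sanitize(text, account_id=None, placeholder="$AWS_ID"):
    """Replace every occurrence of the account id with a literal placeholder."""
    if account_id is None:
        account_id = os.environ.get("AWS_ID", "")
    if not account_id:
        return text  # nothing to hide
    return text.replace(account_id, placeholder)


# 123456789012 is a made-up account number used only for this demonstration.
print(sanitize("arn:aws:iam::123456789012:role/service-role/aws_lambda_role",
               account_id="123456789012"))
```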

## Simple Lambda Function

In [3]:
%%writefile simple_lambda/nb1_simple_lambda_function.py

def lambda_handler(event, context):
    return {'body': "hello world"}

Writing simple_lambda/nb1_simple_lambda_function.py


In [4]:
%%bash
cd simple_lambda
zip function.zip nb1_simple_lambda_function.py
aws lambda create-function \
        --function-name nb1_simple_lambda_function \
        --handler nb1_simple_lambda_function.lambda_handler \
        --zip-file fileb://function.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize

  adding: nb1_simple_lambda_function.py (deflated 1%)
{
    "FunctionName": "nb1_simple_lambda_function",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_simple_lambda_function",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_simple_lambda_function.lambda_handler",
    "CodeSize": 279,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T19:07:56.464+0000",
    "CodeSha256": "N4fD00XJuFZK5ROY+UzN4W4epqIbME1qNHPb76s1mVI=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "299f853a-7a51-4c82-8adf-312c3a42381f"
}


### Let's test the function

In [5]:
%%bash
aws lambda invoke \
    --function-name "nb1_simple_lambda_function" \
    --log-type Tail  --invocation-type  RequestResponse slf.out > /dev/null
cat slf.out  | ./aws_sanitize

{"body": "hello world"}
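
Because a handler is just a Python function, it can also be checked locally before deploying.  The sketch below repeats the handler from above and calls it with an empty event and no context, which is an assumption that only holds for handlers that ignore both arguments.

```python
import json


def lambda_handler(event, context):
    return {'body': "hello world"}


# Lambda passes the event as a dict and a context object; for a handler that
# ignores both, an empty dict and None are enough for a local check.
result = lambda_handler({}, None)

# The return value has to be JSON-serializable to come back through invoke.
print(json.dumps(result))
```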


## Simple Package
This example shows how to package simple Python libraries with a Lambda function.

In [6]:
!mkdir simple_package

In [7]:
%%writefile simple_package/nb1_requests_function.py
import requests

def lambda_handler(event, context):
    resp = requests.get("https://www.google.com")
    resp_len = len(resp.content)
    return {'resp_len': resp_len}

Writing simple_package/nb1_requests_function.py


In [8]:
%%bash
cd simple_package
pip install requests --target .  2>&1 | tail -n 5
zip -r9 ./package_function.zip ./* | tail -n 5
aws lambda create-function \
        --function-name nb1_requests_function \
        --handler nb1_requests_function.lambda_handler \
        --zip-file fileb://package_function.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize

Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
awscli 1.16.121 has requirement botocore==1.12.111, but you'll have botocore 1.12.112 which is incompatible.
Installing collected packages: idna, certifi, urllib3, chardet, requests
Successfully installed certifi-2019.3.9 chardet-3.0.4 idna-2.8 requests-2.21.0 urllib3-1.24.1
  adding: urllib3-1.24.1.dist-info/LICENSE.txt (deflated 41%)
  adding: urllib3-1.24.1.dist-info/METADATA (deflated 64%)
  adding: urllib3-1.24.1.dist-info/RECORD (deflated 62%)
  adding: urllib3-1.24.1.dist-info/top_level.txt (stored 0%)
  adding: urllib3-1.24.1.dist-info/WHEEL (deflated 14%)
{
    "FunctionName": "nb1_requests_function",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_requests_function",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda

### Let's make a script to invoke the lambda function

In [9]:
%%writefile run_function
#!/bin/bash
aws lambda invoke \
    --function-name $1 \
    --log-type Tail  --invocation-type  RequestResponse slf.out > /dev/null
cat slf.out  | ./aws_sanitize

Overwriting run_function


In [11]:
!chmod +x run_function
!./run_function nb1_requests_function

{"resp_len": 11222}
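
The same invocation can be sketched with the Python API.  This is an illustration only: `lambda_client` is assumed to be `boto3.client("lambda")`, and the Python function name `run_function` is borrowed from the script above for readability.

```python
import base64
import json


def run_function(lambda_client, function_name):
    """Invoke a function synchronously and return (payload, log_tail)."""
    resp = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        LogType="Tail",
    )
    # Payload is a file-like object holding the JSON the handler returned.
    payload = json.loads(resp["Payload"].read())
    # With LogType="Tail" the end of the execution log comes back base64-encoded.
    log_tail = base64.b64decode(resp.get("LogResult", "")).decode("utf-8", "replace")
    return payload, log_tail
```

Unlike the shell version, this keeps the response in memory instead of writing `slf.out`, and it also returns the log tail that the CLI call requests but discards.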


## PyData Package with Pandas and Scikit-Learn

In [12]:
!mkdir sklearn_package

mkdir: sklearn_package: File exists


In [13]:
%%writefile sklearn_package/nb1_pandas_sum.py
import numpy as np
import pandas as pd

def lambda_handler(event, context):
    df = pd.DataFrame({'a':np.arange(40, 50, step=.5), 'b':np.arange(40,60)})
    return df.sum().to_dict()


Writing sklearn_package/nb1_pandas_sum.py


In [14]:
%%bash
#note the extra options to force linux packages even if you are on OS X
pip install pandas scikit-learn \
        --platform manylinux1_x86_64\
        --python-version 37 \
        --only-binary=:all:  --target sklearn_package 2>&1 | tail -n 5

Collecting six>=1.5 (from python-dateutil>=2.5.0->pandas)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
awscli 1.16.121 has requirement botocore==1.12.111, but you'll have botocore 1.12.112 which is incompatible.
Installing collected packages: six, python-dateutil, numpy, pytz, pandas, scipy, scikit-learn
Successfully installed numpy-1.16.2 pandas-0.24.2 python-dateutil-2.8.0 pytz-2018.9 scikit-learn-0.20.3 scipy-1.2.1 six-1.12.0
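
The cross-platform install above can also be driven from Python.  The sketch below only assembles the argument list, using the same flags as the cell above; the helper name `linux_pip_command` is hypothetical.  As far as I know, pip insists on `--only-binary=:all:` (or `--no-deps`) when `--platform` is given, since it cannot build source packages for a different platform.

```python
import sys


def linux_pip_command(packages, target, python_version="37",
                      platform="manylinux1_x86_64"):
    """Return the argv for a pip install that fetches Linux wheels only."""
    return [
        sys.executable, "-m", "pip", "install", *packages,
        "--platform", platform,
        "--python-version", python_version,
        "--only-binary=:all:",
        "--target", target,
    ]


cmd = linux_pip_command(["pandas", "scikit-learn"], "sklearn_package")
print(" ".join(cmd[1:]))
# To actually run it: subprocess.run(cmd, check=True)
```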


In [15]:
%%bash
cd sklearn_package
zip -r9 sklearn_package.zip ./ | tail -n 3

  adding: sklearn/utils/tests/test_validation.py (deflated 77%)
  adding: sklearn/utils/validation.py (deflated 74%)
  adding: sklearn/utils/weight_vector.cpython-37m-x86_64-linux-gnu.so (deflated 58%)


In [16]:
%%bash
cd sklearn_package
time aws lambda create-function \
        --function-name nb1_pandas_sum \
        --handler nb1_pandas_sum.lambda_handler \
        --zip-file fileb://sklearn_package.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ../aws_sanitize


Connection was closed before we received a valid response from endpoint URL: "https://lambda.us-east-2.amazonaws.com/2015-03-31/functions".

real	5m18.419s
user	0m4.662s
sys	0m4.048s


## The above failed because the package was too big
Instead we have to upload the package to S3, then point the Lambda function at the S3 package.

In [20]:
%%bash

#we first have to successfully create the function with a properly sized zip file
time aws lambda create-function \
        --function-name nb1_pandas_sum \
        --handler nb1_pandas_sum.lambda_handler \
        --zip-file fileb://simple_package/package_function.zip \
        --runtime python3.7 \
        --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" | ./aws_sanitize
        

aws s3 mb s3://pandas-sklearn-demo  2>&1 > /dev/null
time aws s3 cp sklearn_package/sklearn_package.zip s3://pandas-sklearn-demo/sklearn_package.zip
aws lambda update-function-code \
        --function-name nb1_pandas_sum  \
        --s3-bucket pandas-sklearn-demo \
        --s3-key sklearn_package.zip | ./aws_sanitize


{
    "FunctionName": "nb1_pandas_sum",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_pandas_sum",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_pandas_sum.lambda_handler",
    "CodeSize": 901141,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T19:55:17.116+0000",
    "CodeSha256": "NCOl8+MxjuaVt7NEfeGid/VW6ROznhQd7m2ZBIt8RqE=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "2f199dc5-d5b0-4a12-ab32-c7840fffa162"
}
make_bucket failed: s3://pandas-sklearn-demo An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.
Completed 256.0 KiB/67.3 MiB (89.5 KiB/s) with 1 file(s) remainingCompleted 512.0 KiB/67.3 MiB (153.7 KiB/s) with 1 file(s) remainingCompleted 768.0 KiB/67.3 MiB (230.2 


real	0m6.678s
user	0m0.986s
sys	0m0.374s

real	1m41.895s
user	0m2.467s
sys	0m1.624s
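
The choice between a direct upload and the S3 route comes down to the size of the zip file.  Here is a small sketch of that decision, assuming the 50MB figure from the introduction is the limit for `--zip-file` uploads; `needs_s3_upload` is a hypothetical helper, and the sizes are the ones reported in the outputs of this notebook.

```python
DIRECT_UPLOAD_LIMIT = 50 * 1024 * 1024  # assumed: the 50MB figure from the introduction


def needs_s3_upload(zip_size_bytes, limit=DIRECT_UPLOAD_LIMIT):
    """True when a zip is too large to send inline with --zip-file."""
    return zip_size_bytes > limit


# Sizes taken from the outputs in this notebook.
requests_zip = 901141                  # CodeSize of the requests package zip
sklearn_zip = int(67.3 * 1024 * 1024)  # 67.3 MiB reported by `aws s3 cp`

for name, size in [("requests", requests_zip), ("sklearn", sklearn_zip)]:
    route = "S3" if needs_s3_upload(size) else "direct upload"
    print(f"{name}: {size / 1024 / 1024:.1f} MiB -> {route}")
```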


In [22]:
!./run_function nb1_pandas_sum

{"a": 895.0, "b": 990.0}
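
As a sanity check on that result, the two column sums can be worked out without pandas.  This is plain arithmetic on the same ranges the handler builds.  Both values show up as floats in the Lambda output because summing a frame with a float column and an integer column yields a float result.

```python
# Column 'a' is np.arange(40, 50, step=.5): 40.0, 40.5, ..., 49.5 (20 values).
a = [40 + 0.5 * i for i in range(20)]
# Column 'b' is np.arange(40, 60): 40, 41, ..., 59 (20 values).
b = list(range(40, 60))

expected = {"a": float(sum(a)), "b": float(sum(b))}
print(expected)
```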


## create_lambda script
Packaging up these Lambda functions is getting complex.  Let's put all of this into a script.

In [23]:
%%writefile create_lambda
#!/bin/bash

#note function name must be the same as the module name
function_name=$1
ver_number=$2
package_zip=$3

ver_name="${function_name}_${ver_number}"
mod_file="${function_name}.py"
mod_name=$function_name
handler_name="${mod_name}.lambda_handler"
empty_zip_file=/tmp/empty.zip
bucket_zip="${ver_name}.zip"

zip_file=/tmp/function.zip
cp $package_zip $zip_file
zip $zip_file $mod_file
zip $empty_zip_file $mod_file

#we create the function with a tiny zip first, because create-function fails with the larger zip
#and the s3 upload has to be run anyway.  this saves some time.
aws lambda create-function --function-name $ver_name \
           --zip-file fileb://$empty_zip_file  --handler $handler_name\
           --runtime python3.7 \
           --role "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role" > /dev/null

#this bucket is hardcoded.  this can probably be improved upon
#aws s3 mb s3://pandas-sklearn-demo  2>&1 > /dev/null
aws s3 cp $zip_file  s3://pandas-sklearn-demo/$bucket_zip > /dev/null

aws lambda update-function-code \
        --s3-bucket pandas-sklearn-demo \
        --s3-key $bucket_zip \
        --function-name $ver_name | ./aws_sanitize
aws lambda update-function-configuration \
        --function-name $ver_name \
        --timeout 10 | ./aws_sanitize



./run_function $ver_name

Overwriting create_lambda
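
To make the naming convention and the order of operations in `create_lambda` easier to follow, here is a rough Python sketch of the same flow.  It assumes the S3 key is meant to be the versioned function name plus `.zip`.  `lambda_names` and `deploy` are hypothetical helpers, and both clients are assumed to be boto3 clients (`boto3.client("lambda")` and `boto3.client("s3")`) passed in by the caller.

```python
def lambda_names(function_name, ver_number):
    """Derive the names create_lambda uses from a module name and a version."""
    ver_name = f"{function_name}_{ver_number}"
    return {
        "ver_name": ver_name,
        "mod_file": f"{function_name}.py",
        "handler_name": f"{function_name}.lambda_handler",
        "bucket_zip": f"{ver_name}.zip",
    }


def deploy(lambda_client, s3_client, names, role_arn, small_zip_bytes,
           full_zip_path, bucket="pandas-sklearn-demo"):
    """Create with a small zip, then swap in the full package from S3."""
    lambda_client.create_function(
        FunctionName=names["ver_name"], Runtime="python3.7", Role=role_arn,
        Handler=names["handler_name"], Code={"ZipFile": small_zip_bytes})
    s3_client.upload_file(full_zip_path, bucket, names["bucket_zip"])
    lambda_client.update_function_code(
        FunctionName=names["ver_name"], S3Bucket=bucket,
        S3Key=names["bucket_zip"])
    # Not handled here: Lambda may still be applying the code update when
    # the configuration change is requested, which can make this call fail.
    return lambda_client.update_function_configuration(
        FunctionName=names["ver_name"], Timeout=10)


print(lambda_names("nb1_pandas_example2", 1))
```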


## Now we have to make bare package zip directories

In [26]:
%%bash
chmod +x create_lambda
rm sklearn_package.zip
cd sklearn_package
rm *.zip

#zip up the sklearn package
zip -r9 ../sklearn_package.zip ./ | tail -n 3

  adding: sklearn/utils/tests/test_validation.py (deflated 77%)
  adding: sklearn/utils/validation.py (deflated 74%)
  adding: sklearn/utils/weight_vector.cpython-37m-x86_64-linux-gnu.so (deflated 58%)


rm: *.zip: No such file or directory


## pandas_example2

In [27]:
%%writefile nb1_pandas_example2.py
import pandas as pd

def lambda_handler(event, context):
    df = pd.DataFrame({'a':[5,3,2,10], 'b':[20, 30, 40, 50]})
    return {'mean':  repr(df.mean())}


Overwriting nb1_pandas_example2.py


In [29]:
!time ./create_lambda nb1_pandas_example2 1 sklearn_package.zip

  adding: nb1_pandas_example2.py (deflated 15%)
  adding: nb1_pandas_example2.py (deflated 15%)
{
    "FunctionName": "nb1_pandas_example2_1",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_pandas_example2_1",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_pandas_example2.lambda_handler",
    "CodeSize": 70550821,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T20:06:58.238+0000",
    "CodeSha256": "NAEvkGeF9XgXr5kidPJxBUSE0HphJRjy2vXc1uOm+FE=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "c48c67bb-0606-456f-a7cb-575b01992cc3"
}
{
    "FunctionName": "nb1_pandas_example2_1",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_pandas_example2_1",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_pandas_example2.

## TODO lambda_create_lambda_function
Creating these large packaged lambda functions is SLOW, primarily because it takes a while to upload to S3.  I want to write a lambda function that grabs an existing zipped package directory from S3, receives a Python file as an argument or is pointed at another zip file containing a package tree, then zips everything together and puts the resulting large zip file in another S3 bucket.  After that the regular create/update lambda function commands can be run.  I expect this to be significantly faster.
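
The core of that idea, merging a module into an existing package zip, can be sketched with the standard library alone.  This is only the in-memory merge step, not the planned Lambda function.  `merge_package` is a hypothetical name, and the S3 download and upload around it are left out.

```python
import io
import zipfile


def merge_package(base_zip_bytes, module_name, module_source):
    """Return new zip bytes: every entry of the base package plus one module."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(base_zip_bytes)) as base, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as merged:
        for info in base.infolist():
            # Copy each existing entry, keeping its name and metadata.
            merged.writestr(info, base.read(info.filename))
        merged.writestr(f"{module_name}.py", module_source)
    return out.getvalue()


# A tiny stand-in for the real package zip, built in memory for illustration.
base_buf = io.BytesIO()
with zipfile.ZipFile(base_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("pandas/__init__.py", "# stand-in package\n")

merged_bytes = merge_package(
    base_buf.getvalue(), "nb1_example",
    "def lambda_handler(event, context):\n    return {}\n")
print(zipfile.ZipFile(io.BytesIO(merged_bytes)).namelist())
```

Holding a package of this size in memory would likely need more than the default 128MB `MemorySize`, so that setting would have to be raised for such a function.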

## Matplotlib example


In [30]:
%%bash
#note the extra options to force linux packages even if you are on OS X
mkdir pydata_full
rm -rf pydata_full/*
#note I would like to include sklearn here, but that blows the expanded 
#package size up too much for lambda.  I'm pretty sure I can get around that by
#forcing python to use zipimport so that there are zip files in the zip package
#the inner zip files will not get expanded by AWS, and I think I can sneak by this way

pip install pandas matplotlib \
        --platform manylinux1_x86_64\
        --python-version 37 \
        --only-binary=:all:  --target pydata_full 2>&1 | tail -n 5

pip install papermill jupyter \
        --target pydata_full 2>&1 | tail -n 5
rm pydata_full.zip
cd pydata_full
zip -r9 ../pydata_full.zip ./ | tail -n 3

Collecting setuptools (from kiwisolver>=1.0.1->matplotlib)
  Downloading https://files.pythonhosted.org/packages/44/56/75e64a8fbbe9e0bd30cfdd58ca1856bc0dc15a43e41504a58d8373f34213/setuptools-40.9.0-py2.py3-none-any.whl (575kB)
awscli 1.16.121 has requirement botocore==1.12.111, but you'll have botocore 1.12.112 which is incompatible.
Installing collected packages: numpy, six, python-dateutil, pytz, pandas, setuptools, kiwisolver, pyparsing, cycler, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.0.1 matplotlib-3.0.3 numpy-1.16.2 pandas-0.24.2 pyparsing-2.3.1 python-dateutil-2.8.0 pytz-2018.9 setuptools-40.9.0 six-1.12.0
Target directory /Users/paddy/code/lambda_py_notebooks/pydata_full/setuptools already exists. Specify --upgrade to force replacement.
Target directory /Users/paddy/code/lambda_py_notebooks/pydata_full/setuptools-40.9.0.dist-info already exists. Specify --upgrade to force replacement.
Target directory /Users/paddy/code/lambda_py_notebooks/pydata_full/six-1.1

mkdir: pydata_full: File exists
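
Since the concern in the comments above is the expanded size rather than the zipped size, it helps to measure it before uploading.  Here is a minimal sketch that reads the uncompressed sizes from the zip directory.  `unzipped_size` is a hypothetical helper, and the 250MB figure is my assumption about Lambda's limit on the expanded package.

```python
import io
import zipfile

UNZIPPED_LIMIT = 250 * 1024 * 1024  # assumed limit on the expanded package


def unzipped_size(zip_file):
    """Sum the uncompressed sizes recorded in a zip's directory."""
    with zipfile.ZipFile(zip_file) as zf:
        return sum(info.file_size for info in zf.infolist())


# Demonstration on a small in-memory zip; for the real package this would be
# unzipped_size("pydata_full.zip").
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("pkg/data.txt", "x" * 1000)
    zf.writestr("pkg/__init__.py", "")
buf.seek(0)

size = unzipped_size(buf)
print(size, size <= UNZIPPED_LIMIT)
```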


In [35]:
%%writefile nb1_matplotlib_s3.py
from io import BytesIO

import matplotlib as mpl
import matplotlib.pyplot as plt

import boto3
import botocore

def save_plot(fig, bucket='pandas-sklearn-demo', key='plot.png'):
    buffer_ = BytesIO()
    fig.savefig(buffer_)
    buffer_.seek(0)

    
    s3 = boto3.resource('s3')
    bucket_obj = s3.Bucket(bucket)
    
    bucket_obj.put_object(
        Key=key, Body=buffer_,
        StorageClass='REDUCED_REDUNDANCY',
        #ACL='public-read',
        ContentType='image/png')
    s3Client = boto3.client('s3')
    temp_url = s3Client.generate_presigned_url(
        'get_object', Params = {'Bucket': bucket, 'Key': key}, ExpiresIn = 100)
    return temp_url

    
def lambda_handler(event, context):
    mpl.use('agg')

    fig, ax = plt.subplots(figsize=(10,7))
    ax.plot(range(20), range(20))
    image_url = save_plot(fig, key='plot7.png')
    return {'image_url': image_url}

Writing nb1_matplotlib_s3.py


In [36]:
!time ./create_lambda nb1_matplotlib_s3 1 pydata_full.zip
!say "beep"

  adding: nb1_matplotlib_s3.py (deflated 46%)
  adding: nb1_matplotlib_s3.py (deflated 46%)
{
    "FunctionName": "nb1_matplotlib_s3_1",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_matplotlib_s3_1",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_matplotlib_s3.lambda_handler",
    "CodeSize": 76543251,
    "Description": "",
    "Timeout": 3,
    "MemorySize": 128,
    "LastModified": "2019-04-03T20:18:38.073+0000",
    "CodeSha256": "/PCpV4yFXO7yS5ncwDG8peyyTRLVaKz0Ruc4sGmTTcs=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "c064d519-11df-4e48-8472-9a3ebc2ce82c"
}
{
    "FunctionName": "nb1_matplotlib_s3_1",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_matplotlib_s3_1",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_matplotlib_s3.lambda_handler",

### Figure out how to embed a full copy of the generated plot
Linking to it would be the obvious approach, though the bandwidth costs are uncertain.
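
One option I have not tried yet is to return the image itself in the response as a base64 data URI instead of a link.  The sketch below shows only the encoding step, and `png_to_data_uri` is a hypothetical helper.  Lambda caps the size of a synchronous response, so this would probably only suit small images.

```python
import base64


def png_to_data_uri(png_bytes):
    """Encode raw PNG bytes as a data: URI that an <img> tag can display."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


# In the handler this would be the contents of the BytesIO buffer filled by
# fig.savefig; here the 8-byte PNG signature stands in for a real image.
fake_png = b"\x89PNG\r\n\x1a\n"
uri = png_to_data_uri(fake_png)
print(uri)
```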