# Deploying the PyData stack onto AWS Lambda

AWS Lambda is a serverless compute platform from Amazon.  It lets you run functions without maintaining a running server.  Lambda functions can be triggered by web requests, SQS, Kinesis, and a variety of other events.  Building apps from Lambda functions lets you scale without worrying about spinning up servers.

AWS Lambda has a well-known package size limit of 50MB, which can be expanded to 500MB through some hacks.  I hadn't thought it was possible to easily deploy functions that depended on the PyData stack (pandas, numpy, scikit-learn...) because of these size limitations.  In this notebook I walk through:

* a simple lambda deployment with no dependencies
* a regular packaged lambda deployment
* individual steps necessary to deploy the PyData stack
* a clean scripted PyData deploy

## Implementation notes about this notebook
I use the IPython cell magics `%%writefile` and `%%bash` extensively.  `%%writefile` lets me write the lambda functions and bash scripts inline.  `%%bash` allows multiline shell examples.

In a serious deployment system these bash scripts would probably be integrated into Ansible, Chef, or Puppet.  The AWS Python API could also be used to accomplish the same tasks.  Using the AWS CLI tools through bash is the most straightforward way of experimenting with the Lambda platform.
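
As a rough illustration of the Python route, the sketch below invokes an already-deployed function through a boto3 Lambda client.  The helper name `invoke_lambda` is hypothetical and not part of this notebook's scripts; the client is passed in as an argument, so nothing talks to AWS until the helper is actually called.

```python
import json


def invoke_lambda(client, function_name, payload):
    """Synchronously invoke a Lambda function and decode its JSON reply.

    `client` is expected to be a boto3 Lambda client, e.g.
    boto3.client('lambda'); the helper name is illustrative only.
    """
    resp = client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(payload).encode("utf-8"))
    # The response body is a stream; read it fully before decoding.
    return json.loads(resp["Payload"].read())


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# print(invoke_lambda(boto3.client("lambda"), "nb1_helloworld_1", {}))
```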


## Running this notebook

The code examples assume a properly configured AWS CLI environment.  The user for the AWS CLI environment must have permission to create Lambda functions.  This tutorial also assumes an environment variable `AWS_ID` containing your AWS account number.  Some of the scripts use this variable to replace the actual account number in command output with `$AWS_ID` for privacy.

This example notebook assumes some commands like `aws_sanitize` and `create_lambda_function` that are detailed in the deployment notebook.  `create_lambda_function` is a shell script that is used to simplify lambda function deployment.
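
The real `aws_sanitize` script lives in the deployment notebook and is not shown here.  As an assumption about its behaviour, a filter of this kind could be as small as the following sketch, which replaces every occurrence of the account number with the literal text `$AWS_ID`.  The function name `sanitize` and the sample account number are made up for illustration.

```python
def sanitize(text, account_id, placeholder="$AWS_ID"):
    """Replace every occurrence of the account number with a placeholder."""
    if not account_id:
        # Nothing to hide if the account number is unknown; pass text through.
        return text
    return text.replace(account_id, placeholder)


# Hypothetical account number, used only for this demonstration.
sample = '{"Role": "arn:aws:iam::123456789012:role/service-role/aws_lambda_role"}'
print(sanitize(sample, "123456789012"))
```

In a notebook the account number would come from `os.getenv("AWS_ID")` rather than a literal.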

Some bash commands, especially towards the end, take a while to run; I have prefixed these commands with `time`.

Finally, running these commands will generate AWS charges, but they should be minimal.

## Simple Lambda Function

In [1]:
import json
import os
import re

import boto3
import requests

In [2]:
%%writefile nb1_helloworld.py
def lambda_handler(event, context):
    return {'body': "hello world"}

Overwriting nb1_helloworld.py


In [3]:
!time ./create_lambda nb1_helloworld 1 min_pack.zip | tail -n 19

{
    "FunctionName": "nb1_helloworld_1",
    "FunctionArn": "arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_helloworld_1",
    "Runtime": "python3.7",
    "Role": "arn:aws:iam::$AWS_ID:role/service-role/aws_lambda_role",
    "Handler": "nb1_helloworld.lambda_handler",
    "CodeSize": 397,
    "Description": "",
    "Timeout": 10,
    "MemorySize": 128,
    "LastModified": "2019-04-10T23:51:46.843+0000",
    "CodeSha256": "LTo8yGqbrAoxuMjPT4HPUjQQN9bWc3uK5L9ejLMOdhA=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "7f38858f-f6bf-4d02-940f-563962125ff7"
}
{"body": "hello world"}

real	0m12.028s
user	0m2.560s
sys	0m0.606s


Running the function above returned `{"body": "hello world"}`.  The next cells hook it up to API Gateway so that it can be called over HTTP.
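
Because the handler is an ordinary Python function, it can also be called locally before deploying.  The sketch below repeats the handler from `nb1_helloworld.py` and calls it with an empty event and `None` for the context, which is enough here only because this handler ignores both arguments.

```python
import json


def lambda_handler(event, context):
    # Same body as nb1_helloworld.py above.
    return {'body': "hello world"}


# The returned dict is what gets serialized to JSON for the caller.
result = lambda_handler({}, None)
print(json.dumps(result))  # prints {"body": "hello world"}
```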

In [4]:
%env AWS_REGION=us-east-2
%env LAMBDA_NAME=nb1_helloworld_1

env: AWS_REGION=us-east-2
env: LAMBDA_NAME=nb1_helloworld_1
env: SUB_PATH=helloworld_endpoint


In [5]:
! aws apigateway create-rest-api --name nb_lambda_api_1

{
    "id": "onaw5gdom3",
    "name": "nb_lambda_api_1",
    "createdDate": 1554940314,
    "apiKeySource": "HEADER",
    "endpointConfiguration": {
        "types": [
            "EDGE"
        ]
    }
}


In [6]:
%env API_ID=onaw5gdom3
#copy the ID field from above into this env variable

env: API_ID=onaw5gdom3


In [7]:
!aws apigateway get-resources --rest-api-id $API_ID

{
    "items": [
        {
            "id": "2tinefqbn5",
            "path": "/"
        }
    ]
}


In [9]:
%env API_PARENT=2tinefqbn5
%env SUB_PATH=helloworld_endpoint
#once again copy id from the above response into this env block

env: API_PARENT=2tinefqbn5
env: SUB_PATH=helloworld_endpoint


In [10]:
!aws apigateway create-resource --rest-api-id $API_ID  --parent-id $API_PARENT --path-part $SUB_PATH

{
    "id": "a0gu29",
    "parentId": "2tinefqbn5",
    "pathPart": "helloworld_endpoint",
    "path": "/helloworld_endpoint"
}


In [11]:
%env HW_RESOURCE_ID=a0gu29

env: HW_RESOURCE_ID=a0gu29


In [12]:
%%bash 
export FUNC_URN=arn:aws:lambda:$AWS_REGION:$AWS_ID:function:$LAMBDA_NAME/invocations
export HW_FUNC_INTEGRATION_URI=arn:aws:apigateway:$AWS_REGION:lambda:path/2015-03-31/functions/$FUNC_URN

aws apigateway put-method --rest-api-id $API_ID --resource-id $HW_RESOURCE_ID  \
--http-method POST --authorization-type NONE | ./aws_sanitize

aws apigateway put-integration --rest-api-id $API_ID --resource-id $HW_RESOURCE_ID \
--http-method POST --type AWS --integration-http-method POST \
--uri $HW_FUNC_INTEGRATION_URI | ./aws_sanitize

aws apigateway put-method-response --rest-api-id $API_ID \
--resource-id $HW_RESOURCE_ID --http-method POST \
--status-code 200 --response-models application/json=Empty | ./aws_sanitize

aws apigateway put-integration-response --rest-api-id $API_ID \
--resource-id $HW_RESOURCE_ID --http-method POST \
--status-code 200 --response-templates application/json="" | ./aws_sanitize

aws apigateway create-deployment --rest-api-id $API_ID --stage-name prod

{
    "httpMethod": "POST",
    "authorizationType": "NONE",
    "apiKeyRequired": false
}
{
    "type": "AWS",
    "httpMethod": "POST",
    "uri": "arn:aws:apigateway:us-east-2:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_helloworld_1/invocations",
    "passthroughBehavior": "WHEN_NO_MATCH",
    "timeoutInMillis": 29000,
    "cacheNamespace": "a0gu29",
    "cacheKeyParameters": []
}
{
    "statusCode": "200",
    "responseModels": {
        "application/json": "Empty"
    }
}
{
    "statusCode": "200",
    "responseTemplates": {
        "application/json": null
    }
}
{
    "id": "ttusmd",
    "createdDate": 1554940393
}


In [13]:
%%bash
aws lambda add-permission --function-name $LAMBDA_NAME \
--statement-id apigateway-test-1 --action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:$AWS_REGION:$AWS_ID:$API_ID/*/POST/$SUB_PATH" 2>&1  | ./aws_sanitize
#this is a testing call, apparently it is necessary
aws lambda add-permission --function-name $LAMBDA_NAME \
--statement-id apigateway-prod-1 --action lambda:InvokeFunction \
--principal apigateway.amazonaws.com \
--source-arn "arn:aws:execute-api:$AWS_REGION:$AWS_ID:$API_ID/prod/POST/$SUB_PATH" 2>&1  | ./aws_sanitize


{
    "Statement": "{\"Sid\":\"apigateway-test-1\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"apigateway.amazonaws.com\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_helloworld_1\",\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:execute-api:us-east-2:$AWS_ID:onaw5gdom3/*/POST/helloworld_endpoint\"}}}"
}
{
    "Statement": "{\"Sid\":\"apigateway-prod-1\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"apigateway.amazonaws.com\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:us-east-2:$AWS_ID:function:nb1_helloworld_1\",\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:execute-api:us-east-2:$AWS_ID:onaw5gdom3/prod/POST/helloworld_endpoint\"}}}"
}
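
Note that `Statement` in this output is itself a JSON document encoded as a string, so it has to be decoded a second time before its fields can be read.  A small sketch, using a shortened copy of the first response above (the variable names are illustrative):

```python
import json

# Shortened copy of the first add-permission response shown above.
raw_output = json.dumps({
    "Statement": json.dumps({
        "Sid": "apigateway-test-1",
        "Effect": "Allow",
        "Action": "lambda:InvokeFunction",
        "Condition": {"ArnLike": {
            "AWS:SourceArn":
                "arn:aws:execute-api:us-east-2:$AWS_ID:"
                "onaw5gdom3/*/POST/helloworld_endpoint"}},
    })
})

outer = json.loads(raw_output)              # first decode: the CLI response
statement = json.loads(outer["Statement"])  # second decode: the statement
source_arn = statement["Condition"]["ArnLike"]["AWS:SourceArn"]
print(statement["Sid"], source_arn)
```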


In [14]:
!aws apigateway test-invoke-method --rest-api-id $API_ID \
--resource-id $HW_RESOURCE_ID --http-method POST --path-with-query-string "" \
--body file://echo.json   2>&1 | ./aws_sanitize

{
    "status": 200,
    "body": "{\"body\": \"hello world\"}",
    "headers": {
        "Content-Type": "application/json",
        "X-Amzn-Trace-Id": "Root=1-5cae823a-b6503633aa846c8f0ad22733;Sampled=0"
    },
    "multiValueHeaders": {
        "Content-Type": [
            "application/json"
        ],
        "X-Amzn-Trace-Id": [
            "Root=1-5cae823a-b6503633aa846c8f0ad22733;Sampled=0"
        ]
    },
    "log": "Execution log for request fdaf31d7-5beb-11e9-a156-4fb58de79c26\nWed Apr 10 23:54:34 UTC 2019 : Starting execution for request: fdaf31d7-5beb-11e9-a156-4fb58de79c26\nWed Apr 10 23:54:34 UTC 2019 : HTTP Method: POST, Resource Path: /helloworld_endpoint\nWed Apr 10 23:54:34 UTC 2019 : Method request path: {}\nWed Apr 10 23:54:34 UTC 2019 : Method request query string: {}\nWed Apr 10 23:54:34 UTC 2019 : Method request headers: {}\nWed Apr 10 23:54:34 UTC 2019 : Method request body before transformations: {\n  \"operation\": \"echo\",\n  \"payload\": {\n

In [15]:
!curl -X POST -d "{\"unused_key\":\"value\", \"payload\":{\"pkey1\":\"pval1\"}}" \
      "https://$API_ID.execute-api.$AWS_REGION.amazonaws.com/prod/$SUB_PATH"


{"body": "hello world"}

In [16]:
%%writefile echo.json
{
  "operation": "echo",
  "payload": {
    "somekey1": "somevalue1",
    "somekey2": "somevalue2"
  }
}

Overwriting echo.json


## Gateway Hookup scripted with python and boto
Wow, that was a lot of shell scripting.  The AWS CLI works well for simple examples and experimentation, but it quickly becomes unwieldy for more complex tasks.  In the above examples I manually extracted newly created IDs into variables for subsequent use.  I could have scripted this extraction with `sed`, but it would have been awkward.  Instead I built some deployment functions with python and boto.
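
For comparison, pulling an ID out of CLI output takes only a few lines once the JSON is parsed rather than pattern-matched.  The sketch below uses the `get-resources` response shown earlier as a literal string; the helper name `root_resource_id` is hypothetical.

```python
import json

# Output of `aws apigateway get-resources` from the earlier cell.
get_resources_output = """
{
    "items": [
        {
            "id": "2tinefqbn5",
            "path": "/"
        }
    ]
}
"""


def root_resource_id(cli_output):
    """Return the id of the resource whose path is "/"."""
    for item in json.loads(cli_output)["items"]:
        if item["path"] == "/":
            return item["id"]
    raise LookupError("no root resource in response")


print(root_resource_id(get_resources_output))  # prints 2tinefqbn5
```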

NB.  I would not call this a serious deployment solution; for that, look into `ansible`, `cloudformation` or some other solution.  These scripts are just the MVP necessary for me to experiment with AWS Lambda enough to decide if I could use it for real world work.

In [17]:
# global
AWS_ID = os.getenv("AWS_ID")
REGION = "us-east-2"
E_ARN_fstr = "arn:aws:execute-api:{REGION}:{AWS_ID}:{API_ID}/{STG}/POST/{SUB_PATH}"
F_ARN_fstr = "arn:aws:lambda:{REGION}:{AWS_ID}:function:{LAMBDA_NAME}/invocations"
IURI_fstr = "arn:aws:apigateway:{REGION}:lambda:path/2015-03-31/functions/{FUNC_ARN}"
apig = boto3.client('apigateway')
lambda_client = boto3.client('lambda')

In [18]:
def get_or_create_rest_api_id(api_name):
    rest_apis_resp = apig.get_rest_apis()

    for itm in rest_apis_resp['items']:
        if itm['name'] == api_name:
            return itm['id']
    # this api doesn't exist, create it
    rest_api_resp = apig.create_rest_api(name=api_name)
    api_id = rest_api_resp['id']
    return api_id


def get_api_id_resource_id(api_name, child_path, parent_path="/"):
    # If the child resource already exists, return its id instead of
    # creating a new one.
    api_id = get_or_create_rest_api_id(api_name)
    resources_resp = apig.get_resources(restApiId=api_id)
    for itm in resources_resp['items']:
        if 'pathPart' in itm and itm['pathPart'] == child_path:
            # if this child_path has already been created, we're done
            new_api_resource_id = itm['id']
            return api_id, new_api_resource_id
        if itm['path'] == parent_path:
            parent_resource_id = itm['id']

    create_resource_resp = apig.create_resource(
        restApiId=api_id, parentId=parent_resource_id, pathPart=child_path)
    new_api_resource_id = create_resource_resp['id']
    return api_id, new_api_resource_id


def extract_number(haystack):
    # Raw string so that \d is not treated as a string escape sequence.
    return int(re.search(r"(\d+)", haystack).group(0))


def create_new_deployment(api_id):
    # Deployments are necessary to connect the gateway to the public
    # internet.  You can set up caching, and have multiple deployments of
    # the same API routed to different functions.  You can control the
    # percentage of traffic that goes to each endpoint for staged
    # deployments.

    # It seems that we need a new deployment for each update.

    stages_resp = apig.get_stages(restApiId=api_id)
    ver_num = 0
    for itm in stages_resp['item']:
        try:
            itm_ver = extract_number(itm['stageName'])
            if itm_ver > ver_num:
                ver_num = itm_ver
        except AttributeError:
            # extract_number found no digits in this stage name; skip it.
            continue
    new_ver_num = ver_num + 1

    stage_name = "prod_{0}".format(new_ver_num)
    apig.create_deployment(
        restApiId=api_id, stageName=stage_name)
    return stage_name


class ModifyConfigurationError(Exception):
    pass


def ensure_methods(api_id, new_api_resource_id, func_arn):
    try:
        methods = apig.get_method(
            restApiId=api_id, resourceId=new_api_resource_id, httpMethod="POST")
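    except Exception as ex:
        # Only a missing method means "not set up yet"; re-raise anything
        # else so that `methods` is never used while undefined.
        error_code = getattr(ex, 'response', {}).get('Error', {}).get('Code')
        if error_code == 'NotFoundException':
            return False
        raise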
    mi_uri = methods['methodIntegration']['uri']
    if func_arn not in mi_uri:
        raise ModifyConfigurationError(
            "There is already an integration for {0} pointing at {1}, can't point to {2}".format(
                new_api_resource_id, mi_uri, func_arn))
    return True


def setup_methods(api_id, new_api_resource_id, func_arn):
    already_exists = ensure_methods(api_id, new_api_resource_id, func_arn)
    if already_exists:
        return

    apig.put_method(
        restApiId=api_id, resourceId=new_api_resource_id,
        httpMethod="POST", authorizationType="NONE")

    integration_uri = IURI_fstr.format(REGION=REGION, FUNC_ARN=func_arn)
    apig.put_integration(
        restApiId=api_id, resourceId=new_api_resource_id,
        httpMethod="POST", type="AWS", integrationHttpMethod="POST",
        uri=integration_uri)

    response_models = {'application/json': 'Empty'}
    apig.put_method_response(
        restApiId=api_id, resourceId=new_api_resource_id,
        httpMethod="POST", statusCode="200",
        responseModels=response_models)

    response_templates = {'application/json': ""}
    apig.put_integration_response(
        restApiId=api_id, resourceId=new_api_resource_id,
        httpMethod="POST", statusCode="200",
        responseTemplates=response_templates)


def set_lambda_permisions(api_id, url_subpath, lambda_name, stage_name):
    statement_num = 0
    needs_test = True
    needs_prod = True

    TEST_ARN = E_ARN_fstr.format(
        REGION=REGION, AWS_ID=AWS_ID, API_ID=api_id,
        SUB_PATH=url_subpath, STG="*")

    PROD_ARN = E_ARN_fstr.format(
        REGION=REGION, AWS_ID=AWS_ID, STG=stage_name,
        SUB_PATH=url_subpath, API_ID=api_id)

    try:
        policy_resp = lambda_client.get_policy(FunctionName=lambda_name)
        policy = json.loads(policy_resp['Policy'])

        for stmt in policy['Statement']:
            cur_stmt_num = extract_number(stmt['Sid'])
            if cur_stmt_num > statement_num:
                statement_num = cur_stmt_num
            source_arn = stmt['Condition']['ArnLike']['AWS:SourceArn']
            if source_arn == TEST_ARN:
                needs_test = False
            if source_arn == PROD_ARN:
                needs_prod = False

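    except Exception as ex:
        # A function with no policy yet is fine; re-raise anything else,
        # including exceptions that carry no AWS error response at all.
        error_code = getattr(ex, 'response', {}).get('Error', {}).get('Code')
        if error_code != 'ResourceNotFoundException':
            raise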
    new_statement_num = statement_num + 1
    test_stmt = "apigateway-test-{0:d}".format(new_statement_num)
    prod_stmt = "apigateway-prod-{0:d}".format(new_statement_num)
    if needs_test:
        lambda_client.add_permission(
            Action="lambda:InvokeFunction", Principal="apigateway.amazonaws.com",
            FunctionName=lambda_name, StatementId=test_stmt, SourceArn=TEST_ARN)
    if needs_prod:
        lambda_client.add_permission(
            Action="lambda:InvokeFunction", Principal="apigateway.amazonaws.com",
            FunctionName=lambda_name, StatementId=prod_stmt, SourceArn=PROD_ARN)


def hookup_gateway(gateway_name, url_subpath, lambda_name,
                   test_json_fname='echo.json'):
    api_id, new_api_resource_id = get_api_id_resource_id(
        gateway_name, url_subpath)

    func_arn = F_ARN_fstr.format(
        REGION=REGION, AWS_ID=AWS_ID, LAMBDA_NAME=lambda_name)

    setup_methods(api_id, new_api_resource_id, func_arn)
    stage_name = create_new_deployment(api_id)
    set_lambda_permisions(api_id, url_subpath, lambda_name, stage_name)

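    # Read the test payload up front so the file handle is closed promptly.
    with open(test_json_fname) as test_json_file:
        test_body = test_json_file.read()

    invoke_resp = apig.test_invoke_method(
        restApiId=api_id, resourceId=new_api_resource_id,
        httpMethod="POST", pathWithQueryString="",
        body=test_body)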

    url_fstr = "https://{API_ID}.execute-api.{REGION}.amazonaws.com/{STG}/{SUB_PATH}"
    url = url_fstr.format(API_ID=api_id, REGION=REGION, STG=stage_name,
                          SUB_PATH=url_subpath)

    return dict(
        test_response=invoke_resp['body'],
        api_id=api_id, url=url)


In [19]:
api_info = hookup_gateway(
    gateway_name="nb_lambda_api_2", url_subpath="helloworld_endpoint2",
    lambda_name="nb1_helloworld_1")
os.environ['API_URL'] = api_info['url']
api_info



{'test_response': '{"body": "hello world"}',
 'api_id': 'lx4pwu3za2',
 'url': 'https://lx4pwu3za2.execute-api.us-east-2.amazonaws.com/prod_1/helloworld_endpoint2'}

In [20]:
!echo $API_URL
!curl -X POST -d "{\"unused_key\":\"value\", \"payload\":{\"pkey1\":\"pval1\"}}" $API_URL

https://lx4pwu3za2.execute-api.us-east-2.amazonaws.com/prod_1/helloworld_endpoint2
{"body": "hello world"}

In [21]:
%%writefile nb1_addition.py

def lambda_handler(event, context):
    return {'sum': event.get('a', 0) + event.get('b', 0)}

Overwriting nb1_addition.py


In [22]:
%%writefile addition_1.json
{ "a":5, "b":10 }

Overwriting addition_1.json


In [23]:
!time ./create_lambda nb1_addition 1 min_pack.zip | tail -n 5

        "Mode": "PassThrough"
    },
    "RevisionId": "a62139a3-8a0b-4f79-ab70-ab7dbebf7836"
}
{"sum": 0}

real	0m10.507s
user	0m2.780s
sys	0m0.666s
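
The `{"sum": 0}` at the end of that output is what the handler returns when the event contains neither `a` nor `b`.  Which test event `create_lambda` sends is not shown in this notebook, so that explanation is an assumption; the default behaviour itself is easy to confirm locally:

```python
def lambda_handler(event, context):
    # Same body as nb1_addition.py above.
    return {'sum': event.get('a', 0) + event.get('b', 0)}


# An event without "a" or "b" falls back to 0 + 0.
print(lambda_handler({"operation": "echo"}, None))  # prints {'sum': 0}
# The contents of addition_1.json give the expected total.
print(lambda_handler({"a": 5, "b": 10}, None))      # prints {'sum': 15}
```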


In [24]:
api_info = hookup_gateway(
    gateway_name="nb_lambda_api_2", url_subpath="addition_endpoint",
    lambda_name="nb1_addition_1", test_json_fname='addition_1.json')
os.environ['API_URL'] = api_info['url']
api_info



{'test_response': '{"sum": 15}',
 'api_id': 'lx4pwu3za2',
 'url': 'https://lx4pwu3za2.execute-api.us-east-2.amazonaws.com/prod_2/addition_endpoint'}

In [27]:
!curl -X POST -d "{\"a\":30, \"b\":-10}" $API_URL

{"sum": 20}

In [28]:
resp = requests.post(api_info['url'], data = json.dumps({'a': 5, 'b':100}))
resp.content

b'{"sum": 105}'
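
`resp.content` is raw bytes.  To work with the result as Python data, decode it with `json.loads`, or call `resp.json()` on the response object.  A sketch using the bytes shown above:

```python
import json

content = b'{"sum": 105}'  # the bytes returned above
data = json.loads(content)
print(data["sum"])  # prints 105
```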

## Summary
It works.  We can now create a lambda function, deploy it, and then call it via `requests` or `curl`.