<img src="https://fsdl.me/logo-720-dark-horizontal">

# Lab 99: Serverless Deployment on AWS Lambda

### What You Will Learn

- How to deploy a model backend serverlessly with AWS Lambda
- How to use the `aws` CLI effectively

# Setup

If you're running this notebook on Google Colab,
the cell below will run full environment setup.

It should take about three minutes to run.

In [None]:
lab_idx = None

if "bootstrap" not in locals() or bootstrap.run:
    # path management for Python
    pythonpath, = !echo $PYTHONPATH
    if "." not in pythonpath.split(":"):
        pythonpath = ".:" + pythonpath
        %env PYTHONPATH={pythonpath}
        !echo $PYTHONPATH

    # get both Colab and local notebooks into the same state
    !wget --quiet https://fsdl.me/gist-bootstrap -O bootstrap.py
    import bootstrap

    # change into the lab directory
    bootstrap.change_to_lab_dir(lab_idx=lab_idx)

    # allow "hot-reloading" of modules
    %load_ext autoreload
    %autoreload 2
    # needed for inline plots in some contexts
    %matplotlib inline

    bootstrap.run = False  # change to True to re-run setup

    
!pwd
%ls

In [1]:
import os
os.chdir("D:/RL_Finance/MLops/fslab/lab07")

print(os.getcwd())

D:\RL_Finance\MLops\fslab\lab07


# Why should I deploy model backends on AWS Lambda?

Model backends (aka prediction services)
fit very nicely into the "serverless" paradigm of cloud deployment.

In general, they are stateless functions of their inputs,
and so we don't need to keep a server alive in between requests
or throughout a session.

For projects
with spiky or bursty traffic
(e.g. hitting the front page of Hacker News)
that makes scaling much easier:
AWS can implement generic autoscaling logic
for stateless functions.

We benefit in the form of very quick
(on the order of minutes)
scaling to larger workloads.

For low-traffic projects like demos,
serverless deployments can massively reduce costs
by "scaling to 0" -- running no machines when there's no traffic.

This notebook walks through the process for taking
the Text Recognizer model in the FSDL codebase
and packaging it up for deployment as
a serverless prediction service on AWS Lambda.

For more details on deployment,
including how to set up a frontend UI, see
[the lab on deployment](https://fsdl.me/lab07-colab).

Note that we won't be using an accelerator
(GPU/TPU/IPU) to run our network.
As of writing,
this is a limitation of serverless tools on major cloud providers.

For serverless deployment of GPU-accelerated models, check out
[banana](https://banana.dev).

Creating a Lambda requires
[first creating an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/),
which you should do now.

To open an AWS account, you need to provide payment information.

This payment method will be on the hook
for any charges you incur from AWS.

As you create your account,
remember the First Law of the Cloud™️:
> The cloud costs money in mysterious ways.

That is,
automatically scaling means automatically incurring large costs,
sometimes in unintuitive ways.

You can avoid these issues by setting up
[budgets, alerts, and automatic cost-saving actions](https://aws.amazon.com/getting-started/hands-on/control-your-costs-free-tier-budgets/).

# Build a container image

The easiest way to deploy code on AWS Lambda is
to wrap it in a Docker container.

That means you'll need to
[install Docker](https://docs.docker.com/engine/install/)
if you don't have it already.

If you want to avoid the need to include `sudo`
before every `docker` command,
you'll also need to follow the
[post-install steps](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user)
to give non-root users access to Docker.

That also means this notebook can't run on Colab:

In [None]:
import sys


assert "google.colab" not in sys.modules, "this notebook requires Docker, not compatible with Colab"

We build Docker images from `Dockerfile`s.

`Dockerfile`s build up images layer by layer,
like a parfait or a brick wall,
where each layer is a step described in a
[domain-specific language](https://docs.docker.com/engine/reference/builder/).

We'll walk through the details of the `Dockerfile`
we use in the FSDL example project,
but if you'll be using Docker heavily,
you should read the
[official best practices guide](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
for all the key tips and tricks.

Almost all `Dockerfile`s begin by defining an existing container image to build off of.

This is a key feature of Docker:
one person's final output container image
can be another person's starting point.

As our starting point,
we use the
[official AWS image for Lambda](https://gallery.ecr.aws/lambda/python).

In [3]:
!findstr "# Starting" api_serverless\Dockerfile


# Starting from an official AWS image
# Keep any dependencies and versions in this file aligned with the environment.yml and Makefile
# Install Python dependencies
# Copy only the relevant directories and files
#   note that we use a .dockerignore file to avoid copying logs etc.


We then install the `prod`uction `requirements` with a simple `pip` install.

AWS Lambdas do not come with GPU accelerators,
so we're installing the CPU version of PyTorch,
for which `pip` installation works well.

In [5]:
!findstr "install" api_serverless\Dockerfile

RUN pip install --upgrade pip==23.1.2
RUN pip install -r requirements.txt


Then we copy over exactly the files that we need from our model development repo.

We bring in the `text_recognizer` module
-- including the model weights! --
and the `api` from the `api_serverless` module.

In [None]:
!cat api_serverless/Dockerfile | grep "# Copy" -A 3

We avoid copying files that we don't need,
like the `.git` history,
with a `.dockerignore` file.

This is formatted and works much like a `.gitignore` file,
except that it hides files from our container rather than from our version control.

In [None]:
!cat .dockerignore

We can review the contents of the `api` file in the cell below.

It's relatively simple:
we rely on the `ParagraphTextRecognizer` class to do the heavy lifting.

We just need to write a `handler` to
1. extract model inputs from JSON `event`s in the AWS format,
2. feed those inputs to the model, and
3. package the outputs into JSON.

Effectively, we hook the "image-to-text" or "tensor-to-tensor"
format of our model into the "JSON-to-JSON" format typical of webservices.
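
The three steps above can be sketched as follows. This is a minimal illustration, not the contents of the actual `api` file: `StubRecognizer` is a hypothetical stand-in for `ParagraphTextRecognizer` so that the sketch runs without model weights, and the error format is an assumption.

```python
import json


class StubRecognizer:
    """Hypothetical stand-in for ParagraphTextRecognizer; returns a fake prediction."""

    def predict(self, image_url):
        return f"text recognized in {image_url}"


# created at import time, so it is reused across invocations of the same container
model = StubRecognizer()


def handler(event, _context):
    """Illustrative Lambda handler: JSON event in, JSON-serializable dict out."""
    # HTTP requests arrive with the payload as a JSON string under "body";
    # direct invocations pass the payload dict itself
    body = event.get("body", event)
    if isinstance(body, str):
        body = json.loads(body)

    # 1. extract model inputs from the event
    image_url = body.get("image_url")
    if image_url is None:
        return {"statusCode": 400, "message": "no image_url found in event"}

    # 2. feed those inputs to the model
    pred = model.predict(image_url)

    # 3. package the outputs into JSON
    return {"pred": str(pred)}


print(handler({"image_url": "https://example.com/page.png"}, None))
```

Creating the model outside the handler means the (slow) loading step happens once per container rather than once per request.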

In [None]:
from api_serverless import api


api??

We're now ready to build our container image with `docker build`.

We just need to pick a name.

We'll use an environment variable for this
and for other configuration information so that
it's easy to re-use this workflow in your own projects.

In [6]:
import os


os.environ["LAMBDA_NAME"] = "text-recognizer-backend"

In [None]:
!docker build -t $LAMBDA_NAME . --file api_serverless/Dockerfile

It will take a few minutes to run the first time.

Once it's done, we can run the container to test that it's working.

Open another terminal, define `$LAMBDA_NAME`, and run:

```bash
docker run -p 9000:8080 $LAMBDA_NAME\:latest
```

We can send a request to our "local Lambda" directly from the command line with `curl`:

In [None]:
!curl -XPOST \
  "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"image_url": "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"}'
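
The same request can be sent from Python. The sketch below uses only the standard library and assumes the container from the `docker run` command above is listening on port 9000; `build_invocation` and `invoke_local` are illustrative helper names, not part of the FSDL codebase.

```python
import json
import urllib.request

# invocation endpoint served inside the container,
# mapped to port 9000 by `docker run -p 9000:8080`
LOCAL_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invocation(image_url, url=LOCAL_URL):
    """Builds (but does not send) a POST request carrying the JSON event."""
    data = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def invoke_local(image_url, timeout=60):
    """Sends the request and parses the JSON reply; requires the container to be running."""
    request = build_invocation(image_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


request = build_invocation(
    "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png")
print(request.get_method(), request.full_url)
```

Calling `invoke_local(...)` with the same image URL should return the same JSON that `curl` prints.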

# Upload to the container registry

In order for our container to be runnable via AWS Lambda,
we need to put the container image --
the data required for `docker run` to start our container --
on Amazon's storage infrastructure.

Specifically, we add it to their
[container registry](https://www.redhat.com/en/topics/cloud-native-apps/what-is-a-container-registry)
service, the
[Elastic Container Registry (ECR)](https://aws.amazon.com/ecr/).

You should have
[created an AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/)
already.

Once you have an account,
you can create a Lambda in any number of ways --
[using an SDK like `boto3`](https://hands-on.cloud/working-with-aws-lambda-in-python-using-boto3/)
or
[via the AWS Console GUI in the browser](https://docs.aws.amazon.com/lambda/latest/dg/getting-started.html).

The Console is in some ways easier to use
when you're just getting started with AWS,
but it's not easy to automate.

So we'll use the
[AWS CLI](https://github.com/aws/aws-cli),
`aws`.

You can find
[installation instructions here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

You'll need to configure your AWS credentials
so that the `aws` tool can take actions on your behalf.

You can read about `aws` configuration
[here](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).

At a minimum, you should set the following:
1. AWS Access Key ID
2. AWS Secret Access Key
3. AWS region (use one that's close to your machine)
4. CLI output style (use `json` for compatibility with this notebook).

The interactive configuration tool is not compatible with Jupyter notebooks,
so you'll need to run this setup command in another terminal:
```bash
aws configure
```

Once you've configured the CLI,
the cell below will print your numerical account ID
and your default region.

In [None]:
aws_account_id, = !aws sts get-caller-identity \
  --query "Account"
aws_region, = !aws configure get region 

os.environ["AWS_REGION"] = aws_region
os.environ["AWS_ACCOUNT_ID"] = aws_account_id.strip('"')

!echo $AWS_ACCOUNT_ID
!echo $AWS_REGION

Each combination of AWS account ID and region has its own container registry.

The registry is identified by its URI -- a narrower identifier than a URL, but let's call it a URL, just between friends.

In [None]:
os.environ["ECR_URI"] = ".".join(
    [os.environ["AWS_ACCOUNT_ID"], "dkr", "ecr", os.environ["AWS_REGION"], "amazonaws.com"])

!echo $ECR_URI

In order to push to that container registry,
we need to log our Docker client into it.

In [None]:
!aws ecr get-login-password --region $AWS_REGION \
  | docker login --username AWS --password-stdin $ECR_URI
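
For the curious, here is a rough sketch of what that login involves when done through the `boto3` SDK instead of the CLI. It assumes `boto3` is installed and credentials are configured; the helper names are illustrative. ECR hands back a base64-encoded `user:password` token, which we decode before passing it to Docker.

```python
import base64


def decode_ecr_token(authorization_token):
    """Splits a base64-encoded "user:password" token into its two parts."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


def fetch_ecr_credentials(region):
    """Fetches registry credentials with boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    client = boto3.client("ecr", region_name=region)
    auth = client.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return username, password, auth["proxyEndpoint"]


# a made-up token, just to show the decoding step
fake_token = base64.b64encode(b"AWS:not-a-real-password").decode("ascii")
print(decode_ecr_token(fake_token))
```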

Inside of the registry, we create a "repository"
to hold onto and organize the container images --
just like we create repositories in `git` servers,
like GitHub, to store and organize projects.

"Repositories" here are generally just a single container image,
with versions that change over time.

We can create a repository with the command `aws ecr create-repository`.

In [None]:
!aws ecr create-repository \
  --repository-name $LAMBDA_NAME \
  --image-scanning-configuration scanOnPush=true --image-tag-mutability MUTABLE \
  | jq -C

`aws` commands return output that represents the result of their operations.

The format of that output is configurable.
We use the JSON format.

Two notes:

1. We pipe (`|`) the raw JSON output to `jq`, the command line JSON processor, so we can pretty-print and `-C`olorize it.
2. If you're ever interested in learning more about an `aws` command, you can add `help` between the command and its parameters to get docs. In this notebook, that's always at the end of the first line. Just make sure to comment out the `jq` line at the end, or you'll get a parse error!
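
If `jq` isn't available, the same JSON can be handled with Python's built-in `json` module. The sketch below works on an abbreviated, made-up example of `aws ecr create-repository` output; the account ID and region are placeholders, and real output contains more fields.

```python
import json

# abbreviated, made-up example of `aws ecr create-repository` output
raw_output = """
{
  "repository": {
    "repositoryName": "text-recognizer-backend",
    "repositoryUri": "123456789012.dkr.ecr.us-west-2.amazonaws.com/text-recognizer-backend"
  }
}
"""

result = json.loads(raw_output)

# pretty-print, like piping to jq
print(json.dumps(result, indent=2))

# pull out a single field, like `jq .repository.repositoryUri`
repository_uri = result["repository"]["repositoryUri"]
print(repository_uri)
```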

Back in Docker, we can add our container image to that repository without rebuilding.

First, we need to `tag` the image with a name inside the repository we just created --
that name includes the URL (technically, URI) of the registry and the name of the repository.

In [None]:
os.environ["IMAGE_URI"] = "/".join([os.environ["ECR_URI"], os.environ["LAMBDA_NAME"]])

In [None]:
!docker tag $LAMBDA_NAME\:latest $IMAGE_URI\:latest
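
Put together, a full image reference has the shape `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The helpers below are an illustrative sketch of that assembly; the function names are ours, and the account ID is a placeholder.

```python
def ecr_registry(account_id, region):
    """Hostname of the container registry for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def image_reference(account_id, region, repository, tag="latest"):
    """Full name of a tagged image inside an ECR repository."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


# placeholder account ID, for illustration only
print(image_reference("123456789012", "us-west-2", "text-recognizer-backend"))
```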

Then we add our container to the registry with `docker push`.

This can take a few minutes.

In [None]:
!docker push $IMAGE_URI\:latest

# Create a Lambda function

To make our container image an executable Lambda function,
we first need to configure some permissions.

AWS uses a system called IAM ("Identity and Access Management")
to determine which actions are allowed by whom.

Within IAM, "roles" are used to separate permissions from identity.

We can create and modify roles to control what is possible inside our AWS organization.

To make it possible for Lambdas to run,
we need to create a role,
accessible by Amazon's Lambda service,
that has the right permission.

In [None]:
os.environ["LAMBDA_ROLE_NAME"] = "lambda-role"

In [None]:
# create a role that Amazon's AWS Lambda service has access to
#  (aka the Lambda "Principal" can "assume" this role)

!aws iam create-role \
  --role-name $LAMBDA_ROLE_NAME \
  --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}' \
  | jq -C
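
The inline JSON in that command is hard to read. One option, sketched here, is to build the same document as a Python dictionary and serialize it; the `TRUST_POLICY` environment variable is a hypothetical name introduced for this example.

```python
import json
import os

# the same trust policy as in the command above, as a Python dictionary
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # the Lambda service is the "Principal" allowed to assume this role
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

policy_json = json.dumps(trust_policy)

# hypothetical variable name; it could then be passed to the CLI as
#   --assume-role-policy-document "$TRUST_POLICY"
os.environ["TRUST_POLICY"] = policy_json
print(policy_json)
```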

We'll need to provide the role as an `ARN` --
a resource identifier, like a URI or URL but AWS specific.

We can retrieve that via the CLI:

In [None]:
lambda_role_arn, = !aws iam get-role --role-name $LAMBDA_ROLE_NAME --output json | jq -r '.Role.Arn'
lambda_role_arn = lambda_role_arn.strip('"')

os.environ["LAMBDA_ROLE_ARN"] = lambda_role_arn
!echo $LAMBDA_ROLE_ARN
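
ARNs are colon-separated: `arn:partition:service:region:account-id:resource`. As a small illustration, the sketch below splits one into named parts; `parse_arn` is our own helper and the account ID is a placeholder. IAM is a global service, so the region field of a role ARN is empty.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn):
    """Splits an ARN into its five fields after the literal "arn" prefix."""
    # the resource may itself contain colons, so split at most five times
    prefix, partition, service, region, account_id, resource = arn.split(":", 5)
    if prefix != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(partition, service, region, account_id, resource)


# placeholder account ID, for illustration only
role = parse_arn("arn:aws:iam::123456789012:role/lambda-role")
print(role.service, role.account_id, role.resource)
```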

What roles are permitted to do is controlled by "policies".

So we `attach` two policies to this `role`:
one with the basic permissions for executing Lambdas, including writing logs,
and one for writing trace data to AWS X-Ray.

In [None]:
# allow this IAM role to execute Lambdas and write their logs to CloudWatch
!aws iam attach-role-policy \
  --role-name $LAMBDA_ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

In [None]:
# allow this IAM role to write trace data to AWS X-Ray -- useful for debugging Lambdas
!aws iam attach-role-policy \
  --role-name $LAMBDA_ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

Now we're ready to create the function with
`aws lambda create-function`.

There are a few configuration pieces here:
- the `--region` we want to run the function in. That's our environment variable `AWS_REGION`.
- the format of our function (`--package-type`). We're using a container `Image`.
- the location of the `--code` for our function. It's at an `ImageUri`, defined above.
- the `--role` that the function has in IAM, which determines what it's allowed to do.

In [None]:
!aws lambda create-function \
  --function-name $LAMBDA_NAME \
  --region $AWS_REGION \
  --package-type Image \
  --code ImageUri=$IMAGE_URI\:latest \
  --role $LAMBDA_ROLE_ARN | jq -C

We're now ready to execute, or `invoke`, the function.

Or at least, we will be once the Lambda is created inside AWS -- it can take a few minutes.
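
Rather than guessing when the function is ready, we could poll its state. The sketch below is a generic polling loop, demonstrated with a fake sequence of states. Wiring it to AWS is left as an assumption: for example, `get_state` could read the `State` field returned by `boto3`'s `get_function_configuration`.

```python
import time


def wait_until_active(get_state, timeout=300.0, interval=5.0, sleep=time.sleep):
    """Polls get_state() until it returns "Active", fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == "Active":
            return state
        if state == "Failed":
            raise RuntimeError("function creation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout} seconds")
        sleep(interval)


# fake state sequence standing in for real calls to AWS
states = iter(["Pending", "Pending", "Active"])
print(wait_until_active(lambda: next(states), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.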

You'll know the Lambda was executed
when pretty-printed JSON appears below with a `StatusCode` of `200`.

In [None]:
!aws lambda invoke \
  --function-name $LAMBDA_NAME \
  --invocation-type RequestResponse \
  --payload '{"image_url": "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"}' \
  --cli-binary-format raw-in-base64-out lambda.out | jq -C

!cat lambda.out

The error message says the invocation timed out after 3 seconds, which is the default timeout for a Lambda.

Lambdas have resource limits to prevent any invocation
from accidentally consuming too much time or compute.

These limits are also used by Amazon behind the scenes to schedule
invocations onto machines.

The key limits are a `--timeout` (in seconds)
and a `--memory-size` (in MB of RAM).

Adding more RAM also increases the amount of CPU,
so let's max that out and increase the `--timeout` up to a whole minute:

In [None]:
!aws lambda update-function-configuration \
   --function-name $LAMBDA_NAME \
   --region $AWS_REGION \
   --timeout 60 \
   --memory-size 10240 | jq -C
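
Both settings have bounds. As of writing, to the best of our knowledge, the timeout can be at most 900 seconds (15 minutes) and the memory size must be between 128 and 10240 MB. Check the AWS documentation for current values. A small, illustrative check before calling the CLI might look like this:

```python
# limits as of writing; treat these as assumptions and check the AWS docs
MAX_TIMEOUT_SECONDS = 900
MIN_MEMORY_MB = 128
MAX_MEMORY_MB = 10240


def check_limits(timeout, memory_size):
    """Returns a list of human-readable problems; empty if the settings look valid."""
    problems = []
    if not 1 <= timeout <= MAX_TIMEOUT_SECONDS:
        problems.append(
            f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds, got {timeout}")
    if not MIN_MEMORY_MB <= memory_size <= MAX_MEMORY_MB:
        problems.append(
            f"memory size must be between {MIN_MEMORY_MB} and {MAX_MEMORY_MB} MB, got {memory_size}")
    return problems


# the settings used in the command above
print(check_limits(timeout=60, memory_size=10240))
```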

Allow a few minutes for the update to take effect, then try again.

You should get back a text `pred`iction
instead of an `errorMessage`:

In [None]:
!aws lambda invoke \
  --function-name $LAMBDA_NAME \
  --invocation-type RequestResponse \
  --payload '{"image_url": "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"}' \
  --cli-binary-format raw-in-base64-out lambda.out

!cat lambda.out
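
To tell the two cases apart programmatically, we can parse the response file. This is a minimal sketch: `read_prediction` is an illustrative helper, and the two sample payloads are made up to mirror the `pred` and `errorMessage` keys described above.

```python
import json


def read_prediction(raw):
    """Returns the prediction from a Lambda response, or raises on an error payload."""
    result = json.loads(raw)
    if "errorMessage" in result:
        raise RuntimeError(result["errorMessage"])
    return result["pred"]


# made-up sample payloads, for illustration only
success = '{"pred": "some recognized text"}'
failure = '{"errorMessage": "Task timed out after 3.00 seconds"}'

print(read_prediction(success))

try:
    read_prediction(failure)
except RuntimeError as error:
    print("invocation failed:", error)
```

With a real response, the input would come from the file instead, e.g. `read_prediction(open("lambda.out").read())`.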

# Add an HTTP endpoint with a URL

Right now,
our Lambda is only accessible via AWS --
e.g. with the `aws` CLI.

To access it with a simple network call using an open standard,
we need to do more.

For access via HTTPS,
we need to `create` a `function-url`:
a URL that accepts HTTPS requests and forwards them to our Lambda.

To make our lives easier,
we'll furthermore make it possible to invoke the Lambda
from anywhere (`AllowOrigins` is set to a wildcard, `*`)
and without providing AWS `Credentials`.

In [None]:
!aws lambda create-function-url-config \
  --function-name $LAMBDA_NAME \
  --auth-type NONE \
  --cors '{"AllowOrigins": ["*"], "AllowCredentials": false}' \
  | jq -C

To complete making our function URL an open HTTPS endpoint, we need to add one more permission:
allowing anyone to invoke the function with the URL.

> Note: running this cell will mean that anyone with knowledge of this URL can spam you with requests!
Running a model is orders of magnitude more expensive than sending a request,
so an attacker could force you to spend lots of money if you're not careful.
Make sure you
[configure a budget](https://aws.amazon.com/getting-started/hands-on/control-your-costs-free-tier-budgets/).
Surprising cloud bills kill startups.

In [None]:
!aws lambda add-permission \
 --function-name $LAMBDA_NAME \
 --action lambda:invokeFunctionUrl \
 --statement-id "open-access" \
 --principal "*" \
 --function-url-auth-type NONE | jq -C

Our original function URL creation command printed the URL to the terminal.

We can also grab the URL programmatically by running `get-function-url-config`
and parsing the output with `jq`:

In [None]:
lambda_url, = !aws lambda get-function-url-config --function-name $LAMBDA_NAME | jq .FunctionUrl
lambda_url = lambda_url.strip('"')

lambda_url

And then we can start sending requests from anywhere we wish!

The cell below does it in Python, using the `requests` library.

You may need to try a few times before the request works.

This is known as the "cold start" problem:
when the system scales to 0
(reducing the cost to $0 as well),
there's some "warmup" time
while a new instance is spun up.

In [None]:
import json
import requests


image_url = "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"

headers = {"Content-type": "application/json"}
payload = json.dumps({"image_url": image_url})

response = requests.post(
  lambda_url, data=payload, headers=headers)
pred = response.json()["pred"]

print(pred)
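
Instead of re-running the cell by hand during a cold start, we could retry automatically. The sketch below is a generic retry loop with exponential backoff, demonstrated on a fake flaky function; `call_with_retries` is an illustrative helper, not part of `requests` or the FSDL codebase.

```python
import time


def call_with_retries(send, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Calls send() until it succeeds, waiting 1x, 2x, 4x, ... base_delay between tries."""
    last_error = None
    for attempt in range(attempts):
        try:
            return send()
        except Exception as error:  # in real use, catch narrower exception types
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# fake backend that fails twice before succeeding, standing in for a cold Lambda
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("still warming up")
    return {"pred": "some recognized text"}


print(call_with_retries(flaky_send, sleep=lambda _: None))
```

To use this with the request above, `send` could wrap the `requests.post` call and then call `raise_for_status()` on the response, so that HTTP error codes also trigger a retry.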

# Connect to a frontend

API calls are great for programmatic access,
but not so much for human use.

To make our fancy new serverless model service more user-friendly,
let's attach it to a Gradio frontend.

You can read more about this approach in
[the lab on deployment](https://fsdl.me/lab07-colab).

In [None]:
from app_gradio import app

lambda_backend = app.PredictorBackend(url=lambda_url)
ui = app.make_frontend(lambda_backend.run)

ui.launch(share=True)

### Postscript: Deploying the frontend on AWS

To make a fully cloud-deployed webservice,
you'll want to provision a cloud instance to run this frontend on.

AWS's server-style cloud compute service is
[Elastic Compute Cloud, aka EC2](https://aws.amazon.com/ec2/).

To launch the frontend on an EC2 instance,
you'll need to first [provision the instance](https://dzone.com/articles/provision-a-free-aws-ec2-instance-in-5-minutes) -- 
the free tier t2.micro instance and Amazon Machine Image suggested in that article work well.
Note that t2.micro instances are free for one year and $10/month after that.

Then, you can clone
[the Text Recognizer repo](http://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022),
install the `prod`uction requirements on that instance, and launch the frontend with

```bash
python app_gradio/app.py --model_url=$LAMBDA_URL
```

Check the
[lab on deployment](https://fsdl.me/lab07-colab)
for more on running the frontend (locally),
including how to get a URL for it with
[`ngrok`](https://ngrok.io).