## Introduction to Boto3 and AWS Serverless Solutions

Let's say that we want to detect objects in an image or extract text from one. We could write and train our own classifiers, run them on a server (e.g. an EC2 instance), and use them to make predictions. This requires a lot of time and energy, though: we would need to select the appropriate hardware, software, techniques, etc. necessary to perform these operations.

For this reason, all the major cloud providers offer serverless services that expose pre-trained/pre-coded models: you simply provide data and receive a response. Your cloud provider (e.g. AWS) spins up the compute instances necessary to actually run the code.

You can access all of these through the AWS Console, but it is easier to integrate them into your existing code via the Boto3 SDK.

In [1]:
import boto3
import json
from concurrent.futures import ThreadPoolExecutor

For instance, we can interact with AWS' image recognition functions like so:

In [2]:
rekog = boto3.client('rekognition')

In [3]:
# detect the objects in the provided image
with open('uchicago.jpg', 'rb') as image:
    response = rekog.detect_labels(Image={'Bytes': image.read()})
    
[(label['Name'], label['Confidence']) for label in response['Labels']][:5]

[('Architecture', 99.19303131103516),
 ('Building', 99.19303131103516),
 ('Campus', 99.19303131103516),
 ('Person', 97.88841247558594),
 ('City', 97.48779296875)]

In [4]:
# Can also count number of instances of each label: e.g. "Person" - label 3
len(response['Labels'][3]['Instances']) 

15
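Indexing into `response['Labels']` by position is fragile, since the order of labels can differ from image to image. As a minimal sketch (the helper name `summarize_labels` and the sample response are hypothetical, with made-up values), we could summarize every confident label along with its instance count:

```python
def summarize_labels(response, min_confidence=90.0):
    """Return (name, confidence, instance_count) for each confident label."""
    summary = []
    for label in response.get('Labels', []):
        if label['Confidence'] >= min_confidence:
            # 'Instances' holds one entry per detected occurrence; it is
            # empty for scene-level labels such as "Building"
            count = len(label.get('Instances', []))
            summary.append((label['Name'], round(label['Confidence'], 2), count))
    return summary

# Hand-made stand-in for a detect_labels response (illustrative values only)
sample_response = {
    'Labels': [
        {'Name': 'Building', 'Confidence': 99.19, 'Instances': []},
        {'Name': 'Person', 'Confidence': 97.89, 'Instances': [{}, {}, {}]},
        {'Name': 'Bicycle', 'Confidence': 61.5, 'Instances': [{}]},
    ]
}

label_summary = summarize_labels(sample_response)
print(label_summary)
```

With the real `response` from the cell above, `summarize_labels(response)` should work the same way.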

We can use Rekognition to detect text in images as well:

In [5]:
with open('uchicago_sign.jpg', 'rb') as image:
    response = rekog.detect_text(Image={'Bytes': image.read()})

In [6]:
for text in response['TextDetections']:
    if text['Type'] == 'LINE' and text['Confidence'] > 90:
        print(f"Detected text:{text['DetectedText']}")
        print(f"Confidence: {text['Confidence']:.2f}%")

Detected text:THE UNIVERSITY OF
Confidence: 99.61%
Detected text:CHICAGO
Confidence: 99.58%
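In addition to `LINE` detections, `detect_text` returns `WORD` detections, each with a `ParentId` that points at the `Id` of the line it belongs to. A rough sketch of how we might group words under their lines (the helper `words_by_line` and the sample response below are hypothetical, not output from the image above):

```python
from collections import defaultdict

def words_by_line(response, min_confidence=90.0):
    """Return a list of (line_text, [words]) pairs for confident detections."""
    lines = {}                 # line Id -> line text
    words = defaultdict(list)  # line Id -> words belonging to that line
    for det in response.get('TextDetections', []):
        if det['Confidence'] < min_confidence:
            continue
        if det['Type'] == 'LINE':
            lines[det['Id']] = det['DetectedText']
        elif det['Type'] == 'WORD':
            words[det['ParentId']].append(det['DetectedText'])
    return [(text, words.get(line_id, [])) for line_id, text in lines.items()]

# Hand-made stand-in for a detect_text response (illustrative values only)
sample_text_response = {
    'TextDetections': [
        {'Id': 0, 'Type': 'LINE', 'DetectedText': 'THE UNIVERSITY OF', 'Confidence': 99.6},
        {'Id': 1, 'Type': 'LINE', 'DetectedText': 'CHICAGO', 'Confidence': 99.5},
        {'Id': 2, 'ParentId': 0, 'Type': 'WORD', 'DetectedText': 'THE', 'Confidence': 99.7},
        {'Id': 3, 'ParentId': 0, 'Type': 'WORD', 'DetectedText': 'UNIVERSITY', 'Confidence': 99.6},
        {'Id': 4, 'ParentId': 0, 'Type': 'WORD', 'DetectedText': 'OF', 'Confidence': 99.5},
        {'Id': 5, 'ParentId': 1, 'Type': 'WORD', 'DetectedText': 'CHICAGO', 'Confidence': 99.5},
    ]
}

grouped = words_by_line(sample_text_response)
print(grouped)
```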


If you have custom workflows, Rekognition might not be the best option, but for many general applications, this will likely handle everything that you need to do and is really easy to use.

You will have a chance to practice using more of these serverless tools in the DataCamp course that we've assigned as one of the readings for Monday's class, but this should give you a taste of some of the functionality that is available to you right out of the box.

----

**AWS Lambda Functions**

We can also create our own custom serverless functions via AWS Lambda:

*Go to AWS Console and create/deploy sample Lambda function (called `hello_world`) using LabRole IAM role:*

```python
def lambda_handler(event, context):
    # test: {'key1': 1, 'key2': 2}
    total = event['key1'] + event['key2']
    return total
```

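We can write code of arbitrary complexity in here, as long as it is a relatively quick operation: Lambda caps a single invocation at 900 seconds (15 minutes), and the function we deploy programmatically further down uses a 300-second timeout.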
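Because a handler is just a Python function, we can exercise it locally before deploying anything. The variant below is only a sketch: the input validation is our own addition (not part of the deployed `hello_world` function), and we pass `None` for `context` since this handler never uses it.

```python
def lambda_handler(event, context):
    """Sum event['key1'] and event['key2'], rejecting malformed events."""
    # Added for illustration: fail early with a clear message
    missing = [key for key in ('key1', 'key2') if key not in event]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return event['key1'] + event['key2']

# Local test: same event shape as the Lambda console test event
local_result = lambda_handler({'key1': 1, 'key2': 2}, None)
print(local_result)
```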

In [7]:
aws_lambda = boto3.client('lambda')

test_data = {'key1': 1, 'key2': 2}

# run synchronously:
r = aws_lambda.invoke(FunctionName='hello_world',
                      InvocationType='RequestResponse',
                      Payload=json.dumps(test_data))
json.loads(r['Payload'].read()) # print out response

3
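One thing to watch for: with `InvocationType='RequestResponse'`, the `invoke` call itself can succeed even when the handler raised an exception. In that case the response carries a `FunctionError` key and the payload describes the error. A minimal sketch of a helper that surfaces this (the name `read_invoke_result` is hypothetical, and the fake responses use `io.BytesIO` as a stand-in for the streaming payload that boto3 returns):

```python
import io
import json

def read_invoke_result(response):
    """Decode an invoke() response, raising if the function itself failed."""
    payload = json.loads(response['Payload'].read())
    if 'FunctionError' in response:
        # The payload holds the error details reported by the Lambda runtime
        raise RuntimeError(f"Lambda function failed: {payload}")
    return payload

# Stand-ins for what aws_lambda.invoke(...) returns (illustrative only)
ok_response = {'StatusCode': 200, 'Payload': io.BytesIO(b'3')}
failed_response = {
    'StatusCode': 200,
    'FunctionError': 'Unhandled',
    'Payload': io.BytesIO(b'{"errorMessage": "key1"}'),
}

ok_result = read_invoke_result(ok_response)
print(ok_result)
```

With the real client, `read_invoke_result(r)` would replace the bare `json.loads(r['Payload'].read())` call.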

We can also upload Lambda functions programmatically:

In [8]:
# Access our class IAM role, which allows Lambda
# to interact with other AWS resources
aws_lambda = boto3.client('lambda')
iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='LabRole')

# Open zipped directory
with open('hello_world.zip', 'rb') as f:
    lambda_zip = f.read()

try:
    # If function hasn't yet been created, create it
    response = aws_lambda.create_function(
        FunctionName='hello_world_programmatic',
        Runtime='python3.9',
        Role=role['Role']['Arn'],
        Handler='lambda_function.lambda_handler',
        Code=dict(ZipFile=lambda_zip),
        Timeout=300
    )
except aws_lambda.exceptions.ResourceConflictException:
    # If function already exists, update it based on zip
    # file contents
    response = aws_lambda.update_function_code(
        FunctionName='hello_world_programmatic',
        ZipFile=lambda_zip
        )

lambda_arn = response['FunctionArn']
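The cell above assumes that `hello_world.zip` already exists on disk. As a sketch of one way to build the deployment package in memory instead, we can use the standard library's `zipfile` module. The file name `lambda_function.py` is chosen to match the `Handler='lambda_function.lambda_handler'` setting; the helper name `build_lambda_zip` is hypothetical, and the explicit file permissions are a precaution on our part (the runtime needs to be able to read the file).

```python
import io
import zipfile

HANDLER_SOURCE = '''\
def lambda_handler(event, context):
    total = event['key1'] + event['key2']
    return total
'''

def build_lambda_zip(source, filename='lambda_function.py'):
    """Return the bytes of a zip archive containing a single handler file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        info = zipfile.ZipInfo(filename)
        info.compress_type = zipfile.ZIP_DEFLATED
        # Mark the file as world-readable (rw-r--r--) inside the archive
        info.external_attr = 0o644 << 16
        zf.writestr(info, source)
    return buffer.getvalue()

lambda_zip_bytes = build_lambda_zip(HANDLER_SOURCE)
print(len(lambda_zip_bytes) > 0)
```

The resulting bytes could be passed as `Code=dict(ZipFile=lambda_zip_bytes)` in place of the contents read from disk.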

In [9]:
# run synchronously as soon as Function is ready:
r = aws_lambda.invoke(FunctionName='hello_world_programmatic',
                      InvocationType='RequestResponse',
                      Payload=json.dumps(test_data))
json.loads(r['Payload'].read()) # print out response

3
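A newly created or updated function is not necessarily ready right away, so invoking it immediately can fail. boto3's Lambda client provides waiters for this; the sketch below wraps the `function_active_v2` and `function_updated_v2` waiters in a small helper (the name `wait_until_ready` and the `updated` flag are our own, hypothetical additions):

```python
def wait_until_ready(lambda_client, function_name, updated=False):
    """Block until a Lambda function can be invoked.

    Pass updated=True after update_function_code, and leave it False
    after create_function.
    """
    waiter_name = 'function_updated_v2' if updated else 'function_active_v2'
    waiter = lambda_client.get_waiter(waiter_name)
    # Polls the function's state until it is ready (or the waiter gives up)
    waiter.wait(FunctionName=function_name)
    return waiter_name
```

In the workflow above, this would be called as `wait_until_ready(aws_lambda, 'hello_world_programmatic')` between the create/update cell and the first `invoke`.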

So far, we are still running all of this code serially. The real advantage of Lambda is that it scales automatically to meet concurrent demand: it parallelizes based on how many concurrent invocations it receives. Just don't invoke your Lambda functions more than 10 times concurrently using your AWS Academy account, or your account could be deactivated. In a personal account, these can scale to thousands of concurrent invocations:

In [10]:
# 1. write function to invoke our function for us and pass in data:
def invoke_function(data):
    r = aws_lambda.invoke(FunctionName='hello_world_programmatic',
                          InvocationType='RequestResponse',
                          Payload=json.dumps(data))
    return json.loads(r['Payload'].read())

# 2. Demo that lambda function will scale out if called concurrently on different threads locally
with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(invoke_function, [test_data for _ in range(4)])

# 3. In AWS Console: confirm that we had >1 concurrent executions (takes a few seconds to update)
# Same results too:
list(results)

[3, 3, 3, 3]

This capacity to scale based on concurrent demand makes Lambda functions great for event-driven workflows (which we'll discuss in more detail in a couple of weeks).

For batch-job types of tasks, though, we should ideally be able to scale out to as many available Lambda workers as possible (i.e. thousands of concurrent function invocations on different segments of a dataset: a serverless domain decomposition). In the above workflow, though, our local CPU is still a major bottleneck (we can only invoke as many concurrent Lambda workers as local multithreading allows). We'll revisit the question of how to scale out these batch workflows further when we introduce AWS [Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html), which can orchestrate large, embarrassingly parallel code execution across many Lambda workers entirely in the cloud (i.e. no local bottleneck!).
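To make the domain decomposition idea concrete, here is a small sketch of the locally driven version: split a dataset into chunks and hand each chunk to a worker on its own thread. The helpers `chunked` and `fan_out` are hypothetical names, and `local_sum` is a local stand-in for a function that would invoke Lambda on each chunk, so this runs without any AWS access.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fan_out(invoke, items, chunk_size, max_workers=4):
    """Apply `invoke` to each chunk concurrently, preserving chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(invoke, chunked(items, chunk_size)))

def local_sum(chunk):
    # Stand-in for a Lambda invocation that processes one chunk
    return sum(chunk)

partials = fan_out(local_sum, list(range(10)), chunk_size=3)
print(partials, sum(partials))
```

Swapping `local_sum` for a function like `invoke_function` above would send each chunk to its own Lambda invocation; on an AWS Academy account, `max_workers` should stay below the 10-invocation concurrency limit mentioned earlier.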