## Introduction to Boto3 and AWS Serverless Solutions

Let's say that we want to detect objects in an image, extract text from an image, or perform sentiment analysis on a piece of text. We could write and train our own classifiers, host them on a server (e.g. an EC2 instance), and use them to make predictions. This takes a lot of time and effort, because we have to choose the appropriate hardware, software, and modeling techniques needed to perform these operations.

For this reason, all the major cloud providers offer fully managed, serverless AI services: pre-trained models that you simply send data to and get a response back from. The cloud provider (e.g. AWS) provisions and scales the compute needed to actually run the models, so you never manage a server yourself.

You can access all of these through the AWS Console, but it is easier to integrate them into your existing code via Boto3, the AWS SDK for Python:

In [33]:
import boto3
import json
from concurrent.futures import ThreadPoolExecutor

For instance, we can interact with AWS's image recognition service, Amazon Rekognition, like so:

In [2]:
rekog = boto3.client('rekognition')

In [14]:
# detect the objects in the provided image
with open('uchicago.jpg', 'rb') as image:
    response = rekog.detect_labels(Image={'Bytes': image.read()})

# show the first five labels and their confidence scores
[(label['Name'], label['Confidence']) for label in response['Labels'][:5]]

[('Person', 99.89472198486328),
 ('Human', 99.89472198486328),
 ('Outdoors', 94.6645278930664),
 ('Road', 94.50630187988281),
 ('Path', 94.26421356201172)]

In [17]:
# Labels for physical objects also list each detected instance (with a bounding box),
# so we can count them: e.g. the first label, 'Person'
len(response['Labels'][0]['Instances'])

14
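Rather than slicing off the first five labels, it is often more useful to filter by confidence. The sketch below is a hypothetical helper, `summarize_labels` (the name and its 90% default threshold are illustrative choices), that keeps labels above a threshold and reports how many instances each has. It assumes the `detect_labels` response shape used above, where only some labels (such as "Person") carry a non-empty `Instances` list; `sample_response` is hand-written for illustration, not real Rekognition output. Note that `detect_labels` can also filter server-side through its `MinConfidence` and `MaxLabels` parameters.

```python
def summarize_labels(response, min_confidence=90.0):
    """Return (name, confidence, instance_count) for labels at or above min_confidence.

    Assumes the detect_labels response shape:
    {'Labels': [{'Name': ..., 'Confidence': ..., 'Instances': [...]}, ...]}.
    Labels without bounding-box instances get a count of 0.
    """
    summary = []
    for label in response.get('Labels', []):
        if label['Confidence'] >= min_confidence:
            # scene-level labels (e.g. 'Outdoors') usually have no instances
            count = len(label.get('Instances', []))
            summary.append((label['Name'], round(label['Confidence'], 2), count))
    # highest-confidence labels first
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Hypothetical, hand-written response in the detect_labels shape (not real output);
# bounding-box details inside each instance are omitted
sample_response = {
    'Labels': [
        {'Name': 'Person', 'Confidence': 99.89, 'Instances': [{}, {}, {}]},
        {'Name': 'Outdoors', 'Confidence': 94.66, 'Instances': []},
        {'Name': 'Bicycle', 'Confidence': 72.10, 'Instances': [{}]},
    ]
}

print(summarize_labels(sample_response))
# [('Person', 99.89, 3), ('Outdoors', 94.66, 0)]
```

In the notebook, the same call could be applied to the real `response` from the `detect_labels` cell.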

We can use Rekognition to detect text in images as well:

In [9]:
with open('uchicago_sign.jpg', 'rb') as image:
    response = rekog.detect_text(Image={'Bytes': image.read()})

In [13]:
for text in response['TextDetections']:
    # Rekognition returns both LINE and WORD detections; print whole lines only
    if text['Type'] == 'LINE':
        print(f"Detected text: {text['DetectedText']}")
        print(f"Confidence: {text['Confidence']:.2f}%")

Detected text: THE UNIVERSITY OF
Confidence: 100.00%
Detected text: CHICAGO
Confidence: 97.42%
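The WORD detections skipped above are linked to their lines: each WORD entry carries a `ParentId` equal to the `Id` of the LINE it belongs to. The hypothetical helper below (`words_by_line` is an illustrative name) uses that link to show which words make up each line, for example to spot the word that lowered a line's confidence. `sample_text_response` is hand-written in the `detect_text` shape; the word-level confidences are made up.

```python
from collections import defaultdict


def words_by_line(response):
    """Group Rekognition WORD detections under the LINE they belong to.

    Assumes the detect_text response shape: each entry in 'TextDetections'
    has 'Type' ('LINE' or 'WORD') and an 'Id'; WORD entries also carry a
    'ParentId' pointing at the Id of their LINE.
    Returns a list of (line_text, [(word, confidence), ...]) in line-Id order.
    """
    lines = {}                 # line Id -> line text
    words = defaultdict(list)  # line Id -> [(word, confidence), ...]
    for det in response.get('TextDetections', []):
        if det['Type'] == 'LINE':
            lines[det['Id']] = det['DetectedText']
        elif det['Type'] == 'WORD':
            words[det['ParentId']].append((det['DetectedText'], det['Confidence']))
    return [(lines[line_id], words[line_id]) for line_id in sorted(lines)]


# Hypothetical, hand-written response in the detect_text shape (not real output)
sample_text_response = {'TextDetections': [
    {'DetectedText': 'THE UNIVERSITY OF', 'Type': 'LINE', 'Id': 0, 'Confidence': 100.0},
    {'DetectedText': 'CHICAGO', 'Type': 'LINE', 'Id': 1, 'Confidence': 97.42},
    {'DetectedText': 'THE', 'Type': 'WORD', 'Id': 2, 'ParentId': 0, 'Confidence': 99.9},
    {'DetectedText': 'UNIVERSITY', 'Type': 'WORD', 'Id': 3, 'ParentId': 0, 'Confidence': 99.8},
    {'DetectedText': 'OF', 'Type': 'WORD', 'Id': 4, 'ParentId': 0, 'Confidence': 99.7},
    {'DetectedText': 'CHICAGO', 'Type': 'WORD', 'Id': 5, 'ParentId': 1, 'Confidence': 97.4},
]}

for line, line_words in words_by_line(sample_text_response):
    print(line, '->', line_words)
```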


If your application needs custom models or a specialized workflow, Rekognition might not be the best option, but for many general-purpose applications it will likely handle everything you need, and it is very easy to use.

We can also perform common NLP tasks, such as detecting the sentiment of a text, via Amazon Comprehend:

In [31]:
comprehend = boto3.client('comprehend')

response = comprehend.detect_sentiment(Text='This class is fun!',
                                       LanguageCode='en')

print(response['Sentiment'], response['SentimentScore'])

POSITIVE {'Positive': 0.9994707703590393, 'Negative': 4.922328662360087e-05, 'Neutral': 0.00045435165520757437, 'Mixed': 2.5665714929345995e-05}
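`detect_sentiment` analyzes one document per call. For many documents, Comprehend also offers `batch_detect_sentiment`, which accepts up to 25 documents per request and reports per-document results (`ResultList`) and failures (`ErrorList`) separately. The sketch below uses hypothetical helpers (`chunked` and `batch_sentiment` are illustrative names) that split a longer list into batches of 25 and map each batch-relative `Index` back to a position in the full list. It takes the client as an argument, so it could be called with the `comprehend` client created above.

```python
def chunked(items, size):
    """Yield (start_offset, slice) pairs of at most `size` elements."""
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def batch_sentiment(client, texts, language_code='en', batch_size=25):
    """Return sentiment labels aligned with `texts` (None where a document failed).

    Calls Comprehend's batch_detect_sentiment once per batch; indexes in each
    response are relative to that batch, so they are shifted by the batch offset.
    """
    sentiments = [None] * len(texts)
    for offset, batch in chunked(texts, batch_size):
        resp = client.batch_detect_sentiment(TextList=batch, LanguageCode=language_code)
        for result in resp['ResultList']:
            sentiments[offset + result['Index']] = result['Sentiment']
        for error in resp['ErrorList']:
            print(f"Document {offset + error['Index']} failed: {error['ErrorCode']}")
    return sentiments
```

For example, `batch_sentiment(comprehend, ['This class is fun!', 'The homework was long.'])` would return one label per input text.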


...and translate text from one language (here, detected automatically) into another (French) with Amazon Translate:

In [32]:
translate = boto3.client('translate')

response = translate.translate_text(Text='Hello, my name is Jon',
                                    SourceLanguageCode='auto',
                                    TargetLanguageCode='fr')
response['TranslatedText']

"Bonjour, je m'appelle Jon"
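Because the source language was set to `'auto'`, the response also reports which language Amazon Translate detected, in its `SourceLanguageCode` field. The hypothetical helper below (`translate_to_many` is an illustrative name) reuses one client to translate a phrase into several target languages and returns the detected source code alongside the translations. It makes one `translate_text` call per target language, since each call accepts a single `TargetLanguageCode`.

```python
def translate_to_many(client, text, target_codes, source_code='auto'):
    """Translate `text` into every language in `target_codes`.

    Returns (detected_source_code, {target_code: translated_text}).
    With SourceLanguageCode='auto', the response's 'SourceLanguageCode'
    field holds the language Amazon Translate detected.
    """
    detected = None
    translations = {}
    for target in target_codes:
        resp = client.translate_text(Text=text,
                                     SourceLanguageCode=source_code,
                                     TargetLanguageCode=target)
        detected = resp['SourceLanguageCode']
        translations[target] = resp['TranslatedText']
    return detected, translations
```

With the `translate` client above, `translate_to_many(translate, 'Hello, my name is Jon', ['fr', 'es', 'de'])` would return the detected code together with three translations.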

You will have a chance to practice using more of these serverless tools in the DataCamp course assigned as one of the readings for Monday's class, but this should give you a taste of the functionality that is available right out of the box.

----

**AWS Lambda Functions**

We can also create our own custom serverless functions with AWS Lambda.

*Go to AWS Console and create/deploy sample Lambda function (called `HelloWorld`):*

```python
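def lambda_handler(event, context):
    # Sample test event: {'key1': 1, 'key2': 2}
    # `event` arrives as a Python dict parsed from the JSON payload
    total = event['key1'] + event['key2']
    return total  # Lambda serializes the return value back to the caller as JSON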
```
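Because a handler is an ordinary Python function, you can call it directly with a test event before deploying it. The variant below is a sketch that adds input validation (the checks and error messages are illustrative, not something Lambda requires). If a deployed handler raises, Lambda reports the unhandled exception to the caller as a function error instead of a result.

```python
def lambda_handler(event, context):
    """Add event['key1'] and event['key2'] (same contract as the HelloWorld function).

    Raises ValueError with a readable message if a key is missing or not numeric.
    """
    try:
        a, b = event['key1'], event['key2']
    except KeyError as missing:
        raise ValueError(f"event is missing required key {missing}") from None
    if not all(isinstance(v, (int, float)) for v in (a, b)):
        raise ValueError("key1 and key2 must be numbers")
    return a + b


# Local smoke test: this handler never touches `context`, so None is enough here
print(lambda_handler({'key1': 1, 'key2': 2}, None))  # 3
```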

You can write code of arbitrary complexity here, as long as each invocation finishes within the function's configured timeout (the default is 3 seconds, and the maximum is 15 minutes).
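For longer jobs, the handler can check how much time it has left. The context object that Lambda passes to every Python handler provides `get_remaining_time_in_millis()`, which a handler can use to stop early and return partial progress rather than being cut off at the timeout. This sketch assumes a hypothetical event of the form `{'values': [...]}` and an illustrative 2-second safety margin; `FakeContext` is a hypothetical stand-in for local testing, not part of Lambda.

```python
import time

SAFETY_MARGIN_MS = 2_000  # illustrative buffer; tune for the workload


def lambda_handler(event, context):
    """Sum event['values'], stopping early if the invocation is close to timing out.

    Returns the partial total and how many values were processed, so a caller
    could resume from that position.
    """
    total = 0
    processed = 0
    for value in event['values']:
        # time left (in ms) before the function's configured timeout
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        total += value
        processed += 1
    return {'total': total, 'processed': processed}


class FakeContext:
    """Hypothetical local stand-in that mimics the remaining-time method."""

    def __init__(self, budget_ms):
        self.deadline = time.monotonic() + budget_ms / 1000

    def get_remaining_time_in_millis(self):
        return int((self.deadline - time.monotonic()) * 1000)


print(lambda_handler({'values': [1, 2, 3]}, FakeContext(60_000)))
# {'total': 6, 'processed': 3}
```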

In [34]:
aws_lambda = boto3.client('lambda')

test_data = {'key1': 1, 'key2': 2}

# run synchronously:
r = aws_lambda.invoke(FunctionName='HelloWorld',
                      InvocationType='RequestResponse',
                      Payload=json.dumps(test_data))
json.loads(r['Payload'].read())  # read the streamed payload and decode the JSON result

3
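`invoke` distinguishes failures inside the function from failures of the call itself: if the handler raises, the call still succeeds, but the response includes a `FunctionError` field and the payload holds the error details. Setting `InvocationType='Event'` instead queues the invocation asynchronously, and Lambda replies with status code 202 without waiting for a result. The hypothetical helper `invoke_json` below wraps both cases; raising `RuntimeError` on function errors is an illustrative choice.

```python
import json


def invoke_json(client, function_name, payload, asynchronous=False):
    """Invoke a Lambda function with a JSON payload.

    Synchronous ('RequestResponse') calls return the decoded result and raise
    RuntimeError if the function failed (the response then has 'FunctionError').
    Asynchronous ('Event') calls only queue the work and return the status code.
    """
    invocation_type = 'Event' if asynchronous else 'RequestResponse'
    r = client.invoke(FunctionName=function_name,
                      InvocationType=invocation_type,
                      Payload=json.dumps(payload))
    if asynchronous:
        return r['StatusCode']  # 202 when the event was accepted
    body = json.loads(r['Payload'].read())
    if 'FunctionError' in r:
        # on failure the payload describes the error (errorMessage, errorType, ...)
        raise RuntimeError(f"{function_name} failed: {body}")
    return body
```

With the client above, `invoke_json(aws_lambda, 'HelloWorld', test_data)` behaves like the previous cell, while `asynchronous=True` returns immediately without the sum.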

So far, though, we are still invoking the function serially. The real advantage of Lambda is that it scales automatically to meet concurrent demand: each concurrent invocation runs in its own execution environment, so the work is parallelized according to how many requests arrive at once:

In [39]:
# 1. Write a function that invokes our Lambda function and passes in data:
def invoke_function(data):
    r = aws_lambda.invoke(FunctionName='HelloWorld',
                          InvocationType='RequestResponse',
                          Payload=json.dumps(data))
    return json.loads(r['Payload'].read())

# 2. Demo that the Lambda function scales out when called concurrently from separate local threads
with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(invoke_function, [test_data for _ in range(4)])

# 3. In the AWS Console, confirm that there were four concurrent executions (the metrics take a few seconds to update).
# The results are the same, too:
list(results)

[3, 3, 3, 3]

Ideally, we should be able to scale out to as many Lambda workers as are available (i.e. thousands of concurrent function invocations, each on a different segment of a dataset: a serverless domain decomposition) without being limited by our local resources. This is where a package like `pywren` can be useful, helping us spread work across many Lambda workers at once with very little code.
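The underlying pattern is a scatter-gather: split the dataset into segments, process each segment in a separate worker, and collect the results in order. The sketch below shows that pattern with local threads only; it is not `pywren`'s API, and `split_into_chunks` and `scatter_gather` are hypothetical names. To target Lambda, `worker` would wrap a call like `invoke_function` above, and the deployed handler would need to accept a chunk of data (an assumption, since `HelloWorld` only adds two keys). The threads would then just wait on network calls while Lambda does the work, but the number of simultaneous requests would still be capped by the local `max_workers`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split `data` into at most n_chunks contiguous, nearly equal slices."""
    size, remainder = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # the first `remainder` chunks get one extra element
        end = start + size + (1 if i < remainder else 0)
        if start < end:
            chunks.append(data[start:end])
        start = end
    return chunks


def scatter_gather(worker, data, n_chunks, max_workers=None):
    """Apply `worker` to each chunk concurrently; results come back in chunk order."""
    chunks = split_into_chunks(data, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers or len(chunks) or 1) as executor:
        return list(executor.map(worker, chunks))


# Local stand-in worker: the built-in sum plays the role of a remote function
print(sum(scatter_gather(sum, list(range(100)), n_chunks=4)))  # 4950
```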