## AWS Lambda for S3

-----

This notebook walks you through creating a thumbnail for each image (.jpg and .png objects) uploaded to an S3 bucket. 
You will create a Lambda function (CreateThumbnail) that Amazon S3 invokes when objects are created. 
The Lambda function then reads the image object from the source bucket and creates a thumbnail image in the target bucket. 
We call the target the _sourceresized_ bucket.

**Important**
There must be two buckets: one for the source images and one for the thumbnails. 
If you use the same bucket as the source and the target, each thumbnail uploaded to the source bucket triggers another object-created event, which invokes the Lambda function again and creates unwanted recursion.

<img src="../images/s3_flow.PNG">

Image taken from the AWS website.

Sequence of actions performed.

- A user uploads an object to the source bucket in Amazon S3 (object-created event).


- Amazon S3 detects the object-created event.


- Amazon S3 publishes the s3:ObjectCreated:* event to AWS Lambda by invoking the Lambda function and passing event data as a function parameter.


- AWS Lambda executes the Lambda function by assuming the execution role that you specified at the time you created the Lambda function.


- From the event data it receives, the Lambda function knows the source bucket name and object key name. The Lambda function reads the object and creates a thumbnail using graphics libraries, and saves it to the target bucket.
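The last step in the sequence above, reading the bucket name and object key from the event data, can be sketched as follows. This is a minimal illustration, not the notebook's handler: `extract_s3_objects` and `sample_event` are hypothetical names, and the event is trimmed to the fields the sketch needs. It also decodes the key, because S3 URL-encodes object keys in event notifications.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications
        # (for example, a space arrives as "+"), so decode before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical, trimmed-down event used only for illustration.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example.source"},
                "object": {"key": "holiday+photo.jpg"}}}
    ]
}
print(extract_s3_objects(sample_event))
```

The handler written later in this notebook uses the key as it arrives, so the decoding step here is an optional refinement for keys that contain spaces or special characters.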

<img src="../images/module4_exercises_process_flow.PNG">

In [None]:
################################### SET THE FOLLOWING PARAMETERS ###################################################
#***********************************************************************************
#Set the AWS Region
region = 'us-east-1'#***************************************************************

#Set the AWS Access ID (Given to you by the DSA staff)
access_id = '<>'  

#Set the AWS Access Key (Given to you by the DSA staff)
access_key = '<>' 

In [None]:
import boto3
import botocore
import os
import zipfile
import datetime
import pandas
import json
import time
import getpass
from subprocess import call

# Set the username from system
system_user_name=getpass.getuser()

# Set the source S3 bucket name
source=system_user_name+".source"

# Set the target S3 bucket name
sourceresized=system_user_name+".sourceresized"

# Set the lambda name
lambda_name=system_user_name+"_CreateThumbnail"

# ami image code
ami_image = 'ami-8c1be5f6'

s3 = boto3.resource('s3', 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)
iam = boto3.client('iam', 
                   aws_access_key_id = access_id,
                   aws_secret_access_key = access_key)
lamb = boto3.client('lambda', region_name=region, 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

**Important**

Both the source bucket and the Lambda function must be in the same AWS region. In addition, the code used for the Lambda function also assumes that both of the buckets are in the same region. us-east-1 is the default region in this notebook.

You should have set up the AWS CLI by now. If you still haven't configured your credentials in the AWS CLI, [click here](../../module2/extra_labs/Accessing_AWS_through_CLI.ipynb#Configuring-the-AWS-CLI) for a guide on how to do that. 

In [None]:
s3.create_bucket(Bucket=source)
s3.create_bucket(Bucket=sourceresized)

In [None]:
s3.Object(source, 'source.jpg').put(Body=open('../images/ML.jpg', 'rb'))

### Create a Deployment Package

The following cell writes the Python code that the Lambda function runs into a file. The code uploads the resized image to a different bucket under the same image name, as shown below:

    source-bucket/image.png -> source-bucketresized/image.png

<br>
**Note**
The `from __future__` statement enables you to write code that is compatible with Python 2 or 3. If you are using runtime version 3.6, it is not necessary to include it.
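The naming scheme above (same object key, bucket name with `resized` appended) can be sketched as a small helper. This is an illustrative sketch, not part of the deployment package: `plan_paths` is a hypothetical name, and flattening `/` in the key is an added assumption so that keys such as `photos/cat.png` do not point at a missing subdirectory under `/tmp`.

```python
import uuid


def plan_paths(bucket, key):
    """Return (download_path, upload_path, target_bucket) for one object."""
    # Keys may contain "/" (for example "photos/cat.png"); flatten them so the
    # temporary files land directly in /tmp instead of a missing subdirectory.
    flat_key = key.replace("/", "_")
    download_path = "/tmp/{}{}".format(uuid.uuid4(), flat_key)
    upload_path = "/tmp/resized-{}".format(flat_key)
    # The thumbnail goes to "<source bucket>resized" under the original key.
    target_bucket = "{}resized".format(bucket)
    return download_path, upload_path, target_bucket


download_path, upload_path, target_bucket = plan_paths("example.source", "photos/cat.png")
print(upload_path, "->", target_bucket)
```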

In [None]:
# Open a new file named after lambda_name (system_user_name + "_CreateThumbnail"),
# for example skaf48_CreateThumbnail.py, in write mode.

# Write the Lambda source code, held here as a string, into that file.
# The handler function in it runs each time the Lambda function is invoked.
with open(lambda_name+".py", "w") as myfile:
    myfile.write('''\
from __future__ import print_function
import boto3
import os
import sys
import uuid
from PIL import Image
import PIL.Image
     
s3_client = boto3.client('s3')
     
def resize_image(image_path, resized_path):
    with Image.open(image_path) as image:
        image.thumbnail(tuple(x / 2 for x in image.size))
        image.save(resized_path)

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key'] 
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        upload_path = '/tmp/resized-{}'.format(key)
        
        s3_client.download_file(bucket, key, download_path)
        resize_image(download_path, upload_path)
        s3_client.upload_file(upload_path, '{}resized'.format(bucket), key)
''')

**Add your python code to the .zip file**

In [None]:
import os
import zipfile

# Create a zip file named after lambda_name (system_user_name + "_CreateThumbnail") in write mode
# and add the Python file created above to it. The with block closes the archive.
with zipfile.ZipFile(lambda_name+".zip", "w") as zf:
    zf.write(lambda_name+".py")

The .zip file must contain the Lambda function code and its dependencies. The dependencies must be downloaded and copied into the .zip file. To do that, launch an EC2 instance, install the necessary packages on it, and add them to the .zip file.
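For orientation, the packaging step can be sketched in Python as well. This is only a sketch under stated assumptions: `zip_directory` is a hypothetical helper, and the demo uses a fake package in a temporary directory. The notebook itself performs this step with the `zip` command on the EC2 instance, where the packages are installed.

```python
import os
import tempfile
import zipfile


def zip_directory(src_dir, zip_path):
    """Add every file under src_dir to zip_path, with paths relative to src_dir."""
    # Mode "a" appends to an existing archive, or creates it if it is missing.
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to src_dir so packages sit at the archive root.
                zf.write(full, os.path.relpath(full, src_dir))


# Demo with a fake package; real dependencies come from site-packages.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "site-packages", "fakepkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as handle:
        handle.write("")
    out = os.path.join(tmp, "bundle.zip")
    zip_directory(os.path.join(tmp, "site-packages"), out)
    with zipfile.ZipFile(out) as zf:
        names = zf.namelist()
print(names)
```

Keeping the packages at the root of the archive matters because Lambda imports modules relative to the top level of the deployment package.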

### Launch an EC2 instance

In [None]:
# Create an EC2 client object
ec2 = boto3.client('ec2',region_name=region, 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

# Set the Security group name
Sec_group_name= system_user_name+"module4_Sec_group_rsgt3b"

### Create a Security Group

Create a security group and modify its rules, because we need to SSH into the instance to install software packages on it.

In [None]:
# Create the security group
sg = ec2.create_security_group(
    Description='security grp for module 4',
    GroupName=Sec_group_name
)
Sec_group=sg["GroupId"]

Customize the security group to allow MU's TCP traffic and SSH requests. Configure the inbound rules to allow traffic as needed. 

In [None]:
# Add inbound (ingress) rules to the security group.
# Each bare except below prints "already added" for any failure, not only for duplicate rules.

try:
    sec_rule="ALL TCP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'tcp',
             'FromPort': 0,
             'ToPort': 65535,
             'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
        ],)
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

try:
    sec_rule="ALL TCP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'tcp',
             'FromPort': 0,
             'ToPort': 65535,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
#              'IpRanges': [{'CidrIp': Sec_group}]},
            }],
#         SourceSecurityGroup=Sec_group_name
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")

try:
    sec_rule="Custom ICMP Rule - IPv4"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': 0,
             'ToPort': -1,
             'IpRanges': [{'CidrIp': '173.31.192.195/32'}]},
        ])
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

try:
    sec_rule="ALL UDP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'udp',
             'FromPort': 0,
             'ToPort': 65535,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")
#     print(data)

    
try:
    sec_rule="ALL ICMP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': -1,
             'ToPort': -1,
             'UserIdGroupPairs': [{ 'GroupId': Sec_group }]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")

    
try:
    sec_rule="ALL ICMP"
    data = ec2.authorize_security_group_ingress(
        GroupId=Sec_group,
        IpPermissions=[
            {'IpProtocol': 'icmp',
             'FromPort': -1,
             'ToPort': -1,
             'IpRanges': [{'CidrIp': '0.0.0.0/16'}]
            }],
    )
    print("Ingress "+sec_rule+" added")
except:
    print(sec_rule+" already added")


### Create a keypair

Create a keypair for the EC2 instance. We first generate a key name, then call ec2.create_key_pair() to create a keypair with that name. The system command echo writes the generated key material to a .pem file named after the keypair. 

Then restrict the file to read-only access for its owner with os.chmod(file, 0o400). If the file permissions are too open, SSH refuses to use the key and reports an error.
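As an alternative to echo followed by chmod, the key file can be created with restricted permissions from the start. The sketch below assumes a POSIX system; `write_private_key` is a hypothetical helper and the key text is a placeholder, not real key material.

```python
import os
import stat
import tempfile


def write_private_key(path, key_material):
    """Write key_material to path, readable only by the file owner."""
    # O_EXCL refuses to overwrite an existing file; 0o400 means owner read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)


# Demo in a temporary directory with placeholder content.
with tempfile.TemporaryDirectory() as tmp:
    pem_path = os.path.join(tmp, "example.pem")
    write_private_key(pem_path, "-----BEGIN FAKE KEY-----\n")
    mode = stat.S_IMODE(os.stat(pem_path).st_mode)
    with open(pem_path) as handle:
        content = handle.read()
print(oct(mode))
```

Writing the file from Python also keeps the private key out of the shell command line.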

In [None]:
import time 
import os

ec2_pem_file=time.strftime("EC2-%d%m%Y%H%M%S-"+system_user_name)
ec2_key=ec2.create_key_pair(KeyName=ec2_pem_file)

#Don't print the private key unless you have a good reason
#print(ec2_key['KeyMaterial'])

os.system("echo \""+ec2_key['KeyMaterial']+"\" > "+ec2_pem_file+".pem")
os.chmod(ec2_pem_file+".pem",0o400)

print("KeyName         : "+ec2_key['KeyName']+"\nKey Fingerprint : "+ec2_key['KeyFingerprint'])

### Create Instance

In [None]:
instances = ec2.run_instances(
    ImageId=ami_image,
    MinCount=1,
    MaxCount=1,
    KeyName=ec2_pem_file,
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                        {   'Key': 'Name',
                            'Value': 'Docker_Jupyter'
                        }
                    ]
        }
    ],
    InstanceType="t2.micro",
    SecurityGroupIds=[
        Sec_group
    ],
)

In [None]:
# Get the instance id of newly created EC2 instance.
new_instance_id = instances["Instances"][0]["InstanceId"]

#### Check the status of Instance

Use the polling function below to make sure the instance is completely set up and ready for use. 
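The polling idea, checking a state repeatedly with a growing, randomized delay, can be sketched independently of EC2. This is a hedged illustration: `wait_for_state`, its parameters, and the fake state source are all hypothetical, and the cap on delay and attempts is an addition that the notebook's own function does not have.

```python
import random
import time


def wait_for_state(get_state, done_states, first_delay=2.0, max_delay=30.0,
                   max_attempts=20, sleep=time.sleep):
    """Poll get_state() until it returns a value in done_states."""
    delay = first_delay
    for _attempt in range(max_attempts):
        state = get_state()
        if state in done_states:
            return state
        sleep(delay)
        # Grow the delay by a random factor, but never beyond max_delay.
        delay = min(max_delay, delay * random.uniform(1.1, 2.0))
    raise TimeoutError("state did not reach %s" % sorted(done_states))


# Stand-in for ec2.describe_instances(): "pending" twice, then "running".
fake_states = iter(["pending", "pending", "running"])
final_state = wait_for_state(lambda: next(fake_states), {"running", "terminated"},
                             sleep=lambda seconds: None)
print(final_state)
```

Passing `sleep` as a parameter lets the demo run instantly. With a real client, `get_state` would wrap the `describe_instances` lookup used in this notebook.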

In [None]:
def poll_until_completed(client, ins_id):
    delay = 2
    while True:
        instance = client.describe_instances(InstanceIds=[ins_id,])
        status = instance["Reservations"][0]["Instances"][0]["State"]["Name"]
#         message = cluster.get('Message', '')
        now = str(datetime.datetime.now().time())
    
        print("instance %s is %s at %s" % (ins_id, status, now))
        if status in ['running','terminated']:
            break

        # exponential backoff with jitter
        delay *= random.uniform(1.1, 2.0)
        time.sleep(delay)

In [None]:
import random
import time

poll_until_completed(ec2, new_instance_id)  # Can't use the instance until it is running

In [None]:
# Use the instance ID captured above with describe_instances() to get the instance details,
# which include the public DNS address.

inst_det = ec2.describe_instances(
    InstanceIds=[
        new_instance_id,
    ]
)

In [None]:
# Get the public DNS of new instance.
instance_pub_dns=inst_det["Reservations"][0]["Instances"][0]["PublicDnsName"]
instance_pub_dns

### Push the lambda function code file to the EC2 instance

In my case it's "skaf48_CreateThumbnail.py".

In [None]:
import os

os.system("scp -o StrictHostKeyChecking=no -i "+
          os.getcwd() +"/" + ec2_pem_file + ".pem "+
          lambda_name+".py "+
          "ec2-user@"+instance_pub_dns + ":~/"+lambda_name+".py")

### Connect to EC2 instance via SSH.

Run the cell below and copy its output. Paste it into a terminal to SSH into the instance, then install the software packages there. 

In [None]:
print("ssh -i "+os.getcwd()+"/"+ec2_pem_file+".pem ec2-user@"+instance_pub_dns)

<br>
## <span style="color:red">Note:</span>
<hr size="6" width="100%" noshade style="border-color:#FF0000" align="left">


After you have successfully connected to the EC2 instance, **carefully** run all the commands below in the terminal, in the sequence shown. 

<br>
**Install Python 3.6 and virtualenv using the following steps:**


```bash


sudo yum install -y gcc zlib zlib-devel openssl openssl-devel

wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz

tar -xzvf Python-3.6.1.tgz

cd Python-3.6.1 && ./configure && make

sudo make install

sudo /usr/local/bin/pip3 install virtualenv


```

<br>
**Create and activate a virtual environment using the virtualenv that was installed via pip3**


```bash

/usr/local/bin/virtualenv ~/shrink_venv

source ~/shrink_venv/bin/activate

```

<br>
**Install libraries in the virtual environment**

```bash

pip install Pillow

pip install boto3

```

<br>
**Add the contents of lib and lib64 site-packages to your .zip file.**

**Note:** The following steps assume you used Python runtime version 3.6. If you used version 2.7, you will need to update the paths accordingly.


```bash

cd $VIRTUAL_ENV/lib/python3.6/site-packages

zip -r9 ~/pawprint_CreateThumbnail.zip *


```



<br>
**Add your python code to the .zip file**

```bash

cd ~

zip -g pawprint_CreateThumbnail.zip pawprint_CreateThumbnail.py

```

<br>
**Upload the zip file to S3**. Check that the CreateThumbnail.zip file exists by running the "dir" command. Also run "aws configure" so that you can access AWS services from the command line. 


```bash

dir

aws configure

aws s3api create-bucket --bucket <pawprint>-bucket-module4

aws s3 cp pawprint_CreateThumbnail.zip s3://<pawprint>-bucket-module4/

```

### Create the Execution Role (IAM Role)


Create an IAM role using the following predefined role type and access permissions policy:

- AWS service role of the type AWS Lambda – In AWS Service Roles, choose AWS Lambda as the role type. This grants the AWS Lambda service permission to assume the role.


- AWSLambdaExecute access permissions policy that you attach to the role. This managed policy lets the function read and write Amazon S3 objects and write logs to CloudWatch Logs.


- Enter a Role name and then choose Create role. For Role Name, use a name that is unique within your AWS account (for example, skaf48-lambda-s3-execution-role).
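Because the role name in this notebook is derived from the system user name, it can help to check it before calling IAM. The sketch below is an assumption-labeled helper, not part of the notebook: `is_valid_role_name` is a hypothetical name, and the pattern reflects the documented IAM constraint for role names (1 to 64 characters drawn from letters, digits, underscore, and `+=,.@-`).

```python
import re

# IAM role names: 1 to 64 characters from letters, digits, "_" and "+=,.@-".
ROLE_NAME_PATTERN = re.compile(r"[\w+=,.@-]{1,64}", re.ASCII)


def is_valid_role_name(name):
    """Return True if name satisfies the IAM role-name constraints."""
    return ROLE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_role_name("skaf48-lambda-s3-execution-role"))
print(is_valid_role_name("name with spaces"))
```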

In [None]:
# Function to create an AWS role that the Lambda function assumes

def create_role(name, policies=None):
    """ Create a role and attach optional managed policies """
    policydoc = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Principal": {"Service": ["lambda.amazonaws.com"]}, "Action": ["sts:AssumeRole"]},
        ]
    }
    roles = [r['RoleName'] for r in iam.list_roles()['Roles']]
    if name in roles:
        print('IAM role %s exists' % (name))
        role = iam.get_role(RoleName=name)['Role']
    else:
        print('Creating IAM role %s' % (name))
        role = iam.create_role(RoleName=name, AssumeRolePolicyDocument=json.dumps(policydoc))['Role']

    # attach managed policy
    if policies is not None:
        for p in policies:
            iam.attach_role_policy(RoleName=role['RoleName'], PolicyArn=p)
    return role

In [None]:
# Call above function to create the role with predefined role type and access policy
role = create_role(system_user_name + '_lambda-s3-execution-role', 
                   policies=['arn:aws:iam::aws:policy/AWSLambdaExecute'])

### Create the Lambda Function (Upload the Deployment Package)

Create a Lambda function by uploading the deployment package, which is the zip file we created above. Then test the Lambda function by invoking it manually. Instead of creating an event source, we use sample Amazon S3 event data, which is a set of JSON records. 

----

The function below checks whether a Lambda function with the specified name already exists. If it does and update=True, the function's code is updated; if it does and update=False, the existing function is returned unchanged. Otherwise, a new Lambda function is created. 
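The decision described above can be sketched without calling AWS. In this hedged example, `all_function_names`, `plan_lambda_action`, and the fake pages are hypothetical; the `NextMarker` key mirrors how `list_functions` reports that more results remain, which matters once an account holds more functions than fit in one response.

```python
def all_function_names(list_page):
    """Collect function names across every page returned by list_page(marker)."""
    names = []
    marker = None
    while True:
        page = list_page(marker)
        names.extend(f["FunctionName"] for f in page["Functions"])
        marker = page.get("NextMarker")
        if not marker:
            return names


def plan_lambda_action(existing_names, name, update=False):
    """Return 'update', 'reuse', or 'create' for the named function."""
    if name in existing_names:
        return "update" if update else "reuse"
    return "create"


# Stand-in for lamb.list_functions(): two hypothetical pages of results.
fake_pages = {
    None: {"Functions": [{"FunctionName": "alpha"}], "NextMarker": "page-2"},
    "page-2": {"Functions": [{"FunctionName": "example_CreateThumbnail"}]},
}
names = all_function_names(lambda marker: fake_pages[marker])
print(plan_lambda_action(names, "example_CreateThumbnail", update=True))
```

With the real client, `list_page` could call `lamb.list_functions()` for the first page and `lamb.list_functions(Marker=marker)` for later ones.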

In [None]:
def create_function(name, bucket, key, lsize=512, timeout=120, update=False):
    """ Create, or update if exists, lambda function """
    print("role:",role)

    if name in [f['FunctionName'] for f in lamb.list_functions()['Functions']]:
        if update:
            print('Updating %s lambda function code' % (name))
            return lamb.update_function_code(FunctionName=name, S3Bucket=bucket, S3Key=key)
        else:
            print('Lambda function %s exists' % (name))
            for f in lamb.list_functions()['Functions']:
                if f['FunctionName'] == name:
                    lfunc = f
    else:
        print('Creating %s lambda function' % (name))
        lfunc = lamb.create_function(
            FunctionName=name,
            Runtime='python3.6',
            Role=role['Arn'],
            Handler=lambda_name+'.handler',
            Description='lambda to monitor S3 events and process the images',
            Timeout=timeout,
            MemorySize=lsize,
            Publish=True,
#                 Code={'ZipFile': zipfile.read()},
            Code={'S3Bucket':bucket,  # <pawprint>-bucket-module4
                  'S3Key':key},
        )
    lfunc['Role'] = role
    return lfunc

In [None]:
# Call create_function() to create the lambda. The parameter update=True will ensure the existing lambda is updated with the 
# supplied code in the zip file.

lfunc = create_function(lambda_name,bucket="pawprint-bucket-module4", key="pawprint_CreateThumbnail.zip", update=True)

### Test the Lambda Function (Invoke Manually)

Invoke the Lambda function manually using sample Amazon S3 event data. Following is the Amazon S3 sample event data. 

**Update the JSON by providing your source bucket name. Replace pawprint.source with the value of the source variable.**
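Instead of editing the JSON by hand, the payload can also be generated from the bucket name. The sketch below is one possible approach, not the notebook's method: `build_s3_put_event` is a hypothetical helper and it emits only a subset of the sample event's fields, which is enough for a handler that reads the bucket name and object key.

```python
import json


def build_s3_put_event(bucket, key, region="us-east-1", size=1024):
    """Return a minimal S3 ObjectCreated:Put event as UTF-8 encoded JSON."""
    event = {
        "Records": [{
            "eventVersion": "2.0",
            "eventSource": "aws:s3",
            "awsRegion": region,
            "eventTime": "1970-01-01T00:00:00.000Z",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "s3SchemaVersion": "1.0",
                "bucket": {"name": bucket, "arn": "arn:aws:s3:::" + bucket},
                "object": {"key": key, "size": size},
            },
        }]
    }
    return json.dumps(event).encode("utf-8")


# "example.source" is a placeholder; in the notebook, pass the source variable.
payload = build_s3_put_event("example.source", "source.jpg")
print(payload[:60])
```

The returned bytes have the same type as the `input_data` literal, so they could be passed as the `Payload` argument of the invoke call.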

In [None]:
# Replace pawprint.source with the value in the source variable

input_data=b"""{  
   "Records":[  
      {  
         "eventVersion":"2.0",
         "eventSource":"aws:s3",
         "awsRegion":"us-east-1",
         "eventTime":"1970-01-01T00:00:00.000Z",
         "eventName":"ObjectCreated:Put",
         "userIdentity":{  
            "principalId":"AIDAJDPLRKLG7UEXAMPLE"
         },
         "requestParameters":{  
            "sourceIPAddress":"127.0.0.1"
         },
         "responseElements":{  
            "x-amz-request-id":"C3D13FE58DE4C810",
            "x-amz-id-2":"FMyUVURIY8/IgAtTv8xRjskZQpcIZ9KG4V5Wp6S7S/JRWeUWerMUE5JgHvANOjpD"
         },
         "s3":{  
            "s3SchemaVersion":"1.0",
            "configurationId":"testConfigRule",
            "bucket":{  
               "name":"pawprint.source",
               "ownerIdentity":{  
                  "principalId":"A3NL1KOZZKExample"
               },
               "arn":"arn:aws:s3:::pawprint.source"  
            },
            "object":{  
               "key":"source.jpg",
               "size":1024,
               "eTag":"d41d8cd98f00b204e9800998ecf8427e",
               "versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko"
            }
         }
      }
   ]
}"""

In [None]:
# Invoke the lambda manually

response = lamb.invoke(
    FunctionName=lambda_name,
    InvocationType='RequestResponse',
    LogType='Tail',
    Payload=input_data
)

----

If the cell above ran successfully, the test passed. You should see an image written to the sourceresized bucket in S3. See below for sample output.

<img src="../images/size_comparision.PNG">
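A synchronous invoke can return status code 200 even when the function itself raised an error, so it is worth inspecting the response. The sketch below assumes the response layout of a `RequestResponse` invocation with `LogType='Tail'`: a `FunctionError` key appears only on failure and `LogResult` holds base64-encoded log lines. `summarize_invoke_response` and the fake response are hypothetical.

```python
import base64


def summarize_invoke_response(response):
    """Report whether an invoke response indicates success, plus the log tail."""
    log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8", "replace")
    # FunctionError is only present when the function code failed.
    succeeded = response.get("StatusCode") == 200 and "FunctionError" not in response
    return {"succeeded": succeeded, "log_tail": log_tail}


# Fake response for illustration; a real one comes from lamb.invoke(...).
fake_response = {
    "StatusCode": 200,
    "FunctionError": "Unhandled",
    "LogResult": base64.b64encode(b"ImportError: cannot import name '_imaging'").decode("ascii"),
}
summary = summarize_invoke_response(fake_response)
print(summary["succeeded"])
print(summary["log_tail"])
```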

### Add an Event Source (Configure Amazon S3 to Publish Events)


In this step, you add the remaining configuration so that Amazon S3 can publish object-created events to AWS Lambda and invoke the Lambda function. You do the following in this step:

- Add permissions to the Lambda function access policy to allow Amazon S3 to invoke the function.


- Add notification configuration to your source bucket. In the notification configuration, you provide the following:
    
    - Event type for which you want Amazon S3 to publish events. For this tutorial, you specify the s3:ObjectCreated:* event type so that Amazon S3 publishes events when objects are created.

    - Lambda function to invoke.
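The permission described in the first item can be assembled from the bucket name so that nothing is typed by hand. This is a sketch under stated assumptions: `build_s3_invoke_permission` is a hypothetical helper, and the function name, bucket name, and account ID shown are placeholders. The keys match the arguments the notebook passes to `add_permission`.

```python
import time


def build_s3_invoke_permission(function_name, bucket, account_id, prefix="s3invoke"):
    """Return keyword arguments that let S3 invoke the given Lambda function."""
    return {
        "FunctionName": function_name,
        # A timestamp keeps the statement ID unique across repeated runs.
        "StatementId": "{}-{}".format(prefix, time.strftime("%d%m%Y%H%M%S")),
        "Action": "lambda:InvokeFunction",
        "Principal": "s3.amazonaws.com",
        # S3 bucket ARNs carry no region or account segment.
        "SourceArn": "arn:aws:s3:::" + bucket,
        "SourceAccount": account_id,
    }


# Placeholder values for illustration only.
permission = build_s3_invoke_permission(
    "example_CreateThumbnail", "example.source", "123456789012")
print(permission["SourceArn"])
```

With the real client, the dictionary could be unpacked into the call, for example `lamb.add_permission(**permission)`.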

In [None]:
response = lamb.add_permission(
    FunctionName=lambda_name,
    StatementId=time.strftime(system_user_name+"%d%m%Y%H%M%S"), # some-unique-id 
    Action='lambda:InvokeFunction', 
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::' + source,   # ARN of the source bucket, built from the source variable
    SourceAccount='714861692883'   # bucket-owner-account-id
)

Verify the function's access policy by calling get_policy(), the boto3 counterpart of the AWS CLI get-policy command.

In [None]:
response = lamb.get_policy(
    FunctionName=lambda_name
)
response

### Configure Notification on the Bucket

Add notification configuration on the source bucket to request Amazon S3 to publish object-created events to Lambda. In the configuration, you specify the following:

- Event type – For this notebook, it's the ObjectCreated (All) Amazon S3 event type.


- Lambda function – Your Lambda function that you want Amazon S3 to invoke.
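The notification configuration is a plain dictionary, so it can be built and inspected before it is sent. The sketch below adds an optional suffix filter so that only .jpg and .png uploads trigger the function; this filter is an assumption beyond what the notebook configures. `build_notification_config` and the ARN are hypothetical. S3 accepts one suffix rule per configuration, hence one entry per suffix.

```python
def build_notification_config(function_arn, suffixes=(".jpg", ".png")):
    """Return a notification configuration with one Lambda entry per suffix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "NewImage" + suffix.replace(".", "-"),
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                # Only keys ending in this suffix produce an event.
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "suffix", "Value": suffix}
                ]}},
            }
            for suffix in suffixes
        ]
    }


# Placeholder ARN for illustration only.
config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:example_CreateThumbnail")
print([c["Id"] for c in config["LambdaFunctionConfigurations"]])
```

The result has the shape expected by the `NotificationConfiguration` argument of `put_bucket_notification_configuration`.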

In [None]:
response = lamb.get_function(
    FunctionName=lambda_name
)

In [None]:
response["Configuration"]["FunctionArn"]

In [None]:
s3_client = boto3.client('s3', 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

response = s3_client.put_bucket_notification_configuration(
    Bucket=source,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'Id': 'NewImage',
                'LambdaFunctionArn': response["Configuration"]["FunctionArn"],
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)

### Test the Setup

Upload .jpg or .png objects to the source bucket. Verify that the CreateThumbnail Lambda function created the thumbnail in the target bucket. 

In [None]:
s3.Object(source, 'test_image.jpg').put(Body=open('../images/test_image.jpg', 'rb'))

### Delete SSH Keypair

In [None]:
# Delete SSH Keypair

try:
    os.remove(ec2_pem_file+'.pem')
    print('Local Key Deleted')
except:
    print('Local Key Not Found')
    
response = ec2.delete_key_pair(KeyName=ec2_pem_file)
print('\nAWS Metadata: ')
print('http Status Code : '+str(response['ResponseMetadata']['HTTPStatusCode']))
print('Request ID       : '+response['ResponseMetadata']['RequestId'])
print('Retries          : '+str(response['ResponseMetadata']['RetryAttempts']))

### Terminate the EC2 instance

In [None]:
ec2 = boto3.resource('ec2', region_name=region, 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)

ec2.instances.filter(InstanceIds=[new_instance_id,]).terminate()

### Delete the security group

Run the polling function to make sure the instance is terminated. You can't delete the security group while a running instance is using it. 

In [None]:
import random
import time

ec2 = boto3.client('ec2', region_name=region, 
                   aws_access_key_id = access_id, 
                   aws_secret_access_key = access_key)
poll_until_completed(ec2, new_instance_id)  # Can't delete the security group until the instance is terminated

In [None]:
SG_delete_response = ec2.delete_security_group(
    GroupId=Sec_group,
)


# Save your notebook!

## <span style="color:red">Note: Don't delete the buckets</span>

Leave them for evaluation

In [None]:
# executing this cell will delete the source bucket 
#import boto3
#source = 'pawprint.source'
#s3 = boto3.resource('s3', 
#                   aws_access_key_id = access_id, 
#                   aws_secret_access_key = access_key)
#bucket = s3.Bucket(source)

#for key in bucket.objects.all():
#     key.delete()
#bucket.delete()

In [None]:
# executing this cell will delete the source resized bucket 
#sourceresized = 'pawprint.sourceresized'
#bucket = s3.Bucket(sourceresized)

#for key in bucket.objects.all():
#    key.delete()
#bucket.delete()

In [None]:
# executing this cell will delete the pawprint-bucket-module4 bucket 
#bucket_module4 = 'pawprint-bucket-module4'
#bucket = s3.Bucket(bucket_module4)

#for key in bucket.objects.all():
#    key.delete()
#bucket.delete()