# Amazon Web Services (AWS) Integration with Python

AWS is one of the most widely used cloud service providers.

The standard Python SDK for AWS is `Boto3`. Through this interface, you can create and manage storage and compute resources programmatically from your automated Python scripts.

### Interacting with your AWS account

You will need to enable programmatic access to your AWS account and provide credentials from it. You do this in the Identity and Access Management (IAM) section of your AWS account.

Once these credentials are configured on your machine, they form a default profile that `boto3` uses to interact with your AWS account. We will go over this in class.

There are two main ways to interact with the AWS API: via `clients` and `resources`. Clients provide a lower-level interface to the services, while resources provide a higher-level, object-oriented interface.

The client-based interactions derive from a JSON-based service definition file. Every interaction within AWS is available as a client method. You are more likely to see performance gains by using clients. However, their lower-level nature means that there is more programmatic heavy lifting to do for similar tasks. The syntax of resource-based AWS access is generally more readable and easier to follow because of the higher-level abstractions.
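To make the distinction concrete, here is a minimal sketch of the same task (listing bucket names) written against each interface. The two helper functions are hypothetical names introduced for illustration. They assume they are handed objects created with `boto3.client('s3')` and `boto3.resource('s3')` respectively.

```python
def bucket_names_via_client(s3_client):
    """Client style: call the API operation and unpack the response dictionary."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response["Buckets"]]


def bucket_names_via_resource(s3_resource):
    """Resource style: iterate over Bucket objects and read their attributes."""
    return [bucket.name for bucket in s3_resource.buckets.all()]
```

For the same account, `bucket_names_via_client(boto3.client('s3'))` and `bucket_names_via_resource(boto3.resource('s3'))` should return the same names. The client version works with plain dictionaries, while the resource version works with objects.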

In [3]:
import boto3
boto3.__version__

'1.9.102'

In [2]:
s3_resource = boto3.resource('s3')

### Creating an S3 bucket

One constraint when creating a new S3 bucket is its name. The name must be DNS compliant and unique _across the AWS system_. In other words, you need to come up with a name, between 3 and 63 characters long, that doesn't already exist as an S3 bucket. Even more annoying: you can't use upper-case letters or most special characters (`_` is not allowed, but `-` is).

There are plenty of ways to generate random strings. To reduce the risk of a name-space collision, we'll use the `secrets` module (Python 3.6+):

In [71]:
import secrets

rnd_bucket_name = "autogenerated_s3_bucket_" + secrets.token_urlsafe(nbytes=45)
rnd_bucket_name = rnd_bucket_name.lower().replace("_", "-")[:61]  # Truncate to 61 characters (the S3 limit is 63)
rnd_bucket_name

'autogenerated-s3-bucket-cp5ngtlgtjcgef3lcu2qwbe1qbx-gi-ierkca'
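The naming rules can be checked locally before asking AWS to create anything. The sketch below is a simplified check, not the complete S3 rule set, and both function names are hypothetical. It also shows an alternative generator based on `secrets.token_hex`, which only emits the characters `0-9` and `a-f`, so the name can never end in a hyphen after truncation.

```python
import re
import secrets

# Simplified check: 3-63 characters drawn from lowercase letters, digits,
# '.' and '-', starting and ending with a letter or digit.
# S3 has further naming rules that are not checked here.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if `name` passes the simplified naming check above."""
    return _BUCKET_NAME.fullmatch(name) is not None


def random_bucket_name(prefix="autogenerated-s3-bucket-"):
    """Build a name from `prefix` plus random hex digits, capped at 63 characters."""
    # token_hex(32) yields 64 hex characters, so the slice always ends on a hex digit
    return (prefix + secrets.token_hex(32))[:63]
```

Passing this check does not guarantee that the name is free. Uniqueness can only be confirmed by AWS when the bucket is created.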

That looks innocuous enough (what is the chance of randomly getting a string that would make a sensible Tweet?).

Let's create an S3 bucket with this name:

In [72]:
s3_resource.create_bucket(Bucket=rnd_bucket_name, CreateBucketConfiguration={"LocationConstraint": "us-west-2"})

s3.Bucket(name='autogenerated-s3-bucket-cp5ngtlgtjcgef3lcu2qwbe1qbx-gi-ierkca')
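One detail to watch when changing the region: `us-east-1` is the default location, and it is requested by leaving `CreateBucketConfiguration` out rather than naming it as a `LocationConstraint`. A small helper (a hypothetical name, shown as a sketch) can build the right keyword arguments for either case:

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for create_bucket() in the given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is selected by omitting the constraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

The call above would then read `s3_resource.create_bucket(**create_bucket_kwargs(rnd_bucket_name, "us-west-2"))`.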

### Loading data onto an S3 bucket

So we have an S3 bucket as a resource. Time to start populating it!

In [75]:
s3_resource.Bucket(rnd_bucket_name).upload_file(Filename="../data/USD_comparison.json", Key="remote_USD_comparison.json")

**Note:** S3 partitions data by key prefix, and request-rate limits apply per prefix. If you need high request throughput and many of your keys share one prefix, spreading them across several prefixes (for example, by adding a random component) helps avoid saturation and load-balancing issues.
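One way to spread keys over many prefixes is to prepend a short hash of the key itself. This is a sketch under that assumption: `spread_key` is a hypothetical helper, and the four-character prefix length is an arbitrary choice. Because the hash is deterministic, the same filename always maps to the same key, so the object can be found again later.

```python
import hashlib


def spread_key(key, prefix_length=4):
    """Prepend a short, deterministic hash of `key` so keys spread over many prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_length]}/{key}"
```

The upload above could then pass `Key=spread_key("remote_USD_comparison.json")`.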

### Querying files in an S3 bucket

In [97]:
for obj in s3_resource.Bucket(rnd_bucket_name).objects.all():
    print(obj.key)

remote_USD_comparison.json
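When a bucket holds many files, it is often more useful to collect the keys than to print them, and to restrict the listing to one prefix. The sketch below assumes it is given a `Bucket` object such as `s3_resource.Bucket(rnd_bucket_name)`. The function name `list_keys` is hypothetical.

```python
def list_keys(bucket, prefix=""):
    """Return (key, size in bytes) for each object, optionally limited to a key prefix."""
    if prefix:
        objects = bucket.objects.filter(Prefix=prefix)
    else:
        objects = bucket.objects.all()
    return [(obj.key, obj.size) for obj in objects]
```

For example, `list_keys(s3_resource.Bucket(rnd_bucket_name), prefix="remote_")` would return only the keys that start with `remote_`.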

### Downloading data from an S3 bucket


This is the inverse of the upload operation above. It gives you programmatic access to data from across the S3 universe.

In [76]:
s3_resource.Bucket(rnd_bucket_name).download_file(Filename="../data/USD_comparison_from_s3.json", Key="remote_USD_comparison.json")

In [77]:
! ls ../data/*.json 

../data/USD_comparison_from_s3.json  ../data/USD_comparison.json


Voila! 

Just don't forget that AWS charges for requests and for data transferred out of S3. These can add up.
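If the data is only needed in memory, you can skip the local file and read the object's body directly. This is a minimal sketch that assumes the object holds UTF-8 encoded JSON, as our uploaded file does. The helper name `load_json_object` is hypothetical, and the request still counts towards the charges mentioned above.

```python
import json


def load_json_object(s3_object):
    """Read an S3 object's body into memory and parse it as JSON."""
    body = s3_object.get()["Body"]  # a file-like stream of bytes
    return json.loads(body.read().decode("utf-8"))
```

With the bucket from earlier, this would be called as `load_json_object(s3_resource.Object(rnd_bucket_name, "remote_USD_comparison.json"))`.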

### Transferring data between S3 buckets

You may wish to transfer data between buckets. For this, there is the `.copy()` method.

First, let's create another S3 bucket:

In [88]:
rnd_bucket_name2 = "autogenerated_s3_bucket_" + secrets.token_urlsafe(nbytes=50)
rnd_bucket_name2 = rnd_bucket_name2.lower().replace("_", "-")[:61]  # Truncate to 61 characters (the S3 limit is 63)
rnd_bucket_name2


s3_resource.create_bucket(Bucket=rnd_bucket_name2, CreateBucketConfiguration={"LocationConstraint": "us-west-2"})

s3.Bucket(name='autogenerated-s3-bucket-ekb2ymf8bvxgscolasvnf3ce75lg5px08uvab')

Let's make a copy of the file we uploaded. We set the location of the S3 resource to be copied (S3 bucket name and key of the file) as a dictionary, and give the key for the copied file in the new bucket:

In [89]:
s3_resource.Bucket(rnd_bucket_name2).copy({"Bucket": rnd_bucket_name, "Key": "remote_USD_comparison.json"}, 
                                         "USD_comparison_copied.json")

### Delete a file from an S3 bucket

Through the S3 resource API, we can access either the `Bucket` or the relevant `Object`. We can simply use the latter to delete the file in question: 

In [92]:
s3_resource.Object(rnd_bucket_name2, "USD_comparison_copied.json").delete()

{'ResponseMetadata': {'RequestId': '06C884051BF9B813',
  'HostId': 'Rqio76N1RyCvBI+Jj4uC8828p7HoFELlkUgnJjAtWHlQHue6bpqcatWZNEzCGeKV79nkYJEQoOU=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'Rqio76N1RyCvBI+Jj4uC8828p7HoFELlkUgnJjAtWHlQHue6bpqcatWZNEzCGeKV79nkYJEQoOU=',
   'x-amz-request-id': '06C884051BF9B813',
   'date': 'Sat, 27 Jul 2019 23:53:41 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

### Destroy an S3 bucket

You can only destroy an empty S3 bucket. You can do so using the `.delete()` method available to `Bucket` objects:

In [93]:
s3_resource.Bucket(rnd_bucket_name2).delete()

{'ResponseMetadata': {'RequestId': '83557F5BFE6886EC',
  'HostId': 'kEiVn4DGmAIEP/4ZUbz3lobJUYBLStnbUSHja/pIIuK3dYhag91DWHcd9o72HDW4Lv8eCLGH5Po=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'kEiVn4DGmAIEP/4ZUbz3lobJUYBLStnbUSHja/pIIuK3dYhag91DWHcd9o72HDW4Lv8eCLGH5Po=',
   'x-amz-request-id': '83557F5BFE6886EC',
   'date': 'Sat, 27 Jul 2019 23:55:30 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}

Here, we knew that this second bucket was empty after we removed the single file. If it weren't, here is a useful function to destroy a bucket and everything in it:

In [94]:
from botocore.exceptions import ClientError

def destroy_bucket(resource_name, bucket_name):
    """Remove an AWS S3 bucket in its entirety, including every object version within it.
    
    The .delete_objects() method can delete up to 1,000 objects per API call,
    so the keys are deleted in batches of that size.
    We then remove the bucket itself with .delete()
    """
    bucket = resource_name.Bucket(bucket_name)
    bucket_list = []
    for version in bucket.object_versions.all():
        bucket_list.append({"Key": version.object_key, "VersionId": version.id})
    # Delete in batches of at most 1,000 keys; an empty bucket skips this loop
    for start in range(0, len(bucket_list), 1000):
        bucket.delete_objects(Delete={"Objects": bucket_list[start:start + 1000]})
    try:
        bucket.delete()
        print(f"Bucket {bucket_name} successfully destroyed.")
    except ClientError as err:
        print(f"Bucket {bucket_name} was not able to be destroyed: {err}")
        

With great power comes great responsibility. Use it wisely.

## Managing EC2 instances

It is a simple matter to connect to the EC2 service:

In [6]:
my_ec2 = boto3.client('ec2')

Through this client, we can retrieve a whole lot of information about the EC2 instances in the account:

In [7]:
print(my_ec2.describe_instances())

{'Reservations': [{'Groups': [], 'Instances': [{'AmiLaunchIndex': 0, 'ImageId': 'ami-59fc7439', 'InstanceId': 'i-02765995957c8d047', 'InstanceType': 't2.micro', 'KeyName': 'mykeypair', 'LaunchTime': datetime.datetime(2017, 3, 31, 18, 5, 4, tzinfo=tzutc()), 'Monitoring': {'State': 'disabled'}, 'Placement': {'AvailabilityZone': 'us-west-2a', 'GroupName': '', 'Tenancy': 'default'}, 'Platform': 'windows', 'PrivateDnsName': 'ip-172-31-34-183.us-west-2.compute.internal', 'PrivateIpAddress': '172.31.34.183', 'ProductCodes': [], 'PublicDnsName': '', 'State': {'Code': 80, 'Name': 'stopped'}, 'StateTransitionReason': 'User initiated (2017-03-31 18:30:58 GMT)', 'SubnetId': 'subnet-972398de', 'VpcId': 'vpc-30014b57', 'Architecture': 'x86_64', 'BlockDeviceMappings': [], 'ClientToken': 'JkSvK1490983503512', 'EbsOptimized': False, 'EnaSupport': True, 'Hypervisor': 'xen', 'IamInstanceProfile': {'Arn': 'arn:aws:iam::471228041336:instance-profile/myRole', 'Id': 'AIPAIR4OK3RJ35KGTIW5C'}, 'NetworkInterfac
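The raw response is hard to read, so it helps to pull out just the fields of interest. The sketch below walks the `Reservations` and `Instances` lists visible in the output above and keeps the ID, type and state of each instance. The function name `summarize_instances` is hypothetical.

```python
def summarize_instances(response):
    """Flatten a describe_instances() response into one small dict per instance."""
    summary = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            summary.append({
                "id": instance["InstanceId"],
                "type": instance["InstanceType"],
                "state": instance["State"]["Name"],
            })
    return summary
```

Calling `summarize_instances(my_ec2.describe_instances())` on the response shown above would give a single entry for the stopped `t2.micro` instance.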

The instance-level methods expect a list of instance IDs in the `InstanceIds` parameter. Passing a bare string fails parameter validation before any request is sent:

In [14]:
response = my_ec2.monitor_instances(InstanceIds='InstanceId')

ParamValidationError: Parameter validation failed:
Invalid type for parameter InstanceIds, value: InstanceId, type: <class 'str'>, valid types: <class 'list'>, <class 'tuple'>

Wrapping the ID in a list works. Here we enable detailed monitoring for our instance:

In [18]:
instance_id = "i-02765995957c8d047"
response = my_ec2.monitor_instances(InstanceIds=[instance_id])

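We can check whether a request to start this EC2 instance would be permitted, without actually starting it, using a _dry run_. Note that a successful dry run is still reported as a `ClientError`, with the error code `DryRunOperation`: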

In [19]:
my_ec2.start_instances(InstanceIds=[instance_id], DryRun=True)

ClientError: An error occurred (DryRunOperation) when calling the StartInstances operation: Request would have succeeded, but DryRun flag is set.

In [20]:
response = my_ec2.start_instances(InstanceIds=[instance_id], DryRun=False)
print(response)

ClientError: An error occurred (InvalidParameterValue) when calling the StartInstances operation: Invalid value 'i-02765995957c8d047' for instanceId. Instance does not have a volume attached at root (/dev/sda1)

The dry run only confirms that the request itself is permitted. The real call still fails here because, as the error message says, this stopped instance has no volume attached at its root device.

## Conclusion

We saw how to programmatically interact with Amazon Web Services (AWS) using Python's `boto3` library. We needed to set up credentials through Identity and Access Management (IAM) to allow this access, and were then able to manipulate S3 buckets and EC2 compute instances.