# Amazon Web Services (AWS)

### Learning Objectives

- Describe core AWS services & concepts
- Configure your laptop to use AWS
- Use an SSH key to access EC2 instances
- Launch & access EC2
- Access S3

### AWS Storage + Execution

What are the primary services that AWS offers?

Name | Full Name | Service
---|---|---
S3 | Simple Storage Service | Storage
EC2 | Elastic Compute Cloud | Execution
EBS | Elastic Block Store | Storage attached to EC2 instances

### Pop Quiz

<details>
<summary>Q: I want to store some video files on the web. Which Amazon service should I use?</summary>
A: S3
</details>

<details>
<summary>Q: I just created an iPhone app which needs to store user profiles on the web somewhere. Which Amazon service should I use?</summary>
A: S3
</details>

<details>
<summary>Q: I want to create a web application that uses JavaScript in the backend along with a MongoDB database. Which Amazon service should I use?</summary>
A: S3 + EC2 + EBS
</details>

### S3 vs. EBS

What is the difference between S3 and EBS? Why would I use one versus the other?

Feature | S3 | EBS
---|---|---
Can be accessed from | Anywhere on the web;<br/>any EC2 instance | Specific availability zone;<br/>EC2 instance attached to it
Pricing | Less expensive;<br/>Storage (3¢/GB);<br/>Use (1¢/10,000 requests) | More expensive;<br/>Storage (3¢/GB) [+ IOPS]
Latency | Higher | Lower
Throughput | Usually more | Usually less
Performance | Slightly worse | Slightly better
Max volume size | Unlimited | 16 TB
Max file size | 5 TB | 16 TB
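
To make the pricing row concrete, here is a small arithmetic sketch using the S3 rates quoted in the table (3¢ per GB stored, 1¢ per 10,000 requests). The function name and the workload figures are illustrative assumptions, the rates are treated as monthly, and actual AWS prices vary by region and change over time.

```python
def s3_monthly_cost_cents(stored_gb, requests):
    """Estimate a monthly S3 bill in cents from the rates quoted in the table."""
    storage_cents = stored_gb * 3        # 3 cents per GB stored
    request_cents = requests / 10_000    # 1 cent per 10,000 requests
    return storage_cents + request_cents

# Hypothetical workload: 100 GB stored and 1,000,000 requests in a month.
cost = s3_monthly_cost_cents(100, 1_000_000)
print("${:.2f}".format(cost / 100))  # prints $4.00
```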

### Pop Quiz

<details>
<summary>Q: What is latency?</summary>
A: Latency is the time it takes between making a request and the start of a response.
</details>

<details>
<summary>Q: Which is better?  Higher latency or lower?</summary>
A: Lower is better.
</details>

<details>
<summary>Q: Why is S3 latency higher than EBS?</summary>
A: One reason is that an EBS volume is in the same availability zone as the EC2 instance it is attached to, while S3 is reached over the web.
</details>

# Leveraging S3

### Buckets and Files

What is a bucket?
- A bucket is a container for files.
- Think of a bucket as a logical grouping of files like a sub-domain.
- A bucket can contain an arbitrary number of files.

How large can a file in a bucket be?
- A file in a bucket can be 5 TB.

### Bucket Names

What are best practices on naming buckets?
- Bucket names must be unique across all of S3.
- Bucket names must be at least 3 and no more than 63 characters long.
- Bucket names must be a series of one or more labels, separated by a single period. 
- Bucket names can contain lowercase letters, numbers, and hyphens. 
- Each label must start and end with a lowercase letter or a number.
- Bucket names must _not_ be formatted as an IP address (e.g., 192.168.5.4).

What are some examples of valid bucket names?
- `myawsbucket`
- `my.aws.bucket`
- `myawsbucket.1`

What are some examples of invalid bucket names? 
- `.myawsbucket`
- `myawsbucket.`
- `my..examplebucket`

### Pop Quiz

<details>
<summary>Q: Why are these bucket names invalid?</summary>
A: Bucket names cannot start or end with a period, and they cannot have multiple periods next to each other.
</details>
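
The naming rules above can be checked locally before calling AWS. The following is a minimal sketch, assuming only the rules listed in this section; the function name `is_valid_bucket_name` is hypothetical, and the global-uniqueness rule cannot be checked without asking S3.

```python
import re

# One label: lowercase letters, digits, and hyphens, starting and ending
# with a letter or digit.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")
# Four dot-separated groups of digits, e.g. 192.168.5.4.
IP_LIKE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def is_valid_bucket_name(name):
    """Check a bucket name against the rules listed in this section."""
    if not 3 <= len(name) <= 63:
        return False
    if IP_LIKE.fullmatch(name):
        return False
    # Every period-separated label must be well formed; an empty label
    # (leading, trailing, or doubled period) fails this check.
    return all(LABEL.fullmatch(label) for label in name.split("."))

for name in ["myawsbucket", "my.aws.bucket", "myawsbucket.1",
             ".myawsbucket", "myawsbucket.", "my..examplebucket"]:
    print(name, is_valid_bucket_name(name))
```

The first three names print `True` and the last three print `False`, matching the valid and invalid examples above.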

# Python - AWS Integration with boto & boto3
http://boto3.readthedocs.io/en/latest/guide/quickstart.html

> Boto is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2. Boto provides an easy to use, object-oriented API as well as low-level direct service access.

### Step 0: Credentials


Get your access key and secret key from the `credentials.csv` that you downloaded from AWS. (https://console.aws.amazon.com/iam/home?region=us-east-1#/users)

Create a file called `~/.aws/credentials` (on Linux/Mac) or `%USERPROFILE%\.aws\credentials` (on Windows), and insert the following code into it. Replace `ACCESS_KEY` and `SECRET_KEY` with the S3 keys you got from Amazon.
  
```
[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
```
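
The credentials file is a plain INI file, so you can sanity-check its layout with Python's standard `configparser` before involving Boto. This is a sketch only: the helper name `parse_aws_credentials` is hypothetical, and Boto reads the file on its own without any such helper.

```python
import configparser

def parse_aws_credentials(text, profile="default"):
    """Return (access_key, secret_key) for one profile of a credentials file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[profile]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]

# Placeholder contents, identical to the template above.
sample = """
[default]
aws_access_key_id = ACCESS_KEY
aws_secret_access_key = SECRET_KEY
"""
access_key, secret_key = parse_aws_credentials(sample)
print(access_key, secret_key)  # prints ACCESS_KEY SECRET_KEY
```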

### Step 1: Create a Connection to S3

In [1]:
# Boto 2.x
import boto
boto_connection = boto.connect_s3()

# Boto 3
import boto3
boto3_connection = boto3.resource('s3')

Check contents of existing buckets

In [2]:
# Boto 2.x
def print_s3_contents_boto(connection):
    for bucket in connection:
        for key in bucket:
            print(key.name)

# Boto 3
def print_s3_contents_boto3(connection):
    for bucket in connection.buckets.all():
        for key in bucket.objects.all():
            print(key.key)

In [3]:
print_s3_contents_boto(boto_connection)
print_s3_contents_boto3(boto3_connection)

Nothing there yet

### Step 2: Create a Bucket

In [4]:
import os
username = os.environ['USER']
bucket_name = username + "-new-bucket"

# Boto 2.x
boto_connection.create_bucket(bucket_name)
boto_connection.create_bucket(bucket_name)

# Boto 3
boto3_connection.create_bucket(Bucket=bucket_name)
boto3_connection.create_bucket(Bucket=bucket_name)

s3.Bucket(name='elliotcohen-new-bucket')

### Step 3: Access a Bucket

In [5]:
# Boto 2.x
bucket = boto_connection.get_bucket(bucket_name, validate=False)
exists = boto_connection.lookup(bucket_name)
if exists:
    file = bucket.new_key('hello-boto.txt')
    file.set_contents_from_string('Hello world from boto!')
    
print_s3_contents_boto(boto_connection)

hello-boto.txt


In [6]:
# Boto 3
import botocore
bucket = boto3_connection.Bucket(bucket_name)
exists = True
try:
    boto3_connection.meta.client.head_bucket(Bucket=bucket_name)
    boto3_connection.Object(bucket_name, 'hello-boto3.txt').put(Body=open('tmp/hello.txt', 'rb'))
except botocore.exceptions.ClientError as e:
    # If a client error is thrown, then check that it was a 404 error.
    # If it was a 404 error, then the bucket does not exist.
    # Compare as strings: codes such as 'AccessDenied' are not numeric.
    error_code = e.response['Error']['Code']
    if error_code == '404':
        exists = False
        
print_s3_contents_boto3(boto3_connection)

hello-boto.txt
hello-boto3.txt
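
The `except` branch in the Boto 3 cell hinges on reading the error code out of the exception's response dictionary. Here is a sketch that isolates that check so it can be tried without AWS. The helper name `is_missing_bucket` is hypothetical, and the dictionaries only mimic the shape of `e.response` used above.

```python
def is_missing_bucket(error_response):
    """Return True if a ClientError-style response dict reports a missing bucket."""
    # Codes are compared as strings because not every code is numeric.
    code = str(error_response.get("Error", {}).get("Code", ""))
    return code in ("404", "NoSuchBucket")

print(is_missing_bucket({"Error": {"Code": "404"}}))           # prints True
print(is_missing_bucket({"Error": {"Code": "AccessDenied"}}))  # prints False
```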


### Step 4: Control Access to a Bucket and its Contents

By default, can we access our newly created file on S3?

In [7]:
# let's find out...
def s3_url(bucket, key):
    return 'http://s3.amazonaws.com/{}/{}'.format(bucket, key)

print(s3_url(bucket_name, 'hello-boto.txt'))

http://s3.amazonaws.com/elliotcohen-new-bucket/hello-boto.txt


In [8]:
# Boto 2.x
bucket = boto_connection.get_bucket(bucket_name, validate=False)
key = bucket.get_key('hello-boto.txt')
bucket.set_acl('public-read')
key.set_acl('public-read')

# Boto 3
bucket = boto3_connection.Bucket(bucket_name)
obj = boto3_connection.Object(bucket_name,'hello-boto3.txt')
bucket.Acl().put(ACL='public-read')
obj.Acl().put(ACL='public-read')

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',
   'date': 'Mon, 30 Oct 2017 04:28:16 GMT',
   'server': 'AmazonS3',
   'x-amz-id-2': 'EVfTc1lY+aZj41q2lnkY2+xZWTq/uLLUZZ6wCtykhf1oduJ+wGsbJNc/lLPTasHksUIj0TD01/M=',
   'x-amz-request-id': '0D79BEA5BEDAB844'},
  'HTTPStatusCode': 200,
  'HostId': 'EVfTc1lY+aZj41q2lnkY2+xZWTq/uLLUZZ6wCtykhf1oduJ+wGsbJNc/lLPTasHksUIj0TD01/M=',
  'RequestId': '0D79BEA5BEDAB844',
  'RetryAttempts': 0}}

In [9]:
# Now let's try again
print(s3_url(bucket_name, 'hello-boto.txt'))

http://s3.amazonaws.com/elliotcohen-new-bucket/hello-boto.txt


### Step 5: Delete a Bucket

In [10]:
# Boto 2.x
bucket = boto_connection.get_bucket(bucket_name, validate=False)
for key in bucket:
    key.delete()
bucket.delete()

# # Boto 3
# bucket = s3.Bucket(bucket_name)
# for key in bucket.objects.all():
#     key.delete()
# bucket.delete()

Check if we have deleted our bucket and its contents

In [11]:
print_s3_contents_boto(boto_connection)
print_s3_contents_boto3(boto3_connection)

## Success! 
We have successfully connected to S3, created a new bucket, added content, deleted that content, and deleted the bucket itself.

# Common Issues and What To Do About Them
First, let's review the key steps from above.

In [12]:
# Step 1: Connect to s3
import boto
conn = boto.connect_s3()
conn

# Step 2: List all existing buckets
conn.get_all_buckets()

# Step 3: Create a new bucket
import os
username = os.environ['USER']
bucket_name = username + "-new-bucket"
bucket = conn.create_bucket(bucket_name)

# Step 4: Connect to bucket
bucket = conn.get_bucket(bucket_name, validate=False)

# Step 5: View contents
for key in bucket:
    print(key.name)

### Issue 1:  Boto is not able to find my credentials

**Q**: Boto is not able to find my credentials.  
**A**: Upgrade Boto. Older versions of Boto were not able to read the credentials file.

In [13]:
boto.__version__

'2.48.0'

### Issue 2: How do I read/write numeric data to S3?
**Q**: Previously we saw how to read/write a plain-text file to S3, but what about quantitative data?  
**A**: We encode the data as text (CSV) along the way, but we can still retrieve it faithfully and easily.

Recall how we read/write a text file

In [14]:
# Here's how we read/write a plain text file
file = bucket.new_key('hello-world.txt')
file.set_contents_from_string('Hello World!')

# List files again; new file should appear
bucket.get_all_keys()

# Retrieve file from a bucket
retrieved_file = bucket.get_key('hello-world.txt')
retrieved_file.get_contents_as_string()

b'Hello World!'

Now let's try with numeric data

In [15]:
# first, create some data locally
import numpy as np

n = 1000
mat = np.random.randint(-360,360, n*4).reshape(n,4)
mat[np.where(mat==0)] = 1

# and write it to csv
np.savetxt("random_coordinates.csv", mat, delimiter=",")

# here's the first ten rows
mat[:10, :]

array([[ 351, -340, -163, -354],
       [-143,  280,  221, -358],
       [ 111, -349, -261,  -76],
       [ -17, -239, -310,  -77],
       [ 310,  264, -109,  154],
       [-261,  198, -259, -187],
       [  51, -158,  -60,   56],
       [ 137,  -72,  241,  332],
       [-187,   37,  192,  179],
       [ 189,  -11,  -84,  196]])

In [16]:
# write data from local file to s3
file = bucket.new_key('random_coordinates.csv')
file.set_contents_from_filename('random_coordinates.csv')

101991

In [17]:
# Retrieve data from s3
retrieved_file = bucket.get_key('random_coordinates.csv')

In [18]:
# how do we get the data out?
c = retrieved_file.get_contents_as_string()
type(c)

bytes

In [19]:
import pandas as pd
import io
df = pd.read_csv(io.BytesIO(c), header=None)
arr = df.values
arr[:10, :]

array([[ 351., -340., -163., -354.],
       [-143.,  280.,  221., -358.],
       [ 111., -349., -261.,  -76.],
       [ -17., -239., -310.,  -77.],
       [ 310.,  264., -109.,  154.],
       [-261.,  198., -259., -187.],
       [  51., -158.,  -60.,   56.],
       [ 137.,  -72.,  241.,  332.],
       [-187.,   37.,  192.,  179.],
       [ 189.,  -11.,  -84.,  196.]])

In [20]:
# confirm nothing was scrambled in transmission
(arr == mat).all()

True
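
The encode/decode step can also be seen in isolation, without S3. The sketch below swaps NumPy and pandas for the standard-library `csv` module, so it is an illustration of the idea rather than the cells above; the helper names are hypothetical. Note that the retrieved array above came back as floats because `np.savetxt` writes floating-point text by default, whereas this sketch parses integers explicitly.

```python
import csv
import io

def rows_to_csv_bytes(rows):
    """Encode a list of integer rows as CSV bytes, as if uploading to S3."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")

def csv_bytes_to_rows(data):
    """Decode CSV bytes (e.g. downloaded from S3) back into integer rows."""
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    return [[int(value) for value in row] for row in reader]

rows = [[351, -340, -163, -354], [-143, 280, 221, -358]]
payload = rows_to_csv_bytes(rows)
print(type(payload).__name__)              # prints bytes
print(csv_bytes_to_rows(payload) == rows)  # prints True
```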

### Issue 3: Creating Buckets With Periods

**Q**: How can I create a bucket in Boto with a period in the name?
- There is a bug in Boto that causes `create_bucket` to fail if the bucket name has a period in it. 
- Try creating the bucket with a period in its name. This should fail.

In [21]:
bucket_name_with_period = bucket_name + ".1.2.3"

try:
    bucket_with_period = conn.create_bucket(bucket_name_with_period)
    bucket_with_period
except Exception as e:
    print("ERROR: {}".format(e))

ERROR: hostname 'elliotcohen-new-bucket.1.2.3.s3.amazonaws.com' doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'


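**A**: Run this code snippet. It disables HTTPS certificate verification for the whole Python process, so use it only for this exercise.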

In [22]:
import ssl

if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context

- Now try creating the bucket with a period in its name and it should work.

In [23]:
bucket_name_with_period = bucket_name + ".1.2.3"

bucket_with_period = conn.create_bucket(bucket_name_with_period)

bucket_with_period

<Bucket: elliotcohen-new-bucket.1.2.3>

Don't forget to delete buckets when you're done with them!

In [24]:
bucket_with_period.delete()

- For more details see <https://github.com/boto/boto/issues/2836>.

### Issue 4: Access Control

Q: I want to access my S3 file from a web browser without giving my access and secret keys. How can I open up access to the file to anyone?
- You can set up Access Control Lists (ACLs) at the level of the bucket or at the level of the individual objects in the bucket (folders, files).

Q: What are the different ACL policies?

ACL Policy | Meaning
---|---
`private` | No one else besides owner has any access rights.
`public-read` | Everyone has read access.
`public-read-write` | Everyone has read/write access.
`authenticated-read` | Registered Amazon S3 users have read access.

Q: What do `read` and `write` mean for buckets and files?
- Read access to a file lets you read the file.
- Read access to a bucket or folder lets you see the names of the files inside it.

#### Pop Quiz

<details>
<summary>Q: If a bucket is `private` and a file inside it is `public-read` can I view it through a web browser?</summary>
A: Yes. Access to the file is only determined by its ACL policy.
</details>

<details>
<summary>Q: If a bucket is `public-read` and a file inside it is `private` can I view the file through a web browser?</summary>
A: No, you cannot. However, if you access the URL for the bucket you will see the file listed.
</details>
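
The two quiz answers can be summarized as a toy model. This is only a sketch of the rules as stated in this section, not of how S3 actually evaluates permissions: it ignores bucket policies, IAM, and account-level settings such as Block Public Access, and the function names are hypothetical.

```python
# Canned ACLs from the table that grant read access to everyone.
PUBLIC_READ_ACLS = {"public-read", "public-read-write"}

def anonymous_can_read_file(bucket_acl, file_acl):
    """Per the quiz answers, only the file's own ACL decides this."""
    return file_acl in PUBLIC_READ_ACLS

def anonymous_can_list_bucket(bucket_acl):
    """Read access to a bucket means seeing the names of the files in it."""
    return bucket_acl in PUBLIC_READ_ACLS

print(anonymous_can_read_file("private", "public-read"))  # prints True
print(anonymous_can_read_file("public-read", "private"))  # prints False
print(anonymous_can_list_bucket("public-read"))           # prints True
```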

### Putting Access Control Into Action
**Q**: How can I make a file available on the web so anyone can read it?  
**A**: Create a file with a specific ACL.

In [25]:
file = bucket.new_key('yet-another-hello-world.txt')
file.set_contents_from_string('Hello World!', policy = 'private')

12

- Try reading the file.

In [26]:
file_url = 'http://s3.amazonaws.com/' + bucket_name + '/yet-another-hello-world.txt'

!curl $file_url

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>9B06D59CE0972453</RequestId><HostId>SlxFASurSZ2UNVzlsBC489zWTiMwWfWQ/gEh0gc1MYtqpAb3X++0pecMnppyn67AFsiiiQRCynk=</HostId></Error>

- Now change its ACL.

In [27]:
file.set_acl('public-read')

!curl $file_url

Hello World!

- You can also try accessing the file through the browser.
- If you do not specify the ACL for a file when you set its contents, the file is `private` by default.

### Issue 5: URL for S3 Files

**Q**: How can I figure out the URL of my S3 file?  
**A**: You can compose the URL using the region, bucket, and filename. 
- For 'N. Virginia' the general template for the URL is `http://s3.amazonaws.com/BUCKET/FILE`.
    - Region-specific endpoint is http://s3-AWSREGION.amazonaws.com/BUCKET/FILE.
- You can also find the URL by looking at the file on the AWS web console.
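
The two URL templates above can be folded into one helper. This is a sketch under the assumption that the templates in this section apply to your bucket; the name `s3_file_url` is hypothetical, and the percent-encoding of the filename is an addition not covered above.

```python
from urllib.parse import quote

def s3_file_url(bucket, key, region="us-east-1"):
    """Compose a file URL from the templates given in this section."""
    if region == "us-east-1":  # N. Virginia uses the general endpoint
        host = "s3.amazonaws.com"
    else:
        host = "s3-{}.amazonaws.com".format(region)
    # quote() percent-encodes characters such as spaces in the filename.
    return "http://{}/{}/{}".format(host, bucket, quote(key))

print(s3_file_url("my-bucket", "hello-world.txt"))
# prints http://s3.amazonaws.com/my-bucket/hello-world.txt
print(s3_file_url("my-bucket", "my file.txt", region="us-west-2"))
# prints http://s3-us-west-2.amazonaws.com/my-bucket/my%20file.txt
```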

### Issue 6: Deleting Buckets

**Q**: How can I delete a bucket?
- Try deleting a bucket containing files. What happens?

In [28]:
try:
    bucket.delete()
except Exception as e:
    print("Error: {}".format(e))

Error: S3ResponseError: 409 Conflict
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>BucketNotEmpty</Code><Message>The bucket you tried to delete is not empty</Message><BucketName>elliotcohen-new-bucket</BucketName><RequestId>F10856313CDE00F0</RequestId><HostId>rRbpeg1xIMJ88+VXgX5ivZ3TScUCSqOGhLmO4MrOy3VQEJo8YGDEM902sZfoRvQ24BDEaHUbEbU=</HostId></Error>


- To delete the bucket, first delete all the files in it.

In [29]:
for key in bucket.get_all_keys(): 
    key.delete()

- Then delete the bucket.

In [30]:
print('Before bucket deletion')
print(conn.get_all_buckets())

bucket.delete()

print('After bucket deletion')
print(conn.get_all_buckets())

Before bucket deletion
[<Bucket: elliotcohen-new-bucket>]
After bucket deletion
[]
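
With Boto 3, the same empty-then-delete order might look like the sketch below. It assumes `bucket` is a Boto 3 `Bucket` resource, for example `boto3.resource('s3').Bucket(bucket_name)`; the helper name is hypothetical, and versioned buckets would need extra handling that is not shown.

```python
def empty_and_delete_bucket(bucket):
    """Delete every object in the bucket, then the bucket itself."""
    # Deleting the bucket first would fail with BucketNotEmpty (409 Conflict),
    # so the objects have to go first.
    bucket.objects.all().delete()
    bucket.delete()
```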


# Now Go Forth and Conquer the Individual Exercise
We have covered EC2 in a previous lecture, so the notes below are for reference only.

## Amazon EC2

### Regions

Q: What are *AWS Regions*?
- AWS is hosted in different geographic locations worldwide. 
- For example, there are 4 regions in the US.

Q: What are the regions in the US?

Region | Name | Location 
---|---|--- 
us-east-1 | US East | N. Virginia
us-east-2 | US East 2 | Ohio
us-west-1 | US West | N. California
us-west-2 | US West 2 | Oregon

Q: How should I choose a region?
- N. Virginia or `us-east-1` is the default region for EC2.
- Using a region other than N. Virginia requires additional configuration.
- If you are not sure choose N. Virginia.

### Availability Zones

Q: What are *AWS Availability Zones*?

- Regions are divided into isolated availability zones for fault tolerance.
- Availability zones run on physically separate hardware and infrastructure.
- They do not share hardware, generators, or cooling equipment. 
- Availability zones are assigned automatically to your EC2 instances based on your user ID.

<img src='assets/aws_regions.png'>

<details>
<summary>Q: Is it possible for two separate users to coordinate and land on the same availability zone?</summary>
1. Availability zones are assigned automatically by the system.
2. It is not possible for two AWS users to coordinate and be hosted on the same availability zone.
</details>

### Connecting to EC2

Q: How can I connect to an EC2 instance?
- Log in to the AWS console.
- Navigate: EC2 > Launch Instance > Community AMIs > Search community AMIs > `ami-a4c7edb2` (An Amazon AMI)
- View the instance and get its Public DNS.
    - This should look something like `ec2-34-229-96-155.compute-1.amazonaws.com`.
- Use this command to connect to it.
    - `ssh -X -i ~/.ssh/keypair.pem user@domain`
    - Here is an example:
        - `ssh -X -i ~/.ssh/keypair.pem ec2-user@ec2-34-229-96-155.compute-1.amazonaws.com`
- Make sure you replace the Public DNS value in the example with the value for your own instance.

### Copying Files to EC2

Q: How can I copy files to the EC2 instance?
- To copy a file `myfile.txt` to EC2, use a command like this.
    - `scp -i ~/.ssh/keypair.pem myfile.txt user@domain:`
- To copy a directory `mydir` recursively to EC2, use a command like this. 
    - `scp -i ~/.ssh/keypair.pem -r mydir user@domain:`
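
If you script these copies from Python, the argument list can be assembled as in the sketch below. The helper name `scp_command` is hypothetical; the resulting list is meant to be passed to `subprocess.run`, which does not expand `~`, hence the call to `os.path.expanduser`.

```python
import os

def scp_command(key_path, source, user, host, recursive=False):
    """Build the scp argument list for copying `source` to the remote home directory."""
    cmd = ["scp", "-i", os.path.expanduser(key_path)]
    if recursive:
        cmd.append("-r")  # needed when `source` is a directory
    cmd += [source, "{}@{}:".format(user, host)]
    return cmd

# Example: copy the directory `mydir` to the instance from the SSH example.
cmd = scp_command("~/.ssh/keypair.pem", "mydir", "ec2-user",
                  "ec2-34-229-96-155.compute-1.amazonaws.com", recursive=True)
print(cmd)
```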

#### Pop Quiz

<details>
<summary>Q: When you copy a file to EC2 with `scp` will this show up in S3?</summary>
A: No. The file will be stored on the disk on the EC2 instance. It will not be in S3.
</details>