# AWS: Amazon Web Services


---

![](https://snag.gy/dFoKAy.jpg)




### Learning Objectives
*After this lesson, you will be able to:*
- Explain what services AWS offers and which ones are relevant to data science.
- Start and terminate an EC2 instance in the cloud.
- Understand how to use the AWS CLI.
- Use EC2 from the command line.

### Student Pre-Work
*Before this lesson, you should already be able to:*
- Set up an account on AWS, using two-factor authentication for security.
- Connect to a remote computer via SSH.
- Sign up for AWS with a credit card, or use an existing AWS account.

## Introduction
---


Today we are going to walk through Amazon Web Services (AWS), focusing on the services that are commonly used in data science. AWS is a suite of cloud computing services: essentially, virtual machines and storage in a datacenter somewhere that you can access on demand, paying only for the time you use them.


**Check:** What is a server?

> Answer: "A server is a computer or computer program that manages access to a centralized resource or service in a network."

(Source: [oxforddictionaries.com](https://en.oxforddictionaries.com/definition/server)) 

**Check:** What did the world look like before AWS and Google cloud?

> Answer: Computation was expensive to set up, access, and maintain, so only large companies, governments, and institutions had access to it. Now anyone can rent it for pennies.

## Welcome to AWS
---

<img src="http://i.giphy.com/3oEjHBa34dVLv0jnoc.gif">

## What is AWS?
---

> _Amazon Web Services (AWS)_, is a subsidiary of Amazon.com, which offers a suite of cloud computing services that make up an on-demand computing platform. These services operate from 12 geographical regions across the world. 

> Arguably the most essential and best-known of these services include Amazon Elastic Compute Cloud, also known as "EC2", and Amazon Simple Storage Service, also known as "S3". **AWS now has more than 70 services that span a wide range including compute, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools and tools for the Internet of things.** 

> Amazon markets AWS as a service to provide large computing capacity quicker and cheaper than a client company building an actual physical server farm. _(from wikipedia)_

Today we will explore two services that are relevant to a lot of big-data scenarios:

1. EC2 (Elastic Compute Cloud)
2. S3 (Simple Storage Service)

By the end of today you will be able to start and stop a computer in the cloud and to store data in the cloud. How cool is that!?

> **Note:** If you don't have Amazon credits, you can sign up with a new account and get free-tier usage for one year.

## Who uses it?
---

Notable clients include(d):
- Yelp
- Netflix
- NASA
- Pinterest
- Spotify
- The CIA
- The Obama Administration

[And many more, viewable here from Amazon's case studies page.](https://aws.amazon.com/solutions/case-studies/)

**Check:** What could be some advantages of using a server in the cloud instead of managing our own data center?

- Cost reduction: don't pay infrastructure costs when you don't need it
- Reliability: Servers are maintained and guaranteed by a company whose only job is to make sure they are available for you
- Scalability: Can add more computing power when necessary

## Elastic Compute Cloud (EC2) overview
---

The first service we will discover is _Elastic Compute Cloud_ or _EC2_. This service forms a central part of Amazon.com's cloud-computing platform, allowing users to rent virtual computers on which to run their own computer applications. Let's learn some key terms first:

- **Instance**: virtual machine hosted in Amazon Cloud running the software we want
- **Amazon Machine Image (AMI)**: a snapshot of a configured machine that we can use as starting point to boot an instance. We can also save a running instance to a new AMI so that in the future we can boot a new machine with identical configuration.
- **SSH Key**: [pair of keys](https://en.wikipedia.org/wiki/Public-key_cryptography) necessary to connect to an instance remotely. The private key will be downloaded to our laptop, the matching public key will be automatically configured on the instance.


The main conceptual shift from using a laptop to running an instance in the cloud is that we can think of computing power as ephemeral. We request computing power when we need it, do a calculation and dismiss that power when done. 

Input and output are not stored on the machine; instead, they are stored somewhere else in the cloud (hint: S3). In this sense, computing power is a commodity that we purchase and use in the amount and for the time that we need.


### Let's see how it works.

> 1) Create a new account on AWS [here](https://aws.amazon.com/)

It will ask you for contact information and a credit card. Do not worry: most of the things we will do are free for first-time users, and when we do use paid services, it is unlikely to cost more than $10.

Here are some screenshots of the process:

![](./assets/images/aws1.png)

![](./assets/images/aws2.png)

![](./assets/images/aws3.png)

**Once you're done you should get to this page:**

![](./assets/images/awsmike.png)

**Let's sign into EC2. Click the "Services" tab and select "EC2" (from the "COMPUTE" heading):**

![](./assets/images/awsmike2.png)

## EC2 tutorial
---

Let's go ahead and follow the [tutorial for EC2](https://aws.amazon.com/getting-started/tutorials/launch-a-virtual-machine/).

<a id='step1'></a>
### Step 1: Launch an Amazon EC2 instance

![](./assets/images/launch-instance.png)

### Step 2: Configure your instance

Follow the suggested steps until you see your image booting up:

![](./assets/images/launched1.png)

Notice that the console shows a lot of information about the instance, in particular:

- Its DNS name and IP address
- The type of instance
- The key necessary to connect

### Step 3: Connect to your instance

Go ahead and follow the instructions on how to connect to the instance. In particular:

1. (Optional) Download a bash shell.
2. Copy the SSH key you downloaded to the appropriate location.
3. Use the SSH key to connect as explained in the tutorial.
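
The key-handling steps above can be sketched in the shell as follows. This is a minimal sketch rather than part of the AWS tutorial: `install_key` is a hypothetical helper name, `~/.ssh` is assumed to be the "appropriate location", and the key file name is only an example. `ssh` refuses private keys that other users can read, which is why the permissions are tightened to `400`.

```bash
# install_key SRC [DEST_DIR]
# Copy a downloaded private key into DEST_DIR (assumed default: ~/.ssh),
# make it readable only by its owner, and print the installed path.
install_key() {
    local src="$1"
    local dest_dir="${2:-$HOME/.ssh}"
    local dest="$dest_dir/$(basename "$src")"

    mkdir -p "$dest_dir"
    cp -f "$src" "$dest"
    chmod 400 "$dest"
    echo "$dest"
}

# Usage (the key file name and the DNS placeholder are examples only):
#   key=$(install_key ~/Downloads/MyFirstKey.pem)
#   ssh -i "$key" ec2-user@<YOUR INSTANCE DNS>
```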

![](./assets/images/connected.png)



## Congratulations!! You've just connected to an instance in the cloud!! How awesome is that!!

Try launching python from the shell and do something with it.

![](./assets/images/python.png)

<a id='step4'></a>
### Step 4: Terminate your instance

Once you're done with your calculation and you no longer need the instance, you can go ahead and terminate it. NB: this will kill the instance and it will no longer be available to you. You should make sure you have saved all the data and the code you needed somewhere else.

![](./assets/images/terminate.png)

![](./assets/images/terminated.png)

Unless you are using your machine to serve a live application (like a web app or an API), it's very important that you terminate your instance when you are not using it so that you don't incur unnecessary costs.


<a id='addl'></a>
### Additional remarks

We've walked through the simplest way to launch and terminate an instance in the cloud.

There's a lot more to it that you'll discover in time. Here are some pointers you may find useful:

- [Pricing](https://aws.amazon.com/ec2/pricing/): EC2 pricing depends on the type of instance and on the chosen region. Make sure you understand the cost of the instance you request in order to avoid surprise bills. If you're in doubt, you can use the convenient [Cost Calculator](http://calculator.s3.amazonaws.com/index.html) to get an estimate of your costs.

![](./assets/images/costcalculator.png)

- [AMIs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html): AMIs are snapshots of our machine. They are great if we installed a lot of software on our machine and want to save that particular configuration.

![](./assets/images/createimage.png)


**Check:** can you give an example of when AMIs could be useful?

- [Elastic IPs](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html): we can rent a fixed IP address and associate it to our instance. This way we can configure tools to always connect to the same address, independently of which machine is behind it.


- [Security Groups](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html): security groups are ways to open ports to the services running on our machine and control the inbound protocols permitted.

<a name="demo"></a>
## Simple Storage Service [S3] (5 min)

We have learned how to start and stop an instance in the cloud. That's great, because it gives us "computing power as a service". Now let's learn how we can store data in the cloud too.

**Amazon S3 (Simple Storage Service)** is an online file storage service. It provides storage through web services interfaces using an _object storage architecture_. According to Amazon, S3's design aims to provide scalability, high availability, and low latency at commodity costs.

Objects are organized into **buckets** (each owned by an Amazon Web Services account), and identified within each bucket by a unique, user-assigned key. Buckets and objects can be created, listed, and retrieved using either a REST-style HTTP interface or a SOAP interface. Additionally, objects can be downloaded using the **HTTP GET interface and the BitTorrent protocol.**
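
To make the bucket-and-key addressing concrete, here is a small sketch that builds the virtual-hosted-style URL of an object so it can be fetched with a plain HTTP GET. The helper name `s3_object_url`, the bucket, and the key are all made up for illustration, and the default region is an assumption. An unauthenticated GET like this only works for objects that have been made publicly readable, and keys containing special characters would need URL encoding first.

```bash
# s3_object_url BUCKET KEY [REGION]
# Print the virtual-hosted-style HTTPS URL of an S3 object.
# REGION defaults to us-east-2 (an assumption; use your bucket's region).
s3_object_url() {
    local bucket="$1" key="$2" region="${3:-us-east-2}"
    echo "https://${bucket}.s3.${region}.amazonaws.com/${key}"
}

# Usage (hypothetical bucket and key; the object must be publicly readable):
#   curl -O "$(s3_object_url my-example-bucket data/iris.csv)"
```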


<a name="s3-tutorial"></a>
## Simple Storage Service (S3) tutorial

In pairs: go ahead and follow the [tutorial for S3](https://aws.amazon.com/getting-started/tutorials/backup-files-to-amazon-s3/).

The steps should be super simple to follow. Any questions?

**Check:** what's a practical case you can envision using S3 for?


## Small Demo - Loading DataFrames hosted on S3

<a name="awscli"></a>
## AWS Command Line (AWSCLI)
---

Wow, great! We have learned to request and access computing power and storage as a service through AWS. Wouldn't it be nice to be able to do this in a quick way from the command line? Yeah! Let's introduce AWSCLI!

[AWSCLI](https://github.com/aws/aws-cli) is a unified command line interface to Amazon Web Services. It allows us to control most of AWS services from the same command line interface.

**Check:** Why is that useful? Why is that powerful? Can you give some examples?

> e.g. to be able to programmatically turn instances on and off, to create complex architectures, or to provision clusters in response to demand

<a name="awscli-tutorial"></a>
## AWSCLI tutorial
---

We're going to walk through this tutorial together. Loosely, we're following the [tutorial for AWSCLI](https://aws.amazon.com/getting-started/tutorials/backup-to-s3-cli/), however, there have been some updates to the interface that the instructions don't reflect. 

To start, click on the above link. Perform steps 1a and 1b.

### AWSCLI tutorial - Step 1c

On the add user screen be sure to select "Programmatic Access". Then click on the "Next: Permissions" button.

![](./assets/images/add_user_access.png)


### AWSCLI tutorial - Step 1d

On this permissions page of the Add User process, select the "Attach existing policies directly" button. Just below, a dropdown will appear. From this dropdown check the box for "AdministratorAccess" (the first row, pictured below).

Continue to the next screen by clicking "Next: Review"

![](./assets/images/administratoraccess.png)

### AWSCLI tutorial - Step 1e

On this review page, just click the "Create user" button. 

![](./assets/images/createuser.png)

### AWSCLI tutorial - Step 1f

Here you'll want to click the "Download .csv" button. Once you've done this, click the "Close" button.

![](./assets/images/downloadcsv.png)

<a id='cli-step2'></a>
### Step 2: Install and configure the AWS CLI

Follow the instructions [here](http://docs.aws.amazon.com/cli/latest/userguide/installing.html) to install the AWS command line interface.

> **Note:** If you already have AWSCLI configured and you would like to have multiple roles, you can do that as explained [here](http://docs.aws.amazon.com/cli/latest/userguide/cli-roles.html).

<a id='cli-step2b'></a>
### Step 2.B: Setting up your credentials

The AWS CLI needs your credentials in order to authenticate and communicate with AWS. Type `aws configure` in a terminal and supply the values contained in the spreadsheet you downloaded in step 1f, above. The CLI saves them to files under `~/.aws/` rather than to environment variables. For the output format, just press Enter to accept the default.


```bash
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-2
Default output format [None]: ENTER
```
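
Before moving on, it can help to confirm that the configuration took effect. The sketch below wraps two CLI commands in a hypothetical helper, `check_aws_setup`: `aws configure list` shows which credentials and region the CLI will use, and `aws sts get-caller-identity` asks AWS which account and user those credentials belong to.

```bash
# check_aws_setup
# Show which credentials the CLI will use, then ask AWS who we are.
# Returns non-zero if the CLI is missing or the identity call fails.
check_aws_setup() {
    if ! command -v aws >/dev/null 2>&1; then
        echo "aws: command not found; install the AWS CLI first" >&2
        return 1
    fi
    aws configure list
    aws sts get-caller-identity --output json
}

# Usage:
#   check_aws_setup
```

If the second command prints an account id and a user ARN, the CLI is able to authenticate.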

<a id='cli-step3'></a>
### Step 3: Using the AWS CLI with Amazon S3

Now you can go ahead and copy files back and forth from your command line, without ever having to click on the web interface. How cool is that?

Here's a [Cheat Sheet](https://github.com/toddm92/aws/wiki/AWS-CLI-Cheat-Sheet) for the AWSCLI.
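
As a rough illustration of copying files back and forth, the sketch below wraps `aws s3 cp` in two small helpers. The helper names and the bucket name are hypothetical; bucket names are globally unique, so you will need to choose your own.

```bash
# s3_upload FILE BUCKET
# Copy a local file into a bucket, keeping its file name as the key.
s3_upload() {
    local file="$1" bucket="$2"
    aws s3 cp "$file" "s3://${bucket}/$(basename "$file")"
}

# s3_download BUCKET KEY [DEST_DIR]
# Copy an object from a bucket into DEST_DIR (default: current directory).
s3_download() {
    local bucket="$1" key="$2" dest_dir="${3:-.}"
    aws s3 cp "s3://${bucket}/${key}" "${dest_dir}/"
}

# Usage (hypothetical bucket name):
#   aws s3 mb s3://my-example-bucket      # create the bucket once
#   s3_upload results.csv my-example-bucket
#   aws s3 ls s3://my-example-bucket      # list its contents
#   s3_download my-example-bucket results.csv /tmp
```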

<a name="ec2-cli"></a>
## EC2 from the command line
---

Empowered with a well-configured AWSCLI, we can now start and stop EC2 instances from the command line! Let's use it to spin up a spot instance.



```bash
# MUST RUN `aws configure` in a terminal 1st!
# Checking the spot prices of m4.large Linux boxes in Ohio.
aws ec2 describe-spot-price-history \
    --start-time "$(date -u +"%Y%m%dT%H0000")" \
    --product-descriptions "Linux/UNIX" \
    --instance-types "m4.large" \
    --region us-east-2 \
    --output table
```

The beginning of the output looks like this (the last row is cut off):

```
-------------------------------------------------------------------------------------------------------
|                                      DescribeSpotPriceHistory                                       |
+-----------------------------------------------------------------------------------------------------+
||                                         SpotPriceHistory                                          ||
|+------------------+---------------+---------------------+------------+-----------------------------+|
|| AvailabilityZone | InstanceType  | ProductDescription  | SpotPrice  |          Timestamp          ||
|+------------------+---------------+---------------------+------------+-----------------------------+|
||  us-east-2b      |  m4.large     |  Linux/UNIX         |  0.015100  |  2017-08-27T21:49:35.000Z   ||
||  us-east-2b      |  m4.large     |  Linux/UNIX         |  0.015200  |  2017-08-27T21:42:47.000Z   ||
||  us-east-2b      |  m4.large     |  Linux/UNIX         |  0.0
```
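
Reading the table by eye works for a handful of rows, but the same query can be reduced to the single cheapest entry. The sketch below is one possible way to do that, using the CLI's `--query` option to keep only the zone and price columns and `sort` to pick the lowest price. `cheapest_spot_zone` is a hypothetical helper name, and the default region is an assumption carried over from the command above.

```bash
# cheapest_spot_zone INSTANCE_TYPE [REGION]
# Print "<availability zone><TAB><price>" for the lowest Linux spot price
# recorded since the start of the current hour (UTC).
cheapest_spot_zone() {
    local instance_type="$1" region="${2:-us-east-2}"
    aws ec2 describe-spot-price-history \
        --region "$region" \
        --instance-types "$instance_type" \
        --product-descriptions "Linux/UNIX" \
        --start-time "$(date -u +"%Y-%m-%dT%H:00:00Z")" \
        --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
        --output text |
        sort -k2,2n | head -n 1
}

# Usage:
#   cheapest_spot_zone m4.large
```

The result gives a reasonable lower bound when choosing the `--spot-price` to bid.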

### Get the security group id

In the previous activity we launched an instance and created a security group that allows SSH access. Let's use the same security group.

Run the command below in a terminal.

This will return a JSON string. You want to copy the value associated with the "GroupId" key of the security group that has port 22 open (if there are running instances). As this is the first time many of you are using AWS, there should only be one unique "GroupId" value (though it may appear several times throughout the JSON).

```bash
aws ec2 describe-security-groups --region us-east-2
```
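
If scanning the JSON by hand is tedious, the CLI can do the filtering. The sketch below is one possible approach, not the only one: it uses the `ip-permission.from-port` filter to keep groups with an inbound rule starting at port 22, and `--query` to print only their ids. `ssh_security_group_ids` is a hypothetical helper name.

```bash
# ssh_security_group_ids [REGION]
# Print the ids of security groups that have an inbound rule whose
# port range starts at 22 (SSH).
ssh_security_group_ids() {
    local region="${1:-us-east-2}"
    aws ec2 describe-security-groups \
        --region "$region" \
        --filters "Name=ip-permission.from-port,Values=22" \
        --query 'SecurityGroups[*].GroupId' \
        --output text
}

# Usage:
#   ssh_security_group_ids
```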

<a id='ami-id'></a>
### Get details on a specific AMI using the AMI id

Get the AMI id of the same Amazon Linux AMI we used in the GUI. You can find it by checking the name in the [launch instance window](https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=us-west-2#LaunchInstanceWizard). 

> At the time of writing it is: `ami-ea87a78f`.

You can check it by typing:

```bash
aws ec2 describe-images --image-ids ami-ea87a78f --region us-east-2
```

<a id='launch'></a>
### Launch spot instance

You're now ready to submit the spot instance request:

```bash
aws ec2 request-spot-instances \
    --region us-east-2 \
    --spot-price 0.02 \
    --launch-specification "{
        \"KeyName\": \"MyKeyPairEast2\",
        \"ImageId\": \"ami-ea87a78f\",
        \"InstanceType\": \"m4.large\",
        \"SecurityGroupIds\": [\"sg-f677a09e\"]
    }"
```

If it works, this should return a JSON description of the spot instance request.
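
The escaped quotes in the inline launch specification are easy to get wrong. As an alternative sketch, the specification can be written to a file and passed to the CLI with the `file://` prefix. `write_launch_spec` is a hypothetical helper, and the values in the usage lines are simply copied from the request above; replace them with your own key pair, AMI, and security group.

```bash
# write_launch_spec KEY_NAME AMI_ID INSTANCE_TYPE SECURITY_GROUP_ID
# Print a launch specification as JSON, ready to be saved to a file.
write_launch_spec() {
    cat <<EOF
{
    "KeyName": "$1",
    "ImageId": "$2",
    "InstanceType": "$3",
    "SecurityGroupIds": ["$4"]
}
EOF
}

# Usage (values copied from the request above; replace them with your own):
#   write_launch_spec MyKeyPairEast2 ami-ea87a78f m4.large sg-f677a09e > spec.json
#   aws ec2 request-spot-instances \
#       --region us-east-2 \
#       --spot-price 0.02 \
#       --launch-specification file://spec.json
```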

> **Troubleshooting Note**: When setting up EC2 instances, always be mindful of the region! Take a look at the top right of your screen! Is your key pair set up in this region, or for a different region? Check by clicking into your target region. Under the "Resources" header there's a link to your Key Pair!

You can check that the instance request has been opened:

![](./assets/images/instancerequest.png)

or by command line:

```bash
aws ec2 describe-spot-instance-requests --region us-east-2
```

When the request has been accepted, an instance is spawned:

![](./assets/images/spotinstance.png)

Let's retrieve the DNS name:
```bash
aws ec2 describe-instances --region us-east-2 --output json | grep PublicDnsName | head
```
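
Piping through `grep` also prints the surrounding JSON punctuation and includes instances that are no longer running. A slightly more targeted sketch, assuming you only care about running instances, uses a state filter and `--query` to print bare DNS names. `running_instance_dns` is a hypothetical helper name.

```bash
# running_instance_dns [REGION]
# Print the public DNS name of each running instance, one per line.
running_instance_dns() {
    local region="${1:-us-east-2}"
    aws ec2 describe-instances \
        --region "$region" \
        --filters "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].[PublicDnsName]' \
        --output text
}

# Usage:
#   running_instance_dns
```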

<a id='connect'></a>
### Connect to the spot instance

```bash
# Use the private key that matches the KeyName in the launch specification.
ssh -i ~/.ssh/MyKeyPairEast2.pem ec2-user@<YOUR INSTANCE DNS>
```


<a id='terminate'></a>
### Terminate the spot instance

Let's retrieve the instance id and kill it by typing the commands below. These commands have to be run locally, not from the Amazon box we just ssh'd into, so open a new terminal window to run them!

```bash
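# List the instance ids in the region.
aws ec2 describe-instances --region us-east-2 --output json | grep InstanceId

# Replace the id below with your own instance id.
aws ec2 terminate-instances --region us-east-2 --instance-ids i-0aa55cd3363b0f187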
```
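
Termination is not instantaneous, so a script that must be sure the instance is gone can wait for it. The sketch below combines `terminate-instances` with the CLI's `wait instance-terminated` waiter. `terminate_instance` is a hypothetical helper name, and the instance id in the usage line is the example id from above.

```bash
# terminate_instance INSTANCE_ID [REGION]
# Terminate one instance, then block until AWS reports it as terminated.
terminate_instance() {
    local instance_id="$1" region="${2:-us-east-2}"
    aws ec2 terminate-instances --region "$region" --instance-ids "$instance_id" &&
        aws ec2 wait instance-terminated --region "$region" --instance-ids "$instance_id"
}

# Usage (replace the id with your own):
#   terminate_instance i-0aa55cd3363b0f187
```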

![](./assets/images/terminatedspot.png)

Et Voilà!


<a name="conclusion"></a>
## Conclusion
---

In this lesson we have learned about two fundamental Amazon web services: Elastic Compute Cloud and Simple Storage Service. These two services are so common because they provide on-demand computation and storage at a very affordable cost.

We have learned how to use them both from the web interface and from the command line.

**Check:** can you think of a situation where this could be useful?

<a id='resources'></a>
## Additional resources
---

- [EC2](https://aws.amazon.com/ec2/?nc2=h_m1)
- [S3](https://aws.amazon.com/s3/?nc2=h_m1)
- [Tutorials](https://aws.amazon.com/getting-started/tutorials/)
- [AWS CLI Tutorial](http://www.joyofdata.de/blog/guide-to-aws-ec2-on-cli/)