# Using Amazon Web Services and Docker

### Chris Messier

## Goals

### Amazon Web Services (AWS)
- Understand the value and purpose of using an Amazon Web Services ec2 instance
- Sign Up for an AWS Account
- Request access to a GPU Compute instance
    - Understanding accelerated GPU computing
- Install AWS command line tools
    - Install AWS python API


## AWS

https://aws.amazon.com/

### Overview

#### Cloud Computing
The plummeting cost of processing power, and the rising demand for it, have led to the rise of a large number of cloud computing services.
Cloud computing is a rather simple concept.
A cloud computing company owns a large number of computers and they will rent them to you if you need to use them.
They provide access to a large amount of computational resources at a low price.
This lowers the barriers to entry imposed by the large upfront costs of building your own hardware, removes the costs associated with maintaining the hardware, and gives you the flexibility to scale to your moment-to-moment needs.
The large network of computing centers also means that computation can be distributed, and therefore is more fault tolerant than localized hardware resources.

There are some downsides though:
- Latency
- Security
- Higher marginal costs

#### Why Amazon Web Services, though?

[Comparison of Google Cloud and AWS](https://kinsta.com/blog/google-cloud-vs-aws/)

When it comes to cloud computing, AWS has a commanding market share.

![alt text](assets/cloud-compare.png)

This means knowing how to use it can be an invaluable skill to have on your resume.

#### Available Resources

#### Elastic Compute Cloud (ec2)

The primary service that we will be discussing today is AWS' Elastic Compute Cloud, or ec2.
This service allows you to run programs remotely on an AWS server.
This means you have access to a wide range of hardware, so you can choose the right tools for the various tasks you need to complete.

#### Why do we want to use the cloud anyway?

### _To the cloud!_

### Sign Up

The sign up process for AWS is fairly simple.
Just go [here](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) and sign up, just like you would any other website.
You will have to put down a credit card, in case you do incur any charges, but there is a free option.
For this, make sure you sign up for the __Free Tier__ account.
This will give you access to a lot of AWS' services free for a year.
Much of what you'll want to do with AWS will be covered by the Free Tier.
After signing up, make sure you select `us-east-1` (Northern Virginia) as your region.
(For some reason the default is Ohio...)

Once you have signed up, slack Brems the email address you used to create the account, and he will add you to the GA-DSI-6 organization.


For our purposes though, we will need to use a paid service, a p2.xlarge GPU compute instance.

### GPU Computing

Provide an overview of GPU computing, and its use in Data Science

### Request instance increase

AWS understands that when users are just starting out, they are prone to making mistakes.
A mistake that is common for new users is not understanding the difference between various kinds of ec2 compute instances.
While AWS does offer free tier services, using some ec2 instances can get incredibly costly.
For example, accidentally leaving a p2.16xlarge instance running all month will cost you around $10,540.80, which is a rather costly mistake to make.
To mitigate this risk, AWS restricts the instances that you have access to.
In order to use more powerful ec2 instances, you will need to request access to them.
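
The monthly figure quoted above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the $14.40 per hour rate and the 30.5-day month are assumptions chosen because they match the quoted total, and actual prices vary by region and over time.

```python
# Rough cost of an instance that is left running around the clock.
# ASSUMPTION: $14.40/hour is an illustrative on-demand rate for a p2.16xlarge;
# check the current AWS pricing page for real numbers.
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate, days=30.5):
    """Return the cost in dollars of running nonstop for `days` days."""
    return round(hourly_rate * HOURS_PER_DAY * days, 2)


if __name__ == "__main__":
    print(f"p2.16xlarge for a month: ${monthly_cost(14.40):,.2f}")
```

Swapping in a different hourly rate gives the equivalent estimate for any other instance type.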

Luckily, it's not difficult to request an instance increase.
We simply have to go through AWS' support center.
It is important to note that it can take 24-48 hours for the request to go through, so in order to get ahead of the curve, we're going to submit the request before moving forward.

### Step 1: Go to the support center

![Menu Bar](assets/menu-bar.png)

### Step 2: Create a support case

![Create Case](assets/create-case.png)

### Step 3: Request a limit increase 

![Instance Request](assets/instance-request.png)

### Step 4: Wait...

### _In the meantime..._

### Installing AWS command line tools

We will be connecting to the ec2 instances through our terminal of choice.
To do this though, we have to install AWS command line tools.
The aws command line tools will allow you to interact with your ec2 instance through your command line.
This means you will always have access to, and the ability to interact with, your compute instances.

[aws documentation](http://docs.aws.amazon.com/cli/latest/userguide/installing.html)

! pip3 install awscli --upgrade --user  # running this should install everything that is needed...

Collecting awscli
  Using cached awscli-1.13.0-py2.py3-none-any.whl
Collecting rsa<=3.5.0,>=3.1.2 (from awscli)
  Using cached rsa-3.4.2-py2.py3-none-any.whl
Requirement already up-to-date: docutils>=0.10 in /usr/local/lib/python3.6/site-packages (from awscli)
Requirement already up-to-date: PyYAML<=3.12,>=3.10 in /usr/local/lib/python3.6/site-packages (from awscli)
Requirement already up-to-date: s3transfer<0.2.0,>=0.1.9 in /usr/local/lib/python3.6/site-packages (from awscli)
Collecting colorama<=0.3.7,>=0.2.5 (from awscli)
  Using cached colorama-0.3.7-py2.py3-none-any.whl
Collecting botocore==1.8.3 (from awscli)
  Using cached botocore-1.8.3-py2.py3-none-any.whl
Collecting pyasn1>=0.1.3 (from rsa<=3.5.0,>=3.1.2->awscli)
  Using cached pyasn1-0.4.2-py2.py3-none-any.whl
Requirement already up-to-date: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/site-packages (from botocore==1.8.3->awscli)
Requirement already up-to-date: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/si
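
Once the install finishes, it is worth confirming that the `aws` executable is actually reachable from your shell. The sketch below is one way to check from Python; it assumes nothing beyond the standard library and simply reports `None` if `aws` is not on your `PATH` (a common situation after a `--user` install).

```python
import shutil
import subprocess


def aws_cli_version():
    """Return the output of `aws --version`, or None if `aws` is not on PATH."""
    executable = shutil.which("aws")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the version to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = aws_cli_version()
    print(version or "aws not found; check that pip's --user bin directory is on your PATH")
```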

### Launching an EC2 Instance

Walk class through launching an ec2 instance, and connecting via the commandline.
Have a ready to go `"Hello, World"` program to run on the instance.

When it comes to launching an aws instance, there's the [easy way](https://console.aws.amazon.com/quickstart/vm/home?region=us-east-1#), and then there's the way we're going to do it.

First we'll have to go to the [EC2 Management Console](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1).
The EC2 Management Console gives you an overview of the instances you have running.
It's important to note that it's region specific.
That means that if you happened to launch an instance somewhere, let's say Ohio, then you won't see it on your `us-east-1` console.
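
Because the console is region specific, the region is part of the console URL. As a small illustration, the sketch below builds the console link for the two regions mentioned so far; the URL pattern is taken from the link above, and `us-east-2` is the region code for Ohio.

```python
from urllib.parse import urlencode

CONSOLE_BASE = "https://console.aws.amazon.com/ec2/v2/home"

# Region codes for the two regions mentioned in this walkthrough.
REGIONS = {
    "N. Virginia": "us-east-1",
    "Ohio": "us-east-2",
}


def console_url(region_code):
    """Build the EC2 Management Console URL for one region."""
    query = urlencode({"region": region_code})
    return f"{CONSOLE_BASE}?{query}"


if __name__ == "__main__":
    for name, code in REGIONS.items():
        print(f"{name}: {console_url(code)}")
```

If an instance seems to have vanished, opening the console for the other region is usually the first thing to try.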

### Step 0: press the "Launch Instance" button.

![Launch Instance](assets/launch-instance.png)

### Step 1: Choose AMI

For this step, you are essentially selecting the operating system and software environment that will be installed on your EC2 instance.
There are a large number of options available, with many AMIs designed for specific purposes.
For now, we will just choose the Amazon Linux AMI (the first option), by pressing the "Select" button.

![Choose AMI](assets/aws-ami.png)

### Step 2: Choose Instance Type

Here we will be selecting the actual hardware that we will be running on.
For this simple introduction, we'll be using a t2.micro instance.
Select this by highlighting the box to the left of the instance, and pressing "Next: Configure Instance Details" on the lower right of the screen.

![Choose Instance](assets/aws-instance.png)

### Step 3: Configure Instance Details

![Configure Instance](assets/aws-configure.png)

### Step 4: Add Storage

8 GB will be fine for now.

![Add Storage](assets/add-storage.png)

### Step 5: Add Tags

We're going to skip this...

### Step 6: Configure Security Group

Here you will need to set the security for your instance.
This is what makes sure only the people you want are able to connect to your instance.

# NEVER SET SOURCE TO "ANYWHERE"

Open ports are a nefarious hacker's playground.
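
In the console, a source of "Anywhere" corresponds to the CIDR block `0.0.0.0/0` (or `::/0` for IPv6), which matches every address on the internet. The sketch below uses Python's standard `ipaddress` module to flag such a rule and to build the narrow single-address block you would normally want instead. The function names are made up for illustration, and `203.0.113.7` is a documentation-only placeholder address.

```python
import ipaddress


def is_open_to_world(cidr):
    """Return True if `cidr` matches every address (the console's "Anywhere")."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def single_host_cidr(ip):
    """Return the CIDR block that matches exactly one IP address."""
    address = ipaddress.ip_address(ip)
    return f"{address}/{address.max_prefixlen}"


if __name__ == "__main__":
    for source in ["0.0.0.0/0", "::/0", single_host_cidr("203.0.113.7")]:
        label = "OPEN TO EVERYONE" if is_open_to_world(source) else "restricted"
        print(f"{source}: {label}")
```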

Once you're done, press "Review and Launch"

![Configure Security Group](assets/configure-security.png)

### Step 5: Review Instance Launch

We're almost there!
Make sure that everything is set up correctly.
Once everything looks right, press "Launch".

![Review Instance](assets/review-instance.png)

### Step 6: Create Security Key

After you hit launch, a pop-up will appear asking you to create a new security key (or use an existing key).
This creates a `.pem` file, which is the key you will use to access your instance.
After it finishes downloading, move it to some place safe.
You ___NEED___ to keep this file secure, as losing it means you won't be able to access your instance any more.
Also remember where you put this file, as you will need to know its location in order to reference it when you connect.
For the sake of consistency, I will place mine on my desktop.
I do not recommend keeping your `.pem` files on your desktop; instead, create a dedicated directory that will keep them all safe.
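
Part of keeping the key secure is restricting its file permissions: `ssh` will refuse to use a private key that other users on your machine can read. The usual fix is `chmod 400` on the file, and the sketch below does the same thing from Python. The `~/aws-keys/my-key.pem` path is a hypothetical location, not something this walkthrough creates for you.

```python
import os
import stat
from pathlib import Path


def lock_down_key(pem_path):
    """Make a key file readable only by its owner (same as `chmod 400`)."""
    path = Path(pem_path).expanduser()
    os.chmod(path, stat.S_IRUSR)
    return stat.filemode(path.stat().st_mode)


if __name__ == "__main__":
    # HYPOTHETICAL location; substitute wherever you saved your key.
    key = Path("~/aws-keys/my-key.pem").expanduser()
    if key.exists():
        print(lock_down_key(key))
    else:
        print(f"No key found at {key}")
```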

### Step 7: Launching the Instance

You did it!

![Launching Instance](assets/instance-launch.png)

Somewhere in Reston a processor just started running, and _you_ made that happen!
Now we just have to connect to it.
Press the "View Instances" button to see the instances that are running.
Here you'll get the information necessary to connect to your ec2 instance.

### Step 8: Connecting to your Instance

Select your instance, then at the top of the console, press "Connect".
This will launch a pop-up that gives you information on how to connect to your instance.
For what I hope are obvious security reasons, I won't be including screen shots for this part, but we will walk through it together.
The most important part is what you see under "Example", which should be something like the following:
```commandline
ssh -i "my-key.pem" ec2-user@ec2-000-000-000-00.compute-1.amazonaws.com
```
This is the command you will use to connect to your instance through your terminal, so copy it now.
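
If you find yourself reconnecting often, it can help to assemble the command programmatically rather than pasting it each time. The sketch below builds the same `ssh` invocation as an argument list; `ssh_command` is an illustrative helper name, and the host is the placeholder address from the example above, not a real instance.

```python
import shlex


def ssh_command(key_path, host, user="ec2-user"):
    """Return the argument list for connecting to an instance over SSH."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


if __name__ == "__main__":
    # Placeholder host copied from the example; use your instance's address.
    argv = ssh_command("my-key.pem", "ec2-000-000-000-00.compute-1.amazonaws.com")
    print(" ".join(shlex.quote(arg) for arg in argv))
    # To actually connect, pass the list to subprocess.run(argv).
```

The default user of `ec2-user` matches the Amazon Linux AMI chosen earlier; other AMIs may use a different login name.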

Open your terminal and navigate to the directory you saved your `.pem` file in.
Paste the command that you got from above, and hit enter to connect.
In a few moments you should see output similar to what you see below.

![EC2 Terminal](assets/ec2-terminal.png)


_Note: Your output will look slightly different. I took this screenshot while using the gpu compute instance I set up. Whoops!_

### Step 9: Moving files to AWS

Now that we're up and running, we need to know how to get the programs we've worked tirelessly on up into the cloud.
This isn't as easy as simply dragging and dropping, but it's not terribly difficult.
To do this we use the following command:
```commandline
scp -i <.pem file> <file to copy>  <ec2 address>:<path on instance>
```

More often than not though, you won't be copying a single file, but instead you'll be copying a directory.
To do this, we simply add the `-r` argument to `scp`, like so:
```commandline
scp -r -i <.pem file> <folder to copy>  <ec2 address>:<path on instance>
```
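
The two templates above differ only in the `-r` flag, which makes them easy to wrap in a small helper. This is a sketch under the same assumptions as the templates; `scp_upload_command` is a made-up name, and nothing is copied until you pass the resulting list to `subprocess.run`.

```python
import shlex


def scp_upload_command(key_path, local_path, host, remote_path="~",
                       user="ec2-user", recursive=False):
    """Return the argument list for copying a file or folder to an instance."""
    argv = ["scp"]
    if recursive:
        argv.append("-r")  # needed when local_path is a directory
    argv += ["-i", key_path, local_path, f"{user}@{host}:{remote_path}"]
    return argv


if __name__ == "__main__":
    # Placeholder host; use your instance's address.
    host = "ec2-000-000-000-00.compute-1.amazonaws.com"
    for argv in (
        scp_upload_command("my-key.pem", "hello_aws.py", host),
        scp_upload_command("my-key.pem", "aws-docker-talk", host, recursive=True),
    ):
        print(" ".join(shlex.quote(arg) for arg in argv))
```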

Let's just focus on copying a single file.
In the repo for this talk, you'll find a file named `hello_aws.py`.
Let's go ahead and copy it up to the instance we created.
Your command should look something like:
```commandline
scp -i "my-key.pem" "aws-docker-talk/hello_aws.py" ec2-user@ec2-000-000-000-00.compute-1.amazonaws.com:~
```

This will copy the file to the home directory of `ec2-user` on the ec2 instance (that is what `~` refers to).
Once it's there let's go ahead and execute it with:
```commandline
python hello_aws.py
```

This is obviously a toy example.
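
The contents of `hello_aws.py` are not reproduced in these notes. If you don't have the repo handy, a hypothetical stand-in like the one below is enough to confirm that the copy-and-run round trip works. It is not the file from the repo, just a minimal sketch that reports which machine and Python version executed it.

```python
import platform
import sys


def greeting():
    """Build a message showing which machine and Python ran the script."""
    return "Hello, AWS! Running on {} with Python {}.{}".format(
        platform.node(), sys.version_info[0], sys.version_info[1]
    )


if __name__ == "__main__":
    print(greeting())
```

The hostname in the output makes it easy to tell whether the script ran on your laptop or on the instance.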


### Step 10: Editing files on AWS

How you edit files on AWS will depend somewhat on what AMI you've chosen to use.
However, vim is always a go-to option.
Let's use vim to edit `hello_aws.py`

```commandline
vim hello_aws.py
```

You should see something like this:

![vim on AWS](assets/aws-vim.png)

vim is a rather basic text editor, and it may seem a bit clunky at first, but a few commands will get you through most edits:
- `i`: Enter insert mode (this is how you type)
- `esc`: Return to command mode (this lets you do things like saving)
- `:w`: [while in command mode] write to disk
- `:q`: [while in command mode] quit
- `:wq`: [while in command mode] write and quit
- `:q!`: [while in command mode] quit without saving

### Step 10: Moving files from AWS

After you're done running your awesome program in the cloud, you're going to want to get your files back to your local computer.
To do this, we once again use `scp`, but change up the command a bit.

```commandline
scp -i <.pem file> <ec2 address>:<file path on instance> <local file path>
```

So to get our `hello_aws.py` file back on our local disk, we'll use something like this:

```commandline
scp -i "my-key.pem" ec2-user@ec2-000-000-000-00.compute-1.amazonaws.com:~/hello_aws.py "aws-docker-talk/"
```

Check to see if the file has been updated locally.
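
One way to check is to compare a fingerprint of the local file before and after the copy. The sketch below prints a SHA-256 digest and the last-modified time; the path is the one used in the `scp` example above, and `file_summary` is an illustrative helper name.

```python
import hashlib
import time
from pathlib import Path


def file_summary(path):
    """Return (sha256 hex digest, last-modified time string) for a file."""
    path = Path(path)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    modified = time.ctime(path.stat().st_mtime)
    return digest, modified


if __name__ == "__main__":
    target = Path("aws-docker-talk/hello_aws.py")
    if target.exists():
        digest, modified = file_summary(target)
        print(f"{target}: sha256={digest[:12]}... modified={modified}")
    else:
        print(f"{target} not found; run this from the directory above the repo")
```

If the digest changes between runs, the edited copy from the instance has replaced the original.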

### Step 11: Shut down

After you are done with your ec2 instance, you will need to shut it down.
While we launched a Free Tier instance here, a paid instance that's forgotten about and left running can end up costing you a large sum of money.
To shut down your ec2 instance, return to the aws management console, [here](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:sort=desc:tag:Name).
Once again, select your instance, but then press the "Actions" button at the top of the screen.
This will open a drop down menu.
You'll want to navigate to:

```
Actions → Instance State
```

Here you should see several options, but the two we'll focus on are __Stop__ and __Terminate__.

- __Stop__: Halts the instance and frees up the hardware for another user. Anything in memory will be lost, but anything written to disk will still be accessible when the instance is started again. _Note: You may be charged a nominal fee to keep files on a disk volume._
- __Terminate__: Stops the instance, but then deletes the volumes, and removes the instance. With this everything will be lost.

As this was simply an exercise in setting up an EC2 instance, I recommend terminating the instance.
When working with a real instance, stop it whenever you are not using it, and make sure you have moved all the files you need off the instance before terminating it.
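
The same two actions are also available from the AWS command line tools installed earlier, through `aws ec2 stop-instances` and `aws ec2 terminate-instances`. The sketch below only builds the command. It assumes you have already configured credentials (for example with `aws configure`), and the instance ID shown is a hypothetical placeholder.

```python
import shlex

# Map the console's menu choices to the matching AWS CLI subcommands.
STATE_COMMANDS = {
    "stop": "stop-instances",
    "terminate": "terminate-instances",
}


def instance_state_command(action, instance_id, region="us-east-1"):
    """Return the AWS CLI argument list that stops or terminates an instance."""
    if action not in STATE_COMMANDS:
        raise ValueError(f"action must be one of {sorted(STATE_COMMANDS)}")
    return [
        "aws", "ec2", STATE_COMMANDS[action],
        "--instance-ids", instance_id,
        "--region", region,
    ]


if __name__ == "__main__":
    # HYPOTHETICAL instance ID; copy the real one from the EC2 console.
    argv = instance_state_command("stop", "i-0123456789abcdef0")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the command before running it gives you a chance to double-check that you are stopping rather than terminating.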

### Wrap Up

We discussed the role that cloud computing plays in a data science workflow, created an AWS account, and launched an EC2 instance.
When you set up your own instance, I recommend using the following settings:

- Use the Amazon Linux Deep Learning AMI
- Use a p2.xlarge instance
    - uses Nvidia Tesla K80
    - costs around $0.90 an hour
- Use 50+ GB of storage (maybe more if doing image tasks)
- Optimize your GPU usage according to the following explainer: [optimizing nvidia gpu](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html)

## Docker

https://www.docker.com/

### Overview

Explain the benefits of Docker.

### Installation

### Sign Up

### Deploy Container

For this, I should build something using `cv2` and then have them run it.