# Amazon Basics 1: Introduction

## Usage Notes

This notebook introduces the series and walks through the prerequisites for following along, since the series relies on some software that you may not have used before.

## Notebook Imports

In [None]:
from IPython.utils.py3compat import *
import json
import os
import subprocess
import sys

## Initial Steps

This tutorial is prepared as a series of Jupyter notebooks.

* http://jupyter.org/

A convenient way to install Jupyter and other Python packages is through Anaconda, and this notebook series assumes that you're using ``conda`` to manage everything.

* https://www.continuum.io/downloads

However, since you don't need all the mathematics and scientific computing libraries associated with Anaconda, you can actually use its slimmed-down version Miniconda instead.

* http://conda.pydata.org/miniconda.html

After you've installed either Anaconda or Miniconda, create a new environment for creating EC2 instances (or reuse an existing environment), navigate to where you downloaded the notebooks, and start Jupyter.

On Linux and OS X, you would run the following commands.

```
conda create -n USEFUL_NAME pip jupyter nb_conda
source activate USEFUL_NAME
jupyter notebook
```

On Windows, you would run the following commands instead.

```
conda create -n USEFUL_NAME pip jupyter nb_conda
activate USEFUL_NAME
jupyter notebook
```

You can also optionally force a specific version of Python for each environment. The notebook series was originally written for Python 2 and was migrated to Python 3, so you can specify either `python=2` or `python=3` depending on personal preferences.

Once you've set up your `conda` environment, downloaded the tutorial series, and started Jupyter, you should see this file (`basics1.ipynb`) listed in the web browser. Click on it to begin.

Note: if the `jupyter notebook` command opens the wrong browser and you do not want to deal with tokens when opening the notebook in another browser, you can optionally disable them by adding a file `USER_HOME/.jupyter/jupyter_notebook_config.py` with the following content. Be aware that this makes your installation less secure.

```
c.NotebookApp.token = u''
```

### Conda Check

Let's first make sure that you have everything installed.

In [None]:
!conda list pip
!conda list jupyter
!conda list nb_conda

On Windows, you'll need to make sure that files with the `.py` extension are associated with the Python from your `conda` installation. Luckily, `sys.executable` from within Jupyter gives us the path we need.

In [None]:
if os.name == 'nt':
    # assoc and ftype are cmd.exe built-ins, so they have to run through the shell.
    subprocess.call('assoc .py=py_auto_file', shell=True)
    subprocess.call('ftype py_auto_file="{}" "%1" %*'.format(sys.executable), shell=True)

### AWS CLI Check

The goal of this series is to introduce you to some of the things that can be done with Amazon Web Services through its command line interface (`awscli`), along with a lot of scripts that you can use to provision servers in the Amazon Cloud.

The scripts provided in this tutorial use `awscli` as the main interface to everything related to Amazon Web Services. In other words, this notebook provides wrappers which do nothing more than invoke CLI commands, though the same functionality could be implemented using Python libraries like `boto`.

First, let's make sure that you have `awscli`.

In [None]:
!pip install awscli

If you are on Windows, it is recommended that you instead install the native `awscli` client for your operating system as it will work in both Git Bash and the regular Windows command prompt.

* https://aws.amazon.com/cli/

However, if you're not on Windows, or you don't see a need to use the native version (for example, you want to see if a non-native client works), then the `pip install awscli` cell above will attempt to install it via `pip`.

Next, we'll make sure that you successfully installed `awscli`. If `awscli` is installed, invoking `aws` with no arguments prints a usage message ending in an error, as follows.

```
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: the following arguments are required: command
```

Make sure that the output of the following cell is similar to this message.

In [None]:
!aws

## Creating an AWS User

### Specify Region

The first command you need to run before you do anything with `awscli` is `aws configure`. This will prompt you for security credentials, a region, and a default output format.

* You can leave the security credentials values blank for now (we will create a user in the next section with the correct privileges and re-run `aws configure` at that time).
* Regions govern what we are able to do ([Regions and Availability Zones](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html)), and usually we choose a region that is geographically close to our current location for low-latency access.
* Output format defines how you would like to see the output. JSON is one of the more structured options, and you can pipe it through tools like `jq` to extract specific values, so this entire tutorial recommends using that output format.
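
As a rough illustration of what `aws configure` records, the sketch below parses a sample of the INI-style text that the AWS CLI keeps by default in `~/.aws/config`. It assumes Python 3 (for `configparser`); the sample values are placeholders rather than recommendations, and `read_default_settings` is a hypothetical helper that is not part of this series.

```python
import configparser

# Sample content in the style of ~/.aws/config; the values are placeholders.
SAMPLE_CONFIG = """
[default]
region = us-west-2
output = json
"""

def read_default_settings(config_text):
    """Return the region and output format from the [default] profile."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    defaults = parser['default']
    return defaults.get('region'), defaults.get('output')

print(read_default_settings(SAMPLE_CONFIG))
```

In practice, `aws configure get region` (used in the next cell) is the simpler way to read these values back.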

To make sure that you've configured everything, let's confirm the output of the following command matches what you thought you configured using `aws configure`.

In [None]:
region = subprocess.check_output(['aws', 'configure', 'get', 'region']).strip()
region = bytes_to_str(region)
region

### Alias AWS CLI

A lot of our tasks will involve calling the AWS CLI directly. However, sometimes we will want to interpret the return value, which under many default configurations is JavaScript Object Notation (JSON).

Python actually has a nice library, `boto`, for interacting with Amazon EC2 which can provide useful return values. However, its syntax differs slightly from `awscli` and it doesn't work with the JSON files accepted by the `aws` commands. Therefore, rather than confuse ourselves, we're going to start with a function that invokes the AWS CLI directly.

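In [None]:
def aws(*args):
    """
    Utility method which allows us to invoke AWS without magic commands.
    Returns the parsed JSON output, or None if the output is not JSON.
    """
    command = ['aws'] + [str(x) for x in args]
    output = subprocess.check_output(command)

    try:
        return json.loads(bytes_to_str(output))
    except ValueError:
        # Some commands print nothing or plain text rather than JSON.
        return None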
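
The JSON-or-`None` behavior of the wrapper can be exercised without calling AWS at all. The following is a minimal sketch, assuming Python 3 and a hypothetical `parse_cli_output` helper that mirrors only the decoding step of `aws`:

```python
import json

def parse_cli_output(output):
    """Decode raw CLI bytes and parse them as JSON, or return None."""
    try:
        return json.loads(output.decode('utf-8'))
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return None

# A JSON document comes back as a Python object.
print(parse_cli_output(b'{"Images": []}'))
# Empty or plain-text output yields None.
print(parse_cli_output(b''))
```

Callers therefore need to check for `None` before indexing into the result.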

## Amazon Machine Images

Everything starts with understanding Amazon Machine Images (AMI). Conceptually, you can think of an AMI as something that contains an operating system and potentially lots of different software pre-installed. It serves as the base of any virtual machines you request from Amazon Web Services.

https://console.aws.amazon.com/ec2/v2/home#Images

In [None]:
image_types = {
    'ubuntu': {
        'paravirtual': 'ubuntu/images/ebs-ssd/ubuntu-xenial-16.04-amd64-server-20160907.1',
        'hvm': 'ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20160907.1'
    },
    'amazon': {
        'paravirtual': 'amzn-ami-pv-2016.09.0.20160923-x86_64-ebs',
        'hvm': 'amzn-ami-hvm-2016.09.0.20160923-x86_64-gp2'
    },
    'redhat': {
        'hvm': 'RHEL-7.2_HVM_GA-20151112-x86_64-1-Hourly2-GP2',
    },
    'suse': {
        'hvm': 'suse-sles-12-sp1-v20151215-hvm-ssd-x86_64'
    }
}

When you create a new virtual machine, you can choose an operating system, and then choose between two different AMIs, each of which supports a specific kind of virtualization. Conceptually, the virtualization has a slight impact on performance, but the reality is that the performance difference for many simple applications isn't noticeable.

In [None]:
def get_default_image_id(virtualization_type, linux_type='ubuntu'):
    if linux_type not in image_types:
        error_notes = (linux_type, ', '.join(image_types.keys()))
        raise ValueError('%s is not an available operating system (%s)' % error_notes)

    linux_image_types = image_types[linux_type]

    if virtualization_type not in linux_image_types:
        error_notes = (virtualization_type, linux_type, ', '.join(linux_image_types.keys()))
        raise ValueError('%s is not an available virtualization for operating system %s (%s)' % error_notes)

    image_name = linux_image_types[virtualization_type]

    default_ami_json = aws('ec2', 'describe-images', '--filters', 'Name=name,Values=' + image_name)

    default_image = default_ami_json['Images'][0]
    return default_image['ImageId']
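
`describe-images` can return an empty `Images` list when no AMI matches the name filter in the configured region, in which case indexing `[0]` raises an `IndexError`. The sketch below shows one way to guard against that. Both `first_image_id` and the sample response are hypothetical, and the AMI ID is a made-up placeholder.

```python
def first_image_id(describe_images_response):
    """Return the first ImageId in a describe-images response, or None."""
    if not describe_images_response:
        return None
    images = describe_images_response.get('Images', [])
    if not images:
        return None
    return images[0]['ImageId']

# Hand-written sample shaped like `aws ec2 describe-images` JSON output.
sample_response = {'Images': [{'ImageId': 'ami-00000000', 'Name': 'example'}]}

print(first_image_id(sample_response))
print(first_image_id({'Images': []}))
```

Returning `None` rather than raising is a design choice for this sketch; raising a descriptive `ValueError`, as the function above does for unknown operating systems, would be equally reasonable.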

What's important is that your choice changes which instance types are available to you, and thus potentially your expected cost. The options are paravirtual (PV) and hardware virtual machine (HVM), and whenever you plan to create an Amazon Elastic Compute Cloud (EC2) instance in this notebook series, you will be asked which one you actually want.

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html

There are more instance type options available to HVM machines, but spot instance requests for PV machines are substantially cheaper due to the lower number of users using the virtualization type.

In [None]:
unavailable_instance_families = {
    'paravirtual': ['t2', 'm4', 'c4', 'cc2', 'g2', 'r3', 'cr1', 'd2', 'i2'],
    'hvm': ['m1', 'c1', 'm2']
}

def get_virtualization_type(instance_type, default_value='hvm'):
    # The instance family is the part before the dot, e.g. 'm4' in 'm4.large'.
    instance_family = instance_type.split('.')[0]

    if instance_family in unavailable_instance_families['paravirtual']:
        return 'hvm'
    elif instance_family in unavailable_instance_families['hvm']:
        return 'paravirtual'
    else:
        return default_value

## Python Packages for the Future

Since this is a series of Jupyter notebooks, I will be making use of the Python programming language. Below are some of the packages you'll need to make sure you have installed.

### matplotlib

The first library you will need is `matplotlib`, which you can add to your environment via `conda install`.

* http://matplotlib.org/

This library provides plotting capabilities. We will use only a fraction of them, to visualize spot instance prices and determine (visually) which availability zones make sense when creating spot instance requests.

In [None]:
!conda install -y matplotlib

### netaddr

The next library you will need is `netaddr` which you can add to your environment via `pip install`.

* https://pythonhosted.org/netaddr/

This library provides the ability to detect whether an IP address falls within a Classless Inter-Domain Routing (CIDR) block. We use it to determine whether our public IP address has been given access to the EC2 instances we are creating.

In [None]:
!pip install netaddr
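
To make the CIDR check concrete, here is a minimal sketch of the membership test. It deliberately swaps in Python 3's standard-library `ipaddress` module so that it runs without installing anything; with `netaddr`, the same test is written as `IPAddress(ip) in IPNetwork(cidr)`. The helper name `ip_in_cidr` and the addresses are illustrative only.

```python
import ipaddress

def ip_in_cidr(ip, cidr):
    """Return True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

# Documentation-only addresses (RFC 5737), used purely as examples.
print(ip_in_cidr('203.0.113.25', '203.0.113.0/24'))
print(ip_in_cidr('198.51.100.7', '203.0.113.0/24'))
```

The first call prints `True` and the second prints `False`, since only the first address lies inside the `/24` block.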

### pandas

The next library you will need is `pandas`, which you can add to your environment via `conda install`.

* http://pandas.pydata.org/

This is a data analysis library. We will not be using its analysis capabilities, but it is ubiquitous in the Python scientific computing world (which Jupyter was created to support), and therefore Jupyter Notebook has nice built-in display extensions which render data frames as HTML tables.

In [None]:
!conda install -y pandas

### pysftp

The next library you will need is `pysftp`, which you can add to your environment via `pip install pysftp`.

* https://pypi.python.org/pypi/pysftp

This library is a wrapper around the `paramiko` library which simplifies uploading files via SFTP. We are using it in order to transfer files to our EC2 instances.

In [None]:
!pip install pysftp

### requests

We also need a way to issue HTTP requests that works in both Python 2 and Python 3, which the `requests` library provides.

* http://docs.python-requests.org/en/master/

This library gives us the ability to issue simple requests without thinking all that hard about how to write the code, and the code looks the same in both Python 2 and Python 3.

In [None]:
!conda install -y requests

### xmltodict

The next library you will need is `xmltodict`, which you can add to your environment via `pip install`.

* https://github.com/martinblech/xmltodict

This library provides the ability to create XML documents via Python dictionaries, which are required in various configuration phases for the applications we will install throughout the tutorial series.

In [None]:
!pip install xmltodict

## Convert Notebook to Script

The following cell will use `jupyter nbconvert` to build an `aws_base.py` which will be used in future notebooks in this series.

In [None]:
%%javascript
var script_file = 'aws_base.py';

var notebook_name = window.document.getElementById('notebook_name').innerHTML;
var nbconvert_command = 'jupyter nbconvert --stdout --to script ' + notebook_name;

var grep_command = "grep -v '^#' | grep -v -F get_ipython | sed '/^$/N;/^\\n$/D'";
var command = '!' + nbconvert_command + ' | ' + grep_command + ' > ' + script_file;

if (Jupyter.notebook.kernel) {
    Jupyter.notebook.kernel.execute(command);
}
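
For readers less familiar with `grep` and `sed`, the filtering stage of the pipeline in the cell above can be approximated in plain Python. This is only a sketch of what the shell filters do (drop lines starting with `#`, drop lines mentioning `get_ipython`, and squeeze runs of blank lines into one); `clean_script` is a hypothetical name and is not used elsewhere in the series.

```python
def clean_script(text):
    """Drop comment and get_ipython lines, then squeeze blank-line runs."""
    kept = [
        line for line in text.splitlines()
        if not line.startswith('#') and 'get_ipython' not in line
    ]
    cleaned = []
    for line in kept:
        # Skip a blank line that directly follows another blank line.
        if line == '' and cleaned and cleaned[-1] == '':
            continue
        cleaned.append(line)
    return '\n'.join(cleaned) + '\n'

sample = (
    "# coding: utf-8\n\n\nimport json\n"
    "get_ipython().system('aws')\n\n\n\nprint('done')\n"
)
print(clean_script(sample))
```

Note that only lines beginning with `#` in the first column are removed, so indented comments inside functions survive the conversion.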