# Amazon Basics 5: On-Demand Instances

## Usage Notes

The purpose of this notebook is to reduce the boilerplate code involved in requesting on-demand instances from Amazon Web Services and confirming that those requests were fulfilled.

https://aws.amazon.com/ec2/purchasing-options/

## Notebook Imports

In [None]:
from __future__ import print_function
from aws_util import *
import json
import os
import re
import requests
import subprocess
import time

## EC2 Instance Requests

Instances are requested from the Amazon Elastic Compute Cloud (EC2) through the Amazon EC2 API, which this notebook accesses via the AWS command line interface.

http://docs.aws.amazon.com/cli/latest/reference/ec2/
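The `aws` helper used throughout this notebook comes from `aws_util` and isn't shown here. As a rough sketch of what such a wrapper might do (an assumption, not the actual `aws_util` code), it can shell out to the AWS CLI with `--output json` and parse the result. Since `subprocess` only accepts string arguments, numeric values such as an instance count need converting first. The names `build_aws_command` and `run_aws` are hypothetical.

```python
import json
import subprocess


def build_aws_command(*args):
    """Build an AWS CLI argument list (hypothetical helper).

    subprocess only accepts string arguments, so every argument is converted
    with str(); this matters for numeric values such as an instance count.
    """
    return ['aws'] + [str(arg) for arg in args] + ['--output', 'json']


def run_aws(*args):
    """Run an AWS CLI command and parse its JSON output (hypothetical helper)."""
    output = subprocess.check_output(build_aws_command(*args))
    # Guard against commands that print nothing on success
    return json.loads(output) if output.strip() else None


print(build_aws_command('ec2', 'describe-instances', '--instance-ids', 'i-0123456789abcdef0'))
```

`run_aws` is not called above because it needs the AWS CLI and credentials; `build_aws_command` can be inspected on its own.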

Requests generally take a JSON specification file as input and return JSON as output. Caching that output lets us tell whether we've already issued a request, which prevents duplicate requests (and the needless costs they would incur).
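A minimal sketch of that caching pattern, assuming a hypothetical `make_request` callable that stands in for the real EC2 call: the response is written to disk the first time, and every later call reads it back instead of issuing a new request. `load_or_request` and `fake_request` are illustrative names only.

```python
import json
import os
import tempfile


def load_or_request(response_file_name, make_request):
    """Return the cached response if present; otherwise issue the request once
    and cache its JSON response so a re-run doesn't launch duplicate instances."""
    if os.path.isfile(response_file_name):
        with open(response_file_name, 'r') as response_file:
            return json.load(response_file)

    response = make_request()

    with open(response_file_name, 'w') as response_file:
        json.dump(response, response_file, indent=2)

    return response


calls = []


def fake_request():
    # Stand-in for a real EC2 call; records how often it is invoked
    calls.append(1)
    return {'Instances': [{'InstanceId': 'i-0123456789abcdef0'}]}


cache_path = os.path.join(tempfile.mkdtemp(), 'demo_response.json')
first = load_or_request(cache_path, fake_request)
second = load_or_request(cache_path, fake_request)
print(len(calls))  # 1: the second call was served from the cache
```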

Once the request is issued and we have a response, we poll the EC2 API until the request is fulfilled and the instances are running. We also want to run non-interactive software installation scripts on the servers over SSH, so we need to wait until each server's public key fingerprint is available and recorded in our `known_hosts` file.

https://en.wikipedia.org/wiki/Public_key_fingerprint
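For reference, OpenSSH's SHA256 fingerprint of a public key is the SHA-256 digest of the base64-decoded key blob, printed as unpadded base64 with a `SHA256:` prefix (the format `ssh-keygen -lf` shows). The sketch below computes it from a `.pub`-style line; `openssh_fingerprint` is a hypothetical helper, and the key in the usage example is fake.

```python
import base64
import hashlib


def openssh_fingerprint(public_key_line):
    """Compute an OpenSSH-style SHA256 fingerprint from a public key line
    in the "<key type> <base64 key blob> [comment]" format of a .pub file."""
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a "SHA256:" prefix
    return 'SHA256:' + base64.b64encode(digest).decode('ascii').rstrip('=')


# A fake key blob, just to exercise the function
fake_blob = base64.b64encode(b'not a real key').decode('ascii')
print(openssh_fingerprint('ssh-ed25519 ' + fake_blob + ' demo@example'))
```

Lines in `known_hosts` have a leading host field, so they would need one more field skipped before the key blob.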

The code below manages the state that is common to most EC2 instance requests.

In [None]:
class InstanceRequest(object):
    """
    Base class to use for EC2 instance requests.
    """

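    def __init__(self, prefix):
        """
        Provide a place to store an active instance request. The specification,
        the response, and the fulfilled instances are cached in files named
        after the given prefix.
        """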
        self.specification_file_name = 'awscli/%s_specification.json' % prefix
        self.response_file_name = 'awscli/%s_response.json' % prefix
        self.fulfilled_file_name = 'awscli/%s_instances.json' % prefix

    def get_specification(self):
        """
        Retrieve the specification used during the last request.
        """
        if not os.path.isfile(self.specification_file_name):
            return None

        with open(self.specification_file_name, 'r') as specification_file:
            return json.load(specification_file)

    def get_response(self):
        """
        Retrieve the last cached response.
        """
        if not os.path.isfile(self.response_file_name):
            return None

        with open(self.response_file_name, 'r') as response_file:
            return json.load(response_file)

    def get_fulfilled(self):
        """
        Retrieve the list of known EC2 instances corresponding to this request.
        """
        if not os.path.isfile(self.fulfilled_file_name):
            return None

        with open(self.fulfilled_file_name, 'r') as fulfilled_file:
            return json.load(fulfilled_file)

    def request(self, bid_price, cluster_size, specification):
        """
        Request a cluster. If a response to an earlier request is already
        cached, assume the request was already made and wait for its
        fulfillment instead of issuing a duplicate request.
        """

        # Save the specification request

        with open(self.specification_file_name, 'w') as specification_file:
            json.dump(specification, specification_file, indent = 2)

        # Check for a pre-existing response to a request and only issue a
        # request if the pre-existing response does not exist.

        response = self.get_response()

        if response is None:
            response = self.make_request(bid_price, cluster_size)

            with open(self.response_file_name, 'w') as response_file:
                json.dump(response, response_file)

        # Wait for the instances to all be running

        requested_instance_ids = self.get_instance_ids(response)
        pending_instances = self.get_instances(
            requested_instance_ids, 'pending')

        while len(pending_instances) != 0:
            print('Waiting for instances to start...')
            time.sleep(15)
            pending_instances = self.get_instances(
                requested_instance_ids, 'pending')

        running_instances = self.get_instances(
            requested_instance_ids, 'running')

        with open(self.fulfilled_file_name, 'w') as fulfilled_file:
            json.dump(running_instances, fulfilled_file, indent = 2)

        # Register running instances in known_hosts file

        print('%d instances started' % len(running_instances))

        for instance in running_instances:
            self.add_known_host(instance)

    def get_instances(self, instance_ids, state_name):
        """
        Return the instances among instance_ids whose state is state_name,
        sorted by launch time.
        """

        if len(instance_ids) == 0:
            return []

        # Retrieve all listed instances

        reservations_json = aws(
            'ec2', 'describe-instances', '--instance-ids',
            *instance_ids)

        # Filter down to instances with the specified state

        instances = []
        reservations = reservations_json['Reservations']

        for reservation in reservations:
            for instance in reservation['Instances']:
                if instance['State']['Name'] == state_name:
                    instances.append(instance)

        # Sort the instances by launch time

        instances = sorted(
            instances,
            key = lambda instance: instance['LaunchTime'])

        return instances

    def add_known_host(self, instance):
        """
        Add the given host's SSH host key to the known_hosts file by connecting
        to it once. Windows instances are skipped.
        """
        if instance.get('Platform') == 'windows':
            return

        host_name = instance['PublicDnsName']

        # We could extract the ECDSA fingerprint from the console output, but
        # the console output can be blank. Rather than rely on console output,
        # we'll be insecure and simply trust the server.

        while not is_known_host(host_name):
            subprocess.call([
                'ssh', '-o', 'StrictHostKeyChecking=no', host_name, 'echo'
            ])

            time.sleep(5)

        print('%s added to known hosts' % host_name)
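The pending loop in `request` polls indefinitely. Below is a sketch of the same loop with a timeout; `wait_for_state` is hypothetical, and `get_states` stands in for a call that maps instance IDs to state names (for example, by reading `State.Name` from `describe-instances` output). The AWS CLI also offers `aws ec2 wait instance-running`, which polls until the instances are running.

```python
import time


def wait_for_state(get_states, target_state='running', pending_state='pending',
                   poll_seconds=15, timeout_seconds=600, sleep=time.sleep):
    """Poll get_states() until no instance is pending, then return the sorted
    IDs of instances in target_state. Raises RuntimeError on timeout.

    get_states is a hypothetical callable returning {instance_id: state_name}.
    """
    waited = 0
    while True:
        states = get_states()
        if pending_state not in states.values():
            return sorted(
                instance_id for instance_id, state in states.items()
                if state == target_state)
        if waited >= timeout_seconds:
            raise RuntimeError(
                'Instances still pending after %d seconds' % waited)
        print('Waiting for instances to start...')
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated polls: one instance is still pending at first, then both are running
snapshots = iter([
    {'i-1': 'pending', 'i-2': 'running'},
    {'i-1': 'running', 'i-2': 'running'},
])
print(wait_for_state(lambda: next(snapshots), sleep=lambda seconds: None))
```

Injecting `sleep` keeps the sketch testable without actually waiting; in real use the default `time.sleep` applies.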

## On-Demand Prices

On-demand instance requests are a specific type of EC2 request in which you request a server at a fixed price. You can check the pricing for your region on the Amazon website.

https://aws.amazon.com/ec2/pricing/#On-Demand_Instances

The following code loads the Linux on-demand pricing for your region by parsing the JSON used to render that website, so that other notebooks can display it when you're planning a request. It relies on `get_instance_types` and `region`, which come from `aws_util`.

In [None]:
on_demand_prices = None

def get_on_demand_prices():
    """
    Return the on-demand prices for Linux instances in the user's region.
    """
    global on_demand_prices

    # If we've already built it up once, then we don't have to rebuild

    if on_demand_prices is not None:
        return on_demand_prices

    # Load the Linux on-demand instance pricing

    on_demand_prices = {}

    instance_types = get_instance_types()

    for instance_type in instance_types:
        instance_size = instance_type['instance_type']
        instance_pricing = instance_type['pricing']

        if region not in instance_pricing:
            continue

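        # Skip instance types without a Linux on-demand price in this region
        linux_pricing = instance_pricing[region].get('linux', {})

        if 'ondemand' not in linux_pricing:
            continue

        on_demand_prices[instance_size] = float(linux_pricing['ondemand'])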

    # Return the pricing data

    return on_demand_prices
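The same extraction can be written as a pure function, which also makes it easy to estimate what a cluster will cost per run. This sketch assumes the record shape the code above reads (`instance_type` and `pricing[region]['linux']['ondemand']`); `extract_on_demand_prices`, `estimate_cluster_cost`, and the sample records with their made-up prices are hypothetical, not real AWS pricing.

```python
def extract_on_demand_prices(instance_types, region):
    """Map instance type -> Linux on-demand hourly price for one region,
    skipping types with no Linux on-demand price there."""
    prices = {}
    for instance_type in instance_types:
        regional = instance_type.get('pricing', {}).get(region, {})
        linux_price = regional.get('linux', {}).get('ondemand')
        if linux_price is None:
            continue
        prices[instance_type['instance_type']] = float(linux_price)
    return prices


def estimate_cluster_cost(prices, instance_type, cluster_size, hours):
    """Estimated on-demand cost of running cluster_size instances for hours."""
    return prices[instance_type] * cluster_size * hours


# Hypothetical records with made-up prices; real data comes from get_instance_types()
sample_instance_types = [
    {'instance_type': 'm4.large', 'pricing': {'us-east-1': {'linux': {'ondemand': '0.10'}}}},
    {'instance_type': 'x1.32xlarge', 'pricing': {'us-west-2': {'linux': {'ondemand': '13.00'}}}},
    {'instance_type': 't2.micro', 'pricing': {'us-east-1': {}}},  # no Linux price listed
]
prices = extract_on_demand_prices(sample_instance_types, 'us-east-1')
print(prices)  # {'m4.large': 0.1}
print(round(estimate_cluster_cost(prices, 'm4.large', 4, 3), 2))  # 1.2
```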

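## Placement Groups

A cluster placement group packs instances close together inside a single Availability Zone to give them low-latency, high-throughput networking. Not every instance type supports placement groups, so the code below checks the instance type before creating one.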

In [None]:
def is_placement_group_allowed(desired_instance_type):
    """
    Return whether placement groups are allowed for the provided instance type.
    """
    for instance_type in get_instance_types():
        if instance_type['instance_type'] != desired_instance_type:
            continue

        return instance_type['placement_group_support']

    return False

def get_placement_group(desired_instance_type, placement_group_name):
    """
    Return the placement group with the desired name, creating it if it does
    not exist yet. Returns None if no name was given or if the provided
    instance type does not support placement groups.
    """
    if placement_group_name is None:
        return None

    if not is_placement_group_allowed(desired_instance_type):
        return None

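    try:
        matching_placement_groups = aws(
            'ec2', 'describe-placement-groups', '--group-names', placement_group_name)
    except Exception:
        # The lookup fails when no group with this name exists yet
        matching_placement_groups = {'PlacementGroups': []}

    if len(matching_placement_groups['PlacementGroups']) == 0: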
        aws(
            'ec2', 'create-placement-group', '--region', region,
            '--group-name', placement_group_name, '--strategy', 'cluster')

        matching_placement_groups = aws(
            'ec2', 'describe-placement-groups', '--group-names', placement_group_name)

    return matching_placement_groups['PlacementGroups'][0]
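The get-or-create logic above can be exercised without AWS by swapping the CLI calls for in-memory stand-ins. In this sketch, `get_or_create_group`, `fake_describe`, and `fake_create` are hypothetical; they only mirror the control flow (look up, create if missing, look up again), not the real EC2 API.

```python
def get_or_create_group(name, describe, create):
    """Return the group called name, creating it first if none exists.

    describe(name) returns a list of matching groups and create(name) creates
    one; both are hypothetical stand-ins for the EC2 placement group calls.
    """
    groups = describe(name)
    if not groups:
        create(name)
        groups = describe(name)
    return groups[0]


# In-memory stand-in for the EC2 placement group API
existing_groups = {}
create_calls = []


def fake_describe(name):
    return [existing_groups[name]] if name in existing_groups else []


def fake_create(name):
    create_calls.append(name)
    existing_groups[name] = {'GroupName': name, 'Strategy': 'cluster'}


first = get_or_create_group('demo-cluster', fake_describe, fake_create)
second = get_or_create_group('demo-cluster', fake_describe, fake_create)
print(len(create_calls))  # 1: the second call found the existing group
```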

## On-Demand Instance Requests

The code below extends the generic `InstanceRequest` class with the ability to issue an actual on-demand request.

In [None]:
class OnDemandInstanceRequest(InstanceRequest):
    """
    Extension of an InstanceRequest which works with on-demand instance requests.
    """

    def make_request(self, bid_price, cluster_size):
        """
        Issue a request for on-demand instances with the given number of
        cluster nodes. The bid price is ignored.
        """
        # Command-line arguments must be strings, so convert the count
        return aws(
            'ec2', 'run-instances', '--count', str(cluster_size),
            '--cli-input-json', 'file://' + self.specification_file_name)

    def get_instance_ids(self, response):
        """
        Retrieve the instance IDs that are associated with the given response.
        """
        return [instance['InstanceId'] for instance in response['Instances']]
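A sketch of how a caller might assemble the specification that `request` saves and `make_request` passes to `run-instances`. The key names (`ImageId`, `InstanceType`, `KeyName`, `SecurityGroupIds`, `Placement`) are RunInstances parameters; `build_specification`, `run_instances_command`, and every ID below (AMI, key pair, security group, placement group) are hypothetical placeholders. The instance count comes from `--count` on the command line rather than from the JSON file.

```python
def build_specification(image_id, instance_type, key_name,
                        security_group_ids, placement_group_name=None):
    """Build a run-instances input dict for use with --cli-input-json.

    The argument values used below are placeholders, not real resources.
    """
    specification = {
        'ImageId': image_id,
        'InstanceType': instance_type,
        'KeyName': key_name,
        'SecurityGroupIds': security_group_ids,
    }
    if placement_group_name is not None:
        # Launch into a cluster placement group, e.g. one from get_placement_group
        specification['Placement'] = {'GroupName': placement_group_name}
    return specification


def run_instances_command(specification_file_name, cluster_size):
    """Mirror the run-instances invocation made by make_request."""
    return ['aws', 'ec2', 'run-instances', '--count', str(cluster_size),
            '--cli-input-json', 'file://' + specification_file_name]


specification = build_specification(
    'ami-0123456789abcdef0', 'm4.large', 'my-key-pair',
    ['sg-0123456789abcdef0'], placement_group_name='demo-cluster')
print(run_instances_command('awscli/demo_specification.json', 4))

# run-instances returns the launched instances under 'Instances'
sample_response = {'Instances': [{'InstanceId': 'i-0aaa'}, {'InstanceId': 'i-0bbb'}]}
print([instance['InstanceId'] for instance in sample_response['Instances']])
```

With a real AMI, key pair, and security group, such a dict could be passed as the `specification` argument of `OnDemandInstanceRequest(prefix).request(None, cluster_size, specification)`.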

## Convert Notebook to Script

The following cell uses `jupyter nbconvert` to build `aws_request.py`, a script that later notebooks in this series will use.

In [None]:
%%javascript
var script_file = 'aws_request.py';

var notebook_name = Jupyter.notebook.notebook_name;
var nbconvert_command = 'jupyter nbconvert --stdout --to script "' + notebook_name + '"';

var grep_command = "grep -v '^#' | grep -v -F get_ipython | sed '/^$/N;/^\\n$/D'";
var command = '!' + nbconvert_command + ' | ' + grep_command + ' > ' + script_file;

if (Jupyter.notebook.kernel) {
    Jupyter.notebook.kernel.execute(command);
}
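To check what ends up in `aws_request.py` without running the browser-side cell, here is a pure-Python sketch of the same filtering steps: drop lines that start with `#` (nbconvert's `# In[ ]:` cell markers, header comments, and commented-out markdown cells), drop lines that mention `get_ipython` (cell magics such as `%%javascript`), and collapse runs of blank lines into one, as the `sed` expression does. `clean_script` is a hypothetical name, and the sample input only approximates nbconvert's output.

```python
def clean_script(script_text):
    """Mimic the grep/sed pipeline: drop lines starting with '#', drop lines
    containing get_ipython, and collapse runs of blank lines into one."""
    kept = []
    for line in script_text.splitlines():
        if line.startswith('#') or 'get_ipython' in line:
            continue
        # Skip a blank line if the previous kept line was also blank
        if line == '' and kept and kept[-1] == '':
            continue
        kept.append(line)
    return '\n'.join(kept) + '\n'


sample = (
    '#!/usr/bin/env python\n'
    '# coding: utf-8\n'
    '\n'
    '# In[ ]:\n'
    '\n'
    '\n'
    'import json\n'
    "get_ipython().run_cell_magic('javascript', '', '...')\n"
    '\n'
    '\n'
    'print(json.dumps({}))\n'
)
print(clean_script(sample))
```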