# Amazon Basics 6: Spot Instances

## Usage Notes

The purpose of this notebook is to reduce the boilerplate code involved in requesting spot instances from Amazon Web Services and confirming that the requests were fulfilled.

https://aws.amazon.com/ec2/spot/

## Notebook Imports

In [None]:
from __future__ import print_function
from aws_base import *
from aws_request import *
from datetime import date, datetime, timedelta
from matplotlib import pyplot
import numpy
import pandas
import sys
import time

## Spot Price Thresholds

As linked above, spot instances are advertised as being substantially cheaper than on-demand instances. It's good to set price thresholds based on each of the different instance types to indicate how much cheaper we actually want them to be.

The code below makes the following assumptions:

* We want `small`, `medium`, and `large` instances to be 70% cheaper than their on-demand price
* We want `xlarge` and `2xlarge` instances to be 75% cheaper than their on-demand price
* We want all other sizes, such as `4xlarge` and `8xlarge`, to be 80% cheaper than their on-demand price

In [None]:
target_prices = None

"""
Return whether the instance type can be a spot instance.
"""

def is_spot_instance_supported(instance_type):
    instance_family = instance_type[:instance_type.find('.')]

    return instance_family not in ['t1', 't2', 'p2']

"""
Return the desired price for spot instance requests. Note that while this
suggests that we may use it as a bid price, the on-demand price is the more
likely value, as the target price is viewed more as a nice-to-have.
"""
def get_target_prices():
    global target_prices

    if target_prices is not None:
        return target_prices

    on_demand_prices = get_on_demand_prices()

    target_prices = {}

    for instance_type, price in on_demand_prices.items():
        if not is_spot_instance_supported(instance_type):
            continue

        instance_size = instance_type[(instance_type.find('.')+1):]

        multiplier = 0.1

        if instance_size in ['small', 'medium', 'large']:
            multiplier = 0.3
        elif instance_size in ['xlarge', '2xlarge']:
            multiplier = 0.25
        else:
            multiplier = 0.2

        target_prices[instance_type] = round(price * multiplier, 3)

    return target_prices
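
As a quick illustration of the thresholds above, the following sketch applies the same size-based multipliers to a few made-up on-demand prices. The prices and the `sketch_target_price` helper are hypothetical; in the notebook itself the prices come from `get_on_demand_prices()`.

```python
# Hypothetical on-demand prices (USD per hour), invented for illustration;
# the real values come from get_on_demand_prices().
sample_on_demand_prices = {
    'm4.large': 0.10,
    'm4.2xlarge': 0.40,
    'm4.4xlarge': 0.80,
}

def sketch_target_price(instance_type, on_demand_price):
    # Everything after the first '.' is the instance size
    instance_size = instance_type.split('.', 1)[1]

    if instance_size in ['small', 'medium', 'large']:
        multiplier = 0.3    # 70% cheaper than on-demand
    elif instance_size in ['xlarge', '2xlarge']:
        multiplier = 0.25   # 75% cheaper than on-demand
    else:
        multiplier = 0.2    # 80% cheaper than on-demand

    return round(on_demand_price * multiplier, 3)

sample_target_prices = {
    instance_type: sketch_target_price(instance_type, price)
        for instance_type, price in sample_on_demand_prices.items()
}
```

With these sample inputs, the larger the instance size, the deeper the discount that is asked for relative to the on-demand price.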

## Spot Instance Price History

One of the key differences between on-demand instances and spot instances is that the price of spot instances varies over time based on demand.

https://aws.amazon.com/ec2/spot/pricing/

Sometimes lazy programmers inflate the price of an instance beyond its average price. However, there is another class of lazy programmers that goes as far as setting a price point far beyond the price of an on-demand instance.

https://aws.amazon.com/ec2/pricing/#On-Demand_Instances

As a result, we will want to check the price history the same way you would through the AWS Spot Instance Request GUI, so we'll build out a utility method that makes a request for some specified days of spot price activity.

In [None]:
"""
Utility method which retrieves the AWS EC2 price history for the given instance
type in the given availability zone over the last given number of days.
"""
def get_zone_price_history(instance_type, availability_zone, day_count):

    # Compute the start and end times based on day count

    today = date.today()
    today = datetime(today.year, today.month, today.day)

    end_time = today + timedelta(1)
    start_time = today + timedelta(0 - day_count)

    # Construct the time strings to pass to AWS CLI

    date_format = '%Y-%m-%dT%H:%M:%S.%fZ'
    start_time_string = datetime.strftime(start_time, date_format)
    end_time_string = datetime.strftime(end_time, date_format)

    # Retrieve the price history from AWS CLI

    price_list = aws(
        'ec2', 'describe-spot-price-history',
        '--product-descriptions', 'Linux/UNIX',
        '--instance-type', instance_type,
        '--availability-zone', availability_zone,
        '--start-time', start_time_string,
        '--end-time', end_time_string)

    # Parse the times and prices from the spot price history

    price_history = price_list['SpotPriceHistory']

    history_dates = numpy.array([
        datetime.strptime(price['Timestamp'], date_format)
            for price in price_history
    ])

    history_prices = numpy.array([
        float(price['SpotPrice']) for price in price_history
    ])

    return {
        'dates': history_dates,
        'prices': history_prices
    }

zone_list = None

"""
Utility method which retrieves the AWS EC2 price history for the given instance
type across all availability zones in the current region over the last given
number of days.
"""
def get_region_price_history(instance_type, day_count):
    global region, zone_list

    if zone_list is None:
        zone_list = aws(
            'ec2', 'describe-availability-zones', '--filter',
            'Name=region-name,Values=%s' % region)

    zones = zone_list['AvailabilityZones']
    zone_names = sorted([
        zone['ZoneName'] for zone in zones if zone['State'] == 'available'
    ])

    return [
        (zone_name, get_zone_price_history(instance_type, zone_name, day_count))
            for zone_name in zone_names
    ]
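
The parsing step in `get_zone_price_history` can be tried out without calling AWS. The response below is a hand-written sample in the shape that the function reads (`SpotPriceHistory`, `Timestamp`, `SpotPrice`). The values are invented, and the timestamp layout is assumed to match `date_format`; if your CLI version prints timestamps in another layout, the format string would need adjusting.

```python
from datetime import datetime

# Hand-written sample response; the values are invented for illustration.
sample_response = {
    'SpotPriceHistory': [
        {'Timestamp': '2016-10-06T20:31:24.000Z', 'SpotPrice': '0.013400'},
        {'Timestamp': '2016-10-06T18:02:10.000Z', 'SpotPrice': '0.012900'},
    ]
}

# Assumed to be the same layout as date_format in get_zone_price_history
sample_date_format = '%Y-%m-%dT%H:%M:%S.%fZ'

sample_dates = [
    datetime.strptime(entry['Timestamp'], sample_date_format)
        for entry in sample_response['SpotPriceHistory']
]

# Prices arrive as strings, so convert them before doing any arithmetic
sample_prices = [
    float(entry['SpotPrice']) for entry in sample_response['SpotPriceHistory']
]

# Fraction of samples above a hypothetical target price of $0.013
above_target = [price > 0.013 for price in sample_prices]
fraction_above_target = sum(above_target) / float(len(above_target))
```

This sketch uses plain lists to stay dependency-free; the notebook wraps the same values in `numpy` arrays so that comparisons such as `prices > target_price` work element-wise.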

There are a lot of ways to plot the price history. Since it's essentially prices over time, you could also perform time series analysis on the spot instance prices.

For simplicity, though, we'll just plot the price trends over time relative to a target price, which is essentially how much you intend to pay per node in your cluster per hour of runtime. The target price is drawn on the graph to make it easier for you to decide on the instance type.

Set the y-axis limit for the graph at where it would exceed the price for an on-demand instance for your largest instance type (because prices above that value are not meaningful), unless nothing in your price history comes close. Also add a dashed line to see for how long the prices stay below our desired target price.

In [None]:
def plot_price_history(instance_types, day_count):
    instances_price_history = [
        (instance_type, get_region_price_history(instance_type, day_count))
            for instance_type in instance_types
    ]

    # First, grab the target prices for all instance types

    target_prices = get_target_prices()

    # Then, take the average of the prices for all instances
    # that the user has selected

    target_price = round(
        numpy.mean([
            target_prices[instance_type] for instance_type in instance_types
        ]), 3)

    # Next, we need to figure out what the maximum on-demand price is for plots

    on_demand_prices = get_on_demand_prices()

    max_demand_price = max([
        on_demand_prices[instance_type] for instance_type in instance_types
    ])

    # Build out our plots

    instance_type_count = len(instance_types)
    figure, subplots = pyplot.subplots(
        instance_type_count, figsize = (16, 3 * instance_type_count),
        sharex = True, sharey = True)

    if not isinstance(subplots, numpy.ndarray):
        subplots = [subplots]

    best_historic_date = None
    best_historic_price = 0.0

    # Create subplots for each of the instance types, and within each subplot, create a line
    # graph representing the price history in each availability zone for that instance type

    for i in range(instance_type_count):

        subplot = subplots[i]
        instance_type, instance_price_history = instances_price_history[i]

        subplot.set_title(instance_type)

        zone_names = []

        for zone_name, price_history in instance_price_history:
            if len(price_history['dates']) == 0:
                continue

            zone_names.append(zone_name)

            min_historic_date = min(price_history['dates'])

            if best_historic_date is None:
                best_historic_date = min_historic_date
            else:
                best_historic_date = max(best_historic_date, min_historic_date)

            max_historic_price = max(price_history['prices'])
            best_historic_price = max(best_historic_price, max_historic_price)

            subplot.plot(price_history['dates'], price_history['prices'])

        box = subplot.get_position()
        subplot.set_position([box.x0, box.y0, box.width * 0.8, box.height])

        subplot.legend(
            zone_names, loc = 'center left', bbox_to_anchor = (1, 0.5),
            fancybox = True)

    # Normalize the subplots so that you can meaningfully compare them relative to the target
    # bid that you've set across instance types.

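    for subplot in subplots:

        subplot.axhline(y = target_price, color = 'black', ls = 'dashed')

        # Start the x-axis at the latest of the per-zone earliest dates, so
        # that every plotted zone has data from the left edge onward

        subplot.set_xlim(left = best_historic_date)

        # Leave headroom above the highest observed price (or the target),
        # but never extend the y-axis past the on-demand price

        best_ymax = max(best_historic_price, target_price) * 1.5
        best_ymax = min(best_ymax, max_demand_price)

        subplot.set_ylim(bottom = 0.0, top = best_ymax)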

    return target_price, on_demand_prices, instances_price_history
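
The y-axis rule described above (show some headroom over the observed prices, but never go past the on-demand price) can be isolated into a small function. This is a sketch with a hypothetical name, `sketch_ymax`, and invented sample prices; it mirrors the arithmetic in `plot_price_history` rather than replacing it.

```python
def sketch_ymax(max_historic_price, target_price, max_demand_price):
    # Leave 50% headroom above whichever is higher: the peak historic
    # price or the target price...
    padded = max(max_historic_price, target_price) * 1.5

    # ...but never extend the axis past the on-demand price, since
    # prices above it are not meaningful for this comparison
    return min(padded, max_demand_price)

# Quiet market: the axis stops at 1.5x the peak historic price
quiet_ymax = sketch_ymax(0.05, 0.03, 0.10)

# Price spike: the axis is clamped to the on-demand price
spike_ymax = sketch_ymax(2.00, 0.03, 0.10)
```

Clamping keeps one extreme spike from flattening every other line in the shared y-axis.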

## Spot Instance Zone Choice

After choosing an instance type, the next consideration is the availability zone. Prices can vary widely between zones, either because some workloads are pinned to a specific zone (for example, by EBS volumes, which are tied to one availability zone) or simply because of lazy programmers.

The following code chooses an availability zone automatically, applying these criteria in order:

* Choose the zone with the highest consistency in being below the target price
* In the case of ties, choose the zone with the highest consistency in being below the on-demand price
* In the case of ties, choose the zone with the lowest maximum price over the pricing history

In [None]:
"""
Automatically select an availability zone based on the instance price history.
"""
def choose_availability_zone(instance_type, instance_price_history, target_price = None):
    if target_price is None:
        target_price = get_target_prices()[instance_type]

    demand_price = get_on_demand_prices()[instance_type]

    desired_zone_name = None
    desired_zone_stats = (100.0, 100.0, sys.float_info.max)

    historic_prices = {}

    for zone_name, price_history in instance_price_history:
        if len(price_history['prices']) == 0:
            continue

        # Add information for the summary table

        max_zone_price = max(price_history['prices'])

        exceed_target = numpy.mean(price_history['prices'] > target_price)
        exceed_demand = numpy.mean(price_history['prices'] > demand_price)

        target_string = '%0.2f%%' % (100.0 * exceed_target)
        demand_string = '%0.2f%%' % (100.0 * exceed_demand)

        historic_prices[zone_name] = {
            'Maximum Price': max_zone_price,
            'Exceeded Target Bid': target_string,
            'Exceeded On-Demand Price': demand_string
        }

        # Identify the zone with the best target bid ratio, with tie-breakers
        # choosing the zone with the better within demand ratio, with more
        # tie-breakers choosing the one with the better maximum price.

        zone_stats = (exceed_target, exceed_demand, max_zone_price)

        if zone_stats < desired_zone_stats:
            desired_zone_name = zone_name
            desired_zone_stats = zone_stats

    df = pandas.DataFrame.from_dict(historic_prices, orient = 'index')
    df = df.reindex(columns = [
        'Exceeded Target Bid',
        'Exceeded On-Demand Price',
        'Maximum Price'
    ])

    return (df, desired_zone_name)
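
The tie-breaking in `choose_availability_zone` relies on Python comparing tuples element by element. The sketch below shows that idea in isolation, using invented statistics for three zones; the zone names and numbers are hypothetical.

```python
# Hypothetical per-zone statistics: (fraction of samples above the target
# price, fraction above the on-demand price, maximum price seen)
sample_zone_stats = {
    'us-east-1a': (0.10, 0.00, 0.25),
    'us-east-1b': (0.10, 0.00, 0.19),
    'us-east-1c': (0.35, 0.02, 0.12),
}

# Tuples compare element by element, so min() applies the three criteria
# in order without any explicit tie-breaking code.
best_zone = min(sample_zone_stats, key=sample_zone_stats.get)
```

Here `us-east-1a` and `us-east-1b` tie on the first two criteria, so the lower maximum price decides in favor of `us-east-1b`, even though `us-east-1c` has the lowest maximum price overall.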

## Spot Instance Requests

The code below extends the generic `InstanceRequest` class with the ability to make an actual spot request.

In [None]:
"""
Extension of an InstanceRequest which works with spot instance requests.
"""
class SpotInstanceRequest(InstanceRequest):

    """
    Utility method to check on the status of active spot instance requests.
    """
    def get_active(self):
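        known_requests = aws('ec2', 'describe-spot-instance-requests')

        active_requests = {}

        # The response wraps the requests in a 'SpotInstanceRequests' list

        for request in known_requests['SpotInstanceRequests']:
            # Note that the EC2 API spells this state 'cancelled'
            if request['State'] in ['cancelled', 'closed', 'failed']: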
                continue

            if 'SpotInstanceRequestId' not in request:
                continue

            request_id = request['SpotInstanceRequestId']

            request_summary = {
                'created': request['CreateTime'],
                'updated': request['Status']['UpdateTime'],
                'status': request['Status']['Code'],
            }

            active_requests[request_id] = request_summary

        return active_requests

    """
    Issues a request for spot instances at the given bid price and with the
    given number of cluster nodes.
    """
    def make_request(self, bid_price, cluster_size):
        return aws(
            'ec2', 'request-spot-instances', '--spot-price', str(bid_price),
            '--instance-count', str(cluster_size), '--type', 'one-time',
            '--launch-specification', 'file://' + self.specification_file_name)

    """
    Retrieve the instance IDs that are associated with the given response for a
    spot instance request. This will wait until all spot requests are fulfilled
    and return all associated instance IDs.
    """
    def get_instance_ids(self, response):
        sir_response = response['SpotInstanceRequests']
        spot_request_ids = [
            request['SpotInstanceRequestId'] for request in sir_response
        ]

        fulfilled_instance_ids = self.get_fulfilled_instance_ids(
            spot_request_ids)

        while len(spot_request_ids) != len(fulfilled_instance_ids):
            time.sleep(15)
            fulfilled_instance_ids = self.get_fulfilled_instance_ids(
                spot_request_ids)

        return fulfilled_instance_ids

    """
    Utility method to check on the status of fulfilled spot instance requests.
    """
    def get_fulfilled_instance_ids(self, spot_request_ids):
        spot_requests_json = aws(
            'ec2', 'describe-spot-instance-requests',
            '--spot-instance-request-ids', *spot_request_ids)

        spot_requests = spot_requests_json['SpotInstanceRequests']

        # Log the status of spot requests.

        for request in spot_requests:
            status = request['Status']
            print('%s: %s' % (request['SpotInstanceRequestId'], status['Message']))

        # Now we collect the instance IDs of the spot requests that have been
        # assigned an instance. Requests that are still pending, or that ended
        # without launching, have no 'InstanceId' key and are skipped.

        spot_requests_instance_ids = [
            request['InstanceId'] for request in spot_requests
                if 'InstanceId' in request
        ]

        return spot_requests_instance_ids
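
`get_instance_ids` polls every 15 seconds until every request reports an instance, and as written it has no upper bound on how long it waits. A minimal sketch of the same loop with a deadline follows. The names `wait_for_all` and `fetch_ready`, along with the default timeout, are hypothetical and not part of `aws_request`.

```python
import time

def wait_for_all(expected_count, fetch_ready, poll_seconds=15, timeout_seconds=600):
    """
    Call fetch_ready() until it returns expected_count items or the
    timeout elapses. Returns whatever the last call produced, so the
    caller should compare its length against expected_count.
    """
    deadline = time.time() + timeout_seconds
    ready = fetch_ready()

    while len(ready) != expected_count and time.time() < deadline:
        time.sleep(poll_seconds)
        ready = fetch_ready()

    return ready

# Stand-in for get_fulfilled_instance_ids: reports one more (made-up)
# instance ID on each call, up to two.
fake_calls = []

def fake_fetch_ready():
    fake_calls.append(1)
    return ['i-example'] * min(len(fake_calls), 2)

fulfilled = wait_for_all(2, fake_fetch_ready, poll_seconds=0, timeout_seconds=5)
```

In the class above, `fetch_ready` would correspond to a call such as `lambda: self.get_fulfilled_instance_ids(spot_request_ids)`.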

## Convert Notebook to Script

The following cell will use `jupyter nbconvert` to build an `aws_spot.py` which will be used in future notebooks in this series.

In [None]:
%%javascript
var script_file = 'aws_spot.py';

var notebook_name = window.document.getElementById('notebook_name').innerHTML;
var nbconvert_command = 'jupyter nbconvert --stdout --to script ' + notebook_name;

var grep_command = "grep -v '^#' | grep -v -F get_ipython | sed '/^$/N;/^\\n$/D'";
var command = '!' + nbconvert_command + ' | ' + grep_command + ' > ' + script_file;

if (Jupyter.notebook.kernel) {
    Jupyter.notebook.kernel.execute(command);
}
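
The `grep` and `sed` pipeline above drops lines that start with `#`, drops lines that mention `get_ipython`, and squeezes runs of blank lines. For readers less familiar with `sed`, here is a rough Python sketch of the same filtering. `clean_script_lines` is a hypothetical helper, not what the cell actually runs, and it may differ from `sed` in edge cases such as trailing blank lines.

```python
def clean_script_lines(lines):
    cleaned = []

    for line in lines:
        # grep -v '^#': skip comment lines, including nbconvert cell markers
        if line.startswith('#'):
            continue

        # grep -v -F get_ipython: skip lines generated from IPython magics
        if 'get_ipython' in line:
            continue

        # sed: keep at most one blank line in a row
        if line == '' and cleaned and cleaned[-1] == '':
            continue

        cleaned.append(line)

    return cleaned

# Invented sample of what nbconvert's script output might look like
sample_lines = [
    '# coding: utf-8',
    '',
    '# In[1]:',
    '',
    'import sys',
    "get_ipython().magic('matplotlib inline')",
    '',
    '',
    'x = 1',
]

cleaned_lines = clean_script_lines(sample_lines)
```

Because comment lines are stripped, the notebook documents its functions with string literals rather than `#` comments wherever the text should survive into `aws_spot.py`.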