## Amazon EC2 Spot Price Data Analysis

Amazon Web Services offers several pricing models: pay-per-use, fixed, and auction-based (spot price). The spot price has been observed to be at least 5 times cheaper than the other pricing models, but there is no guarantee that you will be given the instance; that depends on the price you bid. Analysis of historical spot price data is therefore important for scheduling jobs efficiently, that is, at minimal cost.

boto3, the AWS SDK for Python, is the package used here to pull the price history. It provides trailing data for up to 90 days.

In [1]:
# Importing the necessary packages.

import boto3
import pandas as pd
import datetime

Below is the code to pull the data. The key method is describe_spot_price_history. The handler takes as input the start and end times of the period you are interested in, the instance types, the region, and the product descriptions. Remember that Amazon provides only the most recent 90 days of price history. Instance types and product descriptions are passed as lists, so several of each can be requested in one call.

In [2]:
def handler(event, context):
    # Unpack the query parameters supplied by the caller.
    start_time = event['start_time']
    end_time = event['end_time']
    region = event['region']
    product_description = event['product_description']
    # Spot prices are regional, so the client is created for the requested region.
    client = boto3.client('ec2', region_name=region)
    response = client.describe_spot_price_history(
        InstanceTypes=event['instances_list'],
        ProductDescriptions=product_description,
        StartTime=start_time,
        EndTime=end_time,
        MaxResults=10000
    )
    # Only this one response is used; any NextToken in it is not followed.
    return response['SpotPriceHistory']
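
The response from describe_spot_price_history can also carry a `NextToken`, which is passed back in a follow-up request to get the next page of records. The sketch below follows that token until it comes back empty. `fetch_all_prices` is a hypothetical helper name, not part of boto3, and the client is passed in as an argument on the assumption that it is a boto3 EC2 client.

```python
def fetch_all_prices(client, **request):
    """Collect every page of spot price records for one request.

    Hypothetical helper: `client` is assumed to be a boto3 EC2 client and
    `request` holds the keyword arguments for describe_spot_price_history.
    """
    records = []
    next_token = ''
    while True:
        # Only send NextToken once the service has handed one back.
        kwargs = dict(request)
        if next_token:
            kwargs['NextToken'] = next_token
        response = client.describe_spot_price_history(**kwargs)
        records.extend(response['SpotPriceHistory'])
        # An empty or missing token means there are no more pages.
        next_token = response.get('NextToken')
        if not next_token:
            return records
```

Inside the handler, a call such as `fetch_all_prices(client, InstanceTypes=..., StartTime=..., EndTime=...)` could stand in for the direct call if a single window ever holds more records than one response returns.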

Below is a wrapper around the handler. It takes the input values the user is interested in and invokes the handler with them, one day of history at a time.

In [3]:
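def wrapper(instanceList, ProductDescriptionList, region):
    # Fix the reference time once, in explicit UTC, so that consecutive
    # one-day windows share their boundaries and do not depend on local time.
    now = datetime.datetime.now(datetime.timezone.utc)
    price_records = []
    # Walk back one day at a time over the 90 days of available history.
    for i in range(1, 91):
        output = handler({
            'instances_list': instanceList,
            'start_time': now - datetime.timedelta(days=i),
            'end_time': now - datetime.timedelta(days=i - 1),
            'product_description': ProductDescriptionList,
            'region': region
        }, '')
        price_records.extend(output)

    df = pd.DataFrame(price_records)
    # Neighbouring windows can return the same record, so drop repeats.
    df = df.drop_duplicates()
    df.reset_index(drop=True, inplace=True)
    return df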


The wrapper pulls the price history and returns the data as a dataframe.
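
To make the windowing concrete, here is a minimal sketch of the one-day (start, end) pairs that such a loop walks through. `daily_windows` is a hypothetical helper introduced only for illustration, and the reference time is fixed by hand so the result is reproducible.

```python
import datetime


def daily_windows(now, days):
    """Return (start, end) pairs, newest first, each exactly one day long."""
    one_day = datetime.timedelta(days=1)
    return [(now - one_day * i, now - one_day * (i - 1))
            for i in range(1, days + 1)]


# Illustrative reference time; a real run would use the current UTC time.
now = datetime.datetime(2018, 1, 17, tzinfo=datetime.timezone.utc)
for start, end in daily_windows(now, 3):
    print(start.date(), '->', end.date())
# 2018-01-16 -> 2018-01-17
# 2018-01-15 -> 2018-01-16
# 2018-01-14 -> 2018-01-15
```

Because every window is derived from the same reference time, the end of one window is exactly the start of the next newer one, so no interval is skipped or covered twice.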

In [4]:
df = wrapper(['m4.large', 'm4.xlarge'], ['Linux/UNIX (Amazon VPC)'], 'us-west-2')
df

| | AvailabilityZone | InstanceType | ProductDescription | SpotPrice | Timestamp |
|---|---|---|---|---|---|
| 0 | us-west-2c | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 15:29:29+00:00 |
| 1 | us-west-2a | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 15:29:29+00:00 |
| 2 | us-west-2b | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 15:29:29+00:00 |
| 3 | us-west-2c | m4.large | Linux/UNIX | 0.030100 | 2018-01-17 12:29:47+00:00 |
| 4 | us-west-2a | m4.large | Linux/UNIX | 0.030100 | 2018-01-17 12:29:47+00:00 |
| 5 | us-west-2b | m4.large | Linux/UNIX | 0.030100 | 2018-01-17 12:29:47+00:00 |
| 6 | us-west-2c | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 00:29:48+00:00 |
| 7 | us-west-2a | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 00:29:48+00:00 |
| 8 | us-west-2b | m4.xlarge | Linux/UNIX | 0.060200 | 2018-01-17 00:29:48+00:00 |
| 9 | us-west-2c | m4.large | Linux/UNIX | 0.030100 | 2018-01-16 21:29:47+00:00 |
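
As a first step towards the cost analysis, one plausible summary is the minimum, mean, and maximum price per instance type and availability zone. The sketch below is written against plain record dictionaries with the same field names as the output, using only the standard library; `summarise_prices` is a hypothetical helper, and the same grouping could equally be done on the dataframe itself.

```python
from collections import defaultdict
from statistics import mean


def summarise_prices(records):
    """Summarise spot prices per (InstanceType, AvailabilityZone) pair."""
    groups = defaultdict(list)
    for rec in records:
        # Cast defensively: the price may arrive as text rather than a number.
        key = (rec['InstanceType'], rec['AvailabilityZone'])
        groups[key].append(float(rec['SpotPrice']))
    return {key: {'min': min(prices), 'mean': mean(prices), 'max': max(prices)}
            for key, prices in groups.items()}


# Three us-west-2c rows taken from the output above.
sample = [
    {'InstanceType': 'm4.xlarge', 'AvailabilityZone': 'us-west-2c', 'SpotPrice': '0.060200'},
    {'InstanceType': 'm4.large', 'AvailabilityZone': 'us-west-2c', 'SpotPrice': '0.030100'},
    {'InstanceType': 'm4.xlarge', 'AvailabilityZone': 'us-west-2c', 'SpotPrice': '0.060200'},
]
for key, stats in sorted(summarise_prices(sample).items()):
    print(key, stats)
```

With a full 90 days of records, comparing these summaries across availability zones gives a rough idea of where a given instance type has been cheapest.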


Simply by changing the arguments to the wrapper, we can pull the required data. This two-layer design, a wrapper on top of a handler, is intended to ease integration with other interfaces. For example, one could build a query interface on top of it and have the data fetched through that.
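
As one illustration of such an interface, the sketch below parses command-line style arguments into the three values the wrapper expects. Everything in it is an assumption made for the example: the `parse_query` name, the flag names, and the `us-west-2` default, which is simply borrowed from the call shown earlier.

```python
import argparse


def parse_query(argv):
    """Turn command-line style arguments into the three wrapper inputs."""
    parser = argparse.ArgumentParser(description='Query spot price history')
    # Hypothetical flags; each of the first two may be repeated.
    parser.add_argument('--instance', action='append', required=True,
                        help='instance type, repeat for several')
    parser.add_argument('--product', action='append', required=True,
                        help='product description, repeat for several')
    parser.add_argument('--region', default='us-west-2')
    args = parser.parse_args(argv)
    return args.instance, args.product, args.region


instances, products, region = parse_query([
    '--instance', 'm4.large', '--instance', 'm4.xlarge',
    '--product', 'Linux/UNIX (Amazon VPC)',
])
# The parsed values line up with the wrapper's positional arguments:
# df = wrapper(instances, products, region)
```

The wrapper call is left as a comment so the parsing can be tried without AWS credentials.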