This notebook works only for regression problems. You would need to adapt some of the code for binary or multiclass classification (search the notebook for "regression-specific").


# Initialization

We create the `ml` object, which we will use to communicate with the Amazon ML API. Authentication requires an access key id and a secret access key. Go [here](https://console.aws.amazon.com/iam/home#security_credential) if you need to create a new key pair.

In [None]:
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""

import boto
ml = boto.connect_machinelearning(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
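
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, the keys could instead be read from the environment. The helper name `load_aws_credentials` is hypothetical, and the sketch assumes the keys are exported under the conventional variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.

```python
import os

def load_aws_credentials(env=os.environ):
    """Return (key_id, secret) read from `env`; raise if either one is missing."""
    key_id = env.get("AWS_ACCESS_KEY_ID", "")
    secret = env.get("AWS_SECRET_ACCESS_KEY", "")
    if not key_id or not secret:
        raise ValueError("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first")
    return key_id, secret
```

The returned pair can then be passed to `boto.connect_machinelearning` in place of the two constants above.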

# API basics and data source creation

First we need a unique suffix for our object ids:

In [None]:
import datetime
import base64
import numpy
suffix = str(numpy.random.randint(999)) + "-" + datetime.datetime.now().strftime("%Y-%m-%d")
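
Note that `numpy.random.randint(999)` yields fewer than a thousand distinct values, so two runs on the same day can produce the same suffix. A possible alternative, sketched here with the standard `uuid` module, makes collisions far less likely. The function name `make_suffix` is an assumption for illustration.

```python
import datetime
import uuid

def make_suffix(now=None):
    """Build a suffix such as '3f9a1c2e-2015-06-01' from a random hex token and the date."""
    now = now or datetime.datetime.now()
    # 8 hex characters give about 4 billion possible tokens per day
    return uuid.uuid4().hex[:8] + "-" + now.strftime("%Y-%m-%d")
```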

Let's create a data source from a file in S3:

In [None]:
source_id = "ds-realtor-lv-" + suffix
ml.create_data_source_from_s3(
        data_source_id = source_id,
        data_spec = {
            'DataLocationS3': 's3://papiseval/realtor-las-vegas.csv',
            'DataSchemaLocationS3': 's3://papiseval/realtor-las-vegas.csv.schema'
        },
        data_source_name = "Realtor LV from API",
        compute_statistics = True
    )

Let's get the source object:

In [None]:
source = ml.get_data_source(data_source_id = source_id, verbose = True)
source

We can inspect this object on the Amazon ML dashboard by following the link printed below:

In [None]:
print("https://console.aws.amazon.com/machinelearning/home?region=us-east-1#/insights/" + source_id + "/summary")

### Trick for generating schema
Create a data source from the web interface and ask for automatic schema creation. Verify and confirm the proposed schema, then note the id of the new data source. Put that id in the call below to retrieve the schema.

In [None]:
# source = ml.get_data_source(data_source_id="ds-Uo8b1zzi5O1", verbose=True)
# source['DataSourceSchema']
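
For reference, an Amazon ML schema is a small JSON document. The sketch below builds one in Python. The feature names are taken from the prediction input used later in this notebook; the target column `price`, the attribute types, and the header flag are all assumptions, since the actual schema file for this dataset is not shown here.

```python
import json

# Hypothetical schema: the target name, types, and header flag are assumptions
schema = {
    "version": "1.0",
    "targetAttributeName": "price",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": "bedrooms", "attributeType": "NUMERIC"},
        {"attributeName": "full_bathrooms", "attributeType": "NUMERIC"},
        {"attributeName": "type", "attributeType": "CATEGORICAL"},
        {"attributeName": "size_sqft", "attributeType": "NUMERIC"},
        {"attributeName": "price", "attributeType": "NUMERIC"},
    ],
}

# Serialize to the JSON text that would be stored in the .schema file
schema_json = json.dumps(schema, indent=2)
```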

# Model creation

In [None]:
model_id = "model-realtor-lv-" + suffix
ml.create_ml_model(
    ml_model_id = model_id,
    ml_model_type = "REGRESSION", # regression-specific
    training_data_source_id = source_id,
    ml_model_name = "LV real-estate pricing model from API")

Model building is asynchronous, so we poll until the new model is ready:

In [None]:
# copied from https://github.com/awslabs/machine-learning-samples/blob/master/targeted-marketing-python/use_model.py
import random
import time
def poll_until_completed(ml, model_id):
    delay = 2
    while True:
        model = ml.get_ml_model(model_id)
        status = model['Status']
        message = model.get('Message', '')
        now = str(datetime.datetime.now().time())
        print("Model %s is %s (%s) at %s" % (model_id, status, message, now))
        if status in ['COMPLETED', 'FAILED', 'INVALID']:
            break

        # exponential backoff with jitter
        delay *= random.uniform(1.1, 2.0)
        time.sleep(delay)
        
poll_until_completed(ml, model_id)
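
The polling loop above multiplies the delay on every iteration without an upper bound, so a slow build can lead to very long gaps between checks. Here is a minimal sketch of the same jittered backoff with a cap. The generator name `backoff_delays` and the 60-second cap are assumptions, not part of the original sample.

```python
import random

def backoff_delays(start=2.0, cap=60.0, rng=random):
    """Yield jittered, exponentially growing delays that never exceed `cap` seconds."""
    delay = start
    while True:
        # grow by a random factor, then clamp to the cap
        delay = min(cap, delay * rng.uniform(1.1, 2.0))
        yield delay
```

Inside a polling loop, each value drawn from the generator would be passed to `time.sleep`.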

We can now see what this model looks like:

In [None]:
model = ml.get_ml_model(model_id)
model

# Real-time predictions

First we need to activate an endpoint for making real-time predictions (as opposed to batch predictions) against our model:

In [None]:
ml.create_realtime_endpoint(model_id)

Make sure that the endpoint is ready. The status should be `READY`; it may read `UPDATING` for a few minutes after creation.

In [None]:
ml.get_ml_model(model_id).get('EndpointInfo').get('EndpointStatus')
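
Rather than re-running the cell above by hand, the check could be wrapped in a small loop. This is a sketch only: `wait_for_endpoint` is a hypothetical helper, and the polling interval and maximum number of checks are arbitrary assumptions.

```python
import time

def wait_for_endpoint(ml, model_id, interval=5, max_checks=60, sleep=time.sleep):
    """Poll the model until its endpoint status is READY, then return the EndpointInfo dict."""
    for _ in range(max_checks):
        info = ml.get_ml_model(model_id).get('EndpointInfo') or {}
        status = info.get('EndpointStatus')
        if status == 'READY':
            return info
        if status == 'FAILED':
            raise RuntimeError("Endpoint creation failed for %s" % model_id)
        sleep(interval)
    raise RuntimeError("Endpoint for %s not ready after %d checks" % (model_id, max_checks))
```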

We can now make a prediction on a new input data point, and we don't need to specify all field/feature values. Predictions are made synchronously.

In [None]:
input_data = {
                "bedrooms": "4", # note that values must be strings (unlike BigML)
                "full_bathrooms": "2",
                "type": "Single Family Home",
                "size_sqft": "1500"
            }

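# Re-fetch the model: the `model` object above was retrieved before the
# endpoint was created, so its EndpointInfo may not contain the URL yet
model = ml.get_ml_model(model_id)
endpoint = model.get('EndpointInfo').get('EndpointUrl')
model_prediction = ml.predict(ml_model_id = model_id, record = input_data, predict_endpoint = endpoint)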
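
Since record values must be strings, data that is held as numbers has to be converted first. A small helper along these lines could do that; the name `to_record` is hypothetical, and dropping `None` values is a design assumption that relies on missing fields being allowed.

```python
def to_record(features):
    """Convert feature values to strings, leaving out features whose value is None."""
    return {name: str(value) for name, value in features.items() if value is not None}
```

For example, `to_record({"bedrooms": 4, "size_sqft": 1500})` gives a dict whose values are the strings `"4"` and `"1500"`.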

Here is the raw prediction object:

In [None]:
model_prediction

And now the information we care about:

In [None]:
print("Predicted price: %s" % model_prediction.get('Prediction').get('predictedValue')) # regression-specific
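
When adapting this step to classification, the value of interest is no longer `predictedValue`. The sketch below assumes that classification responses carry a `predictedLabel` field in the same `Prediction` dict. This is not shown anywhere in this notebook, so check it against a real response before relying on it. The helper name `extract_prediction` is hypothetical.

```python
def extract_prediction(response):
    """Return the regression value if present, else the (assumed) classification label."""
    prediction = response.get('Prediction', {})
    if 'predictedValue' in prediction:
        return prediction['predictedValue']  # regression-specific
    # Assumption: classification models report their result under 'predictedLabel'
    return prediction.get('predictedLabel')
```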

## Learn more
See the following for more code examples:

* http://cloudacademy.com/blog/aws-machine-learning/
* https://gist.github.com/alexcasalboni/fcbc4a1b61b21c5001e7
* https://github.com/awslabs/machine-learning-samples/blob/master/targeted-marketing-python/build_model.py