# Invoke SageMaker Autopilot Model from Athena

Machine Learning (ML) with Amazon Athena (Preview) lets you write SQL statements in Athena that run ML inference using Amazon SageMaker. This feature simplifies access to ML models for data analysis because you can run inference directly from a query instead of writing separate inference code.

To use ML with Athena (Preview), you define an ML with Athena (Preview) function with the `USING FUNCTION` clause. The function points to the Amazon SageMaker model endpoint that you want to use and specifies the variable names and data types to pass to the model. Subsequent clauses in the query reference the function to pass values to the model. The model runs inference based on the values that the query passes and then returns inference results.

<img src="img/athena_model.png" width="50%" align="left">

# Pre-Requisite

## *Please note that ML with Athena is in Preview and will only work in the following regions that support Preview Functionality:*

## *us-east-1, us-west-2, ap-south-1, eu-west-1*


### Check if your current region supports AthenaML Preview

In [1]:
import boto3
import sagemaker
import pandas as pd

sess   = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [2]:
if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print(' SUCCESS: AthenaML IS SUPPORTED IN {}'.format(region))
    print(' Please proceed with this notebook.')
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
else:
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print(' !! AthenaML IS *NOT* SUPPORTED IN {} !!'.format(region))
    print(' This is OK. SKIP this notebook and move ahead with the workshop.' )
    print(' This notebook is not required for the rest of this workshop.' )
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 SUCCESS: AthenaML IS SUPPORTED IN us-west-2
 Please proceed with this notebook.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


# Pre-Requisite

## _Please wait for the Autopilot Model to deploy!!  Otherwise, this notebook won't work properly._

In [3]:
%store -r autopilot_endpoint_name

In [4]:
try:
    # Look up the endpoint whose name was stored by the Autopilot notebook
    resp = sm.describe_endpoint(EndpointName=autopilot_endpoint_name)
    status = resp['EndpointStatus']
    print('OK')
except Exception:
    # Reached if the endpoint name was never stored or the endpoint does not exist
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
    print('STOP: You have to successfully run the Autopilot notebook in the AutoML section,')
    print('and have the model endpoint deployed before you continue.')
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')

OK


In [5]:
print(autopilot_endpoint_name)

automl-dm-ep-22-16-47-12


In [6]:
try:
    # Referencing `status` raises NameError if the describe_endpoint cell above failed
    status
    if status == 'InService':
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
        print(' SUCCESS: Your Autopilot model is {}'.format(status))
        print(' Please proceed with this notebook.')
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
    else:
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
        print(' STOP: Your Autopilot model is *NOT* InService. It is {}'.format(status))
        print(' This is OK. Skip this notebook and move ahead with the workshop.')
        print(' This notebook is not required for the rest of this workshop.')
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
except NameError:
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
    print('STOP: You have to successfully run the Autopilot notebook in the AutoML section,')
    print('and have the model endpoint deployed before you continue.')
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 SUCCESS: Your Autopilot model is InService
 Please proceed with this notebook.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
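
If the endpoint is still being created, the status check above reports it once and stops. A minimal sketch of polling until the endpoint settles is shown below. The helper name `wait_until_in_service` and its `poll_seconds` and `max_attempts` parameters are hypothetical and not part of the workshop; the describe call is passed in as an argument so the loop can be exercised without AWS access. boto3 also provides an `endpoint_in_service` waiter on the SageMaker client, which may be a simpler choice in practice.

```python
import time


def wait_until_in_service(describe_endpoint, endpoint_name,
                          poll_seconds=30, max_attempts=40):
    """Poll an endpoint until it is InService or Failed.

    `describe_endpoint` is any callable with the same keyword argument and
    response shape as the SageMaker client's describe_endpoint.
    Returns the last status seen (which may still be transitional if
    max_attempts is exhausted).
    """
    status = None
    for _ in range(max_attempts):
        status = describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        if status in ('InService', 'Failed'):
            return status
        time.sleep(poll_seconds)
    return status


# Assumed usage inside this notebook (requires AWS credentials):
# status = wait_until_in_service(sm.describe_endpoint, autopilot_endpoint_name)
```

The defaults here (30 seconds, 40 attempts) are arbitrary illustration values; adjust them to how long the Autopilot deployment usually takes in your account.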


### Install PyAthena

In [7]:
!pip install -q PyAthena==1.10.7

In [8]:
from pyathena import connect
from pyathena.pandas_cursor import PandasCursor
from pyathena.util import as_pandas

# Create an Athena Table with Sample Reviews

In [9]:
# Set S3 prefixes
tsv_prefix = 'amazon-reviews-pds/tsv'

# Set Athena parameters
database_name = 'dsoaws'
table_name_tsv = 'amazon_reviews_tsv'
table_name = 'product_reviews'

In [10]:
# Set S3 staging directory -- this is a temporary directory used for Athena queries
s3_staging_dir = 's3://{}/athena/staging'.format(bucket)

In [11]:
# Create Table SQL Statement
statement = """
CREATE TABLE IF NOT EXISTS {}.{} AS 
SELECT review_id, review_body 
FROM {}.{}
""".format(database_name, table_name, database_name, table_name_tsv)

print(statement)


CREATE TABLE IF NOT EXISTS dsoaws.product_reviews AS 
SELECT review_id, review_body 
FROM dsoaws.amazon_reviews_tsv



In [12]:
# Execute statement using connection cursor
cursor = connect(region_name=region, s3_staging_dir=s3_staging_dir).cursor()
cursor.execute(statement)

<pyathena.cursor.Cursor at 0x7fde89efef28>

In [13]:
statement = 'SELECT * FROM {}.{} LIMIT 10'.format(database_name, table_name)
cursor.execute(statement)

<pyathena.cursor.Cursor at 0x7fde89efef28>

In [14]:
df_show = as_pandas(cursor)
df_show

Unnamed: 0,review_id,review_body
0,R3F6Q57M55J8YK,I spent all of Sunday morning trying to make t...
1,R1BHCSIP8Z9JSW,I am pleased with my purchase. I like being a...
2,R1N33NZEPPCJER,Would not recommend at all. Not intuitive and...
3,R3TTHDJHFSOOP0,Quickbooks is Quickbooks. It is one long comm...
4,R33F9AKZFUY9IQ,If you take time to understand how this softwa...
5,R2KD0LDRP2CA2L,"As an industry assistant editor, I found the E..."
6,R1KU41B770LEBW,"Ahhh, all of the 1-star reviews are right. Thi..."
7,R2GW05BC1VHNDW,Since this version is the same as the boxed ve...
8,RLTP0X0KTT9SJ,Every year these guys do the same thing. The p...
9,R34B4N8JRVFZIR,Each year it gets easier to use. I use Quickb...


## Add the Required `AmazonAthenaPreviewFunctionality` Work Group to Use This Preview Feature

In [15]:
import boto3
from botocore.exceptions import ClientError

client = boto3.client('athena')

try:
    response = client.create_work_group(Name='AmazonAthenaPreviewFunctionality') 
    print(response)
except ClientError as e:
    if e.response['Error']['Code'] == 'InvalidRequestException':
        print("Workgroup already exists.")
    else:
        print("Unexpected error: %s" % e)
        print('Make sure you are in one of the supported regions.')
    


{'ResponseMetadata': {'RequestId': 'a6185537-de79-42ab-b1c5-00f700a5121d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1', 'date': 'Sat, 22 Aug 2020 21:38:05 GMT', 'x-amzn-requestid': 'a6185537-de79-42ab-b1c5-00f700a5121d', 'content-length': '2', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}


# Create SQL Query

The `USING FUNCTION` clause specifies one or more ML with Athena (Preview) functions that a subsequent `SELECT` statement in the same query can reference. For each function, you define the function name, the variable names, and the data types of the variables and the return value.

In [16]:
statement = """
USING FUNCTION predict_star_rating(review_body VARCHAR) 
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}'
)
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating 
    FROM {}.{} LIMIT 10
    """.format(autopilot_endpoint_name, database_name, table_name)

print(statement)


USING FUNCTION predict_star_rating(review_body VARCHAR) 
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'automl-dm-ep-22-16-47-12'
)
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating 
    FROM dsoaws.product_reviews LIMIT 10
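
The statement above is assembled with `str.format`, so any value placed into it ends up directly in the SQL text. As a sketch only, the helper below builds the same query while rejecting database and table names that are not plain identifiers and endpoint names that contain a single quote. The function name `build_inference_query`, the `limit` parameter, and the validation rules are assumptions made for illustration; the SQL itself mirrors the preview syntax used in this notebook.

```python
import re

# Plain SQL identifier: letters, digits, underscores, not starting with a digit
_IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')


def build_inference_query(endpoint_name, database_name, table_name, limit=10):
    """Return the USING FUNCTION query used in this notebook as a string."""
    for identifier in (database_name, table_name):
        if not _IDENTIFIER.match(identifier):
            raise ValueError('Not a plain identifier: {!r}'.format(identifier))
    if "'" in endpoint_name:
        raise ValueError('Endpoint name must not contain a single quote')
    limit = int(limit)  # raises ValueError for non-numeric input
    return (
        "USING FUNCTION predict_star_rating(review_body VARCHAR)\n"
        "    RETURNS VARCHAR TYPE\n"
        "    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}')\n"
        "SELECT review_id, review_body,\n"
        "       predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating\n"
        "FROM {}.{}\n"
        "LIMIT {}"
    ).format(endpoint_name, database_name, table_name, limit)


print(build_inference_query('automl-dm-ep-22-16-47-12', 'dsoaws', 'product_reviews'))
```

The `REPLACE(review_body, ',', ' ')` call is kept from the original query: it removes commas from the review text before the text is passed to the endpoint.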
    


# Query the Autopilot Endpoint using Data from the Athena Table

In [17]:
# Execute statement using connection cursor
cursor = connect(region_name=region, 
                 s3_staging_dir=s3_staging_dir).cursor()
cursor.execute(statement, 
               work_group='AmazonAthenaPreviewFunctionality')

<pyathena.cursor.Cursor at 0x7fde8990f5c0>

##  _^^^^ If you see an `OperationalError` above ^^^^, your model endpoint is probably not deployed or not InService. Please make sure you have successfully run the Autopilot notebook._

In [18]:
df = as_pandas(cursor)

In [19]:
df.head(10)

Unnamed: 0,review_id,review_body,predicted_star_rating
0,R3F6Q57M55J8YK,I spent all of Sunday morning trying to make t...,4
1,R1BHCSIP8Z9JSW,I am pleased with my purchase. I like being a...,4
2,R1N33NZEPPCJER,Would not recommend at all. Not intuitive and...,2
3,R3TTHDJHFSOOP0,Quickbooks is Quickbooks. It is one long comm...,1
4,R33F9AKZFUY9IQ,If you take time to understand how this softwa...,4
5,R2KD0LDRP2CA2L,"As an industry assistant editor, I found the E...",3
6,R1KU41B770LEBW,"Ahhh, all of the 1-star reviews are right. Thi...",1
7,R2GW05BC1VHNDW,Since this version is the same as the boxed ve...,1
8,RLTP0X0KTT9SJ,Every year these guys do the same thing. The p...,5
9,R34B4N8JRVFZIR,Each year it gets easier to use. I use Quickb...,5
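
Because the function is declared with `RETURNS VARCHAR`, the predictions may come back as strings rather than numbers. The sketch below shows one way to convert them to integers and tally the predicted ratings. The helper name `rating_distribution` is hypothetical, and the sample list simply repeats the ten predictions shown in the output above.

```python
from collections import Counter


def rating_distribution(predictions):
    """Count how often each star rating was predicted.

    Accepts any iterable of ratings given as strings or integers and
    returns a dict ordered by rating.
    """
    counts = Counter(int(p) for p in predictions)
    return dict(sorted(counts.items()))


# The ten predictions from the output above, as VARCHAR values
sample = ['4', '4', '2', '1', '4', '3', '1', '1', '5', '5']
print(rating_distribution(sample))  # {1: 3, 2: 1, 3: 1, 4: 3, 5: 2}
```

In this notebook the same helper could be called as `rating_distribution(df['predicted_star_rating'])`, assuming every row contains a value that `int()` can parse.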


# Delete Endpoint

In [20]:
sm = boto3.client('sagemaker')

sm.delete_endpoint(
    EndpointName=autopilot_endpoint_name
)

{'ResponseMetadata': {'RequestId': '6a74aa52-9db6-4663-84fb-6ef0b6747f5d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '6a74aa52-9db6-4663-84fb-6ef0b6747f5d',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Sat, 22 Aug 2020 21:38:10 GMT'},
  'RetryAttempts': 0}}

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();

<IPython.core.display.Javascript object>