# Invoke SageMaker Autopilot Model from Athena

ML with Amazon Athena (Preview) lets you write SQL statements in Athena that run machine learning (ML) inference using Amazon SageMaker. This makes ML models easier to use for data analysis: you run inference from a SQL query instead of writing custom code that calls the model endpoint.

To use ML with Athena (Preview), you define a function with the `USING FUNCTION` clause. The function points to the Amazon SageMaker model endpoint you want to use and specifies the variable names and data types to pass to the model. Later clauses in the query call the function to pass values to the model, which runs inference on those values and returns the results.

<img src="img/athena_model.png" width="50%" align="left">

# Prerequisite

## *Note: ML with Athena is in Preview and works only in the following regions that support Preview functionality:*

## *us-east-1, us-west-2, ap-south-1, eu-west-1*
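The region list above is repeated as a literal in several cells below. A minimal sketch of keeping it in one constant instead; `ATHENA_ML_PREVIEW_REGIONS` and `supports_athena_ml_preview` are hypothetical names, and the set contains only the preview regions listed in this notebook, which may change over time.

```python
# Hypothetical constant: the preview regions listed in this notebook (may change).
ATHENA_ML_PREVIEW_REGIONS = frozenset({"us-east-1", "us-west-2", "ap-south-1", "eu-west-1"})


def supports_athena_ml_preview(region: str) -> bool:
    """Return True if `region` is one of the ML with Athena preview regions above."""
    return region in ATHENA_ML_PREVIEW_REGIONS


for r in ["us-west-2", "eu-central-1"]:
    print(r, supports_athena_ml_preview(r))
```

Each `if region in [...]` check in the cells below could then become `if supports_athena_ml_preview(region):`.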


### Check whether your current region supports the AthenaML Preview

In [1]:
import boto3
import sagemaker
import pandas as pd

sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

In [2]:
if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    print(' [OK] AthenaML IS SUPPORTED IN {}'.format(region))
    print(' [OK] Please proceed with this notebook.')
else:
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print(' [ERROR] AthenaML IS *NOT* SUPPORTED IN {} !!'.format(region))
    print(' [INFO] This is OK. SKIP this notebook and move ahead with the workshop.' )
    print(' [INFO] This notebook is not required for the rest of this workshop.' )
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )

 [OK] AthenaML IS SUPPORTED IN us-west-2
 [OK] Please proceed with this notebook.


# Prerequisite

## _Wait for the Autopilot model endpoint to finish deploying. Otherwise, this notebook won't work._

In [3]:
%store -r autopilot_endpoint_name

In [4]:
try:
    autopilot_endpoint_name
    print('[OK]')    
except NameError:
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print('[ERROR] There is no Autopilot Model Endpoint deployed.')
    print('[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.')
    print('[INFO] This notebook is not required for the rest of this workshop.')
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' ) 

[OK]


In [5]:
print(autopilot_endpoint_name)

automl-dm-ep-26-16-21-49


In [6]:
from botocore.exceptions import ClientError, WaiterError

try:
    resp = sm.describe_endpoint(EndpointName=autopilot_endpoint_name)
    status = resp['EndpointStatus']
    if status == 'InService':
        print('[OK] Your Autopilot Model Endpoint is in status: {}'.format(status))
    elif status == 'Creating':
        print('[INFO] Your Autopilot Model Endpoint is in status: {}'.format(status))
        print('[INFO] Waiting for the endpoint to be InService. Please be patient. This might take a few minutes.')
        # Block until the endpoint is InService; the waiter raises WaiterError if it fails
        sm.get_waiter('endpoint_in_service').wait(EndpointName=autopilot_endpoint_name)
        print('[OK] Your Autopilot Model Endpoint is now InService.')
    else:
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
        print('[ERROR] Your Autopilot Model is in status: {}'.format(status))
        print('[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.')
        print('[INFO] This notebook is not required for the rest of this workshop.')
        print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
except (NameError, ClientError, WaiterError):
    # NameError: no endpoint name was stored; ClientError: the endpoint does not exist;
    # WaiterError: the endpoint never reached InService
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print('[ERROR] No Autopilot Model Endpoint is deployed and InService.')
    print('[INFO] This is OK. Just skip this notebook and move ahead with the next notebook.')
    print('[INFO] This notebook is not required for the rest of this workshop.')
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )

[OK] Your Autopilot Model Endpoint is in status: InService
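The cell above only treats `InService` and `Creating` specially. A minimal sketch of a more explicit decision table follows; `next_action` is a hypothetical helper, and treating `Updating` and `SystemUpdating` as transient states worth waiting for is an assumption that this notebook does not test.

```python
# Assumption: these SageMaker EndpointStatus values are transient and may still
# reach InService; everything else (Failed, Deleting, OutOfService, ...) is not.
TRANSIENT_STATUSES = {"Creating", "Updating", "SystemUpdating"}


def next_action(status: str) -> str:
    """Hypothetical helper: map an EndpointStatus string to what the notebook should do."""
    if status == "InService":
        return "proceed"
    if status in TRANSIENT_STATUSES:
        return "wait"  # e.g. block on the 'endpoint_in_service' waiter
    return "skip"      # skip this notebook


for s in ["InService", "Creating", "Failed"]:
    print(s, "->", next_action(s))
```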


## Import PyAthena

In [7]:
from pyathena import connect
from pyathena.pandas_cursor import PandasCursor
from pyathena.util import as_pandas

# Create an Athena Table with Sample Reviews

## Check for Athena TSV Table

In [8]:
%store -r ingest_create_athena_table_tsv_passed

In [9]:
try:
    ingest_create_athena_table_tsv_passed
except NameError:
    print('++++++++++++++++++++++++++++++++++++++++++++++')
    print('[ERROR] YOU HAVE TO RUN ALL NOTEBOOKS IN THE `INGEST` SECTION.')
    print('++++++++++++++++++++++++++++++++++++++++++++++')

In [10]:
print(ingest_create_athena_table_tsv_passed)

True


In [11]:
if not ingest_create_athena_table_tsv_passed:
    print('++++++++++++++++++++++++++++++++++++++++++++++')
    print('[ERROR] YOU HAVE TO RUN ALL NOTEBOOKS IN THE `INGEST` SECTION.')
    print('++++++++++++++++++++++++++++++++++++++++++++++')
else:
    print('[OK]')

[OK]


In [12]:
tsv_prefix = 'amazon-reviews-pds/tsv'
database_name = 'dsoaws'
table_name_tsv = 'amazon_reviews_tsv'
table_name = 'product_reviews'

In [13]:
s3_staging_dir = 's3://{}/athena/staging'.format(bucket)

In [14]:
statement = """
CREATE TABLE IF NOT EXISTS {}.{} AS 
SELECT review_id, review_body 
FROM {}.{}
""".format(database_name, table_name, database_name, table_name_tsv)

print(statement)


CREATE TABLE IF NOT EXISTS dsoaws.product_reviews AS 
SELECT review_id, review_body 
FROM dsoaws.amazon_reviews_tsv
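The statement above is assembled with `str.format()`, so whatever the database and table variables contain is pasted directly into the SQL. A sketch of a guarded builder follows. `build_ctas` and `check_identifier` are hypothetical helpers, and the allowed-character rule (lowercase letters, digits, underscore) is an assumption based on Athena's naming guidance; check the Athena documentation for the exact rules.

```python
import re

# Assumption: Athena database, table, and column names use only lowercase
# letters, digits, and underscores. Anything else is rejected before formatting.
_IDENTIFIER = re.compile(r"[a-z0-9_]+")


def check_identifier(name: str) -> str:
    """Return `name` unchanged if it looks like a safe Athena identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe Athena identifier: {!r}".format(name))
    return name


def build_ctas(database: str, table: str, source_table: str, columns: list) -> str:
    """Build a CREATE TABLE IF NOT EXISTS ... AS SELECT statement like the one above."""
    db = check_identifier(database)
    cols = ", ".join(check_identifier(c) for c in columns)
    return (
        "CREATE TABLE IF NOT EXISTS {}.{} AS\n"
        "SELECT {}\n"
        "FROM {}.{}"
    ).format(db, check_identifier(table), cols, db, check_identifier(source_table))


print(build_ctas("dsoaws", "product_reviews", "amazon_reviews_tsv", ["review_id", "review_body"]))
```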



In [15]:
if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    cursor = connect(region_name=region, s3_staging_dir=s3_staging_dir).cursor()
    cursor.execute(statement)
    print('[OK]')
else: 
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )
    print(' [ERROR] AthenaML IS *NOT* SUPPORTED IN {} !!'.format(region))
    print(' [INFO] This is OK. SKIP this notebook and move ahead with the workshop.' )
    print(' [INFO] This notebook is not required for the rest of this workshop.' )
    print('+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++' )

[OK]


In [16]:
if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    statement = 'SELECT * FROM {}.{} LIMIT 10'.format(database_name, table_name)
    cursor = connect(region_name=region, s3_staging_dir=s3_staging_dir).cursor()
    cursor.execute(statement)
    df_table = as_pandas(cursor)
    print(df_table)

        review_id                                        review_body
0    RA3VAKVRPUFT  I am a avid book fan, and I absolutely love wh...
1   RFKA1HJSDBOOM  I am only in the middle of this book but I had...
2   RKDX6FKBANJUC  This is the first time I've read anything by M...
3  R1IP9DE5AOK824  First off, I am LDS.  Mr Ridges' comments on t...
4  R3AWE922II4D9O  I don't write reviews but this book was very g...
5  R21Y4307BXBQXQ  I know that Dick Francis is now in his eightie...
6   RU22TC97YCYSF  This was the best book ever I loved it so much...
7  R3UTILZQT6ADX3  Private Investigator Monica McDowney inspects ...
8  R3TXSYKL8I3YW6  I wish the story line what more interesting. T...
9  R38DWLX5IEHAXK  Wonderful story if you like fantasy - a bit go...


## Add the Required `AmazonAthenaPreviewFunctionality` Workgroup to Use This Preview Feature

In [17]:
from botocore.exceptions import ClientError

client = boto3.client('athena', region_name=region)

if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    try:
        response = client.create_work_group(Name='AmazonAthenaPreviewFunctionality') 
        print(response)
    except ClientError as e:
        if e.response['Error']['Code'] == 'InvalidRequestException':
            print("[OK] Workgroup already exists.")
        else:
            print('[ERROR] {}'.format(e))

{'ResponseMetadata': {'RequestId': '22a55462-7874-4e87-bf79-dec99b344781', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1', 'date': 'Sat, 26 Sep 2020 20:09:44 GMT', 'x-amzn-requestid': '22a55462-7874-4e87-bf79-dec99b344781', 'content-length': '2', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}


# Create SQL Query

The `USING FUNCTION` clause declares one or more ML with Athena (Preview) functions that the subsequent `SELECT` statement in the query can reference. For each function, you define its name, the names and data types of its input variables, and the data type of its return value.
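To make each piece of the clause explicit, here is a sketch that builds the same preview syntax used in the next cell from a small description of one function. `SageMakerFunction` and `build_ml_query` are hypothetical names; the sketch covers only a single function and only the `SAGEMAKER_INVOKE_ENDPOINT` form shown in this notebook.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SageMakerFunction:
    """Hypothetical description of one ML with Athena (Preview) function."""
    name: str
    params: List[Tuple[str, str]]  # (variable name, Athena data type)
    return_type: str
    endpoint: str


def build_ml_query(fn: SageMakerFunction, select_sql: str) -> str:
    """Prefix a SELECT with a USING FUNCTION clause in the preview syntax used here."""
    args = ", ".join("{} {}".format(var, dtype) for var, dtype in fn.params)
    endpoint = fn.endpoint.replace("'", "''")  # escape quotes inside the SQL string literal
    return (
        "USING FUNCTION {}({})\n"
        "    RETURNS {} TYPE\n"
        "    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}')\n"
        "{}"
    ).format(fn.name, args, fn.return_type, endpoint, select_sql)


fn = SageMakerFunction(
    name="predict_star_rating",
    params=[("review_body", "VARCHAR")],
    return_type="VARCHAR",
    endpoint="automl-dm-ep-26-16-21-49",
)
print(build_ml_query(fn, "SELECT review_id, predict_star_rating(review_body) FROM dsoaws.product_reviews LIMIT 10"))
```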

In [18]:
statement = """
USING FUNCTION predict_star_rating(review_body VARCHAR) 
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = '{}'
)
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating 
    FROM {}.{} LIMIT 10
    """.format(autopilot_endpoint_name, database_name, table_name)

print(statement)


USING FUNCTION predict_star_rating(review_body VARCHAR) 
    RETURNS VARCHAR TYPE
    SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'automl-dm-ep-26-16-21-49'
)
SELECT review_id, review_body, predict_star_rating(REPLACE(review_body, ',', ' ')) AS predicted_star_rating 
    FROM dsoaws.product_reviews LIMIT 10
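The query passes `REPLACE(review_body, ',', ' ')` to the function rather than the raw text. A plausible reason, and an assumption here, is that the Autopilot endpoint reads each request as CSV with a single text column, so commas inside a review would be parsed as extra columns. The sketch below uses Python's `csv` module and a made-up review to show the effect of the replacement; `parse_csv_fields` is a hypothetical helper.

```python
import csv
import io

# Made-up review text for illustration only.
review = "Great plot, weak ending, still recommended"


def parse_csv_fields(line: str) -> list:
    """Parse one line the way a CSV reader would and return its fields."""
    return next(csv.reader(io.StringIO(line)))


raw_fields = parse_csv_fields(review)      # commas split the review into several fields
sanitized = review.replace(",", " ")       # Python equivalent of REPLACE(review_body, ',', ' ')
clean_fields = parse_csv_fields(sanitized) # the whole review stays in one field
print(len(raw_fields), len(clean_fields))
```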
    


# Query the Autopilot Endpoint using Data from the Athena Table

In [19]:
if region in ['eu-west-1', 'ap-south-1', 'us-east-1', 'us-west-2']:
    cursor = connect(region_name=region, s3_staging_dir=s3_staging_dir).cursor()
    cursor.execute(statement, work_group='AmazonAthenaPreviewFunctionality')
    df = as_pandas(cursor)
    print(df)

        review_id                                        review_body  \
0  R1QA7716M9I6ZJ  I loved the first two series such a good story...   
1  R2RAJKI43FW1YH  Finshing this, I wondered, who is Abraham Verg...   
2  R17X00B3FLMT4F  Another great effort by Ben Coes. I awaited in...   
3  R3D7XN1J58EURM  An emotionally wounded FBI agent on the hunt f...   
4  R1LG8ZZKD3XS6D  Becky Lower's The Reluctant Debutante is a wel...   
5  R1RJJ60W6WC7MN  Harlequin romance with lots of sex and bondage...   
6  R15DVN5OAMEJQ0  Henry is a thief, con artist, and a self-prese...   
7  R1ZLK50SMNUMTW  The link to all the files is at the end of the...   
8  R1ICB7HC6CHP5A  I could not put it down, a real page turner.  ...   
9   RNBENM3KTGMB1  The Future, Imperfect is a superb collection o...   

  predicted_star_rating  
0                     4  
1                     3  
2                     5  
3                     5  
4                     5  
5                     2  
6                     2  
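Because `predict_star_rating` is declared `RETURNS VARCHAR`, the predictions come back as strings. A minimal sketch of casting them before doing arithmetic, using the seven `predicted_star_rating` values visible in the (truncated) output above:

```python
from collections import Counter
from statistics import mean

# The seven predicted_star_rating values shown in the output above, as returned (strings).
predicted = ["4", "3", "5", "5", "5", "2", "2"]

ratings = [int(p) for p in predicted]  # cast VARCHAR results to integers first
print(Counter(ratings).most_common())   # rating -> count, most frequent first
print(round(mean(ratings), 2))          # average predicted rating
```

With the DataFrame from the previous cell, the pandas equivalent of the cast is `df['predicted_star_rating'].astype(int)`.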

# Delete Endpoint

In [20]:
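# Reuse the region-scoped SageMaker client `sm` created at the top of the notebook.
# Note: delete_endpoint removes only the endpoint, not its endpoint configuration or model.
if autopilot_endpoint_name:
    sm.delete_endpoint(
        EndpointName=autopilot_endpoint_name
    )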

In [None]:
%%javascript
Jupyter.notebook.save_checkpoint();
Jupyter.notebook.session.delete();