# Using pre-processed data from Snowflake

Let's first create our SageMaker session and role, and define an S3 prefix to use for the notebook example.

In [None]:
import sagemaker
from sagemaker import get_execution_role
sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

# S3 prefix
bucket = sagemaker_session.default_bucket()
prefix = 'dbt-preprocessed-churn'
WORK_DIRECTORY = 'dbt-preprocessed-churn'

# Snowflake credentials, read from SSM Parameter Store (only the password is decrypted)
ssm_client = sagemaker_session.boto_session.client(service_name='ssm',region_name='ap-southeast-2')
snowflake_account = ssm_client.get_parameter(Name='snowflake_account',WithDecryption=False)['Parameter']['Value']
snowflake_user = ssm_client.get_parameter(Name='snowflake_user',WithDecryption=False)['Parameter']['Value']
snowflake_password = ssm_client.get_parameter(Name='snowflake_password',WithDecryption=True)['Parameter']['Value']

# Preprocessing data and training the model <a class="anchor" id="training"></a>
## Downloading dataset <a class="anchor" id="download_data"></a>
Rather than downloading a file, we read the preprocessed churn dataset directly from the Snowflake table `"DEMO_DB"."DBT_ML_PIPELINE"."CHURN_PREPROCESSED"`.

In [None]:
# Read the preprocessed rows straight from Snowflake
import numpy as np
import snowflake.connector

ctx = snowflake.connector.connect(
    user=snowflake_user,
    password=snowflake_password,
    account=snowflake_account
)
cs = ctx.cursor()

# EXITED is the label; the remaining 13 columns are the features, in this order.
query = """
select EXITED, CREDITSCORE_SCALED, AGE_SCALED, TENURE_SCALED, BALANCE_SCALED,
       NUMOFPRODUCTS_SCALED, ESTIMATEDSALARY_SCALED, HASCRCARD, ISACTIVEMEMBER,
       GEOG_FRANCE, GEOG_SPAIN, GEOG_GERMANY, GENDER_FEMALE, GENDER_MALE
from "DEMO_DB"."DBT_ML_PIPELINE"."CHURN_PREPROCESSED"
"""
try:
    allrows = cs.execute(query).fetchall()
finally:
    # Release the cursor and connection even if the query fails
    cs.close()
    ctx.close()

dataset = np.array(allrows)
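
Depending on the column types in Snowflake, `fetchall()` may return `Decimal` values, in which case `np.array(allrows)` produces an object array rather than a float array. The following is a minimal sketch of a stricter conversion, assuming the query above returns 14 columns (the `EXITED` label plus 13 features). `rows_to_float_array` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration, and the sample rows are made up.

```python
from decimal import Decimal

import numpy as np

# Assumption: 1 label column (EXITED) + 13 feature columns, as in the query above.
EXPECTED_COLUMNS = 14


def rows_to_float_array(rows, expected_columns=EXPECTED_COLUMNS):
    """Convert a list of row tuples to a 2-D float64 array and check its width."""
    array = np.array(rows, dtype=np.float64)
    if array.ndim != 2 or array.shape[1] != expected_columns:
        raise ValueError(
            f"expected rows with {expected_columns} columns, got shape {array.shape}"
        )
    return array


# Synthetic rows standing in for cs.fetchall(); the values are made up.
sample_rows = [
    (1,) + tuple(Decimal("0.25") for _ in range(13)),
    (0,) + tuple(Decimal("-1.5") for _ in range(13)),
]

sample_dataset = rows_to_float_array(sample_rows)
print(sample_dataset.dtype, sample_dataset.shape)
```

In the notebook this would be called as `dataset = rows_to_float_array(allrows)` in place of the plain `np.array` call.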


## Split data into test/train and write to S3
To meet the expectations of the Keras script, we'll split the preprocessed data into four files: train and test features, and train and test labels.

In [None]:
from sklearn.model_selection import train_test_split
y=dataset[:,0]
x=dataset[:,1:]
X_train, X_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0)

split_files_path=f'./{WORK_DIRECTORY}/preprocessed/split'
from pathlib import Path
Path(split_files_path).mkdir(parents=True, exist_ok=True)

np.save(f'{split_files_path}/train_X.npy', X_train)
np.save(f'{split_files_path}/train_Y.npy', y_train)
np.save(f'{split_files_path}/test_X.npy', X_test)
np.save(f'{split_files_path}/test_Y.npy', y_test)

data_dir = sagemaker_session.upload_data(path=f'{split_files_path}', bucket=bucket, key_prefix='preprocessed_split_data')
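
Before uploading, it can be worth confirming that the four files load back with matching row counts. The sketch below is one way to do that, using synthetic data in a temporary directory instead of the real split. `load_split` and `SPLIT_NAMES` are hypothetical names, not part of the notebook or of any library.

```python
import tempfile
from pathlib import Path

import numpy as np

SPLIT_NAMES = ("train_X", "train_Y", "test_X", "test_Y")


def load_split(directory):
    """Load the four .npy files and check that features and labels line up."""
    directory = Path(directory)
    arrays = {name: np.load(directory / f"{name}.npy") for name in SPLIT_NAMES}
    for part in ("train", "test"):
        n_features = len(arrays[f"{part}_X"])
        n_labels = len(arrays[f"{part}_Y"])
        if n_features != n_labels:
            raise ValueError(
                f"{part}: {n_features} feature rows but {n_labels} labels"
            )
    return arrays


# Synthetic stand-in data (made up): 10 rows, 13 features, split 8/2.
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 13))
labels = rng.integers(0, 2, size=10).astype(np.float64)

with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "train_X.npy", features[:8])
    np.save(Path(tmp) / "train_Y.npy", labels[:8])
    np.save(Path(tmp) / "test_X.npy", features[8:])
    np.save(Path(tmp) / "test_Y.npy", labels[8:])
    split = load_split(tmp)

print({name: array.shape for name, array in split.items()})
```

If the arrays were saved with an object dtype (for example, rows containing `Decimal` values), `np.load` would additionally need `allow_pickle=True`.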

## Train as a SageMaker training job

The TensorFlow estimator uses the `keras_ann_script_mode.py` script as the entry point. Pay special attention to `keras_model_fn`, which is redefined within this Python script.
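
The entry point script itself is not shown in this notebook. As a rough sketch of what a script-mode entry point usually has to handle: SageMaker passes the estimator's hyperparameters as command-line flags, passes `--model_dir`, and exposes each input channel as an `SM_CHANNEL_<NAME>` environment variable. The `parse_args` helper below is hypothetical and only illustrates that convention; the flag names simply mirror the hyperparameter keys and channel names used in this notebook.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and channel locations the way a script-mode job receives them."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as flags named after the keys of the hyperparameters dict.
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=10)

    # Location where the trained model should be written.
    parser.add_argument("--model_dir", type=str, default=None)

    # Each channel passed to fit() is mounted locally and advertised via an env var.
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--eval", type=str, default=os.environ.get("SM_CHANNEL_EVAL"))

    # Ignore any flags this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example invocation with made-up values.
example_args = parse_args(
    ["--learning_rate", "0.05", "--epochs", "3", "--train", "/opt/ml/input/data/train"]
)
print(example_args.learning_rate, example_args.epochs, example_args.batch_size)
```

A real entry point would then load the `.npy` files from the `train` and `eval` directories, build and fit the Keras model, and export it for serving.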

In [None]:
# Based on: https://github.com/aws-samples/amazon-sagemaker-script-mode/blob/master/keras-embeddings-script-mode/keras-embeddings.ipynb

from sagemaker.tensorflow import TensorFlow

s3_tf_output_key_prefix = "tf_training_output"
s3_tf_output_location = 's3://{}/{}/{}/{}'.format(bucket, prefix, s3_tf_output_key_prefix, 'tf_model')

tf_estimator_sm = TensorFlow(
    entry_point="keras_ann_script_mode.py",
    role=role,
    model_dir=s3_tf_output_location,
    framework_version="1.12.0",
    train_instance_count=1, 
    train_instance_type="ml.c4.xlarge",
    hyperparameters={'learning_rate': 0.1, 
                     'epochs': 1, 
                     'batch_size': 10},
    script_mode=True,
    py_version="py3"
)

tf_estimator_sm.fit({'train': data_dir, 'eval': data_dir})


### Deploy the ANN to SageMaker and expose as an endpoint

If we wish to deploy the model to production, the next step is to create a SageMaker hosted endpoint. The endpoint will retrieve the TensorFlow SavedModel created during training and deploy it within a TensorFlow Serving container. All of this can be accomplished with one line of code: a call to the Estimator's `deploy` method.

In [None]:
predictor_sm = tf_estimator_sm.deploy(instance_type='ml.t2.medium', initial_instance_count=1)


## Make a request to our endpoint <a class="anchor" id="pipeline_inference_request"></a>

Here we send a single hand-written record. The model is very particular about the ordering of the inference request data: the values must appear in the same order as the 13 feature columns selected from Snowflake for training (`CREDITSCORE_SCALED` through `GENDER_MALE`), with the `EXITED` label left out.

We pass the payload as a plain Python list and let the predictor returned by `deploy` serialize the request. The prediction output is the model's estimate of `EXITED` for this record, that is, whether the customer is predicted to churn.
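
Because the request is just a list of numbers, it is easy to get the column order wrong. One way to guard against that is to build the list from named values. This is a minimal sketch: `FEATURE_COLUMNS` copies the feature order from the Snowflake query used for training, `build_payload` is a hypothetical helper, and the example values are the ones used in the request cell that follows.

```python
# Feature order taken from the training query (EXITED, the label, is excluded).
FEATURE_COLUMNS = [
    "CREDITSCORE_SCALED", "AGE_SCALED", "TENURE_SCALED", "BALANCE_SCALED",
    "NUMOFPRODUCTS_SCALED", "ESTIMATEDSALARY_SCALED", "HASCRCARD",
    "ISACTIVEMEMBER", "GEOG_FRANCE", "GEOG_SPAIN", "GEOG_GERMANY",
    "GENDER_FEMALE", "GENDER_MALE",
]


def build_payload(record):
    """Return the record's feature values as a list in training column order."""
    missing = [column for column in FEATURE_COLUMNS if column not in record]
    if missing:
        raise KeyError(f"missing feature(s): {missing}")
    return [float(record[column]) for column in FEATURE_COLUMNS]


example_record = {
    "GENDER_MALE": 0, "GENDER_FEMALE": 1,
    "GEOG_FRANCE": 1, "GEOG_SPAIN": 0, "GEOG_GERMANY": 0,
    "HASCRCARD": 1, "ISACTIVEMEMBER": 1,
    "CREDITSCORE_SCALED": -0.326221422, "AGE_SCALED": 0.2935174226,
    "TENURE_SCALED": -1.041759689, "BALANCE_SCALED": 0.0003237994151,
    "NUMOFPRODUCTS_SCALED": -0.9115834401, "ESTIMATEDSALARY_SCALED": 0.021886494,
}

example_payload = build_payload(example_record)
print(example_payload)
```

The dictionary can be written in any order; the helper always emits the values in the order the model was trained on.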

In [None]:
payload = [-0.326221422,0.2935174226,-1.041759689,0.0003237994151,-0.9115834401,0.021886494,1,1,1,0,0,1,0]

print(predictor_sm.predict(payload))


## Delete Endpoint <a class="anchor" id="delete_endpoint"></a>
Once we are finished with the endpoint, we clean up the resources!

In [None]:
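sm_client = sagemaker_session.boto_session.client('sagemaker')
# Take the endpoint name from the predictor created by deploy().
# Assumes SageMaker Python SDK v1, where the attribute is `endpoint`; in v2 it is `endpoint_name`.
endpoint_name = predictor_sm.endpoint
sm_client.delete_endpoint(EndpointName=endpoint_name)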